Introduction
Hi, this is Trueman, an Application Engineer working at Rakuten's Osaka Branch. This is sort of a sequel to my first post: https://commerce-engineer.rakuten.careers/entry/en/tech/0048. Think of it as a story of how there's always room for improvement, no matter how "finished" you think your code is. In that first "hello world" post, I talked about how I started from some hello-world tutorials and gradually transformed them into a useful mock application (mock-dmc) that we actually use for local development and CI/CD testing. Although the solution "worked" just fine (after multiple rounds of bug fixing), there was still a problem.
Problem
To work with the mock-dmc, we build it as a Docker image and run it as a container that our actual non-mock application runs against. For reference, here's a sample of the Dockerfile (with some changes):
Dockerfile
ARG BASE_BUILD_IMAGE
ARG BASE_RUN_IMAGE

FROM $BASE_BUILD_IMAGE AS TEMP_BUILD_IMAGE
ENV APP_HOME=/usr/app/
WORKDIR $APP_HOME

# only download dependencies first so docker can cache dependencies layer
COPY build.gradle.kts settings.gradle.kts $APP_HOME
# the "|| true" is meant to silently ignore expected failure due to no source code copied at this stage
RUN gradle clean build -Dhttps.proxyHost=add.your.proxy.here -Dhttps.proxyPort=XXXX --no-daemon > /dev/null 2>&1 || true

COPY ./ $APP_HOME
RUN gradle clean build -Dhttps.proxyHost=add.your.proxy.here -Dhttps.proxyPort=XXXX --no-daemon

# actual container
FROM $BASE_RUN_IMAGE
ENV ARTIFACT_NAME=car-repair-mock-dmc-1.0.0.jar
ENV APP_HOME=/usr/app/

WORKDIR $APP_HOME
COPY --from=TEMP_BUILD_IMAGE $APP_HOME/build/libs/$ARTIFACT_NAME .

EXPOSE XXXX
ENTRYPOINT exec java -jar ${ARTIFACT_NAME}
During a build, Docker caches the result of each instruction as a layer and reuses it the next time, as long as that instruction and everything before it are unchanged since the layer was cached (reference: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache). This Dockerfile was written with that in mind: the build is split into two steps so that at least downloading the dependencies declared in build.gradle.kts can be cached and reused even when the source code itself changes.
However, we noticed that often on our own machines (and always on Jenkins), the "COPY ./ $APP_HOME" line invalidated our cache even when nothing had changed in the source code. This matters because the build step that follows takes a long time, almost 2 minutes, and we really should be able to use the cache when the source code hasn't changed.
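To make the COPY cache rule more concrete: Docker's documentation describes the check for COPY as a checksum over the contents of the copied files, so any file that changes in the copied directory is enough to invalidate the layer. Below is a rough sketch of that idea in shell. The helper name context_fingerprint is hypothetical, it uses cksum rather than whatever Docker uses internally, and it does not apply .dockerignore rules, so treat it as an approximation for comparing a directory between two builds rather than a reproduction of Docker's logic.

```shell
#!/bin/sh
# Hypothetical helper: print one fingerprint for every regular file
# (hidden ones included) under a directory. If the output differs between
# two runs, some file path or file content changed in between.
context_fingerprint() {
  dir="$1"
  (
    cd "$dir" || exit 1
    # Sort so the result does not depend on directory traversal order.
    find . -type f | LC_ALL=C sort | while IFS= read -r f; do
      printf '%s\n' "$f"   # the path is part of the fingerprint
      cksum < "$f"         # so is the file content
    done | cksum
  )
}

# Example (not run here): compare the output before and after an
# operation that supposedly changes nothing.
#   context_fingerprint .
```

Running it before and after a step that "shouldn't change anything" is a quick way to confirm whether the directory really is identical from COPY's point of view.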
In short, the problem was that this mock-dmc tool that was supposed to help our development process was slowing down both our development and deployment pipeline for no good reason.
Investigation
Naturally, the first step was to check what files we were copying over with the "COPY ./ $APP_HOME" command. When I first created this, I was already aware of some irrelevant files we wanted to exclude from our Docker build context (very good reference here: https://codefresh.io/blog/not-ignore-dockerignore-2/), so I did have a simple .dockerignore file. Here's what the project structure looked like alongside my initial .dockerignore file.
At first glance, I really couldn't see what was changing between Docker builds to cause our cached layer to become invalid. Even digging into the different folders didn't reveal anything suspicious.
Eventually, I gave up guessing and started looking for a way to see what was actually being loaded into the Docker build context (and would therefore get picked up by "COPY ./ $APP_HOME" and invalidate our cache). Luckily, I found a StackOverflow answer that had exactly what I was looking for!
After copying the Dockerfile from that answer and running it as instructed, I found the culprit! Turns out there's a hidden .git folder that IntelliJ doesn't show for some reason. Git rewrites files inside .git during routine operations such as fetch and commit, so the build context could change even when the source code didn't, which would explain why the cache never survived on Jenkins. In hindsight, I feel like that should've been obvious, but it honestly didn't cross my mind until that point. So Ricardo Branco, if you're reading this, thank you! (Also, thank you Steve for asking the question on Stack Overflow in the first place.)
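For anyone who wants to try the same check, the general technique can be sketched as follows. This is not necessarily the exact Dockerfile from that answer: the busybox base image, the context-lister image tag, and the list_docker_context function name are all assumptions made for illustration. The idea is simply to build a throwaway image that copies the whole build context and then lists it, hidden files included.

```shell
#!/bin/sh
# Sketch: list every file Docker receives as build context for a directory.
# Assumes docker is installed; "context-lister" is a hypothetical image tag.
list_docker_context() {
  context_dir="${1:-.}"
  tmp_dockerfile="$(mktemp)" || return 1

  # Throwaway Dockerfile: copy the entire context, then list it on run.
  cat > "$tmp_dockerfile" <<'EOF'
FROM busybox
COPY . /build-context
WORKDIR /build-context
CMD ["find", "."]
EOF

  # Build quietly, then run the image to print the copied files.
  docker build -q -t context-lister -f "$tmp_dockerfile" "$context_dir" >/dev/null \
    && docker run --rm context-lister
  status=$?

  rm -f "$tmp_dockerfile"
  return "$status"
}

# Example (not run here):
#   list_docker_context . | grep '\.git'
```

Because the temporary Dockerfile lives outside the context, the listing should reflect only the project files that pass the .dockerignore in the context directory.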
Solution
Once I found the root cause, the solution was simply to add the .git folder to the .dockerignore file. Next, I had to test it on our STG deployment to see how it affected our build.
Unexpectedly, our build failed with this change! Turns out that speeding up the mock_dmc image build sped up the "docker-compose up --build -d" process so much that the tests started running before our MySQL container was ready. Thankfully, we already had a solution for that situation, so I just needed to add this one line to our Jenkinsfile just before running the tests.
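The .dockerignore change itself is a single line containing .git. If you'd rather script it (for example across several repositories), here is a minimal sketch; the function name add_dockerignore_entry is hypothetical, and it only checks for an exact matching line rather than understanding .dockerignore pattern semantics.

```shell
#!/bin/sh
# Hypothetical helper: append a pattern to a .dockerignore file
# unless that exact line is already present.
add_dockerignore_entry() {
  pattern="$1"
  file="${2:-.dockerignore}"

  # Already listed as an exact line: nothing to do.
  if grep -qxF -- "$pattern" "$file" 2>/dev/null; then
    return 0
  fi

  # If the file exists and does not end with a newline, add one first
  # so the new pattern does not get glued onto the last entry.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi

  printf '%s\n' "$pattern" >> "$file"
}

# Example (not run here):
#   add_dockerignore_entry .git
```

Calling it twice with the same pattern leaves the file unchanged the second time, so it is safe to run from a setup script.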
sh 'while ! docker exec jenkins_mysql_shaken_1 mysql --user=some_db_username_here --password=some_db_password_here -e "SELECT 1" >/dev/null 2>&1; do sleep 10; done'
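One caveat with a bare "while ... sleep" loop is that it waits forever if the database never comes up. A possible variation, sketched below, gives up after a fixed number of attempts so the pipeline fails with a message instead of hanging. The wait_until function and the attempt count of 30 are assumptions for illustration, not what our Jenkinsfile actually contains.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds, giving up
# after a maximum number of attempts.
# Usage: wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  max_attempts="$1"
  delay_seconds="$2"
  shift 2

  attempt=1
  while ! "$@" >/dev/null 2>&1; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "gave up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
  return 0
}

# Example (not run here), using the same readiness check as above:
#   wait_until 30 10 docker exec jenkins_mysql_shaken_1 mysql \
#     --user=some_db_username_here --password=some_db_password_here -e "SELECT 1"
```

With 30 attempts and a 10-second delay, this would wait roughly five minutes at most before failing the build.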
Conclusion
Our builds on STG were sped up by about 2 minutes thanks to no longer needlessly re-downloading dependencies and re-building the mock_dmc application. I've included screenshots below (the failed and aborted builds were due to the db-not-ready issue mentioned above). Building the container locally is also faster by roughly the same amount.
I hope this has inspired you to take a second look at your current processes to see if there’s anything you can do about the day-to-day things that are bothering you (like slow build times). If you’re interested in looking into and solving interesting problems like these, consider applying to join us at Rakuten.