by Billy Korando | Published February 20, 2019
Initially designed to run on mobile devices from the early 2000s, OpenJ9, a Java Virtual Machine for the cloud, uses about half as much memory as JDK8 HotSpot while nearly matching its throughput. This performance gain comes straight out of the box; however, more tuning can be done. In this article, learn how to enable OpenJ9’s class sharing functionality when running in a containerized environment.
If you are unfamiliar with the class sharing feature in OpenJ9, the tutorial “Class sharing in Eclipse OpenJ9” (IBM Developer, June 2018) goes in depth on how class sharing works and why you should use it. The short version is that class sharing allows OpenJ9 JVMs to compile and perform optimization on Java code and cache that information in a common location to be used by other OpenJ9 JVMs. Class sharing provides significant benefits, including improved startup speed and reduced CPU and memory usage.
When outside of a containerized environment, using OpenJ9’s class sharing feature can be as simple as adding the JVM arg -Xshareclasses to your startup script and letting OpenJ9’s defaults handle the rest. However, when in a containerized environment, as is often the case when running Java applications in the cloud, a little more work is needed. Let’s look at two methods of setting up OpenJ9’s class sharing in a containerized environment, and weigh the benefits and drawbacks of each method.
Using a Docker volume as a shared classes cache comes straight from the OpenJ9 Docker page. Setting up shared classes to use a Docker volume is pretty simple.
Create the volume.
Here is the code I used to create mine:
docker volume create java-shared-classes
In the Dockerfile, with either CMD or ENTRYPOINT, you need to enable class sharing and explicitly define the location where you will be storing class information:
ENTRYPOINT ["java", "-Xshareclasses:cacheDir=/cache", "-Xscmx300M", ...]
-Xshareclasses can take several sub-options, with one of them being cacheDir, which allows us to explicitly define the directory to store class data in. You can also optionally define how large the cache should be. In this case, I am setting the cache to a very generous 300MB.
The volume must be mounted when the Docker container is run. In the script that issues your docker run command, add the following:
docker run --mount source=java-shared-classes,target=/cache <image name>
It is important that target is the same directory as cacheDir, from -Xshareclasses in your Dockerfile.
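One way to keep the two paths from drifting apart is to define the directory once and derive both the JVM option and the mount target from it. The sketch below only prints the resulting strings; the variable names, the helper functions, and the image name my-image are illustrative assumptions, not part of the original project.

```shell
#!/bin/sh
# Hypothetical names: CACHE_DIR, VOLUME_NAME and both helpers are
# illustrative; only the option and flag syntax comes from the article.
CACHE_DIR=/cache
VOLUME_NAME=java-shared-classes

# Print the JVM option that points OpenJ9 at the shared cache directory.
jvm_share_option() {
  printf '%s\n' "-Xshareclasses:cacheDir=${CACHE_DIR}"
}

# Print a docker run command that mounts the volume at the same path.
docker_run_command() {
  image=$1
  printf 'docker run --mount source=%s,target=%s %s\n' \
    "$VOLUME_NAME" "$CACHE_DIR" "$image"
}

jvm_share_option
docker_run_command my-image
```

Because both strings are built from CACHE_DIR, changing the directory in one place keeps cacheDir and target in agreement.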
Note: How large you want the shared classes cache to be will depend upon a number of factors, including the following considerations:
Check out the OpenJ9 user docs on -Xshareclasses for information on utility methods for running and maintaining a shared classes cache.
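As a rough illustration of those utilities, the sketch below builds the java command line for three -Xshareclasses sub-options that inspect or remove a cache: printStats, listAllCaches, and destroy. The helper name cache_utility_command and the CACHE_DIR variable are assumptions made for this example, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not execute) the java command for a
# cache maintenance action. CACHE_DIR is assumed to match cacheDir above.
CACHE_DIR=/cache

cache_utility_command() {
  action=$1
  case "$action" in
    printStats|listAllCaches|destroy)
      # Utility sub-options are appended to cacheDir, separated by a comma.
      printf 'java -Xshareclasses:cacheDir=%s,%s\n' "$CACHE_DIR" "$action"
      ;;
    *)
      echo "unsupported action: $action" >&2
      return 1
      ;;
  esac
}

cache_utility_command printStats
```

To inspect a cache that lives on a Docker volume, the printed command would need to run inside a container that has the volume mounted at the same path.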
My colleague, Mike Thompson, introduced me to the method of “pre-warming” a Docker container. His original code can be found on GitHub.
“Pre-warming” a Docker container is accomplished by executing the Java application that the container will be running as the Docker image is being built. By executing the Java application with -Xshareclasses, a cache can be pre-populated and stored within the Docker image. When the Docker image is later run, the Java application it executes can then load its class data from the pre-existing cache. To accomplish this, we will use the Dockerfile RUN instruction:
RUN /bin/bash -c 'java -Xshareclasses -Xscmx20M -jar batch-processor-0.0.1-SNAPSHOT.jar --run_type=short &' ; sleep 15 ; xargs kill -1
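The same start, wait, and stop pattern can also be written as a small script that a RUN instruction could call. This is a sketch under assumptions, not the original author's code: the prewarm function name is made up, and it tracks the background process through the shell's $! variable instead of searching for it.

```shell
#!/bin/sh
# Hypothetical helper: run a command in the background for a fixed
# warm-up period, then stop it with SIGHUP (signal 1).
prewarm() {
  warm_seconds=$1
  shift
  "$@" &                      # start the application in the background
  app_pid=$!                  # remember its process ID
  sleep "$warm_seconds"       # let it load classes and populate the cache
  kill -1 "$app_pid" 2>/dev/null || true   # ignore "already exited"
  wait "$app_pid" 2>/dev/null || true      # reap it; ignore its exit code
}

# Example invocation (assumes java and the jar exist in the image):
# prewarm 15 java -Xshareclasses -Xscmx20M \
#   -jar batch-processor-0.0.1-SNAPSHOT.jar --run_type=short
```

Ignoring the exit status keeps the image build from failing when the application is stopped by the signal, which is the expected outcome here.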
There are a number of considerations to take into account when designing the RUN command. I will go into more detail on that later; but first, let’s compare the performance of using a Docker volume versus pre-warming a Docker container.
To compare the performance of the two different methods of using -Xshareclasses that I previously described, I used a demo application that I created for a presentation on OpenJ9. The demo is a Spring Boot application executing a Spring Batch process that is doing transforms on about 200 records. If you want more details on what the application is doing, check out the README for my project.
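There are three containers running in sequence, referred to below as “cold” (no class sharing), “warm” (pre-warmed cache stored in the image), and “volume” (cache stored on a Docker volume).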
Each image was run ~30 times, and its run time and max memory usage were collected and presented in the following figures. Let’s check out the charts in Figures 1 and 2 to get an idea of how the class sharing methodologies compare.
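The article does not show how the run times were collected. One simple way to gather comparable numbers is a loop that times each run of a command, as in the sketch below. The time_runs helper is an assumption for illustration, it measures only whole seconds, and it does not capture memory usage.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print the elapsed
# wall-clock time of each run in whole seconds.
time_runs() {
  runs=$1
  shift
  i=1
  while [ "$i" -le "$runs" ]; do
    start=$(date +%s)
    "$@" >/dev/null 2>&1      # discard the command's own output
    end=$(date +%s)
    echo "run $i: $((end - start))s"
    i=$((i + 1))
  done
}

# Example invocation (image name is a placeholder):
# time_runs 30 docker run --rm <image name>
```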
Figure 1 shows the results of how long it took to execute the batch application.
Figure 1. Line graph comparing the execution time of each Docker container over ~30 runs
There are a couple of spikes for both the “warm” and “cold” containers. But, overall, their performance, as seen with the trend line, is pretty consistent. That is to be expected, as they are only utilizing what is native to their container. What really stands out is the “volume” container.
In my demo, I started with an empty cache, which means the “volume” container not only has to compile the classes like the “cold” container would, but also write those classes to the cache. The graph in Figure 1 shows that this results in a pretty substantial throughput penalty. Whereas the “cold” container consistently finished executing in roughly 9 seconds, the initial run for the “volume” container took nearly 13 seconds. (Note: It is not shown, but I ran the scenario of starting the “volume” container several times with an empty cache to validate the results seen above.)
However, while the “volume” container was much slower during the first run, it ran just as fast as the “warm” container the second time around. This makes sense, as the “warm” container executed the batch application once when it was being built as well.
I ended up running more executions of each container. The “cold” and “warm” containers remained pretty consistent in their performance, with the “volume” container continuing to get a little better. This is because each time the container is executed, OpenJ9 adds a little more class information and JIT optimizations to the cache. The benefits from the first execution were massive, and there was another noticeable bump when executing the container a third time (after which the benefits plateaued). The “volume” method did continue to get a little better over time when compared to the “warm” method.
I provide a little more analysis on weighing the benefits and drawbacks later, but let’s look at another performance factor first: memory usage.
To demonstrate the differences in memory usage between the different images, I used a box and whiskers chart. The max and minimum (in other words, the lowest max) memory used from each time the Docker containers ran are seen in the “whiskers,” whereas the middle 50% of memory used is seen in the “box.” Hopefully this chart shows the outliers that occur when running demos, but also gives an idea of typical memory usage to expect. With that being said, let’s analyze the chart in Figure 2.
Figure 2. Box and whiskers graph, comparing the memory footprint of each Docker container
With class sharing, you are getting a pretty noticeable reduction in memory usage. Without class sharing, the “cold” container would top out at around 140MB of memory and spike as high as 240MB. The “warm” and “volume” containers came in around 125MB, never spiking above 150MB.
One method isn’t outright better than the other, and there are a number of factors to consider when determining which method you should use for your organization, some of which aren’t well captured above or by testing on a local machine alone. Let’s go through these considerations to provide a better understanding of how these two methods might behave in more real-world scenarios.
The first factor I want to cover concerns latency. I ran the above tests on my local machine, so the volume I created is on the same hard drive as the Docker containers being run. In a cloud environment, it’s possible that wouldn’t always be the case. The additional latency when the logical location of a volume is on a physically separate machine could have a noticeable impact on the start-up time for an application.
For pre-warmed containers, there are a couple of factors to consider as well. The first is the additional storage overhead associated with them. In my example, the pre-warmed Docker container was 390MB in size, compared to the 369MB of its “cold” and “volume” counterparts. Depending on how many services your organization maintains as Docker containers and for how long images are retained, it could create a noticeable increase in storage usage. That said, proper pruning practices could easily address this issue.
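To put the storage figures in perspective, the arithmetic can be sketched as follows. The two image sizes come from the example above; the helper names and the count of 100 retained images are hypothetical values chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for estimating the extra storage used by
# pre-warmed images. Sizes are in MB.

# Extra MB across a number of retained images.
storage_overhead_mb() {
  warm_mb=$1
  cold_mb=$2
  image_count=$3
  echo $(( (warm_mb - cold_mb) * image_count ))
}

# Per-image overhead as a percentage of the cold image size.
overhead_percent() {
  awk -v w="$1" -v c="$2" 'BEGIN { printf "%.1f\n", (w - c) * 100 / c }'
}

storage_overhead_mb 390 369 100   # prints 2100
overhead_percent 390 369          # prints 5.7
```

With these numbers, each pre-warmed image carries 21MB more than its cold counterpart, which is roughly 5.7% per image, or about 2.1GB across 100 retained images.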
Another factor to consider is that it might take some work to properly pre-warm a Docker container. In my demo, I am using a Spring Batch job with a pre-defined dataset, so I am simply executing a shortened version of the job with a RUN command. However, most applications aren’t batch jobs. For those, a bit more work is needed to simulate a load. Otherwise, there won’t be many optimizations provided by the JIT available in the cache when it initially starts.
I wouldn’t consider any of these factors to be particularly significant. They might become somewhat important at a very large scale, or if you have really tight performance criteria; but, generally, these factors are relatively small technical hurdles to overcome.
In the short time I have spent learning about OpenJ9, I have become fascinated by it. The huge memory savings, while matching HotSpot’s performance, make OpenJ9 really exciting, and fun to talk and write about.
The class sharing explored in this article, which further adds to memory savings as well as improves startup, paired with features like -Xquickstart (see the OpenJ9 user docs for more info), makes OpenJ9 very interesting with respect to serverless functions. Stay tuned, as I plan on exploring the latter in a future article.