In the previous article we focused on memory management, providing information about the garbage collection policies that can be used for different workloads. We also looked at how diagnostic tools can be used to monitor performance, pinpoint problems, and suggest tuning options that relate to memory management. In this last article we will cover Java concurrency, focusing on how lock contention and deadlock situations can be diagnosed and avoided. We will also look at how you can monitor CPU utilization to identify problems and target improvements in application performance.

Other articles in this series:

Links to further information are provided throughout the article.

Java concurrency and locking

Concurrency

Both the Java language and the Java Virtual Machine (JVM) support concurrent programming. This allows a Java application to be decomposed into smaller tasks that can be executed in parallel. If Java concurrency and synchronization features are used appropriately, then a Java application can attain optimal throughput and performance. However, improper use of such features can cripple a Java application’s performance and responsiveness.

It is therefore vital to identify and resolve problems that can arise from using these features. For a Java application, several factors must be examined to understand the impact of concurrency and synchronization: lock contention, deadlocks, CPU utilization, and CPU time.

  1. Lock Contention. A lock typically prevents more than one entity from accessing a shared resource. Each object in the Java language has an associated lock, also referred to as a monitor, which a thread obtains by using a synchronized method or block of code. In the case of the JVM, threads compete for various locks on Java objects. See “Locking in IBM’s JVM” later in this article.

    Lock contention occurs when many threads share a single lock and compete, or contend, to take it. A typical symptom of contention is an application that consumes a lot of CPU time but produces little output, with many threads blocked on the same lock. Lock contention can be caused by improper use of Java synchronization features, such as long critical sections or blocking I/O performed while a lock is held. A Java application’s responsiveness and availability are reduced when a large number of threads are blocked rather than doing useful work (a minimal sketch at the end of this item illustrates the problem).

    The Threads perspective in Health Center (Figure 1) provides a user-friendly interface to view a thread’s state and stack in real-time. You can use this information to study lock contention.

    Figure 1 – IBM Health Center: Threads perspective

    Another approach to investigating contention is to look at a monitor’s behavior. The Locking perspective in Health Center (Figure 2) provides real-time statistics about all the monitors used in a Java application, presented both as a bar chart and as a table. In the bar chart, the height of each bar represents the number of non-recursive attempts to acquire a monitor that caused the requesting thread to block while waiting for the monitor to become unowned (the slow lock count). The color of each bar represents the percentage of total acquires for which the requesting thread was blocked while waiting on the monitor (“% miss”); a darker bar indicates a more heavily contended monitor than a lighter one. The table in Figure 2 contains further information, such as:

    • Average hold time: the average time the monitor is held by a thread.
    • Utilization (“% util”): the amount of time the monitor was held, divided by the length of the period over which the data was collected.

    Overall, these statistics show how the various critical sections are being accessed in a Java application.


    Figure 2 – IBM Health Center: Locking perspective

    Other tools, such as IBM Thread and Monitor Dump Analyzer for Java, can also be used to study lock contention.
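
    The following minimal sketch (the class and method names are illustrative, not taken from any particular workload) shows the kind of long critical section that produces heavy contention: 32 threads all serialize on one monitor, even though most of the work does not touch shared state.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Illustrative sketch only: 32 threads serialize on a single monitor
    // because the whole unit of work sits inside the critical section.
    public class ContentionDemo {

        private static final Object SHARED_LOCK = new Object();
        private static long counter = 0;

        public static void main(String[] args) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(32);
            for (int i = 0; i < 32; i++) {
                pool.submit(() -> {
                    for (int j = 0; j < 1_000; j++) {
                        // Long critical section: the expensive work is done
                        // while the monitor is held, so the other threads block.
                        synchronized (SHARED_LOCK) {
                            counter += simulateWork();
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.MINUTES);
            System.out.println("counter = " + counter);
        }

        // Stand-in for expensive work that does not touch shared state.
        private static long simulateWork() {
            long sum = 0;
            for (int k = 0; k < 100_000; k++) {
                sum += k;
            }
            return sum;
        }
    }

    Moving the call to simulateWork() outside the synchronized block, so that only the update to counter is protected, shortens the critical section and would typically lower the slow lock count and “% miss” values reported by Health Center.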

  2. Deadlocks. A deadlock occurs when multiple threads are blocked forever, each waiting to acquire a lock held by another. Deadlocks are more detrimental to the stability of a Java application than lock contention because they stop the affected threads from making any further progress.

    The IBM JVM can find deadlocks by detecting cycles that involve locks obtained through synchronization, locks built on the java.util.concurrent.locks.AbstractOwnableSynchronizer class, or a mix of both.

    Figure 3 illustrates detection of a deadlock in which two threads, “Deadlock Thread 0” and “Deadlock Thread 1”, unsuccessfully attempt to synchronize on a java/lang/String object and to lock an instance of the java.util.concurrent.locks.ReentrantLock class.

    This typical deadlock situation is caused by an error in application design. You can use the Javadump tool to help you detect such events. Techniques such as consistent lock ordering, avoiding calls into third-party code while holding a lock, and using interruptible locks can help to resolve deadlocks (a small sketch after Figure 3 illustrates the pattern).

    NULL           ------------------------------------------------------------------------
    0SECTION       LOCKS subcomponent dump routine
    NULL           ===============================
    NULL           
    1LKPOOLINFO    Monitor pool info:
    2LKPOOLTOTAL     Current total number of monitors: 2
    NULL           
    1LKMONPOOLDUMP Monitor Pool Dump (flat & inflated object-monitors):
    2LKMONINUSE      sys_mon_t:0x00007F5E24013F10 infl_mon_t: 0x00007F5E24013F88:
    3LKMONOBJECT      java/lang/String@0x00007F5E5E18E3D8: Flat locked by "Deadlock Thread 1" (0x00007F5E84362100), entry count 1
    3LKWAITERQ            Waiting to enter:
    3LKWAITER                "Deadlock Thread 0" (0x00007F5E8435BD00)
    NULL           
    1LKREGMONDUMP  JVM System Monitor Dump (registered monitors):
    2LKREGMON          Thread global lock (0x00007F5E84004F58): 
    2LKREGMON          &(PPG_mem_mem32_subAllocHeapMem32.monitor) lock (0x00007F5E84005000): 
    2LKREGMON          NLS hash table lock (0x00007F5E840050A8): 
                < lines removed for brevity >
    
    1LKDEADLOCK    Deadlock detected !!!
    NULL           ---------------------
    

    Figure 3 – Deadlock detection
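
    As an illustration of the pattern behind Figure 3, the hypothetical sketch below mixes an object monitor with a java.util.concurrent lock and acquires them in opposite orders on two threads; once both threads have taken their first lock, neither can ever proceed.

    import java.util.concurrent.locks.ReentrantLock;

    // Illustrative sketch only: an object monitor and a java.util.concurrent
    // lock acquired in opposite orders by two threads, producing a cycle.
    public class DeadlockDemo {

        private static final String MONITOR = new String("shared-string");
        private static final ReentrantLock LOCK = new ReentrantLock();

        public static void main(String[] args) {
            Thread t0 = new Thread(() -> {
                LOCK.lock();                 // 1. take the ReentrantLock first
                try {
                    pause(100);
                    synchronized (MONITOR) { // 2. then wait for the String monitor
                        System.out.println("Thread 0 acquired both locks");
                    }
                } finally {
                    LOCK.unlock();
                }
            }, "Deadlock Thread 0");

            Thread t1 = new Thread(() -> {
                synchronized (MONITOR) {     // 1. take the String monitor first
                    pause(100);
                    LOCK.lock();             // 2. then wait for the ReentrantLock
                    try {
                        System.out.println("Thread 1 acquired both locks");
                    } finally {
                        LOCK.unlock();
                    }
                }
            }, "Deadlock Thread 1");

            t0.start();
            t1.start();
        }

        private static void pause(long millis) {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    Acquiring the two locks in the same order on both threads removes the cycle; alternatively, replacing LOCK.lock() with a timed LOCK.tryLock(timeout, unit) call lets a thread back off and retry instead of blocking forever.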

  3. CPU Utilization and CPU time. Monitoring CPU utilization and CPU time can help in identifying regression issues or improvements in a Java application.

    High CPU usage and CPU time can result from poor resource utilization, for example creating many times more threads than there are available CPUs. Such symptoms should not be ignored; they might indicate genuine issues that prevent a Java application from scaling to handle larger workloads (a short pool-sizing sketch at the end of this item illustrates the point).

    Tools such as Health Center, Application Performance Management (APM) solutions, and Java Core Dump analysis (via multiple snapshots) can be used to study CPU usage and CPU time. Intrusive methods, such as logging CPU time, will change an application’s behavior and should therefore be avoided.

    The CPU perspective in Health Center (Figure 4) shows processor usage for the application and for the system on which the application is running. In addition to processor usage, the graph also shows the number of methods that were profiled since the monitoring agent started.

    Figure 4 – IBM Health Center: CPU perspective

    The Method Profiling perspective in Health Center (Figure 5) provides the results of the sampling method profiler. These results include full call stack information and sampling statistics. For large Java applications, the analysis process can be made easier by using a systematic approach and focusing on methods that utilize the highest CPU resources. In Figure 5, “Self (%)” is the percentage of samples taken while a particular method is being run at the top of the stack. This is a good indicator of how expensive a method is in terms of using processing resources. “Tree (%)” is the percentage of samples taken while a particular method is anywhere in the call stack and shows the proportion of time that this method, and the methods it called (descendants), were being processed. This is a good guide to the areas of the application where most processing time is spent. Based on this information, code can be inspected in order to identify regression issues related to concurrency.

    Figure 5 – IBM Health Center: Method profiling perspective

    The “Threads CPU Usage Summary” output from the IBM J9 Java core dump (Figure 6) shows the CPU time utilized by threads of major JVM components. The summary can be used to derive the time taken by the Virtual Machine (VM) threads, the Garbage Collector (GC) threads and the Just In Time (JIT) compiler threads. This information is valuable since it helps in identifying the source of high CPU utilization: VM, GC, or JIT compiler.

    Threads CPU Usage Summary
    =========================
    All JVM attached threads: 0.083877000 secs
    |
    +--System-JVM: 0.083877000 secs
    |  |
    |  +--GC: 0.0 secs
    |  |
    |  +--JIT: 0.00961700 secs
    |
    +--Application: 0.0 secs
    

    Figure 6 – “Threads CPU Usage Summary” in IBM J9’s Java Core Dump

    In some Java applications, high CPU usage and CPU time might be due to heavy thread lock contention. In such cases, the -Xthr:minimizeUserCPU JVM option might help to reduce CPU usage and CPU time.
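
    As a simple illustration of the thread-count point above, the hypothetical sketch below sizes a worker pool from Runtime.getRuntime().availableProcessors() instead of creating an arbitrary number of threads; for CPU-bound work this keeps the thread count close to the number of CPUs the JVM can actually use.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical sizing sketch: bound the worker pool by the CPUs the JVM
    // can see instead of creating an arbitrary number of threads.
    public class PoolSizing {

        public static ExecutorService newCpuBoundPool() {
            int cpus = Runtime.getRuntime().availableProcessors();
            // For CPU-bound work, running many more threads than CPUs tends to
            // add context switching and lock contention rather than throughput.
            return Executors.newFixedThreadPool(cpus);
        }

        public static void main(String[] args) {
            ExecutorService pool = newCpuBoundPool();
            pool.submit(() -> System.out.println(
                    "Pool sized to " + Runtime.getRuntime().availableProcessors() + " CPUs"));
            pool.shutdown();
        }
    }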

Locking in IBM’s JVM

The Java language provides two basic synchronization idioms: synchronized methods and synchronized statements.
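
The short sketch below (the Counter class is illustrative) shows both idioms: a synchronized method locks the receiving object for the whole call, while a synchronized statement limits the critical section to the lines that actually touch shared state.

    // Illustrative class showing both synchronization idioms.
    public class Counter {

        private int value = 0;

        // Synchronized method: the monitor of this Counter instance is held
        // for the duration of the whole call.
        public synchronized void increment() {
            value++;
        }

        // Synchronized statement: only the update to shared state is inside
        // the critical section; unrelated work runs without the lock.
        public void add(int delta) {
            int prepared = delta * 2;      // work that needs no lock
            synchronized (this) {
                value += prepared;
            }
        }

        public synchronized int get() {
            return value;
        }
    }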

You can use these idioms most efficiently if you understand how locking is implemented in the J9 JVM at a high level, and customize the JVM by using command-line options so that the application’s concurrency needs are optimized. See the following IBM developerWorks articles to understand more about the locking implementation within the J9 JVM:

Summary

In this series of articles we have discussed some of the most performance-sensitive components in the IBM SDK for Java, namely the JIT compiler, the garbage collector (GC), and the locking mechanism used by the underlying J9 virtual machine. Several common performance problems were described, along with some guidelines about how to diagnose and resolve them. We hope you find these articles useful and welcome your feedback.

Further reading

The DZone Refcard article Java Performance Optimization further explains the high-level areas in which you might encounter performance problems.

The article gives a technical explanation of how to resolve performance problems at the Java level, while also providing an overview of how a typical JVM works internally and suggesting options for tuning the JVM. The focus is largely on the HotSpot JVM, but it also covers the J9 JVM.

Authors: Vijay Sundaresan, Aleksandar Micic, Daniel Heidinga, and Babneet Singh.
