In the last article we provided a high-level overview of the IBM J9 JVM architecture and looked at how the JIT compiler optimizes machine code to improve application performance. This article focuses on memory management and describes the garbage collection policies that can be used for different workloads. We will also look at how diagnostic tools can be used to monitor performance, pinpoint problems, and suggest tuning options related to memory management.

Other articles in this series:

Links to further information are provided throughout the article.

Memory spaces

The J9 JVM is composed of the following memory spaces:

  • Java heap: primary storage for Java program class instances and arrays.
  • Class metadata: holds the internal representation of classes (ROM and RAM classes).
  • JVM work areas: hold internal data structures used by the JVM and the GC. The Java Class Library (JCL) might also allocate native memory (for example, for direct byte buffers).
  • JIT compiled code cache: holds generated native code.
  • JIT compiled code data cache: holds metadata (extra information created by the JIT compiler) for the generated native code.
  • JIT compiler scratch memory: the JIT compiler work area that holds internal data structures used while a Java method is being compiled. It can be freed entirely after compilation.
  • JIT compiler persistent memory: JIT compiler internal data structures that persist across compilations and are used both at compile time and at run time. Examples include the class hierarchy table, runtime assumption tables, and profiling data.

Figure: Java process memory layout.

J9 object model

JVM implementations differ in the object model used to represent Java objects; the same object might have a different size and shape in two different JVM implementations.

There are four aspects to the way the object model represents a Java object:

  • Field and array element sizes. The number of bytes used to represent a field or array element of a given data type might vary. For example, a JVM might represent a byte field either as an 8-bit value or as a 32-bit value. In J9, boolean, byte, short, and char values are represented as follows:

    • fields of non-array objects: 32-bit value
    • byte or boolean array elements: 8-bit value
    • char or short array elements: 16-bit value
  • Object and field alignment. In all JVM implementations, every Java object is generally aligned to some minimum guaranteed boundary. Furthermore, a JVM usually aligns a field of size N bytes to an address that is a multiple of N. However, the amount of padding inserted for alignment might vary.

    In J9, all objects are aligned to at least an 8-byte boundary, and every object is at least 16 bytes in size.

  • 64-bit compressed references. JVM implementations support 64-bit reference compression (representing Java object references as 32-bit values instead of 64-bit values) up to a maximum Java heap threshold that might vary.

    In J9, compressed references are supported up to a theoretical maximum heap size of 64 GB. In practice, most operating systems allow a maximum heap size of about 57 GB with compressed references; on z/OS, from IBM JDK version 8, service refresh 2, a maximum of 62 GB can be used.

    Compressed references incur a throughput overhead when the specified maximum heap size is greater than 4 GB, because references must then be shifted whenever reference fields are compressed and decompressed. In practice, however, a maximum heap size greater than about 2.5 GB can be enough to cause shifting, with the corresponding overhead. If you want peak 64-bit performance, it is therefore important to be aware of the compressed references feature: whether references are compressed without shifting, compressed with shifting, or not compressed at all depends on the maximum heap size that is used.

    Of course, the benefits of running with compressed references, namely better object locality and reduced GC frequency, are usually enough to produce a net gain even when shifting is required. See More effective heap usage using compressed references in IBM Knowledge Center.

  • Object header. The number of header fields, and the meaning of each field in the header of each Java object and array, might also vary. In J9, non-array objects have one header field (the class pointer slot), and array objects have an extra arraylength field. The class pointer slot size is:

    • 32 bits on a 32-bit JVM or 64-bit compressed references JVM
    • 64 bits on a 64-bit uncompressed references JVM

    In each case, the bottom 8 bits of the class pointer slot are used for flags, such as the object age and the hashed bit. Classes in J9 are therefore aligned on 256-byte boundaries, so that the low 8 bits of every class address are zero.

    There is also a monitor field for object types that are considered likely to be synchronized, and an optional hash field that is appended to objects that have been hashed by the program and then moved by the GC.


    Figure: the J9 object model.

    The header can contain additional slots:
    • Array size (arrays only)
    • Monitor (objects likely to be synchronized)
    • Hash (appended at the end of the object, when needed)
    Object reference and class pointer slots can be:
    • 32 bits (32-bit JVM, or 64-bit JVM with compressed references)
    • 64 bits (64-bit JVM without compressed references)
    Example flags:
    • Object age
    • Whether the object has been hashed
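Because the class pointer slot also stores flags, the class address must have its low 8 bits clear (256-byte alignment) and is recovered by masking those bits off. The following Java sketch models that packing with plain long arithmetic. Only the use of the bottom 8 bits for flags comes from the text; the bit chosen for the hashed flag and the 4-bit age field are hypothetical and are not J9's actual layout.

```java
/**
 * Illustrative packing of flag bits into the low 8 bits of a class pointer slot.
 * Only "bottom 8 bits hold flags" comes from the article; the positions of the
 * hashed flag and the age field below are hypothetical.
 */
final class ClassSlot {
    static final long FLAGS_MASK = 0xFFL;          // bottom 8 bits reserved for flags
    static final long HASHED_FLAG = 0x02L;         // hypothetical bit for "object hashed"
    static final int AGE_SHIFT = 4;                // hypothetical: age stored in bits 4..7
    static final long AGE_MASK = 0xFL << AGE_SHIFT;

    // Combines a 256-byte-aligned class address with the flag bits.
    static long pack(long classAddress, boolean hashed, int age) {
        if ((classAddress & FLAGS_MASK) != 0) {
            throw new IllegalArgumentException("class address must be 256-byte aligned");
        }
        if (age < 0 || age > 15) {
            throw new IllegalArgumentException("age must fit in 4 bits");
        }
        long flags = (hashed ? HASHED_FLAG : 0L) | ((long) age << AGE_SHIFT);
        return classAddress | flags;
    }

    // Masking off the flag bits recovers the class address.
    static long classAddress(long slot) { return slot & ~FLAGS_MASK; }

    static boolean isHashed(long slot) { return (slot & HASHED_FLAG) != 0; }

    static int age(long slot) { return (int) ((slot & AGE_MASK) >>> AGE_SHIFT); }

    public static void main(String[] args) {
        long slot = pack(0x7F001200L, true, 5);
        System.out.printf("class=%x hashed=%b age=%d%n",
                classAddress(slot), isHashed(slot), age(slot));
    }
}
```

Running main prints class=7f001200 hashed=true age=5: the flags ride along in the slot without disturbing the class address.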

    Figure: the J9 64-bit object model.

    The 64-bit JVM with compressed references is the preferred choice for the following reasons:
    • Low footprint (similar to a 32-bit JVM).
    • Benefits from various 64-bit features (for example, more registers on x86 architectures).

    Note: Shifting references between their native and compressed forms causes some performance degradation.
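The shifting overhead comes from arithmetic performed on reference accesses. The following Java sketch assumes that a compressed reference is a 32-bit offset from a heap base, shifted right by a fixed amount equal to log2 of the object alignment; the base address and shift values are illustrative assumptions, not values taken from J9. It also shows why the heap size threshold matters: with no shift, 32 bits cover 4 GB; a shift of 3 (8-byte alignment) covers 32 GB; and covering the 64 GB theoretical maximum would take a shift of 4.

```java
/**
 * Sketch of compressed-reference encoding: a 64-bit address stored as a 32-bit
 * offset from a heap base, shifted right by a fixed amount. The base and the
 * shift are illustrative assumptions, not J9's actual choices.
 */
final class CompressedRefs {
    private final long heapBase;
    private final int shift;

    CompressedRefs(long heapBase, int shift) {
        this.heapBase = heapBase;
        this.shift = shift;
    }

    // Compress: subtract the base, drop the always-zero alignment bits, keep 32 bits.
    int compress(long address) {
        long offset = address - heapBase;
        if ((offset & ((1L << shift) - 1)) != 0) {
            throw new IllegalArgumentException("address is not aligned for this shift");
        }
        long compressed = offset >>> shift;
        if (compressed > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("address is beyond the addressable range");
        }
        return (int) compressed;
    }

    // Decompress: zero-extend, shift left, add the base (the extra work per access).
    long decompress(int ref) {
        return heapBase + (Integer.toUnsignedLong(ref) << shift);
    }

    // Largest heap that 32-bit references can cover with the given shift.
    static long maxHeapBytes(int shift) {
        return (1L << 32) << shift;
    }

    public static void main(String[] args) {
        for (int s : new int[] {0, 3, 4}) {
            System.out.printf("shift %d -> %d GB%n", s, maxHeapBytes(s) >> 30);
        }
        CompressedRefs refs = new CompressedRefs(0x1_0000_0000L, 3);
        long address = 0x1_0000_0000L + 8L * 1_000_000_000L;  // about 8 GB into the heap
        System.out.println(refs.decompress(refs.compress(address)) == address);
    }
}
```

main prints 4 GB, 32 GB, and 64 GB for shifts 0, 3, and 4, and then true for the round trip.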

    The 32-bit JVM is a separate download:
    • Lower performance, and therefore being phased out.
    • Might be needed to link legacy 32-bit JNI code.
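Putting the object model rules together, the following Java sketch estimates object sizes. It assumes a 64-bit JVM with compressed references (a 4-byte class slot), a 4-byte arraylength field, no monitor or hash slot, and fields packed without gaps; real J9 layouts can differ, so treat the results as rough estimates. It contrasts the 32-bit representation of small fields in non-array objects with the packed representation of array elements.

```java
/**
 * Rough object size estimate under the J9 object model described above.
 * Assumptions: compressed references (4-byte class slot), 4-byte arraylength,
 * no monitor or hash slot, and no gaps between fields.
 */
final class J9SizeEstimate {
    static final int CLASS_SLOT = 4;        // compressed class pointer
    static final int ARRAY_LENGTH = 4;      // extra header field for arrays (assumed 4 bytes)
    static final int ALIGNMENT = 8;         // objects aligned to 8 bytes
    static final int MIN_OBJECT_SIZE = 16;  // every object is at least 16 bytes

    // Field size in a non-array object: boolean, byte, short, char widened to 32 bits.
    static int fieldSize(Class<?> type) {
        if (type == long.class || type == double.class) return 8;
        return 4; // boolean, byte, short, char, int, float, compressed references
    }

    // Array element size: small types keep their natural width.
    static int elementSize(Class<?> type) {
        if (type == boolean.class || type == byte.class) return 1;
        if (type == char.class || type == short.class) return 2;
        if (type == long.class || type == double.class) return 8;
        return 4; // int, float, compressed references
    }

    // Rounds up to the alignment and applies the minimum object size.
    static long align(long size) {
        long aligned = (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
        return Math.max(aligned, MIN_OBJECT_SIZE);
    }

    static long instanceSize(Class<?>... fieldTypes) {
        long size = CLASS_SLOT;
        for (Class<?> t : fieldTypes) size += fieldSize(t);
        return align(size);
    }

    static long arraySize(Class<?> elementType, int length) {
        return align(CLASS_SLOT + ARRAY_LENGTH + (long) elementSize(elementType) * length);
    }

    public static void main(String[] args) {
        System.out.println(instanceSize());   // object with no fields
        System.out.println(instanceSize(byte.class, byte.class, byte.class, byte.class, byte.class));
        System.out.println(arraySize(byte.class, 5));
    }
}
```

Under these assumptions, an object with five byte fields comes out at 24 bytes, while a byte[5] fits in the 16-byte minimum.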

    Garbage collection (GC)

    GC options and defaults

    Common heap sizing options:
    • -Xmx Maximum heap size
    • -Xms Initial and minimum total heap size. If using the Generational Concurrent policy (gencon), this value includes the Tenure and Nursery areas.
      • The initial and minimum Nursery size is 25% of the -Xms value; the maximum Nursery size is 25% of the -Xmx value.
      • At any given time, the Nursery size can be anywhere between its minimum and maximum, independently of the current total heap size.
    • -Xmn Fixed Nursery size (sets the initial and maximum Nursery size to the same value); for the Balanced policy, it sets the Allocate (Eden) size.
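The 25% rules can be turned into a quick calculation. This Java sketch derives the default Nursery bounds from -Xms and -Xmx values. The size parser handles only plain byte counts and the K, M, and G suffixes; it is a simplification, not the JVM's own option parser, and it does not model -Xmn, which overrides these defaults.

```java
import java.util.Locale;

/**
 * Default gencon Nursery bounds: 25% of -Xms (minimum/initial) and 25% of -Xmx
 * (maximum). The size parser is a simplified assumption supporting K, M, and G.
 */
final class NurseryDefaults {
    static long parseSize(String value) {
        String v = value.trim().toUpperCase(Locale.ROOT);
        long multiplier;
        switch (v.charAt(v.length() - 1)) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            default: return Long.parseLong(v);  // plain byte count
        }
        return Long.parseLong(v.substring(0, v.length() - 1)) * multiplier;
    }

    // Returns {minimum/initial Nursery size, maximum Nursery size} in bytes.
    static long[] nurseryBounds(String xms, String xmx) {
        long min = parseSize(xms) / 4;  // 25% of -Xms
        long max = parseSize(xmx) / 4;  // 25% of -Xmx
        return new long[] {min, max};
    }

    public static void main(String[] args) {
        long[] bounds = nurseryBounds("512M", "2G");
        System.out.println((bounds[0] >> 20) + "M .. " + (bounds[1] >> 20) + "M");
    }
}
```

For -Xms512M and -Xmx2G, this prints 128M .. 512M.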
    Finer heap sizing options:

    • -Xmca32K RAM class segment increment
    • -Xmco128K ROM class segment increment
    • -Xmcrs200M Compressed references metadata initial size
    • -Xmns1M Initial new space (Nursery) size
    • -Xmnx128M Maximum new space (Nursery) size
    • -Xms4M Initial memory size
    • -Xmos3M Initial old space (Tenure) size
    • -Xmox512M Maximum old space (Tenure) size
    • -Xmx512M Memory maximum
    • -Xmr16K Remembered set size
    • -Xlp:objectheap:pagesize=2M Large page size (4K and 2M are available)
    • -Xlp:codecache:pagesize=2M Large page size for the JIT code cache (4K and 2M are available)
    • -Xmso256K Operating system thread stack size
    • -Xiss2K Java thread stack initial size
    • -Xssi16K Java thread stack increment
    • -Xss1M Java thread stack maximum size

    Note that default values might vary on different platforms. Run java -verbose:sizes to obtain the default values used on your system.
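From inside a running application, the standard java.lang.Runtime and java.lang.management APIs offer a quick cross-check of the heap settings in effect. In this Java sketch, Runtime.maxMemory() is only expected to approximate the -Xmx value, and filtering the input arguments to those starting with -X is an arbitrary choice for readability.

```java
import java.lang.management.ManagementFactory;

/** Prints the heap limits in effect and the -X options the JVM was started with. */
final class HeapSettings {
    // Basic sanity relation between the three Runtime memory figures.
    static boolean limitsConsistent() {
        Runtime rt = Runtime.getRuntime();
        return rt.freeMemory() <= rt.totalMemory() && rt.totalMemory() <= rt.maxMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.printf("max heap (approx. -Xmx): %d MB%n", rt.maxMemory() >> 20);
        System.out.printf("current heap size:       %d MB%n", rt.totalMemory() >> 20);
        System.out.printf("free in current heap:    %d MB%n", rt.freeMemory() >> 20);
        // JVM options from the command line (arguments to main are not included).
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-X")) {
                System.out.println(arg);
            }
        }
    }
}
```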

    J9 interleaves Java heap memory equally across all non-uniform memory access (NUMA) nodes on the system if the Java process is not pinned to any particular core or NUMA node. This tends to reduce performance fluctuations, because remote NUMA accesses are expected to be spread roughly evenly across threads (although this might not be the case if the memory accessed by different threads is allocated unevenly). However, interleaving might also lower overall performance when most of the memory is accessed by the single thread that allocated it.

    From IBM JDK version 8, service refresh 2, you can use the -XX:-InterleaveMemory option to disable NUMA interleaving. The default interleaving policy might be reassessed in future versions of the product, so consult the release documentation for any changes.

    Performance can also be affected by the use of huge pages (see, for example, hugetlbpage support in the Linux kernel).

    By default, without any options being specified, J9 uses manually configured huge pages for all huge page sizes under 1 GB. For 1 GB huge pages, the option -Xlp:pagesize=1G must be specified. On systems with transparent huge pages support (see Transparent Hugepage Support), the kernel uses huge pages without any involvement from the JVM.

    GC policies

    The IBM JVM has several GC policies.

    The default policy is “generational concurrent” (gencon), which typically is the preferred choice. For better control of GC pause times, typically “region based” (balanced) and “soft real time” (metronome) policies are used.

    • Generational concurrent (-Xgcpolicy:gencon): handles short-lived objects (in the Nursery) differently from long-lived objects (in Tenure). Applications that have many short-lived objects typically have shorter pause times with this policy, while still achieving good throughput.
    • Region based (-Xgcpolicy:balanced): uses concurrent and generational techniques similar to gencon, but with a region-based heap organization. If you have problems with application pause times caused by global garbage collections, particularly compactions, this policy might improve application responsiveness at some cost in throughput. It might also be preferable to gencon in other cases, for example to exploit NUMA, for incremental class unloading, or for workloads whose object liveness, heap occupancy, or allocation rate vary widely.
    • Soft real time (-Xgcpolicy:metronome): an incremental, deterministic garbage collector with short pause times and a guaranteed maximum GC throughput overhead. Applications that depend on precise response times can take advantage of this technology by avoiding potentially long delays from garbage collection activity.
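The generational idea behind gencon can be illustrated with a toy model. In this Java sketch, each scavenge discards Nursery objects that are no longer live, increments the age of the survivors, copies (flips) them within the Nursery, and promotes an object to Tenure once its age reaches the tenure age. It is a simplified teaching model, not J9's scavenger: liveness is supplied by the caller instead of being traced, and the tenure age of 5 is simply borrowed from the tenureage attribute in the sample verbose GC log later in this article.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Toy model of scavenge aging and promotion; not J9's actual implementation. */
final class ScavengeModel {
    static final class Obj {
        final String name;
        int age;                          // number of scavenges survived
        Obj(String name) { this.name = name; }
    }

    final List<Obj> nursery = new ArrayList<>();
    final List<Obj> tenure = new ArrayList<>();
    final int tenureAge;

    ScavengeModel(int tenureAge) { this.tenureAge = tenureAge; }

    // One scavenge: dead objects are dropped; survivors age and are flipped or promoted.
    void scavenge(Set<String> live) {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : nursery) {
            if (!live.contains(o.name)) continue;   // garbage: reclaimed without copying
            o.age++;
            if (o.age >= tenureAge) {
                tenure.add(o);                      // promoted to Tenure
            } else {
                survivors.add(o);                   // copied within the Nursery
            }
        }
        nursery.clear();
        nursery.addAll(survivors);
    }

    public static void main(String[] args) {
        ScavengeModel gc = new ScavengeModel(5);
        gc.nursery.add(new Obj("cache"));           // long-lived object
        for (int cycle = 1; cycle <= 5; cycle++) {
            for (int i = 0; i < 1000; i++) {
                gc.nursery.add(new Obj("tmp-" + cycle + "-" + i));  // short-lived objects
            }
            gc.scavenge(Collections.singleton("cache"));
            System.out.printf("after scavenge %d: nursery=%d tenure=%d%n",
                    cycle, gc.nursery.size(), gc.tenure.size());
        }
    }
}
```

The 1,000 temporary objects allocated in each cycle are reclaimed at every scavenge, while the long-lived cache object stays in the Nursery for four scavenges and is promoted to Tenure on the fifth.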

    All GC policies are listed in the -Xgcpolicy topic in IBM Knowledge Center. Read more about GC policies in the following IBM developerWorks articles:

    IBM GC diagnostic tools

    1. Verbose GC logs and Garbage Collection and Memory Visualizer (GCMV). Verbose GC logs provide detailed information about garbage collection activity: free memory, allocation statistics, Java-specific metrics (finalization and soft reference processing, for example), and metrics that are not Java specific (objects copied and remembered-set count, for example).

    The following example is a sample log for a scavenge (a local GC cycle within the gencon policy). It shows the Nursery's Allocate space exhausted (0% free) before the GC and recovered afterward (99% of the Allocate space and 77% of the whole Nursery free), as well as details about the GC operation itself.

    
    <exclusive-start id="113" timestamp="2016-06-21T15:54:51.043" intervalms="2419.486">
      <response-info timems="3.056" idlems="2.612" threads="37" lastid="080FA100" lastname="JIT Compilation Thread-2" />
    </exclusive-start>
    <af-start id="114" totalBytesRequested="152" timestamp="2016-06-21T15:54:51.043" intervalms="2419.457" />
    <cycle-start id="115" type="scavenge" contextid="0" timestamp="2016-06-21T15:54:51.043" intervalms="2419.447" />
    <gc-start id="116" type="scavenge" contextid="115" timestamp="2016-06-21T15:54:51.043">
      <mem-info id="117" free="1563849376" total="2147418112" percent="72">
        <mem type="nursery" free="0" total="536870912" percent="0">
          <mem type="allocate" free="0" total="390266880" percent="0" />
          <mem type="survivor" free="0" total="146604032" percent="0" />
        </mem>
        <mem type="tenure" free="1563849376" total="1610547200" percent="97">
          <mem type="soa" free="1483322016" total="1530019840" percent="96" />
          <mem type="loa" free="80527360" total="80527360" percent="100" />
        </mem>
        <remembered-set count="3167" />
      </mem-info>
    </gc-start>
    <allocation-stats totalBytes="387776608" >
      <allocated-bytes non-tlh="4080" tlh="387772528" />
      <largest-consumer threadName="WebContainer : 29" threadId="33074B00" bytes="13873856" />
    </allocation-stats>
    <gc-op id="118" type="scavenge" timems="7.277" contextid="115" timestamp="2016-06-21T15:54:51.051">
      <scavenger-info tenureage="5" tenuremask="ffe0" tiltratio="72" />
      <memory-copied type="nursery" objects="50073" bytes="2181016" bytesdiscarded="147920" />
      <memory-copied type="tenure" objects="5670" bytes="474060" bytesdiscarded="83288" />
      <finalization candidates="4" enqueued="2" />
      <references type="weak" candidates="12" cleared="0" enqueued="0" />
      <references type="phantom" candidates="280" cleared="8" enqueued="8" />
    </gc-op>
    <gc-end id="119" type="scavenge" contextid="115" durationms="7.472" usertimems="12.001" systemtimems="0.000" timestamp="2016-06-21T15:54:51.051" activeThreads="2">
      <mem-info id="120" free="1980031608" total="2147418112" percent="92">
        <mem type="nursery" free="416743424" total="536870912" percent="77">
          <mem type="allocate" free="416743424" total="419102720" percent="99" />
          <mem type="survivor" free="0" total="117768192" percent="0" />
        </mem>
        <mem type="tenure" free="1563288184" total="1610547200" percent="97">
          <mem type="soa" free="1482760824" total="1530019840" percent="96" />
          <mem type="loa" free="80527360" total="80527360" percent="100" />
        </mem>
        <pending-finalizers system="2" default="0" reference="8" classloader="0" />
        <remembered-set count="2425" />
      </mem-info>
    </gc-end>
    <cycle-end id="121" type="scavenge" contextid="115" timestamp="2016-06-21T15:54:51.051" />
    <allocation-satisfied id="122" threadId="36983A00" bytesRequested="152" />
    <af-end id="123" timestamp="2016-06-21T15:54:51.051" />
    <exclusive-end id="124" timestamp="2016-06-21T15:54:51.051" durationms="7.765" />
    
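Because the verbose GC log is XML, simple checks can also be scripted. The following Java sketch uses the JDK's DOM parser to compute the percentage of free Nursery space at gc-start and gc-end. Wrapping the excerpt in a synthetic log root element is an assumption made here, because the raw excerpt has several top-level elements; the sample input is trimmed from the log above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Extracts Nursery occupancy before and after a scavenge from a verbose GC excerpt. */
final class VerboseGcNursery {
    // Percent free of the first <mem type="nursery"> under a gc-start or gc-end element.
    static long nurseryFreePercent(Element gcEvent) {
        NodeList mems = gcEvent.getElementsByTagName("mem");
        for (int i = 0; i < mems.getLength(); i++) {
            Element mem = (Element) mems.item(i);
            if ("nursery".equals(mem.getAttribute("type"))) {
                long free = Long.parseLong(mem.getAttribute("free"));
                long total = Long.parseLong(mem.getAttribute("total"));
                return free * 100 / total;
            }
        }
        throw new IllegalArgumentException("no nursery <mem> element");
    }

    // Returns {percent free before the GC, percent free after the GC}.
    static long[] beforeAfter(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element start = (Element) doc.getElementsByTagName("gc-start").item(0);
            Element end = (Element) doc.getElementsByTagName("gc-end").item(0);
            return new long[] {nurseryFreePercent(start), nurseryFreePercent(end)};
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse verbose GC excerpt", e);
        }
    }

    public static void main(String[] args) {
        // Trimmed copy of the gc-start and gc-end elements from the sample log.
        String xml = "<log>"
                + "<gc-start id=\"116\" type=\"scavenge\"><mem-info>"
                + "<mem type=\"nursery\" free=\"0\" total=\"536870912\" percent=\"0\"/>"
                + "</mem-info></gc-start>"
                + "<gc-end id=\"119\" type=\"scavenge\"><mem-info>"
                + "<mem type=\"nursery\" free=\"416743424\" total=\"536870912\" percent=\"77\"/>"
                + "</mem-info></gc-end>"
                + "</log>";
        long[] pct = beforeAfter(xml);
        System.out.println("Nursery free: " + pct[0] + "% -> " + pct[1] + "%");
    }
}
```

For the figures in the sample log, this prints Nursery free: 0% -> 77%, matching the percent attributes in the log.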

    The IBM Monitoring and Diagnostic Tools – Garbage Collection and Memory Visualizer (GCMV) tool can graphically illustrate most of this information as a function of time to make it easier to understand and to help spot unusual activity. The tool also gives summaries and pinpoints problems with recommended remedies.

    The following illustration shows typical output from GCMV:
    Figure: screenshot of the GCMV tool.

    Graph visualizing how pause time, free heap memory, and the amount of flipped objects (copied within the Nursery) change over time. A pause-time outlier is easy to spot around the 60-second mark.

    For further information, see the following articles:

    2. Health Center. The IBM Monitoring and Diagnostic Tools – Health Center can monitor GC activity and memory consumption. It can also monitor many other activities, such as locking, method profiling, and class loading.

    The following illustration shows typical output from Health Center:

    Figure: screenshot of the Health Center tool.

    Note the two panes showing:

    • the sorted distribution of object instances for each class
    • key GC metrics such as pause times, interval between GCs, and total number of GCs of each type
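Some of these counters are also available programmatically through the standard java.lang.management API, which can be handy for a quick check without attaching a tool. This Java sketch lists each collector that the JVM reports, with its collection count and accumulated collection time. The collector names depend on the JVM and the GC policy in use, so none are assumed here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Lists the collectors the JVM reports, with collection counts and accumulated time. */
final class GcCounters {
    // Sums accumulated collection time over all collectors, skipping undefined (-1) values.
    static long totalCollectionTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        // Allocate short-lived garbage so that at least one collection is likely.
        long allocated = 0;
        for (int i = 0; i < 200_000; i++) {
            allocated += new byte[1024].length;
        }
        System.out.println("allocated " + (allocated >> 20) + " MB of short-lived arrays");

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-24s count=%d timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total GC time: " + totalCollectionTimeMs() + " ms");
    }
}
```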

    For further information, see the following articles:

    3. Memory Analyzer. The IBM Monitoring and Diagnostic Tools – Memory Analyzer examines memory footprint, class hierarchy, and their relationships. It can diagnose memory leaks and pinpoint likely sources.

    The following illustration shows typical output from Memory Analyzer:

    Figure: screenshot of the Memory Analyzer tool.

    Note the pane showing the histogram of object instances with their shallow and retained heap sizes. When a class is selected for further inspection, other panes are generated, such as a report of leak suspects and the chain of incoming references to the selected object or class.
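The difference between shallow and retained size can be shown on a toy object graph. In this Java sketch, an object's retained size is the total shallow size of every object that would become unreachable from the roots if that object were removed. It computes this with two reachability passes, which is fine for a handful of objects; tools such as Memory Analyzer typically compute retained sizes with a dominator tree instead. The object names and sizes are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Shallow versus retained size on a small, hypothetical object graph. */
final class RetainedSize {
    final Map<String, Long> shallow = new HashMap<>();
    final Map<String, List<String>> refs = new HashMap<>();
    final Set<String> roots = new HashSet<>();

    // Registers an object with its shallow size and its outgoing references.
    void object(String id, long shallowBytes, String... references) {
        shallow.put(id, shallowBytes);
        refs.put(id, Arrays.asList(references));
    }

    // Objects reachable from the roots when 'excluded' (may be null) is treated as gone.
    private Set<String> reachable(String excluded) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        for (String root : roots) {
            if (!root.equals(excluded)) work.push(root);
        }
        while (!work.isEmpty()) {
            String id = work.pop();
            if (!seen.add(id)) continue;
            for (String next : refs.getOrDefault(id, Collections.<String>emptyList())) {
                if (!next.equals(excluded)) work.push(next);
            }
        }
        return seen;
    }

    // Sum of shallow sizes of objects that become unreachable without 'id'.
    long retained(String id) {
        Set<String> withoutId = reachable(id);
        long sum = 0;
        for (String o : reachable(null)) {
            if (!withoutId.contains(o)) sum += shallow.get(o);
        }
        return sum;
    }

    public static void main(String[] args) {
        RetainedSize heap = new RetainedSize();
        heap.object("app", 16, "cache", "config");
        heap.object("cache", 32, "map");
        heap.object("map", 64, "entryA", "entryB");
        heap.object("config", 40, "entryB");   // entryB is shared with config
        heap.object("entryA", 24);
        heap.object("entryB", 24);
        heap.roots.add("app");
        for (String id : new String[] {"app", "cache", "map"}) {
            System.out.printf("%-6s shallow=%d retained=%d%n",
                    id, heap.shallow.get(id), heap.retained(id));
        }
    }
}
```

Here cache has a shallow size of 32 bytes but retains 120 bytes (itself, map, and entryA); entryB is not counted because config still references it.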

    For further information, see the following articles:

    In the next article we will cover Java concurrency, focusing on how lock contention and deadlock situations can be diagnosed and avoided. We will also look at how you can monitor CPU utilization to identify problems and target improvements in application performance.

    Authors: Vijay Sundaresan, Aleksandar Micic, Daniel Heidinga, and Babneet Singh.

1 comment on"IBM SDK for Java performance optimizations: part two"

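  1. […] To learn more about memory management and garbage collection, see this article: Memory management, garbage collection (GC) policies, and garbage collection diagnostic tools. […]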
