Just-in-time (JIT) compilers are used in many dynamic languages such as Java to generate native code on the fly while a program is running. There are some resources out there to learn how JIT compilers work, but unless you’re keen to read big textbooks or pore over hundreds of thousands of lines of code, you’re probably not going to feel like you’re ready to write your own JIT compiler. It’s just not that easy to dip your toe into this kind of technology. But I’m trying to change that.

For the last few months, I’ve been thinking about how to make JIT compilers more approachable in the context of the IBM Java JIT technology that IBM will be contributing to the Eclipse OMR project later this year. If you haven’t heard about it, the Eclipse OMR project is working to build a community around shared, reusable open source components for building language runtimes.

IBM has been contributing technology from its enterprise class J9 runtime that powers the IBM JDK. One of the goals of the Eclipse OMR project is to make it easier than ever before to access industrial-strength runtime capabilities. That doesn’t mean that it needs to be easy to wring every last gram of capability from a component, but it does mean you should be able to get high caliber capability for only a little bit of code.

Easier compilers

So how can we make it easier for people to build JIT compilers? How can we get people to start using the OMR JIT without having to invest the time it usually takes to understand such a large piece of software, even if we could document it really, really well?

This is the first post in a series that will describe the new “JitBuilder” infrastructure I’ve been working to implement. To be fair, some of the ideas are similar to what others have done with LLVM and CoreCLR for their compilers, for example, and with Graal for JVM based languages. These ideas were all framed by compiler designers, so it shouldn’t be all that surprising that we have been moving in similar directions.

But my recent focus has been entirely on simplifying the process to dynamically generate decent native code, and I was willing to sacrifice the ability to expose all the power of the underlying compiler. I think the result drastically simplifies the initial work needed to retarget the OMR JIT for a new or existing language, and makes it easier to dynamically generate native code for many more developers who don’t all have the time or inclination to learn the deep details. This interface is absolutely a work in progress and I openly welcome feedback. Please tell me what you think, so we can make it better together!

How does JitBuilder work?

JitBuilder packages the guts of the compiler into a static library that you can then link against using APIs, which are described in just a few header files. These APIs abstract away all the details of the underlying Intermediate Language (IL) used by the OMR JIT, though their “shape” is still influenced by how the IL is structured.

By abstracting away the details, users don’t need to worry about the complex and often subtle nuances built into the intermediate languages used by most compilers. Because a compiler’s IL has such rich expressive capability, it can be painful to specify all the details all the time. That means retargeting a JIT compiler for a new language typically involves specifying and juggling lots of little details that do not always interact in straightforward ways. With JitBuilder, you instead program against a much simpler (though, in many ways, less powerful) API that gets you going quickly.

In fact, as you’ll see in a future post, the JitBuilder APIs are structured in layers. The idea is to let you move from the simple API that these articles focus on towards the more expressive underlying IL as your experience, and your demand for higher-performance code, grows.

But let’s not get too far ahead of ourselves. This post includes a downloadable Docker image so you can play along and experiment with the JitBuilder library. Let’s get started!

Getting started with the Docker image

To follow along with this article, download the JitBuilder Docker image from this Box link: https://ibm.box.com/v/JitBuilder-x86-64-image

Then, on your Linux x86-64 system, run the following commands:

docker load -i jitbuilder-x86_64-image.tgz
docker run -it eclipseomr/jitbuilder /bin/bash

When the docker image starts, you’ll be in the /home/jitbuilder directory:

jitbuilder@c10164831ab6:~$ pwd
/home/jitbuilder
jitbuilder@c10164831ab6:~$ ls
LICENSE  Makefile  README.md  include  libjitbuilder.a  src

Important files

The most important pieces for this article are Makefile, include, libjitbuilder.a, and src. There is also a very simple README as well as the LICENSE file detailing how JitBuilder is dual-licensed under Eclipse Public License v1.0 and Apache License 2.0.

The Makefile can build any of the examples, which can be found in the src directory.

libjitbuilder.a is the static library that contains the JIT code. The code built into this library will be contributed to the open source Eclipse OMR project in the near future, but we’re not quite ready to release the code. Until then, I put together this library so that people can try out the interface I’m experimenting with and give feedback to help shape its evolution.

The include directory contains the headers needed to access the JitBuilder library. The primary header files you’ll need to reference directly are:


	include/Jit.hpp
	include/omr/ilgen/MethodBuilder.hpp
	include/omr/ilgen/TypeDictionary.hpp


The src directory contains a set of examples on how to use the JitBuilder library. I’ll be using those examples in this series of posts.

Build and run

Let’s build and run the example for this article: simple. In the /home/jitbuilder directory:



	jitbuilder@c10164831ab6:~$ make simple
	g++ -o Simple.o -g -O0 -c -fno-rtti -fPIC -I./include/test -I./include/omr -I./include src/Simple.cpp
	g++ -g -fno-rtti -o simple Simple.o -L. -ljitbuilder -ldl


You’ve now built the simple example! Its code is contained in a single source code file src/Simple.cpp. Once Simple.cpp has been built into an object, it can be linked against the libjitbuilder.a library to produce the executable program simple. Since we went to the trouble to build it, let’s see what simple does:



	jitbuilder@c10164831ab6:~$ ./simple
	Step 1: initialize JIT
	Step 2: define type dictionary
	Step 3: compile method builder
	SimpleMethod::buildIL() running!
	Step 4: invoke compiled code and print results
	increment(0) == 1
	increment(1) == 2
	increment(10) == 11
	increment(-15) == -14
	Step 5: shutdown JIT


The program prints out the high-level steps as it goes, in the same order we’ll go through them in this article. As you might be able to guess from the output in step 4, we’re going to compile a function that increments a number. Don’t worry, we’ll get to more interesting functions in later posts. For now, I’m just going to cover the basics of how to use the JitBuilder library. Let’s start by going through the main() function in src/Simple.cpp and see where that goes!

Step 1: initialize JIT

The first part of main() runs this code:


	cout << "Step 1: initialize JIT\n";
	bool initialized = initializeJit();
	if (!initialized)
	   {
	   cerr << "FAIL: could not initialize JIT\n";
	   exit(-1);
	   }


This code demonstrates how to initialize the JitBuilder compiler using the initializeJit() function, which is declared via an earlier #include "Jit.hpp" directive. The Jit.hpp file can be found in the /home/jitbuilder/include directory. If initializeJit() returns true, then the JIT compiler has been successfully initialized. That’s it! Of course, it could also return false, but it usually won’t.

Running out of memory when allocating the code cache, for example, is one possible reason that initializeJit() could return false. Returning an error code indicating the problem would be more useful, but for now only a boolean go / no-go result is provided.

Step 2: define type dictionary

The next part of main() creates a thing called a TypeDictionary. Every compiler needs to understand the types of the data accessed by the code it’s compiling, and the OMR JIT is no exception.

For JitBuilder, these types are all maintained in a structure called the TypeDictionary. This example doesn’t require very much from the TypeDictionary, so we’re just going to create it and leverage one of the simple primitive types it manages: 32-bit integers denoted as Int32. More on that later. For now, here’s the line of code that creates the type dictionary:

 
	cout << "Step 2: define type dictionary\n";
	TR::TypeDictionary types;


Why is it “TR”? The heritage of the OMR JIT is the IBM Testarossa JIT compiler. The Testarossa compiler framework is built to be somewhat extensible in that you can separate functionality into different classes, which tend to be arranged in different namespaces.

Why would you do that? Well, one reason would be to separate functionality that is useful across many languages versus functionality that is specific to a particular language. Or cross-platform function versus function that is specific to a platform. When you combine all the pieces you need, the resulting class is declared in the TR namespace so that other classes can refer to it.

JitBuilder does not really expose this extensibility notion so you can ignore these details for now and just remember that OMR JIT classes are typically found in the TR namespace.
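If you are curious what that layering could look like, here is a rough sketch. It is not the actual OMR class hierarchy: the Base and X86 namespaces, the Emitter class, and describeEmitter() are all hypothetical names I made up for illustration. The only point is that the final, combined class lives in the TR namespace, and that is the only name callers need.

```cpp
#include <cassert>
#include <string>

// Hypothetical layers; these are NOT real OMR classes.
namespace Base {   // functionality shared by every language and platform
class Emitter {
public:
   virtual ~Emitter() {}
   virtual std::string describe() const { return "generic"; }
};
}

namespace X86 {    // refinement that only applies to one platform
class Emitter : public Base::Emitter {
public:
   std::string describe() const override {
      return "x86 on top of " + Base::Emitter::describe();
   }
};
}

namespace TR {     // the combined class that the rest of the code refers to
class Emitter : public X86::Emitter {};
}

// Callers only ever name TR::Emitter and never see the layers underneath.
std::string describeEmitter() {
   TR::Emitter emitter;
   return emitter.describe();
}
```

Swapping the platform layer would change what TR::Emitter inherits from, but code written against TR::Emitter would not need to change.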

Step 3: compile method builder

Once we've initialized the JIT and created the TypeDictionary, we’re ready to compile code! You describe the code you want to compile to the JIT library by creating a MethodBuilder object. A MethodBuilder object corresponds to a method (or function) that you will eventually call from your program. Because it’s callable, it can take parameters and it can optionally return a value.

Your method will be compiled with “system linkage” which, at a high level, just means you’ll be able to call it directly from a C or C++ program. MethodBuilders are a specific example of a more general kind of object called an IL Builder object. I’ll go into more detail about IL Builder objects in the next post, but for now we are just going to work with one simple MethodBuilder object.

First, let’s look at the TR::MethodBuilder object defined by the src/Simple.hpp file:


	#include "ilgen/MethodBuilder.hpp"
	class SimpleMethod : public TR::MethodBuilder
	   {
	   public:
	   SimpleMethod(TR::TypeDictionary *types);
	   virtual bool buildIL();
	   };


There isn’t much to it: we defined our own SimpleMethod class as an extension of TR::MethodBuilder. To create your own method builder to compile a method, you need to provide a constructor and you need to override the virtual buildIL() method.

Let’s look at each of these pieces in the src/Simple.cpp file. First, the constructor is here:


	SimpleMethod::SimpleMethod(TR::TypeDictionary *types)
	   : TR::MethodBuilder(types)
	   {
	   DefineFile(__FILE__);
	   DefineLine(LINETOSTR(__LINE__));

	   // int32_t increment(int32_t value);
	   DefineName("increment");
	   DefineParameter("value", Int32);
	   DefineReturnType(Int32);
	   }


As mentioned in the second step, the TR::TypeDictionary parameter is used to manage types and won’t be used directly in this example. We’ll get to it in a subsequent post!

The primary purpose of the constructor is to describe the interface for your generated code. Remember that a MethodBuilder corresponds to something you’re going to call from C or C++, so you need to tell the compiler what to expect for parameters, and what to produce as a return value. Let’s go through each bit:

 
	   DefineFile(__FILE__);
	   DefineLine(LINETOSTR(__LINE__));


These two lines are used by the compiler primarily to describe where the generated code came from. They just record some strings that the C++ preprocessor can automatically provide. __FILE__ is replaced by the name of the current file, and LINETOSTR(__LINE__) is some preprocessor magic to convert the current line number in the source file to a string.
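If you have not seen this kind of preprocessor trick before, here is a minimal sketch of how a line number is usually turned into a string. The macro names MY_STRINGIFY and MY_LINETOSTR are my own for this example; the LINETOSTR macro that ships with the library may be defined differently.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical names; the library's own LINETOSTR may be defined differently.
#define MY_STRINGIFY(x) #x                // turns the argument's tokens into a string literal
#define MY_LINETOSTR(x) MY_STRINGIFY(x)   // extra level so the argument is macro-expanded first

void exampleLineToString() {
   // Used directly, the # operator stringifies the macro name itself.
   assert(std::strcmp(MY_STRINGIFY(__LINE__), "__LINE__") == 0);

   // With the extra level, __LINE__ is expanded to a number before it becomes a string.
   const char *line = MY_LINETOSTR(__LINE__);
   assert(line[0] >= '1' && line[0] <= '9');
}
```

The two-level definition is the important part: a single-level macro would hand the compiler the literal text "__LINE__" instead of a line number.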

Together with the signature of the generated method, this information can be helpful to connect diagnostic output from the JIT compiler to the source code location where the method was specified.

The next part describes the signature of the generated method, also shown in C++ notation in the leading comment:

 
	   // int32_t increment(int32_t value);
	   DefineName("increment");
	   DefineParameter("value", Int32);
	   DefineReturnType(Int32);

The name is like the “File” and “Line” provided in the earlier part: it is used to help identify this method in diagnostics. The really useful parts are DefineParameter() and DefineReturnType(). DefineParameter("value", Int32) tells the JIT compiler that this method will have a parameter called “value” and it is a 32-bit integer. (There are several other kinds of primitive types, too, managed by the TR::TypeDictionary, but I’ll cover that in the next post.) DefineReturnType(Int32) tells the JIT compiler that this method will also return a 32-bit integer.

The second part of the SimpleMethod builder is the buildIL() function. The buildIL() function will be called by the JIT compiler at the beginning of the process to compile the method, and its purpose is to describe the operations that need to be performed by the method we’re trying to generate. Let’s take a look at buildIL now:

 
	bool
	SimpleMethod::buildIL()
	   {
	   cout << "SimpleMethod::buildIL() running!\n";
	   // return (value + 1);
	   Return(
	      Add(
	         Load("value"),
	         ConstInt32(1)));
	   return true; // buildIL() is declared to return bool
	   }

The first line simply prints some helpful output so you can see that this code is called by the JIT compiler when you ask it to compile a SimpleMethod object. The rest of the code inside the buildIL() function specifies an expression tree of low-level operations. The indentation for these calls has been written specifically to depict the different levels of this expression tree. This code looks like it is actually running commands to load a parameter, create a constant 32-bit integer, add those numbers together, and then return the result. But the purpose of this code is not to perform these actions. Instead, this code generates the OMR JIT intermediate language (IL) that, once compiled, performs these actions.
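The difference between describing a computation and performing it can be shown with a toy model. The sketch below is not the OMR IL, which is far richer: Node, ToyLoad, ToyConstInt32, ToyAdd, and evaluate are hypothetical names, and "running" the tree here means interpreting it rather than compiling it to native code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// A toy model only: OMR's real IL is much richer than this.
struct Node {
   enum Kind { Load, Const, Add } kind;
   std::string name;                    // used by Load
   int32_t value;                       // used by Const
   std::shared_ptr<Node> left, right;   // used by Add
};
typedef std::shared_ptr<Node> NodePtr;

// "Builder" calls: each one only records an operation; nothing is computed yet.
NodePtr ToyLoad(const std::string &name) {
   return std::make_shared<Node>(Node{Node::Load, name, 0, nullptr, nullptr});
}
NodePtr ToyConstInt32(int32_t v) {
   return std::make_shared<Node>(Node{Node::Const, "", v, nullptr, nullptr});
}
NodePtr ToyAdd(NodePtr l, NodePtr r) {
   return std::make_shared<Node>(Node{Node::Add, "", 0, l, r});
}

// A later, separate step walks the recorded tree with actual argument values.
int32_t evaluate(const NodePtr &n, const std::map<std::string, int32_t> &args) {
   switch (n->kind) {
      case Node::Load:  return args.at(n->name);
      case Node::Const: return n->value;
      case Node::Add:   return evaluate(n->left, args) + evaluate(n->right, args);
   }
   return 0; // not reached
}

void exampleTree() {
   NodePtr tree = ToyAdd(ToyLoad("value"), ToyConstInt32(1));   // built once
   assert(tree->kind == Node::Add);                             // nothing has been added yet
   assert(evaluate(tree, {{"value", 10}}) == 11);               // used later, many times
   assert(evaluate(tree, {{"value", -15}}) == -14);
}
```

Building the tree touches no argument values at all, which is the same reason buildIL() can run once at compile time while the compiled method runs many times afterwards.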

As we determined already from the output of the executable, this SimpleMethod we’re going to compile simply returns the value of its parameter plus one. My next post will go into much more detail about other operations that are available that can let you do more elaborate and complicated things.

So, how do you compile one of these method builder objects? Here is the code from main() that creates a SimpleMethod object:

 
	cout << "Step 3: compile method builder\n";
	SimpleMethod method(&types);

Now we have a SimpleMethod method builder object. How do we use it? include/Jit.hpp declares a compileMethodBuilder() function that compiles a method builder object for you. It returns 0 on success and hands back an entry point, which is just a pointer to the start of the instruction bytes in memory:

 
	uint8_t *entry = 0;
	int32_t rc = compileMethodBuilder(&method, &entry);
	if (rc != 0)
	   {
	   cerr << "FAIL: compilation error " << rc << "\n";
	   exit(-2);
	   } 

That was easy! The compiled code is stored in a structure called a “code cache” which is currently allocated and managed internally by JitBuilder. In future, we’ll expose more controls over the management of this code cache, but for now it’s completely transparent to users of the library.
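To make the idea of "instruction bytes in memory" concrete, here is a heavily simplified sketch of what a code cache boils down to. This is my own illustration, not how JitBuilder manages its code cache: makeIncrement() and plainIncrement() are hypothetical names, the six machine code bytes are hand-assembled for x86-64 Linux only, and on any other platform (or if the operating system refuses executable memory) the sketch quietly falls back to an ordinary C++ function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__linux__) && defined(__x86_64__)
#include <sys/mman.h>
#endif

typedef int32_t (IncrementType)(int32_t);

// Ordinary compiled fallback, used when executable memory is unavailable.
int32_t plainIncrement(int32_t value) { return value + 1; }

// Copies hand-written machine code into executable memory and returns its
// entry point. Returns the fallback on other platforms or if the OS refuses.
IncrementType *makeIncrement() {
#if defined(__linux__) && defined(__x86_64__)
   // mov eax,edi ; add eax,1 ; ret   (System V x86-64 calling convention)
   static const uint8_t code[] = { 0x89, 0xf8, 0x83, 0xc0, 0x01, 0xc3 };
   const std::size_t size = 4096;
   void *mem = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
   if (mem == MAP_FAILED)
      return plainIncrement;
   std::memcpy(mem, code, sizeof(code));
   if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
      munmap(mem, size);
      return plainIncrement;
   }
   return reinterpret_cast<IncrementType *>(mem);   // never unmapped in this sketch
#else
   return plainIncrement;
#endif
}
```

A real code cache also has to track free space, hold many methods, and release memory at shutdown, which is exactly the bookkeeping JitBuilder currently hides from you.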

Internally, of course, compileMethodBuilder() does a ton of work! It calls buildIL on the method builder object to generate OMR JIT intermediate language and then passes it through a sequence of transformation passes that clean up the code and improve it. You can tell that buildIL is called at this point because the output line SimpleMethod::buildIL() running! comes out between steps 3 and 4 in the output you saw at the beginning of the post.

Finally, the intermediate language is translated into the native instructions for the platform we’re running on. In this case, we’re running an x86-64 Docker image, so the OMR JIT will generate x86-64 instructions for the SimpleMethod operations. If you were to use JitBuilder on another platform, it would generate native instructions for that platform.

But there’s nothing in the JitBuilder code I showed that depends on the platform. The same C++ source code could be used on any platform supported by the JitBuilder library. While you have to build that C++ code for the platform too, the fact that your C++ code can make decisions at runtime about what code to generate opens up a lot of possibilities! But let’s finish off this example…

Step 4: invoke compiled code and print results

We got a pointer to uint8_t as the entry point of our compiled code. How do we call it? The first step is to create a function type:

 
	cout << "Step 4: invoke compiled code and print results\n";
	typedef int32_t (SimpleFunctionType)(int32_t); 

This type matches the signature that we described in the constructor of the SimpleMethod builder. All we have to do is to cast the builder’s entry point to this type and we’ll be able to call it directly:

 
	SimpleFunctionType *increment = (SimpleFunctionType *) entry;

At this point, you can call increment! I wrote a few lines of code just to call increment with some constant values and print the result:

  
	int32_t v;
	v=0;   cout << "increment(" << v << ") == " << increment(v) << "\n";
	v=1;   cout << "increment(" << v << ") == " << increment(v) << "\n";
	v=10;  cout << "increment(" << v << ") == " << increment(v) << "\n";
	v=-15; cout << "increment(" << v << ") == " << increment(v) << "\n";

And here’s the same output I showed at the beginning that was produced by these lines of code:

  
	increment(0) == 1
	increment(1) == 2
	increment(10) == 11
	increment(-15) == -14

As you can see, calling the generated code after applying the function type cast is just like calling any other C function.
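If you compile many methods, you may want to hide the cast in one place. The sketch below shows one way to do that; asFunction() is a hypothetical helper of mine, not part of the library, and an ordinary C++ function stands in for the JIT-compiled code so that the example compiles without JitBuilder. Note that the cast is unchecked: the function type you name must match the signature described in the method builder's constructor.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for JIT-compiled code: an ordinary function with the same signature.
int32_t standInIncrement(int32_t value) { return value + 1; }

// Hypothetical helper: converts a raw entry point into a typed function pointer.
template <typename FunctionType>
FunctionType *asFunction(uint8_t *entry) {
   return reinterpret_cast<FunctionType *>(entry);
}

void exampleCall() {
   // Pretend this raw pointer came back from the compiler.
   uint8_t *entry = reinterpret_cast<uint8_t *>(&standInIncrement);

   typedef int32_t (SimpleFunctionType)(int32_t);
   SimpleFunctionType *increment = asFunction<SimpleFunctionType>(entry);
   assert(increment(10) == 11);
}
```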

Compiling multiple methods

If you have multiple MethodBuilder objects, you can compile them using the same basic steps outlined above. You only need one TypeDictionary object but you’re also free to allocate more than one. You can create new MethodBuilder classes and cause code to be compiled through the services called in their implementation of buildIL(). Or, you can write code inside the buildIL() function that dynamically chooses what services to use to build IL in response to dynamic events or information. In fact, it’s possible to write a single MethodBuilder class that’s designed to be used multiple times to generate different native methods, otherwise known as a JIT compiler!
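Here is a toy picture of that last idea: one builder class, used several times with different inputs to produce different methods. Nothing here uses JitBuilder. ToyAdderBuilder is a hypothetical class, and the "generated method" is just a C++ closure rather than native code, but the shape is the same: the decision about what to generate is made at run time, inside the build step.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Toy stand-in for a reusable builder: the "generated method" is a closure,
// not native code, and the class name is hypothetical.
class ToyAdderBuilder {
public:
   explicit ToyAdderBuilder(int32_t amount) : _amount(amount) {}

   // Decides at run time what to "generate", based on the constructor argument.
   std::function<int32_t(int32_t)> build() const {
      if (_amount == 0)
         return [](int32_t value) { return value; };   // nothing to add: identity
      const int32_t amount = _amount;
      return [amount](int32_t value) { return value + amount; };
   }

private:
   int32_t _amount;
};
```

With a real MethodBuilder, the equivalent of the if statement would sit inside buildIL() and choose which IL-building calls to make.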

Step 5: shutdown JIT

The last step, when you’re done using all your compiled code, is to clean things up. Just like initializeJit(), there is a shutdownJit() call that shuts down all the JIT machinery and cleans up all the memory used by the JIT (which includes the code cache that holds any compiled methods). The JIT is pretty good about cleaning up memory that’s only used during the compilation step, but there are also data structures that need to persist longer than each compilation request (for example, the code cache in which the compiled code is stored). It’s a good idea to do the shutdown call when you know you’re done:

 
	   cout << "Step 5: shutdown JIT\n";
	   shutdownJit();
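One way to make sure the shutdown call is never forgotten is to tie it to a C++ scope. The sketch below is only a suggestion and is not something the library provides: JitSession is a hypothetical class, and fakeInitializeJit() and fakeShutdownJit() are stand-ins so the example compiles by itself. With the real library you would include Jit.hpp and call initializeJit() and shutdownJit() in their place.

```cpp
#include <cassert>

// Stand-ins so the sketch compiles on its own; with the real library you
// would include Jit.hpp and call initializeJit() / shutdownJit() instead.
int activeJits = 0;
bool fakeInitializeJit() { ++activeJits; return true; }
void fakeShutdownJit()   { --activeJits; }

// Hypothetical guard: initializes in the constructor, shuts down in the destructor.
class JitSession {
public:
   JitSession() : _ok(fakeInitializeJit()) {}
   ~JitSession() { if (_ok) fakeShutdownJit(); }
   JitSession(const JitSession &) = delete;              // one owner only
   JitSession &operator=(const JitSession &) = delete;
   bool ok() const { return _ok; }
private:
   bool _ok;
};

int exampleSession() {
   {
      JitSession session;
      assert(session.ok() && activeJits == 1);   // compile and call methods here
   }                                             // shutdown happens on scope exit
   return activeJits;
}
```

Because shutting down releases the code cache, any entry points obtained inside the scope must not be called after the guard has been destroyed.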

Generated code

Ok, so now you understand how a simple MethodBuilder object can be created, compiled, and called. You know how to initialize and shut down the JIT. But what does the native code actually look like? We’ll use a debugger on the x86 platform to quickly delve into the generated code (the process is similar on other platforms).

We’ll start with the gdb debugger:

  
	jitbuilder@c10164831ab6:~$ gdb ./simple
	< lots of information>
	(gdb)

First, let’s set a breakpoint just after the call to compileMethodBuilder. At the (gdb) prompt:

   
	(gdb) list Simple.cpp:53
	48	
	49	   cout << "Step 3: compile method builder\n";
	50	   SimpleMethod method(&types);
	51	   uint8_t *entry = 0;
	52	   int32_t rc = compileMethodBuilder(&method, &entry);
	53	   if (rc != 0)
	54	      {
	55	      cerr << "FAIL: compilation error " << rc << "\n";
	56	      exit(-2);
	57	      }
	(gdb) b 53
	Breakpoint 1 at 0x404ac3: file src/Simple.cpp, line 53.

Now, we’ll run the command and gdb will stop for us at the breakpoint:

 
	(gdb) run
	Starting program: /home/jitbuilder/simple 
	Step 1: initialize JIT
	Step 2: define type dictionary
	Step 3: compile method builder
	SimpleMethod::buildIL() running!
	Breakpoint 1, main (argc=1, argv=0x7fffffffe7d8) at src/Simple.cpp:53
	53	   if (rc != 0)
	(gdb)

At this point, the method has been compiled, and the entry point is sitting in the variable “entry”:

 
	(gdb) p entry
	$1 = (uint8_t *) 0x7ffff5ff1034

gdb might try to print out some characters (because uint8_t is really just unsigned char, so gdb tries to be helpful by showing you the pointer as a string), but we don’t really care about that because we know that memory holds instructions.

We can ask gdb to disassemble those instructions for us using its x/<N>i command to disassemble <N> instructions:

  
	(gdb) set disassembly-flavor intel
	(gdb) x/9i $1
	   0x7ffff5ff1034:	push   rbp
	   0x7ffff5ff1035:	mov    rbp,rsp
	   0x7ffff5ff1038:	sub    rsp,0x20
	   0x7ffff5ff103c:	mov    DWORD PTR [rbp-0x18],edi
	   0x7ffff5ff103f:	mov    eax,DWORD PTR [rbp-0x18]
	   0x7ffff5ff1042:	add    eax,0x1
	   0x7ffff5ff1045:	mov    rsp,rbp
	   0x7ffff5ff1048:	pop    rbp
	   0x7ffff5ff1049:	ret 

And that’s the code the JIT compiled for our simple increment function! The first three instructions are part of the system linkage that sets up this method’s stack frame. The next two mov instructions store the argument passed in a register to the stack and then load it again from there (wow!). The add instruction is our actual increment. The mov and pop release the stack frame, and finally the ret instruction returns from the method.

Right now, the compiler in JitBuilder does not have very many optimizations (only seven “local” optimizations) and so there are lots of improvements we can make to this generated code. For example, register allocation is clearly not optimal because moving the incoming parameter between edi and eax required a stack location. The stack frame has not been shrunk to take advantage of local variables that were eliminated during the optimization process. Finally, the system linkage implementation has not been optimized to take advantage of the fact that this function does not call any other functions (also known as a “leaf” function). These are improvements we can make as we get the OMR JIT ready to be open sourced.
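If you want a baseline to compare against, the same method can be written as an ordinary function and run through a static compiler. The function below is my own reference version (incrementReference is a hypothetical name, and it is not one of the examples in the src directory). Building it with optimization enabled and disassembling it in gdb with the same x/<N>i command should show how much of the stack traffic and frame setup an optimizing compiler removes for a leaf function like this, though the exact instructions will depend on your compiler and flags.

```cpp
#include <cassert>
#include <cstdint>

// The C++ equivalent of the method described by SimpleMethod::buildIL().
// extern "C" keeps the symbol name unmangled so it is easy to find in gdb.
extern "C" int32_t incrementReference(int32_t value) {
   return value + 1;
}
```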

Wrapping it all up

You’ve now seen how easy it can be to use the JitBuilder compiler library to generate native code dynamically. This library is still in a prototype state, but you can still have some fun with it as it gets better. If you have comments or suggestions to make it better, please get in touch with me at mstoodle@ca.ibm.com!

In my next post, I’ll describe how to create more complex methods with conditional control flow such as if statements, loops, and switch statements. But if you can’t wait to get started, there are several examples included in the src/ directory in the Docker image to show how you can generate various kinds of code.

In a future post, I’ll show how the JitBuilder library can be used to JIT compile an interpreted language to bring native code performance even to languages that only use an interpreter. Stay tuned!

Learn more about Mark and team’s open source work on runtimes. Watch the replay of the Eclipse OMR Tech Talk, recorded on July 20, 2016.

