Java Bytecode Manipulation

In this article, I will show how to manipulate a compiled class file directly, without decompiling it to Java source.

I will be using Javassist (Java Programming Assistant), an external library, for most of this tutorial. Download the latest JAR file to get the examples to work. I am using version rel_3_22_0_cr1-4-g6a3ed31.

Every compiled Java file produces a class file, a binary file containing Java bytecode that can be executed on any Java Virtual Machine. Because class files generally do not depend on the platform they were compiled on, Java applications are platform independent. In this article, we will explore how to statically analyze class files, modify them programmatically and execute them.
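
To make the idea of a class file as a plain binary file concrete, here is a minimal sketch that reads the fixed eight-byte header every class file starts with: the magic number 0xCAFEBABE followed by the minor and major version. The class name ClassFileHeader and its parse() method are hypothetical helpers written for this illustration; they are not part of Javassist or the JDK.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only: parses the header of a class file.
class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;

    private ClassFileHeader(int minorVersion, int majorVersion) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    static ClassFileHeader parse(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer buffer = ByteBuffer.wrap(classBytes);
        if (classBytes.length < 8 || buffer.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a class file");
        }
        int minor = buffer.getShort() & 0xFFFF; // unsigned 16-bit minor version
        int major = buffer.getShort() & 0xFFFF; // unsigned 16-bit major version
        return new ClassFileHeader(minor, major);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ClassFileHeader ByteCodeEditorTest.class
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        ClassFileHeader header = parse(bytes);
        System.out.println("major=" + header.majorVersion + " minor=" + header.minorVersion);
    }
}
```

The same bytes are what Javassist loads and rewrites later in this article; the library simply saves us from parsing the rest of the format by hand.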

Sample Class for Bytecode Manipulation

We will start with a simple test class (ByteCodeEditorTest), which we will modify using Javassist. The class takes an input from the user, checks whether it matches a predefined value in the code and outputs a message accordingly.

public String checkStatus(String _inputString){
    if (_inputString.equals("MAGIC"))
        return "Right!";
    return "Wrong";
}

Once compiled and executed, the class behaves as shown below. We will modify the compiled class file directly to change its behaviour by inverting the branch that follows the equality check.

$ java ByteCodeEditorTest TEST
Wrong
$ java ByteCodeEditorTest MAGIC
Right!
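
Only checkStatus() is shown above. A complete class consistent with the sample run might look like the sketch below; the main() method and its usage message are assumptions inferred from the command lines, not the original source.

```java
// Sketch of the full test class. checkStatus() is taken from the article;
// main() is an assumption based on the sample run.
class ByteCodeEditorTest {

    public String checkStatus(String _inputString) {
        if (_inputString.equals("MAGIC"))
            return "Right!";
        return "Wrong";
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java ByteCodeEditorTest <input>");
            return;
        }
        System.out.println(new ByteCodeEditorTest().checkStatus(args[0]));
    }
}
```

The constant pool indexes that javap reports for a class compiled from this sketch may differ from the ones in the listings in this article.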

Let’s start by looking at the compiled class file using javap. I have provided a snippet of the checkStatus() method from the test class.

$ javap -c ByteCodeEditorTest
Compiled from "ByteCodeEditorTest.java"
  public java.lang.String checkStatus(java.lang.String);
    Code:
       0: aload_1
       1: ldc           #7      // String MAGIC
       3: invokevirtual #8      // Method java/lang/String.equals:(Ljava/lang/Object;)Z
       6: ifeq          12
       9: ldc           #9      // String Right!
      11: areturn
      12: ldc           #10     // String Wrong
      14: areturn
}

The disassembled code contains mnemonics for Java bytecode instructions. We will use these heavily for bytecode manipulation. Refer to the Java bytecode instruction listings article on Wikipedia, which contains the mnemonic and opcode of every Java bytecode instruction.

The interesting instruction is at index 6 of the disassembled code: the mnemonic ifeq. The comparison itself is done by String.equals() at index 3, which leaves its boolean result on the stack as an int; ifeq then jumps to index 12 (the "Wrong" branch) when that result is 0, i.e. false. Let’s use Javassist to invert the branch by changing ifeq to ifne.
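
The effect of swapping the two branch instructions can be modelled in plain Java. The sketch below is not how the JVM is implemented; it only mirrors the control flow of checkStatus() for a given branch opcode at index 6. The class name BranchSemantics is made up for this illustration.

```java
// Illustrative model of the branch at index 6 of checkStatus().
class BranchSemantics {
    static final int IFEQ = 0x99; // jump if the int popped from the stack is 0
    static final int IFNE = 0x9a; // jump if the int popped from the stack is not 0

    static String checkStatus(int branchOpcode, String input) {
        // invokevirtual String.equals leaves 1 (true) or 0 (false) on the stack.
        int equalsResult = input.equals("MAGIC") ? 1 : 0;

        boolean jumpToIndex12;
        if (branchOpcode == IFEQ) {
            jumpToIndex12 = (equalsResult == 0);
        } else if (branchOpcode == IFNE) {
            jumpToIndex12 = (equalsResult != 0);
        } else {
            throw new IllegalArgumentException("Unsupported opcode: " + branchOpcode);
        }
        // Index 12 loads "Wrong"; falling through to index 9 loads "Right!".
        return jumpToIndex12 ? "Wrong" : "Right!";
    }

    public static void main(String[] args) {
        System.out.println("ifeq, TEST  -> " + checkStatus(IFEQ, "TEST"));
        System.out.println("ifne, TEST  -> " + checkStatus(IFNE, "TEST"));
        System.out.println("ifne, MAGIC -> " + checkStatus(IFNE, "MAGIC"));
    }
}
```

Note that with ifne every input except MAGIC is accepted, and MAGIC itself is rejected.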

Bytecode Manipulation using Javassist

Now that we have our test class and know what has to be modified in the bytecode, let’s create a new class which loads the compiled ByteCodeEditorTest class for manipulation. With the Javassist JAR on the classpath, let’s load the test class file using javassist.CtClass.

ClassPool _classPool = ClassPool.getDefault();
CtClass _ctClass = _classPool.makeClass(new FileInputStream("ByteCodeEditorTest.class"));

Once the ByteCodeEditorTest class is loaded, we will use javassist.CtMethod to extract all the methods from the class and then use javassist.bytecode.CodeAttribute and javassist.bytecode.CodeIterator to manipulate them.

CodeIterator allows us to traverse every bytecode instruction in a method and also provides methods to manipulate them. In our case, we know from the javap output that index 6 has to be modified to change the instruction from ifeq to ifne. Looking at the opcode reference, the hex value for ifne is 9a. We will use the decimal value to update the bytecode through CodeIterator.

So we will use the CodeIterator.writeByte() method to update index 6 of checkStatus() from its existing value to 154 (9a converted to decimal). The table below shows the existing value (row 1) and the new value (row 2).

Mnemonic   Opcode (Hex)   Opcode (Decimal)
ifeq       0x99           153
ifne       0x9a           154

for (CtMethod _ctMethod : _ctClass.getDeclaredMethods()) {
    CodeAttribute _codeAttribute = _ctMethod.getMethodInfo().getCodeAttribute();
    //Abstract and native methods have no code attribute
    if (_codeAttribute == null) continue;
    CodeIterator _codeIterator = _codeAttribute.iterator();
    while (_codeIterator.hasNext()) {
        //next() returns the index of the next instruction
        int _indexOfCode = _codeIterator.next();
        int _valueOfIndex8Bit = _codeIterator.byteAt(_indexOfCode);
        //Checking index 6 and if Opcode is ifeq (153)
        if (_valueOfIndex8Bit == 153 && _indexOfCode == 6) {
            //Changing instruction from ifeq to ifne (154)
            _codeIterator.writeByte(154, _indexOfCode);
        }
    }
}
//Write changes to class file (in the current directory)
_ctClass.writeFile();
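
What the single writeByte() call changes can also be shown without Javassist. The sketch below transcribes the 15 code bytes of checkStatus() from the javap listing (the opcode values come from the opcode reference) and patches the byte at index 6. The class PatchBranchOpcode and its methods are hypothetical names used only for this illustration.

```java
import java.util.Arrays;

// Dependency-free illustration of the one-byte patch applied at index 6.
class PatchBranchOpcode {
    static final int IFEQ = 0x99; // 153
    static final int IFNE = 0x9a; // 154

    // Code bytes of checkStatus(), transcribed from the javap output.
    static byte[] originalCode() {
        return new byte[] {
            0x2b,                     //  0: aload_1
            0x12, 0x07,               //  1: ldc #7
            (byte) 0xb6, 0x00, 0x08,  //  3: invokevirtual #8
            (byte) 0x99, 0x00, 0x06,  //  6: ifeq 12 (relative offset +6)
            0x12, 0x09,               //  9: ldc #9
            (byte) 0xb0,              // 11: areturn
            0x12, 0x0a,               // 12: ldc #10
            (byte) 0xb0               // 14: areturn
        };
    }

    // Returns a copy of code with the opcode at index replaced, after checking
    // that the byte currently stored there is the expected opcode.
    static byte[] patch(byte[] code, int index, int expected, int replacement) {
        byte[] patched = Arrays.copyOf(code, code.length);
        if ((patched[index] & 0xff) != expected) {
            throw new IllegalStateException("Unexpected opcode at index " + index);
        }
        patched[index] = (byte) replacement;
        return patched;
    }

    public static void main(String[] args) {
        byte[] patched = patch(originalCode(), 6, IFEQ, IFNE);
        System.out.println("Opcode at index 6 is now " + (patched[6] & 0xff));
    }
}
```

Only the opcode changes; the two-byte branch offset that follows it stays the same, which is why the patched listing still jumps to index 12.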

Once this code is run, the ByteCodeEditorTest class file will be rewritten with the updated instruction. Running javap on ByteCodeEditorTest now produces the following output for the checkStatus() method.

$ javap -c ByteCodeEditorTest
Compiled from "ByteCodeEditorTest.java"
  public java.lang.String checkStatus(java.lang.String);
    Code:
       0: aload_1
       1: ldc           #7      // String MAGIC
       3: invokevirtual #8      // Method java/lang/String.equals:(Ljava/lang/Object;)Z
       6: ifne          12
       9: ldc           #9      // String Right!
      11: areturn
      12: ldc           #10     // String Wrong
      14: areturn
}

As you can see, index 6 has now changed to ifne. Running ByteCodeEditorTest now produces the result we were after.

$ java ByteCodeEditorTest TEST
Right!

The ByteCodeEditorTest class file was successfully modified to alter program flow without the need for recompilation or decompilation.

While this is a simple modification to a class file, the Javassist library also supports complex changes such as adding new methods and classes or injecting code. I will cover complex scenarios in another article, but will give a high-level overview of frequently used APIs in the next section.

Other Javassist APIs

While I covered bytecode manipulation, Javassist is a powerful library which can be used for complex changes. Here are some of those features.

javassist.CtNewMethod can be used to compile a new method from source text; the resulting CtMethod is then added to an existing class with CtClass.addMethod().

//Defrosts so that the class can be modified
_ctClass.defrost();
CtMethod _ctMethod = CtNewMethod.make("public int newMethodFromJA() { return 1; }", _ctClass);
//Without addMethod() the new method is never attached to the class
_ctClass.addMethod(_ctMethod);
_ctClass.writeFile();

The javassist.CtMethod class can also be used to inject code into existing methods using the insertBefore(), insertAfter() and insertAt() methods.

//Defrosts so that the class can be modified
_ctClass.defrost();
for (CtMethod method : _ctClass.getDeclaredMethods()) {
    method.insertBefore("System.out.println(\"Before every method call....\");");
}
//Write the class file once, after all methods have been instrumented
_ctClass.writeFile();

Javassist can also be used for static analysis of class files, either by displaying the disassembled code of every method or by displaying the bytecode mnemonics of a class file.

//Display Method Code
InstructionPrinter instructionPrinter = new InstructionPrinter(System.out);
for(CtMethod method:_ctClass.getDeclaredMethods()){
    System.out.println("Method: " + method.getName());
    instructionPrinter.print(method);
}
//Display Bytecode
for (CtMethod _ctMethod : _ctClass.getDeclaredMethods()) {
    System.out.println("Method: " + _ctMethod.getName());
    CodeAttribute _codeAttribute = _ctMethod.getMethodInfo().getCodeAttribute();
    //Abstract and native methods have no code attribute
    if (_codeAttribute == null) continue;
    CodeIterator _codeIterator = _codeAttribute.iterator();
    while (_codeIterator.hasNext()) {
        int _indexOfInstruction = _codeIterator.next();
        int _indexValue8Bit = _codeIterator.byteAt(_indexOfInstruction);
        //Prints the mnemonic only, without operands
        System.out.println(Mnemonic.OPCODE[_indexValue8Bit]);
    }
}
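
CodeIterator.next() returns the index of each instruction rather than every byte position because instructions have different lengths. The sketch below walks the 15 code bytes of checkStatus() (transcribed from the earlier javap listing) with a hand-written length table that covers only the opcodes appearing in that method. It is an illustration of the idea, not a description of Javassist's implementation, and the class name InstructionWalker is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Dependency-free illustration of stepping through variable-length instructions.
class InstructionWalker {
    // Code bytes of checkStatus(), transcribed from the javap output.
    static final byte[] CHECK_STATUS_CODE = {
        0x2b, 0x12, 0x07, (byte) 0xb6, 0x00, 0x08, (byte) 0x99, 0x00, 0x06,
        0x12, 0x09, (byte) 0xb0, 0x12, 0x0a, (byte) 0xb0
    };

    // Length in bytes (opcode plus operands) for the few opcodes used here.
    // This is deliberately not a complete table.
    static int lengthOf(int opcode) {
        switch (opcode) {
            case 0x2b: return 1; // aload_1
            case 0xb0: return 1; // areturn
            case 0x12: return 2; // ldc + 1-byte constant pool index
            case 0xb6: return 3; // invokevirtual + 2-byte constant pool index
            case 0x99:           // ifeq + 2-byte branch offset
            case 0x9a: return 3; // ifne + 2-byte branch offset
            default:
                throw new IllegalArgumentException(
                    "Opcode not covered by this sketch: 0x" + Integer.toHexString(opcode));
        }
    }

    // Collects the index at which each instruction starts.
    static List<Integer> instructionOffsets(byte[] code) {
        List<Integer> offsets = new ArrayList<>();
        int index = 0;
        while (index < code.length) {
            offsets.add(index);
            index += lengthOf(code[index] & 0xff);
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(instructionOffsets(CHECK_STATUS_CODE));
    }
}
```

The offsets produced by the walk are the same indexes javap prints in the left-hand column of its listing.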

Full source code for all snippets referenced in this article is available on my GitHub page.

Java String Concatenation and Performance

The quick and dirty way to concatenate strings in Java is to use the concatenation operator (+). This yields reasonable performance if you need to combine two or three strings (a fixed number). But if you concatenate n strings in a loop, the cost grows much faster than n: String is immutable, so every (+) builds a new String and copies everything accumulated so far. For a large number of concatenations, (+) therefore gives the worst performance. But how bad is it? And how do StringBuffer, StringBuilder or String.concat() perform if we put them through a performance test? This article will try to answer those questions.
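
A rough cost model makes the growth concrete. The sketch below counts how many characters are written when a String is rebuilt on every iteration, compared with appending to one reused builder. It is a simplified model under stated assumptions (it ignores temporary buffers and the occasional regrowth of the builder's internal array), and the class name CopyCostModel is made up for this illustration.

```java
// Simplified cost model: counts characters written, not actual time.
class CopyCostModel {

    // Rebuilding an immutable String each time: iteration i produces a new
    // String of length i, so i characters are written in that iteration.
    static long charsWrittenWithImmutableString(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total; // equals n * (n + 1) / 2
    }

    // One reused builder: each character is appended once
    // (regrowth of the internal buffer is ignored in this model).
    static long charsWrittenWithBuilder(int n) {
        return n;
    }

    public static void main(String[] args) {
        int n = 50000;
        System.out.println("immutable String: " + charsWrittenWithImmutableString(n));
        System.out.println("reused builder:   " + charsWrittenWithBuilder(n));
    }
}
```

For n = 50,000 the model gives 1,250,025,000 characters written for the rebuilt String against 50,000 for the builder, i.e. quadratic versus linear growth.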

We will use Perf4J to measure performance, since this library gives us aggregated performance statistics such as mean, minimum, maximum and standard deviation over a set time span. In the code, we will concatenate a string (*) repeatedly 50,000 times, and this will be repeated 21 times so that we get a meaningful standard deviation. The following methods will be used to concatenate strings: the (+) operator, String.concat(), StringBuffer.append() and StringBuilder.append().

Finally, we will look at the byte code to see what each of these operations does. Let’s start building the class. Note that each block in the code is wrapped in a Perf4J stopwatch to measure the time taken by each iteration. Let’s define the outer and inner iterations first.

private static final int OUTER_ITERATION=20;
private static final int INNER_ITERATION=50000;

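Now let’s implement each of the four methods mentioned in the article. Nothing fancy here, just plain implementations of (+), String.concat(), StringBuffer.append() and StringBuilder.append(). Note that the loops use <=, so OUTER_ITERATION=20 yields the 21 runs mentioned above, and each inner loop appends 50,001 characters.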

String addTestStr = "";
String concatTestStr = "";
StringBuffer concatTestSb = null;
StringBuilder concatTestSbu = null;

for (int outerIndex=0;outerIndex<=OUTER_ITERATION;outerIndex++) {
    StopWatch stopWatch = new LoggingStopWatch("StringAddConcat");
    addTestStr = "";
    for (int innerIndex=0;innerIndex<=INNER_ITERATION;innerIndex++)
        addTestStr += "*";
    stopWatch.stop();
}

for (int outerIndex=0;outerIndex<=OUTER_ITERATION;outerIndex++) {
    StopWatch stopWatch = new LoggingStopWatch("StringConcat");
    concatTestStr = "";
    for (int innerIndex=0;innerIndex<=INNER_ITERATION;innerIndex++)
        concatTestStr = concatTestStr.concat("*");
    stopWatch.stop();
}

for (int outerIndex=0;outerIndex<=OUTER_ITERATION;outerIndex++) {
    StopWatch stopWatch = new LoggingStopWatch("StringBufferConcat");
    concatTestSb = new StringBuffer();
    for (int innerIndex=0;innerIndex<=INNER_ITERATION;innerIndex++)
        concatTestSb.append("*");
    stopWatch.stop();
}

for (int outerIndex=0;outerIndex<=OUTER_ITERATION;outerIndex++) {
    StopWatch stopWatch = new LoggingStopWatch("StringBuilderConcat");
    concatTestSbu = new StringBuilder();
    for (int innerIndex=0;innerIndex<=INNER_ITERATION;innerIndex++)
        concatTestSbu.append("*");
    stopWatch.stop();
}
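
For readers who want to try the measurement without adding Perf4J, the sketch below uses System.nanoTime() as a stand-in for the stopwatch and computes the aggregates by hand. It is not the code used for the results in this article: the class ConcatTimingSketch and its methods are hypothetical, the standard deviation is the population form (no assumption is made about the formula Perf4J uses), and the numbers will be affected by JIT warm-up.

```java
// Dependency-free timing sketch; System.nanoTime() stands in for Perf4J.
class ConcatTimingSketch {
    static final int OUTER_ITERATION = 20;
    static final int INNER_ITERATION = 50000;

    // Runs the task the given number of times and returns each elapsed time in ns.
    static long[] time(Runnable task, int runs) {
        long[] nanos = new long[runs];
        for (int run = 0; run < runs; run++) {
            long start = System.nanoTime();
            task.run();
            nanos[run] = System.nanoTime() - start;
        }
        return nanos;
    }

    static double mean(long[] values) {
        double sum = 0;
        for (long value : values) sum += value;
        return sum / values.length;
    }

    // Population standard deviation of the samples.
    static double stdDev(long[] values) {
        double average = mean(values);
        double squaredDiffs = 0;
        for (long value : values) squaredDiffs += (value - average) * (value - average);
        return Math.sqrt(squaredDiffs / values.length);
    }

    static void report(String tag, long[] nanos) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long value : nanos) {
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
        System.out.printf("%s mean=%.0f min=%d max=%d stddev=%.0f (ns)%n",
                tag, mean(nanos), min, max, stdDev(nanos));
    }

    public static void main(String[] args) {
        // Same loop bounds as the article: <= gives OUTER_ITERATION + 1 runs.
        report("StringBuilderConcat", time(() -> {
            StringBuilder builder = new StringBuilder();
            for (int i = 0; i <= INNER_ITERATION; i++) builder.append("*");
        }, OUTER_ITERATION + 1));
        report("StringConcat", time(() -> {
            String text = "";
            for (int i = 0; i <= INNER_ITERATION; i++) text = text.concat("*");
        }, OUTER_ITERATION + 1));
    }
}
```

The remaining two variants, (+) and StringBuffer, can be added as further report() calls in the same way.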

Let’s run this program and generate the performance metrics. I ran this program on a 64-bit OS (Windows 7) with a 32-bit JVM (7-ea), a Core 2 Quad CPU (2.00 GHz) and 4 GB RAM.

The output from the 21 iterations of the program is plotted below.

Well, the results are pretty conclusive and as expected. One interesting point to notice is how much better String.concat() performs than (+). We all know String is immutable, so how can concat perform better? To answer that question we should look at the byte code. I have included the whole byte code in the download package, but let’s have a look at the snippet below.

45: new #7; //class java/lang/StringBuilder
48: dup
49: invokespecial #8; //Method java/lang/StringBuilder."<init>":()V
52: aload_1
53: invokevirtual #9; //Method java/lang/StringBuilder.append:
    (Ljava/lang/String;)Ljava/lang/StringBuilder;
56: ldc #10; //String *
58: invokevirtual #9; //Method java/lang/StringBuilder.append:
    (Ljava/lang/String;)Ljava/lang/StringBuilder;
61: invokevirtual #11; //Method java/lang/StringBuilder.toString:()
    Ljava/lang/String;
64: astore_1

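This snippet is the byte code generated for the (+) operator (addTestStr += "*"), not for String.concat(). It is clear from this that the compiler turns every (+) into a new StringBuilder, two append() calls and a toString(). Inside a loop, that means each iteration allocates a fresh StringBuilder, copies the whole existing string into it and then copies it out again in toString(). String.concat(), by contrast, is a single method call that copies the characters once into a new array of the right size, which is why it performs better than (+). But given that the source object is still an immutable String, concat also has to copy the whole string on every iteration, so it cannot match a single reused StringBuilder.

So for simple operations we should prefer String.concat() over (+) if we don’t want to create a new instance of StringBuffer/StringBuilder. But for a large number of concatenations we shouldn’t use the (+) operator: as the performance results show, it will bring the application to its knees and spike CPU utilization. For the best performance, the clear choice is StringBuilder, as long as you do not need thread safety or synchronization (in which case StringBuffer is the alternative).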
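
Written back as source, the StringBuilder sequence in the listing above corresponds to what compilers before Java 9 emit for s += "*". The sketch below places that expansion next to a direct String.concat() call; the class and method names are hypothetical and exist only for this comparison. Java 9 and later compile (+) differently, so the expansion applies to the JVM version used in this article.

```java
// Source-level view of the two concatenation styles compared in the article.
class PlusDesugared {

    // Equivalent of `s += "*"` as compiled before Java 9:
    // new StringBuilder, append(s), append("*"), toString().
    static String appendWithPlusDesugared(String s) {
        return new StringBuilder().append(s).append("*").toString();
    }

    // Direct call: a single method invocation on the String itself.
    static String appendWithConcat(String s) {
        return s.concat("*");
    }

    public static void main(String[] args) {
        System.out.println(appendWithPlusDesugared("ab"));
        System.out.println(appendWithConcat("ab"));
    }
}
```

Both methods return the same text; the difference lies in how many temporary objects and copies each call needs when it is repeated in a loop.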

Full source of this application is available on my GitHub page.