Java String Concatenation and Performance

The quick and dirty way to concatenate strings in Java is to use the concatenation operator (+). This will yield a reasonable performance if you need to combine two or three strings (fixed-size). But if you want to concatenate n strings in a loop, the performance degrades in multiples of n. Given that String is immutable, for large number of string concatenation operations, using (+) will give us a worst performance. But how bad ? How StringBuffer, StringBuilder or String.concat() performs if we put them on a performance test ?. This article will try to answer those questions.

We will be using Perf4J to calculate the performance, since this library will give us aggregated performance statistics like mean, minimum, maximum, standard deviation over a set time span. In the code, we will concatenate a string (*) repeatedly 50,000 times and this iteration will be performed 21 times so that we can get a good standard deviation. The following methods will be used to concatenate strings.

And finally we will look at the byte code to see how each of these operations perform. Let’s start building the class. Note that each of the block in the code should be wrapped around the Perf4J library to calculate the performance in each iteration. Let’s define the outer and inner iterations first.

private static final int OUTER_ITERATION=20;
private static final int INNER_ITERATION=50000;

Now let’s implement each of the four methods mentioned in the article. Nothing fancy here, plain implementations of (+), String.concat(), StringBuffer.append() & StringBuilder.append().

String addTestStr = "";
String concatTestStr = "";
StringBuffer concatTestSb = null;
StringBuilder concatTestSbu = null;

for (int outerIndex=0;outerIndex<=OUTER_ITERATION;outerIndex++) {
    StopWatch stopWatch = new LoggingStopWatch("StringAddConcat");
    addTestStr = "";
    for (int innerIndex=0;innerIndex<=INNER_ITERATION;innerIndex++)
        addTestStr += "*";
    stopWatch.stop();
}

for (int outerIndex=0;outerIndex<=OUTER_ITERATION;outerIndex++) {
    StopWatch stopWatch = new LoggingStopWatch("StringConcat");
    concatTestStr = "";
    for (int innerIndex=0;innerIndex<=INNER_ITERATION;innerIndex++)
        concatTestStr = concatTestStr.concat("*");
    stopWatch.stop();
}

for (int outerIndex=0;outerIndex<=OUTER_ITERATION;outerIndex++) {
    StopWatch stopWatch = new LoggingStopWatch("StringBufferConcat");
    concatTestSb = new StringBuffer();
    for (int innerIndex=0;innerIndex<=INNER_ITERATION;innerIndex++)
        concatTestSb.append("*");
    stopWatch.stop();
}

for (int outerIndex=0;outerIndex<=OUTER_ITERATION;outerIndex++) {
    StopWatch stopWatch = new LoggingStopWatch("StringBuilderConcat");
    concatTestSbu = new StringBuilder();
    for (int innerIndex=0;innerIndex<=INNER_ITERATION;innerIndex++)
        concatTestSbu.append("*");
    stopWatch.stop();
}

Let’s run this program and generate the performance metrics. I ran this program in a 64-bit OS (Windows 7), 32-bit JVM (7-ea), Core 2 Quad CPU (2.00 GHz) with 4 GB RAM.

The output from the 21 iterations of the program is plotted below.

Well, the results are pretty conclusive and as expected. One interesting point to notice is how better String.concat performs. We all know String is immutable, then how the performance of concat is better. To answer the question we should look at the byte code. I have included the whole byte code in the download package, but let’s have a look at the below snippet.

45: new #7; //class java/lang/StringBuilder
48: dup
49: invokespecial #8; //Method java/lang/StringBuilder."<init>":()V
52: aload_1
53: invokevirtual #9; //Method java/lang/StringBuilder.append:
    (Ljava/lang/String;)Ljava/lang/StringBuilder;
56: ldc #10; //String *
58: invokevirtual #9; //Method java/lang/StringBuilder.append:
    (Ljava/lang/String;)Ljava/lang/StringBuilder;
61: invokevirtual #11; //Method java/lang/StringBuilder.toString:()
    Ljava/lang/String;
64: astore_1

This is the byte code for String.concat(), and its clear from this that the String.concat is using StringBuilder for concatenation and the performance should be as good as String Builder. But given that the source object being used is String, we do have some performance loss in String.concat.

So for the simple operations we should use String.concat compared to (+), if we don’t want to create a new instance of StringBuffer/Builder. But for huge operations, we shouldn’t be using the concat operator, as seen in the performance results it will bring down the application to its knees and spike up the CPU utilization. To have the best performance, the clear choice is StringBuilder as long as you do not need thread-safety or synchronization.

Full source of this application is available in my github page.

Recursive File Tree Traversing in Java using NIO.2

In my previous article about NIO.2, we have seen how to implement a service which monitors a directory recursively for any changes. In this article we will look at another improvement in JDK7 (NIO.2) called FileWatcher. This will allow us to implement a search or index. For example, we can find all the $some_pattern$ files in a given directory recursively and (or) delete / copy all the all the $some_pattern$ files in a file system. In a nutshell FileWatcher will get us a list of files from a file system based on a pattern which can be processed based on our requirement.

The FileVistor is an interface and our class should implement it. We have two methods before the traversal starts at the directory level & file level, and one method after the traversal is complete, which can be used for clean up or post processing. The important points from the interface is given in the below diagram.

While I think FileVistor is the best way to handle this, JDK7 NIO.2 has given another option to achieve the same, a class named SimpleFileVistor (which implements FileVisitor). It should be self explanatory, a simplified version of FileVisitor. We can extend the SimpileFileVisitor into our class and then traverse the directory with overriding only the methods we need, and if any step fails we will get an IOException.

According to me, FileVisitor is better because it forces you to implement the methods (sure, you can leave them blank) since these methods are really important if you plan to implement recursive delete / copy or work with symbolic links. For example, if you are copying some files to a directory you should make sure that the directory should be created first before copying which can be done in the preVisitDirectory().

The other area of concern is symbolic links and how this will be handled by FileVisitor. This can be achieved using FileVisitOption enums. By default, the symbolic links are not followed so that we are not accidentally deleting any directories in a recursive delete. If you want to handle manually, there are two options FOLLOW_LINKS (follow the links) & DETECT_CYCLES (catch circular references).

If you want to exclude some directory from FileVisitor or if you are looking for a directory or a file in the file system and once you find it you want to stop searching that can be implemented by using the return type of FileVisitor, called FileVisitResult. SKIP_SUBTREE allows us to skip directories & subdirectories. TERMINATE stops the traversing.

The search can be initiated by the walkFileTree() method in Files class. This will take the starting directory (or root directory in your search) as a parameter. You can also define Integer.MAX_VALUE if you want to manually specify the depth. And as mentioned in the above diagram, define FileVisitOption for symbolic links if needed.

Enough with the API description, let’s write some sample code to implement what we discussed. We will be using the SimpleFileVisitor so that in our demo we don’t need to implement all the methods.

Let’s start with defining the pattern which needs to be searched for. In this example, we will search for all the *txt file / directory names recursively in any given directory. This can be done with getPathMatcher() in FileSystems

PathMatcher pathMatcher = FileSystems.getDefault().getPathMatcher("glob:" + "*txt*");

Now, let’s initiate the search by calling walkFileTree() as mentioned below. We are not defining anything specific for symbolic links so, by default its NO_FOLLOW.

Files.walkFileTree(Paths.get("D://Search"), fileVisitor);

Let’s go through the implementations of class SimpleFileVisitor, we will be overriding only visitFile() & preVisitDirectory() in this example, but its a good practice to override all the five methods so that we have more control over the search. The implementation is pretty simple, based on the pattern the below methods will search for a directory or file and print the path.

@Override
public FileVisitResult visitFile(Path filePath, BasicFileAttributes basicFileAttributes) {
    if (filePath.getName() != null && pathMatcher.matches(filePath.getName()))
        System.out.println("FILE: " + filePath);
    return FileVisitResult.CONTINUE;
}

@Override
public FileVisitResult preVisitDirectory(Path directoryPath) {
    if (directoryPath.getName() != null && pathMatcher.matches(directoryPath.getName()))
        System.out.println("DIR: " + directoryPath);
    return FileVisitResult.CONTINUE;
}

Once this is completed, we can use the postVisitDirectory() to perform additional tasks or any cleanup if needed. A sample output from my machine is given below.

The complete source code is given below. Please note that you need JDK7 to run this code. Source is also available in my github page.

public class NIO2_FileVisitor extends SimpleFileVisitor<Path> {
    private PathMatcher pathMatcher;

    @Override
    public FileVisitResult visitFile(Path filePath, BasicFileAttributes basicFileAttributes) {
        if (filePath.getName() != null && pathMatcher.matches(filePath.getName()))
                System.out.println("FILE: " + filePath);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult preVisitDirectory(Path directoryPath) {
        if (directoryPath.getName() != null && pathMatcher.matches(directoryPath.getName()))
                System.out.println("DIR: " + directoryPath);
        return FileVisitResult.CONTINUE;
    }

    public static void main(String[] args) throws IOException {
        NIO2_FileVisitor fileVisitor = new NIO2_FileVisitor();
        fileVisitor.pathMatcher = FileSystems.getDefault().getPathMatcher("glob:" + "*txt*");
        Files.walkFileTree(Paths.get("D://Search"), fileVisitor);
    }
}