Today I learned that Javascript does not need a StringBuilder for accumulating a large number of concatinations. As a Java programmer, that came as a shock to me. This article summarizes my exploration of this fact.

Effective Java

A StringBuilder is grilled into Java programmers. Effective Java [1] says, “Item 51: Beware the performance of string concatenation.” Many books, articles, blog posts and stack overflow questions have extensively covered this in detail [2] [3] [4] [5] [6] [7] [8].

Now you can see why it came as such a shock to me. I assumed Javascript would do the same thing as Java to maintain the immutability of strings. Java allocates a new array containing the result of each successive concatenation resulting in the O(N^2) runtime complexity.

Experiments

When I first learned this from a co-worker, I couldn’t believe it! I put together some experiments to confirm. I measured how long it would take to do different powers of 2 of concatenations in Java and Javascript. I am not interested in a comparison between the languages. I am curious about how runtime grows as the number of concatenations grows.

If you would like to checkout a copy of the source code take a look at the last two sections (Java source and Javascript source) of this page or check my repo.

You can try out the Javascript experiment in your browser at my git hub page.

I’ve summarized how long each number of concats took in the table below. I looked at using naive += String concatenation, Java’s StringBuilder, and Javascript’s Array.join.

  # Concats                  
Algorithm 2^18 2^19 2^20 2^21 2^22 2^23 2^24 2^25 2^26 2^27
Java8 naive += (ms) 21945 94077 412675 1886317 8529431 TOO LONG TOO LONG TOO LONG TOO LONG TOO LONG
Java8 StringBuilder (ms) 5 6 11 39 70 119 209 492 898 1738
Java14 naive += (ms) 9726 38619 173218 860218 4435485 TOO LONG TOO LONG TOO LONG TOO LONG TOO LONG
Java14 StringBuilder (ms) 2 5 12 20 41 115 185 379 943 1754
Chrome Javascript naive += (ms) 30 79 144 408 906 1650 3418 7411 14548 OOM
Chrome Javascript Array.join (ms) 18 37 71 156 318 621 1416 3275 5385 OOM
Firefox Javascript naive += (ms) 14 12 42 68 172 272 518 1836 5075 9850
Firefox Javascript Array.join (ms) 11 18 44 70 149 384 766 1846 3655 10912
NodeJS Javascript naive += (ms) 29 112 206 460 813 2233 3599 6384 13266 33434
NodeJS Javascript Array.join (ms) 22 54 93 245 425 753 1920 4048 20333 OOM

At 2^23, the naive Java String concatenation took too long to run. I gave up at 2^22 when it took over 2 hours! Each doubling of the input size results in more than double the runtime, as we expected because of the O(N^2) complexity. I tested both Java8 and and Java14, expecting it to be optimized. Even though these stack overflow articles [9] and [10] say its been optimized, the optimization does not fix the concatenation in the loop. I believe the optimization cannot apply over multiple statements, only with in a single statement. However JEP 280: Indify String Concatenation has laid the groundwork to more easily optimize String concatenation in the future without having to tweak the the bytecode on the compiler.

With Javascript the growth looks kind of linear. Lets investigate with a plot. We won’t plot the naive Java concatenation since it grows too quickly and the scale would make it difficult to analyze the growth of the remaining methods. I also left out Java 14 StringBuilder since it performed almost the same as Java 8.

After plotting, the growth looks linear for all of these variations. However, it’s best to keep reading for a bit of complexity analysis.

Why Javascript Does not Need a StringBuilder

Following the experiments, I wanted to know why. My research led me to this email chain where Dr. Axel Rauschmayer proposes adding StringBuilder to the next version of ECMAScript:

Is this worthy of ES.next support? Or does it belong into a library?

The two concatenation approaches I know of are:

  1. via +=
  2. push() into an array, join() it after the last push()

(1) can’t possibly be efficient, but if (2) is OK on all(!) platforms, then a library would be OK. However, given how frequently this is used, I would like this to become part of the standard library.

Tom Schuster responds that it’s not necessary.

(1) is in fact really good optimized in modern engines. (In case you are interested search for “Ropes: an alternative to strings”)

I think today it’s not a very good idea to propose methods on probably existing performance problems.

I did more digging and learned that ropes are a datastructure the allow immutable String concatenations in O(lgN) time.

Firefox’s Javascript VM’s sourcecode comments on what Tom Schuster was saying the email chain.

To avoid O(n^2) char buffer copying, a “rope” node (JSRope) can be created to represent a delayed string concatenation. Concatenation (called flattening) is performed if and when a linear char array is requested. In general, ropes form a binary dag whose internal nodes are JSRope string headers with no associated char array and whose leaf nodes are linear strings.

Details about the the rope data structure are available on Wikipedia. Concetenation takes O(lgN) time where N is the number of concats in the rope so far. As a result, to concat N times, the complexity is O( sum from 1 to N (lg N)) which is O(N lg N ) as explained in this Stackoverflow article. It’s not as good as using a StringBuilder for really large data, which amoritzes to O(N). However, for the number of concatenations that will work in the browser (2^27), the rope is good enough because behaves linearly until then. If you plan to need more than that, at some point it will be better to use a builder.

Javascript StringBulder

I tried implementing my own StringBuilder for fun to see if I could improve upon the 10 seconds it takes in Firefox and failed. My version of StringBuilder implemented in Javascript in Firefox took over 70 seconds to handle 2^27 concatenations. Javascript provides a couple ways of creating strings directly from arrays.

The first I tried was String.fromCharCode. This method looked like it would be faster; however, at 2^17 it fails with Uncaught RangeError: Maximum call stack size exceeded because String.fromCharCode uses function parameter arguments. As a result, when you use ... array expansion or Function.apply, you blow up your stack.

In my second attempt, I tried using TextDecoder and TextEncoder. In Firefox took over 70 seconds to handle 2^27 concatenations! As a result, you cannot implement a StringBuilder efficiently in Javascript yet.

I was able to improve the performance for 2^27 concatenations by using Rust’s String and connecting it to Javascript through WASM. Rust’s String acts similarly to Java’s StringBuilder using an expanding buffer that amortizes each concat to be O(1) over all the concats. Unfortunately, it only brought it down to 23 seconds, which is still worse than Javascript’s += and Array.join.

Discussion

Could Java implement a similar optimization to the JVM so that people who overlook this mistake aren’t hit with an O(N^2) but a O(NlgN) algorithm instead?

Are ropes good enough? At 2^24 to 2^27 we’re measuring the concatenations in seconds and the growth appears linear until that point even though asymptotically it is O(NlgN). You could argue that’s good enough because that’s the memory limit of the browser.

Would a StringBuilder implemented in WASM perform significantly better than += and Array.join if there was a sufficient amount of concatenations? Unfortunately, I was limited to 2^27 concatenations before running out of memory. If I was able to exceed that memory limit, I expect that WASM, and even my Javascript StringBuilder to eventually perform better due to the O(N) versus O(NlgN) complexity.

Finally, would we see a performance improvement if we implemented a StringBuilder in the javascript virtual machine instead of JSRope and rebuilt the browser?

Java source

public static void main(String [] args) {
    final int base;
    final int startPower;
    final int concatPowerLimit;
    final int powerLimit;

    if(args.length >= 4) {
        base = Integer.parseInt(args[0]);
        startPower = Integer.parseInt(args[1]);
        concatPowerLimit = Integer.parseInt(args[2]);
        powerLimit = Integer.parseInt(args[3]);
    } else {
        base = 2;
        startPower = 1;
        concatPowerLimit = 21; // at least 2 hours to compute the 22nd one
        powerLimit = 27;
    }

    for(int power = startPower; power <= 27; power++) {
        int size = (int)Math.pow(2, power);
        runStringBuilderExperiment(base, power, size);
        if (power <= concatPowerLimit) {
            runConcatExperiment(base, power, size);
        }
    }
}

public static void runConcatExperiment(int base, int power, int size) {
    String result = "";
    long start = System.currentTimeMillis();
    for (int i = 0; i < size; i++) {
        result += (i%10);
    }
    long end = System.currentTimeMillis();
    long duration = end - start;
    System.out.println(String.format("concat %d^%d %d %d %d", base, power, size, result.length(), duration));
}

public static void runStringBuilderExperiment(int base, int power, int size) {
    StringBuilder builder = new StringBuilder();
    long start = System.currentTimeMillis();
    for (int i = 0; i < size; i++) {
        builder.append(i%10);
    }
    String result = builder.toString();
    long end = System.currentTimeMillis();
    long duration = end - start;
    System.out.println(String.format("builder %d^%d %d %d %d", base, power, size, result.length(), duration));
}

Javascript Naive Concat and Array.join Source

function runConcatExperiment(size) {
    var result = "";
    var start = new Date().getTime();
    for (var i = 0; i < size; i++) {
        result += (i%10);
    }
    var end = new Date().getTime();
    var duration = end - start;
    console.log("concat %d %d %d", size, result.length, duration);
    return duration;
}

function runArrayJoinExperiment(size) {
    var builder = [];
    var start = new Date().getTime();
    for (var i = 0; i < size; i++) {
        builder.push(i%10);
    }
    var result = builder.join();
    var end = new Date().getTime();
    var duration = end - start;
    console.log("concat %d %d %d", size, result.length, duration);
    return duration;
}

function addRow(base, power, size, concatDuration, arrayJoinDuration) {
    var resultsTable = document.getElementById("results");
    var tableRow = document.createElement("tr");
    var powerCell = document.createElement("td");
    powerCell.innerText = base + "^" + power;
    tableRow.append(powerCell);
    var sizeCell = document.createElement("td");
    sizeCell.innerText = size;
    tableRow.append(sizeCell);
    var concatCell  = document.createElement("td");
    concatCell.innerText = concatDuration;
    tableRow.append(concatCell);
    var joinCell  = document.createElement("td");
    joinCell.innerText = arrayJoinDuration;
    tableRow.append(joinCell);
    resultsTable.appendChild(tableRow);
}

function runExperiment() {
    var baseInput = document.getElementById("base");
    var base = baseInput.value;
    var powerLimitInput = document.getElementById("powerLimit");
    var powerLimit = powerLimitInput.value;
    for(var power = 1; power <= powerLimit; power++) {
        var size = Math.pow(2, power);
        var concatDuration = runConcatExperiment(size);
        var arrayJoinDuration = runArrayJoinExperiment(size);
        addRow(base, power, size, concatDuration, arrayJoinDuration);
    }
}

Javascript String Builder

function StringBuilder() {

    // cannot use String.fromCharCode due to
    // Uncaught RangeError: Maximum call stack size exceeded
    // so we are stuck with TextEncoder and TextDecoder
    // https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder
    // https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder
    let utf8Decoder = new TextDecoder();
    let utf8Encoder = new TextEncoder();

    this.bufferConsumed = 0;
    this.capacity = 128;
    this.buffer = new Uint8Array(128);

    this.append = function (strToAdd) {
        // O(N) copy but ammortized to O(1) over all concats
        var encodedStr = utf8Encoder.encode(strToAdd);
        while (encodedStr.length + this.bufferConsumed > this.capacity) {
            var tmpBuffer = this.buffer;
            this.buffer = new Uint8Array(this.capacity*2);
            this.capacity = this.capacity*2;
            for(var i = 0; i < this.bufferConsumed; i++) {
                this.buffer[i] = tmpBuffer[i];
            }
        }

        // add the characters to the end
        for(var i = 0; i < encodedStr.length; i++) {
            this.buffer[i+this.bufferConsumed] = encodedStr[i];
        }
        this.bufferConsumed += encodedStr.length;

        return this;
    }

    this.build = function() {
        return utf8Decoder.decode(this.buffer.slice(0, this.bufferConsumed));
    }
}

Rust WASM String

mod utils;

use wasm_bindgen::prelude::*;

// When the `wee_alloc` feature is enabled, use `wee_alloc` as the global
// allocator.
#[cfg(feature = "wee_alloc")]
#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;

#[wasm_bindgen]
pub struct RustStringBuilder {
    // String push_str uses a vector internally
    // vector amortizes push to O(1) over many appends
    internal: String,
}

#[wasm_bindgen]
impl RustStringBuilder {

    #[wasm_bindgen(constructor)]
    pub fn new() -> RustStringBuilder {
        RustStringBuilder { internal: String::from("") }
    }

    pub fn append(&mut self, to_append: String) {
        self.internal.push_str(to_append.as_str())
    }

    pub fn build(&self) -> String  {
        String::from(self.internal.as_str())
    }
}