Site icon JVM Advent

Changes to String.substring in Java 7

It is common knowledge that Java optimizes the substring operation for the case where you generate a lot of substrings of the same source string. It does this by using the (value, offset, count) way of storing the information. See an example below:

In the above diagram you see the strings “Hello” and “World!” derived from “Hello World!” and the way they are represented in the heap: there is one character array containing “Hello World!” and two references to it. This method of storage is advantageous in some cases, for example for a compiler which tokenizes source files. In other instances it may lead you to an OutOfMemorError (if you are routinely reading long strings and only keeping a small part of it – but the above mechanism prevents the GC from collecting the original String buffer). Some even call it a bug. I wouldn’t go so far, but it’s certainly a leaky abstraction because you were forced to do the following to ensure that a copy was made: new String(str.substring(5, 6)).

This all changed in May of 2012 or Java 7u6. The pendulum is swung back and now full copies are made by default. What does this mean for you?

Thankfully the development of Java is an open process and such information is at the fingertips of everyone!

A couple of more references (since we don’t say pointers in Java :-)) related to Strings:

Hope I didn’t strung you along too much and you found this useful! Until next time
– Attila Balazs

Meta: this post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on! Want to write for the blog? We are looking for contributors to fill all 24 slot and would love to have your contribution! Contact Attila Balazs to contribute!

Author: gpanther

Exit mobile version