Monday, May 8, 2023

What is String deduplication in Java? How to Save Memory from Duplicate String in Java 8? [Answer]

You might not be aware that Java 8 update 20 introduced a new feature called "String deduplication", which reduces the memory taken by duplicate String objects in a Java application. If your application makes heavy use of String, this can improve performance and help prevent java.lang.OutOfMemoryError. If you have profiled a Java application to check which objects take the bulk of memory, you will often find char[] at the top of the list, which is nothing but the internal character array used by String objects. Some tools and profilers, like Java Flight Recorder, might show this as java.lang.String[] as well, but they are essentially pointing to the same problem, i.e. a major portion of memory is occupied by String objects.

From Java 7 update 6 onward, String stopped sharing its character array with substrings, so the memory occupied by String objects went up, which made the problem even worse.

If you remember, earlier a substring and its original String shared the same character array (see how Substring works in Java), which was actually a bug that had the potential to cause a serious memory leak. The bug was fixed in JDK 7, but the fix created this new problem.

String deduplication tries to bridge that gap. It reduces the memory footprint of String objects on the Java heap by taking advantage of the fact that many String objects are identical. Instead of each String object pointing to its own character array, identical String objects can point to the same character array.
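To make the idea of "duplicate" Strings concrete, here is a minimal sketch. The class name DuplicateStringDemo and the build() helper are made up for illustration. It shows that two Strings built at runtime with the same content are equal in value but are still two separate objects on the heap, each with its own backing array unless deduplication later merges the arrays.

```java
public class DuplicateStringDemo {

    // Hypothetical helper: builds a String at runtime, so the result is a
    // fresh heap object rather than a pooled String literal.
    public static String build(String prefix, int n) {
        return new StringBuilder(prefix).append(n).toString();
    }

    public static void main(String[] args) {
        String a = build("JOHN", 1);
        String b = build("JOHN", 1);

        // Same characters, so the values are equal.
        System.out.println(a.equals(b)); // true

        // Two distinct String objects. Deduplication does not change this:
        // it only lets both objects share one backing array.
        System.out.println(a == b);      // false
    }
}
```

Note that deduplication works on the backing array only, so code that compares Strings with == behaves exactly as before.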

Btw, this is not exactly the same as it was before Java 7 update 6, where a substring also pointed to the same character array as its original String, but it can greatly reduce the memory occupied by duplicate Strings in the JVM. Anyway, in this article, you will see how you can enable this feature in Java 8 to reduce the memory consumed by duplicate String objects.

Btw, if you are not familiar with the new features of Java 8 then I suggest you first go through a comprehensive and up-to-date Java course like The Complete Java MasterClass on Udemy. It's also very affordable and you can buy it for just $10 in the Udemy sales which happen every now and then.




How to enable String deduplication in Java 8

String deduplication is not enabled by default in Java 8 JVM. You can enable the String deduplication feature by using -XX:+UseStringDeduplication option. Unfortunately, String deduplication is only available for the G1 garbage collector, so if you are not using G1 GC then you cannot use the String deduplication feature.

It means that just providing -XX:+UseStringDeduplication will not work. You also need to turn on the G1 garbage collector using the -XX:+UseG1GC option.
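If you want to double-check that both options actually reached the JVM, a small sketch like the following may help. The class name DedupFlagCheck and its helper methods are hypothetical. It only inspects the flags that were passed explicitly on the command line, so it says nothing about collector defaults.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Run it, for example, as:
//   java -XX:+UseG1GC -XX:+UseStringDeduplication DedupFlagCheck
public class DedupFlagCheck {

    // True if the given flag appears verbatim among the JVM arguments.
    public static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    // True only if both options described in this article were passed explicitly.
    public static boolean dedupRequested(List<String> jvmArgs) {
        return hasFlag(jvmArgs, "-XX:+UseG1GC")
                && hasFlag(jvmArgs, "-XX:+UseStringDeduplication");
    }

    public static void main(String[] args) {
        // Arguments given to the JVM itself, not to the main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Both flags passed explicitly: " + dedupRequested(jvmArgs));
    }
}
```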

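String deduplication also doesn't consider relatively young Strings for processing. The minimum age of a String before it is processed is controlled by the -XX:StringDeduplicationAgeThreshold option, e.g. -XX:StringDeduplicationAgeThreshold=3, where age is the number of garbage collections the String has survived. The default value of this parameter is 3.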


Now, you might be wondering how this method compares with the traditional way of reducing the memory taken by duplicate Strings, e.g. by using the intern() method of the java.lang.String class. Well, this approach has an advantage because you don't need to write a single line of code. Just enable this feature using JVM parameters and you are done.

If you have ever optimized your code by using the String.intern() method then you know that it's not easy. It not only compromises readability by adding lines of code that don't add any functionality, but also increases the size of the code.
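For comparison, this is roughly what the manual intern() approach looks like. It is only a sketch: the class InternDemo and the loadInterned() method are made-up names, and a real application would have such calls spread over every place where duplicate Strings are created.

```java
import java.util.ArrayList;
import java.util.List;

public class InternDemo {

    // Hypothetical loader: every value is passed through intern(), so equal
    // values end up as references to one canonical String object.
    public static List<String> loadInterned(String[] raw) {
        List<String> result = new ArrayList<>(raw.length);
        for (String s : raw) {
            result.add(s.intern());
        }
        return result;
    }

    public static void main(String[] args) {
        // Two Strings built at runtime with the same content.
        String a = new StringBuilder("JO").append("HN").toString();
        String b = new StringBuilder("JO").append("HN").toString();

        System.out.println(a == b);                   // false
        System.out.println(a.intern() == b.intern()); // true

        List<String> names = loadInterned(new String[] {a, b});
        System.out.println(names.get(0) == names.get(1)); // true
    }
}
```

Unlike deduplication, intern() changes which String object your code holds, and you have to remember to call it yourself wherever duplicates come in.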

Btw, if you are interested in learning more about the G1 garbage collector, I suggest checking out these Advanced Java Performance courses to learn JVM tuning, garbage collection, and performance tuning. They cover some good information about the G1 garbage collector.






Important points about String Deduplication Features

Here are some of the important points about the String deduplication feature of Java 8:

1. This option is only available from Java 8 Update 20 JDK release.

2. This feature will only work along with the G1 garbage collector. It will not work with other garbage collectors like the Concurrent Mark Sweep garbage collector.

3. You need to provide both the -XX:+UseG1GC and -XX:+UseStringDeduplication JVM options to enable this feature. The first one enables the G1 garbage collector and the second one enables the String deduplication feature within G1 GC.

4. You can optionally use -XX:+PrintStringDeduplicationStatistics JVM option to analyze what is happening through the command line.

5. Not every String is eligible for deduplication. In particular, young String objects are not considered, but you can control this by using the -XX:StringDeduplicationAgeThreshold option (default 3) to change when Strings become eligible for deduplication.

6. In general, it has been observed that this feature may decrease heap usage by about 10%, which is very good, considering you don't have to do any coding or refactoring.

7. String deduplication runs as a background task without stopping your application.
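If you want to try these options yourself, you need a program that keeps many duplicate Strings alive for a while. The following is only a rough sketch under that assumption. The class name DedupWorkload, the "customer-" values, and the sizes are all invented for illustration. You could run it with -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics on Java 8 and look at what the JVM reports. What you see will depend on your JVM version and heap settings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class DedupWorkload {

    // Creates `total` separate String objects that carry only `distinct`
    // different values, which is the situation deduplication targets.
    public static List<String> generate(int total, int distinct) {
        List<String> held = new ArrayList<>(total);
        for (int i = 0; i < total; i++) {
            held.add(new StringBuilder("customer-").append(i % distinct).toString());
        }
        return held;
    }

    // Counts how many different values the list really contains.
    public static int countDistinctValues(List<String> strings) {
        return new HashSet<>(strings).size();
    }

    public static void main(String[] args) {
        List<String> held = generate(200_000, 100);

        // Allocate short-lived garbage to encourage a few collections while
        // `held` stays reachable, so its Strings get a chance to age.
        for (int round = 0; round < 50; round++) {
            byte[][] garbage = new byte[1_000][];
            for (int i = 0; i < garbage.length; i++) {
                garbage[i] = new byte[10_000];
            }
        }

        System.out.println(held.size() + " strings, "
                + countDistinctValues(held) + " distinct values");
    }
}
```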

If you want to learn more about Java performance, JVM options, and profiling Java applications, I suggest reading Java Performance Companion by Charlie Hunt. It is an up-to-date book and one of the best books to learn and understand the tools and techniques of Java performance tuning. It also contains some really good information about the G1 Garbage Collector, which is relevant to String deduplication.




That's all about how to enable String deduplication in Java 8 to reduce the memory consumed by duplicate String objects. This is one of the useful features to know about but unfortunately, it is only available for the G1 Garbage Collector. You also need at least Java 8 Update 20 to enable this option.

Hopefully, in Java 9, when the G1 Garbage collector becomes the default collector, it can use this feature to further improve performance. If we are lucky, we may also see this feature extended to other major garbage collectors like the Concurrent Mark Sweep Garbage collector.


Thanks for reading this article so far. If you found my explanation of the String deduplication feature of the JVM useful, then please share it with your friends and colleagues. If you have any questions, feel free to ask.


2 comments :

Cristian Daniel Ortiz Cuellar said...

Sorry i dont get it. All the String literals are interned right? how can be a duplicate Strings? If i got String name1 = new String("JOHN");String name2 = new String("JOHN"); only 1 String will be on the heap? if i set XX:+UseG1GC -XX:+UseStringDeduplication? Sorry if the question is plain!!

javin paul said...

Hello Christian, yes, but JOHN and JOHNN will each have their own character array even though one is substring of other. In second paragraph of this article I have mentioned that "Since from Java 7 onward, String has stopped sharing character array with sub-strings, the memory occupied by String object has gone higher, which had made the problem even worse."

That's the exact reason of using this flag. Hope this clears your doubt.
