intern() is an interesting function in the java.lang.String class. It eliminates duplicate string objects from the application and has the potential to reduce your application's overall memory consumption. In this post, let's learn more about the intern() function.
1. How does the String intern() function work?
In Java heap memory, a pool of string objects is maintained. When you invoke the intern() function on a String object, the JVM checks whether an equal string (as determined by equals()) already exists in the pool. If it exists, that pooled object is returned to the invoker. If it doesn't exist, this string object is added to the pool and a reference to it is returned to the invoker.
It's always easier to learn through examples and pictures, so let's look at the code snippet below:
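The lookup-or-add behavior described above can be pictured with a toy model. This is only a sketch: the class name ToyInternPool and its map-based design are hypothetical, and the real pool is a native table managed inside the JVM, not a Java map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical toy model of an intern pool, for illustration only.
// The real JVM pool is maintained natively and is not implemented like this.
final class ToyInternPool {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the pooled instance equal to s, adding s itself if none exists yet.
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        ToyInternPool toyPool = new ToyInternPool();
        String first = new String("yCrash");
        String second = new String("yCrash");

        String a = toyPool.intern(first);   // pool is empty: 'first' is stored and returned
        String b = toyPool.intern(second);  // an equal entry exists: 'first' is returned again

        System.out.println(a == first);  // true
        System.out.println(b == first);  // true
        System.out.println(b == second); // false
    }
}
```

The second call never stores 'second'; it hands back the object that was pooled first, which is the same contract String.intern() documents.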
1: String s1 = new String("yCrash").intern();
2: String s2 = new String("yCrash").intern();
Fig: JVM heap memory when launched initially
All the objects that your application creates are stored in the JVM's heap memory. This JVM heap memory internally has a string intern pool. In this simplified picture, when the program is first launched, the JVM's heap memory has no string objects.
Fig: JVM heap memory when String s1 = new String("yCrash").intern(); is executed
When the first statement String s1 = new String("yCrash").intern(); is executed, the JVM checks whether the "yCrash" string object is present in the string intern pool. Since it doesn't exist, this "yCrash" string is added to the string intern pool and a reference to this newly added String object is returned to s1.
Fig: JVM heap memory when String s2 = new String("yCrash").intern(); is executed
When the second statement String s2 = new String("yCrash").intern(); is executed, the JVM once again checks whether the "yCrash" string object is present in the string intern pool. This time, it is present, because it was added when statement #1 was executed. A reference to this existing string object is returned to s2, so both s1 and s2 now point to the same "yCrash" string object. The duplicate "yCrash" string object created in statement #2 is no longer referenced and becomes eligible for garbage collection.
2. How do Strings work without the intern() function?
1: String s3 = new String("yCrash");
2: String s4 = new String("yCrash");
Fig: JVM heap memory when String s3 = new String("yCrash"); is executed
When the first statement String s3 = new String("yCrash"); is executed, the JVM adds the "yCrash" string object to the heap memory, but not within the string intern pool.
Fig: JVM heap memory when String s4 = new String("yCrash"); is executed
When the second statement String s4 = new String("yCrash"); is executed, the JVM creates another "yCrash" string object in the heap memory, so a duplicate "yCrash" now exists in memory. If your application creates n "yCrash" objects without invoking intern(), n "yCrash" string objects will exist in memory for as long as they remain referenced. This can waste a considerable amount of memory.
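To make the "n objects versus one object" point concrete, here is a minimal sketch that counts distinct String objects by identity rather than by equals(). The class name DuplicateStringCount and the helper distinctObjects are made up for this illustration.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper class, for illustration only.
final class DuplicateStringCount {

    // Creates n "yCrash" strings and counts how many distinct objects
    // (compared by identity, not by equals) end up being referenced.
    static int distinctObjects(int n, boolean useIntern) {
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (int i = 0; i < n; i++) {
            String s = new String("yCrash");
            identities.add(useIntern ? s.intern() : s);
        }
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(1000, false)); // 1000
        System.out.println(distinctObjects(1000, true));  // 1
    }
}
```

Without intern(), every iteration leaves a separate object behind; with intern(), all 1000 iterations resolve to the single pooled object.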
3. How do intern() and == work together?
Since s1 and s2 point to the same "yCrash" string object, comparing s1 and s2 with the == operator, as shown below, gives true as the result.
// true will be printed
System.out.println(s1 == s2);
Since s3 and s4 point to two different "yCrash" string objects, comparing s3 and s4 with the == operator, as shown below, gives false as the result.
// false will be printed
System.out.println(s3 == s4);
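For readers who want to run the comparisons themselves, the s1 to s4 statements from the earlier sections can be combined into one small program. The class name InternEqualityDemo is just a placeholder; the last two checks are additions that show equals() still reports equal content and that interning s3 yields the same pooled object as s1.

```java
// Placeholder class name; combines the s1..s4 statements from the sections above.
final class InternEqualityDemo {
    public static void main(String[] args) {
        String s1 = new String("yCrash").intern();
        String s2 = new String("yCrash").intern();
        String s3 = new String("yCrash");
        String s4 = new String("yCrash");

        System.out.println(s1 == s2);          // true: both refer to the pooled object
        System.out.println(s3 == s4);          // false: two separate heap objects
        System.out.println(s3.equals(s4));     // true: same character content
        System.out.println(s3.intern() == s1); // true: interning s3 returns the pooled object
    }
}
```

The == operator compares references, so it is only a reliable content check when both sides are known to be interned; equals() remains the general-purpose comparison.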
4. In which JVM memory region is the string intern pool stored?
JVM memory has the following regions:
a. Heap region (i.e. Young Generation + Old Generation)
b. Metaspace
c. Other regions
To learn about these JVM memory regions, you may refer to this video clip. In earlier versions of Java, from 1 to 6, the string intern pool was stored in the Permanent Generation. Starting from Java 7, the string intern pool is stored in the JVM's heap memory. To confirm it, we conducted this simple experiment.
5. Is it better to use intern() or -XX:+UseStringDeduplication?
When you pass the -XX:+UseStringDeduplication JVM argument during application startup, the JVM will try to eliminate duplicate strings as part of the garbage collection process. During garbage collection, the JVM inspects all the objects in memory. As part of this process, it tries to identify duplicate strings among them and eliminate them. However, there are certain limitations in using the -XX:+UseStringDeduplication JVM argument: it works only with the G1 GC algorithm (newer JDK releases have extended it to some other collectors), and it eliminates duplicates only among long-lived string objects. To learn more about this argument, you can refer to this post. Here is an interesting case study of a major application which tried to use the -XX:+UseStringDeduplication JVM argument.
On the other hand, the intern() function can be used with any GC algorithm and on both short-lived and long-lived objects. However, the intern() function might impact application response time more than -XX:+UseStringDeduplication. For more details, refer to this blog post.
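When comparing the two approaches, it helps to confirm whether the running JVM was actually started with the flag. The sketch below reads the JVM's input arguments through the standard RuntimeMXBean API. The class name DeduplicationFlagCheck and the helper isRequested are hypothetical, and the check only tells you that the flag was passed, not whether the active garbage collector honors it.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class DeduplicationFlagCheck {
    static final String FLAG = "-XX:+UseStringDeduplication";

    // True if the given JVM argument list contains the deduplication flag.
    static boolean isRequested(List<String> jvmArgs) {
        return jvmArgs.contains(FLAG);
    }

    public static void main(String[] args) {
        // Arguments the JVM reports it was started with (not the program's own args).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("String deduplication requested: " + isRequested(jvmArgs));
    }
}
```

Running it once with and once without -XX:+UseStringDeduplication on the java command line shows the difference in the printed value.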
6. What is the performance impact of the intern() function?
Based on this post, you might have understood that invoking the intern() function on string objects has the potential to eliminate duplicate strings from memory, thus reducing overall memory utilization. However, it can take a toll on response time and CPU utilization. To understand the performance impact of using the intern() function, you may refer to this post.
Video
https://youtu.be/HiL2634pZaA