The Curious Case of Java String HashCode

The document discusses the performance of lookups in a Java HashSet for strings of different lengths. It finds that lookups for some strings are much slower than others. This is because Java's default String hashCode implementation caches the hashCode value, but recomputes it from the characters if the initial value is 0. Strings that hash to 0 therefore have slower lookups. To optimize this, a flag could be used to skip recomputation after the first call to hashCode.

Uploaded by

NikolaMilosevic


The curious case of Java String HashCode

Animesh Gaitonde
Oct 11, 2019

Introduction
I have been programming in Java for the past 1.5 years. Recently, I was
experimenting with performance analysis of Java data structures. To get
my hands dirty, I decided to play around with my favourite data structure,
i.e. HashSet. HashSet provides O(1) average-case lookup and insertion. I
measured and compared the time taken to look up random strings of
different sizes in a HashSet.

Here is the code snippet I wrote:

    package performance;

    import com.google.common.base.Stopwatch; // from Guava

    import java.util.*;

    public class HashCodePerformance {
        public static void main(String[] args) {
            Set<String> stringHashSet = new HashSet<>();
            stringHashSet.add("London");
            stringHashSet.add("Mumbai");
            stringHashSet.add("NewYork");
            List<String> stringsToSearch = Arrays.asList("f5a5a608", "48abre7a6 i8a5r507",
                    "7e50bc488 pl43fvf1p 65", "e843r6f1p vfvdfv vdvdg vgbgd ", "38aeaf9a6");
            for (String string : stringsToSearch) {
                Stopwatch timer = Stopwatch.createStarted();
                // Look up the same string 10 million times and time the whole loop.
                for (int index = 0; index < 10000000; ++index) {
                    stringHashSet.contains(string);
                }
                // The original line is cut off after "timer.st"; timer.stop() is an assumed completion.
                System.out.println("Search String \"" + string + "\" time taken " + timer.stop());
            }
        }
    }


Let’s have a look at the performance of the lookup done for random
strings:

[Figure: Performance of string lookup]

We come across an interesting observation. The first and the last string
lookups (highlighted in red) take almost 3x to 4x as long as the middle
three. The middle three strings are longer, yet their lookups are far
faster, so string length alone does not explain the difference.

To understand this unusual behaviour of HashSet, let’s get back to the
basics and understand the fundamentals.

HashSet: Internal Working

Java HashSet internally uses an array of lists (buckets) to perform
insertion, lookup and deletion in O(1) time on average. HashSet first
calculates the hash of the object to determine the array index where the
object will be stored. The object is then stored at the calculated index.
The same principle is applied during lookup and deletion.
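
As a rough illustration of the indexing step, the sketch below mirrors how OpenJDK's HashMap (which backs HashSet) turns a hashCode into a bucket index in Java 8 and later: it mixes the high bits into the low bits and masks with the table length. The class name BucketIndexSketch and its method names are made up for this example, and the table length of 16 is simply HashMap's default initial capacity.

```java
import java.util.Arrays;

class BucketIndexSketch {
    // Spread the higher bits of the hash downward, as java.util.HashMap does in Java 8+.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // tableLength must be a power of two so that the mask selects a valid index.
    static int bucketIndex(Object key, int tableLength) {
        return spread(key) & (tableLength - 1);
    }

    public static void main(String[] args) {
        int tableLength = 16; // default initial capacity of HashMap
        for (String city : Arrays.asList("London", "Mumbai", "NewYork")) {
            System.out.println(city + " -> bucket " + bucketIndex(city, tableLength));
        }
    }
}
```

Everything after the hashCode() call is a handful of bit operations, which is why the cost of hashCode() itself dominates the lookup.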

[Figure: How HashSet/HashMap stores objects]

Accessing an array element is an O(1) operation, so the main overhead is
calculating the hash of the object. Hence, the hash function needs to be
fast to avoid any performance impact.

Furthermore, the output of the hash function should have a uniform
distribution. If many objects collide, the lists at the affected indexes
keep growing and the worst-case complexity becomes O(n).
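
To make the idea of a collision concrete, here is a small sketch using two distinct strings that are known to share a hashCode, "Aa" and "BB". The class name CollisionSketch and the helper collide are invented for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

class CollisionSketch {
    // True when two different strings produce the same hashCode.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A'*31 + 'a' = 65*31 + 97 = 2112 and 'B'*31 + 'B' = 66*31 + 66 = 2112.
        System.out.println(collide("Aa", "BB")); // true

        Set<String> set = new HashSet<>();
        set.add("Aa");
        set.add("BB");
        // Both entries share one bucket; equals() is what tells them apart.
        System.out.println(set.size());         // 2
        System.out.println(set.contains("BB")); // true
    }
}
```

The set still behaves correctly, but every lookup in that bucket has to compare against more than one entry, which is the extra work the paragraph above describes.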

[Figure: HashSet/HashMap collisions as a result of non-uniform hashing]

HashCode Design
In Java, every object has a hashCode() function. HashSet invokes this
function to determine the object’s index. Let’s revisit the example where
we were analysing the performance of string lookup and see the hashCode
values of the random strings.

[Figure: HashCode of random strings in the example]

We can see that the outlier strings have a hashCode of 0. Now, it’s time
to dig into some code and glance at the implementation.
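
Since the figure with the hash values is not reproduced here, the following short sketch prints the hashCode of each of the five lookup strings from the benchmark. The class name PrintHashCodes is made up for this example.

```java
import java.util.Arrays;
import java.util.List;

class PrintHashCodes {
    // The five lookup strings from the benchmark above.
    static final List<String> STRINGS = Arrays.asList(
            "f5a5a608", "48abre7a6 i8a5r507", "7e50bc488 pl43fvf1p 65",
            "e843r6f1p vfvdfv vdvdg vgbgd ", "38aeaf9a6");

    public static void main(String[] args) {
        for (String s : STRINGS) {
            System.out.println("\"" + s + "\" -> " + s.hashCode());
        }
    }
}
```

The first and the last string print 0, which matches the two slow lookups in the benchmark.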

In the older JDK versions 1.0 and 1.1, the hashCode function for strings
sampled only every nth character. The downside of this approach was that
many strings mapped to the same hash, which resulted in collisions.

Since Java 1.2, the algorithm below is used. It is a bit slower because it
reads every character, but it helps avoid collisions.

Calculation of String’s HashCode

    public int hashCode() {
        int h = hash;                        // cached value; the field defaults to 0
        if (h == 0 && value.length > 0) {    // 0 is treated as "not computed yet"
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;                        // store the result for later calls
        }
        return h;
    }

It can be seen from the above code that when hashCode() is called for the
first time, the variable hash still has its default value 0, so the body
of the if block is executed. Subsequent calls to hashCode() skip the if
block as long as the stored hash is non-zero.

It can be inferred that the hashCode() function uses a caching approach:
the hash is calculated only the first time it’s called, and later calls
get the same calculated value.

If the hash of a string is 0, however, the cached value looks the same as
"not computed yet", so the hash computation is done every time the
function is called. Now it is clear why the lookup of a few strings
takes more time than others.
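
The effect can be observed with a small stand-in class, since String itself cannot be instrumented. The sketch below copies the caching logic into a hypothetical class ZeroSentinelHash and adds a counter (computations) that exists only for this demonstration.

```java
class ZeroSentinelHash {
    private final char[] value;
    private int hash;      // 0 doubles as "not computed yet"
    int computations = 0;  // instrumentation for the demo only

    ZeroSentinelHash(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            computations++;
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h;
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"London", "f5a5a608"}) {
            ZeroSentinelHash z = new ZeroSentinelHash(s);
            for (int i = 0; i < 3; i++) {
                z.hashCode();
            }
            System.out.println(s + ": computed " + z.computations + " time(s)");
        }
    }
}
```

For "London" the loop runs once and the cached value is reused, while for "f5a5a608" the characters are rescanned on each of the three calls.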

Overcoming the performance penalty

The hashCode computation above performs poorly for strings that hash
to 0. How can we optimize this?

Any CS101 student would recommend using a boolean flag that is set after
the first computation, so that subsequent calls skip the computation.

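    // computed is a hypothetical extra boolean field (default false); it is not
    // part of the String implementation shown earlier.
    public int hashCode() {
        int h = hash;
        if (!computed && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
            computed = true;   // remember that hash is valid, even if it is 0
        }
        return h;
    }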
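
A runnable version of the flag idea might look like the sketch below. FlaggedHash is a hypothetical stand-in class (String cannot be modified), and the computations counter is again only there to make the behaviour visible.

```java
class FlaggedHash {
    private final char[] value;
    private int hash;
    private boolean computed; // hypothetical flag, set after the first computation
    int computations = 0;     // instrumentation for the demo only

    FlaggedHash(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        if (!computed) {
            int h = 0;
            computations++;
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h;
            computed = true; // valid from now on, even when the hash is 0
        }
        return hash;
    }

    public static void main(String[] args) {
        FlaggedHash f = new FlaggedHash("f5a5a608");
        for (int i = 0; i < 3; i++) {
            f.hashCode();
        }
        System.out.println("hash " + f.hashCode() + ", computed " + f.computations + " time(s)");
    }
}
```

Note that this sketch ignores thread safety. A real String is shared freely between threads, and publishing two related fields without synchronization needs more care than caching a single int.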


You might now be wondering why the Java developers didn’t think of this
optimization in the first place, or why it wasn’t patched in later
releases of Java?

Why not fix HashCode?

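As per the implementation, the hash of any String having n characters is
given by the following formula:

    hash = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

Here s[i] is the character at index i of the string, ^ denotes
exponentiation, and the sum is computed in 32-bit int arithmetic, which
wraps around on overflow.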
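
As a worked check of the formula, the sketch below evaluates the sum term by term and compares it with the loop form used by hashCode(). The class PolynomialHash and its two methods are written for this illustration only.

```java
class PolynomialHash {
    // Direct evaluation of s[0]*31^(n-1) + ... + s[n-1] in wrapping int arithmetic.
    static int bySum(String s) {
        int n = s.length();
        int sum = 0;
        for (int i = 0; i < n; i++) {
            int power = 1;
            for (int k = 0; k < n - 1 - i; k++) {
                power *= 31; // wraps on overflow, just like the loop form
            }
            sum += s.charAt(i) * power;
        }
        return sum;
    }

    // Horner's rule: the same polynomial, evaluated as h = 31*h + s[i].
    static int byHorner(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // "abc": 97*31^2 + 98*31 + 99 = 93217 + 3038 + 99 = 96354
        System.out.println(bySum("abc"));
        System.out.println(byHorner("abc"));
        System.out.println("abc".hashCode());
    }
}
```

All three lines print 96354. Because int arithmetic wraps consistently, the two forms also agree for long strings whose intermediate values overflow.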

This hash function provides a roughly uniform distribution of hashes
across the range of integers. If we assume a perfectly uniform
distribution, the probability of a string hashing to 0 is 1 in 2^32.

We can think of the following cases where a string can hash to zero:

1. The string contains only the character 0 (Unicode U+0000)

2. The string is empty

3. The hash code comes out as 0 as a result of integer overflow
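
Each of the three cases can be checked directly. The sketch below uses a made-up class ZeroHashCases with one helper; the third example reuses "f5a5a608" from the benchmark.

```java
class ZeroHashCases {
    static boolean hashesToZero(String s) {
        return s.hashCode() == 0;
    }

    public static void main(String[] args) {
        // Case 1: only NUL characters, so every term of the sum is 0.
        System.out.println(hashesToZero("\0\0\0"));    // true
        // Case 2: the empty string has no terms at all.
        System.out.println(hashesToZero(""));          // true
        // Case 3: the unwrapped sum is 2856153251840 = 665 * 2^32,
        // so the low 32 bits are all zero.
        System.out.println(hashesToZero("f5a5a608"));  // true
    }
}
```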

In real-world applications, the probability of getting such strings is
minuscule.

Fixing the hashCode implementation is trivial, and the Java developers
might well have thought about it. However, there is little advantage in
fixing it.

Currently, only strings that hash to 0 are impacted. Let’s say we fix
hashCode by adding a boolean. Overall, we won’t see any huge impact on
the performance of a real-world system. It might result in a 0.000010 %
improvement in speed. It’s analogous to saying we optimized a task that
took 1 hr so that it finishes in 59 min 59 sec 7 milliseconds.

So, we are better off not changing hashCode() by introducing a new
variable, as there is no significant speed gain.

English Strings that hash to 0

I took a list of 20k English dictionary words and tried combining the
words to check whether they hash to zero. When I considered single
meaningful English words, none of them hashed to zero. Combinations of
two or more words did produce zero hashes.

Here are some examples of phrases made of meaningful words that hash to
zero:

carcinomas motorists high

zipped daydreams thunderflashes

where inattentive agronomy

drumwood boulderhead
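
The search code itself is not shown in the article. One way such a search could look, as a brute-force sketch rather than the method actually used, is given below. The class ZeroHashPhraseSearch, the method findZeroHashPairs and the tiny word list are all assumptions made for this example; a real run would load a dictionary file and join words with a space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ZeroHashPhraseSearch {
    // Try every ordered pair of distinct words, joined by the separator,
    // and keep the phrases whose hashCode is 0.
    static List<String> findZeroHashPairs(List<String> words, String separator) {
        List<String> hits = new ArrayList<>();
        for (String first : words) {
            for (String second : words) {
                if (first.equals(second)) {
                    continue;
                }
                String phrase = first + separator + second;
                if (phrase.hashCode() == 0) {
                    hits.add(phrase);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Tiny stand-in for a dictionary. The two halves of a known zero-hash
        // string are included so that the search has something to find.
        List<String> words = Arrays.asList("f5a5", "a608", "London", "Mumbai");
        System.out.println(findZeroHashPairs(words, ""));
    }
}
```

The pair search is quadratic in the number of words, so a 20k-word list means roughly 400 million candidate phrases, which is still feasible on a single machine.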

HashSet and HashMap usually show excellent performance when getting,
setting or deleting values. However, there are cases, such as strings
that hash to 0, where the performance can be impacted by unusual
behaviour.

References:

https://stackoverflow.com/questions/2310498/why-doesnt-strings-hashcode-cache-0

https://www.geeksforgeeks.org/equals-hashcode-methods-java/

