Why Overriding hashCode() and Equal() method contract?
EveryJava
object
has two very important methods i.e. hashCode()
and equals()
method. These methods are designed to be overridden according to their specific general contract. This article describes why and how to override the hashCode()
method that preserves the contract of HashCode. Contract For HashCode Method
The contract for hashCode says“If two objects are equal, then calling hashCode() on both objects must return the same value”.Now the question that will come into your mind is that; is it necessary that the above statement should always be true?
Consider the fact that we have provided a correct implementation of an
equal
method for our class
, then what would happen if we do not obey the above contract. To answer the above question, let us consider the two situations,
Objects
that are equal but return differenthashCodes
Objects
that are not equal but return the samehashCode
What would happen if the two objects are
equal
but return different hashCodes
? Your code would run perfectly fine. You will never come in trouble unless and until you have not stored your object in a collection like HashSet
or HashMap
. But when you do that, you might get strange problems at runtime. To understand this better, you have to first understand how collection classes such as
HashMap
and HashSet
work. These collections classes depend on the fact that the objects that you put as a key in them must obey the above contract. You will get strange and unpredictable results at runtime if you do not obey the contract and try to store them in a collection.Consider an example of
HashMap
. When you store the values in HashMap
, the values are actually stored in a set of buckets. Each of those buckets has been assigned a number which is use to identify it. When you put a value in the HashMap
, it stores the data in one of those buckets. Which bucket is used depends on the hashCode
that will return by your object. Let’s say, if the hashCode()
method returns 49 for an object, then it gets stored in the bucket 49 of the HashMap
.Later when you try to check whether that collection contains an element or not by invoking the
Contains(element)
method, the HashMap
first gets the hashCode
of that “element “. Afterwards, it will look into the bucket that corresponds with the hashCode
. If the bucket is empty, then it means we are done and it's return false which means the HashMap
does not contain the element.If there are one or more objects in the bucket, then it will compare “element” with all other elements in that bucket using your defined
equal()
function.Objects that are not equal but return the same hashCode
The
hashCode
contract does not say anything about the above statement. Therefore different objects might return the same hashCode
value, but collections like HashMap
will work inefficiently if different objects return the same hashCode
value.Why Buckets
The reason why bucket mechanism is used is its efficiency. You can imagine that if all the objects you put in theHashMap
would be stored into one big list, then you have to compare your input with all the objects in the list when you want to check if a particular element is in the Map
. With the use of buckets, you will now compare only the elements of the specific bucket and any bucket usually holds only a small portion of all the elements in the HashMap
.Overriding hashCode Method
How to calculate hashCode?
Writing a goodhashCode()
method is always a tricky task for a new class.Return Fixed Value
You can implement your
hashCode()
method that always returns fix value, for example like this://bad performance @Override public int hashCode() { return 1; }The above method satisfies all the requirements and is considered legal according to the
hash code
contract but it would not be very efficient. If this method is used, all objects will be stored in the same bucket i.e. bucket 1 and when you try to ensure whether the specific object is present in the collection, then it will always have to check the entire content of the collection. On the other hand, if you override the
hashCode()
method for your class and if the method breaks the contract then calling contains()
method may return false for the element which is present in the Collection
but in a different bucket.Method From Effective Java
Joshua Bloch in Effective Java provides good guidelines for generating a
hashCode()
value- 1. Store some constant nonzero value; say 17, in an int variable called result.
- 2. For each significant field f in your object (each field taken into account by the
equals()
), do the following - a. Compute an int
hashCode
c for the field: - i. If the field is a
boolean
, compute c = (f ? 1 : 0). - ii. If the field is a
byte
,char
,short
, orint
, compute c = (int) f. - iii. If the field is a
long
, compute c = (int) (f ^ (f >>> 32)). - iv. If the field is a
float
, compute c = Float.floatToIntBits(f). - v. If the field is a
double
,compute long l = Double.doubleToLongBits(f),
c = (int)(l ^ (l >>> 32)) - vi. If the field is an object reference then
equals( )
callsequals( )
for this field. compute
c = f.hashCode() - vii. If the field is an
array
, treat it as if each element were a separate field.
That is, compute ahashCode
for each significant element by applying above rules to each
element - b. Combine the
hashCode
c computed in step 2.a into result as follows:result = 37 * result + c; - 3. Return result.
- 4. Look at the resulting
hashCode()
and make sure thatequal
instances have equal hash codes.
public class HashTest { private String field1; private short field2; ---- @Override public int hashCode() { int result = 17; result = 37*result + field1.hashCode(); result = 37*result + (int)field2; return result; } }You can see that a constant 37 is chosen. The purpose of choosing a prime number is that it is a prime number. We can choose any other prime number. Using prime number the objects will be distributed better over the buckets. I encourage the user to explore the topic further by checking out other resources.
Using java.util.Objects.hash
java.util.Objects
class contains a utility method hash(Object... values)
that can be used to calculate hash for sequence of objects. With this method, we can implement hashcode for our example HashTest class as follows:
public class HashTest { private String field1; private short field2; ---- @Override public int hashCode() { return java.util.Objects.hash(field1, field2); } }Apache HashCodeBuilder
Writing a good
hashCode()
method is not always easy. Since it can be difficult to implement hashCode()
correctly, it would be helpful if we have some reusable implementations of these. The Jakarta-Commons
org.apache.commons.lang.builder
package is providing a class named HashCodeBuilder
which is designed to help implement a hashCode()
method. Usually, developers struggle hard with implementing a hashCode()
method and this class aims to simplify the process.Here is how you would implement
hashCode
algorithm for our above classpublic class HashTest { private String field1; private short field2; ---- @Override public int hashCode() { return new HashCodeBuilder(83, 7) .append(field1) .append(field2) .toHashCode(); } }Note that the two numbers for the constructor are simply two different, non-zero, odd numbers - these numbers help to avoid collisions in the
hashCode
value across objects.If required, the superclass
hashCode()
can be added using appendSuper(int)
.You can see how easy it is to override
HashCode()
using Apache HashCodeBuilder
.Mutable Object As Collection Key
It is a general advice that you should useimmutable object
as a key in a Collection. HashCode
work best when calculated from immutable data. If you use Mutable object
as key and change the state of the object so that the hashCode
changes, then the store object will be in the wrong bucket in the Collection The most important thing you should consider while implementing
hashCode()
is that regardless of when this method is called, it should produce the same value for a particular object every time when it is called. If you have a scenario like an object produces one hashCode()
value when it is put()
into a HaspMap
and produces another value during a get()
, in that case, you would not be able to retrieve that object. Therefore, if you hashCode()
depends on mutable data in the object, then made changing those data will surely produce a different key by generating a different hashCode()
.Look at the example below
public class Employee { private String name; private int age; public Employee() { } public Employee(String name, int age) { this.name = name; this.age = age; } public String getName() { return name; } public void setName(String name) { this.name = name; } public int getAge() { return age; } public void setAge(int age) { this.age = age; } @Override public boolean equals(Object obj) { //Remember: Some Java gurus recommend you avoid using instanceof if (obj instanceof Employee) { Employee emp = (Employee)obj; return (emp.name == name && emp.age == age); } return false; } @Override public int hashCode() { return name.length() + age; } public static void main(String[] args) { Employee e = new Employee("muhammad", 24); Map<Object,Object> m = new HashMap<Object,Object>(); m.put(e, "Muhammad Ali Khojaye"); // getting output System.out.println(m.get(e)); e.name = "abid"; // it fails to get System.out.println(m.get(e)); e.name = "amirrana"; // it fails again System.out.println(m.get(new Employee("muhammad", 24))); }So we can see in the above examples that how are we getting some unpredictable results after modifying the object state.
Another Example of Mutable Field as Key
Let consider an another example below:public class HashTest { private int mutableField; private final int immutableField; public HashTest(int mutableField, int immutableField) { this.mutableField = mutableField; this.immutableField = immutableField; } public void setMutableField(int mutableField) { this.mutableField = mutableField; } @Override public boolean equals(Object o) { if(o instanceof HashTest) { return (mutableField == ((HashTest)o).mutableField) && (immutableField == ((HashTest)o).immutableField); }else { return false; } } @Override public int hashCode() { int result = 17; result = 37 * result + this.mutableField; result = 37 * result + this.immutableField; return result; } public static void main(String[] args) { Set<HashTest> set = new HashSet<HashTest>(); HashTest obj = new HashTest(6622458, 626304); set.add(obj); System.out.println(set.contains(obj)); obj.setMutableField(3867602); System.out.println(set.contains(obj)); } }After changing mutableField, the computed
hashCode
value is no longer pointing to the old bucket and the contains()
returns false.We can tackle such situation using either of these methods
Hashcode
is best when calculated fromimmutable data
; therefore ensure that onlyimmutable
object would be used as key with Collections.- If you need
mutable
fields included in thehashCode
method then you need to ensure that object state is not changing after they've been used asKey
in a hash-based collection. If for any reason it changed, you can calculate and store thehash value
when the object updatesmutable
field. To do this, you must first remove it from thecollection(set/map)
and then add it back to thecollection
after updating it.
Memory leaks with HashCode and Equal
It is possible that a memory leak can occur in theJava
application if equals()
and hashcode()
are not implemented. Consider a small code example below in which HashMap
keeping references active if equals()
and hashcode()
are not implemented. As a results the HashMap
grows continuously by adding the same key repeatedly and finally throw an OutOfMemoryError
/** * Example demonstrating a Hashcode leak. */ public class HashcodeLeakExample { private String id; public HashcodeLeakExample(String id) { this.id = id; } public static void main(String args[]) { try { Map<HashcodeLeakExample, String> map = new HashMap<HashcodeLeakExample, String>(); while (true) { map.put(new HashcodeLeakExample("id"), "any value"); } } catch (Exception ex) { ex.printStackTrace(); } } }
Nice and Thanks for posting this. This piece of tutorial really helpful to know what happens when we change the mutable object that is in the collection.
ReplyDeleteVery clear description! Great!
ReplyDeletereally great article, but it will be really great to continue it with JPA and entities in mind :)
ReplyDeleteThanks,great article.On first reading,I really understand this subject.
ReplyDeleteI'm bookmarking this article for future reference.
I am so glad, this small tutorial work for you guyz. Thanks for the comments.
ReplyDeleteCheers!!
Please don't implement String equality with ==. Use equals(). Not all Strings are interned, only literals.
ReplyDeleteIllustration:
String one = "test";
String two = new String("test");
String three = new String("test").intern();
System.out.println("equal? " + (one == two));
System.out.println("equal? " + (one == three));
@jmagiera
ReplyDeleteYes, I agree. Equal() should be used instead.
Cheers!
Hi Khojaye, Thanks for posting really this is fruitful for me..i m frm india..thanks bro :)
ReplyDeleteThank you so much. I'm glad you like it :)
ReplyDeleteObjects that are equal but return different hashCodes: For HashSet, this would mean duplicates across the set, however, I am not able to figure out how this will affect HashMap. Can you post an example?
ReplyDeleteConsider below sample example where we do not override hashcode(), and therefore, the object fail to get.
ReplyDeletepublic class Test {
public static void main(String[] args) {
Map employees
= new HashMap();
employees.put(new
Employee("Khojaye", 25), "Khojaye");
String employeeName = employees.get(new
Employee("Khojaye", 25));
System.out.println(employeeName);
// this will print null
System.out.println(new
Employee("Khojaye", 25).equals(new
Employee("Khojaye", 25)));
// this is true because of the
// overriden equals()
}
}
class Employee {
private String name;
private int age;
public Employee(String name, int age) {
this.name = name;
this.age = age;
}
public String getName() {
return name;
}
public int getAge() {
return age;
}
@Override
public boolean equals(Object obj) {
if (obj == this) {
return true;
}
if (!(obj instanceof Employee))
{
return false;
}
Employee person = (Employee)
obj;
return person.getName()
.equals(this.name)
&&
person.getAge() ==
this.age;
}
// didn't override hashCode!!!
}
Please let me know, If there is any thing still unclear.
Nice explanation, I used your article to convince my manager who isn't so much technical about overriding hashcode in all the POJOs we use.
ReplyDeleteI am glad to know that it was helpful for you
ReplyDeleteWhich data structures is been used to store the hashcode in the buckets ?Any idea
ReplyDeleteMy perspective is linked list. but any suggestions from your end
The current implementation is just using the custom inner class named Entry where hashcode saved. Once retrieve the hashcode (and collision occur), the linkedin of object maintain.
ReplyDeleteHow do we know that we have written best implementation of hashcode method for a collection?
ReplyDeleteGenerally, its not easy to answer because it depends on the scenario for which you are implementing the hashCode() method and also on what equality means for objects of that class. A good hashCode implementation returns unique hash codes for different objects, however, it is impossible to guarantee that hashcodes will be unique for each object.
ReplyDeleteA good general guideline-purpose guideline has been explained in heading "Overriding hashCode Method" heading.
I hope it helps.
Hi,
ReplyDeleteThis is one of best tutorial I read about hashcode. I always confuse why should we override hashcode.
I also want to add that we use prime number because It helps elements in the collection to be well distributed among the buckets.
The reason is that it uses the formula hashCode() % numberOfBuckets to compute the bucket where the object will store and prime number is not a multiple of the number of buckets (which is a power of 2) therefore, hashCode() % numberOfBuckets will not repeat quickly to the same number bucket. In short, prime number will helps the objects to be distributed better over the buckets.
Hi Can you just point out how exactly..Does this work in Hash Map, existing implementation ************
ReplyDeleteIf there are one or more objects in the bucket, then it will compare “element” with all other elements in that bucket using your defined equal() function *************
So in oneline if i wish to say about hash,
Every ideal hashkey for an object has to be unique ,considering all the instance variables of the object.
Also can you point where are the contracts given like immutable objects etc..in the javadoc
Thanks, nice explanation
ReplyDeleteThanks, Cheers
DeleteThis comment has been removed by a blog administrator.
ReplyDeleteThis comment has been removed by a blog administrator.
ReplyDeleteIn "Mutable Object As Key" the solution of using Joshua Recipe or HashCodeBuilder won't solve the problem of not finding elements where hashCode uses mutable fields. It seems that it is offered as solution in the article but in fact it is not as tha hashCode will change after a change of name in any of the mentioned solutions as they all depend on the value of name...
ReplyDeleteThank you for the comment. I have updated that section to make it more clear.
DeleteYou are on the right spot that Joshua Recipe or HashCodeBuilder cannot solve the problem for scenario 'Mutable field as a Key'. General advice is that Key should be immutable since HashMap (or HashSet) like Collection assume that an object’s hash value does not modify while it is in use as a key in the collection.
If for any reason your key is not immutable, you need to either ensure not changing it after it has been used in a hash-based collection or you can follow the advice mentioned under the heading 'Another Example of Mutable Field as Key'.
Cheers,
Yes, well done, it is much clearer now. As an alternative for generating hashCodes also consider java.util.Objects#hash since java 1.7.
DeleteHello Author, The tutorial is very good and I understand now what is the use of hashcode and relation with equal. I have one question. Is it always necessary that i should use same fields in hashcode and equal function? I run an example and if I use different values in hashcode function, it will still work. can you answer this?
ReplyDeleteWe must use all the fields in equals() methods that has been used in hashcode() method calculation. Violating this can result with different hashcode for two equal objects. However, the reverse is not mandatory and we can use the subset of fields in hashcode() method calculation (say, for improving the performance of hashcode).
ReplyDeletewhat a clarity ...loved the way you explained !!
ReplyDelete