Why Overriding hashCode() and Equal() method contract?
Every
Java
object
has two very important methods i.e.
hashCode()
and
equals()
method. These methods are designed to be overridden according to their specific general contract. This article describes why and how to override the
hashCode()
method that preserves the contract of HashCode.
The contract for hashCode says
“If two objects are equal, then calling hashCode() on both objects must return the same value”.
Now the question that will come into your mind is that; is it necessary that the above statement should always be true?
Consider the fact that we have provided a correct implementation of an
equal
method for our
class
, then what would happen if we do not obey the above contract.
To answer the above question, let us consider the two situations,
Objects
that are equal but return different hashCodes
Objects
that are not equal but return the same hashCode
Objects that are equal but return different hashCodes
What would happen if the two objects are
equal
but return different
hashCodes
? Your code would run perfectly fine. You will never come in trouble unless and until you have not stored your object in a collection like
HashSet
or
HashMap
. But when you do that, you might get strange problems at runtime.
To understand this better, you have to first understand how collection classes such as
HashMap
and
HashSet
work. These collections classes depend on the fact that the objects that you put as a key in them must obey the above contract. You will get strange and unpredictable results at runtime if you do not obey the contract and try to store them in a collection.
Consider an example of
HashMap
. When you store the values in
HashMap
, the values are actually stored in a set of
buckets. Each of those buckets has been assigned a number which is use to identify it. When you put a value in the
HashMap
, it stores the data in one of those buckets. Which bucket is used depends on the
hashCode
that will return by your object. Let’s say, if the
hashCode()
method returns
49 for an object, then it gets stored in the
bucket 49 of the
HashMap
.
Later when you try to check whether that collection contains an element or not by invoking the
Contains(element)
method, the
HashMap
first gets the
hashCode
of that “element “. Afterwards, it will look into the bucket that corresponds with the
hashCode
. If the bucket is empty, then it means we are done and it's return false which means the
HashMap
does not contain the element.
If there are one or more objects in the bucket, then it will compare “element” with all other elements in that bucket using your defined
equal()
function.
Objects that are not equal but return the same hashCode
The
hashCode
contract does not say anything about the above statement. Therefore different objects might return the same
hashCode
value, but collections like
HashMap
will work inefficiently if different objects return the same
hashCode
value.
The reason why bucket mechanism is used is its efficiency. You can imagine that if all the objects you put in the
HashMap
would be stored into one big list, then you have to compare your input with all the objects in the list when you want to check if a particular element is in the
Map
. With the use of buckets, you will now compare only the elements of the specific bucket and any bucket usually holds only a small portion of all the elements in the
HashMap
.
How to calculate hashCode?
Writing a good
hashCode()
method is always a tricky task for a new class.
Return Fixed Value
You can implement your
hashCode()
method that always returns fix value, for example like this:
//bad performance
@Override
public int hashCode() {
return 1;
}
The above method satisfies all the requirements and is considered legal according to the
hash code
contract but it would not be very efficient. If this method is used, all objects will be stored in the same bucket i.e. bucket 1 and when you try to ensure whether the specific object is present in the collection, then it will always have to check the entire content of the collection.
On the other hand, if you override the
hashCode()
method for your class and if the method breaks the contract then calling
contains()
method may return false for the element which is present in the
Collection
but in a different bucket.
Method From Effective Java
Joshua Bloch in Effective Java provides good guidelines for generating a
hashCode()
value
- 1. Store some constant nonzero value; say 17, in an int variable called result.
- 2. For each significant field f in your object (each field taken into account by the
equals()
), do the following
- a. Compute an int
hashCode
c for the field:
- i. If the field is a
boolean
, compute c = (f ? 1 : 0).
- ii. If the field is a
byte
, char
, short
, or int
, compute c = (int) f.
- iii. If the field is a
long
, compute c = (int) (f ^ (f >>> 32)).
- iv. If the field is a
float
, compute c = Float.floatToIntBits(f).
- v. If the field is a
double
,compute long l = Double.doubleToLongBits(f),
c = (int)(l ^ (l >>> 32))
- vi. If the field is an object reference then
equals( )
calls equals( )
for this field. compute
c = f.hashCode()
- vii. If the field is an
array
, treat it as if each element were a separate field.
That is, compute a hashCode
for each significant element by applying above rules to each
element
- b. Combine the
hashCode
c computed in step 2.a into result as follows:result = 37 * result + c;
- 3. Return result.
- 4. Look at the resulting
hashCode()
and make sure that equal
instances have equal hash codes.
Here is an example of a class that follows the above guidelines
public class HashTest {
private String field1;
private short field2;
----
@Override
public int hashCode() {
int result = 17;
result = 37*result + field1.hashCode();
result = 37*result + (int)field2;
return result;
}
}
You can see that a constant
37 is chosen. The purpose of choosing a
prime number is that it is a
prime number. We can choose any other
prime number. Using
prime number the objects will be distributed better over the buckets. I encourage the user to explore the topic further by checking out other resources.
Using java.util.Objects.hash
java.util.Objects
class contains a utility method
hash(Object... values)
that can be used to calculate hash for sequence of objects. With this method, we can implement hashcode for our example HashTest class as follows:
public class HashTest {
private String field1;
private short field2;
----
@Override
public int hashCode() {
return java.util.Objects.hash(field1, field2);
}
}
Apache HashCodeBuilder
Writing a good
hashCode()
method is not always easy. Since it can be difficult to implement
hashCode()
correctly, it would be helpful if we have some reusable implementations of these.
The
Jakarta-Commons org.apache.commons.lang.builder
package is providing a class named
HashCodeBuilder
which is designed to help implement a
hashCode()
method. Usually, developers struggle hard with implementing a
hashCode()
method and this class aims to simplify the process.
Here is how you would implement
hashCode
algorithm for our above class
public class HashTest {
private String field1;
private short field2;
----
@Override
public int hashCode() {
return new HashCodeBuilder(83, 7)
.append(field1)
.append(field2)
.toHashCode();
}
}
Note that the two numbers for the constructor are simply two different, non-zero, odd numbers - these numbers help to avoid collisions in the
hashCode
value across objects.
If required, the superclass
hashCode()
can be added using
appendSuper(int)
.
You can see how easy it is to override
HashCode()
using Apache
HashCodeBuilder
.
It is a general advice that you should use
immutable object
as a key in a Collection.
HashCode
work best when calculated from immutable data. If you use
Mutable object
as key and change the state of the object so that the
hashCode
changes, then the store object will be in the wrong bucket in the Collection
The most important thing you should consider while implementing
hashCode()
is that regardless of when this method is called, it should produce the same value for a particular object every time when it is called. If you have a scenario like an object produces one
hashCode()
value when it is
put()
into a
HaspMap
and produces another value during a
get()
, in that case, you would not be able to retrieve that object. Therefore, if you
hashCode()
depends on mutable data in the object, then made changing those data will surely produce a different key by generating a different
hashCode()
.
Look at the example below
public class Employee {
private String name;
private int age;
public Employee() {
}
public Employee(String name, int age) {
this.name = name;
this.age = age;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public int getAge() {
return age;
}
public void setAge(int age) {
this.age = age;
}
@Override
public boolean equals(Object obj) {
//Remember: Some Java gurus recommend you avoid using instanceof
if (obj instanceof Employee) {
Employee emp = (Employee)obj;
return (emp.name == name && emp.age == age);
}
return false;
}
@Override
public int hashCode() {
return name.length() + age;
}
public static void main(String[] args) {
Employee e = new Employee("muhammad", 24);
Map<Object,Object> m = new HashMap<Object,Object>();
m.put(e, "Muhammad Ali Khojaye");
// getting output
System.out.println(m.get(e));
e.name = "abid";
// it fails to get System.out.println(m.get(e));
e.name = "amirrana";
// it fails again
System.out.println(m.get(new Employee("muhammad", 24)));
}
So we can see in the above examples that how are we getting some unpredictable results after modifying the object state.
Let consider an another example below:
public class HashTest {
private int mutableField;
private final int immutableField;
public HashTest(int mutableField, int immutableField) {
this.mutableField = mutableField;
this.immutableField = immutableField;
}
public void setMutableField(int mutableField) {
this.mutableField = mutableField;
}
@Override
public boolean equals(Object o) {
if(o instanceof HashTest) {
return (mutableField == ((HashTest)o).mutableField)
&& (immutableField == ((HashTest)o).immutableField);
}else {
return false;
}
}
@Override
public int hashCode() {
int result = 17;
result = 37 * result + this.mutableField;
result = 37 * result + this.immutableField;
return result;
}
public static void main(String[] args) {
Set<HashTest> set = new HashSet<HashTest>();
HashTest obj = new HashTest(6622458, 626304);
set.add(obj);
System.out.println(set.contains(obj));
obj.setMutableField(3867602);
System.out.println(set.contains(obj));
}
}
After changing mutableField, the computed
hashCode
value is no longer pointing to the old bucket and the
contains()
returns false.
We can tackle such situation using either of these methods
Hashcode
is best when calculated from immutable data
; therefore ensure that only immutable
object would be used as key with Collections.
- If you need
mutable
fields included in the hashCode
method then you need to ensure that object state is not changing after they've been used as Key
in a hash-based collection. If for any reason it changed, you can calculate and store the hash value
when the object updates mutable
field. To do this, you must first remove it from the collection(set/map)
and then add it back to the collection
after updating it.
It is possible that a memory leak can occur in the
Java
application if
equals()
and
hashcode()
are not implemented. Consider a small code example below in which
HashMap
keeping references active if
equals()
and
hashcode()
are not implemented. As a results the
HashMap
grows continuously by adding the same key repeatedly and finally throw an
OutOfMemoryError
/**
* Example demonstrating a Hashcode leak.
*/
public class HashcodeLeakExample {
private String id;
public HashcodeLeakExample(String id) {
this.id = id;
}
public static void main(String args[]) {
try {
Map<HashcodeLeakExample, String> map =
new HashMap<HashcodeLeakExample, String>();
while (true) {
map.put(new HashcodeLeakExample("id"), "any value");
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
}