Why Overriding hashCode() and Equal() method contract?
Every
Java object has two very important methods i.e.
hashCode() and
equals() method. These methods are designed to be overridden according to their specific general contract. This article describes why and how to override the
hashCode() method that preserves the contract of HashCode.
The contract for hashCode says
“If two objects are equal, then calling hashCode() on both objects must return the same value”.
Now the question that will come into your mind is that; is it necessary that the above statement should always be true?
Consider the fact that we have provided a correct implementation of an
equal method for our
class, then what would happen if we do not obey the above contract.
To answer the above question, let us consider the two situations,
Objects that are equal but return different hashCodes
Objects that are not equal but return the same hashCode
Objects that are equal but return different hashCodes
What would happen if the two objects are
equal but return different
hashCodes? Your code would run perfectly fine. You will never come in trouble unless and until you have not stored your object in a collection like
HashSet or
HashMap. But when you do that, you might get strange problems at runtime.
To understand this better, you have to first understand how collection classes such as
HashMap and
HashSet work. These collections classes depend on the fact that the objects that you put as a key in them must obey the above contract. You will get strange and unpredictable results at runtime if you do not obey the contract and try to store them in a collection.
Consider an example of
HashMap. When you store the values in
HashMap, the values are actually stored in a set of
buckets. Each of those buckets has been assigned a number which is use to identify it. When you put a value in the
HashMap, it stores the data in one of those buckets. Which bucket is used depends on the
hashCode that will return by your object. Let’s say, if the
hashCode() method returns
49 for an object, then it gets stored in the
bucket 49 of the
HashMap.
Later when you try to check whether that collection contains an element or not by invoking the
Contains(element) method, the
HashMap first gets the
hashCode of that “element “. Afterwards, it will look into the bucket that corresponds with the
hashCode. If the bucket is empty, then it means we are done and it's return false which means the
HashMap does not contain the element.
If there are one or more objects in the bucket, then it will compare “element” with all other elements in that bucket using your defined
equal() function.
Objects that are not equal but return the same hashCode
The
hashCode contract does not say anything about the above statement. Therefore different objects might return the same
hashCode value, but collections like
HashMap will work inefficiently if different objects return the same
hashCode value.
The reason why bucket mechanism is used is its efficiency. You can imagine that if all the objects you put in the
HashMap would be stored into one big list, then you have to compare your input with all the objects in the list when you want to check if a particular element is in the
Map. With the use of buckets, you will now compare only the elements of the specific bucket and any bucket usually holds only a small portion of all the elements in the
HashMap.
How to calculate hashCode?
Writing a good
hashCode() method is always a tricky task for a new class.
Return Fixed Value
You can implement your
hashCode() method that always returns fix value, for example like this:
//bad performance
@Override
public int hashCode() {
return 1;
}
The above method satisfies all the requirements and is considered legal according to the
hash code contract but it would not be very efficient. If this method is used, all objects will be stored in the same bucket i.e. bucket 1 and when you try to ensure whether the specific object is present in the collection, then it will always have to check the entire content of the collection.
On the other hand, if you override the
hashCode() method for your class and if the method breaks the contract then calling
contains() method may return false for the element which is present in the
Collection but in a different bucket.
Method From Effective Java
Joshua Bloch in Effective Java provides good guidelines for generating a
hashCode() value
- 1. Store some constant nonzero value; say 17, in an int variable called result.
- 2. For each significant field f in your object (each field taken into account by the
equals()), do the following
- a. Compute an int
hashCode c for the field:
- i. If the field is a
boolean, compute c = (f ? 1 : 0).
- ii. If the field is a
byte, char, short, or int, compute c = (int) f.
- iii. If the field is a
long, compute c = (int) (f ^ (f >>> 32)).
- iv. If the field is a
float, compute c = Float.floatToIntBits(f).
- v. If the field is a
double,compute long l = Double.doubleToLongBits(f),
c = (int)(l ^ (l >>> 32))
- vi. If the field is an object reference then
equals( ) calls equals( ) for this field. compute
c = f.hashCode()
- vii. If the field is an
array, treat it as if each element were a separate field.
That is, compute a hashCode for each significant element by applying above rules to each
element
- b. Combine the
hashCode c computed in step 2.a into result as follows:result = 37 * result + c;
- 3. Return result.
- 4. Look at the resulting
hashCode() and make sure that equal instances have equal hash codes.
Here is an example of a class that follows the above guidelines
public class HashTest {
private String field1;
private short field2;
----
@Override
public int hashCode() {
int result = 17;
result = 37*result + field1.hashCode();
result = 37*result + (int)field2;
return result;
}
}
You can see that a constant
37 is chosen. The purpose of choosing a
prime number is that it is a
prime number. We can choose any other
prime number. Using
prime number the objects will be distributed better over the buckets. I encourage the user to explore the topic further by checking out other resources.
Using java.util.Objects.hash
java.util.Objects class contains a utility method
hash(Object... values) that can be used to calculate hash for sequence of objects. With this method, we can implement hashcode for our example HashTest class as follows:
public class HashTest {
private String field1;
private short field2;
----
@Override
public int hashCode() {
return java.util.Objects.hash(field1, field2);
}
}
Apache HashCodeBuilder
Writing a good
hashCode() method is not always easy. Since it can be difficult to implement
hashCode() correctly, it would be helpful if we have some reusable implementations of these.
The
Jakarta-Commons org.apache.commons.lang.builder package is providing a class named
HashCodeBuilder which is designed to help implement a
hashCode() method. Usually, developers struggle hard with implementing a
hashCode() method and this class aims to simplify the process.
Here is how you would implement
hashCode algorithm for our above class
public class HashTest {
private String field1;
private short field2;
----
@Override
public int hashCode() {
return new HashCodeBuilder(83, 7)
.append(field1)
.append(field2)
.toHashCode();
}
}
Note that the two numbers for the constructor are simply two different, non-zero, odd numbers - these numbers help to avoid collisions in the
hashCode value across objects.
If required, the superclass
hashCode() can be added using
appendSuper(int).
You can see how easy it is to override
HashCode() using Apache
HashCodeBuilder.
It is a general advice that you should use
immutable object as a key in a Collection.
HashCode work best when calculated from immutable data. If you use
Mutable object as key and change the state of the object so that the
hashCode changes, then the store object will be in the wrong bucket in the Collection
The most important thing you should consider while implementing
hashCode() is that regardless of when this method is called, it should produce the same value for a particular object every time when it is called. If you have a scenario like an object produces one
hashCode() value when it is
put() into a
HaspMap and produces another value during a
get(), in that case, you would not be able to retrieve that object. Therefore, if you
hashCode() depends on mutable data in the object, then made changing those data will surely produce a different key by generating a different
hashCode().
Look at the example below
public class Employee {
private String name;
private int age;
public Employee() {
}
public Employee(String name, int age) {
this.name = name;
this.age = age;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public int getAge() {
return age;
}
public void setAge(int age) {
this.age = age;
}
@Override
public boolean equals(Object obj) {
//Remember: Some Java gurus recommend you avoid using instanceof
if (obj instanceof Employee) {
Employee emp = (Employee)obj;
return (emp.name == name && emp.age == age);
}
return false;
}
@Override
public int hashCode() {
return name.length() + age;
}
public static void main(String[] args) {
Employee e = new Employee("muhammad", 24);
Map<Object,Object> m = new HashMap<Object,Object>();
m.put(e, "Muhammad Ali Khojaye");
// getting output
System.out.println(m.get(e));
e.name = "abid";
// it fails to get System.out.println(m.get(e));
e.name = "amirrana";
// it fails again
System.out.println(m.get(new Employee("muhammad", 24)));
}
So we can see in the above examples that how are we getting some unpredictable results after modifying the object state.
Let consider an another example below:
public class HashTest {
private int mutableField;
private final int immutableField;
public HashTest(int mutableField, int immutableField) {
this.mutableField = mutableField;
this.immutableField = immutableField;
}
public void setMutableField(int mutableField) {
this.mutableField = mutableField;
}
@Override
public boolean equals(Object o) {
if(o instanceof HashTest) {
return (mutableField == ((HashTest)o).mutableField)
&& (immutableField == ((HashTest)o).immutableField);
}else {
return false;
}
}
@Override
public int hashCode() {
int result = 17;
result = 37 * result + this.mutableField;
result = 37 * result + this.immutableField;
return result;
}
public static void main(String[] args) {
Set<HashTest> set = new HashSet<HashTest>();
HashTest obj = new HashTest(6622458, 626304);
set.add(obj);
System.out.println(set.contains(obj));
obj.setMutableField(3867602);
System.out.println(set.contains(obj));
}
}
After changing mutableField, the computed
hashCode value is no longer pointing to the old bucket and the
contains() returns false.
We can tackle such situation using either of these methods
Hashcode is best when calculated from immutable data; therefore ensure that only immutable object would be used as key with Collections.
- If you need
mutable fields included in the hashCode method then you need to ensure that object state is not changing after they've been used as Key in a hash-based collection. If for any reason it changed, you can calculate and store the hash value when the object updates mutable field. To do this, you must first remove it from the collection(set/map) and then add it back to the collection after updating it.
It is possible that a memory leak can occur in the
Java application if
equals() and
hashcode() are not implemented. Consider a small code example below in which
HashMap keeping references active if
equals() and
hashcode() are not implemented. As a results the
HashMap grows continuously by adding the same key repeatedly and finally throw an
OutOfMemoryError
/**
* Example demonstrating a Hashcode leak.
*/
public class HashcodeLeakExample {
private String id;
public HashcodeLeakExample(String id) {
this.id = id;
}
public static void main(String args[]) {
try {
Map<HashcodeLeakExample, String> map =
new HashMap<HashcodeLeakExample, String>();
while (true) {
map.put(new HashcodeLeakExample("id"), "any value");
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
}