Java Hashing - Overriding hashcode and equals

Why Overriding hashCode() and Equal() method contract?

Every Java object has two very important methods i.e. hashCode() and equals() method. These methods are designed to be overridden according to their specific general contract. This article describes why and how to override the hashCode() method that preserves the contract of HashCode.

Contract For HashCode Method

The contract for hashCode says
“If two objects are equal, then calling hashCode() on both objects must return the same value”.
Now the question that will come into your mind is that; is it necessary that the above statement should always be true?

Consider the fact that we have provided a correct implementation of an equal method for our class, then what would happen if we do not obey the above contract.
To answer the above question, let us consider the two situations,
  1. Objects that are equal but return different hashCodes
  2. Objects that are not equal but return the same hashCode
Objects that are equal but return different hashCodes
What would happen if the two objects are equal but return different hashCodes? Your code would run perfectly fine. You will never come in trouble unless and until you have not stored your object in a collection like HashSet or HashMap. But when you do that, you might get strange problems at runtime.
To understand this better, you have to first understand how collection classes such as HashMap and HashSet work. These collections classes depend on the fact that the objects that you put as a key in them must obey the above contract. You will get strange and unpredictable results at runtime if you do not obey the contract and try to store them in a collection.

Consider an example of HashMap. When you store the values in HashMap, the values are actually stored in a set of buckets. Each of those buckets has been assigned a number which is use to identify it. When you put a value in the HashMap, it stores the data in one of those buckets. Which bucket is used depends on the hashCode that will return by your object. Let’s say, if the hashCode() method returns 49 for an object, then it gets stored in the bucket 49 of the HashMap.
Later when you try to check whether that collection contains an element or not by invoking the Contains(element) method, the HashMap first gets the hashCode of that “element “. Afterwards, it will look into the bucket that corresponds with the hashCode. If the bucket is empty, then it means we are done and it's return false which means the HashMap does not contain the element.
If there are one or more objects in the bucket, then it will compare “element” with all other elements in that bucket using your defined equal() function.

Objects that are not equal but return the same hashCode
The hashCode contract does not say anything about the above statement. Therefore different objects might return the same hashCode value, but collections like HashMap will work inefficiently if different objects return the same hashCode value.

Why Buckets

The reason why bucket mechanism is used is its efficiency. You can imagine that if all the objects you put in the HashMap would be stored into one big list, then you have to compare your input with all the objects in the list when you want to check if a particular element is in the Map. With the use of buckets, you will now compare only the elements of the specific bucket and any bucket usually holds only a small portion of all the elements in the HashMap.

Overriding hashCode Method

Writing a good hashCode() method is always a tricky task for a new class.

Return Fixed Value
You can implement your hashCode() method that always returns fix value, for example like this:

//bad performance
@Override
public int hashCode() {
    return 1;
}
The above method satisfies all the requirements and is considered legal according to the hash code contract but it would not be very efficient. If this method is used, all objects will be stored in the same bucket i.e. bucket 1 and when you try to ensure whether the specific object is present in the collection, then it will always have to check the entire content of the collection.

On the other hand, if you override the hashCode() method for your class and if the method breaks the contract then calling contains() method may return false for the element which is present in the Collection but in a different bucket.

Method From Effective Java
Joshua Bloch in Effective Java provides good guidelines for generating a hashCode() value
  • 1. Store some constant nonzero value; say 17, in an int variable called result.
  • 2. For each significant field f in your object (each field taken into account by the equals()), do the following
  • a. Compute an int hashCode c for the field:
  • i. If the field is a boolean, compute c = (f ? 1 : 0).
  • ii. If the field is a byte, char, short, or int, compute c = (int) f.
  • iii. If the field is a long, compute c = (int) (f ^ (f >>> 32)).
  • iv. If the field is a float, compute c = Float.floatToIntBits(f).
  • v. If the field is a double,compute long l = Double.doubleToLongBits(f),
    c = (int)(l ^ (l >>> 32))
  • vi. If the field is an object reference then equals( ) calls equals( ) for this field. compute
    c = f.hashCode()
  • vii. If the field is an array, treat it as if each element were a separate field.
    That is, compute a hashCode for each significant element by applying above rules to each
    element
  • b. Combine the hashCode c computed in step 2.a into result as follows:result = 37 * result + c;
  • 3. Return result.
  • 4. Look at the resulting hashCode() and make sure that equal instances have equal hash codes.
Here is an example of a class that follows the above guidelines

public class HashTest {
    private String field1;
    private short  field2;
    ----

    @Override
    public int hashCode() {
        int result = 17;
        result = 37*result + field1.hashCode();
        result = 37*result + (int)field2;
        return result;
    }
}
You can see that a constant 37 is chosen. The purpose of choosing a prime number is that it is a prime number. We can choose any other prime number. Using prime number the objects will be distributed better over the buckets. I encourage the user to explore the topic further by checking out other resources.

Using java.util.Objects.hash
java.util.Objects class contains a utility method hash(Object... values) that can be used to calculate hash for sequence of objects. With this method, we can implement hashcode for our example HashTest class as follows:
public class HashTest {
    private String field1;
    private short  field2;
    ----

    @Override
    public int hashCode() {
        return java.util.Objects.hash(field1, field2);
    }
}
Apache HashCodeBuilder
Writing a good hashCode() method is not always easy. Since it can be difficult to implement hashCode() correctly, it would be helpful if we have some reusable implementations of these.

The Jakarta-Commons org.apache.commons.lang.builder package is providing a class named HashCodeBuilder which is designed to help implement a hashCode() method. Usually, developers struggle hard with implementing a hashCode() method and this class aims to simplify the process.
Here is how you would implement hashCode algorithm for our above class

public class HashTest {
    private String field1;
    private short  field2;
    ----

    @Override
    public int hashCode() {
        return new HashCodeBuilder(83, 7)
            .append(field1)
            .append(field2)
            .toHashCode();
    }
}
Note that the two numbers for the constructor are simply two different, non-zero, odd numbers - these numbers help to avoid collisions in the hashCode value across objects.

If required, the superclass hashCode() can be added using appendSuper(int).
You can see how easy it is to override HashCode() using Apache HashCodeBuilder.

Mutable Object As Collection Key

It is a general advice that you should use immutable object as a key in a Collection. HashCode work best when calculated from immutable data. If you use Mutable object as key and change the state of the object so that the hashCode changes, then the store object will be in the wrong bucket in the Collection

The most important thing you should consider while implementing hashCode() is that regardless of when this method is called, it should produce the same value for a particular object every time when it is called. If you have a scenario like an object produces one hashCode() value when it is put() into a HaspMap and produces another value during a get(), in that case, you would not be able to retrieve that object. Therefore, if you hashCode() depends on mutable data in the object, then made changing those data will surely produce a different key by generating a different hashCode().
Look at the example below
public class Employee {

    private String name;
    private int age;

    public Employee() {
    }

    public Employee(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() {
       return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    @Override
    public boolean equals(Object obj) {
        //Remember: Some Java gurus recommend you avoid using instanceof
        if (obj instanceof Employee) {
            Employee emp = (Employee)obj;
            return (emp.name == name && emp.age == age);
        }
        return false;
    }

    @Override
    public int hashCode() {
        return name.length() + age;
    }

    public static void main(String[] args) {
        Employee e = new Employee("muhammad", 24);
        Map<Object,Object> m = new HashMap<Object,Object>(); 
        m.put(e, "Muhammad Ali Khojaye");  
    
        // getting output 
        System.out.println(m.get(e));
        e.name = "abid";  
    
        // it fails to get System.out.println(m.get(e));
        e.name = "amirrana";  
    
        // it fails again
        System.out.println(m.get(new Employee("muhammad", 24))); 
    } 
So we can see in the above examples that how are we getting some unpredictable results after modifying the object state.

Another Example of Mutable Field as Key

Let consider an another example below:
public class HashTest {
    private int mutableField;
    private final int immutableField;

    public HashTest(int mutableField, int immutableField) {
        this.mutableField = mutableField;
        this.immutableField = immutableField;
    }

    public void setMutableField(int mutableField) {
        this.mutableField = mutableField;
    }

    @Override
    public boolean equals(Object o) {
        if(o instanceof HashTest) {
            return (mutableField == ((HashTest)o).mutableField)
               && (immutableField ==  ((HashTest)o).immutableField);
        }else {
            return false;
        }              
    }

    @Override
    public int hashCode() {
        int result = 17;
        result = 37 * result + this.mutableField;
        result = 37 * result + this.immutableField;
        return result;
    }

    public static void main(String[] args) {
        Set<HashTest> set = new HashSet<HashTest>();
        HashTest obj = new HashTest(6622458, 626304);
        set.add(obj);                 
        System.out.println(set.contains(obj));     
        obj.setMutableField(3867602);        
        System.out.println(set.contains(obj));
    }
}
After changing mutableField, the computed hashCode value is no longer pointing to the old bucket and the contains() returns false.
We can tackle such situation using either of these methods
  • Hashcode is best when calculated from immutable data; therefore ensure that only immutable object would be used as key with Collections.
  • If you need mutable fields included in the hashCode method then you need to ensure that object state is not changing after they've been used as Key in a hash-based collection. If for any reason it changed, you can calculate and store the hash value when the object updates mutable field. To do this, you must first remove it from the collection(set/map) and then add it back to the collection after updating it.

Memory leaks with HashCode and Equal

It is possible that a memory leak can occur in the Java application if equals() and hashcode() are not implemented. Consider a small code example below in which HashMap keeping references active if equals() and hashcode() are not implemented. As a results the HashMap grows continuously by adding the same key repeatedly and finally throw an OutOfMemoryError
/**
 * Example demonstrating a Hashcode leak.
 */
public class HashcodeLeakExample {
    private String id;

    public HashcodeLeakExample(String id) {
        this.id = id;
    }

    public static void main(String args[]) {
        try {
            Map<HashcodeLeakExample, String> map = 
                  new HashMap<HashcodeLeakExample, String>();
            while (true) {
                map.put(new HashcodeLeakExample("id"), "any value");
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

References and More Information

28 comments:

  1. Nice and Thanks for posting this. This piece of tutorial really helpful to know what happens when we change the mutable object that is in the collection.

    ReplyDelete
  2. really great article, but it will be really great to continue it with JPA and entities in mind :)

    ReplyDelete
  3. Thanks,great article.On first reading,I really understand this subject.
    I'm bookmarking this article for future reference.

    ReplyDelete
  4. I am so glad, this small tutorial work for you guyz. Thanks for the comments.

    Cheers!!

    ReplyDelete
  5. Please don't implement String equality with ==. Use equals(). Not all Strings are interned, only literals.

    Illustration:

    String one = "test";
    String two = new String("test");
    String three = new String("test").intern();
    System.out.println("equal? " + (one == two));
    System.out.println("equal? " + (one == three));

    ReplyDelete
  6. @jmagiera
    Yes, I agree. Equal() should be used instead.

    Cheers!

    ReplyDelete
  7. Hi Khojaye, Thanks for posting really this is fruitful for me..i m frm india..thanks bro :)

    ReplyDelete
  8. Thank you so much. I'm glad you like it :)

    ReplyDelete
  9. Objects that are equal but return different hashCodes: For HashSet, this would mean duplicates across the set, however, I am not able to figure out how this will affect HashMap. Can you post an example?

    ReplyDelete
  10. Consider below sample example where we do not override hashcode(), and therefore, the object fail to get.

    public class Test {
    public static void main(String[] args) {

    Map employees
    = new HashMap();
    employees.put(new
    Employee("Khojaye", 25), "Khojaye");

    String employeeName = employees.get(new
    Employee("Khojaye", 25));

    System.out.println(employeeName);
    // this will print null
    System.out.println(new
    Employee("Khojaye", 25).equals(new
    Employee("Khojaye", 25)));
    // this is true because of the
    // overriden equals()
    }
    }

    class Employee {
    private String name;
    private int age;

    public Employee(String name, int age) {
    this.name = name;
    this.age = age;
    }

    public String getName() {

    return name;
    }

    public int getAge() {
    return age;
    }

    @Override
    public boolean equals(Object obj) {
    if (obj == this) {
    return true;
    }

    if (!(obj instanceof Employee))
    {
    return false;
    }

    Employee person = (Employee)
    obj;

    return person.getName()
    .equals(this.name)
    &&
    person.getAge() ==
    this.age;
    }

    // didn't override hashCode!!!
    }

    Please let me know, If there is any thing still unclear.

    ReplyDelete
  11. Nice explanation, I used your article to convince my manager who isn't so much technical about overriding hashcode in all the POJOs we use.

    ReplyDelete
  12. I am glad to know that it was helpful for you

    ReplyDelete
  13. Which data structures is been used to store the hashcode in the buckets ?Any idea

    My perspective is linked list. but any suggestions from your end

    ReplyDelete
  14. The current implementation is just using the custom inner class named Entry where hashcode saved. Once retrieve the hashcode (and collision occur), the linkedin of object maintain.

    ReplyDelete
  15. How do we know that we have written best implementation of hashcode method for a collection?

    ReplyDelete
  16. Generally, its not easy to answer because it depends on the scenario for which you are implementing the hashCode() method and also on what equality means for objects of that class. A good hashCode implementation returns unique hash codes for different objects, however, it is impossible to guarantee that hashcodes will be unique for each object.

    A good general guideline-purpose guideline has been explained in heading "Overriding hashCode Method" heading.
    I hope it helps.

    ReplyDelete
  17. Hi,
    This is one of best tutorial I read about hashcode. I always confuse why should we override hashcode.

    I also want to add that we use prime number because It helps elements in the collection to be well distributed among the buckets.
    The reason is that it uses the formula hashCode() % numberOfBuckets to compute the bucket where the object will store and prime number is not a multiple of the number of buckets (which is a power of 2) therefore, hashCode() % numberOfBuckets will not repeat quickly to the same number bucket. In short, prime number will helps the objects to be distributed better over the buckets.

    ReplyDelete
  18. Hi Can you just point out how exactly..Does this work in Hash Map, existing implementation ************
    If there are one or more objects in the bucket, then it will compare “element” with all other elements in that bucket using your defined equal() function *************
    So in oneline if i wish to say about hash,
    Every ideal hashkey for an object has to be unique ,considering all the instance variables of the object.
    Also can you point where are the contracts given like immutable objects etc..in the javadoc

    ReplyDelete
  19. This comment has been removed by a blog administrator.

    ReplyDelete
  20. This comment has been removed by a blog administrator.

    ReplyDelete
  21. In "Mutable Object As Key" the solution of using Joshua Recipe or HashCodeBuilder won't solve the problem of not finding elements where hashCode uses mutable fields. It seems that it is offered as solution in the article but in fact it is not as tha hashCode will change after a change of name in any of the mentioned solutions as they all depend on the value of name...

    ReplyDelete
    Replies
    1. Thank you for the comment. I have updated that section to make it more clear.

      You are on the right spot that Joshua Recipe or HashCodeBuilder cannot solve the problem for scenario 'Mutable field as a Key'. General advice is that Key should be immutable since HashMap (or HashSet) like Collection assume that an object’s hash value does not modify while it is in use as a key in the collection.

      If for any reason your key is not immutable, you need to either ensure not changing it after it has been used in a hash-based collection or you can follow the advice mentioned under the heading 'Another Example of Mutable Field as Key'.

      Cheers,

      Delete
    2. Yes, well done, it is much clearer now. As an alternative for generating hashCodes also consider java.util.Objects#hash since java 1.7.

      Delete
  22. Hello Author, The tutorial is very good and I understand now what is the use of hashcode and relation with equal. I have one question. Is it always necessary that i should use same fields in hashcode and equal function? I run an example and if I use different values in hashcode function, it will still work. can you answer this?

    ReplyDelete
  23. We must use all the fields in equals() methods that has been used in hashcode() method calculation. Violating this can result with different hashcode for two equal objects. However, the reverse is not mandatory and we can use the subset of fields in hashcode() method calculation (say, for improving the performance of hashcode).

    ReplyDelete