Understanding equality and object comparison in .NET framework

The .NET Framework provides several ways to compare objects and check them for equality. It can be pretty confusing at first when you read the documentation for each interface the framework offers. In this post, I will try to explain the possibilities with examples.

Basically, there are two kinds of equality in .NET: reference equality and value equality. For classes, the default is reference equality. To understand reference equality, consider the following class:

class Employee
{
    public Employee(int employeeId)
    {
        this.EmployeeId = employeeId;
    }

    public int EmployeeId { get; private set; }
}

class Program
{
    static void Main(string[] args)
    {
        Employee first = new Employee(10);
        Employee second = new Employee(10);
        Employee firstCopy = first;
        Console.WriteLine(first == second);  // Prints false
        Console.WriteLine(first == firstCopy);  // Prints true
    }
}

The two Employee objects first and second have the same employee id, but first == second still returns false. You are seeing reference equality in action here. Reference equality checks whether both variables refer to the same object rather than comparing the values. If reference equality returns true, both variables point to the same object, so the values are necessarily the same.
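If you want to ask for reference equality explicitly, System.Object offers the static ReferenceEquals method. Here is a minimal sketch; PlainEmployee and ReferenceEqualityDemo are hypothetical names used only for this illustration.

```csharp
using System;

// Hypothetical stand-in for the Employee class above: no Equals override, no == overload.
class PlainEmployee
{
    public PlainEmployee(int employeeId) { EmployeeId = employeeId; }

    public int EmployeeId { get; private set; }
}

static class ReferenceEqualityDemo
{
    // True only when both arguments refer to the very same object (or both are null).
    public static bool SameInstance(object a, object b)
    {
        return object.ReferenceEquals(a, b);
    }
}
```

With first, second and firstCopy created as in the program above, SameInstance(first, second) is false and SameInstance(first, firstCopy) is true. Unlike ==, ReferenceEquals cannot be overloaded, so it always means "same object".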

Value equality, as the name indicates, checks the values for equality. This can be done by utilizing the Equals method provided in the System.Object class. System.Object provides a virtual Equals(object obj) method which all subclasses can override to provide a custom implementation. Let us override Equals in Employee.

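public override bool Equals(object obj)
{
    // 'as' yields null for null or for an object of another type, so nothing is thrown
    Employee other = obj as Employee;
    return !ReferenceEquals(other, null) && other.EmployeeId == this.EmployeeId;
}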

To get value equality, you need to write first.Equals(second). It is a best practice to override GetHashCode as well whenever you override Equals. We will see this soon.

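You can also overload operator == and provide a custom value equality implementation. Here is how you do that.

public static bool operator ==(Employee thisEmployee, Employee thatEmployee)
{
    // Same instance, or both null
    if (ReferenceEquals(thisEmployee, thatEmployee)) return true;
    // Exactly one side is null
    if (ReferenceEquals(thisEmployee, null) || ReferenceEquals(thatEmployee, null)) return false;
    return thisEmployee.EmployeeId == thatEmployee.EmployeeId;
}

public static bool operator !=(Employee thisEmployee, Employee thatEmployee)
{
    return !(thisEmployee == thatEmployee);
}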

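Now the question may arise: which should I use, Equals or the == operator? I suggest using Equals when you need a value comparison for reference types. Use == on value types like int, and on string, which is a reference type but overloads == to compare values. The difference is that Equals() is polymorphic (it is virtual and dispatched on the runtime type) while operator == is not: the overload is chosen at compile time from the static types of the operands. To understand this clearly, consider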

Employee first = new Employee(10);
Employee second = new Employee(10);
object obj = first;
Console.WriteLine(obj == second); // Prints false: the static type of obj is object, so == uses reference equality
Console.WriteLine(obj.Equals(second)); // Prints true: Equals is polymorphic, so Employee's Equals is called

When you override Equals, it is the programmer's duty to make sure the method is exception safe: it should return false, not throw, when it is given null or an object of another type. It is also good practice to overload the == and != operators if you override Equals.
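Putting the pieces so far together, a class with value equality could look roughly like the sketch below. SafeEmployee is a hypothetical name chosen for this illustration, and the null handling is just one reasonable way to keep the methods exception safe.

```csharp
using System;

// Hypothetical variant of Employee that keeps Equals, GetHashCode, == and != consistent.
sealed class SafeEmployee
{
    public SafeEmployee(int employeeId) { EmployeeId = employeeId; }

    public int EmployeeId { get; private set; }

    public override bool Equals(object obj)
    {
        // 'as' yields null for null or for an unrelated type, so no exception is thrown.
        SafeEmployee other = obj as SafeEmployee;
        return !ReferenceEquals(other, null) && other.EmployeeId == EmployeeId;
    }

    // Objects that are equal according to Equals return the same hash code.
    public override int GetHashCode()
    {
        return EmployeeId.GetHashCode();
    }

    public static bool operator ==(SafeEmployee left, SafeEmployee right)
    {
        if (ReferenceEquals(left, right)) return true;   // same instance, or both null
        if (ReferenceEquals(left, null)) return false;   // only the left side is null
        return left.Equals(right);                       // Equals copes with a null argument
    }

    public static bool operator !=(SafeEmployee left, SafeEmployee right)
    {
        return !(left == right);
    }
}
```

With this in place, new SafeEmployee(10) == new SafeEmployee(10) is true, comparing against null gives false instead of throwing, and Equals("some string") simply returns false.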

Apart from what we have already discussed, the .NET Framework comes with a set of interfaces for comparison and equality checks.

  1. IEquatable(T)
  2. IEqualityComparer(T)
  3. IComparable(T)
  4. IComparer(T)

IEquatable(T)

Quoting MSDN:

This interface is implemented by types whose values can be equated (for example, the numeric and string classes). A value type or class implements the Equals method to create a type-specific method suitable for determining equality of instances.

In simple words, this interface provides a strongly typed Equals() method. Implementing IEquatable<T> on the Employee class looks like this:

class Employee : IEquatable<Employee>
{
    public Employee(int employeeId)
    {
        this.EmployeeId = employeeId;
    }

    public override bool Equals(object obj)
    {
        // 'as' gives null for null or a non-Employee, which the generic version rejects
        return Equals(obj as Employee); // Using the generic version
    }

    public int EmployeeId { get; private set; }

    public bool Equals(Employee other)
    {
        return !ReferenceEquals(other, null) && other.EmployeeId == this.EmployeeId;
    }
}

Implementing IEquatable<Employee> is like saying “an Employee can be equated with another Employee”.
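One practical benefit is that generic collections pick up the strongly typed Equals automatically through EqualityComparer<T>.Default. The sketch below shows this with List<T>.Contains; EquatableEmployee and EquatableDemo are hypothetical names for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Employee variant that implements IEquatable<T>.
class EquatableEmployee : IEquatable<EquatableEmployee>
{
    public EquatableEmployee(int employeeId) { EmployeeId = employeeId; }

    public int EmployeeId { get; private set; }

    // Strongly typed version: no cast needed.
    public bool Equals(EquatableEmployee other)
    {
        return !ReferenceEquals(other, null) && other.EmployeeId == EmployeeId;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as EquatableEmployee);
    }

    public override int GetHashCode()
    {
        return EmployeeId.GetHashCode();
    }
}

static class EquatableDemo
{
    // Searches the list for a different instance that carries the same id.
    public static bool ListFindsEqualEmployee()
    {
        var employees = new List<EquatableEmployee> { new EquatableEmployee(10) };
        // List<T>.Contains compares with EqualityComparer<T>.Default,
        // which calls IEquatable<T>.Equals when T implements it.
        return employees.Contains(new EquatableEmployee(10));
    }
}
```

EquatableDemo.ListFindsEqualEmployee() returns true even though the list holds a different instance than the one passed to Contains.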

IEqualityComparer(T)

Here is what MSDN says:

This interface allows the implementation of customized equality comparison for collections. That is, you can create your own definition of equality for type T, and specify that this definition be used with a collection type that accepts the IEqualityComparer(T) generic interface.

You implement this interface on a separate class to define equality between two objects of the type. Containers like Dictionary use this interface to check object equality. Here is an equality comparer for our Employee class.

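class EmployeeComparer : IEqualityComparer<Employee>
{
    public bool Equals(Employee x, Employee y)
    {
        // Same instance, or both null
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        // Compare the ids directly so the comparer works even if Employee does not override Equals
        return x.EmployeeId == y.EmployeeId;
    }

    public int GetHashCode(Employee obj)
    {
        return obj.EmployeeId.GetHashCode(); // we will see GetHashCode in detail later in this post
    }
}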

You might be wondering why you would implement this interface rather than use the Equals() method available on each object. Assume the Employee class doesn’t override Equals and so provides the default reference comparison, but you want to keep employees in a container that uses value equality. IEqualityComparer comes to help here. It allows you to create a comparer without modifying the Employee class.
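To make that scenario concrete, here is a small sketch that passes a comparer to a Dictionary whose key type has no Equals override. UnmodifiedEmployee, EmployeeIdComparer and ComparerDemo are hypothetical names for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class we cannot (or do not want to) modify: it has no Equals override.
class UnmodifiedEmployee
{
    public UnmodifiedEmployee(int employeeId) { EmployeeId = employeeId; }

    public int EmployeeId { get; private set; }
}

// Hypothetical comparer that defines equality by EmployeeId from the outside.
class EmployeeIdComparer : IEqualityComparer<UnmodifiedEmployee>
{
    public bool Equals(UnmodifiedEmployee x, UnmodifiedEmployee y)
    {
        if (ReferenceEquals(x, y)) return true;   // same instance, or both null
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return x.EmployeeId == y.EmployeeId;
    }

    public int GetHashCode(UnmodifiedEmployee obj)
    {
        return obj.EmployeeId.GetHashCode();
    }
}

static class ComparerDemo
{
    // Looks up a different instance with the same id, using the custom comparer.
    public static bool FoundWithComparer()
    {
        var names = new Dictionary<UnmodifiedEmployee, string>(new EmployeeIdComparer());
        names.Add(new UnmodifiedEmployee(10), "first");
        return names.ContainsKey(new UnmodifiedEmployee(10));
    }

    // Same lookup with the default comparer, which falls back to reference equality.
    public static bool FoundWithDefault()
    {
        var names = new Dictionary<UnmodifiedEmployee, string>();
        names.Add(new UnmodifiedEmployee(10), "first");
        return names.ContainsKey(new UnmodifiedEmployee(10));
    }
}
```

FoundWithComparer() returns true while FoundWithDefault() returns false, which is exactly the difference the comparer is meant to make. HashSet<T> accepts a comparer in its constructor in the same way.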

IComparable(T)

Quoting MSDN:

This interface is implemented by types whose values can be ordered; for example, the numeric and string classes. A value type or class implements the CompareTo method to create a type-specific comparison method suitable for purposes such as sorting.

This interface helps to compare objects and decide which one is greater or less than the other. It is used when you need to sort a collection: the collection class uses the CompareTo method to decide how to order the elements.

Let us implement IComparable on our Employee class and keep employee instances in a list. We will sort the list by employee id.

class Employee : IComparable<Employee>
{
    public Employee(int employeeId)
    {
        this.EmployeeId = employeeId;
    }

    public int EmployeeId { get; private set; }

    public int CompareTo(Employee other)
    {
        // By convention, any instance sorts after null
        if (ReferenceEquals(other, null)) return 1;
        return this.EmployeeId.CompareTo(other.EmployeeId);
    }
}

class Program
{
    static void Main(string[] args)
    {
        List<Employee> employees = new List<Employee>();
        employees.Add(new Employee(10));
        employees.Add(new Employee(13));
        employees.Add(new Employee(5));
        employees.Add(new Employee(1));
        employees.Sort();
        foreach (Employee emp in employees)
            Console.WriteLine(emp.EmployeeId);
    }
}

IComparer(T)

This interface allows us to write a custom comparer without modifying the existing class. Using this, we can write a comparer for Employee without changing any code in the Employee class. This comparer can be supplied to methods like Sort().

class EmployeeComparer : IComparer<Employee>
{
    public int Compare(Employee x, Employee y)
    {
        return x.EmployeeId.CompareTo(y.EmployeeId);
    }
}

class Program
{
    static void Main(string[] args)
    {
        List<Employee> employees = new List<Employee>();
        employees.Add(new Employee(10));
        employees.Add(new Employee(13));
        employees.Add(new Employee(5));
        employees.Add(new Employee(1));
        employees.Sort(new EmployeeComparer());
        foreach (Employee emp in employees)
            Console.WriteLine(emp.EmployeeId);
    }
}
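For a one-off ordering you do not even need a comparer class: List<T>.Sort also has an overload that takes a Comparison<T> delegate. The sketch below sorts by employee id in descending order; RankedEmployee and SortDemo are hypothetical names for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Employee class.
class RankedEmployee
{
    public RankedEmployee(int employeeId) { EmployeeId = employeeId; }

    public int EmployeeId { get; private set; }
}

static class SortDemo
{
    // Returns the employee ids after sorting the list in descending order.
    public static List<int> IdsDescending()
    {
        var employees = new List<RankedEmployee>
        {
            new RankedEmployee(10),
            new RankedEmployee(13),
            new RankedEmployee(5),
            new RankedEmployee(1)
        };

        // Comparison<T> overload of Sort: swapping x and y reverses the order.
        employees.Sort((x, y) => y.EmployeeId.CompareTo(x.EmployeeId));

        var ids = new List<int>();
        foreach (RankedEmployee emp in employees)
            ids.Add(emp.EmployeeId);
        return ids;
    }
}
```

SortDemo.IdsDescending() yields 13, 10, 5, 1. A separate IComparer class is still the better choice when the same ordering is reused in several places.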

Understanding GetHashCode()

One important topic which we haven’t discussed so far is overriding the GetHashCode method. If you search for GetHashCode, you will find plenty of discussions on how and when to implement it. Here is what MSDN says:

Serves as a hash function for a particular type. The GetHashCode method is suitable for use in hashing algorithms and data structures such as a hash table.

The important thing about GetHashCode is that if the expression object1.Equals(object2) is true, then object1.GetHashCode() == object2.GetHashCode() must also be true. If your objects compare using reference equality, there is no need to override GetHashCode. If your class overrides Equals() and does value equality, override GetHashCode as well; otherwise hash-based collections such as Dictionary and HashSet may fail to find equal objects.

It is a bit tricky to write a good hash function that gives well-distributed hash codes and avoids collisions. A hash function is poor when it frequently returns the same hash code for objects that have different values. This leads to collisions. Hashtable implementations usually handle collisions with chaining. The more collisions you have, the longer the chains get. In the worst case, searching for an item degrades to a linear scan and you lose the whole advantage that a hashtable can offer.

As Effective Java says, you can produce a reasonably good hash by using two prime numbers and doing arithmetic with the class members. Here is a hash function that produces a good hash for our Employee class (since the Employee class has only one integer field, calling GetHashCode() on that member would also work well in this case).

public override int GetHashCode()
{
    int hashCode = 29;
    hashCode = (hashCode * 31) + this.EmployeeId.GetHashCode();
    return hashCode;
}

It is worth noting that a hash function should use only immutable data for calculating the hash code. To explain how mutable data causes problems, consider the following hash function.

public override int GetHashCode()
{
    int hashCode = 29;
    hashCode = (hashCode * 31) + this.EmployeeId.GetHashCode();
    hashCode = (hashCode * 31) + this.EmployeeName.GetHashCode();
    return hashCode;
}

I have added a new property EmployeeName with a public getter and setter. The value of this property is used in the hash calculation. To see how this causes problems, consider the following self-explanatory code which uses this hash function.

static void Main(string[] args)
{
    Employee first = new Employee(10);
    first.EmployeeName = "Before Change";
    
    // Adding first employee to a hashset
    HashSet<Employee> employees = new HashSet<Employee>();
    employees.Add(first);

    // Changing the first employee's name
    // Since the employee name is involved in the hashing, the hash code changes now
    first.EmployeeName = "Changed";
    
    // Trying to find the employee in the hashset
    // Since the hash has changed, the lookup will most likely miss it
    if (!employees.Contains(first))
        Console.WriteLine("First employee not found!");
}
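For contrast, here is a sketch of the same scenario where only the immutable employee id takes part in Equals and GetHashCode. RenamableEmployee and ImmutableHashDemo are hypothetical names for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant: the id is immutable, the name is not.
class RenamableEmployee
{
    public RenamableEmployee(int employeeId) { EmployeeId = employeeId; }

    public int EmployeeId { get; private set; }

    public string EmployeeName { get; set; }

    public override bool Equals(object obj)
    {
        RenamableEmployee other = obj as RenamableEmployee;
        return !ReferenceEquals(other, null) && other.EmployeeId == EmployeeId;
    }

    // Only the immutable id is hashed, so renaming never changes the hash code.
    public override int GetHashCode()
    {
        return EmployeeId.GetHashCode();
    }
}

static class ImmutableHashDemo
{
    // Adds an employee to a HashSet, renames it, then looks it up again.
    public static bool StillFoundAfterRename()
    {
        var first = new RenamableEmployee(10);
        first.EmployeeName = "Before Change";

        var employees = new HashSet<RenamableEmployee>();
        employees.Add(first);

        first.EmployeeName = "Changed";   // the hash code is unaffected
        return employees.Contains(first);
    }
}
```

ImmutableHashDemo.StillFoundAfterRename() returns true, because the hash code computed at lookup time is the same one that was used when the employee was added.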

Several hash implementations use XOR (^) to combine hash codes. This is OK, but not as good as the approach that uses prime numbers, because XOR is symmetric and will produce more collisions. To understand this, let us add an immutable department id property to our Employee class.

class Employee
{
    public Employee(int employeeId, int departmentId)
    {
        this.EmployeeId = employeeId;
        this.DepartmentId = departmentId;
    }

    public override int GetHashCode()
    {
        return this.DepartmentId.GetHashCode() ^ this.EmployeeId.GetHashCode();
    }

    public int EmployeeId { get; private set; }

    public int DepartmentId { get; private set; }

    // Other methods

}

The above hash function gives the same hash code for an employee whose id is 10 and department id is 20 and for another employee whose id is 20 and department id is 10.

Employee first = new Employee(10, 20);
Employee second = new Employee(20, 10);
Console.WriteLine(first.GetHashCode() == second.GetHashCode()); // Prints true!
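Applying the prime number approach from earlier to both fields avoids this particular collision. The following is a sketch only; DepartmentEmployee and XorHash are hypothetical names, and the unchecked block is my addition so that the arithmetic wraps around instead of throwing if overflow checking is enabled.

```csharp
using System;

// Hypothetical two-field Employee variant that uses the prime-based hash.
class DepartmentEmployee
{
    public DepartmentEmployee(int employeeId, int departmentId)
    {
        EmployeeId = employeeId;
        DepartmentId = departmentId;
    }

    public int EmployeeId { get; private set; }

    public int DepartmentId { get; private set; }

    public override bool Equals(object obj)
    {
        DepartmentEmployee other = obj as DepartmentEmployee;
        return !ReferenceEquals(other, null)
            && other.EmployeeId == EmployeeId
            && other.DepartmentId == DepartmentId;
    }

    public override int GetHashCode()
    {
        unchecked   // let the arithmetic wrap around on overflow
        {
            int hashCode = 29;
            hashCode = (hashCode * 31) + EmployeeId.GetHashCode();
            hashCode = (hashCode * 31) + DepartmentId.GetHashCode();
            return hashCode;
        }
    }

    // The XOR combination from above, kept only for comparison.
    public int XorHash()
    {
        return DepartmentId.GetHashCode() ^ EmployeeId.GetHashCode();
    }
}
```

For new DepartmentEmployee(10, 20) and new DepartmentEmployee(20, 10), XorHash() returns 30 for both, while GetHashCode() returns 28199 and 28499 respectively. The multiplication makes the result depend on the order of the fields. Collisions between other pairs of values are of course still possible.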

A good hash function

  1. Will generate hash codes that are well distributed over the range of Int32, so that as many buckets as possible are used and less chaining is required.
  2. Will always generate the same hash code for objects that have the same values.
  3. Will not use mutable data for hash code calculation.

I hope you will find this post helpful. Let me know your feedback.

Happy programming!


