Sunday, 18 March 2012

Difference between HashSet, LinkedHashSet and TreeSet in java

All three classes implement the Set interface and offer mostly the same functionality. The most important difference is the order in which iteration through the elements will happen:

HashSet makes absolutely not guarantees about the iteration order. It can (and will) even change completely when new elements are added.

TreeSet will iterate according to the "natural ordering" of the elements according to their compareTo()method (or an externally supplied Comparator). Additionally, it implements the SortedSet and NavigableSet interfaces.
SortedSet provides a total ordering on its elements. The elements are ordered using their natural ordering, or by a Comparator typically provided at sorted set creation time. The set's iterator will traverse the set in ascending element order. 
Several additional operations (first,last,subSet,headSet and tailSet) are provided to take advantage of the ordering.

NavigableSet extends SortedSet interface with navigation methods reporting closest matches for given search targets. Methods lower, floor, ceiling, and higher return elements respectively less than, less than or equal, greater than or equal, and greater than a given element, returning null if there is no such element. A NavigableSet may be accessed and traversed in either ascending or descending order. The descendingSet method returns a view of the set with the senses of all relational and directional methods inverted. The performance of ascending operations and views is likely to be faster than that of descending ones. This interface additionally defines methods pollFirst and pollLast that return and remove the lowest and highest element, if one exists, else returning null. Methods subSet, headSet, and tailSet differ from the like-named SortedSet methods in accepting additional arguments describing whether lower and upper bounds are inclusive versus exclusive.

LinkedHashSet implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order). Note that insertion order is not affected if an element is re-inserted into the set.
See the example given below


import java.util.*;

public class Main {

    public static void main(String args[]) {

        HashSet<String> hashSet = new HashSet<String>();
        LinkedHashSet<String> linkedSet = new LinkedHashSet<String>();
        TreeSet<String> treeSet = new TreeSet<String>();

        hashSet.add("Pakistan");
        hashSet.add("India");
        hashSet.add("SreeLanka");
        System.out.println("HashSet elements: " + hashSet);

        linkedSet.add("Pakistan");
        linkedSet.add("India");
        linkedSet.add("SreeLanka");
        System.out.println("LinkedHashSet elements: " + linkedSet);

        treeSet.add("Pakistan");
        treeSet.add("India");
        treeSet.add("SreeLanka");
        System.out.println("TreeSet elements: " + treeSet);

    }

}


The output is

HashSet elements: [SreeLanka, Pakistan, India]
LinkedHashSet elements: [Pakistan, India, SreeLanka]
TreeSet elements: [India, Pakistan, SreeLanka]

Performance Comparison

HashSet is a set based on hashing of the elements. It supports O(1) get/put operations. Elements must have consistent implementations of hashCode() and equals() for this to work.


Like HashSet, it provides constant-time performance for the basic operations (add, contains and remove), assuming the the hash function disperses elements properly among the buckets. Performance is likely to be just slightly below that of HashSet, due to the added expense of maintaining the linked list, with one exception: Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity.

The TreeSet is implemented using the balanced binary tree and it has a predictable depth which is equal to log2(n) where 'n' is the number of nodes on the binary tree. In balanced binary tree, so the lookup would be O(log2n). If we have 1024 elements, it will take at most ten lookups.This is worst than a HashSet with an efficient hash function, which should only take one look-up.Keeping the tree balanced is expensive since it involves shuffling the nodes around whenever one side of the tree becomes too deep.So inserting an element into TreeSet is more expensive.

Conclusion

We can conclude the above discussion like this.Use TreeSet only when you need to iterate elements in sorted order.If you can satisfy with retrieving elements in the order of insertion, use LinkedHashSet.In all other scenarios use HashSet.Find the table given below




Order of iteration Performance
add/remove/contains
HashSet No order O(1)
LinkedHashSet Order in which elements where put into the set O(1)+expense of maintaining linkedlist
TreeSet Natural ordering or according to provided Comparator. O(log2n)

No comments:

Post a Comment