Friday, March 21, 2014

Binary Search vs Contains Performance in Java List

There are two ways to search an element in a List class, by using contains() method or by using Collections.binarySearch() method. There are two versions of binarySearch() method, one which takes a List and Comparator and other which takes a List and Comparable. This method searches the specified list for the specified object using the binary search algorithm. The list must be sorted into ascending order according to the natural ordering of its elements (as by the sort(List) method) prior to making this call. If List is not sorted, then results are undefined. If the List contains multiple elements equal to the specified object, there is no guarantee which one will be returned. This method runs in log(n) time for a "random access" list (which provides near-constant-time positional access). If the specified list does not implement the RandomAccess interface and is large, this method will do an iterator-based binary search that performs O(n) link traversals and O(log n) element comparisons. In the end this method returns the index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size() if all elements in the list are less than the specified key. This means that return value will be >= 0 if and only if the key is found. Since common implementation of List interface e.g. ArrayList, Vector, CopyOnWriteArrayList and Stack implements RandomAccess interface, they can be used for performing binary search, but there are other implementations like LinkedList, which doesn't implement java.util.RandomAccess, hence not suitable for binary search operation. Since binary search can only be performed in sorted list, you also need to sort your collection before doing search, which may potentially impact performance, especially if your List is big and not in sorted order already.


Java Code for contains() Vs binarySearch()

Binary search vs contains in Java ListHere is our program to find object using binary search in Java List. We have a list of Integer with 1M records and uses both contains() and binarySearch() method to search an element.


import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** 
  * Java program to perform binary search in Java collection e.g List, Set
  * @author Javin Paul
  */
public class BinarySearchTest {
    public static final Logger logger = LoggerFactory.getLogger(BinarySearchTest.class);
   
    public static void main(String args[]) {       
  
        //creating List
        List<Integer> numbers = new ArrayList<Integer>(1000000); //List of 1M records
       
        //initializing List
        for(int i =0; i<numbers.size(); i++){
            numbers.add(new Integer(i));
        }
      
        //performing contains search
        long startTime = System.nanoTime();
        boolean isExist = numbers.contains(new Integer(1000000));
        long totalTime = System.nanoTime() - startTime;
      
       
        logger.info("Time to search 1Mth Record using contains() is {} nano seconds", totalTime);
       
        //performing binary search
        startTime = System.nanoTime();
        Collections.sort(numbers);  // List needs to be sorted for Binary Search
        Integer number = Collections.binarySearch(numbers, new Integer(1000000));
        totalTime = System.nanoTime() - startTime;
       
        logger.info("Time to search 1Mth Record using binary search is {} nano seconds", totalTime);
    }
  
}

Ouput:
2013-06-04 23:23:17,834 0    [main] INFO  test.BinarySearchTest  - Time to search 1Mth Record using contains() is 51404 nano seconds
2013-06-04 23:23:17,849 15   [main] INFO  test.BinarySearchTest  - Time to search 1Mth Record using binary search is 554261 nano seconds

From the output, you can see that contains() method is almost 10 times faster than binary search, which means it make sense to use contains() for searching objects in List, especially for those which implements RandomAccess interface e.g. ArrayList.


5 comments :

Abhishek Kapoor said...

I would do the sorting prior start time to be more justified.
But yes "contains ' is faster then "binarySearch" for ArrayList,
Below my output :
Time to search 1Mth Record using contains() is {} nano seconds 84585
Time to search 1Mth Record using binary search is {} nano seconds 1283703

Anonymous said...

for searching in a list., we can indexOf() also, This gives better timing than contains()..

correct me if I am wrong...

Prasen said...

@Anonymous: if you check the source code of ArrayList, contains method internally call indexOf method only. So time for both of them will same. I think. (other than a extra method call.)

This is a good tutorial.
Additional info: time complexity of contains and binarySerach depends on the given list. If it is already ordered list, i will go for BinarySearch(which will take O(logn) time). If the input list is not ordered, then do contains, which will do a liner search(take time O(n)).

AlexBonel said...

Well, I think that this is not fare relatively to binarySearch() as you measure it's time in conjunction with sorting invocation which takes additional O(NlogN) time. It would be more convenient if you test these methods on a sorted list or take a timestamp not before sorting a list but just before invoking of binarySearch(). You will get antipodal results.

Smit Jain said...

I don't agree "contains() method is almost 10 times faster than binary search" infact the binary search is always many times faster than contains()
Reason : contains uses traditional linear search algorithm so the complexity is O(n) whereas the Binary search has complexity of O(logn).

Your code is giving incorrect result because the list (numbers) you are filling is not actually filled.
for(int i =0; i numbers = new ArrayList(capacity);
// List of 1M records //initializing List
for (int i = 0; i < capacity; i++) {
numbers.add(new Integer(i));
}

long startTime = System.nanoTime();
Collections.sort(numbers);
long totalTime = System.nanoTime() - startTime;
logger.info("Time to sort 1 Million Records is {} nano seconds", totalTime);

// performing contains search
startTime = System.nanoTime();
boolean isExist = numbers.contains(new Integer(1000000));

totalTime = System.nanoTime() - startTime;
logger.info("Time to search 1Mth Record using contains() is {} nano seconds", totalTime);
// performing binary search
startTime = System.nanoTime();
// List needs to be sorted for Binary Search
Integer number = Collections.binarySearch(numbers, new Integer(1000000));
totalTime = System.nanoTime() - startTime;
logger.info("Time to search 1Mth Record using binary search is {} nano seconds", totalTime);
}
}

The output of the above code is :
[main] INFO BinarySearchTest - Time to sort 1 Million Records is 19167309 nano seconds
[main] INFO BinarySearchTest - Time to search 1Mth Record using contains() is 4889023 nano seconds
[main] INFO BinarySearchTest - Time to search 1Mth Record using binary search is 38720 nano seconds

So you can clearly see the BinarySearch is more than 100 times faster than linear search in this case. Actually it depends upon the size of Array you are searching on.

Post a Comment