3.4.2. Finger search and fast insertion

Learning objectives

Finger search
The operations finger_locate(), finger_locate_from_front(), and finger_locate_from_rear()
The operation insert_at()

What one should not do in daily life, pointing to someone with a finger, may well show its advantages when searching in sorted sequences.

The search operations of sortseq that we have discussed until now take logarithmic time: If n containers are stored in a sortseq, then lookup() and locate() take O(log(n)) steps to find the respective container.

This is due to the fact that sortseq is implemented by a search tree that is lying over the sorted sequence.[35] To be more precise, it is a niveau-connected (a,b)-tree: An (a,b) tree is a tree in which every node has at least a and at most b children; a tree is niveau-connected if the following holds true: Every node knows its children and its parent, every node knows the maximum key in each of its sub-trees (that is the maximum contents of a leaf), and all nodes having the same depth know their neighbors on this depth.[36]

Figure 3.8 shows such a tree: An ordinary search starts up in the root, and since the tree has a depth of O(log(n)) with n containers contained in it, it takes O(log(n)) steps to find the desired leaf (or the position right behind this leaf if the search is not successful).

Finger search, however, opens up the possibility of searching in sub-logarithmic time: This search method assumes that we know a certain container from which we start the search (instead of starting from the root), because we know that what we are searching for lies in the proximity of this container. An item that refers to this start container is called a finger. Fingers can be used for pointing to fields of high activity in a sorted sequence.

If there are d containers lying between the container from which the search starts and the one at which it ends, this search takes only O(log(d)) steps because it can walk up the tree lying over the sequence and use suitable shortcomes, see Exercise 39. If d is small compared to n, that is, if considerably more key-value pairs are contained in the sorted sequence than are lying between the start and end container of the finger search, then finger search is considerably faster than an ordinary search starting at the root. Figure 3.8 shows the difference between these two search methods.

Figure 3.8. Finger search

Finger search

A finger search of beautiful Hanne to beautiful Karin. Hanne is pointed to with a finger. The search first walks up the tree, turns around relatively soon (since Karin lies in the distance of only d to Hanne) and therefore reaches Karin in O(log d) steps. In contrast, a search starting at the root would have taken O(log n) steps. Finger search pays off if d is small.


The class sortseq offers finger search in form of the following methods: A call

seq_item it = S.finger_locate_from_front(k);

starts with an implicit finger pointing to the first, leftmost container of the sequence and searches for the container with the key k. In contrast,

seq_item it = S.finger_locate_from_rear(k);

starts from the last, rightmost container. If the container searched for is at position d in the sorted sequence, these methods take time O(log d) and O(log(n-d)), respectively. One should use these methods if one knows that the container searched for lies near one of the two end containers of the sequence (and then select the suitable method depending on this end container). These methods are semantically equivalent to the ordinary locate(), that is, they return an item referring to the container with the smallest key k' that is equal to or larger than the key k searched for, and nil, if no such key exists.

One can even search much more efficiently if one approximately knows the position at which the container searched for lies in the sequence, that is, if one has an explicit finger pointing to it. Such a finger is an item finger_it that refers to a container near the container searched for. A call

seq_item it = S.finger_locate(finger_it, k);

starts the finger search for the key k at the container that finger_it refers to. This method also is semantically equivalent to locate(), but much faster under certain circumstances: This method takes only O(log min(d,n-d)) steps if there are d containers lying between the start and target container.

If, in contrast, we neither know an explicit finger, nor know whether the key k searched for lies near the beginning or the end of the sorted sequence, a call of

seq_item it = S.finger_locate(k);

may pay off. This method searches simultaneously from the beginning and the end of the sequence, so it alternately executes one step of finger_locate_from_front() and one step of finger_locate_from_rear(). Therefore, its running time is the minimum of the running times of the latter two methods, asymptotically speaking; the actual running time is larger by the factor of 2, though.

Fast insertion

After a fast search we also want to insert fast. For this purpose, sortseq keeps ready the method insert_at(), which also needs a finger; this finger tells it at which position it shall insert a new container.

A call

seq_item new_it = S.insert_at(finger_it, k, i);

first checks whether the container that finger_it refers to contains the key k. If this is the case, the information is set to i in this container. Otherwise, insert_at() inserts a new key-value pair (k,i) behind or in front of the container (k',i') that finger_it refers to. It is a precondition here that k' be either the largest key smaller than k or the smallest key larger than k. In the first case, (k,i) is put in order behind (k',i'), in the second case in front of this pair. So in either case, the insertion takes place next to (k',i') and an item is returned that refers to the container just inserted.

Though, if it is already known that k is larger than k', that is, the insertion must take place behind (k',i'), a call of

seq_item new_it = S.insert_at(finger_it, k, i, LEDA::after);

is even more efficient. If, in contrast, it is known that k is smaller than k', LEDA::before should be specified as the direction of the insertion.

Every version of the method insert_at() takes only expected time O(1).

An experiment

Sorted sequences are frequently used for sorting; they are particularly efficient if the input is already presorted for the most part. Let us imagine for an experiment that numbers arrive in the input in the following order:

n-1, n-2, n-3, ..., n-f, 0, 1, 2, 3, ..., n-f-1

Therein, n be a very large integer number, for example 5,000,000, and f be a small integer number, for example 50. We want to sort this input sequence of 5,000,000 elements, which is already presorted for the most part, by successively inserting the elements into a sortseq.

As the first possibility, we use the normal insert(), which starts off from the root for every insertion. The total time for n insertions will be O(n·log n), since every search takes O(log n).[37]

If we think a little bit about the shape of the sequence, we see that the first f insertions all take place at the beginning of the sortseq, all other n-f (and this being the largest part) exactly in front of the f-th element at the end of the sortseq. So it would be wiser to start with the search from the end of the sortseq by means of finger_locate_from_rear() and then, after a finger has been found pointing to the insertion position, to insert the new element by means of insert_at() there. Here the overall running time will be only O(n·log f) because searching from the end will take only time O(log f). We get the same asymptotic running time if we work with the version of finger_locate() that searches from the beginning and the end of the sequence in parallel.

The stroke of genius is coming now, though: If we look at the sequence even more closely, we notice that every insertion takes place exactly one position in front of or behind the last inserted element! It is therefore even wiser to memorize a finger on the last inserted container and to always start the finger search from there. We then need only time O(1) for searching, and altogether, we get a linear time algorithm: The complete running time is only O(n).

The following program executes these experiments. It first creates a list that consists of the above number sequence. Then it inserts every element of the list into a sortseq by means of the methods discussed above. The times are measured with the help of used_time(). To level out measuring errors, every experiment is executed 10 times.

Filename: FingerSearchExperimentCleverRepeated.C
LEDA users as of version 5.0: Note that header files are now partitioned into modules (subfolders)!
#include <LEDA/sortseq.h>
#include <LEDA/list.h>
#include <LEDA/misc.h>
#include <iostream>

using leda::sortseq;
using leda::seq_item;
using leda::used_time;
using leda::list;
using std::cout;
using std::endl;


int main()
{
  const int n = 5000000;
  const int f = 50;
  const int NUM_EXP = 10; // number of repeated experiments
  
  cout << "n = " << n << "   f = " << f; 
  cout << "   # repeated experiments = " << NUM_EXP << endl;

  sortseq<int,int> S;
  list<int> L;

  seq_item it;
  float startTime;

  // create list of n integers
  for(int i = 1; i <= f; i++)
    L.append(n-i);
  for(int i = 0; i <= n - f - 1; i++)
    L.append(i);

  int k;
  leda::list_item lit;

  // experiment 1:
  // insert all ints without finger search
  startTime = used_time(); 
  for(int num_exp = 0; num_exp < NUM_EXP; num_exp++) {
    S.clear();
    forall(k, L) 
      S.insert(k, 0);
  }
  cout << "Time to insert without finger search: " 
       << used_time(startTime) << endl;


  // experiment 2:
  // insert all ints with finger search from rear end
  // make sortseq empty for 2nd experiment
  startTime = used_time(); 
  for(int num_exp = 0; num_exp < NUM_EXP; num_exp++) {
    S.clear();
    // insert first element of L into S
    lit = L.first();
    S.insert(L.contents(lit), 0);
    lit = L.succ(lit);
    // insert remaining elements of S into L
    while(lit != nil)
      {
        k = L.contents(lit);
        it = S.finger_locate_from_rear(k); 
        if(it) S.insert_at(it, k, 0, LEDA::before);
        else   S.insert_at(S.max_item(), k, 0, LEDA::after);
        lit = L.succ(lit);
      }
  }
  cout << "Time to insert with finger search from rear: " 
       << used_time(startTime) << endl;

  // experiment 3:
  // insert all ints with finger search from both ends
  startTime = used_time();
  for(int num_exp = 0; num_exp < NUM_EXP; num_exp++) {
    S.clear();
    lit = L.first();
    S.insert(L.contents(lit), 0);
    lit = L.succ(lit);
    while(lit != nil)
      {
        k = L.contents(lit);
        it = S.finger_locate(k);
        if(it) S.insert_at(it, k, 0, LEDA::before);
        else   S.insert_at(S.max_item(), k, 0, LEDA::after);
        lit = L.succ(lit);
      }
  }
  cout << "Time to insert with finger search from both ends: " 
       << used_time(startTime) <<" "<< endl;

  
  // experiment 4:
  // insert all ints with previous finger search
  startTime = used_time(); 
  for(int num_exp = 0; num_exp < NUM_EXP; num_exp++) {
    S.clear();
    lit = L.first();
    it = S.insert(L.contents(lit), 0);
    lit = L.succ(lit);
    while(lit != nil)
      {
        k = L.contents(lit);
        it = S.finger_locate(it, k); // use it here as the finger
        it = ( it ? S.insert_at(it, k, 0, LEDA::before)
               : S.insert_at(S.max_item(), k, 0, LEDA::after)
               );  // again, store it for succeeding finger search
        lit = L.succ(lit);
      }
  }
  cout << "Time to insert with finger search from last insertion: " 
       << used_time(startTime) <<" "<< endl;
  
  cout << endl;
}

The first element in a sortseq must always be inserted by an insert() because no item can be available at this time yet. In the experiments 2, 3, and 4, the first element is inserted intentionally outside the loop in order not to have to make the test whether the sortseq is still empty in every iteration; this would cause additional costs of O(n) and thereby could distort the measurement results.

A test run on one of the machines available to the author produced the following times:

Time to insert without finger search: 26.989
Time to insert with finger search from rear: 23.463
Time to insert with finger search from both ends: 24.846
Time to insert with finger search from last insertion: 15.242

We see by this that it always pays off to use finger search. As expected, the version with parallel finger search from the beginning and the end is slower than the one that searches only from the end. Despite this parallel finger search taking twice the time as before, it is not fundamentally slower because the biggest part of the time is used up for inserting. The version memorizing a finger to the last inserted element and starting the next search from there is considerably faster than all other ones.

Exercises

Exercise 39.

Finger search works conceptually as follows: Let us assume that the finger points to a container containing the key H and that we search for the key after the key K, see Figure 3.8. First finger search checks whether K lies on the left or on the right of H; K lie on the right, for example. Then the search walks up the search tree, starting from H and reaching the node v. There it checks whether K lies in one of the subtrees of v or of v's right neighbor on the same niveau. (This can be found out very fast because every node knows the maximum keys of its subtrees.) If this is the case, the search turns around at v or at v's right neighbor and searches for K in the respective subtree. Otherwise, it continues walking up to v's parent w and checks again whether K lies in a subtree of w or of the right neighbor of w; at the root at the latest, it turns around and searches to below.

Work out why finger search takes only O(log(d)) steps if the key to be searched for lies in distance d from the finger. Hint: If the search does not turn around in v but in w on height h, what is an upper bound for the maximum number of keys in the sub-tree with root v'?

Exercise 40.

We said above that the method finger_locate(finger_it, k) takes only O(log min(d,n-d)) steps. However, the description of the basic idea with the help of the tree from Figure 3.8 only talks about O(log d). Explain the second term n-d in the minimum function and how this term possibly makes the running time even smaller.

Exercise 41.

Explain why the statement

it = ( it ? S.insert_at(it, k, 0, LEDA::before)
          : S.insert_at(S.max_item(), k, 0, LEDA::after)
       );

really should read exactly like it is in the above loop.



[35] The standard implementation of a sortseq is a skiplist. Here, however, in favor of the comprehensibility of the description, we want to start out from the assumption of a search tree. Moreover, a search tree can be defined as the actual implementation of a sortseq by specifying a corresponding implementation parameter.

[36] We do not want to discuss the details of the implementation of niveau-connected (a, b) trees here. It suffices to understand that one can search in such a tree efficiently in all directions.

[37] An exact analysis that we want to abstain from here shows that this also holds although n is small at the beginning and then grows more and more.




Imprint-Dataprotection