Learning objectives
Finger search 
The operations finger_locate() , finger_locate_from_front() , and finger_locate_from_rear() 
The operation insert_at() 
What one should not do in daily life, pointing to someone with a finger, may well show its advantages when searching in sorted sequences.
The search operations of sortseq
that we have discussed until now take
logarithmic time: If n containers are
stored in a sortseq
, then lookup()
and
locate()
take O(log(n)) steps to
find the respective container.
This is due to the fact that
sortseq
is implemented by a search tree that is lying over
the sorted sequence.^{[35]}
To be more precise, it is a niveauconnected
(a,b)tree: An (a,b) tree is a tree
in which every node has at least a and at most b children; a tree is
niveauconnected if the following holds true:
Every node knows its children and its parent,
every node knows the maximum key in each of its subtrees (that is
the maximum contents of a leaf), and all nodes having the same
depth know their neighbors on this depth.^{[36]}
Figure 3.8 shows such a tree: An ordinary search starts up in the root, and since the tree has a depth of O(log(n)) with n containers contained in it, it takes O(log(n)) steps to find the desired leaf (or the position right behind this leaf if the search is not successful).
Finger search, however, opens up the possibility of searching in sublogarithmic time: This search method assumes that we know a certain container from which we start the search (instead of starting from the root), because we know that what we are searching for lies in the proximity of this container. An item that refers to this start container is called a finger. Fingers can be used for pointing to fields of high activity in a sorted sequence.
If there are d containers lying between the container from which the search starts and the one at which it ends, this search takes only O(log(d)) steps because it can walk up the tree lying over the sequence and use suitable shortcomes, see Exercise 39. If d is small compared to n, that is, if considerably more keyvalue pairs are contained in the sorted sequence than are lying between the start and end container of the finger search, then finger search is considerably faster than an ordinary search starting at the root. Figure 3.8 shows the difference between these two search methods.
The class sortseq
offers finger
search in form of the following methods: A call
seq_item it = S.finger_locate_from_front(k);
starts with an implicit finger pointing to the
first, leftmost container of the sequence and searches for the
container with the key k
. In contrast,
seq_item it = S.finger_locate_from_rear(k);
starts from the last, rightmost container. If the container searched
for is at position d in the sorted sequence, these methods take time
O(log d) and O(log(nd)), respectively. One should use these methods
if one knows that the container searched for lies near one of the two
end containers of the sequence (and then select the suitable method
depending on this end container). These methods are semantically equivalent to the
ordinary locate()
, that is, they return an
item referring to the container with the smallest key k' that is
equal to or larger than the key k searched for, and
nil
, if no such key exists.
One can even search much more efficiently if one
approximately knows the position at which the container searched
for lies in the sequence, that is, if one has an
explicit finger pointing to it. Such a
finger is an item finger_it
that refers to a
container near the container searched for. A call
seq_item it = S.finger_locate(finger_it, k);
starts the finger search for the key k
at the
container that finger_it
refers to.
This method also is semantically equivalent to locate()
,
but much faster under certain circumstances: This method takes only
O(log min(d,nd)) steps if there are d containers lying between the
start and target container.
If, in contrast, we neither know an explicit finger, nor know whether the key k searched for lies near the beginning or the end of the sorted sequence, a call of
seq_item it = S.finger_locate(k);
may pay off. This method searches simultaneously from the beginning
and the end of the sequence, so it alternately executes one step of
finger_locate_from_front()
and one step of
finger_locate_from_rear()
. Therefore, its
running time is the minimum of the running
times of the latter two methods, asymptotically speaking; the actual running time is larger by the
factor of 2, though.
After a fast search we also want to insert
fast. For this purpose, sortseq
keeps
ready the method insert_at()
, which also
needs a finger; this finger tells it at which position it shall
insert a new container.
A call
seq_item new_it = S.insert_at(finger_it, k, i);
first checks whether the container that finger_it
refers to contains the key k. If this is the case, the information is
set to i in this container. Otherwise,
insert_at()
inserts a new keyvalue pair
(k,i) behind or in front of the container (k',i') that
finger_it
refers to. It is a precondition here that
k' be either the largest key smaller than k or the smallest key larger
than k. In the first case, (k,i) is put in order behind (k',i'), in
the second case in front of this pair. So in either case, the
insertion takes place next to (k',i') and an item
is returned that refers to the container just inserted.
Though, if it is already known that k is larger than k', that is, the insertion must take place behind (k',i'), a call of
seq_item new_it = S.insert_at(finger_it, k, i, LEDA::after);
is even more efficient. If, in contrast, it is known that k is smaller
than k', LEDA::before
should be specified as the
direction of the insertion.
Every version of the method insert_at()
takes only
expected time O(1).
Sorted sequences are frequently used for sorting; they are particularly efficient if the input is already presorted for the most part. Let us imagine for an experiment that numbers arrive in the input in the following order:
n1, n2, n3, ..., nf, 0, 1, 2, 3, ..., nf1
Therein, n be a very large integer number, for example 5,000,000, and f be a
small integer number, for example 50. We want to sort this input sequence of
5,000,000 elements, which is already presorted for the most part, by
successively inserting the elements into a sortseq
.
As the first possibility, we use the normal
insert()
, which starts off from the
root for every insertion. The total time for n insertions will be
O(n·log n), since every search takes O(log n).^{[37]}
If we think a little bit about the shape of the
sequence, we see that the first f insertions all take place
at the beginning of the sortseq
,
all other nf (and this being the largest part) exactly in
front of the fth element at the end of the
sortseq
. So it would be wiser to start
with the search from the end of the
sortseq
by means of
finger_locate_from_rear()
and then,
after a finger has been found pointing to the insertion
position, to insert the new element by means of
insert_at()
there. Here the overall
running time will be only O(n·log f) because searching from
the end will take only time O(log f). We get the same
asymptotic running time if we work with the version of
finger_locate()
that searches from
the beginning and the end of the sequence in parallel.
The stroke of genius is coming now, though: If we look at the sequence even more closely, we notice that every insertion takes place exactly one position in front of or behind the last inserted element! It is therefore even wiser to memorize a finger on the last inserted container and to always start the finger search from there. We then need only time O(1) for searching, and altogether, we get a linear time algorithm: The complete running time is only O(n).
The following program executes these experiments. It
first creates a list that consists of the above number
sequence. Then it inserts every element of the list into a
sortseq
by means of the methods
discussed above. The times are measured with the help of
used_time()
. To level out measuring
errors, every experiment is executed 10 times.
#include <LEDA/sortseq.h> #include <LEDA/list.h> #include <LEDA/misc.h> #include <iostream> using leda::sortseq; using leda::seq_item; using leda::used_time; using leda::list; using std::cout; using std::endl; int main() { const int n = 5000000; const int f = 50; const int NUM_EXP = 10; // number of repeated experiments cout << "n = " << n << " f = " << f; cout << " # repeated experiments = " << NUM_EXP << endl; sortseq<int,int> S; list<int> L; seq_item it; float startTime; // create list of n integers for(int i = 1; i <= f; i++) L.append(ni); for(int i = 0; i <= n  f  1; i++) L.append(i); int k; leda::list_item lit; // experiment 1: // insert all ints without finger search startTime = used_time(); for(int num_exp = 0; num_exp < NUM_EXP; num_exp++) { S.clear(); forall(k, L) S.insert(k, 0); } cout << "Time to insert without finger search: " << used_time(startTime) << endl; // experiment 2: // insert all ints with finger search from rear end // make sortseq empty for 2nd experiment startTime = used_time(); for(int num_exp = 0; num_exp < NUM_EXP; num_exp++) { S.clear(); // insert first element of L into S lit = L.first(); S.insert(L.contents(lit), 0); lit = L.succ(lit); // insert remaining elements of S into L while(lit != nil) { k = L.contents(lit); it = S.finger_locate_from_rear(k); if(it) S.insert_at(it, k, 0, LEDA::before); else S.insert_at(S.max_item(), k, 0, LEDA::after); lit = L.succ(lit); } } cout << "Time to insert with finger search from rear: " << used_time(startTime) << endl; // experiment 3: // insert all ints with finger search from both ends startTime = used_time(); for(int num_exp = 0; num_exp < NUM_EXP; num_exp++) { S.clear(); lit = L.first(); S.insert(L.contents(lit), 0); lit = L.succ(lit); while(lit != nil) { k = L.contents(lit); it = S.finger_locate(k); if(it) S.insert_at(it, k, 0, LEDA::before); else S.insert_at(S.max_item(), k, 0, LEDA::after); lit = L.succ(lit); } } cout << "Time to insert with finger search from both ends: " << used_time(startTime) <<" "<< endl; // experiment 4: // insert all ints with previous finger search startTime = used_time(); for(int num_exp = 0; num_exp < NUM_EXP; num_exp++) { S.clear(); lit = L.first(); it = S.insert(L.contents(lit), 0); lit = L.succ(lit); while(lit != nil) { k = L.contents(lit); it = S.finger_locate(it, k); // use it here as the finger it = ( it ? S.insert_at(it, k, 0, LEDA::before) : S.insert_at(S.max_item(), k, 0, LEDA::after) ); // again, store it for succeeding finger search lit = L.succ(lit); } } cout << "Time to insert with finger search from last insertion: " << used_time(startTime) <<" "<< endl; cout << endl; }
The first element in a sortseq
must always be inserted by an
insert()
because no item can be
available at this time yet. In the experiments 2, 3, and 4,
the first element is inserted intentionally
outside the loop in order not to have to make the test whether
the sortseq
is still empty in every
iteration; this would cause additional costs of O(n) and
thereby could distort the measurement results.
A test run on one of the machines available to the author produced the following times:
Time to insert without finger search: 26.989 Time to insert with finger search from rear: 23.463 Time to insert with finger search from both ends: 24.846 Time to insert with finger search from last insertion: 15.242
We see by this that it always pays off to use finger search. As expected, the version with parallel finger search from the beginning and the end is slower than the one that searches only from the end. Despite this parallel finger search taking twice the time as before, it is not fundamentally slower because the biggest part of the time is used up for inserting. The version memorizing a finger to the last inserted element and starting the next search from there is considerably faster than all other ones.
Exercise 39.  Finger search works conceptually as follows: Let us assume that the finger points to a container containing the key H and that we search for the key after the key K, see Figure 3.8. First finger search checks whether K lies on the left or on the right of H; K lie on the right, for example. Then the search walks up the search tree, starting from H and reaching the node v. There it checks whether K lies in one of the subtrees of v or of v's right neighbor on the same niveau. (This can be found out very fast because every node knows the maximum keys of its subtrees.) If this is the case, the search turns around at v or at v's right neighbor and searches for K in the respective subtree. Otherwise, it continues walking up to v's parent w and checks again whether K lies in a subtree of w or of the right neighbor of w; at the root at the latest, it turns around and searches to below. Work out why finger search takes only O(log(d)) steps if the key to be searched for lies in distance d from the finger. Hint: If the search does not turn around in v but in w on height h, what is an upper bound for the maximum number of keys in the subtree with root v'? 
Exercise 40.  We said above that the method

Exercise 41.  Explain why the statement it = ( it ? S.insert_at(it, k, 0, LEDA::before) : S.insert_at(S.max_item(), k, 0, LEDA::after) ); really should read exactly like it is in the above loop. 
^{[35] }The standard implementation
of a sortseq
is a skiplist. Here,
however, in favor of the comprehensibility of the description,
we want to start out from the assumption of a search
tree. Moreover, a search tree can be defined as the actual
implementation of a sortseq
by specifying
a corresponding implementation
parameter.
^{[36] }We do not want to discuss the details of the implementation of niveauconnected (a, b) trees here. It suffices to understand that one can search in such a tree efficiently in all directions.
^{[37] }An exact analysis that we want to abstain from here shows that this also holds although n is small at the beginning and then grows more and more.