Learning objectives
| Finger search |
The operations finger_locate(), finger_locate_from_front(), and finger_locate_from_rear() |
The operation insert_at() |
What one should not do in daily life, pointing to someone with a finger, may well show its advantages when searching in sorted sequences.
The search operations of sortseq
that we have discussed until now take
logarithmic time: If n containers are
stored in a sortseq, then lookup() and
locate() take O(log(n)) steps to
find the respective container.
This is due to the fact that
sortseq is implemented by a search tree that is lying over
the sorted sequence.[35]
To be more precise, it is a niveau-connected
(a,b)-tree: An (a,b) tree is a tree
in which every node has at least a and at most b children; a tree is
niveau-connected if the following holds true:
Every node knows its children and its parent,
every node knows the maximum key in each of its sub-trees (that is
the maximum contents of a leaf), and all nodes having the same
depth know their neighbors on this depth.[36]
Figure 3.8 shows such a tree: An ordinary search starts up in the root, and since the tree has a depth of O(log(n)) with n containers contained in it, it takes O(log(n)) steps to find the desired leaf (or the position right behind this leaf if the search is not successful).
Finger search, however, opens up the possibility of searching in sub-logarithmic time: This search method assumes that we know a certain container from which we start the search (instead of starting from the root), because we know that what we are searching for lies in the proximity of this container. An item that refers to this start container is called a finger. Fingers can be used for pointing to fields of high activity in a sorted sequence.
If there are d containers lying between the container from which the search starts and the one at which it ends, this search takes only O(log(d)) steps because it can walk up the tree lying over the sequence and use suitable shortcomes, see Exercise 39. If d is small compared to n, that is, if considerably more key-value pairs are contained in the sorted sequence than are lying between the start and end container of the finger search, then finger search is considerably faster than an ordinary search starting at the root. Figure 3.8 shows the difference between these two search methods.
Figure 3.8. Finger search

A finger search of beautiful Hanne to beautiful Karin. Hanne is pointed to with a finger. The search first walks up the tree, turns around relatively soon (since Karin lies in the distance of only d to Hanne) and therefore reaches Karin in O(log d) steps. In contrast, a search starting at the root would have taken O(log n) steps. Finger search pays off if d is small.
The class sortseq offers finger
search in form of the following methods: A call
seq_item it = S.finger_locate_from_front(k);
starts with an implicit finger pointing to the
first, leftmost container of the sequence and searches for the
container with the key k. In contrast,
seq_item it = S.finger_locate_from_rear(k);
starts from the last, rightmost container. If the container searched
for is at position d in the sorted sequence, these methods take time
O(log d) and O(log(n-d)), respectively. One should use these methods
if one knows that the container searched for lies near one of the two
end containers of the sequence (and then select the suitable method
depending on this end container). These methods are semantically equivalent to the
ordinary locate(), that is, they return an
item referring to the container with the smallest key k' that is
equal to or larger than the key k searched for, and
nil, if no such key exists.
One can even search much more efficiently if one
approximately knows the position at which the container searched
for lies in the sequence, that is, if one has an
explicit finger pointing to it. Such a
finger is an item finger_it that refers to a
container near the container searched for. A call
seq_item it = S.finger_locate(finger_it, k);
starts the finger search for the key k at the
container that finger_it refers to.
This method also is semantically equivalent to locate(),
but much faster under certain circumstances: This method takes only
O(log min(d,n-d)) steps if there are d containers lying between the
start and target container.
If, in contrast, we neither know an explicit finger, nor know whether the key k searched for lies near the beginning or the end of the sorted sequence, a call of
seq_item it = S.finger_locate(k);
may pay off. This method searches simultaneously from the beginning
and the end of the sequence, so it alternately executes one step of
finger_locate_from_front() and one step of
finger_locate_from_rear(). Therefore, its
running time is the minimum of the running
times of the latter two methods, asymptotically speaking; the actual running time is larger by the
factor of 2, though.
After a fast search we also want to insert
fast. For this purpose, sortseq keeps
ready the method insert_at(), which also
needs a finger; this finger tells it at which position it shall
insert a new container.
A call
seq_item new_it = S.insert_at(finger_it, k, i);
first checks whether the container that finger_it
refers to contains the key k. If this is the case, the information is
set to i in this container. Otherwise,
insert_at() inserts a new key-value pair
(k,i) behind or in front of the container (k',i') that
finger_it refers to. It is a precondition here that
k' be either the largest key smaller than k or the smallest key larger
than k. In the first case, (k,i) is put in order behind (k',i'), in
the second case in front of this pair. So in either case, the
insertion takes place next to (k',i') and an item
is returned that refers to the container just inserted.
Though, if it is already known that k is larger than k', that is, the insertion must take place behind (k',i'), a call of
seq_item new_it = S.insert_at(finger_it, k, i, LEDA::after);
is even more efficient. If, in contrast, it is known that k is smaller
than k', LEDA::before should be specified as the
direction of the insertion.
Every version of the method insert_at() takes only
expected time O(1).
Sorted sequences are frequently used for sorting; they are particularly efficient if the input is already presorted for the most part. Let us imagine for an experiment that numbers arrive in the input in the following order:
n-1, n-2, n-3, ..., n-f, 0, 1, 2, 3, ..., n-f-1
Therein, n be a very large integer number, for example 5,000,000, and f be a
small integer number, for example 50. We want to sort this input sequence of
5,000,000 elements, which is already presorted for the most part, by
successively inserting the elements into a sortseq.
As the first possibility, we use the normal
insert(), which starts off from the
root for every insertion. The total time for n insertions will be
O(n·log n), since every search takes O(log n).[37]
If we think a little bit about the shape of the
sequence, we see that the first f insertions all take place
at the beginning of the sortseq,
all other n-f (and this being the largest part) exactly in
front of the f-th element at the end of the
sortseq. So it would be wiser to start
with the search from the end of the
sortseq by means of
finger_locate_from_rear() and then,
after a finger has been found pointing to the insertion
position, to insert the new element by means of
insert_at() there. Here the overall
running time will be only O(n·log f) because searching from
the end will take only time O(log f). We get the same
asymptotic running time if we work with the version of
finger_locate() that searches from
the beginning and the end of the sequence in parallel.
The stroke of genius is coming now, though: If we look at the sequence even more closely, we notice that every insertion takes place exactly one position in front of or behind the last inserted element! It is therefore even wiser to memorize a finger on the last inserted container and to always start the finger search from there. We then need only time O(1) for searching, and altogether, we get a linear time algorithm: The complete running time is only O(n).
The following program executes these experiments. It
first creates a list that consists of the above number
sequence. Then it inserts every element of the list into a
sortseq by means of the methods
discussed above. The times are measured with the help of
used_time(). To level out measuring
errors, every experiment is executed 10 times.
#include <LEDA/sortseq.h>
#include <LEDA/list.h>
#include <LEDA/misc.h>
#include <iostream>
using leda::sortseq;
using leda::seq_item;
using leda::used_time;
using leda::list;
using std::cout;
using std::endl;
int main()
{
const int n = 5000000;
const int f = 50;
const int NUM_EXP = 10; // number of repeated experiments
cout << "n = " << n << " f = " << f;
cout << " # repeated experiments = " << NUM_EXP << endl;
sortseq<int,int> S;
list<int> L;
seq_item it;
float startTime;
// create list of n integers
for(int i = 1; i <= f; i++)
L.append(n-i);
for(int i = 0; i <= n - f - 1; i++)
L.append(i);
int k;
leda::list_item lit;
// experiment 1:
// insert all ints without finger search
startTime = used_time();
for(int num_exp = 0; num_exp < NUM_EXP; num_exp++) {
S.clear();
forall(k, L)
S.insert(k, 0);
}
cout << "Time to insert without finger search: "
<< used_time(startTime) << endl;
// experiment 2:
// insert all ints with finger search from rear end
// make sortseq empty for 2nd experiment
startTime = used_time();
for(int num_exp = 0; num_exp < NUM_EXP; num_exp++) {
S.clear();
// insert first element of L into S
lit = L.first();
S.insert(L.contents(lit), 0);
lit = L.succ(lit);
// insert remaining elements of S into L
while(lit != nil)
{
k = L.contents(lit);
it = S.finger_locate_from_rear(k);
if(it) S.insert_at(it, k, 0, LEDA::before);
else S.insert_at(S.max_item(), k, 0, LEDA::after);
lit = L.succ(lit);
}
}
cout << "Time to insert with finger search from rear: "
<< used_time(startTime) << endl;
// experiment 3:
// insert all ints with finger search from both ends
startTime = used_time();
for(int num_exp = 0; num_exp < NUM_EXP; num_exp++) {
S.clear();
lit = L.first();
S.insert(L.contents(lit), 0);
lit = L.succ(lit);
while(lit != nil)
{
k = L.contents(lit);
it = S.finger_locate(k);
if(it) S.insert_at(it, k, 0, LEDA::before);
else S.insert_at(S.max_item(), k, 0, LEDA::after);
lit = L.succ(lit);
}
}
cout << "Time to insert with finger search from both ends: "
<< used_time(startTime) <<" "<< endl;
// experiment 4:
// insert all ints with previous finger search
startTime = used_time();
for(int num_exp = 0; num_exp < NUM_EXP; num_exp++) {
S.clear();
lit = L.first();
it = S.insert(L.contents(lit), 0);
lit = L.succ(lit);
while(lit != nil)
{
k = L.contents(lit);
it = S.finger_locate(it, k); // use it here as the finger
it = ( it ? S.insert_at(it, k, 0, LEDA::before)
: S.insert_at(S.max_item(), k, 0, LEDA::after)
); // again, store it for succeeding finger search
lit = L.succ(lit);
}
}
cout << "Time to insert with finger search from last insertion: "
<< used_time(startTime) <<" "<< endl;
cout << endl;
}
The first element in a sortseq
must always be inserted by an
insert() because no item can be
available at this time yet. In the experiments 2, 3, and 4,
the first element is inserted intentionally
outside the loop in order not to have to make the test whether
the sortseq is still empty in every
iteration; this would cause additional costs of O(n) and
thereby could distort the measurement results.
A test run on one of the machines available to the author produced the following times:
Time to insert without finger search: 26.989 Time to insert with finger search from rear: 23.463 Time to insert with finger search from both ends: 24.846 Time to insert with finger search from last insertion: 15.242
We see by this that it always pays off to use finger search. As expected, the version with parallel finger search from the beginning and the end is slower than the one that searches only from the end. Despite this parallel finger search taking twice the time as before, it is not fundamentally slower because the biggest part of the time is used up for inserting. The version memorizing a finger to the last inserted element and starting the next search from there is considerably faster than all other ones.
Exercise 39. | Finger search works conceptually as follows: Let us assume that the finger points to a container containing the key H and that we search for the key after the key K, see Figure 3.8. First finger search checks whether K lies on the left or on the right of H; K lie on the right, for example. Then the search walks up the search tree, starting from H and reaching the node v. There it checks whether K lies in one of the subtrees of v or of v's right neighbor on the same niveau. (This can be found out very fast because every node knows the maximum keys of its subtrees.) If this is the case, the search turns around at v or at v's right neighbor and searches for K in the respective subtree. Otherwise, it continues walking up to v's parent w and checks again whether K lies in a subtree of w or of the right neighbor of w; at the root at the latest, it turns around and searches to below. Work out why finger search takes only O(log(d)) steps if the key to be searched for lies in distance d from the finger. Hint: If the search does not turn around in v but in w on height h, what is an upper bound for the maximum number of keys in the sub-tree with root v'? |
Exercise 40. | We said above that the method
|
Exercise 41. | Explain why the statement
it = ( it ? S.insert_at(it, k, 0, LEDA::before)
: S.insert_at(S.max_item(), k, 0, LEDA::after)
);really should read exactly like it is in the above loop. |
[35] The standard implementation
of a sortseq is a skiplist. Here,
however, in favor of the comprehensibility of the description,
we want to start out from the assumption of a search
tree. Moreover, a search tree can be defined as the actual
implementation of a sortseq by specifying
a corresponding implementation
parameter.
[36] We do not want to discuss the details of the implementation of niveau-connected (a, b) trees here. It suffices to understand that one can search in such a tree efficiently in all directions.
[37] An exact analysis that we want to abstain from here shows that this also holds although n is small at the beginning and then grows more and more.