Learning objectives
| The notions “dictionary type” and “associative container type” |
The class dictionary |
A dictionary in the classic sense, such as an English-German dictionary, stores pairs of information: To an English word a German word is associated. (We ignore the fact that, as a rule, several words are associated to one word because a word has several meanings.) Starting from the English word, the key, the associated German word, the value, can be retrieved easily. This is due to the fact that the keys are sorted alphabetically in a dictionary and that intuitively, every human being masters a version of binary search refined to 26 sub-cases.
As the example shows, in daily life, one must often associate some value information to some key information and find the value again via the key as fast as possible. Of course this problem definition appears also in algorithmics; there, data structures that store such key-value pairs and allow fast access to the associated value via the key are denoted as dictionary types or associative container types.
LEDA provides several data types for this. The first one
that we will get to know is called dictionary
itself for historical reasons, although this is actually the
generic term for all these structures.
Figure 3.1 shows an English-German dictionary, in which 4 key-value pairs are stored. Every key occurs only once therein; values, however, may occur multiply (associated to different keys).
The following program builds up this dictionary and allows the user to interactively find the corresponding German translation for English words.
#include <LEDA/dictionary.h>
#include <iostream>
using leda::dictionary;
using leda::dic_item;
using leda::string;
using std::cin;
using std::cout;
int main()
{
dictionary<string, string> D;
// insert some key-value-pairs
D.insert("work", "Arbeit");
D.insert("cat", "Katze");
D.insert("go", "gehen");
D.insert("francais", "franz�sich");
D.insert("job", "Arbeit");
// delete a pair with a given key
D.del("francais");
cout << "There are " << D.size() << " pairs in the dictionary.\n\n";
string english_word;
dic_item it;
while(1) {
cout << "Enter an English word, STOP to quit: ";
cin >> english_word;
if(english_word == "STOP") { break; }
// look up a pair by a given key
it = D.lookup(english_word);
if(it) {
string german_word = D.inf(it);
cout << english_word << " = " << german_word << "\n";
}
else cout << english_word << " not in dictionary.\n";
}
// change the value of a key in D
it = D.lookup("job");
D.change_inf(it, "Stelle");
// this is equivalent to
D[it] = "Stelle";
// iterate over all items
cout << "\nThe dictionary contains the following pairs:\n";
forall_items(it, D)
cout << D.key(it) << " = " << D.inf(it) << "\n";
}
Here is the input and output of an example run:
There are 4 pairs in the dictionary. Enter an English word, STOP to quit:catcat = Katze Enter an English word, STOP to quit:dogdog not in dictionary. Enter an English word, STOP to quit:francaisfrancais not in dictionary. Enter an English word, STOP to quit:jobjob = Arbeit Enter an English word, STOP to quit:STOPThe dictionary contains the following pairs: cat = Katze go = gehen job = Stelle work = Arbeit
At first the program creates a dictionary
D that stores pairs of strings. As in the case
of the simple container types, the type of the objects to be
stored is defined by template parameters.
Inserting a pair is carried out by a call
D.insert(k, v);
Every key k may be contained in the dictionary only once. If a pair (k,v) is inserted although k already occurs as the key of a pair (k,w), then the previous value w of k is replaced by v.
In its implementation, the class
dictionary also follows the item container
concept, which we got to know in Section 2.3.2: A direct access to the
key-value pairs is not possible; every access is carried out via
an item that refers to the respective key-value
container. (Unlike the containers of the simple container types,
the containers include two entries or attributes now: a key and a
value.) The item type is called
dic_item here.
Every insert() returns an item that
can be stored for a later access to the pair just inserted. (We do
not do this here.)
The method
D.del(k);
deletes a pair belonging to a key from the dictionary.
D.size();
returns the number of pairs contained in the dictionary.
it = D.lookup(k);
is probably the most important method of
dictionary. With a key k
passed in, it returns an item referring to the corresponding key-value
pair. If the key is not existent, it returns
nil. This operation is - like
insert() and delete() -
executed very fast even with a very large number of pairs already
contained in the dictionary. This is due to the implementation of a
dictionary as a balanced
tree: With n
pairs stored, these operations take only time O(log(n)).
By
D.change_inf(it, new_v);
the value belonging to a key can be changed to a new value
new_v via a corresponding item
it. (For historical reasons the values are denoted
as “information” in the LEDA nomenclature.)
For an item it, the subscript operator
D[it];
returns a reference to the value of the corresponding pair; one can also write-access the value with that.
The macro
forall_items(it, D) { ... }
iterates over all pairs contained in the dictionary; successively for
every pair, it assigns to the variable it an item
referring to the respective pair. With this item, all keys and values
of the dictionary can be accessed one after another by means of the
methods
D.key(it);
D.inf(it);
Are the pairs in an associative container stored in a
certain order and can they be traversed in this order? One could
imagine here the order of the points in time of their insertion or
an order that suggests itself by a linear order of the key type,
for example the lexicographical order of the type
string.
Let us have a look at the above output of the
forall_items loop: Obviously, the pairs are
traversed in the lexicographical order of their key type (here
the type string),
not in the order of their insertion. This
is due to the implementation of the class
dictionary as a balanced (search) tree:
The key type must be an ordered type, cf. Section 2.8.3.
Therefore not every type can be used as the key of the
class dictionary, only types
that are ordered according to a function
compare(). The other associative container
types of LEDA are also subject to such restrictions due to their
implementation: In general, not every type can be used as a
key. In case of doubt, the appropriate manual page should be
consulted.
The keys of a dictionary reside
in the leaves of the search tree, ordered from left to right in
accordance with the linear order. Hence the
forall_items loop always passes through the
keys in the order that is given according to the linear order.
The class dictionary is implemented
by (2,4)-trees. If n is the number of pairs in a
dictionary, the methods
insert(),
lookup(),
del_item(), and
del() take time O(log n); a
clear() takes time O(n), all other
methods take only constant time. The space consumption is O(n).
More about the class dictionary is
found on the corresponding manual page.
For example, starting from a key, a copy of the associated value
can be requested directly by the method
access(),
del_item() deletes the pair belonging
to an item from the dictionary, and
clear() deletes all pairs from the
dictionary.