3.1. Dictionaries (class dictionary)

Learning objectives

The notions “dictionary type” and “associative container type
The class dictionary

A dictionary in the classic sense, such as an English-German dictionary, stores pairs of information: To an English word a German word is associated. (We ignore the fact that, as a rule, several words are associated to one word because a word has several meanings.) Starting from the English word, the key, the associated German word, the value, can be retrieved easily. This is due to the fact that the keys are sorted alphabetically in a dictionary and that intuitively, every human being masters a version of binary search refined to 26 sub-cases.

As the example shows, in daily life, one must often associate some value information to some key information and find the value again via the key as fast as possible. Of course this problem definition appears also in algorithmics; there, data structures that store such key-value pairs and allow fast access to the associated value via the key are denoted as dictionary types or associative container types.

LEDA provides several data types for this. The first one that we will get to know is called dictionary itself for historical reasons, although this is actually the generic term for all these structures.

Figure 3.1 shows an English-German dictionary, in which 4 key-value pairs are stored. Every key occurs only once therein; values, however, may occur multiply (associated to different keys).

Figure 3.1. An English-German dictionary

An English-German dictionary


The following program builds up this dictionary and allows the user to interactively find the corresponding German translation for English words.

Filename: EnglishGermanDict.C
LEDA users as of version 5.0: Note that header files are now partitioned into modules (subfolders)!
#include <LEDA/dictionary.h>
#include <iostream>

using leda::dictionary;
using leda::dic_item;
using leda::string;
using std::cin;
using std::cout;

int main() 
{
  dictionary<string, string> D;

  // insert some key-value-pairs
  D.insert("work", "Arbeit");
  D.insert("cat", "Katze");
  D.insert("go", "gehen");
  D.insert("francais", "franz�sich");
  D.insert("job", "Arbeit");

  // delete a pair with a given key
  D.del("francais");

  cout << "There are " << D.size() << " pairs in the dictionary.\n\n";

  string english_word;
  dic_item it;

  while(1) {
    cout << "Enter an English word, STOP to quit: ";
    cin >> english_word;
    if(english_word == "STOP") { break; }

    // look up a pair by a given key
    it = D.lookup(english_word);
    if(it) {
      string german_word = D.inf(it);
      cout << english_word << " = " << german_word << "\n";
    }
    else cout << english_word << " not in dictionary.\n";
  }

  // change the value of a key in D
  it = D.lookup("job");
  D.change_inf(it, "Stelle");
  // this is equivalent to
  D[it] = "Stelle";

  // iterate over all items
  cout << "\nThe dictionary contains the following pairs:\n";
  forall_items(it, D) 
    cout << D.key(it) << " = " <<  D.inf(it) << "\n";
}

Here is the input and output of an example run:

There are 4 pairs in the dictionary.

Enter an English word, STOP to quit: cat
cat = Katze
Enter an English word, STOP to quit: dog
dog not in dictionary.
Enter an English word, STOP to quit: francais
francais not in dictionary.
Enter an English word, STOP to quit: job
job = Arbeit
Enter an English word, STOP to quit: STOP

The dictionary contains the following pairs:
cat = Katze
go = gehen
job = Stelle
work = Arbeit

At first the program creates a dictionary D that stores pairs of strings. As in the case of the simple container types, the type of the objects to be stored is defined by template parameters.

Inserting a pair is carried out by a call

D.insert(k, v);

Every key k may be contained in the dictionary only once. If a pair (k,v) is inserted although k already occurs as the key of a pair (k,w), then the previous value w of k is replaced by v.

In its implementation, the class dictionary also follows the item container concept, which we got to know in Section 2.3.2: A direct access to the key-value pairs is not possible; every access is carried out via an item that refers to the respective key-value container. (Unlike the containers of the simple container types, the containers include two entries or attributes now: a key and a value.) The item type is called dic_item here.

Every insert() returns an item that can be stored for a later access to the pair just inserted. (We do not do this here.)

The method

D.del(k);

deletes a pair belonging to a key from the dictionary.

D.size();

returns the number of pairs contained in the dictionary.

it = D.lookup(k);

is probably the most important method of dictionary. With a key k passed in, it returns an item referring to the corresponding key-value pair. If the key is not existent, it returns nil. This operation is - like insert() and delete() - executed very fast even with a very large number of pairs already contained in the dictionary. This is due to the implementation of a dictionary as a balanced tree: With n pairs stored, these operations take only time O(log(n)).

By

D.change_inf(it, new_v);

the value belonging to a key can be changed to a new value new_v via a corresponding item it. (For historical reasons the values are denoted as “information” in the LEDA nomenclature.)

For an item it, the subscript operator

D[it];

returns a reference to the value of the corresponding pair; one can also write-access the value with that.

The macro

forall_items(it, D) { ... }

iterates over all pairs contained in the dictionary; successively for every pair, it assigns to the variable it an item referring to the respective pair. With this item, all keys and values of the dictionary can be accessed one after another by means of the methods

D.key(it);

and

D.inf(it);

respectively.

Order of the pairs in an associative container and requirements on the key type

Are the pairs in an associative container stored in a certain order and can they be traversed in this order? One could imagine here the order of the points in time of their insertion or an order that suggests itself by a linear order of the key type, for example the lexicographical order of the type string.

Let us have a look at the above output of the forall_items loop: Obviously, the pairs are traversed in the lexicographical order of their key type (here the type string), not in the order of their insertion. This is due to the implementation of the class dictionary as a balanced (search) tree: The key type must be an ordered type, cf. Section 2.8.3.

Therefore not every type can be used as the key of the class dictionary, only types that are ordered according to a function compare(). The other associative container types of LEDA are also subject to such restrictions due to their implementation: In general, not every type can be used as a key. In case of doubt, the appropriate manual page should be consulted.

The keys of a dictionary reside in the leaves of the search tree, ordered from left to right in accordance with the linear order. Hence the forall_items loop always passes through the keys in the order that is given according to the linear order.

Implementation

The class dictionary is implemented by (2,4)-trees. If n is the number of pairs in a dictionary, the methods insert(), lookup(), del_item(), and del() take time O(log n); a clear() takes time O(n), all other methods take only constant time. The space consumption is O(n).

Further information

More about the class dictionary is found on the corresponding manual page. For example, starting from a key, a copy of the associated value can be requested directly by the method access(), del_item() deletes the pair belonging to an item from the dictionary, and clear() deletes all pairs from the dictionary.

Exercises

Exercise 24.

Instead of a dictionary, one could also use two parallel arrays for storing pairs , that is, if a key k is at position i in the first array, the associated value v is also at position i in the second array. Why is a dictionary generally superior to this structure?

Exercise 25.

Write a function that, with a dictionary passed in as the argument, outputs all values v occurring in this dictionary, with every v followed by all keys having this value v. Every value v shall be output only once. (A value definitely can occur repeatedly in a dictionary, associated to different keys, but each key can occur only once!)

Find out whether the mapping key → value is invertible. If it is, create a second dictionary that implements this inversion. (One also wants to find an English word starting from a German word.)

What do you do if the mapping is not invertible?




Imprint-Dataprotection