### 5.2.2. Associating information to graphs

Learning objectives

 The classes `node_array` and `edge_array` The class `GRAPH` The classes `node_map` and `edge_map` Breadth first search with `BFS()`

As we already mentioned, it is necessary in most problem to store additional information with the nodes and/or the edges in order to access this information later starting from a certain node or a certain edge. Therefore LEDA offers several ways to do this, which we will got to know in the following.

#### Node arrays and edge arrays

We already introduced the classes `node_array` and `edge_array` and their usage briefly in Section 5.2.1. These classes allow us to associate additional information to the nodes and edges of a graph. In the following, we will talk about node arrays only; all that will be said holds accordingly for edge arrays.

The definitions

```node_array<I> A(G);
node_array<I> B(G, x);```

create two node arrays `A` and `B` that associate a value of type `I` to every node of the already constructed graph `G`. In the first case, a default value of type `I` is assigned to every node; in the second case, the explicitly denoted value `x` is assigned.

The value `v` that is associated to a node is accessed via the subscript operator, which, as always, returns a reference to the corresponding value:

`A[v] = y;`

Node arrays and edge arrays offer a very flexible way to associate information to the nodes and edges of a graph: On a given graph, one can define arbitrarily many such arrays at an arbitrary moment of the execution. Admittedly, the graph G has to be completely constructed at the time of the definition of these arrays. The node array is valid only for the nodes that are contained in G at this moment:

Warning If additional nodes (or edges) are added to a graph that a node array (or an edge array) is already defined on, the index set of this array is not augmented by these nodes (or edges)! A sequence of statements such as the following creates an error message: ```node_array dist(G); node v = G.new_node(); dist[v] = 42;``` Here, the use of a `GRAPH` or a `node_map` is more advantageous.

In some cases, one knows that the graph G will never contain more than n nodes and that this n is not very large. Then the constructor

`node_array<I> A(G, n, x);`

of `node_array` may be worthwhile: It allocates the space for n nodes altogether in the node array a priori, and associated the value x to every node. If m is the number of nodes in G at the time of this definition, n >= m must hold, and only n-m nodes altogether may be added to G without the node array becoming invalid.

As already said, arbitrarily many node arrays and edge arrays can be defined on a graph G. Instead of defining many different node arrays and edge arrays, the following alternative, which is applicable if the number of the arrays to be defined is known a priori, is more efficient: Provided that n_slots node arrays and e_slots edge arrays are needed, the constructor

`graph G(n_slots, e_slots);`

of the class `graph` creates a graph in which the internal structures for the representation of the nodes and edges contain space for additional entries (so-called “slots”) from n_slots node arrays and e_slots edge arrays. (We will not delve into the internal representation and the way of indirection here.) To use these additional entries for a certain node array `A`, this array has to be created as follows:

```node_array<I> A;
A.use_node_data(G, x);```

This assigns one of the slots to `A` and initializes all entries of the array to `x`. If no slot is available, `A` is created as an ordinary node array.

#### The class `GRAPH`

The next possibility for associating information to nodes or edges is offered by the class `GRAPH`. This class is suited in particular for dynamic graphs, that is, for graphs in which nodes and edges are added or deleted often and at unpredictable points in time.

The class `GRAPH` is derived, in the sense of C++, from the class `graph`; it realizes so-called parameterized graphs. Here “parameterized” means that a value of a certain information type `I1` and a value of a certain information type `I2` is associated to every node and every edge, respectively.

For example, a definition

`GRAPH<string,int> G;`

defines `G` as a parameterized graph in which a value of type `string` is associated to every node and a value of type `int` is associated to every edge.

A new node is created in `G` with

`G.new_node(s);`

With this, the information (here: the string) `s` is associated to the new node. Accordingly, an additional information `i` can be passed also to the method `new_edge()`; `i` is associated to the newly created edge:

`G.new_edge(v1, v2, i);`

The value associated to a node (an edge) can be accessed via a suitable node item `v` (a suitable edge item `e`) with the subscript operator:

```node v;
edge e;
//... v and e get their values ...
G[v] = "LOVE";
G[e] = 42;```

This data structure is fully dynamic, that is, information can be associated to the nodes and edges without restrictions, even if the graph grows or shrinks heavily. Furthermore, accessing the information via a node item or an edge item is a little more efficient than accessing it with a node array or an edge array. Only one piece of information, however, can be stored per node and per edge in a `GRAPH`:

Note If one wants to associate several variables to a node or to an edge, one can use a composite type as a user-defined type parameter. Another possibility is to use one or several additional `node_array`s and `edge_array`s - or `node_map`s and `edge_map`s, which we will discuss later.

As a working example we want to implement our search “From LOVE to PAIN” from Section “A working example: Breadth first search in a graph” once again. This time, however, we do not want to write the algorithm for breadth first search by ourselves, of course: As any other fundamental graph algorithm, also this one is built into LEDA. It is available by the function `BFS()`.

What do we have to do to use this built-in function `BFS()` properly in our problem? At first, we have to build the graph explicitly that the function has to search in. We did not do this in Section “A working example: Breadth first search in a graph”; there, we did not have a data structure for graphs yet, and we created the graph only implicitly during the program run by traversing it with breadth first search.

Again we read our list of 4-letter words from a file and create a node for every word read. This node has to know which word it carries. To associate this information, we use a `GRAPH<string,int>`. Here the type `int` of the edge information is not important - we dot not associate information to the edges, but we have to declare an information type; therefore we set all edge values to 0.

After we have created a node a for the word A, we look up whether there are nodes in the graph constructed so far that have to be neighbors to a. We find these nodes with the same method as in Section “A working example: Breadth first search in a graph”: We create all possibly adjacent words B by replacing the letters of A, and look up whether B already occurs in the graph. When we find such a node, we have to insert the corresponding edges to establish the neighborship relations. For every edge, the reverse edge has to be inserted, because the derivation relation of the word game is symmetric: If the word B is derivable from the word A, then also A is derivable from B. We find the node b that corresponds to the word B in the graph already constructed as follows: For every word, we have to memorize the node corresponding to this word; this is easily done with a `d_array<string,node>`. This `d_array` serves also for the foregoing test whether a certain word B occurs at all as a node in the graph.

After the graph `G` has been completely constructed, we let the user insert two words `from` and `to`, for which a derivation is to be searched. A suitable call

`BFS(G, start, dist, pred);`

executes a breadth first search in `G`, starting from the node `start`. The distance information that is calculated in this call for every visited node `v` is stored in the node array `dist`; the edge via that `v` is reached in the search, and which therefore lies on a shortest path from `start` to `v`, is stored in the node array `pred`. We use this node array `pred` at the end in order to walk along a shortest path from the node `to` back to the start node `from`.

Filename: SearchWordChainGraphBFS.C
LEDA users as of version 5.0: Note that header files are now partitioned into modules (subfolders)!
```#include <LEDA/graph.h>
#include <LEDA/d_array.h>
#include <LEDA/list.h>
#include <LEDA/basic_graph_alg.h>

#include <fstream>
#include <iostream>

using leda::GRAPH;
using leda::node;
using leda::edge;
using leda::node_array;
using leda::d_array;
using leda::string;
using leda::list;
using std::cout;
using std::cin;

int main()
{
std::ifstream is("Data/4LetterWords.txt");

GRAPH<string,int> G;
d_array <string,node> D;
string s;

// Read all words. Each word becomes a node of G
while (is >> s) {
node v = G.new_node(s);
D[s] = v; // Which node belongs to which word?

// Step through all 4 letters of s
for(int i = 0; i < 4; i++) {
// Step through all letters of the alphabet
for(char c = 'A' ; c <= 'Z'; c++) {
if(c == s[i]) continue;
// Candidate adjacent word: replace letter i
G.new_edge(D[s_adjacent], D[s], 0); // backward edge
}
}
}
}

// Let user input search words
string from, to;

cin >> from;
cout << "End with word: ";
cin >> to;

if(!D.defined(from) || !D.defined(to)) {
cout << "At least one of the words is not a part of the word list!\n";
exit(1);
}

// Call Breadth First Search on G
node_array<int> dist(G,-1);
node_array<edge> pred(G);
BFS(G, D[from], dist, pred);

// Output distance and path from <from> to <to>
int distance = dist[D[to]];
cout << "The distance from " << from << " to " << to;
cout << " is " << distance << "\n";

if(distance == -1) {
cout << "There is no way from " << from << " to " << to << ".\n";
}
else {
cout << "Here is a way to reach it:\n";
node v = D[to];
node start = D[from];
list<node> L;
L.push(v); // Output nodes in reverse order later
while(v != start) {
v = G.opposite(v, pred[v]);
// v = G.source(pred[v]); is also possible here
L.push(v);
}

int dcount = 0;
forall(v, L)
cout << dcount++ << " " << G.inf(v) << "\n";
}

}
```

The shortest path from LISP to JAVA reads as follows:

```Start with word: `LISP`
End with word: `JAVA`
The distance from LISP to JAVA is 5
Here is a way to reach it:
0 LISP
1 LIMP
2 LAMP
3 LAMA
4 LAVA
5 JAVA```

So short? Yes, after all, all Turing complete programming languages are equal...

The function `BFS()` requires a node array `node_array<int>` that is valid on `G` and whose values are all initialized to -1. Furthermore, it returns a list of the nodes that have been reached in the search; this return value is not used here.

We get the way from `to` back to `from` by following the references in `pred`. For an edge and a node that is the source (the target) of the edge, the method `opposite()` returns the opposite node, which is then the target (the source) of the edge; we will describe this function in more detail in Section 5.2.4.

To print the derivation steps in the correct order, we collect all words on the way back in a list and print this list in reversed order. (This is done implicitly here, by appending every word in front of the list and, at the end, outputting this list from the front to behind.)

For a node_item or an edge_item `it`, the method

`G.inf(it);`

returns a copy of the value associated to the corresponding node or edge. In contrast to the operator `[]`, the value cannot be changed by this.

#### Node maps and edge maps

There is a third possibility to provide graphs with information: Node maps and edge maps are a fully dynamic alternative to node arrays and edge arrays. They are offered by LEDA in the form of the classes `node_map` and `edge_map`.

The definitions

```node_map<I> A(G);
node_map<I> B(G,x);```

define two node maps on the graph `G`. In the first definition, the default value of type `I` is associated to every node (in the node map `A`); in the second definition, the explicitly denoted value `x` is associated (in the node map `B`). Edge maps are defined accordingly.

As with node arrays and edge arrays, the information associated to a node or an edge is accessed with the subscript operator `[]`.

Node maps and edge maps use hashing to associate information to nodes and edges. The definition of a node map and an edge map takes constant time (in contrast to O(n) and O(m) for node arrays and edge arrays on a graph with n nodes and m edges, respectively), and accessing the information associated to a node or an edge has constant costs (as the expectation value).

As with node arrays and edge arrays, if the number of node maps and edge maps to be defined is known a priori, there is the possibility to use one of slots for the node values created in the constructor of the graph:

`graph G(n_slots, e_slots);`

Such a slot can be used by the following definitions:

```node_map<I> A;
A.use_node_data(G, x);```

By the way, the classes `node_array` and `node_map` share these slots and act exactly the same when using one of these slots with `use_node_data()`.

#### A comparison of the possibilities

When shall one use which of the possibilities presented in this section? Table 5.1 tries to give some hints. Therein, n denotes the number of nodes in a graph. The table shows only the possibilities to associate information to a node; for the edges the corresponding statements hold with the corresponding types.

The main criterion is the question whether the graph is dynamic and whether it therefore will change after the definition of the respective structure, or whether it is static. Only in the latter case, node arrays and edge arrays are an option.

If the graph is static, the question is important to how many nodes and edges information shall be associated. If these are only a few, a node map or an edge map takes only little disk space and should therefore be preferred. If, in contrast, information is to be stored for more than half of the nodes or edges, node maps and edge maps are worthwhile.

Table 5.1. 5 possibilities to associate information to a graph

 `node_array` `G(n_slots,e_slots)` with `A.use_node_data(G,x)` `node_array A(graph G, int n, I x)` `GRAPH` `node_map` Fully dynamic? no no only for n - |V| further calls of `new_node()` yes yes Number of possible definitions on a given graph arbitrarily many n_slots arbitrarily many 1 arbitrarily many Access time O(1), fast O(1), very fast to fast O(1), fast O(1), very fast O(1) expected Time for initialization O(n) O(n · n_slots) O(n) O(n) O(1)

#### Further information

The manual pages for `node_array`, `edge_array`, `GRAPH`, `node_map`, and `edge_map` contain further information about the respective classes.

#### Exercises

 Exercise 50. Replace the class `GRAPH` by a combination of `graph` and `node_array` in the above program. Then replace it by a combination of `graph` and `node_map`.