5.2.2. Associating information to graphs

Learning objectives

The classes node_array and edge_array
The class GRAPH
The classes node_map and edge_map
Breadth first search with BFS()

As we already mentioned, it is necessary in most problem to store additional information with the nodes and/or the edges in order to access this information later starting from a certain node or a certain edge. Therefore LEDA offers several ways to do this, which we will got to know in the following.

Node arrays and edge arrays

We already introduced the classes node_array and edge_array and their usage briefly in Section 5.2.1. These classes allow us to associate additional information to the nodes and edges of a graph. In the following, we will talk about node arrays only; all that will be said holds accordingly for edge arrays.

The definitions

node_array<I> A(G);
node_array<I> B(G, x);

create two node arrays A and B that associate a value of type I to every node of the already constructed graph G. In the first case, a default value of type I is assigned to every node; in the second case, the explicitly denoted value x is assigned.

The value v that is associated to a node is accessed via the subscript operator, which, as always, returns a reference to the corresponding value:

A[v] = y;

Node arrays and edge arrays offer a very flexible way to associate information to the nodes and edges of a graph: On a given graph, one can define arbitrarily many such arrays at an arbitrary moment of the execution. Admittedly, the graph G has to be completely constructed at the time of the definition of these arrays. The node array is valid only for the nodes that are contained in G at this moment:

[Warning]Warning

If additional nodes (or edges) are added to a graph that a node array (or an edge array) is already defined on, the index set of this array is not augmented by these nodes (or edges)! A sequence of statements such as the following creates an error message:

node_array<int> dist(G);
node v = G.new_node();
dist[v] = 42;

Here, the use of a GRAPH or a node_map is more advantageous.

In some cases, one knows that the graph G will never contain more than n nodes and that this n is not very large. Then the constructor

node_array<I> A(G, n, x);

of node_array may be worthwhile: It allocates the space for n nodes altogether in the node array a priori, and associated the value x to every node. If m is the number of nodes in G at the time of this definition, n >= m must hold, and only n-m nodes altogether may be added to G without the node array becoming invalid.

As already said, arbitrarily many node arrays and edge arrays can be defined on a graph G. Instead of defining many different node arrays and edge arrays, the following alternative, which is applicable if the number of the arrays to be defined is known a priori, is more efficient: Provided that n_slots node arrays and e_slots edge arrays are needed, the constructor

graph G(n_slots, e_slots);

of the class graph creates a graph in which the internal structures for the representation of the nodes and edges contain space for additional entries (so-called “slots”) from n_slots node arrays and e_slots edge arrays. (We will not delve into the internal representation and the way of indirection here.) To use these additional entries for a certain node array A, this array has to be created as follows:

node_array<I> A;
A.use_node_data(G, x);

This assigns one of the slots to A and initializes all entries of the array to x. If no slot is available, A is created as an ordinary node array.

The class GRAPH

The next possibility for associating information to nodes or edges is offered by the class GRAPH. This class is suited in particular for dynamic graphs, that is, for graphs in which nodes and edges are added or deleted often and at unpredictable points in time.

The class GRAPH is derived, in the sense of C++, from the class graph; it realizes so-called parameterized graphs. Here “parameterized” means that a value of a certain information type I1 and a value of a certain information type I2 is associated to every node and every edge, respectively.

For example, a definition

GRAPH<string,int> G;

defines G as a parameterized graph in which a value of type string is associated to every node and a value of type int is associated to every edge.

A new node is created in G with

G.new_node(s);

With this, the information (here: the string) s is associated to the new node. Accordingly, an additional information i can be passed also to the method new_edge(); i is associated to the newly created edge:

G.new_edge(v1, v2, i);

The value associated to a node (an edge) can be accessed via a suitable node item v (a suitable edge item e) with the subscript operator:

node v;
edge e;
//... v and e get their values ...
G[v] = "LOVE";
G[e] = 42;

This data structure is fully dynamic, that is, information can be associated to the nodes and edges without restrictions, even if the graph grows or shrinks heavily. Furthermore, accessing the information via a node item or an edge item is a little more efficient than accessing it with a node array or an edge array. Only one piece of information, however, can be stored per node and per edge in a GRAPH:

[Note]Note

If one wants to associate several variables to a node or to an edge, one can use a composite type as a user-defined type parameter. Another possibility is to use one or several additional node_arrays and edge_arrays - or node_maps and edge_maps, which we will discuss later.

As a working example we want to implement our search “From LOVE to PAIN” from Section “A working example: Breadth first search in a graph” once again. This time, however, we do not want to write the algorithm for breadth first search by ourselves, of course: As any other fundamental graph algorithm, also this one is built into LEDA. It is available by the function BFS().

What do we have to do to use this built-in function BFS() properly in our problem? At first, we have to build the graph explicitly that the function has to search in. We did not do this in Section “A working example: Breadth first search in a graph”; there, we did not have a data structure for graphs yet, and we created the graph only implicitly during the program run by traversing it with breadth first search.

Again we read our list of 4-letter words from a file and create a node for every word read. This node has to know which word it carries. To associate this information, we use a GRAPH<string,int>. Here the type int of the edge information is not important - we dot not associate information to the edges, but we have to declare an information type; therefore we set all edge values to 0.

After we have created a node a for the word A, we look up whether there are nodes in the graph constructed so far that have to be neighbors to a. We find these nodes with the same method as in Section “A working example: Breadth first search in a graph”: We create all possibly adjacent words B by replacing the letters of A, and look up whether B already occurs in the graph. When we find such a node, we have to insert the corresponding edges to establish the neighborship relations. For every edge, the reverse edge has to be inserted, because the derivation relation of the word game is symmetric: If the word B is derivable from the word A, then also A is derivable from B. We find the node b that corresponds to the word B in the graph already constructed as follows: For every word, we have to memorize the node corresponding to this word; this is easily done with a d_array<string,node>. This d_array serves also for the foregoing test whether a certain word B occurs at all as a node in the graph.

After the graph G has been completely constructed, we let the user insert two words from and to, for which a derivation is to be searched. A suitable call

BFS(G, start, dist, pred);

executes a breadth first search in G, starting from the node start. The distance information that is calculated in this call for every visited node v is stored in the node array dist; the edge via that v is reached in the search, and which therefore lies on a shortest path from start to v, is stored in the node array pred. We use this node array pred at the end in order to walk along a shortest path from the node to back to the start node from.

Filename: SearchWordChainGraphBFS.C
LEDA users as of version 5.0: Note that header files are now partitioned into modules (subfolders)!
#include <LEDA/graph.h>
#include <LEDA/d_array.h>
#include <LEDA/list.h>
#include <LEDA/basic_graph_alg.h>

#include <fstream>
#include <iostream>

using leda::GRAPH;
using leda::node;
using leda::edge;
using leda::node_array;
using leda::d_array;
using leda::string;
using leda::list;
using std::cout;
using std::cin;

int main()
{
  std::ifstream is("Data/4LetterWords.txt");

  
  GRAPH<string,int> G;
  d_array <string,node> D;
  string s;

  // Read all words. Each word becomes a node of G
  while (is >> s) {
    node v = G.new_node(s);
    D[s] = v; // Which node belongs to which word?

    // Find all adjacent words and record adjacency in G
    // Step through all 4 letters of s
    for(int i = 0; i < 4; i++) {
      string s_adjacent = s;
      // Step through all letters of the alphabet
      for(char c = 'A' ; c <= 'Z'; c++) {
        if(c == s[i]) continue;
        // Candidate adjacent word: replace letter i
        s_adjacent[i] = c;
        if(D.defined(s_adjacent)) {
          // Adjacent word found; record adjacency by edges
          G.new_edge(D[s], D[s_adjacent], 0);
          G.new_edge(D[s_adjacent], D[s], 0); // backward edge
        }       
      }
    }
  }

  // Let user input search words
  string from, to;
  
  cout << "Start with word: ";
  cin >> from;
  cout << "End with word: ";
  cin >> to;
  
  if(!D.defined(from) || !D.defined(to)) {
    cout << "At least one of the words is not a part of the word list!\n";
    exit(1);
  }

  // Call Breadth First Search on G
  node_array<int> dist(G,-1);
  node_array<edge> pred(G);
  BFS(G, D[from], dist, pred);

  // Output distance and path from <from> to <to>
  int distance = dist[D[to]];
  cout << "The distance from " << from << " to " << to;
  cout << " is " << distance << "\n";

  if(distance == -1) {
    cout << "There is no way from " << from << " to " << to << ".\n";
  } 
  else {
    cout << "Here is a way to reach it:\n";
    node v = D[to];
    node start = D[from];
    list<node> L;
    L.push(v); // Output nodes in reverse order later
    while(v != start) {
      v = G.opposite(v, pred[v]);
      // v = G.source(pred[v]); is also possible here
      L.push(v);
    }
    
    int dcount = 0; 
    forall(v, L)
      cout << dcount++ << " " << G.inf(v) << "\n";
  }  

}

The shortest path from LISP to JAVA reads as follows:

Start with word: LISP
End with word: JAVA
The distance from LISP to JAVA is 5
Here is a way to reach it:
0 LISP
1 LIMP
2 LAMP
3 LAMA
4 LAVA
5 JAVA

So short? Yes, after all, all Turing complete programming languages are equal...

The function BFS() requires a node array node_array<int> that is valid on G and whose values are all initialized to -1. Furthermore, it returns a list of the nodes that have been reached in the search; this return value is not used here.

We get the way from to back to from by following the references in pred. For an edge and a node that is the source (the target) of the edge, the method opposite() returns the opposite node, which is then the target (the source) of the edge; we will describe this function in more detail in Section 5.2.4.

To print the derivation steps in the correct order, we collect all words on the way back in a list and print this list in reversed order. (This is done implicitly here, by appending every word in front of the list and, at the end, outputting this list from the front to behind.)

For a node_item or an edge_item it, the method

G.inf(it);

returns a copy of the value associated to the corresponding node or edge. In contrast to the operator [], the value cannot be changed by this.

Node maps and edge maps

There is a third possibility to provide graphs with information: Node maps and edge maps are a fully dynamic alternative to node arrays and edge arrays. They are offered by LEDA in the form of the classes node_map and edge_map.

The definitions

node_map<I> A(G);
node_map<I> B(G,x);

define two node maps on the graph G. In the first definition, the default value of type I is associated to every node (in the node map A); in the second definition, the explicitly denoted value x is associated (in the node map B). Edge maps are defined accordingly.

As with node arrays and edge arrays, the information associated to a node or an edge is accessed with the subscript operator [].

Node maps and edge maps use hashing to associate information to nodes and edges. The definition of a node map and an edge map takes constant time (in contrast to O(n) and O(m) for node arrays and edge arrays on a graph with n nodes and m edges, respectively), and accessing the information associated to a node or an edge has constant costs (as the expectation value).

As with node arrays and edge arrays, if the number of node maps and edge maps to be defined is known a priori, there is the possibility to use one of slots for the node values created in the constructor of the graph:

graph G(n_slots, e_slots);

Such a slot can be used by the following definitions:

node_map<I> A;
A.use_node_data(G, x);

By the way, the classes node_array and node_map share these slots and act exactly the same when using one of these slots with use_node_data().

A comparison of the possibilities

When shall one use which of the possibilities presented in this section? Table 5.1 tries to give some hints. Therein, n denotes the number of nodes in a graph. The table shows only the possibilities to associate information to a node; for the edges the corresponding statements hold with the corresponding types.

The main criterion is the question whether the graph is dynamic and whether it therefore will change after the definition of the respective structure, or whether it is static. Only in the latter case, node arrays and edge arrays are an option.

If the graph is static, the question is important to how many nodes and edges information shall be associated. If these are only a few, a node map or an edge map takes only little disk space and should therefore be preferred. If, in contrast, information is to be stored for more than half of the nodes or edges, node maps and edge maps are worthwhile.

Table 5.1. 5 possibilities to associate information to a graph

 node_arrayG(n_slots,e_slots) with A.use_node_data(G,x)node_array<I> A(graph G, int n, I x)GRAPHnode_map
Fully dynamic?nonoonly for n - |V| further calls of new_node()yesyes
Number of possible definitions on a given grapharbitrarily manyn_slotsarbitrarily many1arbitrarily many
Access timeO(1), fastO(1), very fast to fastO(1), fastO(1), very fastO(1) expected
Time for initializationO(n)O(n · n_slots)O(n)O(n)O(1)

Further information

The manual pages for node_array, edge_array, GRAPH, node_map, and edge_map contain further information about the respective classes.

Exercises

Exercise 50.

Replace the class GRAPH by a combination of graph and node_array in the above program. Then replace it by a combination of graph and node_map.




Imprint-Dataprotection