2.3.5. User-defined type parameters

Learning objectives

Type parameters
Self-defined comparison functions
The LEDA rule “Requirements for type parameters

The container classes of LEDA are parameterized by an element type. For example, we can store elements of an arbitrary type T in lists. Until now, we have stored only objects of type int or string therein. However, any class for which a certain small set of methods and operators is defined can be used as a type parameter! In other words: We can put everything we want in lists provided that we implement some functions the class list expects from their type parameter.

Which methods and operators are required? It must be possible to create an object of type T and to initialize it with the default value of the type (default constructor) or with a copy of an already existing object (copy constructor). An object must be able to get the value of another object by assignment (assignment operator). Finally, an object must be in the position to be written to an output stream (operator >>) and to be read from an input stream (operator <<). So the following methods and operators are necessary:

T::T();
T::T(const T&);
T& T::operator = (const T&);
istream& operator >> (istream&, T&);
ostream& operator << (ostream&, const T&);

These requirements are summarized in the LEDA rule “Requirements for type parameters”.

An example: Lists of string pairs in the computation of anagrams

And now for something completely different to be put into a list than only these boring integer numbers over and over again. What about a pair of strings?

A problem definition in whose solution such a type parameter occurs is the following one: Based on a dictionary file, we want to find as many anagrams as possible. A word is an anagram of another word if it consists of exactly the same letters, like for example the words “PYTHONIST” and “HYPNOTIST”. This has nothing to do with the fact that the author does not really like C++ and rather feels magically attracted by elegant, lean scripting languages.[17] The program introduced in this section has found these words as anagrams from each other;

We can find anagrams with the following algorithm: We need a dictionary file with as many words as possible of the English language.[18] It may start like this:

AA
AAH
AAHED
AAHING
AAHS
AAL
AALII
AALIIS
AALS
AARDVARK
AARDVARKS
AARDWOLF
AARDWOLVES
...

For every word, we calculate the signature; this is the sorted order of the letters. After this we sort the words according to their signature. Thereby, words with an equal signature are placed side by side; they are anagrams of each other. Thus, sorting serves here to form equivalence classes; we have already mentioned this formation of equivalence classes in Section 2.3.4.

To manage the pairs, each consisting of a word and its respective signature, in a list, we write a class StringPair. This is a compound type that must fulfill the regulations of the LEDA rule “Requirements for type parameters”. So it has a default constructor, a copy constructor and an assignment operator. In addition the input and output operator are overloaded in such a way that objects of type StringPair can be read in or output therewith. To make fast access to the private data members word and signature possible without a detour via the get-methods, we make these operators a friend of the class. With the so defined methods and operators we can use the class StringPair as a type parameter of a list.

We read all words from the file (piped in from standard input), calculate their signature, and store the word-signature pairs in a list<StringPair>. After that, we sort the list in accordance with a self-defined comparison function that compares objects of the type StringPair lexicographically by means of their signature value.

Filename: AnagramsTypeParameter.C
LEDA users as of version 5.0: Note that header files are now partitioned into modules (subfolders)!
#include <LEDA/array.h>
#include <LEDA/list.h>
#include <iostream>

using leda::string;
using leda::array;
using leda::list;
using leda::list_item;


class StringPair {
private:
  string word;
  string signature;

public:
  // 1) Default constructor
  StringPair() : word(), signature() {} 

  // 2) Copy constructor
  StringPair(const StringPair& p) 
    : word(p.word), signature(p.signature) {}

  // 3) Assignment operator
  StringPair& operator = (const StringPair& p) {
       if(this != &p) { word = p.word; signature = p.signature; }
       return *this;
       }

  // 4) Input operator
  friend std::istream& operator >> (std::istream& is, StringPair& p) 
    { is >> p.word >> p.signature; return is;}

  // 5) Output operator
  friend std::ostream& operator << 
    (std::ostream& os, const StringPair& p) 
  { os << p.word << " " << p.signature; return os;}

  // Constructor taking two string arguments
  StringPair(const string& s1, const string& s2) 
    : word(s1), signature(s2) {}

  // Access private data members
  const string& get_word() const { return word; }
  const string& get_signature() const { return signature; }
};


string compute_signature(const string& s) 
{
  int num_chars = s.length();
  array<char> A(num_chars);

  for(int i = 0; i < num_chars; i++)
    A[i]=s[i];

  A.sort();

  string result = s; // Reference counting, see sect. Strings
  for(int i = 0; i < num_chars; i++)
    result[i] = A[i]; // First assignment decouples

  return result;
}


// User defined ordering of type StringPair
int cmp_StringPair(const StringPair& p1, const StringPair& p2) {
  if(p1.get_signature() < p2.get_signature()) return -1;
  else if(p1.get_signature() > p2.get_signature()) return 1;
  return 0;
}


int main()
{
  string s;
  list<StringPair> L;

  // Read all words
  while(std::cin >> s)  {
    StringPair p(s, compute_signature(s));
    L.append(p);
  }

  // Sort according to user defined ordering
  L.sort(cmp_StringPair);

  // Print all sequences of equivalent words of length > 1
  list_item it,it_succ;
  it = L.first();
  it_succ = L.succ(it);

  while(it != L.last()) {
    if(L.contents(it).get_signature() 
       != L.contents(it_succ).get_signature()) {
      // Skip sequence of trivial length 1
      it = L.succ(it);
      it_succ = L.succ(it);
      continue;
    }

    // Print sequence of length > 1
    std::cout << L.contents(it).get_word();
    do {
      std::cout << " - ";
      std::cout << L.contents(it_succ).get_word();
      it_succ = L.succ(it_succ);
    }
    while(L.contents(it).get_signature()
          == L.contents(it_succ).get_signature());

    // Next sequence
    std::cout << std::endl;
    it = it_succ;
    it_succ = L.succ(it);
  }
}

The function compute_signature() returns the signature of a string by sorting its characters in an array and returning a string that consists of the sorted order of the characters.

The function cmp_StringPair() realizes a user-defined order on the type StringPair. Thereby, we make the type StringPair a linearly ordered type more or less. (We will learn more about linearly ordered types in Section 2.8.3.) We can pass this function to the method sort() of the class list as a parameter. We even must pass such a function because sorting is based on comparisons (and swapping) and because self-defined types do not have any default order.[19]

The main program builds up the list from word-signature pairs and then sorts it with the help of the functions sort() and cmp_StringPair(). Thereby, words with an equal signature are placed side by side. All words of such a (partial) sequence of equivalent words are anagrams of each other. Partial sequences of length 1 are trivial anagrams because every word is an anagram of itself. We therefore walk through the list and output all partial sequences with 2 or more words. In this way, so marvelously meaningful anagrams like the following are found:

CANAILLES - ALLIANCES
ACERBATED - ABREACTED
UPSTIR - PURIST
BRIDES - DEBRIS - REBIDS - BIDERS
WARRANTIES - RAINWATERS
ANALOG - AGONAL
BRABBLES - BLABBERS - BABBLERS

Exercises

Exercise 15.

Expand the above program so that the anagrams are output in the order of their multiplicity, that is, partial sequences of length n shall be output before partial sequences of length m if n > m. (In the above example, the line starting with BRIDES shall precede the line starting with BRABBLES.) Who finds the anagram with highest multiplicity?

Create a class Anagram. Every object of this type shall hold strings being anagrams of each other in an internal list. Every object shall return the multiplicity of its embodied anagram on demand. Collect all Anagrams in a list and sort this list according to the multiplicity of the anagrams it stores.



[17] In the German version of the tutorial, you can read the words “SAARGEBIET” (the region in Germany where the home town of the author is located) and “ABREISETAG” (day of departure) here. This has nothing to do with the fact that the program was written and executed in Saarbrücken and possibly does not like its surroundings - it is deterministic and would have led to the same output everywhere.

[18] The dictionary file the author used contains 117,969 words. You can get this file in the download area of the tutorial.

This file contains the list of words permitted in crossword games such as Scrabble™. To be more precise, it is the official crossword list compatible with the second edition of the Official Scrabble Players Dictionary. It is the concatenation of two of the Moby Word Lists collected by Grady Ward, being part of the Project Gutenberg.

[19] Intentionally, we did not call this function compare here since this name denotes the default order of a type according to Section 2.8.3. Had we done so, we could have called sort() without any arguments since this sorting method is built up on the default order unless no other order is explicitly specified. Sometimes however, one wants to realize different orders on a type; then one needs different compare functions with different names. We showed here by means of an identifier different from compare how these different functions can be defined and used.




Imprint-Dataprotection