Learning objectives
| Type parameters |
| Self-defined comparison functions |
| The LEDA rule “Requirements for type parameters” |
The container classes of LEDA are parameterized by an
element type. For example, we can store elements of an
arbitrary type T in lists. Until now, we have
stored only objects of type int or
string therein. However, any class
for which a certain small set of methods and operators is defined
can be used as a type
parameter! In other words: We can put everything
we want in lists provided that we implement some functions the
class list expects from their type
parameter.
Which methods and operators are required? It must be
possible to create an object of type T and to
initialize it with the default value of the type
(default constructor) or with a copy of
an already existing object (copy
constructor). An object must be able to get the
value of another object by assignment (assignment
operator). Finally, an object must be in the
position to be written to an output stream (operator
>>) and to be read from an
input stream (operator <<). So
the following methods and operators are necessary:
T::T(); T::T(const T&); T& T::operator = (const T&); istream& operator >> (istream&, T&); ostream& operator << (ostream&, const T&);
These requirements are summarized in the LEDA rule “Requirements for type parameters”.
And now for something completely different to be put into a list than only these boring integer numbers over and over again. What about a pair of strings?
A problem definition in whose solution such a type parameter occurs is the following one: Based on a dictionary file, we want to find as many anagrams as possible. A word is an anagram of another word if it consists of exactly the same letters, like for example the words “PYTHONIST” and “HYPNOTIST”. This has nothing to do with the fact that the author does not really like C++ and rather feels magically attracted by elegant, lean scripting languages.[17] The program introduced in this section has found these words as anagrams from each other;
We can find anagrams with the following algorithm: We need a dictionary file with as many words as possible of the English language.[18] It may start like this:
AA AAH AAHED AAHING AAHS AAL AALII AALIIS AALS AARDVARK AARDVARKS AARDWOLF AARDWOLVES ...
For every word, we calculate the signature; this is the sorted order of the letters. After this we sort the words according to their signature. Thereby, words with an equal signature are placed side by side; they are anagrams of each other. Thus, sorting serves here to form equivalence classes; we have already mentioned this formation of equivalence classes in Section 2.3.4.
To manage the pairs, each consisting of a word and its
respective signature, in a list, we write a class
StringPair. This is a compound type that
must fulfill the regulations of the LEDA rule “Requirements for type parameters”. So it has a default
constructor, a copy constructor and an assignment operator. In
addition the input and output operator are overloaded in such a
way that objects of type StringPair can
be read in or output therewith. To make fast access to the
private data members word and
signature possible without a detour via the
get-methods, we make these operators a
friend of the class. With the so defined
methods and operators we can use the class
StringPair as a type parameter of a
list.
We read all words from the file (piped in from standard
input), calculate their signature, and store the word-signature
pairs in a list<StringPair>. After that, we sort
the list in accordance with a self-defined comparison function
that compares objects of the type
StringPair lexicographically by means of
their signature value.
#include <LEDA/array.h>
#include <LEDA/list.h>
#include <iostream>
using leda::string;
using leda::array;
using leda::list;
using leda::list_item;
class StringPair {
private:
string word;
string signature;
public:
// 1) Default constructor
StringPair() : word(), signature() {}
// 2) Copy constructor
StringPair(const StringPair& p)
: word(p.word), signature(p.signature) {}
// 3) Assignment operator
StringPair& operator = (const StringPair& p) {
if(this != &p) { word = p.word; signature = p.signature; }
return *this;
}
// 4) Input operator
friend std::istream& operator >> (std::istream& is, StringPair& p)
{ is >> p.word >> p.signature; return is;}
// 5) Output operator
friend std::ostream& operator <<
(std::ostream& os, const StringPair& p)
{ os << p.word << " " << p.signature; return os;}
// Constructor taking two string arguments
StringPair(const string& s1, const string& s2)
: word(s1), signature(s2) {}
// Access private data members
const string& get_word() const { return word; }
const string& get_signature() const { return signature; }
};
string compute_signature(const string& s)
{
int num_chars = s.length();
array<char> A(num_chars);
for(int i = 0; i < num_chars; i++)
A[i]=s[i];
A.sort();
string result = s; // Reference counting, see sect. Strings
for(int i = 0; i < num_chars; i++)
result[i] = A[i]; // First assignment decouples
return result;
}
// User defined ordering of type StringPair
int cmp_StringPair(const StringPair& p1, const StringPair& p2) {
if(p1.get_signature() < p2.get_signature()) return -1;
else if(p1.get_signature() > p2.get_signature()) return 1;
return 0;
}
int main()
{
string s;
list<StringPair> L;
// Read all words
while(std::cin >> s) {
StringPair p(s, compute_signature(s));
L.append(p);
}
// Sort according to user defined ordering
L.sort(cmp_StringPair);
// Print all sequences of equivalent words of length > 1
list_item it,it_succ;
it = L.first();
it_succ = L.succ(it);
while(it != L.last()) {
if(L.contents(it).get_signature()
!= L.contents(it_succ).get_signature()) {
// Skip sequence of trivial length 1
it = L.succ(it);
it_succ = L.succ(it);
continue;
}
// Print sequence of length > 1
std::cout << L.contents(it).get_word();
do {
std::cout << " - ";
std::cout << L.contents(it_succ).get_word();
it_succ = L.succ(it_succ);
}
while(L.contents(it).get_signature()
== L.contents(it_succ).get_signature());
// Next sequence
std::cout << std::endl;
it = it_succ;
it_succ = L.succ(it);
}
}
The function compute_signature()
returns the signature of a string by sorting its characters in
an array and returning a string that consists of the sorted
order of the characters.
The function cmp_StringPair()
realizes a user-defined
order on the type
StringPair. Thereby, we make the type
StringPair a linearly
ordered type more or less. (We will learn more
about linearly ordered types in Section 2.8.3.)
We can pass this function to the method
sort() of the class
list as a parameter. We even
must pass such a function because sorting is
based on comparisons (and swapping) and because self-defined types do
not have any default order.[19]
The main program builds up the list from word-signature
pairs and then sorts it with the help of the functions
sort() and
cmp_StringPair(). Thereby, words with an
equal signature are placed side by side. All words of
such a (partial) sequence of equivalent words are anagrams of
each other. Partial sequences of length 1 are trivial
anagrams because every word is an anagram of itself. We
therefore walk through the list and output all partial
sequences with 2 or more words. In this way, so marvelously
meaningful anagrams like the following are found:
CANAILLES - ALLIANCES ACERBATED - ABREACTED UPSTIR - PURIST BRIDES - DEBRIS - REBIDS - BIDERS WARRANTIES - RAINWATERS ANALOG - AGONAL BRABBLES - BLABBERS - BABBLERS
[17] In the German version of the tutorial, you can read the words “SAARGEBIET” (the region in Germany where the home town of the author is located) and “ABREISETAG” (day of departure) here. This has nothing to do with the fact that the program was written and executed in Saarbrücken and possibly does not like its surroundings - it is deterministic and would have led to the same output everywhere.
[18] The dictionary file the author used contains 117,969 words. You can get this file in the download area of the tutorial.
This file contains the list of words permitted in crossword games such as Scrabble™. To be more precise, it is the official crossword list compatible with the second edition of the Official Scrabble Players Dictionary. It is the concatenation of two of the Moby Word Lists collected by Grady Ward, being part of the Project Gutenberg.
[19] Intentionally, we did not call this function
compare here since this name denotes
the default order of a type according
to Section 2.8.3. Had we done
so, we could have called sort()
without any arguments since this sorting method is built up
on the default order unless no other order is explicitly
specified. Sometimes however, one wants to realize different
orders on a type; then one needs different compare functions
with different names. We showed here by means of an
identifier different from compare how
these different functions can be defined and used.