2.3.4. Sorting lists

Learning objectives

Sorting lists
Removing duplicates from lists
Outputting lists
Merging lists

In principle, there can be four reasons to sort a collection of objects:

A problem definition of the latter kind is referred to as a unification problem. Let us imagine for example that we have a list of names in which many names occur repeatedly. This could be entries in a visitor's book of a web site, in which the same people make their highly qualified comments again and again. We want to know now which different people have left an entry. We have stored the list of names in a file SomeNames.txt. It could look as follows:

Doris Böhm
Christian Uhrig
Doris Böhm
Mick Jagger
Christian Uhrig

To find out only the different names therein, we read in all names into a list L, sort the list by a call of

L.sort();

and remove all duplicates by

L.unique();

from the sorted version:

Filename: NameUniq.C
LEDA users as of version 5.0: Note that header files are now partitioned into modules (subfolders)!
#include <LEDA/list.h>
#include <LEDA/string.h>
#include <iostream>

using leda::list;
using leda::string;

int main() 
{

  list<string> L;
  string s;

  while(std::cin) {
    s.read_line();
    L.append(s);
  }

  L.sort();

  L.unique();

  L.print('\n');
}

We get our input by redirecting standard input.

$ NameUniq < Data/SomeNames.txt
 
 
Doris Böhm
Christian Uhrig
Mick Jagger$
[Note]Note

Actually, the two blank lines at the beginning of the above output should not appear. The first one is due to the fact that print('\n') outputs a separator already in front of the first element, the second one to the fact that read_line() interprets the last separator of the input as an empty string. Both misconducts are eliminated as of version 4.5 of LEDA.

The method sort() arranges the linking information of the containers in a sorted order. Precondition is that the attribute type T be linearly ordered by a comparison function

int compare(const T&, const T&);

Such a function is already predefined for strings. We will learn how to sort self-defined types in Section 2.3.5. We will get to know more about linearly ordered types in general in Section 2.8.3.

The method unique() removes duplicates from a list. Here it is precondition that the list be sorted in accordance with the order defined on the attribute type.

With a call of

L.print();

a list can be output in an easy and comfortable way. A separator, which is output between the single values, can be passed to the method. It can even be told by specification of an output stream where it shall direct the output to. Its complete signature is

print(ostream& O, char space = ' ');

There are also other sorting methods:

L.merge_sort();
L.bucket_sort(f);

are stable sorting algorithms, that is, if container A has the same contents as container B, and A precedes B in the sequence of the containers before the sorting, then the same still holds after the sorting. merge_sort() is faster than sort() if the list contains long, pre-sorted intervals.

bucket_sort() needs a function f() as argument that “throws the individual elements into numbered buckets”. This means it must return an integer number for every element of the attribute type T, which should originate from a not too large interval [l..r]. (The return value is the number of the bucket in which the element will land.) Then it arranges the containers in an order of increasing values of f. To be able to use bucket_sort() efficiently, one should know the exact algorithm; it can be looked up for example in LEDA, A Platform for Combinatorial and Geometric Computing.

The method merge() is useful in some algorithms. It merges two already pre-sorted lists into a sorted list. In a call

L.merge(M);

the containers of M are inserted into L so that L remains sorted. M is emptied!




Imprint-Dataprotection