2.9.1. If one knows minimum and maximum in advance (class int_set)

Learning objectives

The class int_set

As an example we present a sorting algorithm that works on certain sets of integer numbers and that is particularly fast thereon.

Suppose we have stored non-negative integer numbers from a range [0..N] in a file SomeInts.txt. All numbers are different. It could for example be a large list of phone numbers. We want to output the list in sorted order. Since our main memory is scarce, we cannot afford to make every number an element of an ordinary array. An array of N ints would be too large but we can afford N/32 ints. (We assume a machine with M=32 bits per int here.) In addition, we must sort the rapidly changing list repeatedly in short time intervals so that an ordinary sorting algorithm would be too slow here. How we can solve this problem?[31]

We understand every number of the file as the number of a bit in an int_set. We read the file twice. In the first pass, we only determine the minimum and the maximum of the file's integer numbers to be able to create the int_set. When reading through for the second time, we set the bit at position i if we read the number i. After this, we run over the complete set from the minimum up to the maximum and output all elements contained in it. These are then exactly the numbers of the file in a sorted order. This algorithm needs linear time. (It is linear in the sum of the number of numbers entered and their interval size.)

Filename: FastIntegerSort.C
LEDA users as of version 5.0: Note that header files are now partitioned into modules (subfolders)!
#include <LEDA/int_set.h>
#include <fstream>
#include <iostream>
#include <limits>

using leda::int_set;

int main()
{
  std::ifstream is("Data/SomeInts.txt");

  int n;
  int min = std::numeric_limits<int>::max();
  int max = std::numeric_limits<int>::min();

  while (is >> n) {
    if(n < min) min = n;
    else if(n > max) max = n;
  }

  int_set S(min, max);

  is.clear();  // set state of stream to 'good'
  is.seekg(0); // rewind file 

  while (is >> n) 
    S.insert(n);
  
  for(int i = S.min(); i <= S.max(); i++)
    if(S.member(i))
      std::cout << i << "\n";  
}

The constructor of the class int_set expects the minimum and the maximum of the numbers to be stored in this set:

int_set S(min, max);

If, in contrast, only one argument N is passed to it, the class supposes the interval [0..N-1] as the range from which the elements originate.

As in the case of the class set the insert operation is called

S.insert(i);

Correspondingly, by

S.member(i);

it can be checked whether a certain number i is an element of the set and by

S.del(i);

the number i can be deleted from the set. All these operations take constant time.

A call of

S.clear();

deletes all numbers from the int_set.

The methods

S.min();
S.max();

return the minimum and the maximum, respectively, of the numbers stored in the set.

Furthermore, by the overloaded operators | and &

S | T;
S & T;

two sets can be united and intersected, respectively. These operations are also particularly fast since they are implemented as the logical Or and And of the individual ints of the underlying bit vector.

The operator

~S;

returns the complement of the set with respect to the underlying interval, that is a bit vector in which all bits are toggled. These operations have a time complexity that is proportional to the length of the bit vector.

Implementation and further information

Due to the implementation as a bit vector, the set operations perform particularly fast: If a is the minimum and b the maximum of the values stored, then intersection, union, complementation and emptying by clear() take time O(b-a+1). All other operations take constant time.

One finds more information about the class int_set on the corresponding manual page.



[31] A real life problem definition of this kind is described in [11].




Imprint-Dataprotection