2.8.2. Union, intersection, and difference of sets

Learning objectives

The methods join(), intersect(), and diff()

We turn to the set operations intersection, union, difference, and symmetric difference now. These also are supported by the class set.

Figure 2.25. The set operations intersection, union, and symmetric difference

The set operations intersection, union, and symmetric difference

Intersection (1), union (2), and symmetric difference (3) of two sets A and B.

Let us imagine that we run a big marriage bureau. The general isolation and loneliness of the modern person in the information society guarantees a large clientele for us. New customers join every year (because their willingness simply to address someone nice at the bus stop or in the café decreases more and more thanks to SMS) and some sign off (because we have found a compatible partner for them, or because they are simply sick and tired of the other sex). The customer numbers of the customers at the end of the years 2001 and 2002 are stored in the files Customers01.txt and Customers02.txt, respectively.

The file Customers01.txt looks for example like this:

7174
1234
5557
8181
9999
2788
5696

The file Customers02.txt like this:

1234
8181
2788
5696
2335
8912

We want to know now which customers we had within the two years altogether, which customers we attended to both in the first and in the second year, which customers we newly acquired in the year 2002, and which ones we lost. For this we must unite the sets of the customer numbers, intersect and subtract them from each other, respectively. This is what following program does.

Filename: SetOperations.C
#include <LEDA/set.h>
#include <fstream>
#include <iostream>

using leda::set;
using std::ifstream;
using std::cout;
using std::endl;

int main()
{
  ifstream is01("Data/Customers01.txt");
  ifstream is02("Data/Customers02.txt");

  set<int> S, T;
  
  int customer_id;

  while(is01 >> customer_id) 
      T.insert(customer_id);

  while(is02 >> customer_id) 
      S.insert(customer_id);
  
  set<int> All_customers = S.join(T);

  set<int> Both_years_customers = S.intersect(T);

  set<int> New_customers = S.diff(T);

  set<int> Signedoff_customers = T.diff(S);

  cout << "Altogether we had the following customers:\n";   
  forall(customer_id, All_customers) 
    cout << customer_id << " ";  
  cout << endl;

  cout << "The following were customers in both years:\n";   
  forall(customer_id, Both_years_customers) 
    cout << customer_id << " ";  
  cout << endl;

  cout << "The following customers are new:\n";   
  forall(customer_id, New_customers) 
    cout << customer_id << " ";  
  cout << endl;

  cout << "The following customers have signed off:\n";   
  forall(customer_id, Signedoff_customers) 
    cout << customer_id << " ";  
  cout << endl;
}

The program outputs the following:

Altogether we had the following customers:
1234 2335 2788 5557 5696 7174 8181 8912 9999
The following were customers in both years:
1234 2788 5696 8181
The following customers are new:
2335 8912
The following customers have signed off:
5557 7174 9999

The method

S.join(T);

unites two sets S and T and returns the union set.

In contrast, the method

S.intersect(T);

intersects two sets S and T and returns the difference set.

The method

S.diff(T);

calculates the set difference of two sets S and T, that is, it returns a set consisting of all elements which are contained in S, but not in T.

Furthermore, there is the method

S.symdiff(T);

It calculates the symmetric difference of two sets and returns it. This is the set of all the elements of both sets which are not contained in the intersection.

With the overloaded comparison operators two sets can be tested for equality, inequality, proper and improper inclusion. For example, the expression

S < T;

tests whether S is a proper subset of T.

Finally, there are the methods size(), empty(), and clear() with the same meaning as in the case of stacks and queues.

Implementation and further information

The class set is implemented by randomized search trees.

The operations insert(), del(), and member() take time O(log(n)), if the set contains n elements. empty() and size() take time O(n), and clear() takes constant time. If S and T are two sets with n and m elements, respectively, then S.join(T) and S.diff(T) take time O(m·log(n+m)) and S.intersect(T) takes time O(n·log(n+m)).

Furthermore, there are some operators of the class set we did not mention. These are described on the corresponding manual page.