Learning objectives
| The class set |
| Inserting, deleting, and choosing elements |
| Membership test |
Let us assume that we operate a web server. We have extracted a file from the server's access logfile access.log with the following structure:
139.19.1.151 139.19.1.151 207.46.249.27 139.19.1.153 139.19.1.151 139.19.1.154 198.182.196.56 139.19.1.151 139.19.1.154 139.19.1.151 139.19.1.151
The entries specify the IP-addresses of the computers by which our server was accessed. Since we have a subnet and publish important information for our employees on our web server, most of the entries originate from this subnet and are equal. There are only a few entries from outside of the subnet, but there are many entries altogether.
We want to know now by which machines our server was actually accessed, that is, we want to know which different entries there are in the logfile. These are only a few entries, although, as we said, there are many entries altogether: Most entries are equal.
This is a unification problem as we have seen it in Section 2.3.4 before. Exactly like there, we could throw all IP-addresses into a list, sort this list, and then invoke the list method unique(). However, this does not make sense with a very large number of IP-addresses because the list becomes enormously long and the sorting function then takes (too) much time. Another possibility would be to always keep the list sorted and to insert every new IP-address just in the right position. Then, however, for every insertion a previous search is necessary, starting from the beginning of the list. This is also too time consuming.
Here it is much more reasonable to work with a set of IP-addresses: We read line for line and test at every new IP-address whether it is already an element of the set. If not, we insert it into the set. Thus the structure always contains only one single copy of the corresponding element. (It cannot contain multiple copies.) At the end, we output all elements; they then correspond to the unique, different IP-addresses. We rely on the implementation of LEDA's class set performing single insert and search operations very efficiently. (What is indeed so because sets are implemented by search trees.)
The following program implements this procedure with the help of the class set. We feed it with the file access.log by redirecting standard input:
#include <LEDA/set.h>
#include <LEDA/string.h>
#include <iostream>
using leda::set;
using leda::string;
int main()
{
set<string> S;
string IPaddress;
while(std::cin >> IPaddress)
if(!S.member(IPaddress)) // this test could be left out
S.insert(IPaddress);
string unique_addr;
forall(unique_addr,S)
std::cout << unique_addr << std::endl;
}
Given the input above, the output of the program is
139.19.1.151 139.19.1.153 139.19.1.154 198.182.196.56 207.46.249.27
The method
S.insert(v);
adds an element with value v to S. If the value v is already contained in the set, nothing happens, that is, no elements are duplicated; every value can be contained in S at most once.
The method
S.member(v);
tests whether a value v is already a member of the set S. This test also could be left out in the above program since insert(), as already said, does not insert a value twice.
With the macro
forall(x, S) { ... }the individual elements of a set can be iterated over. The macro assigns copies of the elements to the iteration variable x.
The above program does not make use of the methods
S.del(v);
S.choose();
S.del(v) deletes an element with value v from the set. S. choose() chooses an element arbitrarily, that is, the method returns a copy of an arbitrary element (under the precondition that the set is not empty). This arbitrary selection is used in some algorithms.
![]() |
Note |
|---|---|
|
The method choose() does not guarantee that an element of the set is selected at random and that all elements are selected with the same probability. It is definitely possible that the same element is selected over and over again (provided that it is not deleted). |
|
The LEDA class set is, like the class array, a non-item-based, simple-structured type.