Learning objectives
The class set |
| Inserting, deleting, and choosing elements |
| Membership test |
Let us assume that we operate a web server. We have
extracted a file from the server's access logfile
access.log with the following
structure:
139.19.1.151 139.19.1.151 207.46.249.27 139.19.1.153 139.19.1.151 139.19.1.154 198.182.196.56 139.19.1.151 139.19.1.154 139.19.1.151 139.19.1.151
The entries specify the IP-addresses of the computers by which our server was accessed. Since we have a subnet and publish important information for our employees on our web server, most of the entries originate from this subnet and are equal. There are only a few entries from outside of the subnet, but there are many entries altogether.
We want to know now by which machines our server was actually accessed, that is, we want to know which different entries there are in the logfile. These are only a few entries, although, as we said, there are many entries altogether: Most entries are equal.
This is a unification problem as we
have seen it in Section 2.3.4 before. Exactly
like there, we could throw all IP-addresses into a list, sort
this list, and then invoke the list method
unique(). However, this does not make
sense with a very large number of IP-addresses because the list
becomes enormously long and the sorting function then takes
(too) much time. Another possibility would be to always keep
the list sorted and to insert every new IP-address just in the
right position. Then, however, for every insertion a previous
search is necessary, starting from the beginning of the
list. This is also too time consuming.
Here it is much more reasonable to work with a
set of IP-addresses: We read line for line
and test at every new IP-address whether it is already an
element of the set. If not, we insert it into the set. Thus the
structure always contains only one single copy of the
corresponding element. (It cannot contain multiple copies.) At
the end, we output all elements; they then correspond to the
unique, different IP-addresses. We rely on the implementation of
LEDA's class set performing single insert
and search operations very efficiently. (What is indeed so
because sets are implemented by search trees.)
The following program implements this procedure with the
help of the class set. We feed it with
the file access.log by redirecting standard
input:
#include <LEDA/set.h>
#include <LEDA/string.h>
#include <iostream>
using leda::set;
using leda::string;
int main()
{
set<string> S;
string IPaddress;
while(std::cin >> IPaddress)
if(!S.member(IPaddress)) // this test could be left out
S.insert(IPaddress);
string unique_addr;
forall(unique_addr,S)
std::cout << unique_addr << std::endl;
}
Given the input above, the output of the program is
139.19.1.151 139.19.1.153 139.19.1.154 198.182.196.56 207.46.249.27
The method
S.insert(v);
adds an element with value v to
S. If the value v is already
contained in the set, nothing happens, that is, no elements are
duplicated; every value can be contained in S at
most once.
The method
S.member(v);
tests whether a value v is already a member of the
set S. This test also could be left out in the above program since
insert(), as already said, does not insert a
value twice.
With the macro
forall(x, S) { ... }
the individual elements of a set can be iterated over. The macro
assigns copies of the elements to the iteration variable
x.
The above program does not make use of the methods
S.del(v);
S.choose();
S.del(v) deletes an element with value
v from the set.
S. choose() chooses an
element arbitrarily, that is, the method returns a copy of an arbitrary
element (under the precondition that the set is not empty). This
arbitrary selection is used in some algorithms.
![]() | Note |
|---|---|
The method choose() does not guarantee that an element of the set is selected at random and that all elements are selected with the same probability. It is definitely possible that the same element is selected over and over again (provided that it is not deleted). |
The LEDA class set is, like the
class array, a non-item-based,
simple-structured type.
set if the
number of search operations is fundamentally larger than
the number of insertion and deletion operations.
set is useful if one wants to eliminate multiple copies of
the same object from a collection of objects.int_set
or dynamic_integer_set
may be better choices.