Learning objectives
| LEDA rules in general |
| The LEDA rule “Definition with initialization by copying” |
| Handle scheme and reference counting |
string as a handle-like type |
| The categorization of the LEDA types |
In the last program the strings s and
t are defined with initialization by copying. A
syntactically equivalent form for the initialization of t is the
definition
string t = "LEDA";
This possibility of definition with initalization by copying exists for every LEDA type. This is a firm rule, which is always valid, namely the LEDA rule “Definition with initialization by copying”.
This rule originates from a set of rules that must be adhered to when programming with LEDA in order to write syntactically and semantically correct and efficient LEDA programs. The rule stating that definition with initialization by copying is possible for every LEDA type is one of them. We will get to know these rules by and by. The impatient reader can obtain a summary in Appendix B, The golden LEDA rules.
But what exactly is a copy of a string? What do statements like
t = s;
or
string t(s);
really do?
As experience shows, in programs working with many string variables
many of these variables have the same value, i.e., they refer to the
same character string. It would be a great waste of memory
if every variable denoted an object of its own, i.e. a
separate area in memory.
LEDA always tries to offer an implementation of a data type
as clever as possible, that is as memory and/or space saving as
possible. It therefore works with a handle scheme and with reference counting:
If the string variable s is assigned to the
variable t by the statement t =
s or if t initialized with a copy of
s, then not the individual characters are copied
from the physical memory location of s to the
memory location of t, but rather
s and t are made to point to the
same physical memory location. In this spirit, they are
“handles” by which the actual memory location is
accessed. The actual string, i.e. the sequence of the characters in
memory, only exists once and LEDA counts how many handles refer to it.
Not until a string is modified, for example by an access by
[], the two handles are actually
“decoupled”, that is, new storage is requested and a
copy is made. In this regard strings, concerning their
implementation, are very similar to pointers, but thanks to
LEDA everything happens automatically in the background; the
user does not have to worry about the difficulties
usually occurring when using pointers and dynamic memory
allocation.
Some types of LEDA follow this handle scheme with
reference counting, for example the type point
for points in the plane and the type
integer for arbitrarily large integers.
Accordingly, these types are called handle
types.
By various reasons pertaining the implementation, which we do
not want to dwell on here, the type string is not
a real handle type. According to this, the class
string is only referred to as a
handle-like type.
With regard to the way how an object of a LEDA type is
copied and with regard to the question how such an object is
built up from partial objects or how it must occur in a
comprehensive object, all data types of LEDA fit in a system
that is illustrated in Appendix A, Categorization of the LEDA data types. We will get to know this
categorization
by and by. Therein the type string is
subclassified as a handle-like type under the item types, which in turn
represent a subclass of the primitive types.
Unlike the assignment operator, the operator
+=, of whom we have taken advantage in
the program for palindrome testing to reverse a
string, always makes a copy of a string without us noticing this.
Thereby, when reversing a string as we have implemented it until now,
for each and every character of the original string appended to the
reversed string, a copy of a string is made in defiance of the handle scheme.
An unnecessary waste of memory!
So how can we reverse a (long) string memory-efficiently? The
following program contains two tricks. The first one is a trick to
form arbitrarily long palindromes; it has to be ascribed to the
English language. The second one is a programming trick that avoids
the copy rage of +=.
#include <LEDA/string.h>
#include <iostream>
using leda::string;
using std::cout;
string reverse_efficient(const string& s) {
string result = s;
int i = 0;
int j = result.length() - 1;
while(i < j) {
char tmp = result[i];
result[i] = result[j];
result[j] = tmp;
i++; j--;
}
return result;
}
int main() {
string s;
cin >> s;
string s_quoted = "'" + s + "'";
string r = reverse_efficient(s_quoted);
string outstring = s_quoted + " sides reversed is " + r;
cout << outstring << "\n";
cout << reverse_efficient(outstring) << "\n";
}
The output of the program with the input
Joachim is
'Joachim' sides reversed is 'mihcaoJ' 'Joachim' si desrever sedis 'mihcaoJ'
and the first trick should be obvious thereby.
The efficiency of the function
reverse_efficient() is due to the fact
that altogether, only one single copy is made, namely in that
moment result[0] is
write-accessed for the first time.
The swapping of the characters happens by two indices
i and j, inside this
copy. There i runs from front to rear and
j from rear to front until they meet in the
middle. In each step the characters at positions i
and j are swapped.
Strings are implemented by character arrays of C++.
If the string s consists of
n characters and the string
t of m characters, the
following holds: All operations containing a search for the
substring t in the string
s take time O(n·m). (We will get to
know the meaning of the O-Notation in Section 2.2.3.) Accessing an individual
character by [] takes constant
time. All other operations take time O(n).
The class string possesses yet some
other methods. Furthermore, the methods introduced above can
partly be invoked with different or additional arguments. We
do not want to introduce all of these possibilities here. We
refer to the corresponding manual page of the class
string instead.
In order to create strings from values of other types like double the constructor
string s(const char* format, ...);
can be used. Like the C function scanf() it awaits
a format string and after that a list of the objects to be turned into
a string.
The class string should in
general be preferred to C++ strings of type
char[].
In case of many time critical input/output
operations the type char[] may be more
efficient.
Whether the LEDA class
leda::string or the homonymous
class std::string of the C++
standard library is used, is a question of taste and
habit. However, as it is described in Section “The namespace leda”, in both cases one has to
pay attention to the correct namespace, otherwise one
does not know which class one actually uses!