3.4.3. Splitting, concatenating, and merging

Learning objectives

The operations split(), concat(), and merge()

In addition, the class sortseq offers operations to split a sorted sequence into two sorted sequences (at a certain container), to concatenate two sorted sequences, and to merge two sorted sequences.

A call of

S.split(it, T1, T2);

splits the sorted sequence S into two partial sequences T1 and T2. it must be an item that refers to a container of S. Then S is split behind this container, that is, the container that it refers to becomes the last container of T1. This corresponds to the additional fourth default argument LEDA::after of split():

S.split(it, T1, T2, LEDA::after);

To execute the splitting in front of this container, LEDA::before can be passed as a fourth argument. T1 and T2 must be different objects; one of them may be S. If none of them is S, then S will be empty after the splitting.

The statement

S.conc(T);

appends T to the rear of S. T is emptied. It is precondition that the smallest key of T be larger than the largest key of S. Here also, a fourth argument LEDA::before specifies that T is to be appended in front of the first container of S (the precondition then must hold turned around correspondingly.) Again, T is emptied.

By the statement

S.merge(T);

the two sorted sequences S and T are merged into one sequence. Of course, it is a precondition that all keys be different in the two sequences. T is emptied and S contains all key-value pairs after merging.

The following program gives examples of the application of these operations:

Filename: SortseqSplitConcatMerge.C
LEDA users as of version 5.0: Note that header files are now partitioned into modules (subfolders)!
#include <LEDA/sortseq.h>
#include <LEDA/string.h>
#include <iostream>

using leda::sortseq;
using leda::seq_item;
using leda::string;
using std::cout;
using std::endl;


// print all key-value-pairs of a sortseq 
// together with its name and a newline at the end
inline void sortseq_print(const string& name, 
                          const sortseq<int, int>& S) {
  cout << name << ": ";
  if(S.empty()) {
    cout << "(Empty sequence)" << endl;
    return;
  }

  seq_item it = S.min_item();
  seq_item last_it = S.max_item();
  while(true) { 
    cout << "(" << S.key(it) << "," << S.inf(it) << ") ";
    if(it == last_it) break;
    it = S.succ(it);
  }
  cout << endl;
}


int main()
{
  sortseq<int,int> Even;
  sortseq<int,int> Odd;

  for(int i = 0; i < 10; i++) {
    Even.insert(i * 2, 0);
    Odd.insert(i * 2 + 1, 0);
  }

  cout << "Before splitting Even:" << endl;
  sortseq_print("Even", Even);
  sortseq_print("Odd", Odd);

  seq_item it = Even.locate(5);

  sortseq<int,int> T1, T2;

  Even.split(it, T1, T2, LEDA::before);
  cout << "After Even.split(it, T1, T2, LEDA::before):" << endl;
  sortseq_print("Even", Even);
  sortseq_print("T1", T1);
  sortseq_print("T2", T2);

  T2.conc(T1,LEDA::before); // LEDA::after results in an error here!
  cout << "After T2.conc(T1,LEDA::before):" << endl;
  sortseq_print("T1", T1);
  sortseq_print("T2", T2);

  T2.merge(Odd);
  cout << "After T2.merge(Odd):" << endl;
  sortseq_print("Odd", Odd);
  sortseq_print("T2", T2);
}

The program outputs the following:

Before splitting Even:
Even: (0,0) (2,0) (4,0) (6,0) (8,0) (10,0) (12,0) (14,0) (16,0) (18,0)
Odd: (1,0) (3,0) (5,0) (7,0) (9,0) (11,0) (13,0) (15,0) (17,0) (19,0)
After Even.split(it, T1, T2, LEDA::before):
Even: (Empty sequence)
T1: (0,0) (2,0) (4,0)
T2: (6,0) (8,0) (10,0) (12,0) (14,0) (16,0) (18,0)
After T2.conc(T1,LEDA::before):
T1: (Empty sequence)
T2: (0,0) (2,0) (4,0) (6,0) (8,0) (10,0) (12,0) (14,0) (16,0) (18,0)
After T2.merge(Odd):
Odd: (Empty sequence)
T2: (0,0) (1,0) (2,0) (3,0) (4,0) (5,0) (6,0) (7,0) (8,0) (9,0) (10,0) 
(11,0) (12,0) (13,0) (14,0) (15,0) (16,0) (17,0) (18,0) (19,0)

The auxiliary function print_sortseq() shows how to iterate over all key-value pairs of a sortseq by means of min_item(), max_item(), and succ().

[Note]Note

Although it is not mentioned in the LEDA manual (yet), the forall macros exist also for sortseq. forall_defined(k,S) iterates over all keys, forall(v,S) over all values.

At first, the program creates two sorted sequences, Even and Odd, whose keys are even and odd integer numbers, respectively. (Since a sortseq always needs also a type parameter for the associated values, we simply choose the type int here and set the associated values always to 0.) After this, the program splits up the sequence Even at the key 5 into two partial sequences. Then it reconcatenates them suitably. At the end, it merges the sequence of the odd keys into the sequence of the even keys.

Implementation and complexity of the operations

As already mentioned, the class sortseq is implemented by skiplists. With n key-value pairs contained in it, this structure needs O(n) storage. The asymptotic constant is quite high, though: Approximately 25.5· n bytes plus the storage for the keys and the values are needed. Thus the memory consumption of a sortseq is considerably larger than the one of the other associative container types.

The methods insert(), locate(), lookup(), and del() take time O(log n), the method clear() takes time O(n).

In contrast, finger searching by the different versions of finger_locate() takes time O(log min(d, n-d)) only, as described in Section 3.4.2.

The operations for splitting, concatenating, and merging described above run very fast although they must build internal structures to manage sorted sequences: If n key-value pairs are contained in a sorted sequence and if this sequence is split at the k-th pair, the operation split() takes only O(1 + log min(k,n-k)) steps. If n and m are the lengths of two sequences to be concatenated, conc() takes only O(1 + log min(n,m)) steps. merge() takes O(log(n+m,n)) steps, wherein (n,k) is the binomial coefficient n choose k.

All other methods take time O(1).

Further information

More information about the class sortseq can be found on the corresponding manual page.




Imprint-Dataprotection