Learning objectives
The class p_queue 
The class p_queue is LEDA's type for priority queues with unbounded priorities. Such a queue contains pairs (p,i) consisting of a priority p and some information i. To every information i in a p_queue, a priority p is assigned. The type P of the priorities must be linearly ordered. The pairs are contained in the queue in ascending order of their priorities. Every priority p of type P may be used, so the priorities may take on unbounded values. For the information, an arbitrary type I may be used.
A priority queue with priority type P and information type I is created by

p_queue<P,I> PQ;
This class, too, follows the item-container concept: the pairs themselves can only be accessed via items of type pq_item. Such an item is returned when a pair (p,i) is inserted into the priority queue or when a pair with smallest priority is explicitly asked for. This is carried out by means of the methods

pq_item it = PQ.insert(p,i);
pq_item it = PQ.find_min();
A pair with smallest priority can be deleted by

P p = PQ.del_min();

This method returns the priority of this pair. A certain pair can be deleted starting from a pq_item that refers to this pair:

PQ.del_item(it);

Also starting from an item it, the priority and the information of the pair (p,i) referenced by this item can be queried:

P p = PQ.prio(it);
I i = PQ.inf(it);

The information can be modified arbitrarily, again starting from an item:

PQ.change_inf(it, new_i);

In contrast, the priority p can only be decreased:

PQ.decrease_p(it, new_p); // new_p <= old_p !
This is due to the especially efficient implementation of priority queues by so-called Fibonacci heaps. With this restriction of the interface, LEDA also follows the classic definition of a priority queue.
Warning  

This data structure must not be mistaken for an associative container type, in which a value is accessed via a key with which the value is associated. Given a priority, it is not possible to search for an information that has this priority. Conversely, it is not possible either to ask, starting out from an information, whether this information occurs in the queue and which priority it has. Finally, this structure does not offer any possibility for iterating over the pairs contained in it. If one wants to do this, one has to collect and manage the items returned when inserting the pairs, for example in a list.
Pairs with equal priority and/or equal information may very well be contained in a p_queue. When two pairs (p,i) and (p,i') with equal priority are inserted into the priority queue, there is no guarantee as to which of the pairs is taken out of the queue first; this pair is determined by chance, so to speak. Therefore, the following must particularly be taken into account:
Important  

Let (p,i) and (p,i') be two pairs with equal priority p, inserted into a p_queue. It is not specified in which order these two pairs are removed from the queue.
Huffman coding is a compression algorithm that is particularly efficient on files that contain textual, natural-language data. This means files that consist of certain characters of a character set, some of which occur with a large frequency (such as the character “e” in English), others only with a low frequency (such as the character “z” in English). Huffman coding achieves very good compression rates on such files.
How does this method work? Suppose we have the following text, which consists of 100 characters occurring with a different frequency:
caaabacfaabbebaaacbabdaaceadcccaaaaaaacabaaafbcaad bacdaaaacdccacbaafbcfaaccaaaadadcdcddcabeaeefabfaa
How many bits do we need to code this text? As we can see, only the characters a, b, c, d, e, and f appear in it. The simplest possibility for a coding is to represent every character with the same fixed number of bits. This is called fixed-length coding. Since only 6 characters occur altogether, 3 bits per character suffice for this coding. (We have 2^3 = 8; that is, we could even represent 8 different characters with this coding.) Table 4.1 shows a possible coding with fixed length. So we need 100·3 = 300 bits altogether for the coding of the complete text.
Can we get by with fewer bits by a different coding? We make the following observation: the character a appears with by far the highest frequency in the text; it appears 46 times. On the other hand, e appears only 5 times. Therefore, using just as many bits for every a as for every e has waste written all over it. Rather, we should code an a with as few bits as possible, for example with 1 bit only (say, with a 0), and accept that the coding of characters that occur rarely, such as the e, requires more than 3 bits. Such a coding, in which the number of bits may differ from character to character, is referred to as variable-length coding. Table 4.1 shows such a coding with variable length. Here we need only 46·1 + 13·3 + 20·3 + 10·3 + 5·4 + 6·4 = 219 bits to code the text; that is, the compressed text takes up only 219/300 = 73% of the space of the fixed-length coding. So the coding with variable length is 27% shorter than the coding with fixed length.
Table 4.1. The Huffman code

                            a    b    c    d    e     f
Frequency                  46   13   20   10    5     6
Coding with fixed length  000  001  010  011  100   101
Huffman coding              0  110  111  100  1010  1011
The coding with variable length presented in Table 4.1 is called a prefix-free code. This means that the coding of a character is never a prefix of the coding of another character, which makes coding and decoding of a complete text simple; see also Exercise 42. Such a coding can always be represented as a binary tree. Every leaf corresponds to a character of the character set. Every edge is marked with 0 or 1. The coding of a character results from the concatenation of all markings on the path from the root to the corresponding leaf. Figure 4.1 shows the tree that corresponds to the above prefix-free coding.
An optimal prefix-free coding (for a predefined frequency distribution of individual characters) is a prefix-free coding that achieves the best compression rate of all possible prefix-free codings. The coding from Table 4.1 and the accompanying tree from Figure 4.1 represent an optimal prefix-free coding for the original text; they were created by a method that calculates such an optimal prefix-free coding.
This method was invented by David Huffman and carries his name today: the Huffman code. The basic idea is the following: the leaves of the tree correspond to the characters to be coded. The characters that occur most rarely should receive the longest coding, that is, they should have the longest paths from the root. Therefore, first all characters are put into a priority queue ordered by their frequency, and the two characters with the lowest priority (that is, the lowest frequency) are pulled out of the queue. For these two characters, the last bit of their coding is now known, that is, the marking of the last edge on the path from the root: the rarest character is given a 1, the second rarest character a 0. They are given a common parent node in the tree to be created; the priority of this node results as the sum of the frequencies (priorities) of the two children. This parent node is now inserted into the priority queue, and the method goes into the next iteration: it pulls the two nodes with the lowest priority out of the queue, places them under a common parent node, which contains the sum of the priorities of the two children as its priority, and so on. Figure 4.2 shows this process for the distribution from Table 4.1. We do not want to give reasons here why the method always returns an optimal prefix-free coding; concerning this matter, we refer to the literature (for example, to the description in Introduction to Algorithms, which we follow here by and large).
The following program uses a p_queue to realize the priority queue occurring in Huffman's algorithm. At first, it reads standard input and counts in a d_array how often each character occurs therein. Starting out from this frequency distribution, it then builds up the tree explicitly. It uses a structure huffman_node to implement the nodes occurring in the tree. This structure stores a reference to the respective parent node, the priority of the node, and the information whether this node is a left or a right child of its parent. In the end, every character must know the leaf of the tree belonging to it so that the method is able to climb up the path from there to the root. For every character, the program therefore memorizes the associated leaf in an additional d_array.
#include <iostream>
#include <LEDA/d_array.h>
#include <LEDA/p_queue.h>
#include <LEDA/string.h>

using leda::d_array;
using leda::p_queue;
using leda::string;
using std::cin;
using std::cout;

class huffman_node {
public:
  huffman_node *parent;
  int priority;
  bool is_left_child;

  // set parent to nil when constructing a new huffman_node
  huffman_node() : parent(nil) {}
};

int main()
{
  d_array<char,int> H;
  char c;

  // read all characters from stdin and count their frequency
  while(cin >> c) H[c]++;

  p_queue<int, huffman_node*> PQ;
  d_array<char, huffman_node*> leaf_belonging_to_char;

  // for every character insert a huffman node into the priority queue
  // this node is a leaf of the tree to be constructed
  forall_defined(c,H) {
    cout << c << ": " << H[c] << std::endl;
    huffman_node *new_leaf = new huffman_node;
    new_leaf->priority = H[c];
    PQ.insert(H[c], new_leaf);
    leaf_belonging_to_char[c] = new_leaf;
  }

  // construct the Huffman code
  cout << "Constructing the Huffman code for all read characters.\n";
  while(PQ.size() > 1) {
    // extract two children with minimal priority
    huffman_node *left_child = PQ.inf(PQ.find_min());
    PQ.del_min();
    huffman_node *right_child = PQ.inf(PQ.find_min());
    PQ.del_min();

    // construct new parent node
    huffman_node *new_parent = new huffman_node;
    left_child->parent = right_child->parent = new_parent;
    left_child->is_left_child = true;
    right_child->is_left_child = false;
    new_parent->priority = left_child->priority + right_child->priority;

    // insert new parent into priority queue
    PQ.insert(new_parent->priority, new_parent);

    // control output
    cout << "New parent created with priority ";
    cout << new_parent->priority << " = ";
    cout << left_child->priority << " + " << right_child->priority << "\n";
  }

  cout << "The Huffman codes are:\n";

  // output the codes: climb from all leaves to the root
  forall_defined(c, H) {
    cout << c << ": ";
    huffman_node *climber = leaf_belonging_to_char[c];
    string huffman_code;
    // collect 0s and 1s on the path to the root; do nothing for the root
    while(climber->parent != nil) {
      if(climber->is_left_child) huffman_code = "0" + huffman_code;
      else huffman_code = "1" + huffman_code;
      climber = climber->parent;
    }
    cout << huffman_code << std::endl;
  }
}
With the previous text of 100 characters as input, the program outputs the following:
a: 46
b: 13
c: 20
d: 10
e: 5
f: 6
Constructing the Huffman code for all read characters.
New parent created with priority 11 = 5 + 6
New parent created with priority 21 = 10 + 11
New parent created with priority 33 = 13 + 20
New parent created with priority 54 = 21 + 33
New parent created with priority 100 = 46 + 54
The Huffman codes are:
a: 0
b: 110
c: 111
d: 100
e: 1010
f: 1011
The main loop asks by means of the method PQ.size() whether at least 2 nodes are still contained in the priority queue. In the end, the queue consists of the root node only, which does not have to be processed any more.
In the program for the simulation of the queuing strategy at Saarbrücken Central Station, we left a hint on how to accelerate it: every time a traveler stands in line in one of the 8 queues or exits one of these, the simulation invokes the function compute_shortest_queue(); at every point in time, it has to know the currently shortest queue[39] so that it can append the next traveler to be served to just this queue. The implementation of compute_shortest_queue() is primitive: it iterates over all 8 queues with a for loop in order to determine one of minimal length.
It would be wiser, however, always to know which queue is currently the shortest! For this, we need a data structure in which the numbers of the queues can be stored, ordered by the number of their elements (that is, the number of travelers contained in them), and from which we can always extract the number of the queue with the currently smallest number of elements. But we have just become acquainted with such a data structure: p_queue!
To what extent is the use of an additional p_queue really wiser? First of all, it complicates the program text, as we will see at once! p_queue, however, is a data structure whose search, insertion, and deletion operations are implemented particularly efficiently: with n pairs stored in the structure, a call of insert(), del_item(), or del_min() takes time O(log(n)) only; the other operations even take time O(1) only. In other words, this means that the shortest queue can be found very fast in the situation above.
So we introduce an additional p_queue<int,int> PQ, in which pairs (n,q) are stored. Here n is the current number of travelers in the queue with number q. This means that n plays the role of the priority by which the pairs in the priority queue PQ are ordered.
What do we have to do after we have determined the number q of the shortest queue? In case a traveler exits this queue, the associated n has to be decreased by 1. The method decrease_p() does this very fast, namely in time O(1). The queue with number q has to know an item referring to the corresponding pair (n,q) in the priority queue PQ, though. Therefore, for every queue we have to memorize an associated item in an additional array array<pq_item> PQITEM. We see how the source code is becoming more and more complex: this kind of indirection and referencing across different structures is typical of advanced modeling and programming with LEDA's data structures.
In case a newly arrived traveler lines up in queue q, we encounter a problem: We cannot increase the associated number n because priorities in priority queues can be decreased only. We manage with a trick here:
Tip  

To increase the priority of a pair (p,i) in a p_queue, delete the pair from the queue and insert it again with the new, higher priority; memorize the pq_item returned by this insertion.
Here is the improved version of the simulation program from Section 2.7.1:
#include <iostream>
#include <cmath>
#include <LEDA/queue.h>
#include <LEDA/p_queue.h>
#include <LEDA/list.h>
#include <LEDA/array.h>
#include <LEDA/random_variate.h>
#include "Traveller.h"

using leda::queue;
using leda::p_queue;
using leda::pq_item;
using leda::array;
using leda::random_variate;
using leda::list;
using std::cout;

array< queue<Traveller*> > Queues(1,8);
p_queue<int, int> PQ;       // Every queue knows the number of its travellers
array<pq_item> PQITEM(1,8); // Every queue knows its item in PQ
list<int> Service_times;    // needed for computation of variance

const int mins_to_simulate = 60*24*7*52; // 1 full year

int total_travellers = 0;
int travellers_served = 0;
int total_service_time = 0;

//void compute_shortest_queue(); not needed any more!
void output_statistics();

int main()
{
  // initialize priority queue PQ: there are 0 travellers in all 8 queues
  for(int i = 1; i <= 8; i++)
    PQITEM[i] = PQ.insert(0,i);

  // random variate of 'travellers per minute'
  array<int> weight(0,4);
  weight[0] = 1; weight[1] = 4; weight[2] = 6; weight[3] = 4; weight[4] = 1;
  // this gives 2 travellers per minute on the average
  random_variate R(weight);

  // for every minute of the time interval do the following
  for(int time = 0; time < mins_to_simulate; time++) {
    // some new travellers arrive
    int num_new_travellers = R.generate();
    total_travellers += num_new_travellers;

    // queuing the new travellers
    for(int i = 0; i < num_new_travellers; i++ ) {
      Traveller* T = new Traveller(time);

      // find out queue with minimal length
      pq_item min_it = PQ.find_min();
      int min_queue = PQ.inf(min_it);
      int min_length = PQ.prio(min_it);

      // traveller T goes to the queue with minimal length
      Queues[min_queue].append(T);

      // this queue has now one more traveller
      // BUT: one cannot increase a priority!
      // use the following trick: delete and insert with new priority
      PQ.del_min();
      PQITEM[min_queue] = PQ.insert(min_length + 1, min_queue);
    }

    // travellers at the front of the queues get served
    for(int i = 1; i <= 8; i++)
      if(!Queues[i].empty()) {
        Traveller* T = Queues[i].top();
        if(T->dec_time_to_serve() == 0) {
          // T leaves the system
          Queues[i].pop();

          // queue i has now one less traveller
          int old_p = PQ.prio(PQITEM[i]);
          PQ.decrease_p(PQITEM[i], old_p - 1);

          travellers_served++;
          //compute_shortest_queue(); not needed any more!

          int time_used = time - T->get_arrival_time();
          total_service_time += time_used;
          Service_times.append(time_used);
          delete T;
        }
      }
  }

  output_statistics();
}

// update min_queue: not needed any more!
/*
void compute_shortest_queue()
{
  for(int j = 1; j <= 8; j++)
    if(Queues[j].size() < Queues[min_queue].size())
      min_queue = j;
}
*/

void output_statistics()
{
  double mean = total_service_time/(double)travellers_served;

  // compute standard deviation
  int time;
  double sum = 0;
  forall(time, Service_times)
    sum += (time - mean) * (time - mean);
  double variance = sum / (Service_times.size()-1);
  double stddev = sqrt(variance);

  cout << "Total number of travellers arrived = " << total_travellers << "\n";
  cout << "Total number of travellers served = " << travellers_served << "\n";
  cout << "Total service time = " << total_service_time << "\n";
  cout << "Service time per traveller = " << mean << "\n";
  cout << "Standard deviation = " << stddev << "\n";
}
Here our use of the priority queue is a little unconventional: in most applications, decrease_p() and del_min() are invoked significantly more often than insert().
But the additional priority queue pays off: on the author's machine, the original simulation takes 8.1 seconds; in contrast, the accelerated simulation takes only 5.5 seconds. This means a time-saving of 32%, which is quite astonishing, since the originally used function compute_shortest_queue() executes only 8 iterations per call, with few, simple operations! This reveals how fast the implementation of p_queue is.
The class p_queue is implemented by Fibonacci heaps. With n pairs stored in it, it requires O(n) storage. The methods insert(), del_item(), and del_min() take time O(log n). The method clear(), which deletes all pairs from a p_queue, takes time O(n). All other methods take time O(1) only.
Therefore, a p_queue is worthwhile particularly in the case that primarily the pair with the smallest priority is to be found and priorities of certain pairs are to be decreased.
The implementation of p_queue is more efficient than that of sortseq, but it offers less functionality. For example, it cannot be queried for a key (that is, for a priority). If one wants both to store key-value pairs in the spirit of an associative container and to have the functionality of a priority queue, then one should use a sortseq.
More about the class p_queue can be found on the corresponding manual page. For example, for a pq_item the subscript operator [] returns a reference to the information of the pair that is referenced by this item; this information can be read and modified via this reference. This may serve to perform information modifications in series:

PQ[it1] = PQ[it2] = new_i;
Exercise 42.  Expand the program for the Huffman coding so that it outputs the original text in Huffman coding and calculates the compression rate. Write the inverse program, which calculates the original sequence of characters from a Huffman-coded input (for a given, hard-wired Huffman coding). Which data structures of LEDA do you use?
Exercise 43.  Also a 
Exercise 44.  A primitive implementation of a priority queue consists of a linear list in which the containers contain pairs (p,i) and are ordered by increasing priority. When a new pair (p,i) is added, the queue is searched from the beginning of the list until a position suitable for the insertion of the pair is found. When a priority p of a pair (p,i) is decreased, the pair may have to be moved to the left in the list. Implement this primitive version of a priority queue
with the help of a 
[39] To be more precise, one of the currently shortest queues, because there might be more than one queue with minimal length.