Learning objectives
The class p_queue |
The class p_queue is
LEDA's type for priority queues with unbounded
priorities. Such a queue contains pairs (p,i)
consisting of a priority p and some information i. To every
information i in a p_queue, a priority p is
assigned. The type P of the priorities must be linearly ordered. The
pairs are contained in the queue in ascending order of their
priorities. Every priority p of type P may be used;
so the priorities may occur with unbounded
values. For the information, an arbitrary type I may
be used.
A priority queue with priority type P and
information type I is created by
p_queue<P,I> PQ;
Also this class follows the item container concept:
The pairs themselves can only be accessed via items of type
pq_item.
Such an item is returned when a pair (p,i) is inserted into the
priority queue or when a pair with smallest priority is explicitly
asked for. This is carried out by means of the methods
pq_item it = PQ.insert(p,i);
pq_item it = PQ.find_min();
A pair with smallest priority can be deleted by
P p = PQ.del_min();
This method returns the priority of this pair.
A certain pair can be deleted starting from a
pq_item that refers to this pair:
PQ.del_item(it);
Also starting from an item it, the
priority and the information of the pair (p,i) referenced by this
item can be queried:
P p = PQ.prio(it); I i = PQ.inf(it);
The information can be arbitrarily modified, again starting from an item:
PQ.change_inf(it, new_i);
In contrast, the priority p can only be decreased:
PQ.decrease_p(it, new_p); // new_p <= old_p !
This is due to the especially efficient implementation of priority queues by so-called Fibonacci heaps. With this restriction of the interface also, LEDA follows the classic definition of a priority queue.
![]() | Warning |
|---|---|
This data structure must not be mistaken for an associative container type, in which a value is accessed via a key that this value is associated to. Given a priority, it is not possible to search for an information that has this priority. Conversely, it is not possible either to ask, starting out from an information, whether this information occurs in the queue, and which priority it has. Finally, this structure does not offer any possibility for iterating over the pairs contained in it. If one wants to do this, one has to collect and manage the items returned when inserting the pairs, for example in a list. |
Pairs with equal priority and/or equal information
absolutely may be contained in a
p_queue. When two pairs (p,i) and (p,i')
with equal priority are inserted into the priority queue, there is
no agreement on which of the pairs is taken out of the queue
first; this pair is determined by chance, so to speak. Therefore,
the following must particularly be taken into account:
![]() | Important |
|---|---|
Let (p,i) and (p,i') be two pairs with equal priority
p, inserted into a |
Huffman coding is a compression algorithm that is particularly efficient on files that contain textual, natural-language data. This means files that consist of certain characters of a character set some of which occur with a large frequency (as the character “e” in English), others only with low frequency (as the character “z” in English). Huffman coding achieves very good compression rates on such files.
How does this method work? Suppose we have the following text, which consists of 100 characters occurring with a different frequency:
caaabacfaabbebaaacbabdaaceadcccaaaaaaacabaaafbcaad bacdaaaacdccacbaafbcfaaccaaaadadcdcddcabeaeefabfaa
How many bits do we need to code this text? As we can see, only
the characters a, b, c,
d, e, and
f appear in it. The simplest possibility for a
coding is to represent every character with the same fixed
number of bits. This is called fixed-length coding.
Since only 6 characters occur altogether, 3 bits per character
suffice for this coding. (It is 2^3=8, that is, we even could represent 8
different characters with this coding.) Table 4.1 shows a possible coding with fixed
length. So we need 100·3 = 300 bits altogether for the coding of
the complete text.
Can we get by on less bits by a different coding? We make
the following observation: The character a appears with the highest
frequency by far in the text; it appears 46 times. On the other hand,
e appears only 5
times. Therefore, using just as many bits for every a as for every e has waste written all over it. Rather,
we should code an a with as few
bits as possible, for example, with 1 bit only (for example, with a 0),
and accept that the coding of characters that occur rarely, such
as the e, requires more than 3 bits. Such a
coding in which the number of bits may be different from character
to character is referred to as variable-length coding.
Table 4.1 shows such a coding with variable
length. Here we need only 46·1 + 13·3 +
20·3 + 10·3 + 5·4 + 6·4 = 219
bits to code the text, that is, we achieve a compression rate of 219/300 = 73%.
So the coding with variable length is 27% shorter than the coding with
fixed length.
Table 4.1. The Huffman code
| a | b | c | d | e | f | |
| Frequency | 46 | 13 | 20 | 10 | 5 | 6 |
| Coding with fixed length | 000 | 001 | 010 | 011 | 100 | 101 |
| Huffman coding | 0 | 110 | 111 | 100 | 1010 | 1011 |
The coding with variable length presented in Table 4.1 is called a prefix-free code. This means that the coding of a character is never a prefix of the coding of another character, which makes coding and decoding of a complete text simple; see also Exercise 42. Such a coding can always be represented as a binary tree. Every leaf corresponds to a character of the character set. Every edge is marked with 0 or 1. The coding of a character results from the concatenation of all markings on the path from the root to the corresponding leaf. Figure 4.1 shows the tree that corresponds to the above prefix-free coding.
Figure 4.1. Huffman coding as a complete binary tree

A tree that corresponds to the Huffman coding from Table 4.1. Every leaf corresponds to a character and contains the frequency that this character occurs with. Every node contains the sum of the markings of its children as a marking. The child with the smaller marking always is the left child. The edge incident to this child is marked with 0, the edge incident to the right child with 1. The Huffman coding of a character corresponds to the sequence of edge markings on the path from the root to this character.
An optimal prefix-free coding (for a predefined frequency distribution of individual characters) is a prefix-free coding that achieves the best compression rate of all possible prefix-free codings. The coding from Table 4.1 and the accompanying tree from Figure 4.1 represent an optimal prefix-free coding for the original text; they were created by a method that calculates such an optimal prefix-free coding.
This method was invented by David Huffman and carries his name today: the Huffman code. The basic idea is the following: The leaves of the tree correspond to the characters to be coded. The characters that occur most seldom should acquire the longest coding, that is, they should have the longest paths from the root. Therefore, first all characters are put into a priority queue ordered by their frequency, and the two characters with the lowest priority (that is, the lowest frequency) are pulled out of the queue. For these two characters, the last bit of their coding is known now, that is, the marking of the last edge on the path from the root: The rarest character is given a 1, the second rarest character a 0. They are given a common parent node in the tree to be created; the priority of this node results as the sum of the frequencies (priorities) of the two children. This parent node is put in order into the priority queue now, and the method goes into the next iteration: It pulls out the two nodes with the lowest priority from the queue, puts them in order under a common parent node, which contains the sum of the priorities of the two children as its priority, etc. Figure 4.2 shows this process for the distribution from Table 4.1. We do not want to give reasons here why the method always returns an optimal prefix-free coding; concerning this matter, we refer to secondary literature (for example, to the description in Introduction to Algorithms, which we follow here all in all).
Figure 4.2. Construction of the Huffman code

A priority queue is filled with the characters to be coded; the priority corresponds to the frequency of the character. Every character is a leaf of a tree to be constructed. In every step, the two partial trees with the lowest priority are united. In this process, the left partial tree is the tree with the smaller priority. The edge incident to it is marked with 0, the edge incident to the right partial tree with 1. The priority of the tree being newly generated is the sum of the priorities of the two partial trees being united. The method stops as soon as the common root is the only node contained in the priority queue. The coding of a character results as the sequence of edge markings on the way from the root to the corresponding leaf.
The following program uses a
p_queue to realize the priority queue
occurring in Huffman's algorithm. At first, it reads standard
input and counts in a d_array which
character occurs how often therein. Starting out from this
frequency distribution, it then builds up the tree
explicitly. It uses a structure
huffman_node to implement the nodes
occurring in the tree. This structure stores a reference to the
respective parent node, the priority of the node, and the
information whether this node is a left or a right child of its
parent. In the end, every character must know the leaf of the
tree belonging to it so that the method is able to climb up the
path from there to the root. For every character the program
therefore memorizes the associated leaf in an additional
d_array.
#include <LEDA/d_array.h>
#include <LEDA/p_queue.h>
#include <LEDA/string.h>
using leda::d_array;
using leda::p_queue;
using leda::string;
using std::cin;
using std::cout;
class huffman_node {
public:
huffman_node *parent;
int priority;
bool is_left_child;
// set parent to nil when constructing a new huffman_node
huffman_node() : parent(nil) {};
};
int main()
{
d_array<char,int> H;
char c;
// read all characters from stdin and count their frequency
while(cin >> c) H[c]++;
p_queue<int, huffman_node*> PQ;
d_array<char, huffman_node*> leaf_belonging_to_char;
// for every character insert a huffman node into the priority queue
// this node is a leaf of the tree to be constructed
forall_defined(c,H) {
cout << c << ": " << H[c] << std::endl;
huffman_node *new_leaf = new huffman_node;
new_leaf->priority = H[c];
PQ.insert(H[c], new_leaf);
leaf_belonging_to_char[c] = new_leaf;
}
// construct the Huffman code
cout << "Constructing the Huffman code for all read characters.\n";
while(PQ.size() > 1) {
// extract two children with minimal priority
huffman_node *left_child = PQ.inf(PQ.find_min());
PQ.del_min();
huffman_node *right_child = PQ.inf(PQ.find_min());
PQ.del_min();
// construct new parent node
huffman_node *new_parent = new huffman_node;
left_child->parent = right_child->parent = new_parent;
left_child->is_left_child = true;
right_child->is_left_child = false;
new_parent->priority = left_child->priority + right_child->priority;
// insert new parent into priority queue
PQ.insert(new_parent->priority, new_parent);
// control output
cout << "New parent created with priority ";
cout << new_parent->priority << " = ";
cout << left_child->priority << " + " << right_child->priority << "\n";
}
cout << "The Huffman codes are:\n";
// output the codes: climb from all leaves to the root
forall_defined(c, H) {
cout << c << ": ";
huffman_node *climber = leaf_belonging_to_char[c];
string huffman_code; // collect 0s and 1s on the path to the root
while(climber->parent != nil) { // do nothing for the root
if(climber->is_left_child)
huffman_code = "0" + huffman_code;
else
huffman_code = "1" + huffman_code;
climber = climber->parent;
}
cout << huffman_code << std::endl;
}
}
With the previous text of 100 characters as input, the program outputs the following:
a: 46 b: 13 c: 20 d: 10 e: 5 f: 6 Constructing the Huffman code for all read characters. New parent created with priority 11 = 5 + 6 New parent created with priority 21 = 10 + 11 New parent created with priority 33 = 13 + 20 New parent created with priority 54 = 21 + 33 New parent created with priority 100 = 46 + 54 The Huffman codes are: a: 0 b: 110 c: 111 d: 100 e: 1010 f: 1011
The main loop asks by means of the method
PQ.size();
whether at least 2 nodes are still contained in the priority queue. In the end, the queue consists of the root node only, which does not have to be processed any more.
In the program for
the simulation of the queuing strategy at Saarbrücken Central
Station, we left a hint on how to accelerate it: Every
time a traveler stands in line in one of the 8 queues or exits
one of these, the simulation invokes the function
compute_shortest_queue(); at every point in
time, it has to know the currently shortest queue[39]
so that it can
append the next traveler to be served to just this queue. The
implementation of compute_shortest_queue()
is primitive: It iterates over all 8 queues with a for loop in
order to determine one of minimal length.
It would be wiser, however, always to know which queue is
currently the shortest! For this, we should have a data
structure in that the numbers of the queues can be stored,
ordered by the number of their elements (that is, the number of
travelers contained in it), and from that we always can extract
the number of the queue with the currently smallest number of
elements. But we have just got to know such a data structure:
p_queue!
To what extent is the use of an additional
p_queue really wiser? First of all, this
complicates the program text, as we will see at once!
p_queue, however, is a data structure
whose search, insertion, and deletion operations are implemented
particularly efficient: With n pairs stored in the structure, a
call of insert(),
del_item(), and
del_min() takes time O(log(n)) only;
the other operations even take time O(1) only. In other words,
this means that the shortest queue can be found very fast in the
previous situation.
So we introduce an additional
p_queue<int,int> PQ, in which pairs (n,q)
are stored. n is the current number of travelers in the queue
with number q. This means that n plays the role of the priority
by that the pairs in the priority queue PQ
are ordered.
What do we have to do after we have determined the number
q of the shortest queue? In case a traveler exits this queue,
the associated n has to be decreased by 1. The method
decrease_p() does this very fast, namely in
time O(1). The queue with number q has to know an item
referring to the corresponding pair (n,q) in the priority queue
PQ, though. Therefore, for every queue we have
to memorize an associated item in an additional array
array<pq_item> PQITEM. We see how the
source code is becoming more and more complex - this kind of
indirections and referencings over different structures is
typical of advanced modeling and programming with LEDA's data
structures.
In case a newly arrived traveler lines up in queue q, we encounter a problem: We cannot increase the associated number n because priorities in priority queues can be decreased only. We manage with a trick here:
![]() | Tip |
|---|---|
To increase the priority of a pair (p,i) in a
|
Here is the improved version of the simulation program from Section 2.7.1:
#include <LEDA/queue.h>
#include <LEDA/p_queue.h>
#include <LEDA/list.h>
#include <LEDA/array.h>
#include <LEDA/random_variate.h>
#include <iostream>
#include "Traveller.h"
using leda::queue;
using leda::p_queue;
using leda::pq_item;
using leda::array;
using leda::random_variate;
using leda::list;
using std::cout;
array< queue<Traveller*> > Queues(1,8);
p_queue<int, int> PQ; // Every queue knows the number of its travellers
array<pq_item> PQITEM(1,8); // Every queue knows its item in PQ
list<int> Service_times; // needed for computation of variance
const int mins_to_simulate = 60*24*7*52; // 1 full year
int total_travellers = 0;
int travellers_served = 0;
int total_service_time = 0;
//void compute_shortest_queue(); not needed any more!
void output_statistics();
int main()
{
// initialize priority queue PQ: there are 0 travellers in all 8 queues
for(int i = 1; i <= 8; i++)
PQITEM[i] = PQ.insert(0,i);
// random variate of 'travellers per minute'
array<int> weight(0,4);
weight[0] = 1;
weight[1] = 4;
weight[2] = 6;
weight[3] = 4;
weight[4] = 1;
// this gives 2 travellers per minute on the average
random_variate R(weight);
// for every minute of the time interval do the following
for(int time = 0; time < mins_to_simulate; time++) {
// some new travellers arrive
int num_new_travellers = R.generate();
total_travellers += num_new_travellers;
// queing the new travellers
for(int i = 0; i < num_new_travellers; i++ ) {
Traveller* T = new Traveller(time);
// find out queue with minimal length
pq_item min_it = PQ.find_min();
int min_queue = PQ.inf(min_it);
int min_length = PQ.prio(min_it);
// traveller T goes to the queue with minimal length
Queues[min_queue].append(T);
// this queue has now one more traveller
// BUT: one cannot increase priority!
// use the following trick: delete and insert with new priority
PQ.del_min();
PQITEM[min_queue] = PQ.insert(min_length + 1, min_queue);
}
// travellers at the front of the queues get served
for(int i = 1; i <= 8; i++)
if(!Queues[i].empty()) {
Traveller* T = Queues[i].top();
if(T->dec_time_to_serve() == 0) {
// T leaves the system
Queues[i].pop();
// queue i has now one less traveller
int old_p = PQ.prio(PQITEM[i]);
PQ.decrease_p(PQITEM[i], old_p - 1);
travellers_served++;
//compute_shortest_queue(); not needed any more!
int time_used = time - T->get_arrival_time();
total_service_time += time_used;
Service_times.append(time_used);
delete T;
}
}
}
output_statistics();
}
// update min_queue: not needed any more!
/*
void compute_shortest_queue() {
for(int j = 1; j <= 8; j++)
if(Queues[j].size() < Queues[min_queue].size())
min_queue = j;
}
*/
void output_statistics() {
double mean = total_service_time/(double)travellers_served;
// compute standard deviation
int time;
double sum;
forall(time, Service_times)
sum += (time - mean) * (time - mean);
double variance = sum / (Service_times.size()-1);
double stddev = sqrt(variance);
cout << "Total number of travellers arrived = "
<< total_travellers << "\n";
cout << "Total number of travellers served = "
<< travellers_served << "\n";
cout << "Total service time = " << total_service_time << "\n";
cout << "Service time per traveller = " << mean << "\n";
cout << "Standard deviation = " << stddev << "\n";
}
Here our use of the priority queue is a little
unconventional: In most applications,
decrease_p() and
del_min() are invoked significantly more
often than insert().
But the additional priority queue pays off: On the
author's machine, the original simulation takes 8.1 seconds; in
contrast, the accelerated simulation takes only 5.5
seconds. This means a time-saving of 32%, which is quite
astonishing, since the originally used function
compute_shortest_queue() executes only 8
iterations per call, with few, simple operations! This reveals
how fast the implementation of p_queue
is.
The class p_queue is implemented by
Fibonacci heaps. With n pairs stored in
it, it requires O(n) storage. The methods
insert(), del_item(),
and del_min() take time O(log n). The
method clear(), which deletes all pairs
from a p_queue, takes time O(n). All
other methods take time O(1) only.
Therefore, a p_queue is worthwhile
particularly in the case that primarily the pair with the
smallest priority is to be found and priorities of certain pairs
are to be decreased.
The implementation of p_queue is
more efficient than the one of sortseq,
but it offers less functionality. For example, it cannot be
queried for a key (that is, for a priority). If one wants both
to store key-value pairs in the spirit of an associative
container and to have the functionality of a priority queue,
then one should use a sortseq.
More about the class p_queue can be
found on the corresponding manual page. For
example, for a pq_item the subscript operator
[] returns a reference to the
information of the pair that is referenced by this item; this information
can be read and modified via this reference. This may serve to
perform information modifications in series:
PQ[it1] = PQ[it2] = new_i;
Exercise 42. | Expand the program for the Huffman coding so that it outputs the original text in Huffman coding and calculates the compression rate. Write the inverse program, which calculates the original sequence of the characters from a Huffman coded input (for a given, hard-wired Huffman coding). Which data structures of LEDA do you use? |
Exercise 43. | Also a |
Exercise 44. | A primitive implementation of a priority queue consists of a linear list in which the containers contain pairs (p,i) and are ordered by increasing priority. When a new pair (p,i) is added, the queue is searched from the beginning of the list until a position suitable for the insertion of the pair is found. When a priority p of a pair (p,i) is decreased, the pair may have to be moved to the left in the list. Implement this primitive version of a priority queue
with the help of a |
[39] To be more precise, one of the currently shortest queues, because there might be more than one queue with minimal length.