Click here to Skip to main content
15,867,986 members
Articles / Programming Languages / Objective C

A Lazy Stream Implementation in C++11

Rate me:
Please Sign up or sign in to vote.
4.87/5 (21 votes)
25 Jun 2018CPOL9 min read 44.2K   373   38   9
A lazy stream has been implemented in C++11, so as to highlight the functional capabilities of this new specification

Introduction

After attending a free introductory course on functional programming at Coursera, I got fascinated by the power of lazy (non-strict) collections. I was taught the basic functioning of the Stream class in Scala (basically a simply linked list with non-strict evaluation). This was a radical change in my view on how collections could be structured and how their elements could be evaluated.

Coming from a mostly imperative world (as a C++ hobbyist), functional programming was an exotic mythical beast I was yearning to tame into a language I felt much more comfortable with.

Browsing on the Internet, my first inspiration came from Example 2 in this Louis Botterill’s techy blog insightful post. In this example, Louis depicts a home-made but very inspiring imitation of the Scala Stream class. I took his design and tried to poured it into an equivalent C++03 class, with no success at all after several days. It was not until I moved to C++11 when I began to see the light at the end of the tunnel… But it is not yet time to show the innards of the resulting implementation.

The Fascination of Infinite Streams

Let's have a look at this simple instruction:

C++
lazy_stream<int> stream_a = lazy_stream<int>::from(3);    

To put it simply, stream_a is the set of all integers bigger than 2, i.e., {3, 4, 5,…}. It represents an infinite set on which we can apply some intriguing operations we may not be used to. The first one is filtering:

C++
lazy_stream<int> stream_b = stream_a.filter(
    [](int value){
        return value % 4 == 0;
        });    

We have just created another infinite set called stream_b. In this case, it is the subset of all multiples of 4 in stream_a. The filtering is achieved by means of an anonymous (lambda) function, in the above example, the pattern: [](...){...}. The operation represents the set {4, 8, 12,…}.

Ok. Nice, but how can I hold an infinite set in memory? In fact, you can’t, but you do not need to. At the point stream_a is created, the only known element is the head of the stream, i.e., the number 3. The rest (called the tail) is only an embryonic, not yet born stream that can be created by invoking a function we might call tail generator. To invoke the tail generator, we use the tail method. Thus:

C++
stream_a.tail();  

creates a brand new lazy stream whose head is the number 4 and whose tail is another tail generator (just a function).

But not all the operations are equally lazy. from is pure lazy, but filter is not so. In fact, it internally calls tail until the first filter-compliant occurrence is found (number 4 in our example). It is easy to see that if no such occurrence is found, we will get into trouble (remember: the stream is infinite).

Let’s have a look at another pure lazy operation, map:

C++
typedef std::pair<int, int> pair_type;                                
lazy_stream<pair_type> stream_c = stream_b.map<pair_type>(
    [](int value){ 
        return std::pair<int, int>(value, value + 1);
        });   

We have created a new lazy stream stream_c by mapping every value element of stream_b into the pair (value, value+1). stream_c is then: {(4, 5), (8, 9), (12, 13),…}.

Once an infinite stream is created, we may be soon interested in exploring its elements. To take the (say) first 10 elements of stream_c, we write:

C++
lazy_stream<pair_type> stream_d = stream_c.take(10); 

Observe that stream_d is a finite stream. In fact, we can take its size:

C++
std::cout << stream_d.size() << std::endl; // Prints 10. 

As you may imagine, size is a strict (non-lazy) operation. You have to evaluate all the elements of a lazy stream in order to know how many they are.

If we are only interested in getting a single element, we can use the get method. It just retrieves the element of a specified index n {n=0, 1, 2,…}. Unfortunately, it must evaluate the prior n-1 elements as well.

C++
std::cout << stream_b.get(2) << std::endl; // Prints 12. 

A worn-out but very illustrative example of a lazy stream is the extraction of the first n prime number by means of the so called Sieve of Eratosthenes.

Without entering in more explanations, this could be the code:

C++
// The sieve function object does the hard work.
std::function<lazy_stream<int> (const lazy_stream<int>&)> sieve =
[&sieve](const lazy_stream<int>& start){
    int head = start.head();
    // We filter the primes with respect to head.
    lazy_stream<int> temp = start.filter([head](int value){
        return value % head > 0;
        });
    // We return a new lazy stream with its own head and tail generator.
    return lazy_stream<int>(head, [&sieve, temp](){
        return sieve(temp);
        });
}; 
// prime_numbers holds all the prime numbers.
lazy_stream<int> prime_numbers = sieve(lazy_stream<int>::from(2));
// Let’s take the first 100 prime numbers and pour them into a std::list.
std::list<int> prime_number_list = prime_numbers.take(100).to_list();   

The Not So Fancy Finite Lazy Streams

Not so gorgeous, but they do not look that bad after all. To compose a finite lazy stream, we just prepend elements to an already existing finite stream. The simplest finite lazy stream is the empty one, also called nil.

C++
lazy_stream<int> stream_e = 1 & (2 & lazy_stream<int>::nil);

The use of parentheses is a pity, but all the right associative operators in C++ can be overloaded only as members of a class, so I have been compelled to use a left associative operator &. The equivalent Scala operator #:: needs no parentheses, for instance.

Some interesting notes:

  • The size of stream_e is 2.
  • lazy_stream<int>::nil is equivalent to lazy_stream<int>().
  • We can always prepend an element to an already lazy stream, no matter whether this is finite or infinite.

Save the generation function from, the rest of operations (filter, map, take, etc.) are equally available for finite streams. For integral types, we can create, for instance, finite ranges:

C++
// This is the lazy finite set {100, 99, 98,… ,2}. 
lazy_stream<int> stream_f = lazy_stream<int>::range(100, 1);

There are other operations it is not worth mentioning. I just would like to highlight one very important reducing function in functional languages: fold_left. This operation first takes a starting value of type U (let’s call it start). Then, with each one of the elements in the stream, it operates a given function op which takes the start value plus the element value and stores the result in start. It then proceeds with the next element an so on. The final result is a single value of type U.

Let’s add the double 0.5 plus all the elements in stream_f:

C++
// Observe that stream_f is a lazy stream of type int.
// The result is of type double instead. 
// (Types of start and result must be the same).
double addition = stream_f.fold_left<double>(0.5)(
    [](double accum, int element){
        return accum + element;
        });                

Purpose of the lazy_stream<> Class

With the writing of lazy_stream<>, I did not intend to make a full-fledged, optimized and efficient piece of code. I just wanted to realized how the new C++11 specification could significantly move the scope of the language to a more functional (less imperative) view.

In fact, one of features of lazy_stream<> class is that all its objects are immutable, i.e., once created, they cannot be modified. This is a common trait of functional programming objects, generally disregarded by imperative languages, because immutability sometimes leads to a lesser memory efficiency; many times it is much cheaper to modify an object than copying it.

On the other hand, I think this design would have been very tricky if not impossible with previous specifications of C++. I will discuss this idea later.

Implementation Details

Both finite and infinite streams keep three smart pointers to dynamically allocated information in the heap, namely:

  1. A std::shared_ptr smart pointer, called head_ptr_, to the head (a value of generic type T).
  2. A std::shared_ptr smart pointer, called tail_ptr_, to the stream tail (another lazy_stream<T>).
  3. A std::shared_ptr smart pointer, called tail_gen_ptr_, to the stream tail generator (a function class of type std::function<lazy_stream<T> ()>).

Pointers [2] and [3] are mutually exclusive in order to keep the design consistent. Thus, one of them is directly set to nullptr since object creation.

The notation std::function<lazy_stream<T> ()>, aliased in the code as tail_gen_type, is quite weird to a C++03 developer eyes. The above type can wrap any function with no input parameters and returning a lazy stream. That includes:

  1. Plain standalone functions
  2. Member functions (static or non-static)
  3. Lambda (anonymous) functions, with or without closures

The use of tail_gen_type is the keystone of the whole design. The use of lambdas (with or without closures) representing tail generators is widespread along the code. All those functions can be wrapped (or represented) by the same type: tail_gen_type.

As an illustrative example, let’s have a look at take method implementation. Remember that this method takes an input stream and returns a new stream with the first n elements of the former.

C++
// take method implementation
template <typename T>
lazy_stream<T> lazy_stream<T>::take(size_t n) const
{
    if(empty_ || n<1)
        return lazy_stream<T>();         // [1]    
    const tail_type& tail_ = tail();     // [2]
    return lazy_stream<T>
            (*head_ptr_, [tail_, n]() -> lazy_stream<T> {return tail_.take(n-1);}) // [3]
}   

This method, as well as the rest of methods in lazy_stream<>, is const.

If this stream is empty or if we ask for less than one element, we just return an empty stream [1]. Otherwise, we evaluate the tail stream, invoking tail() [2]. Eventually, we create and return a brand new lazy stream [3] whose head is this’s head and whose tail generator is given by the lambda:

C++
[tail_, n]() -> lazy_stream<T> { 
    return tail_.take(n-1); 
    });  

Observe that once the new lazy stream is created and returned, no further actions are taken. At this point, we only know the head of this second stream (*head_ptr_). If we now invoke tail() on this newborn stream, a third lazy stream will be created by evaluating:

C++
tail_.take(n-1); 

in the lambda, where tail_ is the tail of the first stream.

We have been lazy, but in fact we could have been even more. Consider this alternative (but not implemented) code:

C++
// Alternative take method implementation
template <typename T> 
lazy_stream<T> lazy_stream<T>::take(size_t n) const
{
    if(empty_ || n<1)
        return lazy_stream<T>();
    const tail_ptr_type& tail_ptr = tail_ptr_;
    const tail_gen_ptr_type& tail_gen_ptr = tail_gen_ptr_;
    return lazy_stream<T>(*head_ptr_,
                          [tail_ptr, tail_gen_ptr, n]() -> lazy_stream<T> {
                          // This cannot happen
                          assert(!(tail_ptr && tail_gen_ptr));
                          if(tail_ptr)
                              return tail_ptr->take(n-1);
                          return (*tail_gen_ptr)().take(n-1);
                          });
} 

The above implementation is indeed a little more efficient than the original one: we do not pass the stream head into the lambda. Besides, it is a little lazier: the first invocation of tail() is inside the lambda. I have disregarded it only for a subjective reason: it loses clarity with respect to the original one and it is a little more verbose. Feel free to choose what you like more.

Because the lazy streams herein developed are immutable, they cannot be assigned (assignment operation is disabled). Copy-construction, on the other hand, is available at a reasonable cost. As we already know, a lazy stream object stores its information in the heap, keeping only two active smart pointers pointing to it. When copying, we are actually sharing resources ownership.

Implementation Pitfalls in C++03

My first attempt after the Scala course was to write a lazy stream class in C++03. After racking my brains for a long while, I gave up. In C++03, there are no lambdas, so in order to approach them, I tried to use function classes (functors). For instance, let’s turn the following lambda into a functor:

C++
[tail_, n]() ->lazy_stream<T> {
    return tail_.take(n-1);
    });  

The equivalent functor could be:

C++
template <typename T>
class take_functor
{
    typedef lazy_stream<T> tail_type;
    T n_;            // Closure parameter
    tail_type tail_; // Closure parameter
public:
    take_functor(const T& n, const tail_type& tail) : n_(n), tail_(tail) {}
    tail_type operator()() const
    {
        return tail_.take(n_-1);
    }
}; 

The essential difference lies in the fact that every lambda function with symbolic: type ()=>lazy_stream<T> (in Scala notation), no matter how it may be, can be represented by a std::function<lazy_stream<T> ()>. As far as I know, an analogous representation for functors is not possible in C++03. Moreover, std::function does not even exist in C++ specifications prior to 2011. Thus, when using C++03, I was not able to give all the functors I wrote a common representation. What unique type in C++03 can hold a functor with any closure parameters representing a function of type: ()=>lazy_stream<T>? I think none. So the key problem was I could not figure out how to write a tail generator in C++03.

Source Code

The lazy_stream<> implementation code has been attached, along with a main sample code. Only a subset of the corresponding Scala methods (the most insightful, from my viewpoint) has been translated. The class allows extension and further optimization.

The code has been written with Code::Blocks and compiled with MinGw (mingw32-gcc-g++ 5.3.0-3). It has also been compiled with MS Visual Studio 2015.

History

  • December 2012. Initial version
  • May 2014. Version 1.1. Main changes
    • Constructors have been templatized in order to support rvalues arguments. (Thanks to Lukasz Gwizdz, another CodeProject member, for his insight)
    • New member functions added: filter_not, find_first, reduce_left, etc.
  • June 2018. Version 1.2. Main changes:
    • Bug corrected in reduce_left method
    • Added solution for MS Visual Studio 2015

References

  1. Coursera. Functional Programming Principles in Scala. Martin Odersky (www.coursera.org)
  2. Louis Botterill’s mostly software and techy Blog. Scala, lazy evaluation and the Sieve of Eratosthenes (http://louisbotterill.blogspot.com.es/2009/07/scala-lazy-evaluation-and-sieve-of.html)
  3. Wikipedia. Sieve of Eratosthenes (http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes)
  4. C/C++ Operator Precedence and Associativity (http://www.anne-gert.nl/techcorner/cpp/cpp-operators.html)
  5. Functors: Function Objects in C++. Alex Allain (http://www.cprogramming.com/tutorial/functors-function-objects-in-c++.html)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Systems Engineer
Spain Spain
I work as a senior industrial engineer for Public Administration in Spain. I have experience in developing software for ballistic computations. I like maths and programming and, above all, riding my mountain bike. Contact me at davidalvi (at gmail dot com).

Comments and Discussions

 
QuestionFajny artykuł Pin
Member 115785053-Apr-15 2:43
Member 115785053-Apr-15 2:43 
SuggestionSome related code (~100 LOCs) and paper - using ANSI C++ (circa 2005) Pin
mborowczak10-Jun-14 5:38
mborowczak10-Jun-14 5:38 
GeneralRe: Some related code (~100 LOCs) and paper - using ANSI C++ (circa 2005) Pin
David Serrano Martínez10-Jun-14 21:04
David Serrano Martínez10-Jun-14 21:04 
SuggestionUse boost::function to hold your functor and boost::bind to capture your variables for C++03 Pin
Shao Voon Wong20-May-14 21:42
mvaShao Voon Wong20-May-14 21:42 
QuestionVote of 5 Pin
Ganesan Senthilvel19-May-14 7:50
Ganesan Senthilvel19-May-14 7:50 
GeneralMy vote of 5 Pin
hatemtaleb28-Aug-13 7:48
hatemtaleb28-Aug-13 7:48 
GeneralMy vote of 5 Pin
Ștefan-Mihai MOGA15-Jan-13 6:51
professionalȘtefan-Mihai MOGA15-Jan-13 6:51 
GeneralRe: My vote of 5 Pin
David Serrano Martínez19-Feb-13 3:43
David Serrano Martínez19-Feb-13 3:43 
GeneralMy vote of 4 Pin
ahlav21-Dec-12 6:25
ahlav21-Dec-12 6:25 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.