Paul Cowan

Nomadic cattle rustler and inventor of the electric lasso

Clojure - Lazy Sequences for the Masses

I have recently fallen head over heals in love with clojure and I’ve decided to run through the problems in the excellent 4Clojure site to ensure that I have a good understanding of the language before I even think about dealing with IO or building something meaningful. I’m currently on question 60 which has introduced the concept of lazy sequences and is displayed below:

r.clj
1
2
3
4
5
Write a function which behaves like reduce, but returns each intermediate value of the
reduction.  Your function must accept either two or three arguments, and the return
sequence must be lazy.

(= (take 5 (__ + (range))) [0 1 3 6 10])

What struck me as totally strange when I first read this problem is that an infinite list is being passed to the function in the form of (range). I had only previously seen range used with at least an end argument like below:

range.clj
1
2
user=> (range 10)
(0 1 2 3 4 5 6 7 8 9)

But if no start or end arguments are supplied then the default for end is infinity.

d.clj
1
2
3
;; default value of 'end' is infinity
user=> (range)
(0 1 2 3 4 5 6 7 8 9 10 ... 12770 12771 12772 12773 ... n

So why on earth would you want to supply an ever expanding range? The answer to this puzzle within a puzzle was to clarify what range actually returns which according to the docs is:

Returns a lazy seq of nums from start (inclusive) to end (exclusive), by step, where start defaults to 0, step to 1, and end to infinity.

The 4clojure problem at the start of this post also asks the user to return a lazy sequence for the solution. So what is a lazy sequence? As a newbie to clojure, I struggled to find good information of what an actual lazy sequence was and the docs left me with an unclear understanding of how I should use lazy-seq to construct a lazy sequence.

Lazy Sequences

Laziness in this context means that you can specify a computation that could theoretically take forever to complete, but you can evalutate as much or as little of it as you need. A very simple example of this is below:

take.clj
1
2
user=> (take 10 (range))
(0 1 2 3 4 5 6 7 8 9)

In the above example, range is being called with no end argument which according to the docs will mean that end defaults to infinity but if we type the expression into the repl, a sequence with the first 10 elements is immediately evaluated so it would appear that take 10 is not waiting for range to reach infinity before grabbing the first 10 elements.

A closer look at what type of sequence is returned starts to give the game away:

class.clj
1
2
user=> (class (take 10 (range)))
clojure.lang.LazySeq

As range with no end argument defaults to infinity, the only way that this could possibly work is if range returns a lazy sequence or one that realizes its elements as requested. The original 4Clojure problem stated that the sequence returned from the solution must be lazy in order to work with an infinite range:

t.clj
1
(= (take 5 (__ + (range))) [0 1 3 6 10])

After a quick google to determine how I can return a lazy sequence, I came across the clojure docs for the lazy-seq function but I did not fully grasp what the explanation meant:

Takes a body of expressions that returns an ISeq or nil, and yields
a Seqable object that will invoke the body only the first time seq
is called, and will cache the result and return it on all subsequent
seq calls. See also - realized?

In my haste to complete the problem, I simply thought that all I had to do was wrap an existing function in a lazy-seq call and everything would just magically work. Here is my original attempt at solving the question:

first.cljs
1
2
3
4
5
(= (take 5 ((fn [func se]
    (lazy-seq
        (reduce (fn [acc item]
                      (conj acc (func (last acc) item)))
                        [(first se)] (rest se)))) +  (range))) [0 1 3 6 10])

All I have done is wrap a call to reduce on line 3 in a call to lazy-seq on line 2 which of course resulted in a stackoverflow because reduce does not return a lazy sequence and instead will execute until the computation is complete which in this case is never because reduce is being called on an infinite list. I needed to go back to the drawing board. What I needed to do was create a sequence that is realized as needed.

After a bit more digging it became apparent that lazy sequences are not really sequences that simply return a list of elements but are more comparable to iterators in other languages such as java that conform to an a sequence api. With a lazy sequence it is up to the client or the calling function to decide how many elements to consume. A lazy function is just a data structure that waits until you ask it for a value. So how does one return a lazy sequence?

The Simplest Example Imaginable

Below is the simplest example that I could think of:

simple.clj
1
2
3
4
(defn add-one [se]
  (lazy-seq (cons (inc (first se)) (add_one (rest se)))))

(take 5 (add-one (range))) ;;;;;;=>(1 2 3 4 5)
  • Line 1 defines a function named add-one that accepts a sequence that could be infinite.
  • Line 2 is where the action happens, the lazy-seq macro is employed to create a lazy sequence whose body is an expression. The body of this lazy sequence uses the cons function which takes an element and a sequence and returns a new sequence by prepending the element to the sequence. The element in this case is the result of incrementing the first element of the list with the expression:
inc.clj
1
(inc (first se))

The sequence that the element is prepended to is the result of recursively calling the add-one function again. If a lazy sequence was not returned from add-one then add-one would be called over and over again until a stackoverflow exception was thrown. The whole point of the lazy-seq macro is to circumvent this and ensure that the function will only be accessed as the elements are requested. A sequence that is returned from lazy-seq contains the requested element and a pointer or a function to get the next element.

Another mistake I made in my original misunderstanding was that cons was returning a lazy sequence or was somehow involved in returning a lazy sequence. cons does not return a lazy sequence (the thing that does that is lazy-seq), but it is a good tool to use as part of buidling a lazy sequence.

The 4clojure Solution

Before I show my solution, I will remind you of the original 4Clojure problem.

red.clj
1
2
3
4
Write a function which behaves like reduce, but returns each intermediate value of the reduction.
Your function must accept either two or three arguments, and the return sequence must be lazy.

(= (take 5 (__ + (range))) [0 1 3 6 10])

The question is really asking the user to recreate clojure’s core reductions function. Below is my solution that took me quite a while to get to but I am very pleased with.

res.clj
1
2
3
4
5
6
7
8
9
10
11
12
(= (take 5 (
  (fn my-reduct
    ([func coll]
       (my-reduct func (first coll) (rest coll)))

    ([func firstArg coll]
      (letfn [(reduct [f init se]
                (lazy-seq (when-not (empty? se)
                            (let [res (f init (first se))]
                              (cons res (reduct f res (rest se)))))))]
        (lazy-seq (cons firstArg (reduct func firstArg coll))))))
                    + (range))) [0 1 3 6 10])

The important things to note are that an infinite range is passed into the named anonymous function my-reduct on line 12 but becase a lazy sequence is returned on line 8 and line 11, the resulting sequence will only be returned as requested. Because the take function is used on line line 1 to request the lazy sequence with an arguement of 5, the function will only be called 5 times.

I found lazy sequences very difficult to wrap my head around and part of that process was to write this post.

If I have inaccuately described anything or I can improve this explanation then please leave a comment below.

Comments