Functional Data Structures in Java 8 with Javaslang

Java 8’s lambdas (λ) empower us to create wonderful API’s. They incredibly increase the expressiveness of the language.

Javaslang leveraged lambdas to create various new features based on functional patterns. One of them is a functional collection library that is intended to be a replacement for Java’s standard collections.

Javaslang Collections

(This is just a bird’s view, you will find a human-readable version below.)

Functional Programming

Before we deep-dive into the details about the data structures I want to talk about some basics. This will make it clear why I created Javaslang and specifically new Java collections.

Side-Effects

Java applications are typically plentiful of side-effects. They mutate some sort of state, maybe the outer world. Common side effects are changing objects or variables in place, printing to the console, writing to a log file or to a database. Side-effects are considered harmful if they affect the semantics of our program in an undesirable way.

For example, if a function throws an exception and this exception is interpreted, it is considered as side-effect that affects our program. Furthermore exceptions are like non-local goto-statements. They break the normal control-flow. However, real-world applications do perform side-effects.

int divide(int dividend, int divisor) {
    // throws if divisor is zero
    return dividend / divisor;
}

In a functional setting we are in the favorable situation to encapsulate the side-effect in a Try:

// = Success(result) or Failure(exception)
Try<Integer> divide(Integer dividend, Integer divisor) {
    return Try.of(() -> dividend / divisor);
}

This version of divide does not throw any more. We made the possible failure explicit by using the type Try.

Mario Fusco on Functional Programming

Referential Transparency

A function, or more general an expression, is called referential transparent if a call can be replaced by its value without affecting the behavior of the program. Simply spoken, given the same input the output is always the same.

// not referential transparent
Math.random();

// referential transparent
Math.max(1, 2);

A function is called pure if all expressions involved are referential transparent. An application composed of pure functions will most probably just work if it compiles. We are able to reason about it. Unit tests are easy to write and debugging becomes a relict of the past.

Thinking in Values

Rich Hickey, the creator of Clojure, gave a great talk about The Value of Values. The most interesting values are immutable values. The main reason is that immutable values

  • are inherently thread-safe and hence do not need to be synchronized
  • are stable regarding equals and hashCode and thus are reliable hash keys
  • do not need to be cloned
  • behave type-safe when used in unchecked covariant casts (Java-specific)

The key to a better Java is to use immutable values paired with referential transparent functions.

Javaslang provides the necessary controls and collections to accomplish this goal in every-day Java programming.

Data Structures in a Nutshell

Javaslang’s collection library comprises of a rich set of functional data structures built on top of lambdas. The only interface they share with Java’s original collections is Iterable. The main reason is that the mutator methods of Java’s collection interfaces do not return an object of the underlying collection type.

We will see why this is so essential by taking a look at the different types of data structures.

Mutable Data Structures

Java is an object-oriented programming language. We encapsulate state in objects to achieve data hiding and provide mutator methods to control the state. The Java collections framework (JCF) is built upon this idea.

interface Collection<E> {
    // removes all elements from this collection
    void clear();
}

Today I comprehend a void return type as a smell. It is evidence that side-effects take place, state is mutated. Shared mutable state is an important source of failure, not only in a concurrent setting.

Immutable Data Structures

Immutable data structures cannot be modified after their creation. In the context of Java they are widely used in the form of collection wrappers.

List<String> list = Collections.unmodifiableList(otherList);

// Boom!
list.add("why not?");

There are various libraries that provide us with similar utility methods. The result is always an unmodifiable view of the specific collection. Typically it will throw at runtime when we call a mutator method.

Persistent Data Structures

A persistent data structure does preserve the previous version of itself when being modified and is therefore effectively immutable. Fully persistent data structures allow both updates and queries on any version.

Many operations perform only small changes. Just copying the previous version wouldn’t be efficient. To save time and memory, it is crucial to identify similarities between two versions and share as much data as possible.

This model does not impose any implementation details. Here come functional data structures into play.

Functional Data Structures

Also known as purely functional data structures, these are immutable and persistent. The methods of functional data structures are referential transparent.

Javaslang features a wide range of the most-commonly used functional data structures. The following examples are explained in-depth.

Linked List

One of the most popular and also simplest functional data structures is the (singly) linked List. It has a head element and a tail List. A linked List behaves like a Stack which follows the last in, first out (LIFO) method.

In Javaslang we instantiate a List like this:

// = List(1, 2, 3)
List<Integer> list1 = List.of(1, 2, 3);

Each of the List elements forms a separate List node. The tail of the last element is Nil, the empty List.

List 1

This enables us to share elements across different versions of the List.

// = List(0, 2, 3)
List<Integer> list2 = list1.tail().prepend(0);

The new head element 0 is linked to the tail of the original List. The original List remains unmodified.

List 2

These operations take place in constant time, in other words they are independent of the List size. Most of the other operations take linear time. In Javaslang this is expressed by the interface LinearSeq, which we may already know from Scala.

If we need data structures that are queryable in constant time, Javaslang offers Array and Vector. Both have random access capabilities.

The Array type is backed by a Java array of objects. Insert and remove operations take linear time. Vector is in-between Array and List. It performs well in both areas, random access and modification.

In fact the linked List can also be used to implement a Queue data structure.

Queue

A very efficient functional Queue can be implemented based on two linked Lists. The front List holds the elements that are dequeued, the rear List holds the elements that are enqueued. Both operations enqueue and dequeue perform in O(1).

Queue<Integer> queue = Queue.of(1, 2, 3)
                            .enqueue(4)
                            .enqueue(5);

The initial Queue is created of three elements. Two elements are enqueued on the rear List.

Queue 1

If the front List runs out of elements when dequeueing, the rear List is reversed and becomes the new front List.

Queue 2

When dequeueing an element we get a pair of the first element and the remaining Queue. It is necessary to return the new version of the Queue because functional data structures are immutable and persistent. The original Queue is not affected.

Queue<Integer> queue = Queue.of(1, 2, 3);

// = (1, Queue(2, 3))
Tuple2<Integer, Queue<Integer>> dequeued =
        queue.dequeue();

What happens when the Queue is empty? Then dequeue() will throw a NoSuchElementException. To do it the functional way we would rather expect an optional result.

// = Some((1, Queue()))
Queue.of(1).dequeueOption();

// = None
Queue.empty().dequeueOption();

An optional result may be further processed, regardless if it is empty or not.

// = Queue(1)
Queue<Integer> queue = Queue.of(1);

// = Some((1, Queue()))
Option<Tuple2<Integer, Queue<Integer>>>
        dequeued = queue.dequeueOption();

// = Some(1)
Option<Integer> element =
        dequeued.map(Tuple2::_1);

// = Some(Queue())
Option<Queue<Integer>> remaining =
        dequeued.map(Tuple2::_2);

Sorted Set

Sorted Sets are data structures that are more frequently used than Queues. We use binary search trees to model them in a functional way. These trees consist of nodes with up to two children and values at each node.

We build binary search trees in the presence of an ordering, represented by an element Comparator. All values of the left subtree of any given node are strictly less than the value of the given node. All values of the right subtree are strictly greater.

// = TreeSet(1, 2, 3, 4, 6, 7, 8)
SortedSet<Integer> xs =
        TreeSet.of(6, 1, 3, 2, 4, 7, 8);

Binary Tree 1

Searches on such trees run in O(log n) time. We start the search at the root and decide if we found the element. Because of the total ordering of the values we know where to search next, in the left or in the right branch of the current tree.

// = TreeSet(1, 2, 3);
SortedSet<Integer> set = TreeSet.of(2, 3, 1, 2);

// = TreeSet(3, 2, 1);
Comparator<Integer> c = (a, b) -> b - a;
SortedSet<Integer> reversed =
        TreeSet.of(c, 2, 3, 1, 2);

Most tree operations are inherently recursive. The insert function behaves similar to the search function. When the end of a search path is reached, a new node is created and the whole path is reconstructed up to the root. Existing child nodes are referenced whenever possible. Hence the insert operation takes O(log n) time and space.

// = TreeSet(1, 2, 3, 4, 5, 6, 7, 8)
SortedSet<Integer> ys = xs.add(5);

Binary Tree 2

In order to maintain the performance characteristics of a binary search tree it needs to be kept balanced. All paths from the root to a leaf need to have roughly the same length.

In Javaslang we implemented a binary search tree based on a Red/Black Tree. It uses a specific coloring strategy to keep the tree balanced on inserts and deletes. To read more about this topic please refer to the book Purely Functional Data Structures by Chris Okasaki.

State of the Collections

Generally we are observing a convergence of programming languages. Good features make it, other disappear. But Java is different, it is bound forever to be backward compatible. That is a strength but also slows down evolution.

Lambda brought Java and Scala closer together, yet they are still so different. Martin Odersky, the creator of Scala, recently mentioned in his BDSBTB 2015 keynote the state of the Java 8 collections.

He described Java’s Stream as a fancy form of an Iterator. The Java 8 Stream API is an example of a lifted collection. What it does is to define a computation and link it to a specific collection in another excplicit step.

// i + 1
i.prepareForAddition()
 .add(1)
 .mapBackToInteger(Mappers.toInteger())

This is how the new Java 8 Stream API works. It is a computational layer above the well known Java collections.

// = ["1", "2", "3"] in Java 8
Arrays.asList(1, 2, 3)
      .stream()
      .map(Object::toString)
      .collect(Collectors.toList())

Javaslang is greatly inspired by Scala. This is how the above example should have been in Java 8.

// = Stream("1", "2", "3") in Javaslang
Stream.of(1, 2, 3).map(Object::toString)

Within the last year we put much effort into implementing the Javaslang collection library. It comprises the most widely used collection types.

Seq

We started our journey by implementing sequential types. We already described the linked List above. Stream, a lazy linked List, followed. It allows us to process possibly infinite long sequences of elements.

Seq

All collections are Iterable and hence could be used in enhanced for-statements.

for (String s : List.of("Java", "Advent")) {
    // side effects and mutation
}

We could accomplish the same by internalizing the loop and injecting the behavior using a lambda.

List.of("Java", "Advent").forEach(s -> {
    // side effects and mutation
});

Anyway, as we previously saw we prefer expressions that return a value over statements that return nothing. By looking at a simple example, soon we will recognize that statements add noise and divide what belongs together.

String join(String... words) {
    StringBuilder builder = new StringBuilder();
    for(String s : words) {
        if (builder.length() > 0) {
            builder.append(", ");
        }
        builder.append(s);
    }
    return builder.toString();
}

The Javaslang collections provide us with many functions to operate on the underlying elements. This allows us to express things in a very concise way.

String join(String... words) {
    return List.of(words)
               .intersperse(", ")
               .fold("", String::concat);
}

Most goals can be accomplished in various ways using Javaslang. Here we reduced the whole method body to fluent function calls on a List instance. We could even remove the whole method and directly use our List to obtain the computation result.

List.of(words).mkString(", ");

In a real world application we are now able to drastically reduce the number of lines of code and hence lower the risk of bugs.

Set and Map

Sequences are great. But to be complete, a collection library also needs different types of Sets and Maps.

Set and Map

We described how to model sorted Sets with binary tree structures. A sorted Map is nothing else than a sorted Set containing key-value pairs and having an ordering for the keys.

The HashMap implementation is backed by a Hash Array Mapped Trie (HAMT). Accordingly the HashSet is backed by a HAMT containing key-key pairs.

Our Map does not have a special Entry type to represent key-value pairs. Instead we use Tuple2 which is already part of Javaslang. The fields of a Tuple are enumerated.

// = (1, "A")
Tuple2<Integer, String> entry = Tuple.of(1, "A");

Integer key = entry._1;
String value = entry._2;

Maps and Tuples are used throughout Javaslang. Tuples are inevitable to handle multi-valued return types in a general way.

// = HashMap((0, List(2, 4)), (1, List(1, 3)))
List.of(1, 2, 3, 4).groupBy(i -> i % 2);

// = List((a, 0), (b, 1), (c, 2))
List.of('a', 'b', 'c').zipWithIndex();

At Javaslang, we explore and test our library by implementing the 99 Euler Problems. It is a great proof of concept. Please don’t hesitate to send pull requests.

Hands On!

I really hope this article has sparked your interest in Javaslang. Even if you do use Java 7 (or below) at work, as I do, it is possible to follow the idea of functional programming. It will be of great good!

Please make sure Javaslang is part of your toolbelt in 2016.

Happy hacking!

PS: question ? @_Javaslang or Gitter chat

Published by

Daniel Dietrich

A programmer, shipping deliverables for the financial sector by day, reifying his visions and dreams by night.

24 thoughts on “Functional Data Structures in Java 8 with Javaslang”

  1. I love this article. What you manage to do is explain the benefits of functional style without becoming all preachy. If only such an honest and humble approach could be taken by all. I rather like your comment about throw being a non local goto; however, I think that is a bit strong. Throw on the JVM is a high speed alternate return path. It is very powerful indeed when used that way. The cost of throwing (as I am sure you already know) is in the creation of the exception object. If one throws pre-created exceptions, the path is of similar speed to a normal return. This was how we implemented closures for closure based looping in Magik on the JVM at GE.

    I also think your article could benefit (maybe due to the technical issues we have had here, the final version is not up, in which case ignore this) from some links. The area you are especially light on is the use of immutable type for distributed (either muti-thread or multi-process, even multi-machine) processing. Whilst you mention it briefly, the point that immutable types and pure functions permit not only multiple processing but are deterministic even under non deterministic processing order is very important. Even my Jypthon program ‘Sonic Field’ about which I posted, uses immutable data to represent audio and passes pure closures (though I cannot guarantee pureness yet) for processing. This allows it to schedule work without knowledge of the work being scheduled. Such techniques then permit work stealing etc:

    As Javaslang a take on scheduling?

    1. Hi Alexander,

      thank you for the kind words. I think you are right. The approach sketched in this article cannot be fully taken. It is important to see the benefits and start to change the thinking.

      My current project team managed to create a functional core for pricer calculations (in Java 7) within the last year. The core isn’t pure. But following the main principles of FP lead to a robust calculation core I’m really satisfied with.

      Side-effects (IO) take place in the outer layers of the application (a legacy system), our core is clean.

      I didn’t spend time on scheduling, yet.

  2. Hah, learned a new term today: “referential transparency”. I used to call that “deterministic” (Oracle DB-speak). When will we as a profession converge on a single set of terms? 🙂

  3. One thing which could be interesting (this might be done already – pardon if I missed it while scanning the article): separate the collection interfaces into their immutable / mutable parts. Ie. have something like an `ImmutableList` interface which only contains get / size / contains and `MutableList extends ImmutableList` which adds operations like add / insert / remove.

    This way we could enforce the mutable / immutable types during compile time (instead of encountering “UnsuppoertedOperationExceptions” during runtime). It also corrects the violation of Liskov’s substitution principle present in the Java Collection APIs.

    Google Guava also has an interesting hack for this problem – they mark all the mutation methods on their immutable collections as `@Deprecated`, but this only works if you use the `ImmutableX` interface exactly (instead of doing the recommended thing of declaring the variables / fields as generic Xs instead of the specific implementation types).

  4. Thx for your reply 🙂

    Javaslang does not need to split the collection interfaces into their mutable/immutable parts because the collections are immutable and persistent. I.e. javaslang.collection.Set#add(Object) returns a new Set instance rather than changing the original Set.

    We could change the standard Java collection library and introduce the Immutable/Mutable interfaces. But the collections wouldn’t be useful any more. We need methods like set, update, add, clear to create useful programs. But these methods need to be persistent in order to solve problems that arise from shared mutable state.

    Guava does a great job in implementing the original interfaces – with all its limits. The deprecation hack is – a hack. IMO it is better take a modern, more powerful approach.

    1. FYI, I found the / one answer about the rationale of not trying to enforce the mutable / immutable collections using types here: https://docs.oracle.com/javase/8/docs/technotes/guides/collections/designfaq.html#a1 (“Why don’t you support immutability directly in the core collection interfaces so that you can do away with optional operations (and UnsupportedOperationException)?”) – It’s an interesting read (even though I don’t agree with its conclusion :-))

      1. Had a minute…

        > ModifiableList, and ModifiableMap. What was previously a simple hierarchy is now a messy heterarchy.

        Yes, that makes absolutely sense not to implement such a ‘messy heterarchie’.

        Using such interfaces look like a workaround. We could provide operations on Java collections that give us views on the original collections. But this immutable view hierarchy has to be distinct from the original collection type hierarchy In other words, we would ‘lift’ the collection types to immutable types.

        However, the mutator methods (add, set, remove) would be gone and we need them to create useful programs.

        > Also, you need a new Iterator interface for use with unmodifiable Collections, that does not contain the remove operation.

        Javaslang has an Iterator implementations which implements Traversable, like all other collections. In other words this is possible:

        // = lazily(!) performs all operations
        Iterator iterator = List.of(1, 2, 3, 4).iterator()
        .filter(i -> i % 2 == 0)
        .map(Object::toString);

        1. I personally don’t buy it that it would have been such a “messy hierarchy”. I see how it could be more than mutable / immutable (ie. we could have: completely immutable / mutable but not through this view / mutable via extension (add) / mutable via removal / mutable via update), but still think that it would have been worth it.

          As for completely immutable / functional data structures: I come from a HPC (High Performance Computing) background where using such structures would be prohibitive (think many 100s of millions on entries stored in datastructures). Until our compilers / runtimes can better reason about our code (something like Rust) I don’t think it feasible to use immutable data structures for these datasets (but probably well worth for “normal” datasets to reduce bugs).

          1. Wanted to clarify “I personally don’t buy it (the argument from the FAQ) but then again I never tried to actually implement a full collection framework :-)”

          2. I understand the decision of the Java architects. Don’t underestimate implementing consistent objects throughout a big type hierarchy – it is an immense effort.

            Daniel Spiewak gave a great talk “Extreme Cleverness: Functional Data Structures in Scala”: http://www.infoq.com/presentations/Functional-Data-Structures-in-Scala

            At the end of this video he talks about immutable Maps, implemented based on ‘Bitmapped Vector Trie’ (which I think is very similar to Javaslang’s Hash Array Mapped Trie, also used by Scala’s and Clojure’s Map impl). He says this functional Map even outperforms Java’s HashMap for big data sets!

            I’m also switching to 3rd party libs in the case I need very high performance for a specific use case. For example fastutil is the best for primitive data types (to avoid boxing/unboxing), see http://java-performance.info/hashmap-overview-jdk-fastutil-goldman-sachs-hppc-koloboke-trove-january-2015/.

            There are also actual efforts to further improve hash maps, but I didn’t tested it yet: https://github.com/OpenHFT/SmoothieMap

            However, functional collections should be used IMO to guarantee program correctness. If needed, performance tuning can be made later (as usual).

  5. All the discussion here is interesting; however, there is an underlying sense certainty which bothers me. Immutable structures, declarative programming, provability and functional programming are all separate concepts. For example, any imperative program can be represented functionally. The reverse is also true. Therefore, if one can prove a program in one from one has proven the program in the other. Any lack of ability to prove imperitive code is simply a reflection of our human failings and has nothing what so ever todo with the program itself.

    Similarly, as a pure functional program can do no work (output being impure) imutability is something facilitated by functional style but not required by it or unique to it.

    Functional programmers need, as a community, to remain aware of these issues; otherwise one risks continuing to build up the wall function programming has had created around it. It is a statement of fact that functional programming has failed to address the main stream; for that mainstream to increasingly benefit from the concepts in the functional community that very ommunity must learn to disassemble its monolithic approach and accept functional programming is a set of things, not a single thing.

    1. My approach to functional programming was/is: give me defaults which make it the least likely that I’ll create bugs (for example immutable data structures are great because I don’t have to remember to synchronize access to them), but there has to be an escape hatch when they become a performance bottleneck.

      Also, make it possible to express concepts clearly and succinctly (I think lambdas / streams / method handles are a good step forward here).

      That said, I’m only beginning to understand the functional paradigm and never used a “proper” functional language for a “proper” project (but looking forward to a chance :-))

    2. Yes, I agree with you both. FP is a monolithic blackbox to many people – “has to do with functions”. We do not have pattern matching, built-in lazy evaluation or tail-recursion in Java. All things that are really helpful for functional programming. In my opinion it is necessary to use these concepts in other languages like Scala or Haskell to get an impression of how FP works. This will be the basis of creating some self-defined rules for better programming in a language of choice.

      I wrote down my thoughts in a Gist a few months ago – these are only some notes, far away from a final version but a good start in direction of FP in Java: https://gist.github.com/danieldietrich/4f4489f1cacd31709900 The rules need to be hardened. I think there are still some things I have not covered or described insufficient.

      Colleagues which where new to programming in Java applied these rules and their programs looked great to me. They started to implement method bodies by function calls with input args and an output. The input args where either new computations (same pattern, refactored to final local vars) or part of the method signature. Later, the method bodies of the new function calls where implemented. This is functional design, as easy as 1, 2, 3.

      Note: Of course there were also for-loops and some locally mutated variables, when necessary. In one or two cases also an object with mutable state. We pulled side-effects to the outer layers.

      1. +100 for the ideas listed in the gist. I remember some of Java’s architects saying that they wish that they had designed Java such that classes and variables (+ arguments) were by default final, not serializable and not synchronizable on by default – defaults which I whole heartedly agree with (because they prevent accidental mistakes).

  6. Just remembered something I’ve heard in one of this years JavaOne talks and I think it is something to be aware of when using functional / immutable data structures (https://www.reddit.com/r/java/comments/3requx/javaone_2015_sessions/ – can’t remember exactly which it was, sorry – I think it was given by a JVM engineer working at Twitter):

    The problem goes like this:

    – you have a queue represented by a singly linked list plus a “head” and “tail” reference. When you add an element to the queue you add it to the tail, when you consume one element, you move the head forward (I know this is not a functional / immutable data structure, but bear with me :-))

    – the following problem can occur: you have elements in the queue and a full collection occurs and now some of the elements are in the old generation. You consume your elements (ie. move the “head” forward), but they can’t be cleared up by GC until it performs a major collection because for minor collections those nodes in the OldGen count as GC roots.

    The workaround was to make the “next” pointer of the consumed nodes “null”.

    While there are lot of ways to work around such problems (perhaps major collections are not a problem for your application? perhaps it gets “lucky” most of the time and such objects don’t get into the oldgen? or perhaps with G1 we’ll see less of these problems), it is an other potential issue when having complex immutable object graphs in a JVM.

    1. Thank you! I’m no GC expert, so that particular topic was new to me.

      We can’t set the ‘next’ pointers automatically to null in our case. It would need some sort of ‘dealloc’ method if we are sure that an object is not needed any more. However, this would be overkill/impossible for complex data structures like the Red/Black Tree.

      I rely on the GC and hope that such cases are handled properly (in future Java versions).

Leave a Reply