Akka Streams and HTTP
Java Documentation
Release 1.0
Typesafe Inc
CONTENTS
1 Streams
1.1 Introduction
1.2 Quick Start Guide: Reactive Tweets
1.3 Design Principles behind Akka Streams
1.4 Basics and working with Flows
1.5 Working with Graphs
1.6 Modularity, Composition and Hierarchy
1.7 Buffers and working with rate
1.8 Custom stream processing
1.9 Integration
1.10 Error Handling
1.11 Working with streaming IO
1.12 Pipelining and Parallelism
1.13 Testing streams
1.14 Overview of built-in stages and their semantics
1.15 Streams Cookbook
1.16 Configuration
2 Akka HTTP
2.1 Configuration
2.2 HTTP Model
2.3 Low-Level Server-Side API
2.4 Server-Side WebSocket Support
2.5 High-level Server-Side API
2.6 Consuming HTTP-based Services (Client-Side)
CHAPTER
ONE
STREAMS
1.1 Introduction
1.1.1 Motivation
The way we consume services from the internet today includes many instances of streaming data, both downloading from a service as well as uploading to it or peer-to-peer data transfers. Regarding data as a stream of elements instead of in its entirety is very useful because it matches the way computers send and receive them (for example via TCP), but it is often also a necessity because data sets frequently become too large to be handled as a whole. We spread computations or analyses over large clusters and call it big data, where the whole principle of processing them is by feeding those data sequentially, as a stream, through some CPUs.
Actors can be seen as dealing with streams as well: they send and receive series of messages in order to transfer
knowledge (or data) from one place to another. We have found it tedious and error-prone to implement all the
proper measures in order to achieve stable streaming between actors, since in addition to sending and receiving
we also need to take care to not overflow any buffers or mailboxes in the process. Another pitfall is that Actor
messages can be lost and must be retransmitted in that case lest the stream have holes on the receiving side. When
dealing with streams of elements of a fixed given type, Actors also do not currently offer good static guarantees
that no wiring errors are made: type-safety could be improved in this case.
For these reasons we decided to bundle up a solution to these problems as an Akka Streams API. The purpose is to offer an intuitive and safe way to formulate stream processing setups such that we can then execute them efficiently and with bounded resource usage: no more OutOfMemoryErrors. In order to achieve this our streams need to be able to limit the buffering that they employ, and they need to be able to slow down producers if the consumers cannot
keep up. This feature is called back-pressure and is at the core of the Reactive Streams initiative of which Akka is a
founding member. For you this means that the hard problem of propagating and reacting to back-pressure has been
incorporated in the design of Akka Streams already, so you have one less thing to worry about; it also means that
Akka Streams interoperate seamlessly with all other Reactive Streams implementations (where Reactive Streams
interfaces define the interoperability SPI while implementations like Akka Streams offer a nice user API).
Relationship with Reactive Streams
The Akka Streams API is completely decoupled from the Reactive Streams interfaces. While Akka Streams
focus on the formulation of transformations on data streams the scope of Reactive Streams is just to define a
common mechanism of how to move data across an asynchronous boundary without losses, buffering or resource
exhaustion.
The relationship between these two is that the Akka Streams API is geared towards end-users while the Akka
Streams implementation uses the Reactive Streams interfaces internally to pass data between the different processing stages. For this reason you will not find any resemblance between the Reactive Streams interfaces and the
Akka Streams API. This is in line with the expectations of the Reactive Streams project, whose primary purpose is to define interfaces such that different streaming implementations can interoperate; it is not the purpose of Reactive Streams to describe an end-user API.
Note: If you would like to get an overview of the used vocabulary first instead of diving head-first into an actual
example you can have a look at the Core concepts and Defining and running streams sections of the docs, and
then come back to this quickstart to see it all pieced together into a simple example application.
The ActorMaterializer can optionally take ActorMaterializerSettings which can be used to define materialization properties, such as default buffer sizes (see also Buffers in Akka Streams), the dispatcher to be used by the pipeline etc. These can be overridden with withAttributes on Flow, Source, Sink and Graph.
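For example, a materializer with tuned default buffer sizes might be created like this (a minimal sketch; the buffer values are illustrative):
final ActorMaterializerSettings settings =
  ActorMaterializerSettings.create(system)
    .withInputBuffer(16, 64); // initial and maximum input buffer sizes
final Materializer mat = ActorMaterializer.create(settings, system);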
Let's assume we have a stream of tweets readily available; in Akka this is expressed as a Source:
Source<Tweet, BoxedUnit> tweets;
Streams always start flowing from a Source<Out,M1> then can continue through Flow<In,Out,M2> elements or more advanced graph elements to finally be consumed by a Sink<In,M3>.
The first type parameter (Tweet in this case) designates the kind of elements produced by the source, while the M type parameters describe the object that is created during materialization (see below). BoxedUnit (from the scala.runtime package) means that no value is produced; it is the generic equivalent of void.
The operations should look familiar to anyone who has used the Scala Collections library, however they operate
on streams and not collections of data (which is a very important distinction, as some operations only make sense
in streaming and vice versa):
final Source<Author, BoxedUnit> authors =
  tweets
    .filter(t -> t.hashtags().contains(AKKA)) // AKKA is a Hashtag constant, e.g. new Hashtag("#akka")
    .map(t -> t.author);
Finally, in order to materialize and run the stream computation we need to attach the Flow to a Sink<T, M> that will get the flow running. The simplest way to do this is to call runWith(sink) on a Source<Out, M>. For convenience a number of common Sinks are predefined and collected as static methods on the Sink class. For now let's simply print each author:
authors.runWith(Sink.foreach(a -> System.out.println(a)), mat);
or by using one of the shorthand versions (which are defined only for the most popular sinks such as Sink.fold and Sink.foreach):
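For Sink.foreach the shorthand is runForeach, an alias for runWith(Sink.foreach(...)) as noted under Stream Materialization below:
authors.runForeach(a -> System.out.println(a), mat);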
Materializing and running a stream always requires a Materializer to be passed in explicitly, like this:
.run(mat).
The complete snippet looks like this:
final ActorSystem system = ActorSystem.create("reactive-tweets");
final Materializer mat = ActorMaterializer.create(system);
final Source<Author, BoxedUnit> authors =
tweets
.filter(t -> t.hashtags().contains(AKKA))
.map(t -> t.author);
authors.runWith(Sink.foreach(a -> System.out.println(a)), mat);
Note: The name flatMap was consciously avoided due to its proximity with for-comprehensions and monadic
composition. It is problematic for two reasons: firstly, flattening by concatenation is often undesirable in bounded
stream processing due to the risk of deadlock (with merge being the preferred strategy), and secondly, the monad
laws would not hold for our implementation of flatMap (due to the liveness issues).
Please note that mapConcat requires the supplied function to return a strict collection (Out -> java.util.List<T>), whereas flatMap would have to operate on streams all the way through.
As you can see, we use the graph builder to mutably construct the graph using the addEdge method. Once we have the FlowGraph in the value g it is immutable, thread-safe, and freely shareable. A graph can be run() directly, assuming all ports (sinks/sources) within a flow have been connected properly. It is possible to construct PartialFlowGraphs where this is not required, but this will be covered in detail in Constructing and combining Partial Flow Graphs.
Like all Akka Streams elements, Broadcast properly propagates back-pressure to its upstream element.
The buffer element takes an explicit and required OverflowStrategy, which defines how the buffer should
react when it receives another element while it is full. Strategies provided include dropping the oldest element
(dropHead), dropping the entire buffer, signalling failures etc. Be sure to pick and choose the strategy that fits
your use case best.
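For example, to keep only the ten most recent tweets while a slow downstream stage catches up, one might write the following sketch (slowComputation stands for any hypothetical expensive stage):
tweets
  .buffer(10, OverflowStrategy.dropHead()) // drop the oldest element when the buffer is full
  .map(t -> slowComputation(t))            // slowComputation is hypothetical
  .runWith(Sink.ignore(), mat);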
First we prepare a reusable Flow that will change each incoming tweet into an integer of value 1. We combine all values of the transformed stream using Sink.fold, which will sum all Integer elements of the stream and make its result available as a Future<Integer>. Next we connect the tweets stream through a map step which converts each tweet into the number 1; finally we connect the flow to the previously prepared Sink using toMat.
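In code, these steps might look like this (a sketch that mirrors the reusable blueprint shown further below):
final Flow<Tweet, Integer, BoxedUnit> count =
  Flow.of(Tweet.class).map(t -> 1); // the reusable Flow: each tweet becomes the integer 1
final Sink<Integer, Future<Integer>> sumSink =
  Sink.<Integer, Integer>fold(0, (acc, elem) -> acc + elem);
final RunnableGraph<Future<Integer>> counter =
  tweets.via(count).toMat(sumSink, Keep.right());
final Future<Integer> sum = counter.run(mat);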
Remember those mysterious Mat type parameters on Source<Out, Mat>, Flow<In, Out, Mat> and Sink<In, Mat>? They represent the type of values these processing parts return when materialized. When you chain these together, you can explicitly combine their materialized values: in our example we used the Keep.right predefined function, which tells the implementation to only care about the materialized type of the stage currently appended to the right. As you can see, the materialized type of sumSink is Future<Integer>, and because of using Keep.right the resulting RunnableGraph also has a type parameter of Future<Integer>.
This step does not yet materialize the processing pipeline, it merely prepares the description of the
Flow, which is now connected to a Sink, and therefore can be run(), as indicated by its type:
RunnableGraph<Future<Integer>>. Next we call run() which uses the ActorMaterializer to
materialize and run the flow. The value returned by calling run() on a RunnableGraph<T> is of type T. In
our case this type is Future<Integer> which, when completed, will contain the total length of our tweets
stream. In case of the stream failing, this future would complete with a Failure.
A RunnableGraph may be reused and materialized multiple times, because it is just the blueprint of the stream. This means that if we materialize a stream twice, for example one that consumes a live stream of tweets within a minute, the materialized values of those two materializations will be different, as illustrated by this example:
final Sink<Integer, Future<Integer>> sumSink =
Sink.<Integer, Integer>fold(0, (acc, elem) -> acc + elem);
final RunnableGraph<Future<Integer>> counterRunnableGraph =
tweetsInMinuteFromNow
.filter(t -> t.hashtags().contains(AKKA))
.map(t -> 1)
.toMat(sumSink, Keep.right());
// materialize the stream once in the morning
final Future<Integer> morningTweetsCount = counterRunnableGraph.run(mat);
// and once in the evening, reusing the blueprint
final Future<Integer> eveningTweetsCount = counterRunnableGraph.run(mat);
Many elements in Akka Streams provide materialized values which can be used for obtaining either results of computation or handles for steering these elements; this will be discussed in detail in Stream Materialization. Summing up this section, now we know what happens behind the scenes when we run this one-liner, which is equivalent to the multi-line version above:
final Future<Integer> sum = tweets.map(t -> 1).runWith(sumSink, mat);
Note: runWith() is a convenience method that automatically ignores the materialized value of any other stages
except those appended by the runWith() itself. In the above example it translates to using Keep.right as
the combiner for materialized values.
All stream Processors produced by the default materialization of Akka Streams are restricted to having a single Subscriber; additional Subscribers will be rejected. The reason for this is that the stream topologies described using our DSL never require fan-out behavior from the Publisher sides of the elements; all fan-out is done using explicit elements like Broadcast[T].
This means that Sink.fanoutPublisher must be used where multicast behavior is needed for interoperation
with other Reactive Streams implementations.
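A sketch of such interoperation, assuming the Sink.fanoutPublisher factory with initial and maximum buffer size parameters (the values here are illustrative):
// Expose the stream as a Reactive Streams Publisher accepting multiple Subscribers
final Publisher<Author> authorPublisher =
  authors.runWith(Sink.fanoutPublisher(8, 16), mat);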
the low-level infrastructure for passing streams between execution units, and errors on this level are precisely the
failures that we are talking about on the higher level that is modeled by Akka Streams.
There is only limited support for treating onError in Akka Streams compared to the operators that are available for the transformation of data elements, which is intentional in the spirit of the previous paragraph. Since
onError signals that the stream is collapsing, its ordering semantics are not the same as for stream completion:
transformation stages of any kind will just collapse with the stream, possibly still holding elements in implicit or
explicit buffers. This means that data elements emitted before a failure can still be lost if the onError overtakes
them.
The ability for failures to propagate faster than data elements is essential for tearing down streams that are back-pressured, especially since back-pressure can be the failure mode (e.g. by tripping upstream buffers which then abort because they cannot do anything else, or if a deadlock occurred).
The semantics of stream recovery
A recovery element (i.e. any transformation that absorbs an onError signal and turns that into possibly more data elements followed by normal stream completion) acts as a bulkhead that confines a stream collapse to a given region of the flow topology. Within the collapsed region buffered elements may be lost, but the outside is not affected by the failure.
This works in the same fashion as a try-catch expression: it marks a region in which exceptions are caught, but the exact amount of code that was skipped within this region in case of a failure might not be known precisely; the placement of statements matters.
between actors, to have one actor prepare the work, and then have it be materialized at some completely different
place in the code.
final Source<Integer, BoxedUnit> source =
Source.from(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
// note that the Future is scala.concurrent.Future
final Sink<Integer, Future<Integer>> sink =
Sink.fold(0, (aggr, next) -> aggr + next);
// connect the Source to the Sink, obtaining a RunnableGraph
final RunnableGraph<Future<Integer>> runnable =
source.toMat(sink, Keep.right());
// materialize the flow
final Future<Integer> sum = runnable.run(mat);
After running (materializing) the RunnableGraph we get a special container object, the MaterializedMap.
Both sources and sinks are able to put specific objects into this map. Whether they put something in or not is
implementation dependent. For example a FoldSink will make a Future available in this map which will
represent the result of the folding process over the stream. In general, a stream can expose multiple materialized
values, but it is quite common to be interested in only the value of the Source or the Sink in the stream. For
this reason there is a convenience method called runWith() available for Sink, Source or Flow requiring,
respectively, a supplied Source (in order to run a Sink), a Sink (in order to run a Source) or both a Source
and a Sink (in order to run a Flow, since it has neither attached yet).
final Source<Integer, BoxedUnit> source =
Source.from(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
final Sink<Integer, Future<Integer>> sink =
Sink.fold(0, (aggr, next) -> aggr + next);
// materialize the flow, getting the Sink's materialized value
final Future<Integer> sum = source.runWith(sink, mat);
It is worth pointing out that since processing stages are immutable, connecting them returns a new processing
stage, instead of modifying the existing instance, so while constructing long flows, remember to assign the new
value to a variable or run it:
final Source<Integer, BoxedUnit> source =
Source.from(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
source.map(x -> 0); // has no effect on source, since it's immutable
source.runWith(Sink.fold(0, (agg, next) -> agg + next), mat); // 55
// returns new Source<Integer>, with `map()` appended
final Source<Integer, BoxedUnit> zeroes = source.map(x -> 0);
final Sink<Integer, Future<Integer>> fold =
Sink.fold(0, (agg, next) -> agg + next);
zeroes.runWith(fold, mat); // 0
Note: By default Akka Streams elements support exactly one downstream processing stage. Making fan-out (supporting multiple downstream processing stages) an explicit opt-in feature allows default stream elements to be less complex and more efficient. It also allows for greater flexibility on how exactly to handle the multicast scenarios, by providing named fan-out elements such as broadcast (signals all downstream elements) or balance (signals one of the available downstream elements).
In the above example we used the runWith method, which both materializes the stream and returns the materialized value of the given sink or source.
Since a stream can be materialized multiple times, the MaterializedMap returned is different for each materialization. In the example below we create two running materialized instances of the stream that we described in the runnable variable. Both materializations give us a different Future from the map, even though we used the same sink to refer to the future:
The objects Source and Sink define various ways to create sources and sinks of elements. The following
examples show some of the most useful constructs (refer to the API documentation for more details):
// Create a source from an Iterable
List<Integer> list = new LinkedList<Integer>();
list.add(1);
list.add(2);
list.add(3);
Source.from(list);
// Create a source from a Future
Source.from(Futures.successful("Hello Streams!"));
// Create a source from a single element
Source.single("only one element");
// an empty source
Source.empty();
// Sink that folds over the stream and returns a Future
// of the final result in the MaterializedMap
Sink.fold(0, (Integer aggr, Integer next) -> aggr + next);
// Sink that returns a Future in the MaterializedMap,
// containing the first element of the stream
Sink.head();
// A Sink that consumes a stream without doing anything with the elements
Sink.ignore();
// A Sink that executes a side-effecting call for every element of the stream
Sink.foreach(System.out::println);
There are various ways to wire up different parts of a stream, the following examples show some of the available
options:
// Explicitly creating and wiring up a Source, Sink and Flow
Source.from(Arrays.asList(1, 2, 3, 4))
.via(Flow.of(Integer.class).map(elem -> elem * 2))
.to(Sink.foreach(System.out::println));
// Starting from a Source
final Source<Integer, BoxedUnit> source = Source.from(Arrays.asList(1, 2, 3, 4))
.map(elem -> elem * 2);
source.to(Sink.foreach(System.out::println));
// Starting from a Sink
final Sink<Integer, BoxedUnit> sink = Flow.of(Integer.class)
  .map(elem -> elem * 2)
  .to(Sink.foreach(System.out::println));
In accordance to the Reactive Streams specification (Rule 2.13) Akka Streams do not allow null to be passed
through the stream as an element. In case you want to model the concept of absence of a value we recommend
using akka.japi.Option (for Java 6 and 7) or java.util.Optional which is available since Java 8.
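For example, instead of emitting null for a missing value, a stream can carry java.util.Optional elements, as in this minimal sketch:
Source.from(Arrays.asList(Optional.of(1), Optional.<Integer>empty(), Optional.of(3)))
  .filter(Optional::isPresent) // drop absent values instead of passing null downstream
  .map(Optional::get)
  .runForeach(System.out::println, mat);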
Back-pressure explained
Akka Streams implement an asynchronous non-blocking back-pressure protocol standardised by the Reactive
Streams specification, which Akka is a founding member of.
The user of the library does not have to write any explicit back-pressure handling code: it is built in and dealt with automatically by all of the provided Akka Streams processing stages. It is possible however to add explicit buffer stages with overflow strategies that can influence the behaviour of the stream. This is especially important in complex processing graphs which may even contain loops (which must be treated with very special care, as explained in Graph cycles, liveness and deadlocks).
The back pressure protocol is defined in terms of the number of elements a downstream Subscriber is able to
receive and buffer, referred to as demand. The source of data, referred to as Publisher in Reactive Streams
terminology and implemented as Source in Akka Streams, guarantees that it will never emit more elements than
the received total demand for any given Subscriber.
Note: The Reactive Streams specification defines its protocol in terms of Publisher and Subscriber.
These types are not meant to be user facing API, instead they serve as the low level building blocks for different
Reactive Streams implementations.
Akka Streams implements these concepts as Source, Flow (referred to as Processor in Reactive Streams)
and Sink without exposing the Reactive Streams interfaces directly. If you need to integrate with other Reactive
Stream libraries read Integrating with Reactive Streams.
The mode in which Reactive Streams back-pressure works can be colloquially described as dynamic push / pull
mode, since it will switch between push and pull based back-pressure models depending on the downstream
being able to cope with the upstream production rate or not.
To illustrate this further let us consider both problem situations and how the back-pressure protocol handles them:
Slow Publisher, fast Subscriber
This is the happy case: we do not need to slow down the Publisher in this case. However, signalling rates are rarely constant and could change at any point in time, suddenly ending up in a situation where the Subscriber is now slower than the Publisher. In order to safeguard against these situations, the back-pressure protocol must still be enabled during such situations; however we do not want to pay a high penalty for this safety net being enabled.
The Reactive Streams protocol solves this by the Subscriber asynchronously signalling Request(int n) to the Publisher. The protocol guarantees that the Publisher will never signal more elements than the signalled demand. Since the Subscriber is currently faster, however, it will be signalling these Request messages at a higher rate (and possibly also batching together the demand, requesting multiple elements in one Request signal). This means that the Publisher should never have to wait (be back-pressured) with publishing its incoming elements.
As we can see, in this scenario we effectively operate in so-called push-mode: the Publisher can continue producing elements as fast as it can, because the pending demand will be recovered just-in-time while it is emitting elements.
Fast Publisher, slow Subscriber
This is the case when back-pressuring the Publisher is required, because the Subscriber is not able to cope with the rate at which its upstream would like to emit data elements.
Since the Publisher is not allowed to signal more elements than the pending demand signalled by the Subscriber, it will have to abide by this back-pressure by applying one of the following strategies:
- not generate elements, if it is able to control their production rate,
- try buffering the elements in a bounded manner until more demand is signalled,
- drop elements until more demand is signalled,
- tear down the stream if unable to apply any of the above strategies.
As we can see, this scenario effectively means that the Subscriber will pull the elements from the Publisher; this mode of operation is referred to as pull-based back-pressure.
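To make the demand signalling concrete, here is a minimal sketch of a Reactive Streams Subscriber that requests one element at a time (processElement is a hypothetical processing step; as noted above, these interfaces are not the user-facing Akka Streams API):
final Subscriber<Integer> pullingSubscriber = new Subscriber<Integer>() {
  private Subscription subscription;
  @Override public void onSubscribe(Subscription s) {
    subscription = s;
    s.request(1); // initial demand: one element
  }
  @Override public void onNext(Integer elem) {
    processElement(elem);    // hypothetical slow processing
    subscription.request(1); // pull the next element only when ready
  }
  @Override public void onError(Throwable t) { t.printStackTrace(); }
  @Override public void onComplete() {}
};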
Stream Materialization
When constructing flows and graphs in Akka Streams think of them as preparing a blueprint, an execution plan.
Stream materialization is the process of taking a stream description (the graph) and allocating all the necessary
resources it needs in order to run. In the case of Akka Streams this often means starting up Actors which power the
processing, but is not restricted to that - it could also mean opening files or socket connections etc. depending
on what the stream needs.
Materialization is triggered at so called terminal operations. Most notably this includes the various forms of the run() and runWith() methods defined on flow elements, as well as a small number of special syntactic sugars for running with well-known sinks, such as runForeach(el -> ...) (being an alias for runWith(Sink.foreach(el -> ...))).
Materialization is currently performed synchronously on the materializing thread. The actual stream processing is handled by actors started up during the stream's materialization, which will be running on the thread pools they have been configured to run on, which defaults to the dispatcher set in ActorMaterializerSettings while constructing the ActorMaterializer.
Note: Reusing instances of linear computation stages (Source, Sink, Flow) inside FlowGraphs is legal, yet will
materialize that stage multiple times.
Since every processing stage in Akka Streams can provide a materialized value after being materialized, it is
necessary to somehow express how these values should be composed to a final value when we plug these stages
together. For this, many combinator methods have variants that take an additional argument, a function, that will
be used to combine the resulting values. Some examples of using these combiners are illustrated in the example
below.
// An empty source that can be shut down explicitly from the outside
Source<Integer, Promise<BoxedUnit>> source = Source.<Integer>lazyEmpty();
// A flow that internally throttles elements to 1/second, and returns a Cancellable
// which can be used to shut down the stream
Flow<Integer, Integer, Cancellable> flow = throttler;
// A sink that returns the first element of a stream in the returned Future
Sink<Integer, Future<Integer>> sink = Sink.head();
RunnableGraph<Cancellable> r11 =
r9.mapMaterializedValue( (nestedTuple) -> {
Promise<BoxedUnit> p = nestedTuple.first().first();
Cancellable c = nestedTuple.first().second();
Future<Integer> f = nestedTuple.second();
// Picking the Cancellable, but we could also construct a domain class here
return c;
});
Note: In Graphs it is possible to access the materialized value from inside the stream processing graph. For
details see Accessing the materialized value inside the Graph
Such a graph is simple to translate to the Graph DSL, since each linear element corresponds to a Flow, and each circle corresponds to either a Junction, or to a Source or Sink if it is the beginning or the end of a Flow.
[Code example: constructing a closed FlowGraph with Broadcast and Merge junctions]
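A minimal sketch of such a FlowGraph definition using the builder DSL of this release (the concrete stages are illustrative):
final RunnableGraph<BoxedUnit> g = FlowGraph.factory().closed(builder -> {
  final Outlet<Integer> in = builder.source(Source.from(Arrays.asList(1, 2, 3)));
  final UniformFanOutShape<Integer, Integer> bcast = builder.graph(Broadcast.create(2));
  final UniformFanInShape<Integer, Integer> merge = builder.graph(Merge.create(2));
  final Inlet<Integer> out = builder.sink(Sink.foreach(System.out::println));
  builder.from(in).via(bcast).via(merge).to(out); // main line
  builder.from(bcast)
    .via(builder.graph(Flow.of(Integer.class).map(i -> i * 10)))
    .to(merge); // second branch broadcast back into the merge
});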
Note: Junction reference equality defines graph node equality (i.e. the same merge instance used in a FlowGraph
refers to the same location in the resulting graph).
By looking at the snippets above, it should be apparent that the builder object is mutable. The reason for
this design choice is to enable simpler creation of complex graphs, which may even contain cycles. Once the
FlowGraph has been constructed though, the RunnableGraph instance is immutable, thread-safe, and freely
shareable. The same is true of all flow pieces (sources, sinks, and flows) once they are constructed. This means
that you can safely re-use one given Flow in multiple places in a processing graph.
We have seen examples of such re-use already above: the merge and broadcast junctions were imported into the
graph using builder.graph(...), an operation that will make a copy of the blueprint that is passed to it
and return the inlets and outlets of the resulting copy so that they can be wired up. Another alternative is to pass
existing graphs, of any shape, into the factory method that produces a new graph. The difference between these
approaches is that importing using b.graph(...) ignores the materialized value of the imported graph while
importing via the factory method allows its inclusion; for more details see stream-materialization-scala.
In the example below we prepare a graph that consists of two parallel streams, in which we re-use the same
instance of Flow, yet it will properly be materialized as two connections between the corresponding Sources and
Sinks:
final Sink<Integer, Future<Integer>> topHeadSink = Sink.head();
final Sink<Integer, Future<Integer>> bottomHeadSink = Sink.head();
final Flow<Integer, Integer, BoxedUnit> sharedDoubler = Flow.of(Integer.class).map(elem -> elem * 2);
final RunnableGraph<Pair<Future<Integer>, Future<Integer>>> g = FlowGraph
  .factory().closed(
    topHeadSink,    // import this sink into the graph
    bottomHeadSink, // and this as well
    Keep.both(),
    (b, top, bottom) -> {
      final UniformFanOutShape<Integer, Integer> bcast = b.graph(Broadcast.create(2));
      b.from(Source.single(1)).via(bcast).via(sharedDoubler).to(top);
      b.from(bcast).via(sharedDoubler).to(bottom);
    });
As you can see, first we construct the partial graph that describes how to compute the maximum of two input streams, then we reuse that twice while constructing the partial graph that extends this to three input streams, then we import it (all of its nodes and connections) explicitly into the FlowGraph instance in which all the undefined elements are rewired to real sources and sinks. The graph can then be run and yields the expected result.
Warning: Please note that a FlowGraph is not able to provide compile-time type-safety about whether or not all elements have been properly connected; this validation is performed as a runtime check during the graph's instantiation.
A partial flow graph also verifies that all ports are either connected or part of the returned Shape.
Similarly the same can be done for a Sink<T>, in which case the returned value must be an Inlet<T>. For
defining a Flow<T> we need to expose both an undefined source and sink:
final Flow<Integer, Pair<Integer, String>, BoxedUnit> pairs = Flow.factory().create(
b -> {
final UniformFanOutShape<Integer, Integer> bcast = b.graph(Broadcast.create(2));
final FanInShape2<Integer, String, Pair<Integer, String>> zip =
b.graph(Zip.create());
b.from(bcast).to(zip.in0());
b.from(bcast).via(Flow.of(Integer.class).map(i -> i.toString())).to(zip.in1());
return new Pair<>(bcast.in(), zip.out());
});
Source.single(1).via(pairs).runWith(Sink.<Pair<Integer, String>>head(), mat);
A bidirectional flow is defined just like a unidirectional Flow as demonstrated for the codec mentioned above:
static interface Message {}
static class Ping implements Message {
final int id;
public Ping(int id) { this.id = id; }
@Override
public boolean equals(Object o) {
if (o instanceof Ping) {
return ((Ping) o).id == id;
} else return false;
}
@Override
public int hashCode() {
return id;
}
}
static class Pong implements Message {
final int id;
public Pong(int id) { this.id = id; }
@Override
public boolean equals(Object o) {
if (o instanceof Pong) {
return ((Pong) o).id == id;
} else return false;
}
@Override
public int hashCode() {
return id;
}
}
The first version resembles the partial graph constructor, while for the simple case of a functional 1:1 transformation there is a concise convenience method as shown on the last line. The implementation of the two functions is
not difficult either:
public static ByteString toBytes(Message msg) {
if (msg instanceof Ping) {
final int id = ((Ping) msg).id;
return new ByteStringBuilder().putByte((byte) 1)
.putInt(id, ByteOrder.LITTLE_ENDIAN).result();
} else {
final int id = ((Pong) msg).id;
return new ByteStringBuilder().putByte((byte) 2)
.putInt(id, ByteOrder.LITTLE_ENDIAN).result();
}
}
public static Message fromBytes(ByteString bytes) {
final ByteIterator it = bytes.iterator();
switch(it.getByte()) {
case 1:
return new Ping(it.getInt(ByteOrder.LITTLE_ENDIAN));
case 2:
return new Pong(it.getInt(ByteOrder.LITTLE_ENDIAN));
default:
throw new RuntimeException("message format error");
}
}
In this way you could easily integrate any other serialization library that turns an object into a sequence of bytes.
The other stage that we talked about is a little more involved since reversing a framing protocol means that
any received chunk of bytes may correspond to zero or more messages. This is best implemented using a
PushPullStage (see also Using PushPullStage).
final BidiFlow<ByteString, ByteString, ByteString, ByteString, BoxedUnit> framing =
BidiFlow.factory().create(b -> {
final FlowShape<ByteString, ByteString> top =
b.graph(Flow.<ByteString> empty().map(BidiFlowDocTest::addLengthHeader));
final FlowShape<ByteString, ByteString> bottom =
b.graph(Flow.<ByteString> empty().transform(() -> new FrameParser()));
return new BidiShape<>(top, bottom);
});
With these implementations we can build a protocol stack and test it:
/* construct protocol stack
 *         +------------------------------------+
 *         | stack                              |
 *         |                                    |
 *         |  +-------+            +---------+  |
 *    ~>   O~~o       |     ~>     |         o~~O    ~>
 * Message |  | codec | ByteString | framing |  | ByteString
 *    <~   O~~o       |     <~     |         o~~O    <~
 *         |  +-------+            +---------+  |
 *         +------------------------------------+
 */
final BidiFlow<Message, ByteString, ByteString, Message, BoxedUnit> stack =
codec.atop(framing);
// test it by plugging it into its own inverse and closing the right end
final Flow<Message, Message, BoxedUnit> pingpong =
Flow.<Message> empty().collect(new PFBuilder<Message, Message>()
.match(Ping.class, p -> new Pong(p.id))
.build()
);
final Flow<Message, Message, BoxedUnit> flow =
stack.atop(stack.reversed()).join(pingpong);
final Future<List<Message>> result = Source
.from(Arrays.asList(0, 1, 2))
.<Message> map(id -> new Ping(id))
.via(flow)
.grouped(10)
.runWith(Sink.<List<Message>> head(), mat);
final FiniteDuration oneSec = Duration.create(1, TimeUnit.SECONDS);
assertArrayEquals(
new Message[] { new Pong(0), new Pong(1), new Pong(2) },
Await.result(result, oneSec).toArray(new Message[0]));
This example demonstrates how BidiFlow subgraphs can be hooked together and also turned around with the .reversed() method. The test simulates both parties of a network communication protocol without actually having to open a network connection; the flows can just be connected directly.
[Preceding code example: definitions of foldSink and flatten]
final Flow<Integer, Integer, Future<Integer>> foldingFlow = Flow.factory().create(foldSink,
(b, fold) -> {
return new Pair<>(
fold.inlet(),
b.from(b.materializedValue()).via(flatten).out());
});
Be careful not to introduce a cycle where the materialized value actually contributes to the materialized value. The
following example demonstrates a case where the materialized Future of a fold is fed back to the fold itself.
// This cannot produce any value:
final Source<Integer, Future<Integer>> cyclicSource = Source.factory().create(foldSink,
(b, fold) -> {
// - Fold cannot complete until its upstream mapAsync completes
// - mapAsync cannot complete until the materialized Future produced by
//   fold completes
// As a result this Source will never emit anything, and its materialized
// Future will never complete
b.from(b.materializedValue()).via(flatten).to(fold);
return b.from(b.materializedValue()).via(flatten).out();
});
Running this we observe that after a few numbers have been printed, no more elements are logged to the console; all processing stops after some time. After some investigation we observe that:
- through merging from source we increase the number of elements flowing in the cycle
- by broadcasting back to the cycle we do not decrease the number of elements in the cycle
Since Akka Streams (and Reactive Streams in general) guarantee bounded processing (see the Buffering section for more details) it means that only a bounded number of elements are buffered over any time span. Since our cycle gains more and more elements, eventually all of its internal buffers become full, backpressuring source forever. To be able to process more elements from source, elements would need to leave the cycle somehow.
If we modify our feedback loop by replacing the Merge junction with a MergePreferred we can avoid the
deadlock. MergePreferred is unfair as it always tries to consume from a preferred input port if there are
elements available before trying the other lower priority input ports. Since we feed back through the preferred
port it is always guaranteed that the elements in the cycles can flow.
// WARNING! The graph below stops consuming from "source" after a few steps
FlowGraph.factory().closed(b -> {
final MergePreferredShape<Integer> merge = b.graph(MergePreferred.create(1));
final UniformFanOutShape<Integer, Integer> bcast = b.graph(Broadcast.create(2));
b.from(source).via(merge).via(printFlow).via(bcast).to(Sink.ignore());
b.to(merge.preferred()) .from(bcast);
});
If we run the example we see that the same sequence of numbers are printed over and over again, but the processing
does not stop. Hence, we avoided the deadlock, but source is still back-pressured forever, because buffer space
is never recovered: the only action we see is the circulation of a couple of initial elements from source.
Note: What we see here is that in certain cases we need to choose between boundedness and liveness. Our first example would not deadlock if there were an infinite buffer in the loop; or, vice versa, if the elements in the cycle were balanced (as many elements are removed as are injected) then there would be no deadlock.
To make our cycle both live (not deadlocking) and fair we can introduce a dropping element on the feedback arc. In
this case we chose the buffer() operation giving it a dropping strategy OverflowStrategy.dropHead.
FlowGraph.factory().closed(b -> {
final UniformFanInShape<Integer, Integer> merge = b.graph(Merge.create(2));
final UniformFanOutShape<Integer, Integer> bcast = b.graph(Broadcast.create(2));
final FlowShape<Integer, Integer> droppyFlow = b.graph(
Flow.of(Integer.class).buffer(10, OverflowStrategy.dropHead()));
b.from(source).via(merge).via(printFlow).via(bcast).to(Sink.ignore());
b.to(merge).via(droppyFlow).from(bcast);
});
Still, when we try to run the example it turns out that no element is printed at all! After some investigation we
realize that:
- In order to get the first element from source into the cycle we need an already existing element in the cycle
- In order to get an initial element in the cycle we need an element from source
These two conditions are a typical chicken-and-egg problem. The solution is to inject an initial element into the
cycle that is independent from source. We do this by using a Concat junction on the backwards arc that injects
a single element using Source.single.
FlowGraph.factory().closed(b -> {
final FanInShape2<Integer, Integer, Integer>
zip = b.graph(ZipWith.create((Integer left, Integer right) -> left));
final UniformFanOutShape<Integer, Integer> bcast = b.graph(Broadcast.create(2));
final UniformFanInShape<Integer, Integer> concat = b.graph(Concat.create());
b.from(source).to(zip.in0());
b.from(zip.out()).via(printFlow).via(bcast).to(Sink.ignore());
b.to(zip.in1()).via(concat).from(Source.single(1));
b.to(concat) .from(bcast);
});
When we run the above example we see that processing starts and never stops. The important takeaway from this
example is that balanced cycles often need an initial kick-off element to be injected into the cycle.
The linear stages are Source, Sink and Flow, as these can be used to compose strict chains of processing
stages. Fan-in and fan-out stages usually have multiple input or multiple output ports, therefore they allow the building of more complex graph layouts, not just chains. BidiFlow stages are usually useful in IO-related tasks,
where there are input and output channels to be handled. Due to the specific shape of BidiFlow it is easy to
stack them on top of each other to build a layered protocol for example. The TLS support in Akka is for example
implemented as a BidiFlow.
These reusable components already allow the creation of complex processing networks. What we have seen so far does not implement modularity though. It is desirable, for example, to package up a larger graph entity into a reusable component which hides its internals, exposing only the ports that the users of the module are meant to interact with. One good example is the Http server component, which is encoded internally as a BidiFlow which interfaces with the client TCP connection using an input-output port pair accepting and sending ByteStrings, while its upper ports emit and receive HttpRequest and HttpResponse instances.
The following figure demonstrates various composite stages that contain various other types of stages internally, hiding them behind a shape that looks like a Source, Flow, etc.
One interesting example above is a Flow which is composed of a disconnected Sink and Source. This can be
achieved by using the wrap() constructor method on Flow which takes the two parts as parameters.
The example BidiFlow demonstrates that internally a module can be of arbitrary complexity, and the exposed ports can be wired in flexible ways. The only constraint is that all the ports of enclosed modules must be either connected to each other or exposed as interface ports, and the number of such ports needs to match the requirement of the shape; for example, a Source allows only one exposed output port, and the rest of the internal ports must be properly connected.
These mechanics allow arbitrary nesting of modules. For example the following figure demonstrates a
RunnableGraph that is built from a composite Source and a composite Sink (which in turn contains a
composite Flow).
The above diagram contains one more shape that we have not seen yet, which is called RunnableGraph. It
turns out, that if we wire all exposed ports together, so that no more open ports remain, we get a module that
is closed. This is what the RunnableGraph class represents. This is the shape that a Materializer can
take and turn into a network of running entities that perform the task described. In fact, a RunnableGraph is
a module itself, and (maybe somewhat surprisingly) it can be used as part of larger graphs. It is rarely useful to
embed a closed graph shape in a larger graph (since it becomes an isolated island as there are no open ports for
communication with the rest of the graph), but this demonstrates the uniform underlying model.
If we try to build a code snippet that corresponds to the above diagram, our first try might look like this:
Source.single(0)
.map(i -> i + 1)
.filter(i -> i != 0)
.map(i -> i - 2)
.to(Sink.fold(0, (acc, i) -> acc + i));
// ... where is the nesting?
It is clear however that there is no nesting present in our first attempt: since the library cannot figure out where we intended to put composite module boundaries, it is our responsibility to do that. If we are using the DSL provided by the Flow, Source, Sink classes then nesting can be achieved by calling one of the methods withAttributes() or named() (where the latter is just a shorthand for adding a name attribute).
The following code demonstrates how to achieve the desired nesting:
final Source<Integer, BoxedUnit> nestedSource =
Source.single(0) // An atomic source
.map(i -> i + 1) // an atomic processing stage
.named("nestedSource"); // wraps up the current Source and gives it a name
final Flow<Integer, Integer, BoxedUnit> nestedFlow =
Flow.of(Integer.class).filter(i -> i != 0) // an atomic processing stage
.map(i -> i - 2) // another atomic processing stage
.named("nestedFlow"); // wraps up the Flow, and gives it a name
final Sink<Integer, BoxedUnit> nestedSink =
nestedFlow.to(Sink.fold(0, (acc, i) -> acc + i)) // wire an atomic sink to the nestedFlow
.named("nestedSink"); // wrap it up
// Create a RunnableGraph
final RunnableGraph<BoxedUnit> runnableGraph = nestedSource.to(nestedSink);
Once we have hidden the internals of our components, they act like any other built-in component of similar shape. If we hide some of the internals of our composites, the result looks just as if any other predefined component had been used. If we look at the usage of built-in components and our custom components, there is no difference in usage, as the code snippet below demonstrates.
// Create a RunnableGraph from our components
final RunnableGraph<BoxedUnit> runnableGraph = nestedSource.to(nestedSink);
// Usage is uniform, no matter if modules are composite or atomic
final RunnableGraph<BoxedUnit> runnableGraph2 =
Source.single(0).to(Sink.fold(0, (acc, i) -> acc + i));
The diagram shows a RunnableGraph (remember, if there are no unwired ports, the graph is closed, and therefore can be materialized) that encapsulates a non-trivial stream processing network. It contains fan-in and fan-out stages, directed and non-directed cycles. The closed() method of the FlowGraph factory object allows the creation of a general closed graph. For example the network on the diagram can be realized like this:
FlowGraph.factory().closed(builder -> {
final Outlet<Integer> A = builder.source(Source.single(0));
final UniformFanOutShape<Integer, Integer> B = builder.graph(Broadcast.create(2));
final UniformFanInShape<Integer, Integer> C = builder.graph(Merge.create(2));
final FlowShape<Integer, Integer> D =
builder.graph(Flow.of(Integer.class).map(i -> i + 1));
final UniformFanOutShape<Integer, Integer> E = builder.graph(Balance.create(2));
final UniformFanInShape<Integer, Integer> F = builder.graph(Merge.create(2));
final Inlet<Integer> G = builder.sink(Sink.foreach(System.out::println));
builder.from(F).to(C);
builder.from(A).via(B).via(C).to(F);
builder.from(B).via(D).via(E).to(F);
builder.from(E).to(G);
});
In the code above we used the implicit port numbering feature to make the graph more readable and similar to the diagram. It is also possible to refer to the ports explicitly, so another version might look like this:
FlowGraph.factory().closed(builder -> {
final Outlet<Integer> A = builder.source(Source.single(0));
final UniformFanOutShape<Integer, Integer> B = builder.graph(Broadcast.create(2));
final UniformFanInShape<Integer, Integer> C = builder.graph(Merge.create(2));
final FlowShape<Integer, Integer> D =
builder.graph(Flow.of(Integer.class).map(i -> i + 1));
final UniformFanOutShape<Integer, Integer> E = builder.graph(Balance.create(2));
final UniformFanInShape<Integer, Integer> F = builder.graph(Merge.create(2));
final Inlet<Integer> G = builder.sink(Sink.foreach(System.out::println));
builder.from(F.out()).to(C.in(0));
builder.from(A).to(B.in());
builder.from(B.out(0)).to(C.in(1));
builder.from(C.out()).to(F.in(0));
builder.from(B.out(1)).via(D).to(E.in());
builder.from(E.out(0)).to(F.in(1));
builder.from(E.out(1)).to(G);
});
Similar to the case in the first section, so far we have not considered modularity. We created a complex graph, but the layout is flat, not modularized. We will modify our example and create a reusable component with the graph DSL. The way to do it is to use the partial() method on the FlowGraph factory. If we remove the sources and sinks from the previous example, what remains is a partial graph:
We can recreate a similar graph in code, using the DSL in a similar way as before:
final Graph<FlowShape<Integer, Integer>, BoxedUnit> partial =
FlowGraph.factory().partial(builder -> {
final UniformFanOutShape<Integer, Integer> B = builder.graph(Broadcast.create(2));
final UniformFanInShape<Integer, Integer> C = builder.graph(Merge.create(2));
final UniformFanOutShape<Integer, Integer> E = builder.graph(Balance.create(2));
final UniformFanInShape<Integer, Integer> F = builder.graph(Merge.create(2));
builder.from(F.out()).to(C.in(0));
builder.from(B).via(C).to(F);
builder.from(B).via(builder.graph(Flow.of(Integer.class).map(i -> i + 1))).via(E).to(F);
return new FlowShape<>(B.in(), E.out(1));
});
The only new addition is the return value of the builder block, which is a Shape. All graphs (including Source,
BidiFlow, etc) have a shape, which encodes the typed ports of the module. In our example there is exactly one
input and output port left, so we can declare it to have a FlowShape by returning an instance of it. While it is
possible to create new Shape types, it is usually recommended to use one of the matching built-in ones.
The resulting graph is already a properly wrapped module, so there is no need to call named() to encapsulate the
graph, but it is a good practice to give names to modules to help debugging.
Since our partial graph has the right shape, it can be already used in the simpler, linear DSL:
Source.single(0).via(partial).to(Sink.ignore());
It is not possible to use it as a Flow yet, though (i.e. we cannot call .filter() on it), but Flow has a wrap()
method that just adds the DSL to a FlowShape. There are similar methods on Source, Sink and BidiShape,
so it is easy to get back to the simpler DSL if a graph has the right shape. For convenience, it is also possible
to skip the partial graph creation, and use one of the convenience creator methods. To demonstrate this, we will
create the following graph:
The code version of the above closed graph might look like this:
Note: All graph builder sections check if the resulting graph has all ports connected except the exposed ones and
will throw an exception if this is violated.
We still owe a demonstration that RunnableGraph is a component just like any other, which can be embedded in graphs. In the following snippet we embed one closed graph in another:
final RunnableGraph<BoxedUnit> closed1 =
Source.single(0).to(Sink.foreach(System.out::println));
final RunnableGraph<BoxedUnit> closed2 = FlowGraph.factory().closed(builder -> {
final ClosedShape embeddedClosed = builder.graph(closed1);
});
The type of the imported module indicates that the imported module has a ClosedShape, and so we are not able
to wire it to anything else inside the enclosing closed graph. Nevertheless, this island is embedded properly, and
will be materialized just like any other module that is part of the graph.
As we have demonstrated, the two DSLs are fully interoperable, as they encode a similar nested structure of boxes with ports; the DSLs differ only so that each can be as powerful as possible at its given abstraction level. It is possible to embed complex graphs in the fluid DSL, and it is just as easy to import and embed a Flow, etc., in a larger, complex structure.
We have also seen that every module has a Shape (for example a Sink has a SinkShape) independently of which DSL was used to create it. This uniform representation enables the rich composability of various stream processing entities in a convenient way.
materialization needs to return a different object that provides the necessary interaction capabilities. In other
words, the RunnableGraph can be seen as a factory, which creates:
- a network of running processing entities, inaccessible from the outside
- a materialized value, optionally providing a controlled interaction capability with the network
Unlike actors though, each of the processing stages might provide a materialized value, so when we compose multiple stages or modules, we need to combine the materialized values as well (there are default rules which make this easier; for example to() and via() take care of the most common case of taking the materialized value to the left. See flow-combine-mat-scala for details). We demonstrate how this works by a code example and a diagram which graphically demonstrates what is happening.
The propagation of the individual materialized values from the enclosed modules towards the top will look like
this:
To implement the above, first we create a composite Source, where the enclosed Source has a materialized type of Promise<BoxedUnit>. By using the combiner function Keep.left(), the resulting materialized type is that of the nested module (indicated by the color red on the diagram):
// Materializes to Promise<BoxedUnit>   (red)
final Source<Integer, Promise<BoxedUnit>> source = Source.<Integer> lazyEmpty();
// Materializes to BoxedUnit            (black)
final Flow<Integer, Integer, BoxedUnit> flow1 = Flow.of(Integer.class).take(100);
// Materializes to Promise<BoxedUnit>   (red)
final Source<Integer, Promise<BoxedUnit>> nestedSource =
  source.viaMat(flow1, Keep.left()).named("nestedSource");
Next, we create a composite Flow from two smaller components. Here, the second enclosed Flow has a materialized type of Future<OutgoingConnection>, and we propagate this to the parent by using Keep.right() as the combiner function (indicated by the color yellow on the diagram):
// Materializes to BoxedUnit                    (orange)
final Flow<Integer, ByteString, BoxedUnit> flow2 = Flow.of(Integer.class)
  .map(i -> ByteString.fromString(i.toString()));
// Materializes to Future<OutgoingConnection>   (yellow)
final Flow<ByteString, ByteString, Future<OutgoingConnection>> flow3 =
  Tcp.get(system).outgoingConnection("localhost", 8080);
// Materializes to Future<OutgoingConnection>   (yellow)
final Flow<Integer, ByteString, Future<OutgoingConnection>> nestedFlow =
  flow2.viaMat(flow3, Keep.right()).named("nestedFlow");
As a third step, we create a composite Sink, using our nestedFlow as a building block. In this snippet, both the enclosed Flow and the folding Sink have a materialized value that is interesting for us, so we use Keep.both() to get a Pair of them as the materialized type of nestedSink (indicated by the color blue on the diagram):
// Materializes to Future<String>                                    (green)
final Sink<ByteString, Future<String>> sink = Sink
  .fold("", (acc, i) -> acc + i.utf8String());
// Materializes to Pair<Future<OutgoingConnection>, Future<String>> (blue)
final Sink<Integer, Pair<Future<OutgoingConnection>, Future<String>>> nestedSink =
  nestedFlow.toMat(sink, Keep.both());
As the last example, we wire together nestedSource and nestedSink and we use a custom combiner function to create yet another materialized type for the resulting RunnableGraph. This combiner function just ignores the Future<String> part, and wraps the other two values in a custom class MyClass (indicated by the color purple on the diagram):
static class MyClass {
private Promise<BoxedUnit> p;
private OutgoingConnection conn;
public MyClass(Promise<BoxedUnit> p, OutgoingConnection conn) {
this.p = p;
this.conn = conn;
}
public void close() {
p.success(scala.runtime.BoxedUnit.UNIT);
}
}
static class Combiner {
static Future<MyClass> f(Promise<BoxedUnit> p,
Pair<Future<OutgoingConnection>, Future<String>> rest) {
return rest.first().map(new Mapper<OutgoingConnection, MyClass>() {
public MyClass apply(OutgoingConnection c) {
return new MyClass(p, c);
}
}, system.dispatcher());
}
}
// Materializes to Future<MyClass>   (purple)
final RunnableGraph<Future<MyClass>> runnableGraph =
  nestedSource.toMat(nestedSink, Combiner::f);
Note: The nested structure in the above example is not necessary for combining the materialized values, it just
demonstrates how the two features work together. See Combining materialized values for further examples of
combining materialized values without nesting and hierarchy involved.
1.6.4 Attributes
We have seen that we can use named() to introduce a nesting level in the fluid DSL (and also explicit nesting by
using partial() from FlowGraph). Apart from having the effect of adding a nesting level, named() is actually a shorthand for calling withAttributes(Attributes.name("someName")). Attributes provide
a way to fine-tune certain aspects of the materialized running entity. For example buffer sizes can be controlled via
attributes (see stream-buffers-scala). When it comes to hierarchic composition, attributes are inherited by nested
modules, unless they override them with a custom value.
The code below, a modification of an earlier example, sets the inputBuffer attribute on certain modules but not on others:
final Source<Integer, BoxedUnit> nestedSource =
Source.single(0)
.map(i -> i + 1)
.named("nestedSource"); // Wrap, no inputBuffer set
final Flow<Integer, Integer, BoxedUnit> nestedFlow =
Flow.of(Integer.class).filter(i -> i != 0)
.via(Flow.of(Integer.class)
.map(i -> i - 2)
.withAttributes(Attributes.inputBuffer(4, 4))) // override
.named("nestedFlow"); // Wrap, no inputBuffer set
final Sink<Integer, BoxedUnit> nestedSink =
nestedFlow.to(Sink.fold(0, (acc, i) -> acc + i)) // wire an atomic sink to the nestedFlow
.withAttributes(Attributes.name("nestedSink")
.and(Attributes.inputBuffer(3, 3))); // override
The effect is that each module inherits the inputBuffer attribute from its enclosing parent, unless it has
the same attribute explicitly set. nestedSource gets the default attributes from the materializer itself.
nestedSink, on the other hand, has this attribute set, so it will be used by all nested modules. nestedFlow
will inherit from nestedSink, except for the map stage, which has again an explicitly provided attribute
overriding the inherited one.
This diagram illustrates the inheritance process for the example code (representing the materializer default attributes as the color red, the attributes set on nestedSink as blue and the attributes set on nestedFlow as
green).
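1.7 Buffers and working with rate
Akka Streams processing stages are asynchronous and pipelined by default, so a downstream stage can already
work on an element while its upstream processes the next one. The example referenced below is a simple chain
of map stages, each printing its name and the current element; a sketch of what it presumably looks like (names
and details are illustrative):
Source.from(Arrays.asList(1, 2, 3))
  .map(i -> { System.out.println("A: " + i); return i; })
  .map(i -> { System.out.println("B: " + i); return i; })
  .map(i -> { System.out.println("C: " + i); return i; })
  .runWith(Sink.ignore(), mat);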
Running the above example, one of the possible outputs looks like this:
A: 1
A: 2
B: 1
A: 3
B: 2
C: 1
B: 3
C: 2
C: 3
Note that the order is not A: 1, B: 1, C: 1, A: 2, B: 2, C: 2, which would correspond to a synchronous
execution model where an element completely flows through the processing pipeline before the next element
enters the flow. The next element is processed by a stage as soon as it has emitted the previous one.
While pipelining in general increases throughput, in practice there is a cost of passing an element through the
asynchronous (and therefore thread-crossing) boundary, which is significant. To amortize this cost Akka Streams
uses a windowed, batching backpressure strategy internally. It is windowed because, as opposed to a Stop-And-
Wait protocol, multiple elements might be in-flight concurrently with requests for elements. It is also batching
because a new element is not immediately requested once an element has been drained from the window-buffer;
instead, multiple elements are requested after multiple elements have been drained. This batching strategy reduces
the communication cost of propagating the backpressure signal through the asynchronous boundary.
While this internal protocol is mostly invisible to the user (apart from its throughput-increasing effects), there are
situations when these details get exposed. In all of our previous examples we always assumed that the rate of
the processing chain is strictly coordinated through the backpressure signal, causing all stages to process no faster
than the throughput of the connected chain. There are tools in Akka Streams, however, that enable the rates of
different segments of a processing chain to be detached, or that define the maximum throughput of the stream
through external timing sources. These situations are exactly those where the internal batching buffering strategy
suddenly becomes non-transparent.
The size of these internal buffers can be set through configuration:
akka.stream.materializer.max-input-buffer-size = 16
If the buffer size needs to be set for segments of a Flow only, it is possible by defining a separate Flow with these
attributes:
final Flow<Integer, Integer, BoxedUnit> flow1 =
Flow.of(Integer.class)
.map(elem -> elem * 2) // the buffer size of this map is 1
.withAttributes(Attributes.inputBuffer(1, 1));
final Flow<Integer, Integer, BoxedUnit> flow2 =
flow1.via(
Flow.of(Integer.class)
.map(elem -> elem / 2)); // the buffer size of this map is the default
Here is an example of code that demonstrates some of the issues caused by internal buffers:
final FiniteDuration oneSecond =
FiniteDuration.create(1, TimeUnit.SECONDS);
final Source<String, Cancellable> msgSource =
Source.from(oneSecond, oneSecond, "message!");
final Source<String, Cancellable> tickSource =
Source.from(oneSecond.mul(3), oneSecond.mul(3), "tick");
final Flow<String, Integer, BoxedUnit> conflate =
Flow.of(String.class).conflate(
first -> 1, (count, elem) -> count + 1);
FlowGraph.factory().closed(b -> {
final FanInShape2<String, Integer, Integer> zipper =
b.graph(ZipWith.create((String tick, Integer count) -> count));
b.from(msgSource).via(conflate).to(zipper.in1());
b.from(tickSource).to(zipper.in0());
b.from(zipper.out()).to(Sink.foreach(elem -> System.out.println(elem)));
}).run(mat);
Running the above example one would expect the number 3 to be printed every 3 seconds (the conflate step
here is configured so that it counts the number of elements received before the downstream ZipWith consumes
them). What is printed is different though: we will see the number 1. The reason for this is the internal buffer,
which is by default 16 elements large and prefetches elements before the ZipWith starts consuming them. It is
possible to fix this issue by changing the buffer size of the ZipWith (or of the whole graph) to 1. We will still see
a leading 1 though, which is caused by an initial prefetch of the ZipWith element.
Note: In general, when time or rate driven processing stages exhibit strange behavior, one of the first solutions to
try should be to decrease the input buffer of the affected elements to 1.
The next example will also queue up 1000 jobs locally, but if there are more jobs waiting in the imaginary external
system, it makes space for the new element by dropping one element from the tail of the buffer. Dropping from
the tail is a very common strategy, but it must be noted that this will drop the youngest waiting job. If some
fairness is desired, in the sense that we want to be nice to jobs that have been waiting for long, then this option
can be useful.
jobs.buffer(1000, OverflowStrategy.dropTail());
Instead of dropping the youngest element from the tail of the buffer a new element can be dropped without
enqueueing it to the buffer at all.
jobs.buffer(1000, OverflowStrategy.dropNew());
Here is another example with a queue of 1000 jobs, but it makes space for the new element by dropping one
element from the head of the buffer. This is the oldest waiting job. This is the preferred strategy if jobs are
expected to be resent if not processed in a certain period. The oldest element will be retransmitted soon (in fact,
a retransmitted duplicate might already be in the queue!), so it makes sense to drop it first.
jobs.buffer(1000, OverflowStrategy.dropHead());
Compared to the dropping strategies above, dropBuffer drops all the 1000 jobs it has enqueued once the buffer
gets full. This aggressive strategy is useful when dropping jobs is preferred to delaying jobs.
jobs.buffer(1000, OverflowStrategy.dropBuffer());
If our imaginary external job provider is a client using our API, we might want to enforce that the client cannot
have more than 1000 queued jobs otherwise we consider it flooding and terminate the connection. This is easily
achievable by the error strategy which simply fails the stream once the buffer gets full.
jobs.buffer(1000, OverflowStrategy.fail());
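Understanding conflate
When a fast producer cannot be slowed down any further by backpressure, conflate can combine incoming
elements while the consumer is busy. As a sketch (the summary function here is a simple running sum; the
elided example this section refers to may differ), a flow that sums up elements while the downstream is
backpressured could look like this:
final Flow<Double, Double, BoxedUnit> sumFlow = Flow.of(Double.class)
  .conflate(elem -> elem, (acc, elem) -> acc + elem);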
This example demonstrates that the rate of such a flow is decoupled: the element rate at the start of the flow can
be much higher than the element rate at the end of the flow.
Another possible use of conflate is to not consider all elements for the summary when the producer starts getting
too fast. The example below demonstrates how conflate can be used to randomly drop elements when the
consumer is not able to keep up with the producer.
final Random r = new Random(); // source of randomness for the sampling decision
final Double p = 0.01;
final Flow<Double, Double, BoxedUnit> sampleFlow = Flow.of(Double.class)
.conflate(elem -> Collections.singletonList(elem), (acc, elem) -> {
if (r.nextDouble() < p) {
return Stream
.concat(acc.stream(), Collections.singletonList(elem).stream())
.collect(Collectors.toList());
}
return acc;
})
.mapConcat(d -> d);
Understanding expand
Expand helps to deal with slow producers which are unable to keep up with the demand coming from consumers.
Expand allows extrapolating a value to be sent as an element to a consumer.
As a simple use of expand, here is a flow that repeats the last seen element to the consumer whenever the producer
has not sent any new elements.
final Flow<Double, Double, BoxedUnit> lastFlow = Flow.of(Double.class)
  .expand(d -> d, s -> new Pair<>(s, s));
Expand also allows keeping some state between demand requests from the downstream. Leveraging this, here is
a flow that tracks and reports a drift between a fast consumer and a slow producer.
final Flow<Double, Pair<Double, Integer>, BoxedUnit> driftFlow = Flow.of(Double.class)
  .expand(d -> new Pair<Double, Integer>(d, 0), t -> {
    return new Pair<>(t, new Pair<>(t.first(), t.second() + 1));
  });
Note that all of the elements coming from upstream will go through expand at least once. This means that the
output of this flow is going to report a drift of zero if the producer is fast enough, or a larger drift otherwise.
The input ports are implemented as event handlers onPush(elem,ctx) and onPull(ctx) while output
ports correspond to methods on the Context object that is handed as a parameter to the event handlers. By
calling exactly one output port method we wire up these four ports in various ways which we demonstrate
shortly.
Warning: There is one very important rule to remember when working with a Stage: exactly one method
must be called on the currently passed Context, and it must be the last statement of the handler, where
the return type of the called method matches the expected return type of the handler. Any violation of this
rule will almost certainly result in unspecified behavior (in other words, it will break in spectacular ways).
Exceptions to this rule are the query methods isHolding() and isFinishing().
To illustrate these concepts we create a small PushPullStage that implements the map transformation.
Map calls ctx.push() from the onPush() handler and it also calls ctx.pull() from the onPull handler,
resulting in the conceptual wiring above, fully expressed in the code below:
public class Map<A, B> extends PushPullStage<A, B> {
private final Function<A, B> f;
public Map(Function<A, B> f) {
this.f = f;
}
@Override public SyncDirective onPush(A elem, Context<B> ctx) {
return ctx.push(f.apply(elem));
}
@Override public SyncDirective onPull(Context<B> ctx) {
return ctx.pull();
}
}
Map is a typical example of a one-to-one transformation of a stream. To demonstrate a many-to-one stage we will
implement filter. The conceptual wiring of Filter looks like this:
As we see above, if the given predicate matches the current element we propagate it downwards, otherwise we
return the ball to our upstream so that we get a new element. This is achieved by modifying the map example:
adding a conditional in the onPush handler and deciding between a ctx.pull() or ctx.push() call (and
of course not having a mapping f function).
public class Filter<A> extends PushPullStage<A, A> {
private final Predicate<A> p;
public Filter(Predicate<A> p) {
this.p = p;
}
@Override public SyncDirective onPush(A elem, Context<A> ctx) {
if (p.test(elem)) return ctx.push(elem);
else return ctx.pull();
}
@Override public SyncDirective onPull(Context<A> ctx) {
return ctx.pull();
}
}
To complete the picture we define a one-to-many transformation as the next step. We chose a straightforward
example stage that emits every upstream element twice downstream. The conceptual wiring of this stage looks
like this:
This is a stage that has state: the last element it has seen, and a flag oneLeft that indicates whether we have
already duplicated this last element. Looking at the code below, the reader might notice that our onPull method
is more complex than the figure above suggests. The reason for this is completion handling, which we will
explain a little bit later. For now it is enough to look at the if (!ctx.isFinishing()) block, which
corresponds to the logic we expect from the conceptual picture.
class Duplicator<A> extends PushPullStage<A, A> {
private A lastElem = null;
private boolean oneLeft = false;
@Override public SyncDirective onPush(A elem, Context<A> ctx) {
lastElem = elem;
oneLeft = true;
return ctx.push(elem);
}
@Override public SyncDirective onPull(Context<A> ctx) {
if (!ctx.isFinishing()) {
// the main pulling logic is below as it is demonstrated on the illustration
if (oneLeft) {
oneLeft = false;
return ctx.push(lastElem);
} else
return ctx.pull();
} else {
// If we need to emit a final element after the upstream
// finished
if (oneLeft) return ctx.pushAndFinish(lastElem);
else return ctx.finish();
}
}
@Override public TerminationDirective onUpstreamFinish(Context<A> ctx) {
return ctx.absorbTermination();
}
}
Finally, to demonstrate all of the stages above, we put them together into a processing chain, which conceptually
would correspond to the following structure:
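The sink used in the snippet below is not defined in this excerpt; it is assumed to collect the resulting elements,
for example:
// A hypothetical collecting sink matching the Future<List<Integer>> type below
final Sink<Integer, Future<List<Integer>>> sink =
  Flow.of(Integer.class).grouped(1000).toMat(Sink.head(), Keep.right());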
In code this is only a few lines, using the transform method to inject our custom processing into a stream:
final RunnableGraph<Future<List<Integer>>> runnable =
Source
.from(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
.transform(() -> new Filter<Integer>(elem -> elem % 2 == 0))
.transform(() -> new Duplicator<Integer>())
.transform(() -> new Map<Integer, Integer>(elem -> elem / 2))
.toMat(sink, Keep.right());
If we attempt to draw the sequence of events, it shows that there is one event token in circulation in a potential
chain of stages, just like our conceptual railroad tracks representation predicts.
Completion handling
Completion handling usually (but not exclusively) comes into the picture when processing stages need to emit
a few more elements after their upstream source has been completed. We have seen an example of this in our
Duplicator class, where the last element needs to be doubled even after the upstream neighbor stage has
completed. Since the onUpstreamFinish() handler expects a TerminationDirective as the return
type, we are only allowed to call ctx.finish(), ctx.fail() or ctx.absorbTermination(). Since
the first two of these available methods will immediately terminate, our only option is absorbTermination().
It is also clear from the return type of onUpstreamFinish that we cannot call ctx.push(), but we need to
emit elements somehow! The trick is that after calling absorbTermination() the onPull() handler will
be called eventually, and at the same time ctx.isFinishing will return true, indicating that ctx.pull()
cannot be called anymore. Now we are free to emit additional elements and call ctx.finish() or
ctx.pushAndFinish() eventually to finish processing.
The reason for this slightly complex termination sequence is that the underlying onComplete signal of Reactive
Streams may arrive without any pending demand, i.e. without respecting backpressure. This means that our
push/pull structure that was illustrated in the figure of our custom processing chain does not apply to termination.
Our neat model that is analogous to a ball that bounces back-and-forth in a pipe (it bounces back on Filter,
Duplicator for example) cannot describe the termination signals. By calling absorbTermination() the
execution environment checks if the conceptual token was above the current stage at that time (which means that
it will never come back, so the environment immediately calls onPull) or it was below (which means that it will
come back eventually, so the environment does not need to call anything yet).
The first of the two scenarios is when a termination signal arrives after a stage has passed the event to its
downstream. As we can see in the following diagram, there is nothing absorbTermination() needs to do,
since the black arrows representing the movement of the event token are uninterrupted.
In the second scenario the event token is somewhere upstream when the termination signal arrives. In this
case absorbTermination needs to ensure that a new event token is generated replacing the old one that is
forever gone (since the upstream finished). This is done by calling the onPull() event handler of the stage.
Observe that in both scenarios onPull() kicks off the continuation of the processing logic; the only difference
is whether it is the downstream or the absorbTermination() call that invokes the event handler.
Using PushStage
Many one-to-one and many-to-one transformations do not need to override the onPull() handler at all, since
all they do is propagate the pull upwards. For such transformations it is better to extend PushStage directly.
For example, our Map and Filter would look like this:
public class Map2<A, B> extends PushStage<A, B> {
private final Function<A, B> f;
public Map2(Function<A, B> f) {
this.f = f;
}
@Override public SyncDirective onPush(A elem, Context<B> ctx) {
return ctx.push(f.apply(elem));
}
}
public class Filter2<A> extends PushStage<A, A> {
private final Predicate<A> p;
public Filter2(Predicate<A> p) {
this.p = p;
}
@Override public SyncDirective onPush(A elem, Context<A> ctx) {
if (p.test(elem)) return ctx.push(elem);
else return ctx.pull();
}
}
The reason to use PushStage is not just cosmetic: internal optimizations rely on the fact that the onPull
method only calls ctx.pull(), and allow the environment to process elements faster than without this
knowledge. By extending PushStage the environment can be sure that onPull() was not overridden, since
it is final on PushStage.
Using StatefulStage
On top of PushPullStage, which is the most elementary and low-level abstraction, and PushStage, which
is a convenience class that also informs the environment about possible optimizations, StatefulStage is a
tool that builds on PushPullStage directly, adding various convenience methods on top of it. It is possible
to maintain state-machine-like states using its become() method, which encapsulates states explicitly. There
is also a handy emit() method that simplifies emitting multiple values given as an iterator. To demonstrate this
feature we reimplement Duplicator in terms of a StatefulStage:
public class Duplicator2<A> extends StatefulStage<A, A> {
@Override public StageState<A, A> initial() {
return new StageState<A, A>() {
@Override public SyncDirective onPush(A elem, Context<A> ctx) {
return emit(Arrays.asList(elem, elem).iterator(), ctx);
}
};
}
}
Using DetachedStage
The model described in previous sections, while conceptually simple, cannot describe all desired stages. The main
limitation is the single-ball (single event token) model which prevents independent progress of an upstream
and downstream of a stage. Sometimes it is desirable to detach the progress (and therefore, rate) of the upstream
and downstream of a stage, synchronizing only when needed.
This is achieved in the model by representing a DetachedStage as a boundary between two single-ball
regions. One immediate consequence of this difference is that it is not allowed to call ctx.pull() from
onPull() and it is not allowed to call ctx.push() from onPush() as such combinations would steal
a token from one region (resulting in zero tokens left) and would inject an unexpected second token to the other
region. This is enforced by the expected return types of these callback functions.
One of the important use-cases for DetachedStage is to build buffer-like entities, that allow independent
progress of upstream and downstream stages when the buffer is not full or empty, and slowing down the appropriate
side if the buffer becomes empty or full. The next diagram illustrates the event sequence for a buffer with capacity
of two elements.
The very first difference we can notice is that our Buffer stage automatically pulls its upstream on initialization.
Remember that it is forbidden to call ctx.pull() from onPull(); therefore it is the task of the framework
to kick off the first event token in the upstream region, which will remain there until the upstream stages stop.
The diagram distinguishes between the actions of the two regions by colors: purple arrows indicate the actions
involving the upstream event token, while red arrows show the downstream region actions. This demonstrates
the clear separation of these regions, and the invariant that the number of tokens in the two regions are kept
unchanged.
For a buffer it is necessary to detach the two regions, but it is also necessary to sometimes hold back the upstream
or downstream. The new API calls that are available for DetachedStage subclasses are the various
ctx.holdXXX() methods, ctx.pushAndPull() and variants, and ctx.isHoldingXXX(). Calling
ctx.holdXXX() from onPull() or onPush() results in suspending the corresponding region from progress,
and temporarily
taking ownership of the event token. This state can be queried by ctx.isHolding() which will tell if the
stage is currently holding a token or not. It is only allowed to suspend one of the regions, not both, since that
would disable all possible future events, resulting in a dead-lock. Releasing the held token is only possible by
calling ctx.pushAndPull(). This is to ensure that both the held token is released, and the triggering region
gets its token back (one inbound token + one held token = two released tokens).
The following code example demonstrates the buffer class corresponding to the message sequence chart we discussed.
class Buffer2<T> extends DetachedStage<T, T> {
final private Integer SIZE = 2;
final private List<T> buf = new ArrayList<>(SIZE);
private Integer capacity = SIZE;
private boolean isFull() {
return capacity == 0;
}
private boolean isEmpty() {
return capacity == SIZE;
}
private T dequeue() {
capacity += 1;
return buf.remove(0);
}
private void enqueue(T elem) {
capacity -= 1;
buf.add(elem);
}
public DownstreamDirective onPull(DetachedContext<T> ctx) {
if (isEmpty()) {
if (ctx.isFinishing()) return ctx.finish(); // No more elements will arrive
else return ctx.holdDownstream(); // waiting until new elements
} else {
final T next = dequeue();
if (ctx.isHoldingUpstream()) return ctx.pushAndPull(next); // release upstream
else return ctx.push(next);
}
}
public UpstreamDirective onPush(T elem, DetachedContext<T> ctx) {
enqueue(elem);
if (isFull()) return ctx.holdUpstream(); // Queue is now full, wait until new empty slot
else {
if (ctx.isHoldingDownstream()) return ctx.pushAndPull(dequeue()); // Release downstream
else return ctx.pull();
}
}
public TerminationDirective onUpstreamFinish(DetachedContext<T> ctx) {
if (!isEmpty()) return ctx.absorbTermination(); // still need to flush from buffer
else return ctx.finish(); // already empty, finishing
}
}
Implementing a custom merge stage is done by extending the FlexiMerge class, exposing its input ports and
finally defining the logic which will decide how this merge should behave. First we need to create the ports which
are used to wire up the fan-in element in a FlowGraph. These input ports must be properly typed and their names
should indicate what kind of port it is.
public class PreferringMerge
extends FlexiMerge<Integer, Integer, FanInShape3<Integer, Integer, Integer, Integer>> {
public PreferringMerge() {
super(
new FanInShape3<Integer, Integer, Integer, Integer>("PreferringMerge"),
Attributes.name("PreferringMerge")
);
}
@Override
public MergeLogic<Integer, Integer> createMergeLogic(
FanInShape3<Integer, Integer, Integer, Integer> s) {
return new MergeLogic<Integer, Integer>() {
@Override
public State<Integer, Integer> initialState() {
return new State<Integer, Integer>(readPreferred(s.in0(), s.in1(), s.in2())) {
@Override
public State<Integer, Integer> onInput(MergeLogicContext<Integer> ctx,
InPort inputHandle, Integer element) {
ctx.emit(element);
return sameState();
}
};
}
};
}
}
Next we implement the createMergeLogic method, which will be used as a factory of the merge's
MergeLogic. A new MergeLogic object will be created for each materialized stream, so it is allowed to
be stateful.
The MergeLogic defines the behaviour of our merge stage, and may be stateful (for example to buffer some
elements internally).
Warning: While a MergeLogic instance may be stateful, the FlexiMerge instance must not hold any
mutable state, since it may be shared across several materialized FlowGraph instances.
Next we implement the initialState method, which returns the initial behaviour of the merge stage. A
MergeLogic#State defines the behaviour of the merge by signalling which input ports it is interested in
consuming, and how to handle the element once it has been pulled from its upstream. Signalling which input
port we are interested in pulling data from is done by using an appropriate read condition. Available read
conditions include:
Read(input) - reads from only the given input,
ReadAny(inputs) - reads from any of the given inputs,
ReadPreferred(preferred, secondaries) - reads from the preferred input if elements are available, otherwise from
one of the secondaries,
ReadAll(inputs) - reads from all of the given inputs (like Zip), and offers a ReadAllInputs as the element
passed into the state function, which allows obtaining the pulled element values in a type-safe way.
In our case we use the ReadPreferred read condition, which has the exact semantics we need to implement
our preferring merge: it pulls elements from the preferred input port if any are available, otherwise reverting to
pulling from the secondary inputs. The context object passed into the state function allows us to interact with the
connected streams, for example by emitting an element which was just pulled from the given input, or by
signalling completion or failure to the merge's downstream stage.
The state function must always return the next behaviour to be used when an element should be pulled from its
upstreams; here we use the special SameState object, which signals FlexiMerge that no state transition is
needed.
Note: As response to an input element it is allowed to emit at most one output element.
More complex fan-in junctions may require not only multiple States but also sharing state between those states.
As MergeLogic is allowed to be stateful, it can easily be used to hold the state of the merge junction.
We now implement the equivalent of the built-in Zip junction by using the property that the MergeLogic can be
stateful and that each read is followed by a state transition (much like in Akka FSM or Actor#become).
public class Zip2<A, B> extends FlexiMerge<A, Pair<A, B>, FanInShape2<A, B, Pair<A, B>>> {
public Zip2() {
super(new FanInShape2<A, B, Pair<A, B>>("Zip2"), Attributes.name("Zip2"));
}
@Override
public MergeLogic<A, Pair<A, B>> createMergeLogic(final FanInShape2<A, B, Pair<A, B>> s) {
return new MergeLogic<A, Pair<A, B>>() {
private A lastInA = null;
private final State<A, Pair<A, B>> readA = new State<A, Pair<A, B>>(read(s.in0())) {
@Override
public State<B, Pair<A, B>> onInput(
MergeLogicContext<Pair<A, B>> ctx, InPort inputHandle, A element) {
lastInA = element;
return readB;
}
};
private final State<B, Pair<A, B>> readB = new State<B, Pair<A, B>>(read(s.in1())) {
@Override
public State<A, Pair<A, B>> onInput(
MergeLogicContext<Pair<A, B>> ctx, InPort inputHandle, B element) {
ctx.emit(new Pair<A, B>(lastInA, element));
return readA;
}
};
@Override
public State<A, Pair<A, B>> initialState() {
return readA;
}
@Override
public CompletionHandling<Pair<A, B>> initialCompletionHandling() {
return eagerClose();
}
};
}
}
The above style of implementing complex flexi merges is useful when we need fine-grained control over
consuming from certain input ports. Sometimes, however, it is simpler to strictly consume all of a given set of
inputs. In the Zip rewrite below we use the ReadAll read condition, which behaves slightly differently than the
other read conditions, as the element it emits is of the type ReadAllInputs instead of directly handing over
the pulled elements:
public class Zip<A, B>
extends FlexiMerge<FlexiMerge.ReadAllInputs, Pair<A, B>, FanInShape2<A, B, Pair<A, B>>> {
public Zip() {
super(new FanInShape2<A, B, Pair<A, B>>("Zip"), Attributes.name("Zip"));
}
@Override
public MergeLogic<ReadAllInputs, Pair<A, B>> createMergeLogic(
final FanInShape2<A, B, Pair<A, B>> s) {
return new MergeLogic<ReadAllInputs, Pair<A, B>>() {
@Override
public State<ReadAllInputs, Pair<A, B>> initialState() {
return new State<ReadAllInputs, Pair<A, B>>(readAll(s.in0(), s.in1())) {
@Override
public State<ReadAllInputs, Pair<A, B>> onInput(
MergeLogicContext<Pair<A, B>> ctx,
InPort input,
ReadAllInputs inputs) {
final A a = inputs.get(s.in0());
final B b = inputs.get(s.in1());
ctx.emit(new Pair<A, B>(a, b));
return this;
}
};
}
@Override
public CompletionHandling<Pair<A, B>> initialCompletionHandling() {
return eagerClose();
}
};
}
}
Thanks to being handed a ReadAllInputs instance instead of the elements directly, it is possible to pick the
elements in a type-safe way based on their input port.
Connecting your custom junction is as simple as creating an instance and connecting Sources and Sinks to its ports
(notice that the merged output port is named out):
final Sink<Pair<Integer, String>, Future<Pair<Integer, String>>> head =
Sink.<Pair<Integer, String>>head();
final Future<Pair<Integer, String>> future = FlowGraph.factory().closed(head,
(builder, headSink) -> {
final FanInShape2<Integer, String, Pair<Integer, String>> zip =
builder.graph(new Zip<Integer, String>());
builder.from(Source.single(1)).to(zip.in0());
builder.from(Source.single("A")).to(zip.in1());
builder.from(zip.out()).to(headSink);
}).run(mat);
Completion handling
Completion handling in a FlexiMerge is customized by overriding initialCompletionHandling. In the
(partial) example below the merge keeps running as long as the important input (in0) is available: when a replica
input completes or fails, the stage switches to the predefined eagerClose completion handling, while completion
or failure of the important input shuts the stage down:
@Override
public MergeLogic<T, T> createMergeLogic(final FanInShape3<T, T, T, T> s) {
return new MergeLogic<T, T>() {
@Override
public CompletionHandling<T> initialCompletionHandling() {
return new CompletionHandling<T>() {
@Override
public State<T, T> onUpstreamFinish(MergeLogicContextBase<T> ctx,
InPort input) {
if (input == s.in0()) {
System.out.println("Important input completed, shutting down.");
ctx.finish();
return sameState();
} else {
System.out.printf("Replica %s completed, " +
"no more replicas available, " +
"applying eagerClose completion handling.\n", input);
ctx.changeCompletionHandling(eagerClose());
return sameState();
}
}
@Override
public State<T, T> onUpstreamFailure(MergeLogicContextBase<T> ctx,
InPort input, Throwable cause) {
if (input == s.in0()) {
ctx.fail(cause);
return sameState();
} else {
System.out.printf("Replica %s failed, " +
53
completion
handling,
it is available as
It is not possible to emit elements from the completion handling, since completion handlers may be invoked at any
time (without regard to downstream demand being available).
Using FlexiRoute
Similarly to using FlexiMerge, implementing custom fan-out stages requires extending the FlexiRoute
class and providing a RouteLogic object which determines how the route should behave.
The first flexi route stage that we are going to implement is Unzip, which consumes a stream of pairs and splits
it into two streams of the first and second elements of each pair.
A FlexiRoute has exactly one input port (in our example, type-parameterized as Pair<A,B>), and may have
multiple output ports, all of which must be created beforehand (they cannot be added dynamically). First we need
to create the ports which are used to wire up the fan-out element in a FlowGraph.
public class Unzip<A, B> extends FlexiRoute<Pair<A, B>, FanOutShape2<Pair<A, B>, A, B>> {
public Unzip() {
super(new FanOutShape2<Pair<A, B>, A, B>("Unzip"), Attributes.name("Unzip"));
}
@Override
public RouteLogic<Pair<A, B>> createRouteLogic(final FanOutShape2<Pair<A, B>, A, B> s) {
return new RouteLogic<Pair<A, B>>() {
@Override
public State<BoxedUnit, Pair<A, B>> initialState() {
return new State<BoxedUnit, Pair<A, B>>(demandFromAll(s.out0(), s.out1())) {
@Override
public State<BoxedUnit, Pair<A, B>> onInput(
RouteLogicContext<Pair<A, B>> ctx, BoxedUnit x, Pair<A, B> element) {
ctx.emit(s.out0(), element.first());
ctx.emit(s.out1(), element.second());
return sameState();
}
};
}
@Override
public CompletionHandling<Pair<A, B>> initialCompletionHandling() {
return eagerClose();
}
};
}
}
Next we implement RouteLogic#initialState by providing a State that uses the DemandFromAll
demand condition, signalling to flexi route that elements may only be emitted from this stage when demand is
available from all of the given downstream output ports. Other available demand conditions are:
DemandFrom(output) - triggers when the given output port has pending demand,
DemandFromAny(outputs) - triggers when any of the given output ports has pending demand,
DemandFromAll(outputs) - triggers when all of the given output ports have pending demand.
Since the Unzip junction we are implementing signals both downstream stages at the same time, we use
DemandFromAll, unpack the incoming pair in the state function, and signal the first element of the pair to the
left stream and the second element to the right stream. Notice that since we are emitting values of different types
(A and B), the output type parameter of this State cannot be made more specific (BoxedUnit in the code
above). The type parameter can be utilised more efficiently when a junction is emitting the same type of element
to all of its downstreams, e.g. in strictly routing stages.
The state function must always return the next behaviour to be used when an element should be emitted; we use
the special SameState object, which signals FlexiRoute that no state transition is needed.
Warning: While a RouteLogic instance may be stateful, the FlexiRoute instance must not hold any
mutable state, since it may be shared across several materialized FlowGraph instances.
Note: It is only allowed to emit at most one element to each output in response to onInput; otherwise an
IllegalStateException is thrown.
Completion handling
Completion handling for a FlexiRoute is customized the same way, by overriding
initialCompletionHandling in its RouteLogic. In the (partial) example below, cancellation of the
important output (out0) shuts down the entire stage, while cancellations of the other outputs are ignored:
@Override
public RouteLogic<T> createRouteLogic(FanOutShape3<T, T, T, T> s) {
return new RouteLogic<T>() {
@Override
public CompletionHandling<T> initialCompletionHandling() {
return new CompletionHandling<T>() {
@Override
public State<T, T> onDownstreamFinish(RouteLogicContextBase<T> ctx,
OutPort output) {
if (output == s.out0()) {
// finish all downstreams, and cancel the upstream
ctx.finish();
return sameState();
} else {
return sameState();
}
}
@Override
public void onUpstreamFinish(RouteLogicContextBase<T> ctx) {
}
@Override
public void onUpstreamFailure(RouteLogicContextBase<T> ctx, Throwable t) {
}
};
}
@Override
public State<OutPort, T> initialState() {
return new State<OutPort, T>(demandFromAny(s.out0(), s.out1(), s.out2())) {
@SuppressWarnings("unchecked")
@Override
public State<T, T> onInput(
RouteLogicContext<T> ctx, OutPort preferred, T element) {
ctx.emit((Outlet<T>) preferred, element);
return sameState();
}
};
}
};
}
}
Notice that State changes are only allowed in reaction to downstream cancellations, and not in the upstream
completion/failure cases. Because there is only one upstream, there is nothing else to do in those cases than
possibly flush buffered elements and continue with shutting down the entire stream.
It is not possible to emit elements from the completion handling, since completion handlers may be invoked at any
time (without regard to downstream demand being available).
In essence, the above guarantees are similar to what Actors provide, if one thinks of the state of a custom stage
as the state of an actor, and the callbacks as the receive block of the actor.
Warning: It is not safe to access the state of any custom stage outside of the callbacks that it provides, just
like it is unsafe to access the state of an actor from the outside. This means that Future callbacks should not
close over internal state of custom stages because such access can be concurrent with the provided callbacks,
leading to undefined behavior.
1.9 Integration
1.9.1 Integrating with Actors
For piping the elements of a stream as messages to an ordinary actor you can use Sink.actorRef. Messages
can be sent to a stream via the ActorRef that is materialized by Source.actorRef.
For more advanced use cases the ActorPublisher and ActorSubscriber traits are provided to support
implementing Reactive Streams Publisher and Subscriber with an Actor.
These can be consumed by other Reactive Streams libraries or used as an Akka Streams Source or Sink.
Warning: AbstractActorPublisher and AbstractActorSubscriber cannot be used with remote
actors, because if signals of the Reactive Streams protocol (e.g. request) are lost the stream may deadlock.
Note: These Actors are designed to be implemented using Java 8 lambda expressions. In case you need to stay
on a JVM prior to 8, Akka provides UntypedActorPublisher and UntypedActorSubscriber which
can be used easily from any language level.
Source.actorRef
Messages sent to the actor that is materialized by Source.actorRef will be emitted to the stream if there is
demand from downstream, otherwise they will be buffered until a request for demand is received.
Depending on the defined OverflowStrategy it might drop elements if there is no space available in the
buffer. The strategy OverflowStrategy.backpressure() is not supported for this Source type; you
should consider using ActorPublisher if you want a backpressured actor interface.
The stream can be completed successfully by sending akka.actor.PoisonPill or
akka.actor.Status.Success to the actor reference.
The stream can be completed with failure by sending akka.actor.Status.Failure to the actor reference.
The actor will be stopped when the stream is completed, failed or cancelled from downstream, i.e. you can watch
it to get notified when that happens.
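For illustration, a minimal sketch of Source.actorRef in use (buffer size and overflow strategy chosen
arbitrarily):
final Source<String, ActorRef> actorSource =
  Source.actorRef(10, OverflowStrategy.dropHead());
final ActorRef ref = actorSource
  .to(Sink.foreach(msg -> System.out.println(msg)))
  .run(mat);
ref.tell("hello", ActorRef.noSender());
// complete the stream (and stop the actor)
ref.tell(PoisonPill.getInstance(), ActorRef.noSender());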
Sink.actorRef
The sink sends the elements of the stream to the given ActorRef. If the target actor terminates the stream will
be cancelled. When the stream is completed successfully the given onCompleteMessage will be sent to the
destination actor. When the stream is completed with failure a akka.actor.Status.Failure message will
be sent to the destination actor.
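A minimal sketch (the receiver actor and the completion message below are placeholders):
final ActorRef receiver =
  system.actorOf(Props.create(MyReceiverActor.class)); // hypothetical actor
Source.from(Arrays.asList(1, 2, 3))
  .to(Sink.actorRef(receiver, "stream-completed"))
  .run(mat);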
Warning: There is no back-pressure signal from the destination actor, i.e. if the actor is not consuming the
messages fast enough the mailbox of the actor will grow. For potentially slow consumer actors it is recommended to use a bounded mailbox with zero mailbox-push-timeout-time or use a rate limiting stage in front of
this stage.
ActorPublisher
Extend akka.stream.actor.AbstractActorPublisher to implement a stream publisher that keeps
track of the subscription life cycle and requested elements.
Here is an example of such an actor. It dispatches incoming jobs to the attached subscriber:
public static class JobManagerProtocol {
final public static class Job {
public final String payload;
public Job(String payload) {
this.payload = payload;
}
}
public static class JobAcceptedMessage {
@Override
public String toString() {
return "JobAccepted";
}
}
public static final JobAcceptedMessage JobAccepted = new JobAcceptedMessage();
public static class JobDeniedMessage {
@Override
public String toString() {
return "JobDenied";
}
}
public static final JobDeniedMessage JobDenied = new JobDeniedMessage();
}
public static class JobManager extends AbstractActorPublisher<JobManagerProtocol.Job> {
public static Props props() { return Props.create(JobManager.class); }
private final int MAX_BUFFER_SIZE = 100;
private final List<JobManagerProtocol.Job> buf = new ArrayList<>();
public JobManager() {
receive(ReceiveBuilder.
match(JobManagerProtocol.Job.class, job -> buf.size() == MAX_BUFFER_SIZE, job -> {
sender().tell(JobManagerProtocol.JobDenied, self());
}).
match(JobManagerProtocol.Job.class, job -> {
sender().tell(JobManagerProtocol.JobAccepted, self());
if (buf.isEmpty() && totalDemand() > 0)
onNext(job);
else {
buf.add(job);
deliverBuf();
}
}).
match(ActorPublisherMessage.Request.class, request -> deliverBuf()).
match(ActorPublisherMessage.Cancel.class, cancel -> context().stop(self())).
build());
}
// Simplified sketch of the elided helper: deliver buffered jobs while the
// subscriber has outstanding demand.
void deliverBuf() {
while (totalDemand() > 0 && !buf.isEmpty()) {
onNext(buf.remove(0));
}
}
}
You send elements to the stream by calling onNext. You are allowed to send as many elements as
have been requested by the stream subscriber. This amount can be inquired with totalDemand. It
is only allowed to use onNext when isActive and totalDemand>0, otherwise onNext will throw
IllegalStateException.
When the stream subscriber requests more elements the ActorPublisherMessage.Request message is
delivered to this actor, and you can act on that event. The totalDemand is updated automatically.
When the stream subscriber cancels the subscription the ActorPublisherMessage.Cancel message is
delivered to this actor. After that subsequent calls to onNext will be ignored.
You can complete the stream by calling onComplete. After that you are not allowed to call onNext, onError
and onComplete.
You can terminate the stream with failure by calling onError. After that you are not allowed to call onNext,
onError and onComplete.
If you suspect that this AbstractActorPublisher may never get subscribed to, you can
override the subscriptionTimeout method to provide a timeout after which this Publisher
should be considered canceled.
The actor will be notified when the timeout triggers via an
ActorPublisherMessage.SubscriptionTimeoutExceeded message and MUST then perform
cleanup and stop itself.
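For illustration, such an override might look like the sketch below (the return type is assumed from the
description above; Duration is scala.concurrent.duration.Duration):
@Override
public Duration subscriptionTimeout() {
  // consider this publisher canceled if nobody subscribes within 10 seconds (assumption)
  return Duration.create(10, TimeUnit.SECONDS);
}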
If the actor is stopped, the stream will be completed, unless it has already been terminated with failure, completed
or canceled.
More detailed information can be found in the API documentation.
This is how it can be used as input Source to a Flow:
final Source<JobManagerProtocol.Job, ActorRef> jobManagerSource =
Source.actorPublisher(JobManager.props());
final ActorRef ref = jobManagerSource
.map(job -> job.payload.toUpperCase())
.map(elem -> {
System.out.println(elem);
return elem;
})
.to(Sink.ignore())
.run(mat);
ref.tell(new JobManagerProtocol.Job("a"), ActorRef.noSender());
ref.tell(new JobManagerProtocol.Job("b"), ActorRef.noSender());
ref.tell(new JobManagerProtocol.Job("c"), ActorRef.noSender());
ActorSubscriber
Extend akka.stream.actor.AbstractActorSubscriber to make your class a stream subscriber
with full control of stream back pressure.
It will receive ActorSubscriberMessage.OnNext,
ActorSubscriberMessage.OnComplete and ActorSubscriberMessage.OnError messages
from the stream. It can also receive other, non-stream messages, in the same way as any actor.
Here is an example of such an actor. It dispatches incoming jobs to child worker actors:
public static class WorkerPoolProtocol {
public static class Msg {
public final int id;
public final ActorRef replyTo;
public Msg(int id, ActorRef replyTo) {
this.id = id;
this.replyTo = replyTo;
}
@Override
public String toString() {
return String.format("Msg(%s, %s)", id, replyTo);
}
}
public static Msg msg(int id, ActorRef replyTo) {
return new Msg(id, replyTo);
}
public static class Work {
public final int id;
public Work(int id) { this.id = id; }
@Override
public String toString() { return String.format("Work(%s)", id); }
}
public static Work work(int id) {
return new Work(id);
}
public static class Reply {
public final int id;
public Reply(int id) { this.id = id; }
@Override
public String toString() { return String.format("Reply(%s)", id); }
}
public static Reply reply(int id) {
return new Reply(id);
}
public static class Done {
public final int id;
public Done(int id) { this.id = id; }
@Override
public String toString() { return String.format("Done(%s)", id); }
}
public static Done done(int id) {
return new Done(id);
}
}
public static class WorkerPool extends AbstractActorSubscriber {
public static Props props() { return Props.create(WorkerPool.class); }
private final int MAX_QUEUE_SIZE = 10;
private final Map<Integer, ActorRef> queue = new HashMap<>();
private final Router router;
@Override
public RequestStrategy requestStrategy() {
return new MaxInFlightRequestStrategy(MAX_QUEUE_SIZE) {
@Override
public int inFlightInternally() {
return queue.size();
}
};
}
public WorkerPool() {
final List<Routee> routees = new ArrayList<>();
for (int i = 0; i < 3; i++)
routees.add(new ActorRefRoutee(context().actorOf(Props.create(Worker.class))));
router = new Router(new RoundRobinRoutingLogic(), routees);
receive(ReceiveBuilder.
match(ActorSubscriberMessage.OnNext.class, on -> on.element() instanceof WorkerPoolProtocol.Msg,
onNext -> {
WorkerPoolProtocol.Msg msg = (WorkerPoolProtocol.Msg) onNext.element();
queue.put(msg.id, msg.replyTo);
if (queue.size() > MAX_QUEUE_SIZE)
throw new RuntimeException("queued too many: " + queue.size());
router.route(WorkerPoolProtocol.work(msg.id), self());
}).
match(WorkerPoolProtocol.Reply.class, reply -> {
int id = reply.id;
queue.get(id).tell(WorkerPoolProtocol.done(id), self());
queue.remove(id);
}).
build());
}
}
static class Worker extends AbstractActor {
public Worker() {
receive(ReceiveBuilder.
match(WorkerPoolProtocol.Work.class, work -> {
// ...
sender().tell(WorkerPoolProtocol.reply(work.id), self());
}).build());
}
}
A subclass must define the RequestStrategy to control stream back pressure. After each incoming message
the AbstractActorSubscriber will automatically invoke RequestStrategy.requestDemand and
propagate the returned demand to the stream.
The provided WatermarkRequestStrategy is a good strategy if the actor performs work itself.
The provided MaxInFlightRequestStrategy is useful if messages are queued internally or delegated to other actors.
You can also implement a custom RequestStrategy or call request manually together with
ZeroRequestStrategy or some other strategy. In that case you must also call request when the
actor is started or when it is ready, otherwise it will not receive any elements.
More detailed information can be found in the API documentation.
This is how it can be used as output Sink to a Flow:
final int N = 117;
final List<Integer> data = new ArrayList<>(N);
for (int i = 0; i < N; i++) {
data.add(i);
}
Source.from(data)
.map(i -> WorkerPoolProtocol.msg(i, replyTo))
.runWith(Sink.<WorkerPoolProtocol.Msg>actorSubscriber(WorkerPool.props()), mat);
Transforming the stream of authors to a stream of email addresses by using the lookupEmail service can be
done with mapAsync:
final Source<String, BoxedUnit> emailAddresses = authors
.mapAsync(4, author -> addressSystem.lookupEmail(author.handle))
.filter(o -> o.isPresent())
.map(o -> o.get());
mapAsync applies the given function, which calls out to the external service, to each of the elements as they
pass through this processing step. The function returns a Future and the value of that future will be emitted
downstream. The number of Futures that shall run in parallel is given as the first argument to mapAsync. These
Futures may complete in any order, but the elements that are emitted downstream are in the same order as
received from upstream.
That means that back-pressure works as expected. For example if the emailServer.send is the bottleneck it
will limit the rate at which incoming tweets are retrieved and email addresses looked up.
The final piece of this pipeline is to generate the demand that pulls the tweet authors' information through the
emailing pipeline: we attach a Sink.ignore which makes it all run, as shown in the sketch below. If our email
process returned some interesting data for further transformation, we would of course not ignore it but send that
result stream onwards for further processing or storage.
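Put together, the final wiring might look like the following sketch (emailServer.send and the Email message
type are assumed from the surrounding description):
final RunnableGraph<?> sendEmails =
  emailAddresses
    .mapAsync(4, address ->
      emailServer.send(new Email(address, "Akka", "I like your tweet")))
    .to(Sink.ignore());
sendEmails.run(mat);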
Note that mapAsync preserves the order of the stream elements. In this example the order is not important, so
we can use the more efficient mapAsyncUnordered:
final Source<Author, BoxedUnit> authors =
tweets
.filter(t -> t.hashtags().contains(AKKA))
.map(t -> t.author);
final Source<String, BoxedUnit> emailAddresses =
authors
.mapAsyncUnordered(4, author -> addressSystem.lookupEmail(author.handle))
.filter(o -> o.isPresent())
.map(o -> o.get());
In the above example the services conveniently returned a Future of the result. If that is not the case you need
to wrap the call in a Future. If the service call involves blocking you must also make sure that you run it on a
dedicated execution context, to avoid starvation and disturbance of other tasks in the system.
final MessageDispatcher blockingEc = system.dispatchers().lookup("blocking-dispatcher");
final RunnableGraph sendTextMessages =
phoneNumbers
.mapAsync(4, phoneNo ->
Futures.future(() ->
smsServer.send(new TextMessage(phoneNo, "I like your tweet")),
blockingEc)
)
.to(Sink.ignore());
sendTextMessages.run(mat);
An alternative for blocking calls is to perform them in a map operation, still using a dedicated dispatcher for that
operation.
final Flow<String, Boolean, BoxedUnit> send =
Flow.of(String.class)
.map(phoneNo -> smsServer.send(new TextMessage(phoneNo, "I like your tweet")))
.withAttributes(ActorAttributes.dispatcher("blocking-dispatcher"));
final RunnableGraph<?> sendTextMessages =
phoneNumbers.via(send).to(Sink.ignore());
sendTextMessages.run(mat);
However, that is not exactly the same as mapAsync, since the mapAsync may run several calls concurrently,
but map performs them one at a time.
For a service that is exposed as an actor, or if an actor is used as a gateway in front of an external service, you can
use ask:
final Source<Tweet, BoxedUnit> akkaTweets = tweets.filter(t -> t.hashtags().contains(AKKA));
final RunnableGraph saveTweets =
akkaTweets
.mapAsync(4, tweet -> ask(database, new Save(tweet), 300))
.to(Sink.ignore());
Note that if the ask is not completed within the given timeout, the stream is completed with failure. If that is not
the desired outcome you can use recover on the ask Future, as sketched below.
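If a per-element fallback is preferred over failing the whole stream, a sketch using recover on the returned
Future could look like this (the fallback value is illustrative):
akkaTweets
  .mapAsync(4, tweet ->
    ask(database, new Save(tweet), 300)
      .recover(new Recover<Object>() {
        @Override public Object recover(Throwable problem) throws Throwable {
          return "not saved"; // marker instead of failing the stream
        }
      }, system.dispatcher()))
  .to(Sink.ignore());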
Elements starting with a lower-case character are simulated to take a longer time to process.
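The service itself is elided here; a simplified sketch of what it might look like (the real service also tracks how
many calls are in progress, shown in parentheses in the output below):
static class SometimesSlowService {
  private final ExecutionContext ec;
  public SometimesSlowService(ExecutionContext ec) { this.ec = ec; }
  public Future<String> convert(String s) {
    System.out.println("running: " + s);
    return Futures.future(() -> {
      if (!s.isEmpty() && Character.isLowerCase(s.charAt(0)))
        Thread.sleep(500); // lower-case elements are the slow ones
      System.out.println("completed: " + s);
      return s.toUpperCase();
    }, ec);
  }
}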
Here is how we can use it with mapAsync:
final MessageDispatcher blockingEc = system.dispatchers().lookup("blocking-dispatcher");
final SometimesSlowService service = new SometimesSlowService(blockingEc);
final ActorMaterializer mat = ActorMaterializer.create(
ActorMaterializerSettings.create(system).withInputBuffer(4, 4), system);
Source.from(Arrays.asList("a", "B", "C", "D", "e", "F", "g", "H", "i", "J"))
.map(elem -> { System.out.println("before: " + elem); return elem; })
.mapAsync(4, service::convert)
.runForeach(elem -> System.out.println("after: " + elem), mat);
The output may look like this:
running: C (3)
before: F
running: D (4)
before: g
before: H
completed: C (3)
completed: B (2)
completed: D (1)
completed: a (0)
after: A
after: B
running: e (1)
after: C
after: D
running: F (2)
before: i
before: J
running: g (3)
running: H (4)
completed: F (3)
completed: H (2)
completed: e (1)
completed: g (0)
after: E
after: F
running: i (1)
after: G
after: H
running: J (2)
completed: J (1)
completed: i (0)
after: I
after: J
Note that the after lines are in the same order as the before lines, even though elements are completed
in a different order. For example, H is completed before g but is still emitted afterwards.
The numbers in parentheses illustrate how many calls are in progress at the same time. Here the downstream
demand, and thereby the number of concurrent calls, is limited by the buffer size (4) of the
ActorMaterializerSettings.
Here is how we can use the same service with mapAsyncUnordered:
final MessageDispatcher blockingEc = system.dispatchers().lookup("blocking-dispatcher");
final SometimesSlowService service = new SometimesSlowService(blockingEc);
final ActorMaterializer mat = ActorMaterializer.create(
ActorMaterializerSettings.create(system).withInputBuffer(4, 4), system);
Source.from(Arrays.asList("a", "B", "C", "D", "e", "F", "g", "H", "i", "J"))
.map(elem -> { System.out.println("before: " + elem); return elem; })
.mapAsyncUnordered(4, service::convert)
.runForeach(elem -> System.out.println("after: " + elem), mat);
The output may look like this:
before: F
running: D (4)
before: g
before: H
completed: B (3)
completed: D (2)
completed: C (1)
after: B
after: D
running: e (2)
after: C
running: F (3)
before: i
before: J
completed: F (2)
after: F
running: g (3)
running: H (4)
completed: H (3)
after: H
completed: a (2)
after: A
running: i (3)
running: J (4)
completed: J (3)
after: J
completed: e (2)
after: E
completed: g (1)
after: G
completed: i (0)
after: I
Note that the after lines are not in the same order as the before lines. For example, H overtakes the
slow G. The numbers in parentheses illustrate how many calls are in progress at the same time. Here the
downstream demand, and thereby the number of concurrent calls, is limited by the buffer size (4) of the
ActorMaterializerSettings.
Using an Akka Streams Flow we can transform the stream and connect those:
final Flow<Tweet, Author, BoxedUnit> authors = Flow.of(Tweet.class)
.filter(t -> t.hashtags().contains(AKKA))
.map(t -> t.author);
Source.from(rs.tweets())
.via(authors)
.to(Sink.create(rs.storage()));
The Publisher is used as an input Source to the flow and the Subscriber is used as an output Sink.
A Flow can also be converted to a RunnableGraph<Processor<In, Out>> which materializes to a
Processor when run() is called. run() itself can be called multiple times, resulting in a new Processor
instance each time.
final Processor<Tweet, Author> processor =
authors.toProcessor().run(mat);
rs.tweets().subscribe(processor);
processor.subscribe(rs.storage());
A publisher that is created with Sink.publisher only supports one subscriber. A second subscription attempt
will be rejected with an IllegalStateException.
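For reference, such a single-subscriber publisher can be materialized like this (mirroring the fan-out example
below):
final Publisher<Author> authorPublisher =
  Source.from(rs.tweets())
    .via(authors)
    .runWith(Sink.publisher(), mat);
authorPublisher.subscribe(rs.storage());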
A publisher that supports multiple subscribers can be created with Sink.fanoutPublisher instead:
Subscriber<Author> storage();
Subscriber<Author> alert();
final Publisher<Author> authorPublisher =
Source.from(rs.tweets())
.via(authors)
.runWith(Sink.fanoutPublisher(8, 16), mat);
authorPublisher.subscribe(rs.storage());
authorPublisher.subscribe(rs.alert());
The buffer size controls how far apart the slowest subscriber can be from the fastest subscriber before slowing
down the stream.
To make the picture complete, it is also possible to expose a Sink as a Subscriber by using Source.subscriber:
final Subscriber<Author> storage = rs.storage();
final Subscriber<Tweet> tweetSubscriber =
authors
.to(Sink.create(storage))
.runWith(Source.subscriber(), mat);
rs.tweets().subscribe(tweetSubscriber);
It is also possible to re-wrap Processor instances as a Flow by passing a factory function that will create
the Processor instances:
// An example Processor factory
final Creator<Processor<Integer, Integer>> factory =
new Creator<Processor<Integer, Integer>>() {
public Processor<Integer, Integer> create() {
return Flow.of(Integer.class).toProcessor().run(mat);
}
};
final Flow<Integer, Integer, BoxedUnit> flow = Flow.create(factory);
Please note that a factory is necessary to achieve reusability of the resulting Flow.
The default supervision strategy for a stream can be defined on the settings of the materializer.
final Function<Throwable, Supervision.Directive> decider = exc -> {
if (exc instanceof ArithmeticException)
return Supervision.resume();
else
return Supervision.stop();
};
final Materializer mat = ActorMaterializer.create(
ActorMaterializerSettings.create(system).withSupervisionStrategy(decider),
system);
final Source<Integer, BoxedUnit> source = Source.from(Arrays.asList(0, 1, 2, 3, 4, 5))
.map(elem -> 100 / elem);
final Sink<Integer, Future<Integer>> fold = Sink.fold(0, (acc, elem) -> acc + elem);
final Future<Integer> result = source.runWith(fold, mat);
// the element causing division by zero will be dropped
// result here will be a Future completed with Success(228)
Here you can see that any ArithmeticException will resume the processing, i.e. the elements that cause
the division by zero are effectively dropped.
Note: Be aware that dropping elements may result in deadlocks in graphs with cycles, as explained in Graph
cycles, liveness and deadlocks.
The supervision strategy can also be defined for all operators of a flow.
final Materializer mat = ActorMaterializer.create(system);
final Function<Throwable, Supervision.Directive> decider = exc -> {
if (exc instanceof ArithmeticException)
return Supervision.resume();
else
return Supervision.stop();
};
final Flow<Integer, Integer, BoxedUnit> flow =
Flow.of(Integer.class).filter(elem -> 100 / elem < 50).map(elem -> 100 / (5 - elem))
.withAttributes(ActorAttributes.withSupervisionStrategy(decider));
final Source<Integer, BoxedUnit> source = Source.from(Arrays.asList(0, 1, 2, 3, 4, 5))
.via(flow);
final Sink<Integer, Future<Integer>> fold =
Sink.fold(0, (acc, elem) -> acc + elem);
final Future<Integer> result = source.runWith(fold, mat);
// the elements causing division by zero will be dropped
// result here will be a Future completed with Success(150)
Restart works in a similar way to Resume, with the addition that the accumulated state, if any, of the failing
processing stage will be reset.
final Materializer mat = ActorMaterializer.create(system);
final Function<Throwable, Supervision.Directive> decider = exc -> {
if (exc instanceof IllegalArgumentException)
return Supervision.restart();
else
return Supervision.stop();
};
final Flow<Integer, Integer, BoxedUnit> flow =
Flow.of(Integer.class).scan(0, (acc, elem) -> {
if (elem < 0) throw new IllegalArgumentException("negative not allowed");
else return acc + elem;
})
.withAttributes(ActorAttributes.withSupervisionStrategy(decider));
final Source<Integer, BoxedUnit> source = Source.from(Arrays.asList(1, 3, -1, 5, 7))
.via(flow);
final Future<List<Integer>> result = source.grouped(1000)
.runWith(Sink.<List<Integer>>head(), mat);
// the negative element causes the scan stage to be restarted,
// i.e. start from 0 again
// result here will be a Future completed with Success(List(0, 1, 4, 0, 5, 12))
Let's say that we use an external service to look up email addresses and we would like to discard those that
cannot be found.
We start with the tweet stream of authors:
final Source<Author, BoxedUnit> authors = tweets
.filter(t -> t.hashtags().contains(AKKA))
.map(t -> t.author);
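The lookup can then be run with mapAsync using a resuming supervision strategy, so that failed lookups are
simply dropped; a sketch, assuming here that lookupEmail returns a Future that fails when no address can be
found:
final Attributes resumeAttrib =
  ActorAttributes.withSupervisionStrategy(Supervision.getResumingDecider());
final Flow<Author, String, BoxedUnit> lookupEmail =
  Flow.of(Author.class)
    .mapAsync(4, author -> addressSystem.lookupEmail(author.handle))
    .withAttributes(resumeAttrib);
final Source<String, BoxedUnit> emailAddresses = authors.via(lookupEmail);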
If we did not use Resume, the default stopping strategy would complete the stream with failure on the first
Future that was completed with Failure.
Next, we simply handle each incoming connection using a Flow which will be used as the processing stage to
handle and emit ByteStrings from and to the TCP socket. Since one ByteString does not necessarily have
to correspond to exactly one line of text (the client might be sending the line in chunks), we use the delimiter
helper Flow from akka.stream.io.Framing to chunk the input up into actual lines of text. The last boolean
argument indicates that we require an explicit line ending even for the last message before the connection is closed.
In this example we simply add exclamation marks to each incoming text message and push it through the flow:
connections.runForeach(connection -> {
  System.out.println("New connection from: " + connection.remoteAddress());
  final Flow<ByteString, ByteString, BoxedUnit> echo = Flow.of(ByteString.class)
    .via(Framing.delimiter(ByteString.fromString("\n"), 256, false))
    .map(bytes -> bytes.utf8String())
    .map(s -> s + "!!!\n")
    .map(s -> ByteString.fromString(s));
  connection.handleWith(echo, mat);
}, mat);
Notice that while most building blocks in Akka Streams are reusable and freely shareable, this is not the case
for the incoming connection Flow: since it directly corresponds to an existing, already accepted connection, its
handling can only ever be materialized once.
Closing connections is possible by cancelling the incoming connection Flow from your server logic (e.g. by
connecting its downstream to a cancelled Sink and its upstream to a completed Source). It is also possible
to shut down the server's socket by cancelling the Source of incoming connections.
We can then test the TCP server by sending data to the TCP Socket using netcat:
$ echo -n "Hello World" | netcat 127.0.0.1 8889
Hello World!!!
The REPL flow we use to handle the server interaction first prints the server's response, then awaits input from
the command line (this blocking call is used here just for the sake of simplicity) and converts it to a ByteString
which is then sent over the wire to the server. We then simply connect the TCP pipeline to this processing stage;
at this point it will be materialized and start processing data once the server responds with an initial message.
A resilient REPL client would be more sophisticated than this: for example, it should split out the input reading
into a separate mapAsync step and have a way to let the server write more data than one ByteString chunk at any
given time. These improvements, however, are left as an exercise for the reader.
Avoiding deadlocks and liveness issues in back-pressured cycles
When writing such end-to-end back-pressured systems you may sometimes end up in a situation of a loop, in
which either side is waiting for the other one to start the conversation. One does not need to look far to find
examples of such back-pressure loops. In the two examples shown previously, we always assumed that the side
we are connecting to would start the conversation, which effectively means both sides are back-pressured and
cannot get the conversation started. There are multiple ways of dealing with this which are explained in depth in
Graph cycles, liveness and deadlocks; in client-server scenarios, however, it is often simplest to make one of the
sides send an initial message.
Note: In case of back-pressured cycles (which can occur even between different systems) you sometimes have to
decide which of the sides starts the conversation in order to kick it off. This can often be done by injecting an
initial message from one of the sides: a conversation starter.
To break this back-pressure cycle we need to inject some initial message, a conversation starter. First, we need
to decide which side of the connection should remain passive and which active. Thankfully in most situations
finding the right spot to start the conversation is rather simple, as it often is inherent to the protocol we are trying
to implement using Streams. In chat-like applications, which our examples resemble, it makes sense to make the
Server initiate the conversation by emitting a hello message:
connections.runForeach(connection -> {
  // server logic, parses incoming commands
  final PushStage<String, String> commandParser = new PushStage<String, String>() {
    @Override public SyncDirective onPush(String elem, Context<String> ctx) {
      if (elem.equals("BYE"))
        return ctx.finish();
      else
        return ctx.push(elem + "!");
    }
  };

  final String welcomeMsg = "Welcome to: " + connection.localAddress() +
    " you are: " + connection.remoteAddress() + "!\n";

  final Source<ByteString, BoxedUnit> welcome =
    Source.single(ByteString.fromString(welcomeMsg));
  final Flow<ByteString, ByteString, BoxedUnit> echoFlow =
    Flow.of(ByteString.class)
      .via(Framing.delimiter(ByteString.fromString("\n"), 256, false))
      .map(bytes -> bytes.utf8String())
      .transform(() -> commandParser)
      .map(s -> s + "\n")
      .map(s -> ByteString.fromString(s));

  final Flow<ByteString, ByteString, BoxedUnit> serverLogic =
    Flow.factory().create(builder -> {
      final UniformFanInShape<ByteString, ByteString> concat =
        builder.graph(Concat.create());
      final FlowShape<ByteString, ByteString> echo = builder.graph(echoFlow);
      builder
        .from(welcome).to(concat)
        .from(echo).to(concat);
      return new Pair<>(echo.inlet(), concat.out());
    });

  connection.handleWith(serverLogic, mat);
}, mat);
The way we constructed a Flow using a PartialFlowGraph is explained in detail in Constructing Sources,
Sinks and Flows from Partial Graphs; the basic concept, however, is rather simple: we can encapsulate arbitrarily
complex logic within a Flow as long as it exposes the same interface, which means exposing exactly one
UndefinedSink and exactly one UndefinedSource which will be connected to the TCP pipeline. In this
example we use a Concat graph processing stage to inject the initial message, and then continue with handling
all incoming data using the echo handler. You should use this pattern of encapsulating complex logic in Flows and
attaching those to StreamIO in order to implement your custom, and possibly sophisticated, TCP servers.
In this example both client and server may need to close the stream based on a parsed command: BYE in the case
of the server, and q in the case of the client. This is implemented by using a custom PushStage (see Using
PushPullStage) which completes the stream once it encounters such a command.
Please note that these processing stages are backed by Actors and by default are configured to run on a
pre-configured, threadpool-backed dispatcher dedicated to file IO. This is very important as it isolates the blocking
file IO operations from the rest of the ActorSystem, allowing each dispatcher to be utilised in the most efficient
way. If you want to configure a custom dispatcher for file IO operations globally, you can do so by changing
akka.stream.file-io-dispatcher, or for a specific stage by specifying a custom Dispatcher in code,
like this:
SynchronousFileSink.create(file)
.withAttributes(ActorAttributes.dispatcher("custom-file-io-dispatcher"));
1.12.1 Pipelining
Roland uses the two frying pans in an asymmetric fashion. The first pan is only used to fry one side of the pancake,
then the half-finished pancake is flipped into the second pan for the finishing fry on the other side. Once the first
frying pan becomes available it gets a new scoop of batter. As an effect, most of the time there are two pancakes
being cooked at the same time: one being cooked on its first side and the second being cooked to completion. This
is how this setup would look implemented as a stream:
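A minimal sketch of the two frying-pan flows, assuming simple ScoopOfBatter, HalfCookedPancake and Pancake domain classes:
// Takes a scoop of batter and fries one side of the pancake (pan 1)
Flow<ScoopOfBatter, HalfCookedPancake, BoxedUnit> fryingPan1 =
  Flow.of(ScoopOfBatter.class).map(batter -> new HalfCookedPancake());

// Finishes the other side of a half-cooked pancake (pan 2)
Flow<HalfCookedPancake, Pancake, BoxedUnit> fryingPan2 =
  Flow.of(HalfCookedPancake.class).map(halfCooked -> new Pancake());

// Chaining the two pans makes the stages work in a pipelined fashion
Flow<ScoopOfBatter, Pancake, BoxedUnit> pancakeChef = fryingPan1.via(fryingPan2);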
The two map stages in sequence (encapsulated in the frying pan flows) will be executed in a pipelined way,
basically doing the same as Roland with his frying pans:
1. A ScoopOfBatter enters fryingPan1
2. fryingPan1 emits a HalfCookedPancake once fryingPan2 becomes available
3. fryingPan2 takes the HalfCookedPancake
4. at this point fryingPan1 already takes the next scoop, without waiting for fryingPan2 to finish
The benefit of pipelining is that it can be applied to any sequence of processing steps that are otherwise not
parallelisable (for example because the result of a processing step depends on all the information from the previous
step). One drawback is that if the processing times of the stages are very different, then some of the stages will
not be able to operate at full throughput, because they will wait on a previous or subsequent stage most of the
time. In the pancake example, frying the second half of the pancake is usually faster than frying the first half, so
fryingPan2 will not be able to operate at full capacity 1.
Stream processing stages have internal buffers to make communication between them more efficient. For more
details about the behavior of these buffers and how to add additional ones, refer to Buffers and working with rate.
The benefit of parallelizing is that it is easy to scale. In the pancake example it is easy to add a third frying pan
with Patrik's method, but Roland cannot add a third frying pan, since that would require a third processing step,
which is not practically possible in the case of frying pancakes. A sketch of this parallel setup follows below.
1 Roland's reason for this seemingly suboptimal procedure is that he prefers the temperature of the second pan to be slightly lower than that
of the first in order to achieve a more homogeneous result.
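A minimal sketch of the parallel setup, assuming a single fryingPan flow that fries a pancake on both sides:
Flow<ScoopOfBatter, Pancake, BoxedUnit> pancakeChefs =
  Flow.factory().create(b -> {
    final UniformFanInShape<Pancake, Pancake> mergePancakes =
      b.graph(Merge.create(2));
    final UniformFanOutShape<ScoopOfBatter, ScoopOfBatter> dispatchBatter =
      b.graph(Balance.create(2));

    // Two chefs work in parallel, each frying a whole pancake in their own pan
    b.from(dispatchBatter.out(0)).via(fryingPan).to(mergePancakes.in(0));
    b.from(dispatchBatter.out(1)).via(fryingPan).to(mergePancakes.in(1));

    return new Pair<>(dispatchBatter.in(), mergePancakes.out());
  });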
One drawback of the example code above is that it does not preserve the ordering of pancakes. This might be a
problem if children like to track their own pancakes. In those cases the Balance and Merge stages should be
replaced by strict round-robin balancing and merging stages that put in and take out pancakes in a strict order.
A more detailed example of creating a worker pool can be found in the Streams Cookbook.
The above pattern works well if there are many independent jobs that do not depend on the results of each other,
but the jobs themselves need multiple processing steps where each step builds on the result of the previous one. In
our case individual pancakes do not depend on each other, so they can be cooked in parallel; on the other hand, it is
not possible to fry both sides of the same pancake at the same time, so the two sides have to be fried in sequence.
It is also possible to organize parallelized stages into pipelines. This would mean employing four chefs:
- the first two chefs prepare half-cooked pancakes from batter, in parallel, then put those on a large enough flat surface.
- the second two chefs take these and fry their other side in their own pans, then they put the pancakes on a shared plate.
This is again straightforward to implement with the streams API:
Flow<ScoopOfBatter, HalfCookedPancake, BoxedUnit> pancakeChefs1 =
  Flow.factory().create(b -> {
    final UniformFanInShape<HalfCookedPancake, HalfCookedPancake> mergeHalfCooked =
      b.graph(Merge.create(2));
    final UniformFanOutShape<ScoopOfBatter, ScoopOfBatter> dispatchBatter =
      b.graph(Balance.create(2));

    // Two chefs work with one frying pan for each, half-frying the pancakes then putting
    // them into a common pool
    b.from(dispatchBatter.out(0)).via(fryingPan1).to(mergeHalfCooked.in(0));
    b.from(dispatchBatter.out(1)).via(fryingPan1).to(mergeHalfCooked.in(1));

    return new Pair<>(dispatchBatter.in(), mergeHalfCooked.out());
  });
This usage pattern is less common but might be useful if a certain step in the pipeline can take wildly different
amounts of time to finish different jobs. The reason is that there are more balance-merge steps in this pattern compared to
the parallel pipelines. This pattern rebalances after each step, while the previous pattern only balances at the entry
point of the pipeline. This only matters, however, if the processing time distribution has a large deviation.
The same strategy can be applied to sources as well. In the next example we have a source that produces an
infinite stream of elements. Such a source can be tested by asserting that the first arbitrary number of elements holds
some condition. Here the grouped combinator and Sink.head are very useful, as the sketch below shows.
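A minimal sketch of such a test; the infinite source of doubled ones is an assumed stand-in for any infinite source:
// an infinite source built from an infinite Java 8 stream of 1s, doubled to 2s
final Source<Integer, BoxedUnit> sourceUnderTest =
  Source.from((Iterable<Integer>) () -> Stream.iterate(1, i -> i).iterator())
    .map(i -> i * 2);

final Future<List<Integer>> future =
  sourceUnderTest.grouped(10).runWith(Sink.<List<Integer>>head(), mat);
final List<Integer> result = Await.result(future, Duration.create(1, TimeUnit.SECONDS));
// the first ten elements are all 2
assertEquals(Collections.nCopies(10, 2), result);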
When testing a flow we need to attach a source and a sink. As both stream ends are under our control, we can
choose sources that test various edge cases of the flow and sinks that ease assertions.
final Flow<Integer, Integer, BoxedUnit> flowUnderTest = Flow.of(Integer.class)
.takeWhile(i -> i < 5);
final Future<Integer> future = Source.from(Arrays.asList(1, 2, 3, 4, 5, 6))
.via(flowUnderTest).runWith(Sink.fold(0, (agg, next) -> agg + next), mat);
final Integer result = Await.result(future, Duration.create(1, TimeUnit.SECONDS));
assert(result == 10);
1.13.2 TestKit
Akka Streams offers integration with Actors out of the box. This support can be used for writing stream tests that
use the familiar TestProbe from the akka-testkit API.
One of the more straightforward tests would be to materialize the stream to a Future and then use the pipe pattern to
send the result of that future to the probe.
final Source<List<Integer>, BoxedUnit> sourceUnderTest = Source
.from(Arrays.asList(1, 2, 3, 4))
.grouped(2);
final TestProbe probe = new TestProbe(system);
final Future<List<List<Integer>>> future = sourceUnderTest
.grouped(2)
.runWith(Sink.head(), mat);
akka.pattern.Patterns.pipe(future, system.dispatcher()).to(probe.ref());
probe.expectMsg(Duration.create(1, TimeUnit.SECONDS),
Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4))
);
Instead of materializing to a future, we can use Sink.actorRef, which sends all incoming elements to the
given ActorRef. Now we can use assertion methods on TestProbe and expect elements one by one as
they arrive. We can also assert stream completion by expecting the onCompleteMessage which was given to
Sink.actorRef.
final Source<Tick, Cancellable> sourceUnderTest = Source.from(
FiniteDuration.create(0, TimeUnit.MILLISECONDS),
FiniteDuration.create(200, TimeUnit.MILLISECONDS),
Tick.TOCK);
final TestProbe probe = new TestProbe(system);
final Cancellable cancellable = sourceUnderTest
.to(Sink.actorRef(probe.ref(), Tick.COMPLETED)).run(mat);
probe.expectMsg(Duration.create(1, TimeUnit.SECONDS), Tick.TOCK);
probe.expectNoMsg(Duration.create(100, TimeUnit.MILLISECONDS));
probe.expectMsg(Duration.create(1, TimeUnit.SECONDS), Tick.TOCK);
cancellable.cancel();
probe.expectMsg(Duration.create(1, TimeUnit.SECONDS), Tick.COMPLETED);
Similarly to Sink.actorRef, which provides control over received elements, we can use Source.actorRef
and have full control over elements to be sent.
final Sink<Integer, Future<String>> sinkUnderTest = Flow.of(Integer.class)
.map(i -> i.toString())
.toMat(Sink.fold("", (agg, next) -> agg + next), Keep.right());
final Pair<ActorRef, Future<String>> refAndFuture =
Source.<Integer>actorRef(8, OverflowStrategy.fail())
.toMat(sinkUnderTest, Keep.both())
.run(mat);
final ActorRef ref = refAndFuture.first();
final Future<String> future = refAndFuture.second();
ref.tell(1, ActorRef.noSender());
ref.tell(2, ActorRef.noSender());
ref.tell(3, ActorRef.noSender());
ref.tell(new akka.actor.Status.Success("done"), ActorRef.noSender());
final String result = Await.result(future, Duration.create(1, TimeUnit.SECONDS));
assertEquals(result, "123");
A source returned by TestSource.probe can be used for asserting demand, or for controlling when the stream is
completed or ended with an error.
final Sink<Integer, BoxedUnit> sinkUnderTest = Sink.cancelled();
TestSource.<Integer>probe(system)
.toMat(sinkUnderTest, Keep.left())
.run(mat)
.expectCancellation();
You can also inject exceptions and test sink behaviour on error conditions.
final Sink<Integer, Future<Integer>> sinkUnderTest = Sink.head();
final Pair<TestPublisher.Probe<Integer>, Future<Integer>> probeAndFuture =
  TestSource.<Integer>probe(system)
    .toMat(sinkUnderTest, Keep.both())
    .run(mat);
final TestPublisher.Probe<Integer> probe = probeAndFuture.first();
probe.sendError(new Exception("boom"));
// the materialized future now fails with the injected exception
Test source and sink can be used together in combination when testing flows.
final Flow<Integer, Integer, BoxedUnit> flowUnderTest = Flow.of(Integer.class)
.mapAsyncUnordered(2, sleep -> akka.pattern.Patterns.after(
Duration.create(10, TimeUnit.MILLISECONDS),
system.scheduler(),
system.dispatcher(),
Futures.successful(sleep)
));
final Pair<TestPublisher.Probe<Integer>, TestSubscriber.Probe<Integer>> pubAndSub =
TestSource.<Integer>probe(system)
.via(flowUnderTest)
.toMat(TestSink.<Integer>probe(system), Keep.both())
.run(mat);
final TestPublisher.Probe<Integer> pub = pubAndSub.first();
final TestSubscriber.Probe<Integer> sub = pubAndSub.second();
sub.request(3);
pub.sendNext(3);
pub.sendNext(2);
pub.sendNext(1);
sub.expectNextUnordered(1, 2, 3);
pub.sendError(new Exception("Power surge in the linear subroutine C-47!"));
final Throwable ex = sub.expectError();
assert(ex.getMessage().contains("C-47"));
filter
Emits when: the given predicate returns true for the element
Backpressures when: the given predicate returns true for the element and downstream backpressures
Completes when: upstream completes

collect
Emits when: the provided partial function is defined for the element
Backpressures when: the partial function is defined for the element and downstream backpressures
Completes when: upstream completes

mapConcat
Emits when: the mapping function returns an element, or there are still available elements from the previously calculated collection
Backpressures when: downstream backpressures or there are still available elements from the previously calculated collection
Completes when: upstream completes and all remaining elements have been emitted

grouped
Emits when: the specified number of elements has been accumulated or upstream completed
Backpressures when: a group has been assembled and downstream backpressures
Completes when: upstream completes

scan
Emits when: the function scanning the element returns a new element
Backpressures when: downstream backpressures
Completes when: upstream completes

fold
Emits when: upstream completes
Backpressures when: downstream backpressures
Completes when: upstream completes

drop
Emits when: the specified number of elements has been dropped already
Backpressures when: the specified number of elements has been dropped and downstream backpressures
Completes when: upstream completes

take
Emits when: the specified number of elements to take has not yet been reached
Backpressures when: downstream backpressures
Completes when: the defined number of elements has been taken or upstream completes

dropWhile
Emits when: the predicate returned false for an element, and for all following elements
Backpressures when: the predicate returned false and downstream backpressures
Completes when: upstream completes

recover
Emits when: the element is available from the upstream, or upstream failed and the provided partial function returns an element
Backpressures when: downstream backpressures
Completes when: upstream completes or upstream failed with an exception the partial function can handle
mapAsync
Emits when: the Future returned by the provided function finishes for the next element in sequence
Backpressures when: the number of futures reaches the configured parallelism and the downstream backpressures
Completes when: upstream completes and all futures have been completed and all elements have been emitted 2

mapAsyncUnordered
Emits when: any of the Futures returned by the provided function completes
Backpressures when: the number of futures reaches the configured parallelism and the downstream backpressures
Completes when: upstream completes and all futures have been completed and all elements have been emitted 2

2 If a Future fails, the stream also fails (unless a different supervision strategy is applied)
takeWithin
Emits when: an upstream element arrives
Backpressures when: downstream backpressures
Completes when: upstream completes or the timer fires

dropWithin
Emits when: an upstream element arrives after the timer fired
Backpressures when: downstream backpressures
Completes when: upstream completes

groupedWithin
Emits when: the configured time elapses since the last group was emitted, or the group is full
Backpressures when: the group has been assembled (the duration elapsed) and downstream backpressures
Completes when: upstream completes
conflate
Emits when: downstream stops backpressuring and there is a conflated element available
Backpressures when: never
Completes when: upstream completes

expand
Emits when: downstream stops backpressuring
Backpressures when: downstream backpressures
Completes when: upstream completes

buffer (Backpressure)
Emits when: downstream stops backpressuring and there is a pending element in the buffer
Backpressures when: buffer is full
Completes when: upstream completes and buffered elements have been drained

buffer (DropX)
Emits when: downstream stops backpressuring and there is a pending element in the buffer
Backpressures when: never
Completes when: upstream completes and buffered elements have been drained

buffer (Fail)
Emits when: downstream stops backpressuring and there is a pending element in the buffer
Backpressures when: never, fails the stream instead when the buffer is full
Completes when: upstream completes and buffered elements have been drained
prefixAndTail
Backpressures when: downstream backpressures or substream backpressures
Completes when: prefix elements have been consumed and the substream has been consumed

groupBy
Completes when: upstream completes 4

splitWhen
Completes when: upstream completes 4

4 Until the end of stream it is not possible to know whether new substreams will be needed or not
merge
Emits when: one of the inputs has an element available
Backpressures when: downstream backpressures
Completes when: all upstreams complete

mergePreferred
Emits when: one of the inputs has an element available, preferring a defined input if multiple have elements available
Backpressures when: downstream backpressures
Completes when: all upstreams complete

zip
Emits when: all of the inputs have an element available
Backpressures when: downstream backpressures
Completes when: any upstream completes

zipWith
Emits when: all of the inputs have an element available
Backpressures when: downstream backpressures
Completes when: any upstream completes

concat
Emits when: the current stream has an element available; if the current input completes, it tries the next one
Backpressures when: downstream backpressures
Completes when: all upstreams complete
unzip
Emits when: all of the outputs stop backpressuring and there is an input element available
Backpressures when: any of the outputs backpressures
Completes when: upstream completes

unzipWith
Emits when: all of the outputs stop backpressuring and there is an input element available
Backpressures when: any of the outputs backpressures
Completes when: upstream completes

broadcast
Emits when: all of the outputs stop backpressuring and there is an input element available
Backpressures when: any of the outputs backpressures
Completes when: upstream completes

balance
Emits when: any of the outputs stops backpressuring; the element is emitted to the first available output
Backpressures when: all of the outputs backpressure
Completes when: upstream completes
Another approach to logging is to use the log() operation, which allows configuring logging for elements flowing
through the stream as well as for completion and failure.
// customise log levels
mySource.log("before-map")
.withAttributes(Attributes.createLogLevels(onElement, onFinish, onFailure))
.map(i -> analyse(i));
// or provide custom logging adapter
final LoggingAdapter adapter = Logging.getLogger(system, "customLogger");
mySource.log("custom", adapter);
}
}
@Override
public TerminationDirective onUpstreamFinish(Context<ByteString> ctx) {
// If the stream is finished, we need to emit the last element in the onPull block.
// It is not allowed to directly emit elements from a termination block
// (onUpstreamFinish or onUpstreamFailure)
return ctx.absorbTermination();
}
};
}
final Source<ByteString, BoxedUnit> digest = data
.transform(() -> digestCalculator("SHA-256"));
Implementing reduce-by-key
Situation: Given a stream of elements, we want to calculate some aggregated value on different subgroups of the
elements.
The hello world of reduce-by-key style operations is wordcount, which we demonstrate below. Given a stream
of words we first create a new stream wordStreams that groups the words according to the i -> i function,
i.e. now we have a stream of streams, where every substream will serve identical words.
To count the words, we need to process the stream of streams (the actual groups containing identical words). By
mapping over the groups and using fold (remember that fold automatically materializes and runs the stream it
is used on) we get a stream with elements of Future<Pair<String, Integer>>. Now all we need is to flatten this stream,
which can be achieved by calling mapAsync with the i -> i identity function.
There is one tricky issue to be noted here. The careful reader has probably noticed that we put a buffer between the
mapAsync() operation that flattens the stream of futures and the actual stream of futures. The reason for this is
that the substreams produced by groupBy() can only complete when the original upstream source completes.
This means that mapAsync() cannot pull for more substreams because it still waits on folding futures to finish,
but these futures never finish if the additional group streams are not consumed. This typical deadlock situation
is resolved by this buffer, which is either able to contain all the group streams (ensuring that they are already
running and folding) or fails with an explicit error instead of a silent deadlock.
final int MAXIMUM_DISTINCT_WORDS = 1000;
// split the words into separate streams first
final Source<Pair<String, Source<String, BoxedUnit>>, BoxedUnit> wordStreams = words
.groupBy(i -> i);
// add counting logic to the streams
Source<Future<Pair<String, Integer>>, BoxedUnit> countedWords = wordStreams.map(pair -> {
final String word = pair.first();
final Source<String, BoxedUnit> wordStream = pair.second();
return wordStream.runFold(
new Pair<>(word, 0),
(acc, w) -> new Pair<>(word, acc.second() + 1), mat);
});
// get a stream of word counts
final Source<Pair<String, Integer>, BoxedUnit> counts = countedWords
.buffer(MAXIMUM_DISTINCT_WORDS, OverflowStrategy.fail())
.mapAsync(4, i -> i);
Note: Please note that the reduce-by-key version we discussed above is sequential, in other words it is NOT a
parallelization pattern like mapReduce and similar frameworks.
First, using a function topicMapper that gives the list of topics (groups) a message belongs to, we transform
our stream of Message to a stream of (Message, Topic) pairs, where a separate pair is emitted for each topic
the message belongs to. This is achieved by using mapConcat.
Then we take this new stream of message-topic pairs (containing a separate pair for each topic a given
message belongs to) and feed it into groupBy, using the topic as the group key.
final Function<Message, List<Topic>> topicMapper = m -> extractTopics(m);
final Source<Pair<Message, Topic>, BoxedUnit> messageAndTopic = elems
.mapConcat((Message msg) -> {
List<Topic> topicsForMessage = topicMapper.apply(msg);
// Create a (Msg, Topic) pair for each of the topics
// the message belongs to
return topicsForMessage
.stream()
.map(topic -> new Pair<Message, Topic>(msg, topic))
.collect(toList());
});
Source<Pair<Topic, Source<Message, BoxedUnit>>, BoxedUnit> multiGroups = messageAndTopic
.groupBy(pair -> pair.second())
.map(pair -> {
Topic topic = pair.first();
Source<Pair<Message, Topic>, BoxedUnit> topicStream = pair.second();
// chopping off the topic from the (Message, Topic) pairs
return new Pair<Topic, Source<Message, BoxedUnit>>(
topic,
topicStream.<Message> map(p -> p.first()));
});
Alternatively, instead of using a Zip and then using map to get the first element of the pairs, we can avoid creating
the pairs in the first place by using ZipWith, which takes a two-argument function to produce the output element.
If this function returned a pair of the two arguments, the behavior would be exactly that of Zip, so ZipWith is a
generalization of zipping.
final RunnableGraph<Pair<TestPublisher.Probe<Trigger>, TestSubscriber.Probe<Message>>> g =
FlowGraph.factory().closed(triggerSource, messageSink,
(p, s) -> new Pair<TestPublisher.Probe<Trigger>, TestSubscriber.Probe<Message>>(p, s),
(builder, source, sink) -> {
final FanInShape2<Message, Trigger, Message> zipWith =
builder.graph(ZipWith.create((msg, trigger) -> msg));
builder.from(elements).to(zipWith.in0());
builder.from(source).to(zipWith.in1());
builder.from(zipWith.out()).to(sink);
});
This can be solved by using the most versatile rate-transforming operation, conflate. Conflate can be thought
of as a special fold operation that collapses multiple upstream elements into one aggregate element if needed, keeping
the speed of the upstream unaffected by the downstream.
When the upstream is faster, the fold process of the conflate starts. This folding needs a zero element, which
is given by a seed function that takes the current element and produces a zero for the folding process. In our case
this is i -> i, so our folding state starts from the message itself. The fold function is also special: given the
aggregate value (the last message) and the new element (the freshest element), our aggregate state becomes simply
the freshest element. This choice of functions results in a simple dropping operation.
final Flow<Message, Message, BoxedUnit> droppyStream =
Flow.of(Message.class).conflate(i -> i, (lastMessage, newMessage) -> newMessage);
Dropping broadcast
Situation: The default Broadcast graph element is properly backpressured, but that means that a slow
downstream consumer can hold back the other downstream consumers, resulting in lowered throughput. In other words,
the rate of Broadcast is the rate of its slowest downstream consumer. In certain cases it is desirable to allow
faster consumers to progress independently of their slower siblings by dropping elements if necessary.
One solution to this problem is to append a buffer element in front of all of the downstream consumers, defining
a dropping strategy instead of the default Backpressure. This allows small temporary rate differences between
the different consumers (the buffer smooths out small rate variances), but also allows faster consumers to progress
by dropping from the buffer of the slow consumers if necessary.
// Makes a sink drop elements if too slow
public <T> Sink<T, Future<BoxedUnit>> droppySink(Sink<T, Future<BoxedUnit>> sink, int size) {
return Flow.<T> create()
.buffer(size, OverflowStrategy.dropHead())
.toMat(sink, Keep.right());
}
FlowGraph.factory().closed(builder -> {
final int outputCount = 3;
final UniformFanOutShape<Integer, Integer> bcast =
builder.graph(Broadcast.create(outputCount));
builder.from(builder.source(myData)).to(bcast);
builder.from(bcast).to(builder.sink(droppySink(mySink1, 10)));
builder.from(bcast).to(builder.sink(droppySink(mySink2, 10)));
builder.from(bcast).to(builder.sink(droppySink(mySink3, 10)));
});
While it is relatively simple, the drawback of the first version is that it needs an arbitrary initial element, which is
not always possible to provide. Hence, we create a second version where the downstream might need to wait in
one single case: if the very first element is not yet available.
We introduce a boolean variable waitingFirstValue to denote whether the first element has been provided
or not (alternatively an Optional can be used for currentValue, or, if the element type is a subclass of
Object, a null can be used for the same purpose). In the downstream onPull() handler the difference from the
previous version is that we call holdDownstream() if the first element is not yet available, thus blocking
our downstream. The upstream onPush() handler sets waitingFirstValue to false and, after checking whether
holdDownstream() has been called, it either releases the upstream producer, or both the upstream producer
and the downstream consumer by calling pushAndPull().
class HoldWithWait<T> extends DetachedStage<T, T> {
private T currentValue = null;
private boolean waitingFirstValue = true;
@Override
public UpstreamDirective onPush(T elem, DetachedContext<T> ctx) {
currentValue = elem;
waitingFirstValue = false;
if (ctx.isHoldingDownstream()) {
return ctx.pushAndPull(currentValue);
} else {
return ctx.pull();
}
}
@Override
public DownstreamDirective onPull(DetachedContext<T> ctx) {
if (waitingFirstValue) {
return ctx.holdDownstream();
} else {
return ctx.push(currentValue);
}
}
}
this.tokenRefreshAmount = tokenRefreshAmount;
this.permitTokens = maxAvailableTokens;
this.replenishTimer = system.scheduler().schedule(
this.tokenRefreshPeriod,
this.tokenRefreshPeriod,
self(),
REPLENISH_TOKENS,
context().system().dispatcher(),
self());
receive(open());
}
PartialFunction<Object, BoxedUnit> open() {
return ReceiveBuilder
.match(ReplenishTokens.class, rt -> {
permitTokens = Math.min(permitTokens + tokenRefreshAmount, maxAvailableTokens);
})
.match(WantToPass.class, wtp -> {
permitTokens -= 1;
sender().tell(MAY_PASS, self());
if (permitTokens == 0) {
context().become(closed());
}
}).build();
}
PartialFunction<Object, BoxedUnit> closed() {
return ReceiveBuilder
.match(ReplenishTokens.class, rt -> {
permitTokens = Math.min(permitTokens + tokenRefreshAmount, maxAvailableTokens);
releaseWaiting();
})
.match(WantToPass.class, wtp -> {
waitQueue.add(sender());
})
.build();
}
private void releaseWaiting() {
final List<ActorRef> toBeReleased = new ArrayList<>(permitTokens);
// always release from the head of the queue so that no waiting sender is skipped
for (int i = 0; i < permitTokens && !waitQueue.isEmpty(); i++) {
  toBeReleased.add(waitQueue.remove(0));
}
permitTokens -= toBeReleased.size();
toBeReleased.stream().forEach(ref -> ref.tell(MAY_PASS, self()));
if (permitTokens > 0) {
context().become(open());
}
}
@Override
public void postStop() {
replenishTimer.cancel();
waitQueue.stream().forEach(ref -> {
ref.tell(new Status.Failure(new IllegalStateException("limiter stopped")), self());
});
}
}
To create a Flow that uses this global limiter actor, we use the mapAsync function in combination with the
ask pattern. We also define a timeout, so if a reply is not received during the configured maximum wait period
the returned future from ask will fail, which will fail the corresponding stream as well.
public <T> Flow<T, T, BoxedUnit> limitGlobal(ActorRef limiter, FiniteDuration maxAllowedWait) {
final int parallelism = 4;
final Flow<T, T, BoxedUnit> f = Flow.create();
return f.mapAsync(parallelism, element -> {
final Timeout triggerTimeout = new Timeout(maxAllowedWait);
final Future<Object> limiterTriggerFuture =
Patterns.ask(limiter, Limiter.WANT_TO_PASS, triggerTimeout);
return limiterTriggerFuture.map(new Mapper<Object, T>() {
@Override
public T apply(Object parameter) {
return element;
}
}, system.dispatcher());
});
}
Note: The global actor used for limiting introduces a global bottleneck. You might want to assign a dedicated
dispatcher for this actor.
if (buffer.isEmpty()) {
return ctx.pull();
} else {
Tuple2<ByteString, ByteString> split = buffer.splitAt(chunkSize);
ByteString emit = split._1();
buffer = split._2();
return ctx.push(emit);
}
}
}
Source<ByteString, BoxedUnit> chunksStream =
rawBytes.transform(() -> new Chunker(CHUNK_LIMIT));
All this recipe needs is the MergePreferred element, which is a version of merge that is not fair: whenever
multiple upstream producers have elements available, it will always choose the preferred upstream, effectively
giving it absolute priority.
Flow<Tick, ByteString, BoxedUnit> tickToKeepAlivePacket =
Flow.of(Tick.class).conflate(tick -> keepAliveMessage, (msg, newTick) -> msg);
final Tuple3<
TestPublisher.Probe<Tick>,
TestPublisher.Probe<ByteString>,
TestSubscriber.Probe<ByteString>
> ticksDataRes =
FlowGraph.factory().closed3(ticks, data, sink,
(t, d, s) -> new Tuple3<>(t, d, s),
(builder, t, d, s) -> {
final int secondaryPorts = 1;
final MergePreferredShape<ByteString> unfairMerge =
builder.graph(MergePreferred.create(secondaryPorts));
// If data is available then no keepalive is injected
builder.from(d).to(unfairMerge.preferred());
builder.from(t).via(tickToKeepAlivePacket).to(unfairMerge.in(0));
builder.from(unfairMerge.out()).to(s);
}
).run(mat);
1.16 Configuration
#####################################
# Akka Stream Reference Config File #
#####################################
akka {
stream {
# Default flow materializer settings
materializer {
# Initial size of buffers used in stream elements
initial-input-buffer-size = 4
# Maximum size of buffers used in stream elements
max-input-buffer-size = 16
# Fully qualified config path which holds the dispatcher configuration
# to be used by FlowMaterialiser when creating Actors.
# When this value is left empty, the default-dispatcher will be used.
dispatcher = ""
# Cleanup leaked publishers and subscribers when they are not used within a given
# deadline
subscription-timeout {
# when the subscription timeout is reached one of the following strategies
# will be applied on the "stale" publisher:
# cancel - cancel it (via `onError` or subscribing to the publisher and
#          `cancel()`ing the subscription right away
# warn   - log a warning statement about the stale element (then drop the
#          reference to it)
# noop   - do nothing (not recommended)
mode = cancel
# time after which a subscriber / publisher is considered stale and eligible
# for cancelation (see `akka.stream.subscription-timeout.mode`)
timeout = 5s
}
# Enable additional troubleshooting logging at DEBUG log level
debug-logging = off
# Maximum number of elements emitted in batch if downstream signals large demand
output-burst-limit = 1000
}
# Fully qualified config path which holds the dispatcher configuration
# to be used by FlowMaterialiser when creating Actors for IO operations,
# such as FileSource, FileSink and others.
file-io-dispatcher = "akka.stream.default-file-io-dispatcher"
default-file-io-dispatcher {
type = "Dispatcher"
executor = "thread-pool-executor"
throughput = 1
thread-pool-executor {
core-pool-size-min = 2
core-pool-size-factor = 2.0
core-pool-size-max = 16
}
}
}
}
CHAPTER
TWO
AKKA HTTP
The Akka HTTP modules implement a full server- and client-side HTTP stack on top of akka-actor and
akka-stream. It's not a web-framework but rather a more general toolkit for providing and consuming HTTP-based
services. While interaction with a browser is of course also in scope, it is not the primary focus of Akka HTTP.
Akka HTTP follows a rather open design and many times offers several different API levels for doing the same
thing. You get to pick the API level of abstraction that is most suitable for your application. This means that, if
you have trouble achieving something using a high-level API, there's a good chance that you can get it done with
a low-level API, which offers more flexibility but might require you to write more application code.
Akka HTTP is structured into several modules:
akka-http-core A complete, mostly low-level, server- and client-side implementation of HTTP (incl. WebSockets). Includes a model of all things HTTP.
akka-http Higher-level functionality, like (un)marshalling, (de)compression as well as a powerful DSL for defining HTTP-based APIs on the server-side
akka-http-testkit A test harness and set of utilities for verifying server-side service implementations
akka-http-jackson Predefined glue-code for (de)serializing custom types from/to JSON with Jackson
2.1 Configuration
Just like any other Akka module Akka HTTP is configured via Typesafe Config. Usually this means that you
provide an application.conf which contains all the application-specific settings that differ from the default
ones provided by the reference configuration files from the individual Akka modules.
These are the relevant default configuration values for the Akka HTTP modules.
2.1.1 akka-http-core
########################################
# akka-http-core Reference Config File #
########################################
# This is the reference config file that contains all the default settings.
# Make your edits/overrides in your application.conf.
akka.http {
server {
# The default value of the `Server` header to produce if no
# explicit `Server`-header was included in a response.
# If this value is the empty string and no header was included in
# the request, no `Server` header will be rendered at all.
server-header = akka-http/${akka.version}
socket-options {
so-receive-buffer-size = undefined
so-send-buffer-size = undefined
so-reuse-address = undefined
so-traffic-class = undefined
tcp-keep-alive = undefined
tcp-oob-inline = undefined
tcp-no-delay = undefined
}
# Modify to tweak parsing settings on the server-side only.
parsing = ${akka.http.parsing}
}
client {
# The default value of the `User-Agent` header to produce if no
# explicit `User-Agent`-header was included in a request.
# If this value is the empty string and no header was included in
# the request, no `User-Agent` header will be rendered at all.
user-agent-header = akka-http/${akka.version}
# The time period within which the TCP connecting process must be completed.
connecting-timeout = 10s
# The time after which an idle connection will be automatically closed.
# Set to `infinite` to completely disable idle timeouts.
idle-timeout = 60 s
# The initial size of the buffer to render the request headers in.
# Can be used for fine-tuning request rendering performance but probably
# doesn't have to be fiddled with in most applications.
request-header-size-hint = 512
# The maximum number of open requests accepted into the pool across all
# materializations of any of its client flows.
# Protects against (accidentally) overloading a single pool with too many client flow materializations.
# Note that with N concurrent materializations the max number of open requests in the pool
# will never exceed N * max-connections * pipelining-limit.
# Must be a power of 2 and > 0!
max-open-requests = 32
# The maximum number of requests that are dispatched to the target host in
# batch-mode across a single connection (HTTP pipelining).
# A setting of 1 disables HTTP pipelining, since only one request per
# connection can be "in flight" at any time.
# Set to higher values to enable HTTP pipelining.
# This value must be > 0.
# (Note that, independently of this setting, pipelining will never be done
# on a connection that still has a non-idempotent request in flight.
# See http://tools.ietf.org/html/rfc7230#section-6.3.2 for more info.)
pipelining-limit = 1
# The time after which an idle connection pool (without pending requests)
# will automatically terminate itself. Set to `infinite` to completely disable idle timeouts.
idle-timeout = 30 s
# Modify to tweak client settings for host connection pools only.
client = ${akka.http.client}
}
# The (default) configuration of the HTTP message parser for the server and the client.
# IMPORTANT: These settings (i.e. children of `akka.http.parsing`) can't be directly
# overridden in `application.conf` to change the parser settings for client and server
# at the same time. Instead, override the concrete settings beneath
# `akka.http.server.parsing` and `akka.http.client.parsing`
# where these settings are copied to.
parsing {
# The limits for the various parts of the HTTP message parser.
max-uri-length             = 2k
max-method-length          = 16
max-response-reason-length = 64
max-header-name-length     = 64
max-header-value-length    = 8k
max-header-count           = 64
max-content-length         = 8m
max-chunk-ext-length       = 256
max-chunk-size             = 1m
# Sets the strictness mode for parsing request target URIs.
# The following values are defined:
# `strict`, `relaxed` and `relaxed-with-raw-query`
uri-parsing-mode = strict
2.1.2 akka-http
#######################################
# akka-http Reference Config File #
#######################################
# This is the reference config file that contains all the default settings.
# Make your edits/overrides in your application.conf.
akka.http.routing {
# Enables/disables the returning of more detailed error messages to the
# client in the error response
# Should be disabled for browser-facing APIs due to the risk of XSS attacks
verbose-error-messages = off
# The maximum size between two requested ranges. Ranges with less space in between will be coalesced.
#
# When multiple ranges are requested, a server may coalesce any of the ranges that overlap or that are separated
# by a gap that is smaller than the overhead of sending multiple parts, regardless of the order in which the
# corresponding byte-range-spec appeared in the received Range header field. Since the typical overhead between
# parts of a multipart/byteranges payload is around 80 bytes, depending on the selected representation's
# media type and the chosen boundary parameter length, it can be less efficient to transfer many small
# disjoint parts than it is to transfer the entire selected representation.
range-coalescing-threshold = 80
# The maximum number of allowed ranges per request.
# Requests with more ranges will be rejected due to DOS suspicion.
range-count-limit = 16
# The maximum number of bytes per ByteString a decoding directive will produce
# for an entity data stream.
decode-max-bytes-per-chunk = 1m
# Fully qualified config path which holds the dispatcher configuration
# to be used by FlowMaterialiser when creating Actors for IO operations.
file-io-dispatcher = ${akka.stream.file-io-dispatcher}
}
The other Akka HTTP modules do not offer any configuration via Typesafe Config.
2.2 HTTP Model

2.2.1 Overview
Since akka-http-core provides the central HTTP data structures you will find the following import in quite a few
places around the code base (and probably your own code as well):
import akka.http.javadsl.model.*;
import akka.http.javadsl.model.headers.*;
For example:
Defined HttpMethod instances are available as static fields of the HttpMethods class.
Defined HttpCharset instances are available as static fields of the HttpCharsets class.
Defined HttpEncoding instances are available as static fields of the HttpEncodings class.
Defined HttpProtocol instances are available as static fields of the HttpProtocols class.
Defined MediaType instances are available as static fields of the MediaTypes class.
Defined StatusCode instances are available as static fields of the StatusCodes class.
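For instance, a few of these predefined instances in use (a trivial sketch):
HttpMethod get = HttpMethods.GET;
MediaType json = MediaTypes.APPLICATION_JSON;
StatusCode ok = StatusCodes.OK;
HttpProtocol http11 = HttpProtocols.HTTP_1_1;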
2.2.2 HttpRequest
HttpRequest and HttpResponse are the basic immutable classes representing HTTP messages.
An HttpRequest consists of
a method (GET, POST, etc.)
a URI
a seq of headers
an entity (body data)
a protocol
Here are some examples of how to construct an HttpRequest:
// construct a simple GET request to `homeUri`
Uri homeUri = Uri.create("/home");
HttpRequest request1 = HttpRequest.create().withUri(homeUri);
// construct simple GET request to "/index" using helper methods
HttpRequest request2 = HttpRequest.GET("/index");
// construct simple POST request containing entity
ByteString data = ByteString.fromString("abc");
HttpRequest postRequest1 = HttpRequest.POST("/receive").withEntity(data);
// customize every detail of the HTTP request
Authorization authorization = Authorization.basic("user", "pass");
HttpRequest complexRequest =
HttpRequest.PUT("/user")
.withEntity(HttpEntities.create(MediaTypes.TEXT_PLAIN.toContentType(), "abc"))
.addHeader(authorization)
.withProtocol(HttpProtocols.HTTP_1_0);
In its basic form, HttpRequest.create creates an empty default GET request without headers, which can then
be transformed using one of the withX methods, addHeader, or addHeaders. Each of those will create a new
immutable instance, so instances can be shared freely. There exist some overloads for HttpRequest.create
that simplify creating requests for common cases. Also, to aid readability, there are predefined alternatives for
create named after HTTP methods that create a request with a given method and URI directly.
2.2.3 HttpResponse
An HttpResponse consists of
a status code
a list of headers
an entity (body data)
a protocol
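Constructing responses mirrors the request examples above; a brief sketch (the exact withX calls are assumed to mirror those of HttpRequest):
// simple OK response without data
HttpResponse ok = HttpResponse.create();

// a response with body text (entity created from a String)
HttpResponse okWithBody = HttpResponse.create().withEntity("Hello!");

// a 404 response with an explanation
HttpResponse notFound = HttpResponse.create()
  .withStatus(StatusCodes.NOT_FOUND)
  .withEntity("Unknown resource!");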
In addition to the simple HttpEntities.create methods, which create an entity from a fixed String or
ByteString as shown here, the Akka HTTP model defines a number of subclasses of HttpEntity which
allow body data to be specified as a stream of bytes. All of these types can be created using the methods on
HttpEntities.
2.2.4 HttpEntity
An HttpEntity carries the data bytes of a message together with its Content-Type and, if known, its
Content-Length. In Akka HTTP there are five different kinds of entities which model the various ways that message
content can be received or sent:

HttpEntityStrict The simplest entity, which is used when all the entity data is already available in memory. It wraps
a plain ByteString and represents a standard, unchunked entity with a known Content-Length.

HttpEntityDefault The general, unchunked HTTP/1.1 message entity. It has a known length and presents its
data as a Source[ByteString] which can be only materialized once. It is an error if the provided
source doesn't produce exactly as many bytes as specified. The distinction between HttpEntityStrict and
HttpEntityDefault is an API-only one. On the wire, both kinds of entities look the same.

HttpEntityChunked The model for HTTP/1.1 chunked content (i.e. sent with Transfer-Encoding: chunked).
The content length is unknown and the individual chunks are presented as a Source[ChunkStreamPart].
A ChunkStreamPart is either a non-empty chunk or the empty last chunk containing optional trailer headers.
The stream consists of zero or more non-empty chunk parts and can be terminated by an optional last chunk.

HttpEntityCloseDelimited An unchunked entity of unknown length that is implicitly delimited by closing the
connection (Connection: close). Content data is presented as a Source[ByteString]. Since
the connection must be closed after sending an entity of this type, it can only be used on the server-side for
sending a response. Also, the main purpose of CloseDelimited entities is compatibility with HTTP/1.0
peers, which do not support chunked transfer encoding. If you are building a new application and are
not constrained by legacy requirements you shouldn't rely on CloseDelimited entities, since implicit
terminate-by-connection-close is not a robust way of signaling response end, especially in the presence of
proxies. Additionally, this type of entity prevents connection reuse, which can seriously degrade performance.
Use HttpEntityChunked instead!

HttpEntityIndefiniteLength A streaming entity of unspecified length for use in a Multipart.BodyPart.
Entity types HttpEntityStrict, HttpEntityDefault, and HttpEntityChunked are subtypes of
RequestEntity, which allows them to be used for requests and responses. In contrast,
HttpEntityCloseDelimited can only be used for responses.
Streaming entity types (i.e. all but HttpEntityStrict) cannot be shared or serialized. To create a strict,
sharable copy of an entity or message, use HttpEntity.toStrict or HttpMessage.toStrict, which
return a Future of the object with the body data collected into a ByteString.
The class HttpEntities contains static methods to create entities from common types easily.
You can use the isX methods of HttpEntity to find out which subclass an entity is, if you want to
provide special handling for each of the subtypes. However, in many cases a recipient of an HttpEntity doesn't
care which subtype an entity is (and how data is transported exactly on the HTTP layer). Therefore, the
general method HttpEntity.getDataBytes() is provided, which returns a Source<ByteString, ?>
that allows access to the data of an entity regardless of its concrete subtype.
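A minimal sketch of consuming an entity uniformly via getDataBytes (assuming a Materializer mat is in scope):
// collect all data bytes of the entity into one ByteString, regardless of its subtype
final Source<ByteString, ?> dataBytes = entity.getDataBytes();
final Future<ByteString> allBytes =
  dataBytes.runFold(ByteString.empty(), (acc, bytes) -> acc.concat(bytes), mat);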
Note: When to use which subtype?
Use HttpEntityStrict if the amount of data is small and already available in memory (e.g. as a String or ByteString)
Use HttpEntityDefault if the data is generated by a streaming data source and the size of the data is known
Use HttpEntityChunked for an entity of unknown length
Use HttpEntityCloseDelimited for a response as a legacy alternative to HttpEntityChunked if the client doesn't support chunked transfer encoding. Otherwise use HttpEntityChunked!
In a Multipart.BodyPart use HttpEntityIndefiniteLength for content of unknown length.
Caution: When you receive a non-strict message from a connection, additional data is only read from
the network when you request it by consuming the entity data stream. This means that, if you don't consume
the entity stream, the connection will effectively be stalled. In particular, no subsequent message (request
or response) will be read from the connection, as the entity of the current message blocks the stream. Therefore
you must make sure that you always consume the entity data, even if you are not actually
interested in it!
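A brief sketch of draining an entity you are not interested in, so that the connection is not stalled (assuming a Materializer mat is in scope):
// consume and discard all data bytes of the entity
entity.getDataBytes().runWith(Sink.ignore(), mat);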
Arguments to the Http().bind method specify the interface and port to bind to and register interest in handling
incoming HTTP connections. Additionally, the method also allows for the definition of socket options as well as
a larger number of settings for configuring the server according to your needs.
The result of the bind method is a Source<Http.IncomingConnection> which must be drained by the
application in order to accept incoming connections. The actual binding is not performed before this source is
materialized as part of a processing pipeline. In case the bind fails (e.g. because the port is already busy) the
materialized stream will immediately be terminated with a respective exception. The binding is released (i.e. the
underlying socket unbound) when the subscriber of the incoming connection source has cancelled its subscription.
Alternatively one can use the unbind() method of the Http.ServerBinding instance that is created as part
of the connection source's materialization process. The Http.ServerBinding also provides a way to get a
hold of the actual local address of the bound socket, which is useful for example when binding to port zero (and
thus letting the OS pick an available port).
@Override
public HttpResponse apply(HttpRequest request) throws Exception {
  Uri uri = request.getUri();
  if (request.method() == HttpMethods.GET) {
    if (uri.path().equals("/"))
      return
        HttpResponse.create()
          .withEntity(MediaTypes.TEXT_HTML.toContentType(),
            "<html><body>Hello world!</body></html>");
    else if (uri.path().equals("/hello")) {
      String name = Util.getOrElse(uri.parameter("name"), "Mister X");
      return
        HttpResponse.create()
          .withEntity("Hello " + name + "!");
    }
    else if (uri.path().equals("/ping"))
      return HttpResponse.create().withEntity("PONG!");
    else
      return NOT_FOUND;
  }
  else return NOT_FOUND;
}
connection.handleWithSyncHandler(requestHandler, materializer);
// this is equivalent to
// connection.handleWith(Flow.of(HttpRequest.class).map(requestHandler), materializer);
}
})).run(materializer);
In this example, a request is handled by transforming the request stream with a function
Function<HttpRequest, HttpResponse> using handleWithSyncHandler (or, equivalently, the
Akka Streams map operator). Depending on the use case, many other ways of providing a request handler are
conceivable using Akka Streams combinators.
If the application provides a Flow, it is also the responsibility of the application to generate exactly one response
for every request, and to ensure that the ordering of responses matches the ordering of the associated requests (which is
relevant if HTTP pipelining is enabled, where processing of multiple incoming requests may overlap). When
relying on handleWithSyncHandler or handleWithAsyncHandler, or the map or mapAsync stream
operators, this requirement will be automatically fulfilled.
See Routing DSL Overview for a more convenient high-level DSL to create request handlers.
Streaming Request/Response Entities
Streaming of HTTP message entities is supported through subclasses of HttpEntity. The application needs
to be able to deal with streamed entities when receiving a request as well as, in many cases, when constructing
responses. See HttpEntity for a description of the alternatives.
Closing a connection
The HTTP connection will be closed when the handling Flow cancels its upstream subscription or the peer closes
the connection. An oftentimes more convenient alternative is to explicitly add a Connection: close header
to an HttpResponse. This response will then be the last one on the connection and the server will actively close
the connection when it has been sent out.
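A small sketch of such a response (assuming the javadsl Connection header class):
// the server will actively close the connection after sending this response
HttpResponse lastResponse = HttpResponse.create()
  .withEntity("Goodbye!")
  .addHeader(Connection.create("close"));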
HttpsContext.create(SSLContext sslContext,
  Option<Collection<String>> enabledCipherSuites,
  Option<Collection<String>> enabledProtocols,
  Option<ClientAuth> clientAuth,
  Option<SSLParameters> sslParameters)
On the server-side the bind and bindAndHandleXXX methods of the akka.http.javadsl.Http extension
define an optional httpsContext parameter, which can receive the HTTPS configuration in the form of an
HttpsContext instance. If defined, encryption is enabled on all accepted connections. Otherwise it is disabled
(which is the default).
2.4.1 Model
The basic unit of data exchange in the WebSocket protocol is a message. A message can either be a binary message,
i.e. a sequence of octets, or a text message, i.e. a sequence of Unicode code points.
In the data model the two kinds of messages, binary and text messages, are represented by the two classes
BinaryMessage and TextMessage deriving from a common superclass Message. The superclass
Message contains isText and isBinary methods to distinguish a message and asBinaryMessage and
asTextMessage methods to cast a message.
The subclasses BinaryMessage and TextMessage contain methods to access the data. Take the API of
TextMessage as an example (BinaryMessage is very similar with String replaced by ByteString):
abstract class TextMessage extends Message {
/**
* Returns a source of the text message data.
*/
def getStreamedText: Source[String, _]
/** Is this message a strict one? */
def isStrict: Boolean
/**
* Returns the strict message text if this message is strict, throws otherwise.
*/
def getStrictText: String
}
The data of a message is provided as a stream because WebSocket messages do not have a predefined size and
could (in theory) be infinitely long. However, only one message can be open per direction of the WebSocket
connection, so that many application-level protocols will want to make use of the delineation into (small) messages
to transport single application-level data units like one event or one chat message.
Many messages are small enough to be sent or received in one go. As an opportunity for optimization, the
model provides the notion of a strict message to represent cases where a whole message was received in one
go. If TextMessage.isStrict returns true, the complete data is already available and can be accessed with
TextMessage.getStrictText (analogously for BinaryMessage).
When receiving data from the network connection the WebSocket implementation tries to create a strict message
whenever possible, i.e. when the complete data was received in one chunk. However, the actual chunking of
messages over a network connection and through the various streaming abstraction layers is not deterministic
from the perspective of the application. Therefore, application code must be able to handle both streaming and
strict messages and must not expect certain messages to be strict. (In particular, note that tests against localhost will
behave differently than tests against remote peers where data is received over a physical network connection.)
For sending data, you can use the static TextMessage.create(String) method to create a strict message
if the complete message has already been assembled. Otherwise, use TextMessage.create(Source<String, ?>)
to create a streaming message from an Akka Stream source.
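A minimal sketch of handling an incoming message that may be either strict or streamed (assuming a Materializer mat is in scope):
// handle both the strict and the streamed case of an incoming text message
if (message.isText()) {
  final TextMessage textMessage = message.asTextMessage();
  if (textMessage.isStrict()) {
    System.out.println("Received: " + textMessage.getStrictText());
  } else {
    textMessage.getStreamedText()
      .runForeach(part -> System.out.println("Received part: " + part), mat);
  }
}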
The handling code itself will be the same as with using the low-level API.
See the full routing example.
@Override
public HttpResponse apply(HttpRequest request) throws Exception {
Uri uri = request.getUri();
if (request.method() == HttpMethods.GET) {
if (uri.path().equals("/"))
return
HttpResponse.create()
.withEntity(MediaTypes.TEXT_HTML.toContentType(),
"<html><body>Hello world!</body></html>");
else if (uri.path().equals("/hello")) {
String name = Util.getOrElse(uri.parameter("name"), "Mister X");
return
HttpResponse.create()
.withEntity("Hello " + name + "!");
}
else if (uri.path().equals("/ping"))
return HttpResponse.create().withEntity("PONG!");
else
return NOT_FOUND;
}
else return NOT_FOUND;
}
};
While it'd be perfectly possible to define a complete REST API service purely by inspecting the incoming
HttpRequest, this approach becomes somewhat unwieldy for larger services due to the amount of syntax
ceremony required. Also, it doesn't help in keeping your service definition as DRY as you might like.
As an alternative, Akka HTTP provides a flexible DSL for expressing your service behavior as a structure of
composable elements (called Directives) in a concise and readable way. Directives are assembled into a so-called
route structure which, at its top level, can be used to create a handler Flow (or, alternatively, an async handler
function) that can be directly supplied to a bind call.
Here's the complete example rewritten using the composable high-level API:
import akka.actor.ActorSystem;
import akka.http.javadsl.model.MediaTypes;
import akka.http.javadsl.server.*;
import akka.http.javadsl.server.values.Parameters;

import java.io.IOException;

public class HighLevelServerExample extends HttpApp {
The heart of the high-level architecture is the route tree. It is a big expression of type Route that is evaluated only
once during startup of your service. It completely describes how your service should react to any request.
The type Route is the basic building block of the route tree. It defines if and how a request should be handled.
Routes are composed to form the route tree in the following two ways.
A route can be wrapped by a Directive which adds some behavioral aspect to its wrapped inner route.
path("ping") is such a directive that implements a path filter, i.e. it only passes control to its inner route
when the unmatched path matches "ping". Directives can be more versatile than this: a directive can also transform the request before passing it into its inner route or transform a response that comes out of its inner route. It's
a general and powerful abstraction that allows packaging any kind of HTTP processing into well-defined blocks
that can be freely combined. akka-http defines a library of predefined directives and routes for all the various
aspects of dealing with HTTP requests and responses.
Read more about Directives.
The other way of composition is defining a list of Route alternatives. Alternative routes are tried one after the
other until one route accepts the request and provides a response. Otherwise, a route can also reject a request,
in which case further alternatives are explored. Alternatives are specified by passing a list of routes either to
Directive.route() as in pathSingleSlash().route() or to directives that directly take a variable
number of inner routes as argument like get() here.
Read more about Routes.
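To make the two composition styles concrete, here is a small sketch using only the constructs mentioned above
(a lambda standing in for a Handler, as introduced in the Handlers section below):
// Nesting: get() wraps its inner routes; each path(...) filters on the
// unmatched path. Alternatives: the two path routes are tried in order
// until one accepts the request.
Route route =
    get(
        path("ping").route(
            handleWith(ctx -> ctx.complete("PONG!"))
        ),
        path("pong").route(
            handleWith(ctx -> ctx.complete("PING!"))
        )
    );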
Another important building block is a RequestVal<T>. It represents a value that can be extracted from a
request (like the URI parameter Parameters.stringValue("name") in the example) and which is then
interpreted as a value of type T. Examples of HTTP aspects represented by a RequestVal are URI parameters,
HTTP form fields, details of the request like headers, URI, the entity, or authentication data.
Read more about Request values.
The actual application-defined processing of a request is defined with a Handler instance or by specifying a
handling method via reflection. A handler can receive the values of any number of request values and is converted
into a Route by using one of the BasicDirectives.handleWith directives.
Read more about Handlers.
Requests or responses often contain data that needs to be interpreted or rendered in some way. Akka-http provides
the abstraction of Marshaller and Unmarshaller that define how domain model objects map to HTTP
entities.
Read more about Marshalling & Unmarshalling.
akka-http contains a testkit that simplifies testing routes. It allows running test requests against (sub-)routes
quickly, without going over the network, and helps with writing assertions on HTTP response properties.
Read more about Route Testkit.
2.5.2 Routes
A Route itself is a function that operates on a RequestContext and returns a RouteResult. The
RequestContext is a data structure that contains the current request and auxiliary data like the so far unmatched path of the request URI that gets passed through the route structure. It also contains the current
ExecutionContext and akka.stream.Materializer, so that these don't have to be passed around
manually.
RequestContext
The RequestContext achieves two goals: it allows access to request data and it is a factory for creating a
RouteResult. A user-defined handler (see Handlers) that is usually used at the leaf position of the route tree
receives a RequestContext, evaluates its content and then returns a result generated by one of the methods of
the context.
RouteResult
The RouteResult is an opaque structure that represents possible results of evaluating a route. A
RouteResult can only be created by using one of the methods of the RequestContext. A result can
either be a response, if it was generated by one of the completeX methods; an eventual result, i.e. a
Future<RouteResult>, if completeWith was used; or a rejection that contains information about why the
route could not handle the request.
Composing Routes
Routes are composed to form the route tree in two principal ways.
A route can be wrapped by a Directive which adds some behavioral aspect to its wrapped inner route. Such
an aspect can be:
• filtering requests to decide which requests will get to the inner route
• transforming the request before passing it to the inner route
• transforming the response (or more generally the route result) received from the inner route
• applying side-effects around inner route processing, such as measuring the time taken to run the inner route
akka-http defines a library of predefined Directives and routes for all the various aspects of dealing with HTTP
requests and responses.
The other way of composition is defining a list of Route alternatives. Alternative routes are tried one after the
other until one route accepts the request and provides a response. Otherwise, a route can also reject a request,
in which case further alternatives are explored. Alternatives are specified by passing a list of routes either to
Directive.route() as in path("xyz").route() or to directives that directly take a variable number
of inner routes as argument like get().
The Routing Tree
Essentially, when you combine routes via nesting and alternatives, you build a routing structure that forms a tree.
When a request comes in it is injected into this tree at the root and flows down through all the branches in a
depth-first manner until either some node completes it or it is fully rejected. In the example below, for instance,
a request that passes directive a but is rejected by b (including routes 1 to 3) would next be tried against e.
Consider this schematic example:
val route =
  a.route(
    b.route(
      c.route(
        ... // route 1
      ),
      d.route(
        ... // route 2
      ),
      ... // route 3
    ),
    e.route(
      ... // route 4
    )
  )
2.5.3 Directives
A directive is a wrapper for a route or a list of alternative routes that adds one or more of the following kinds of
functionality to its nested route(s):
• it filters the request and lets only matching requests pass (e.g. the get directive lets only GET requests pass)
• it modifies the request or the RequestContext (e.g. the path directive filters on the unmatched path
and then passes on a RequestContext with an updated unmatched path)
• it modifies the response coming out of the nested route
akka-http provides a set of predefined directives for various tasks. You can access them by either extending from
akka.http.javadsl.server.AllDirectives or by importing them statically with import static
akka.http.javadsl.server.Directives.*;.
These classes of directives are currently defined:
BasicDirectives Contains methods to create routes that complete with static values or allow specifying Handlers
to process a request.
CacheConditionDirectives Contains a single directive conditional that wraps its inner route with support
for Conditional Requests as defined by RFC 7232.
CodingDirectives Contains directives to decode compressed requests and encode responses.
CookieDirectives Contains a single directive setCookie to aid adding a cookie to a response.
ExecutionDirectives Contains directives to deal with exceptions that occurred during routing.
FileAndResourceDirectives Contains directives to serve resources from files on the file system or from the classpath.
HostDirectives Contains directives to filter on the Host header of the incoming request.
MethodDirectives Contains directives to filter on the HTTP method of the incoming request.
MiscDirectives Contains directives that validate a request by user-defined logic.
PathDirectives Contains directives to match and filter on the URI path of the incoming request.
RangeDirectives Contains a single directive withRangeSupport that adds support for retrieving partial responses.
SchemeDirectives Contains a single directive scheme to filter requests based on the URI scheme (http vs. https).
WebsocketDirectives Contains directives to support answering Websocket requests.
PathDirectives
Path directives are the most basic building blocks for routing requests depending on the URI path.
When a request (or rather the respective RequestContext instance) enters the route structure it has an unmatched path that is identical to the request.uri.path. As it descends the routing tree and passes through
one or more pathPrefix or path directives the unmatched path progressively gets eaten into from the left
until, in most cases, it eventually has been consumed completely.
The two main directives are path and pathPrefix. The path directive tries to match the complete remaining
unmatched path against the specified path matchers, whereas the pathPrefix directive only matches a prefix and
passes the remaining unmatched path to nested directives. Both directives automatically match a slash from the
beginning, so that matching slashes in a hierarchy of nested pathPrefix and path directives is usually not
needed.
Path directives take a variable number of arguments. Each argument must be a PathMatcher or a string (which
is automatically converted to a path matcher using PathMatchers.segment). In the case of path and
pathPrefix, if multiple arguments are supplied, a slash is assumed between any of the supplied path matchers.
The rawPathX variants of those directives, on the other hand, do no such preprocessing, so that slashes must be
matched manually.
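A brief sketch of how nested pathPrefix and path directives consume the unmatched path (again using a
lambda as a Handler):
Route route =
    pathPrefix("calculator").route(
        path("ping").route(                  // matches "/calculator/ping"
            handleWith(ctx -> ctx.complete("PONG!"))
        ),
        pathPrefix("nested").route(
            path("ping").route(              // matches "/calculator/nested/ping"
                handleWith(ctx -> ctx.complete("nested PONG!"))
            )
        )
    );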
Path Matchers
2.5.5 Handlers
Handlers implement the actual application-defined logic for a certain trace in the routing tree. Most of the leaves
of the routing tree will be routes created from handlers. Creating a Route from a handler is achieved using the
BasicDirectives.handleWith overloads. They come in several forms:
• with a single Handler argument and a variable number of RequestVal<?> arguments (may be 0)
• with a number n of RequestVal<T1> arguments and a HandlerN<T1, ..., TN> argument
• with a Class<?> and/or instance, a method name String argument, and a variable number of
RequestVal<?> arguments (may be 0)
Simple Handler
In its simplest form a Handler is a SAM class that defines application behavior by inspecting the
RequestContext and returning a RouteResult:
trait Handler extends akka.japi.function.Function[RequestContext, RouteResult] {
  override def apply(ctx: RequestContext): RouteResult
}
Such a handler inspects the RequestContext it receives and uses the RequestContext's methods to create
a response:
Handler handler = new Handler() {
    @Override
    public RouteResult apply(RequestContext ctx) {
        return ctx.complete("This was a " + ctx.request().method().value() +
            " request to " + ctx.request().getUri());
    }
};
The handler can include any kind of logic but must, in the end, return a RouteResult, which can only be created
by using one of the RequestContext methods.
A handler instance can be used once or several times as shown in the full example:
class TestHandler extends akka.http.javadsl.server.AllDirectives {
    Handler handler = new Handler() {
        @Override
        public RouteResult apply(RequestContext ctx) {
            return ctx.complete("This was a " + ctx.request().method().value() +
                " request to " + ctx.request().getUri());
        }
    };

    Route createRoute() {
        return route(
            get(
                handleWith(handler)
            ),
            post(
                path("abc").route(
                    handleWith(handler)
                )
            )
        );
    }
}
// actual testing code
TestRoute r = testRoute(new TestHandler().createRoute());
r.run(HttpRequest.GET("/test"))
.assertStatusCode(200)
.assertEntity("This was a GET request to /test");
r.run(HttpRequest.POST("/test"))
.assertStatusCode(404);
r.run(HttpRequest.POST("/abc"))
.assertStatusCode(200)
.assertEntity("This was a POST request to /abc");
The handler here implements multiplication of two integers. However, it doesn't need to specify where these
parameters come from. In handleWith, as many request values of matching type must be supplied as the
handler needs. This can be seen in the full example:
class TestHandler extends akka.http.javadsl.server.AllDirectives {
    RequestVal<Integer> xParam = Parameters.intValue("x");
    RequestVal<Integer> yParam = Parameters.intValue("y");

    RequestVal<Integer> xSegment = PathMatchers.intValue();
    RequestVal<Integer> ySegment = PathMatchers.intValue();

    final Handler2<Integer, Integer> multiply =
        new Handler2<Integer, Integer>() {
            @Override
            public RouteResult apply(RequestContext ctx, Integer x, Integer y) {
                int result = x * y;
                return ctx.complete("x * y = " + result);
            }
        };

    final Route multiplyXAndYParam = handleWith2(xParam, yParam, multiply);

    Route createRoute() {
        return route(
            get(
                pathPrefix("calculator").route(
                    path("multiply").route(
                        multiplyXAndYParam
                    ),
                    path("path-multiply", xSegment, ySegment).route(
                        handleWith2(xSegment, ySegment, multiply)
                    )
                )
            )
        );
    }
}
// actual testing code
TestRoute r = testRoute(new TestHandler().createRoute());
r.run(HttpRequest.GET("/calculator/multiply?x=12&y=42"))
.assertStatusCode(200)
.assertEntity("x * y = 504");
r.run(HttpRequest.GET("/calculator/path-multiply/23/5"))
.assertStatusCode(200)
.assertEntity("x * y = 115");
Here, the handler is again being reused: first to create a route that expects URI parameters x and y, which is then
used in the route structure; and second with another set of RequestVals, this time representing segments from
the URI path.
Handlers in Java 8
Handlers are in fact simply classes which extend akka.japi.function.FunctionN in order to make
reasoning about the number of handled arguments easier. For example, a Handler1<String> is simply
a Function2<RequestContext, String, RouteResult>. You can think of handlers as hotdogs,
where each T type represents a sausage, put between the buns which are RequestContext and
RouteResult.
In Java 8 handlers can be provided as function literals or method references. The example from before then looks
like this:
class TestHandler extends akka.http.javadsl.server.AllDirectives {
    final RequestVal<Integer> xParam = Parameters.intValue("x");
    final RequestVal<Integer> yParam = Parameters.intValue("y");

    final Handler2<Integer, Integer> multiply =
        (ctx, x, y) -> ctx.complete("x * y = " + (x * y));

    final Route multiplyXAndYParam = handleWith2(xParam, yParam, multiply);

    RouteResult subtract(RequestContext ctx, int x, int y) {
        return ctx.complete("x - y = " + (x - y));
    }

    Route createRoute() {
        return route(
            get(
                pathPrefix("calculator").route(
                    path("multiply").route(
                        // use Handler explicitly
                        multiplyXAndYParam
                    ),
                    path("add").route(
                        // create Handler as lambda expression
                        handleWith2(xParam, yParam,
                            (ctx, x, y) -> ctx.complete("x + y = " + (x + y)))
                    ),
                    path("subtract").route(
                        // create handler by lifting method
                        handleWith2(xParam, yParam, this::subtract)
                    )
                )
            )
        );
    }
}
// actual testing code
TestRoute r = testRoute(new TestHandler().createRoute());
r.run(HttpRequest.GET("/calculator/multiply?x=12&y=42"))
.assertStatusCode(200)
.assertEntity("x * y = 504");
r.run(HttpRequest.GET("/calculator/add?x=12&y=42"))
.assertStatusCode(200)
.assertEntity("x + y = 54");
r.run(HttpRequest.GET("/calculator/subtract?x=42&y=12"))
.assertStatusCode(200)
.assertEntity("x - y = 30");
Note: The reason the handleWith## methods include the number of handled values in their name is that, if
plain overloading were used for all 22 methods, the error messages generated by javac would end up being very
long and hard to read: if the type of a handler did not match the given values, all possible candidates (22 of them)
would be printed in the error message, instead of just the one arity-matching method pointing out that the type
does not match.
We opted for better error messages as we feel this is more helpful when developing applications,
instead of having one overloaded method which looks nice when everything works but produces
hard-to-read error messages if something does not match up.
There are alternative overloads for handleReflectively that take a Class instead of an object instance to
refer to static methods. The referenced method must be publicly accessible.
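A hedged sketch of what such a reflective registration could look like, assuming an overload of the form
handleReflectively(Class<?>, String, RequestVal...) consistent with the forms listed above
(the exact overload set may differ):
class Calculator {
    // must be publicly accessible to be referenced reflectively
    public static RouteResult multiply(RequestContext ctx, Integer x, Integer y) {
        return ctx.complete("x * y = " + (x * y));
    }
}

RequestVal<Integer> xParam = Parameters.intValue("x");
RequestVal<Integer> yParam = Parameters.intValue("y");
Route route = handleReflectively(Calculator.class, "multiply", xParam, yParam);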
Here the calculator runs the actual calculation in the background and only eventually returns the result. The
HTTP service should provide a front-end to that service without having to block while waiting for the results. As
explained above, this can be done in two ways.
First, you can use handleWithAsyncN to be able to return a Future<RouteResult>:
// would probably be injected or passed at construction time in real code
CalculatorService calculatorService = new CalculatorService();

public Future<RouteResult> multiplyAsync(final RequestContext ctx, int x, int y) {
    Future<Integer> result = calculatorService.multiply(x, y, ctx.executionContext());
    Mapper<Integer, RouteResult> func = new Mapper<Integer, RouteResult>() {
        @Override
        public RouteResult apply(Integer product) {
            return ctx.complete("x * y = " + product);
        }
    }; // cannot be written as lambda, unfortunately
    return result.map(func, ctx.executionContext());
}

Route multiplyAsyncRoute =
    path("multiply").route(
        handleWithAsync2(xParam, yParam, this::multiplyAsync)
    );
The handler invokes the service and then maps the calculation result to a RouteResult using Future.map
and returns the resulting Future<RouteResult>.
Alternatively, you can still use handleWithN and use RequestContext.completeWith to convert a
Future<RouteResult> into a RouteResult as shown here:
public RouteResult addAsync(final RequestContext ctx, int x, int y) {
    Future<Integer> result = calculatorService.add(x, y, ctx.executionContext());
    Mapper<Integer, RouteResult> func = new Mapper<Integer, RouteResult>() {
        @Override
        public RouteResult apply(Integer sum) {
            return ctx.complete("x + y = " + sum);
        }
    }; // cannot be written as lambda, unfortunately
    return ctx.completeWith(result.map(func, ctx.executionContext()));
}
Route addAsyncRoute =
    path("add").route(
        handleWith2(xParam, yParam, this::addAsync)
    );
Using this style, you can decide in your handler whether you want to return a direct synchronous result or whether
you need to defer completion.
Neither alternative blocks, and both show the same runtime behavior.
Here's the complete example:
class CalculatorService {
    public Future<Integer> multiply(final int x, final int y, ExecutionContext ec) {
        return akka.dispatch.Futures.future(() -> x * y, ec);
    }

    public Future<Integer> add(final int x, final int y, ExecutionContext ec) {
        return akka.dispatch.Futures.future(() -> x + y, ec);
    }
}

class TestHandler extends akka.http.javadsl.server.AllDirectives {
    RequestVal<Integer> xParam = Parameters.intValue("x");
    RequestVal<Integer> yParam = Parameters.intValue("y");

    // would probably be injected or passed at construction time in real code
    CalculatorService calculatorService = new CalculatorService();

    public Future<RouteResult> multiplyAsync(final RequestContext ctx, int x, int y) {
        Future<Integer> result = calculatorService.multiply(x, y, ctx.executionContext());
        Mapper<Integer, RouteResult> func = new Mapper<Integer, RouteResult>() {
            @Override
            public RouteResult apply(Integer product) {
                return ctx.complete("x * y = " + product);
            }
        }; // cannot be written as lambda, unfortunately
        return result.map(func, ctx.executionContext());
    }

    Route multiplyAsyncRoute =
        path("multiply").route(
            handleWithAsync2(xParam, yParam, this::multiplyAsync)
        );

    public RouteResult addAsync(final RequestContext ctx, int x, int y) {
        Future<Integer> result = calculatorService.add(x, y, ctx.executionContext());
        Mapper<Integer, RouteResult> func = new Mapper<Integer, RouteResult>() {
            @Override
            public RouteResult apply(Integer sum) {
                return ctx.complete("x + y = " + sum);
            }
        }; // cannot be written as lambda, unfortunately
        return ctx.completeWith(result.map(func, ctx.executionContext()));
    }

    Route addAsyncRoute =
        path("add").route(
            handleWith2(xParam, yParam, this::addAsync)
        );

    Route createRoute() {
        return route(
            get(
                pathPrefix("calculator").route(
                    multiplyAsyncRoute,
                    addAsyncRoute
                )
            )
        );
    }
}
// testing code
TestRoute r = testRoute(new TestHandler().createRoute());
r.run(HttpRequest.GET("/calculator/multiply?x=12&y=42"))
.assertStatusCode(200)
.assertEntity("x * y = 504");
r.run(HttpRequest.GET("/calculator/add?x=23&y=5"))
.assertStatusCode(200)
.assertEntity("x + y = 28");
The app extends HttpApp, which brings all of the directives into scope. The createRoute method needs to
be implemented to return the complete route of the app.
Here's how you would test that service:
import akka.http.javadsl.model.HttpRequest;
import akka.http.javadsl.model.StatusCodes;
import akka.http.javadsl.testkit.JUnitRouteTest;
import akka.http.javadsl.testkit.TestRoute;
import org.junit.Test;
.assertStatusCode(StatusCodes.NOT_FOUND) // 404
.assertEntity("Request is missing required query parameter 'y'");
// test responses to potential errors
appRoute.run(HttpRequest.GET("/calculator/add?x=3.2&y=three"))
.assertStatusCode(StatusCodes.BAD_REQUEST)
.assertEntity("The query parameter 'y' was malformed:\n" +
"'three' is not a valid 64-bit floating point value");
}
}
It's, of course, possible to use any other means of writing assertions by inspecting the properties of the response
manually. As written above, TestResponse.entity and TestResponse.response return strict versions of
the entity data.
Supporting Custom Test Frameworks
Adding support for a custom test framework is achieved by creating a new superclass analogous to
JUnitRouteTest for writing tests with the custom test framework, deriving from
akka.http.javadsl.testkit.RouteTest and implementing its abstract methods. This will allow users of the test framework to use testRoute and to write assertions using the assertion methods defined
on TestResponse.
Apart from the host name and port the Http.get(system).outgoingConnection(...) method also
allows you to specify socket options and a number of configuration settings for the connection.
Note that no connection is attempted until the returned flow is actually materialized! If the flow is materialized
several times then several independent connections will be opened (one per materialization). If the connection attempt fails, for whatever reason, the materialized flow will be immediately terminated with a respective exception.
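A minimal sketch of this behavior, assuming the usual akka.http.javadsl and akka.stream.javadsl
imports and that outgoingConnection accepts the host name and port as indicated above (the materialized
value type of the connection flow is elided here, since only the request/response types matter for the example):
final ActorSystem system = ActorSystem.create();
final ActorMaterializer materializer = ActorMaterializer.create(system);

// Merely creating the flow does not connect; it is only a blueprint.
final Flow<HttpRequest, HttpResponse, ?> connectionFlow =
    Http.get(system).outgoingConnection("akka.io", 80);

// Only running (materializing) the flow attempts the TCP connection.
final Future<HttpResponse> responseFuture =
    Source.single(HttpRequest.create("/"))
        .via(connectionFlow)
        .runWith(Sink.<HttpResponse>head(), materializer);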
Request-Response Cycle
Once the connection flow has been materialized it is ready to consume HttpRequest instances from the source
it is attached to. Each request is sent across the connection and incoming responses are dispatched to the downstream
pipeline. Of course, and as always, back-pressure is adequately maintained across all parts of the connection. This
means that, if the downstream pipeline consuming the HTTP responses is slow, the request source will eventually
be slowed down in sending requests.
Any errors occurring on the underlying connection are surfaced as exceptions terminating the response stream
(and canceling the request source).
Note that, if the source produces subsequent requests before the prior responses have arrived, these requests will
be pipelined across the connection, which is something that is not supported by all HTTP servers. Also, if the
server closes the connection before responses to all requests have been received this will result in the response
stream being terminated with a truncation error.
Closing Connections
Akka HTTP actively closes an established connection upon reception of a response containing a Connection:
close header. The connection can also be closed by the server.
An application can actively trigger the closing of the connection by completing the request stream. In this case
the underlying TCP connection will be closed when the last pending response has been received.
Timeouts
Currently Akka HTTP doesn't implement client-side request timeout checking itself, as this functionality can be
regarded as a more general-purpose streaming infrastructure feature. However, akka-stream should soon provide
such a feature.
Stand-Alone HTTP Layer Usage
// TODO
This means it consumes pairs of type (HttpRequest, T) and produces pairs of type
(Try<HttpResponse>, T), which might appear more complicated than necessary at first sight. The
reason why the pool API includes objects of custom type T on both ends lies in the fact that the underlying
transport usually comprises more than a single connection, and as such the pool client flow often generates
responses in an order that doesn't directly match the consumed requests. We could have built the pool logic in
a way that reorders responses according to their requests before dispatching them to the application, but this
would have meant that a single slow response could block the delivery of potentially many responses that would
otherwise be ready for consumption by the application.
In order to prevent unnecessary head-of-line blocking the pool client-flow is allowed to dispatch responses as
soon as they arrive, independently of the request order. Of course this means that there needs to be another way
to associate a response with its respective request. The way that this is done is by allowing the application to pass
along a custom context object with the request, which is then passed back to the application with the respective
response. This context object of type T is completely opaque to Akka HTTP, i.e. you can pick whatever works
best for your particular application scenario.
Connection Allocation Logic
This is how Akka HTTP allocates incoming requests to the available connection slots:
1. If there is a connection alive and currently idle then schedule the request across this connection.
2. If no connection is idle and there is still an unconnected slot then establish a new connection.
3. If all connections are already established and loaded with other requests then pick the connection with
the least open requests (< the configured pipelining-limit) that only has requests with idempotent
methods scheduled to it, if there is one.
4. Otherwise apply back-pressure to the request source, i.e. stop accepting new requests.
For more information about scheduling more than one request at a time across a single connection see the
Wikipedia entry on HTTP pipelining.
Retrying a Request
If the max-retries pool config setting is greater than zero the pool retries idempotent requests for which a
response could not be successfully retrieved. Idempotent requests are those whose HTTP method is defined to be
idempotent by the HTTP spec, which are all the ones currently modelled by Akka HTTP except for the POST,
PATCH and CONNECT methods.
When a response could not be received for a certain request there are essentially three possible error scenarios:
1. The request got lost on the way to the server.
2. The server experiences a problem while processing the request.
3. The response from the server got lost on the way back.
Since the host connector cannot know which of these possible reasons caused the problem, PATCH and POST
requests could have already triggered a non-idempotent action on the server, and therefore these requests cannot
be retried.
In these cases, as well as when all retries have failed to yield a proper response, the pool produces a failed Try (i.e.
a scala.util.Failure) together with the custom request context.
Pool Shutdown
Completing a pool client flow will simply detach the flow from the pool. The connection pool itself will continue to run as it may be serving other client flows concurrently or in the future. Only after the configured
idle-timeout for the pool has expired will Akka HTTP automatically terminate the pool and free all its resources.
If a new client flow is requested with Http.get(system).cachedHostConnectionPool(...), or if
an already existing client flow is re-materialized, the respective pool is automatically and transparently restarted.
In addition to the automatic shutdown via the configured idle timeouts it's also possible to trigger the immediate
shutdown of a specific pool by calling shutdown() on the HostConnectionPool instance that the pool
client flow materializes into. This shutdown() call produces a Future[Unit] which is fulfilled when the
pool termination has been completed.
It's also possible to trigger the immediate termination of all connection pools in the ActorSystem at the
same time by calling Http.get(system).shutdownAllConnectionPools(). This call too produces
a Future[Unit] which is fulfilled when all pools have terminated.
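A minimal sketch of both shutdown variants, assuming the Scala Future[Unit] surfaces to Java as
scala.concurrent.Future<scala.runtime.BoxedUnit> and with pool being the
HostConnectionPool materialized by a pool client flow (as in the example below):
// shut down one specific pool
Future<BoxedUnit> poolTerminated = pool.shutdown();

// shut down all connection pools of the ActorSystem at once
Future<BoxedUnit> allTerminated =
    Http.get(system).shutdownAllConnectionPools();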
Example
final ActorSystem system = ActorSystem.create();
final ActorMaterializer materializer = ActorMaterializer.create(system);

// construct a pool client flow with context type `Integer`
// TODO these Tuple2 will be changed to akka.japi.Pair
final Flow<
        Tuple2<HttpRequest, Integer>,
        Tuple2<Try<HttpResponse>, Integer>,
        HostConnectionPool> poolClientFlow =
    Http.get(system).<Integer>cachedHostConnectionPool("akka.io", 80, materializer);

// run a single request (with context 42) through the pool client flow
final Future<Tuple2<Try<HttpResponse>, Integer>> responseFuture =
    Source
        .single(Pair.create(HttpRequest.create("/"), 42).toScala())
        .via(poolClientFlow)
        .runWith(Sink.<Tuple2<Try<HttpResponse>, Integer>>head(), materializer);
In addition to the outgoingConnection, newHostConnectionPool and cachedHostConnectionPool
methods, the akka.http.javadsl.Http extension also defines outgoingConnectionTls,
newHostConnectionPoolTls and cachedHostConnectionPoolTls. These methods work identically to their
counterparts without the -Tls suffix, with the exception that all connections will always be encrypted.
The singleRequest and superPool methods determine the encryption state via the scheme of the incoming
request, i.e. requests to an https URI will be encrypted, while requests to an http URI won't.
The encryption configuration for all HTTPS connections, i.e. the HttpsContext, is determined according to
the following logic:
1. If the optional httpsContext method parameter is defined it contains the configuration to be used (and
thus takes precedence over any potentially set default client-side HttpsContext).
2. If the optional httpsContext method parameter is undefined (which is the default) the default client-side
HttpsContext is used, which can be set via the setDefaultClientHttpsContext on the Http
extension.
3. If no default client-side HttpsContext has been set via the setDefaultClientHttpsContext on
the Http extension the default system configuration is used.
Usually the process is, if the default system TLS configuration is not good enough for your
application's needs, that you configure a custom HttpsContext instance and set it via
Http.get(system).setDefaultClientHttpsContext. Afterwards you simply use
outgoingConnectionTls, newHostConnectionPoolTls, cachedHostConnectionPoolTls,
superPool or singleRequest without a specific httpsContext argument, which causes encrypted
connections to rely on the configured default client-side HttpsContext.
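A hedged sketch of this process; createCustomHttpsContext() is a hypothetical placeholder for however
the application builds its HttpsContext (e.g. from a custom SSLContext), and the host/port signature of
outgoingConnectionTls is assumed to mirror outgoingConnection:
// hypothetical helper, not part of the API described here
HttpsContext customHttpsContext = createCustomHttpsContext();

// register it as the default for all client-side HTTPS connections
Http.get(system).setDefaultClientHttpsContext(customHttpsContext);

// subsequent -Tls calls without an explicit httpsContext argument
// now rely on the configured default
Http.get(system).outgoingConnectionTls("example.com", 443);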