Spring Cloud Data Flow Samples



Sabby Anandan
David Turanski
Glenn Renfro
Eric Bottard
Mark Pollack
Chris Schaefer
Christian Tzolov


Table of Contents
1. Overview
2. Java DSL
2.1. Deploying a stream programmatically
3. Streaming
3.1. HTTP to Cassandra Demo
3.1.1. Prerequisites
3.1.2. Using the Local Server
Additional Prerequisites
Building and Running the Demo
3.1.3. Using the Cloud Foundry Server
Additional Prerequisites
Running the Demo
3.1.4. Summary
3.2. HTTP to MySQL Demo
3.2.1. Prerequisites
3.2.2. Using the Local Server
Additional Prerequisites
Building and Running the Demo
3.2.3. Using the Cloud Foundry Server
Additional Prerequisites
Building and Running the Demo
3.2.4. Summary
3.3. HTTP to Gemfire Demo
3.3.1. Prerequisites
3.3.2. Using the Local Server
Additional Prerequisites
Building and Running the Demo
Using the Cloud Foundry Server
3.3.3. Summary
3.4. Gemfire CQ to Log Demo
3.4.1. Prerequisites
3.4.2. Using the Local Server
Additional Prerequisites
Building and Running the Demo


3.4.3. Using the Cloud Foundry Server


Additional Prerequisites
Building and Running the Demo
3.4.4. Summary
3.5. Gemfire to Log Demo
3.5.1. Prerequisites
3.5.2. Using the Local Server
Additional Prerequisites
Building and Running the Demo
3.5.3. Using the Cloud Foundry Server
Additional Prerequisites
Building and Running the Demo
3.5.4. Summary
3.6. Custom Spring Cloud Stream Processor
3.6.1. Prerequisites
3.6.2. Creating the Custom Stream App
3.6.3. Deploying the App to Spring Cloud Data Flow
3.6.4. Summary
4. Task / Batch
4.1. Batch Job on Cloud Foundry
4.1.1. Prerequisites
4.1.2. Building and Running the Demo
4.1.3. Summary
4.2. Batch File Ingest
4.2.1. Prerequisites
4.2.2. Batch File Ingest Demo Overview
4.2.3. Building and Running the Demo
4.2.4. Summary
5. Stream Launching Batch Job
5.1. Batch File Ingest - SFTP Demo
5.1.1. Prerequisites
5.1.2. Using the Local Server
Additional Prerequisites
Building and Running the Demo
5.1.3. Using the Cloud Foundry Server
Additional Prerequisites


Configuring the SCDF server


Running the Demo
5.1.4. Limiting Concurrent Task Executions
Configuring the SCDF server
Running the demo
5.1.5. Avoid Duplicate Processing
5.1.6. Summary
6. Analytics
6.1. Twitter Analytics
6.1.1. Prerequisites
6.1.2. Building and Running the Demo
6.1.3. Summary
7. Data Science
7.1. Species Prediction
7.1.1. Prerequisites
7.1.2. Building and Running the Demo
7.1.3. Summary
8. Functions
8.1. Functions in Spring Cloud Data Flow
8.1.1. Prerequisites
8.1.2. Building and Running the Demo
8.1.3. Summary
9. Micrometer
9.1. SCDF metrics with InfluxDB and Grafana
9.1.1. Prerequisites
9.1.2. Building and Running the Demo
9.1.3. Summary
9.2. SCDF metrics with Prometheus and Grafana
9.2.1. Prerequisites
9.2.2. Building and Running the Demo
9.2.3. Summary


Version 1.0.0.BUILD-SNAPSHOT

© 2012-2018 Pivotal Software, Inc.

Copies of this document may be made for your own use and for distribution to others, provided that
you do not charge any fee for such copies and further provided that each copy contains this
Copyright Notice, whether distributed in print or electronically.


1. Overview
This guide contains samples and demonstrations of how to build data pipelines with Spring Cloud
Data Flow (https://cloud.spring.io/spring-cloud-dataflow/).


2. Java DSL
2.1. Deploying a stream programmatically
This sample shows the two usage styles of the Java DSL to create and deploy a stream. You should
look in the source code
(https://github.com/spring-cloud/spring-cloud-dataflow-samples/tree/master/batch/javadsl/src/main) to get a
feel for the different styles.

1) Build the sample application

BASH
./mvnw clean package

With no command line options, the application will deploy the stream http --server.port=9900 | splitter --expression=payload.split(' ') | log using the URI localhost:9393 to connect to the Data Flow server. There is also a command line option --style whose value can be either definition or fluent . This option picks which Java DSL style will execute. Both are identical in terms of behavior. The spring-cloud-dataflow-rest-client project provides auto-configuration for DataFlowOperations and StreamBuilder .

The properties in DataFlowClientProperties
(https://github.com/spring-cloud/spring-cloud-dataflow/blob/master/spring-cloud-dataflow-rest-client/src/main/java/org/springframework/cloud/dataflow/rest/client/config/DataFlowClientProperties.java)
can be used to configure the connection to the Data Flow server. The common property to start
using is spring.cloud.dataflow.client.uri .
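
For example, to point the sample at a Data Flow server running elsewhere and select the fluent style, you could pass the client property and the --style option on the command line (a sketch; the hostname below is illustrative):

BASH
java -jar target/scdfdsl-0.0.1-SNAPSHOT.jar \
  --spring.cloud.dataflow.client.uri=http://my-dataflow-server:9393 \
  --style=fluent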

JAVA
@Autowired
private DataFlowOperations dataFlowOperations;

@Autowired
private StreamBuilder builder;

You can use those beans to build streams as well as work directly with the DataFlowOperations
REST client.
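
As a minimal sketch of using the REST client directly (assuming the streamOperations() API exposed by DataFlowOperations in spring-cloud-dataflow-rest-client), you could list the stream definitions known to the server:

JAVA
// List the streams currently defined on the Data Flow server
dataFlowOperations.streamOperations().list()
        .forEach(stream -> System.out.println(stream.getName()));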

The definition style has code of the style


JAVA
Stream woodchuck = builder
    .name("woodchuck")
    .definition("http --server.port=9900 | splitter --expression=payload.split(' ') | log")
    .create()
    .deploy(deploymentProperties);

while the fluent style has code of the style

JAVA
Stream woodchuck = builder.name("woodchuck")
    .source(source)
    .processor(processor)
    .sink(sink)
    .create()
    .deploy(deploymentProperties);

where the source , processor , and sink variables are defined as @Bean methods of type
StreamApplication

JAVA
@Bean
public StreamApplication source() {
    return new StreamApplication("http").addProperty("server.port", 9900);
}
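
The processor and sink beans follow the same pattern. A minimal sketch matching the definition used above (property values copied from the stream definition; not necessarily the exact code in the sample) might look like:

JAVA
@Bean
public StreamApplication processor() {
    // splitter app configured with the same SpEL expression as the definition style
    return new StreamApplication("splitter")
            .addProperty("expression", "payload.split(' ')");
}

@Bean
public StreamApplication sink() {
    // plain log sink; deployment-time settings are supplied separately
    return new StreamApplication("log");
}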

Another useful class is the DeploymentPropertiesBuilder which aids in the creation of the
Map of properties required to deploy stream applications.

JAVA
private Map<String, String> createDeploymentProperties() {
    DeploymentPropertiesBuilder propertiesBuilder = new DeploymentPropertiesBuilder();
    propertiesBuilder.memory("log", 512);
    propertiesBuilder.count("log", 2);
    propertiesBuilder.put("app.splitter.producer.partitionKeyExpression", "payload");
    return propertiesBuilder.build();
}
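
For reference, the builder produces an ordinary Map<String, String>. A rough sketch of the equivalent entries, assuming the usual deployer.<app>.<property> and app.<app>.<property> key conventions (the builder may normalize values slightly differently), is:

JAVA
Map<String, String> deploymentProperties = new HashMap<>();
deploymentProperties.put("deployer.log.memory", "512");   // memory("log", 512)
deploymentProperties.put("deployer.log.count", "2");      // count("log", 2)
deploymentProperties.put("app.splitter.producer.partitionKeyExpression", "payload");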

2) Run a local Data Flow Server and run the sample application. This sample demonstrates the
use of the local Data Flow Server, but you can pass in the option --uri to point to another Data
Flow server instance that is running elsewhere.

BASH
$ java -jar target/scdfdsl-0.0.1-SNAPSHOT.jar


You will then see the following output.

BASH
Deploying stream.
Waiting for deployment of stream.
Waiting for deployment of stream.
Waiting for deployment of stream.
Waiting for deployment of stream.
Waiting for deployment of stream.
Letting the stream run for 2 minutes.

To verify that the applications have been deployed successfully, we will tail the logs of one of the log
sinks and post some data to the http source. You can find the location of the logs for the log
sink applications by looking in the Data Flow server’s log file.

3) Post some data to the server

curl http://localhost:9900 -H "Content-Type:text/plain" -X POST -d "how much wood would a woodchuck chuck if a woodchuck could chuck wood"

4) Verify the output by tailing the log file of the first instance

BASH
cd /tmp/spring-cloud-dataflow-4323595028663837160/woodchuck-1511390696355/woodchuck.log
tail -f stdout_0.log

BASH
2017-11-22 18:04:08.631 INFO 26652 --- [r.woodchuck-0-1] log-sink : how
2017-11-22 18:04:08.632 INFO 26652 --- [r.woodchuck-0-1] log-sink : chuck
2017-11-22 18:04:08.634 INFO 26652 --- [r.woodchuck-0-1] log-sink : chuck

Tailing the log file of the second instance

BASH
cd /tmp/spring-cloud-dataflow-4323595028663837160/woodchuck-1511390696355/woodchuck.log
tail -f stdout_1.log

You should see the output


BASH
$ tail -f stdout_1.log
2017-11-22 18:04:08.636 INFO 26655 --- [r.woodchuck-1-1] log-sink : much
2017-11-22 18:04:08.638 INFO 26655 --- [r.woodchuck-1-1] log-sink : wood
2017-11-22 18:04:08.639 INFO 26655 --- [r.woodchuck-1-1] log-sink : would
2017-11-22 18:04:08.640 INFO 26655 --- [r.woodchuck-1-1] log-sink : a
2017-11-22 18:04:08.641 INFO 26655 --- [r.woodchuck-1-1] log-sink : woodchuck
2017-11-22 18:04:08.642 INFO 26655 --- [r.woodchuck-1-1] log-sink : if
2017-11-22 18:04:08.644 INFO 26655 --- [r.woodchuck-1-1] log-sink : a
2017-11-22 18:04:08.645 INFO 26655 --- [r.woodchuck-1-1] log-sink : woodchuck
2017-11-22 18:04:08.646 INFO 26655 --- [r.woodchuck-1-1] log-sink : could
2017-11-22 18:04:08.647 INFO 26655 --- [r.woodchuck-1-1] log-sink : wood

Note that the partitioning is done based on the hash of the java.lang.String object.
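
In other words, each word is routed to one of the two log instances based on its hash. A rough sketch of the default selection logic (illustrative only, not the exact framework code):

JAVA
int partitionCount = 2;                       // count("log", 2) in the deployment properties
String payload = "woodchuck";                 // the partition key is the message payload
int partition = Math.abs(payload.hashCode()) % partitionCount;
System.out.println(payload + " -> log instance " + partition);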


3. Streaming
3.1. HTTP to Cassandra Demo
In this demonstration, you will learn how to build a data pipeline using Spring Cloud Data Flow
(http://cloud.spring.io/spring-cloud-dataflow/) to consume data from an HTTP endpoint and write the
payload to a Cassandra database.

We will take you through the steps to configure and run a Spring Cloud Data Flow server in either a
local (https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started/) or
Cloud Foundry
(https://docs.spring.io/spring-cloud-dataflow-server-cloudfoundry/docs/current/reference/htmlsingle/#getting-
started)
environment.

3.1.1. Prerequisites
A Running Data Flow Shell

The Spring Cloud Data Flow Shell is available for download


(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-deploying-
spring-cloud-dataflow)
or you can build (https://github.com/spring-cloud/spring-cloud-dataflow) it yourself.

The Spring Cloud Data Flow Shell and Local server implementation are in the
same repository and are both built by running ./mvnw install from the project
root directory. If you have already run the build, use the jar in spring-cloud-
dataflow-shell/target

To run the Shell open a new terminal session:


$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar
____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>

The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the
Data Flow Server’s REST API and supports a DSL that simplifies the process of
defining a stream or task and managing its lifecycle. Most of these samples use
the shell. If you prefer, you can use the Data Flow UI at localhost:9393/dashboard
(or wherever the server is hosted) to perform equivalent operations.

3.1.2. Using the Local Server

Additional Prerequisites
A running local Data Flow Server

The Local Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow) it yourself. If you build it yourself, the
executable jar will be in spring-cloud-dataflow-server-local/target

To run the Local Data Flow server, open a new terminal session:

$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-LOCAL-JAR>
$ java -jar spring-cloud-dataflow-server-local-<VERSION>.jar

Running instance of Kafka (http://kafka.apache.org/downloads.html)

Running instance of Apache Cassandra (http://cassandra.apache.org/)


A database utility tool such as DBeaver (http://dbeaver.jkiss.org/) to connect to the Cassandra


instance. You might have to provide host , port , username and password depending on
the Cassandra configuration you are using.

Create a keyspace and a book table in Cassandra using:

CREATE KEYSPACE clouddata WITH REPLICATION = { 'class' : 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1' } AND DURABLE_WRITES = true;
USE clouddata;
CREATE TABLE book (
    id uuid PRIMARY KEY,
    isbn text,
    author text,
    title text
);
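
One way to run these statements, assuming cqlsh is on your path and Cassandra is listening on the default port (the file name is illustrative):

BASH
cqlsh localhost 9042 -f create-book.cql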

Building and Running the Demo


1. Register
(https://github.com/spring-cloud/spring-cloud-dataflow/blob/master/spring-cloud-dataflow-
docs/src/main/asciidoc/streams.adoc#register-a-stream-app)
the out-of-the-box applications for the Kafka binder


These samples assume that the Data Flow Server can access a remote Maven
repository, repo.spring.io/libs-release by default. If your Data Flow
server is running behind a firewall, or you are using a maven proxy
preventing access to public repositories, you will need to install the sample
apps in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-
started-maven-configuration)
the server accordingly. The sample applications are typically registered using
Data Flow’s bulk import facility. For example, the Shell command
dataflow:>app import --uri bit.ly/Celsius-SR1-stream-
applications-rabbit-maven (The actual URI is release and binder specific so
refer to the sample instructions for the actual URL). The bulk import URI
references a plain text file containing entries for all of the publicly available
Spring Cloud Stream and Task applications published to repo.spring.io .
For example,
source.http=maven://org.springframework.cloud.stream.app:http-
source-rabbit:1.3.1.RELEASE registers the http source app at the
corresponding Maven address, relative to the remote repository(ies)
configured for the Data Flow server. The format is maven://<groupId>:
<artifactId>:<version> You will need to download
(https://repo.spring.io/libs-release/org/springframework/cloud/stream/app/spring-cloud-
stream-app-descriptor/Bacon.RELEASE/spring-cloud-stream-app-descriptor-
Bacon.RELEASE.rabbit-apps-maven-repo-url.properties)
the required apps or build (https://github.com/spring-cloud-stream-app-starters)
them and then install them in your Maven repository, using whatever group,
artifact, and version you choose. If you do this, register individual apps using
dataflow:>app register… using the maven:// resource URI format
corresponding to your installed app.

dataflow:>app import --uri http://bit.ly/Celsius-SR1-stream-applications-kafka-10-maven

2. Create the stream


dataflow:>stream create cassandrastream --definition "http --server.port=8888 --spring.cloud.stream.bindings.output.contentType='application/json' | cassandra --ingestQuery='insert into book (id, isbn, title, author) values (uuid(), ?, ?, ?)' --keyspace=clouddata" --deploy

Created and deployed new stream 'cassandrastream'

If Cassandra isn’t running on the default port on localhost, or if you need a
username and password to connect, use the following options to specify the
necessary connection parameters: --username='<USERNAME>' --password='<PASSWORD>'
--port=<PORT> --contact-points=<LIST-OF-HOSTS>
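
For example, a stream definition with explicit connection parameters might look like the following (host and credentials are illustrative):

dataflow:>stream create cassandrastream --definition "http --server.port=8888 --spring.cloud.stream.bindings.output.contentType='application/json' | cassandra --ingestQuery='insert into book (id, isbn, title, author) values (uuid(), ?, ?, ?)' --keyspace=clouddata --username='cassandra' --password='secret' --port=9042 --contact-points=cassandra.example.com" --deploy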

3. Verify the stream is successfully deployed

dataflow:>stream list

4. Notice that cassandrastream-http and cassandrastream-cassandra Spring Cloud Stream


(https://github.com/spring-cloud-stream-app-starters//) applications are running as Spring Boot
applications within the server as a collocated process.
CONSOLE
2015-12-15 15:52:31.576 INFO 18337 --- [nio-9393-exec-1] o.s.c.d.a.s.l.OutOfProcessModul
Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-data-flo
2015-12-15 15:52:31.583 INFO 18337 --- [nio-9393-exec-1] o.s.c.d.a.s.l.OutOfProcessModul
Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-data-flo

5. Post sample data pointing to the http endpoint: localhost:8888 ( 8888 is the
server.port we specified for the http source in this case)

dataflow:>http post --contentType 'application/json' --data '{"isbn": "1599869772", "title": "The Art of War", "author": "Sun Tzu"}' --target http://localhost:8888
> POST (application/json;charset=UTF-8) http://localhost:8888 {"isbn": "1599869772", "title": "The Art of War", "author": "Sun Tzu"}
> 202 ACCEPTED
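
Equivalently, you can post the same payload from a plain terminal with curl (adjust the host and port if the http source is not listening on localhost:8888):

BASH
curl http://localhost:8888 -H "Content-Type: application/json" -X POST -d '{"isbn": "1599869772", "title": "The Art of War", "author": "Sun Tzu"}'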

6. Connect to the Cassandra instance and query the table clouddata.book to list the persisted
records

select * from clouddata.book;


7. You’re done!

3.1.3. Using the Cloud Foundry Server

Additional Prerequisites
Cloud Foundry instance

A rabbit service instance

A Running instance of cassandra in Cloud Foundry or from another Cloud provider

A database utility tool such as DBeaver (http://dbeaver.jkiss.org/) to connect to the Cassandra


instance. You might have to provide host , port , username and password depending on
the Cassandra configuration you are using.

Create a book table in your Cassandra keyspace using:

CREATE TABLE book (
    id uuid PRIMARY KEY,
    isbn text,
    author text,
    title text
);

The Spring Cloud Data Flow Cloud Foundry Server

The Cloud Foundry Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations/) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow-server-cloudfoundry) it yourself. If you build it
yourself, the executable jar will be in spring-cloud-dataflow-server-cloudfoundry/target

Although you can run the Data Flow Cloud Foundry Server locally and configure
it to deploy to any Cloud Foundry instance, we will deploy the server to Cloud
Foundry as recommended.

1. Verify that the CF instance is reachable (your endpoint URLs will be different from those shown
here).


$ cf api
API endpoint: https://api.system.io (API version: ...)

$ cf apps
Getting apps in org [your-org] / space [your-space] as user...
OK

No apps found

2. Follow the instructions to deploy the Spring Cloud Data Flow Cloud Foundry server
(https://docs.spring.io/spring-cloud-dataflow-server-cloudfoundry/docs/current/reference/htmlsingle).
Don’t worry about creating a Redis service. We won’t need it. If you are familiar with Cloud
Foundry application manifests, we recommend creating a manifest for the Data Flow
server as shown here
(https://docs.spring.io/spring-cloud-dataflow-server-
cloudfoundry/docs/current/reference/htmlsingle/#sample-manifest-template)
.

As of this writing, there is a typo on the SPRING_APPLICATION_JSON entry in


the sample manifest. SPRING_APPLICATION_JSON must be followed by : and
the JSON string must be wrapped in single quotes. Alternatively, you can
replace that line with MAVEN_REMOTE_REPOSITORIES_REPO1_URL:
repo.spring.io/libs-snapshot . If your Cloud Foundry installation is
behind a firewall, you may need to install the stream apps used in this sample
in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/1.3.0.M2/reference/htmlsingle/#getting-
started-maven-configuration)
the server to access that repository.

3. Once you have successfully executed cf push , verify the dataflow server is running

$ cf apps
Getting apps in org [your-org] / space [your-space] as user...
OK

name              requested state   instances   memory   disk   urls
dataflow-server   started           1/1         1G       1G     dataflow-server.app.io

4. Notice that the dataflow-server application is started and ready for interaction via the url
endpoint


5. Connect the shell to the server running on Cloud Foundry, e.g., dataflow-server.app.io

$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar

____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
server-unknown:>

server-unknown:>dataflow config server http://dataflow-server.app.io
Successfully targeted http://dataflow-server.app.io
dataflow:>

Running the Demo



1. Register
(https://github.com/spring-cloud/spring-cloud-dataflow/blob/master/spring-cloud-dataflow-
docs/src/main/asciidoc/streams.adoc#register-a-stream-app)
the out-of-the-box applications for the Rabbit binder


These samples assume that the Data Flow Server can access a remote Maven
repository, repo.spring.io/libs-release by default. If your Data Flow
server is running behind a firewall, or you are using a maven proxy
preventing access to public repositories, you will need to install the sample
apps in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-
started-maven-configuration)
the server accordingly. The sample applications are typically registered using
Data Flow’s bulk import facility. For example, the Shell command
dataflow:>app import --uri bit.ly/Celsius-SR1-stream-
applications-rabbit-maven (The actual URI is release and binder specific so
refer to the sample instructions for the actual URL). The bulk import URI
references a plain text file containing entries for all of the publicly available
Spring Cloud Stream and Task applications published to repo.spring.io .
For example,
source.http=maven://org.springframework.cloud.stream.app:http-
source-rabbit:1.3.1.RELEASE registers the http source app at the
corresponding Maven address, relative to the remote repository(ies)
configured for the Data Flow server. The format is maven://<groupId>:
<artifactId>:<version> You will need to download
(https://repo.spring.io/libs-release/org/springframework/cloud/stream/app/spring-cloud-
stream-app-descriptor/Bacon.RELEASE/spring-cloud-stream-app-descriptor-
Bacon.RELEASE.rabbit-apps-maven-repo-url.properties)
the required apps or build (https://github.com/spring-cloud-stream-app-starters)
them and then install them in your Maven repository, using whatever group,
artifact, and version you choose. If you do this, register individual apps using
dataflow:>app register… using the maven:// resource URI format
corresponding to your installed app.

dataflow:>app import --uri http://bit.ly/Celsius-SR1-stream-applications-rabbit-maven

2. Create the stream


dataflow:>stream create cassandrastream --definition "http --spring.cloud.stream.bindings.output.contentType='application/json' | cassandra --ingestQuery='insert into book (id, isbn, title, author) values (uuid(), ?, ?, ?)' --username='<USERNAME>' --password='<PASSWORD>' --port=<PORT> --contact-points=<HOST> --keyspace='<KEYSPACE>'" --deploy

Created and deployed new stream 'cassandrastream'

If you have enabled random application name prefixes in PCF, you may want to change the
cassandrastream name; otherwise you could run into issues with the route name being too long.

3. Verify the stream is successfully deployed

dataflow:>stream list

4. Notice that cassandrastream-http and cassandrastream-cassandra Spring Cloud Stream
(https://github.com/spring-cloud-stream-app-starters/) applications are running as cloud-native
(microservice) applications in Cloud Foundry

$ cf apps
Getting apps in org [your-org] / space [your-space] as user...
OK

name                        requested state   instances   memory   disk   urls
cassandrastream-cassandra   started           1/1         1G       1G     cassandrastream-cassandra.app.io
cassandrastream-http        started           1/1         1G       1G     cassandrastream-http.app.io
dataflow-server             started           1/1         1G       1G     dataflow-server.app.io

5. Look up the URL for the cassandrastream-http application from the list above. Post sample data
pointing to the http endpoint: <YOUR-cassandrastream-http-APP-URL>


http post --contentType 'application/json' --data '{"isbn": "1599869772", "title": "The Art of War", "author": "Sun Tzu"}' --target http://<YOUR-cassandrastream-http-APP-URL>
> POST (application/json;charset=UTF-8) http://cassandrastream-http.app.io {"isbn": "1599869772", "title": "The Art of War", "author": "Sun Tzu"}
> 202 ACCEPTED

6. Connect to the Cassandra instance and query the table book to list the data inserted

select * from book;

7. Now, let’s take advantage of Pivotal Cloud Foundry’s platform capability and scale the
cassandrastream-http application from 1 to 3 instances

$ cf scale cassandrastream-http -i 3
Scaling app cassandrastream-http in org user-dataflow / space development as user...
OK

8. Verify App instances (3/3) running successfully

$ cf apps
Getting apps in org user-dataflow / space development as user...
OK

name                        requested state   instances   memory   disk   urls
cassandrastream-cassandra   started           1/1         1G       1G     cassandrastream-cassandra.app.io
cassandrastream-http        started           3/3         1G       1G     cassandrastream-http.app.io
dataflow-server             started           1/1         1G       1G     dataflow-server.app.io

9. You’re done!

3.1.4. Summary
In this sample, you have learned:


How to use Spring Cloud Data Flow’s Local and Cloud Foundry servers

How to use Spring Cloud Data Flow’s shell

How to create a streaming data pipeline to connect and write to Cassandra

How to scale applications on Pivotal Cloud Foundry

3.2. HTTP to MySQL Demo


In this demonstration, you will learn how to build a data pipeline using Spring Cloud Data Flow
(http://cloud.spring.io/spring-cloud-dataflow/) to consume data from an http endpoint and write to a
MySQL database using the jdbc sink.

We will take you through the steps to configure and run a Spring Cloud Data Flow server in either a
local (https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started/) or
Cloud Foundry
(https://docs.spring.io/spring-cloud-dataflow-server-cloudfoundry/docs/current/reference/htmlsingle/#getting-
started)
environment.

3.2.1. Prerequisites
A Running Data Flow Shell

The Spring Cloud Data Flow Shell is available for download


(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-deploying-
spring-cloud-dataflow)
or you can build (https://github.com/spring-cloud/spring-cloud-dataflow) it yourself.

The Spring Cloud Data Flow Shell and Local server implementation are in the
same repository and are both built by running ./mvnw install from the project
root directory. If you have already run the build, use the jar in spring-cloud-
dataflow-shell/target

To run the Shell open a new terminal session:


$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar
____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>

The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the
Data Flow Server’s REST API and supports a DSL that simplifies the process of
defining a stream or task and managing its lifecycle. Most of these samples use
the shell. If you prefer, you can use the Data Flow UI at localhost:9393/dashboard
(or wherever the server is hosted) to perform equivalent operations.

3.2.2. Using the Local Server

Additional Prerequisites
A running local Data Flow Server

The Local Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow) it yourself. If you build it yourself, the
executable jar will be in spring-cloud-dataflow-server-local/target

To run the Local Data Flow server, open a new terminal session:

$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-LOCAL-JAR>
$ java -jar spring-cloud-dataflow-server-local-<VERSION>.jar

Running instance of Kafka (http://kafka.apache.org/downloads.html)

Running instance of MySQL (http://www.mysql.com/)


A database utility tool such as DBeaver (http://dbeaver.jkiss.org/) or DbVisualizer


(https://www.dbvis.com/)

Create the test database with a names table (in MySQL) using:

CREATE DATABASE test;
USE test;
CREATE TABLE names
(
    name varchar(255)
);
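
One way to run these statements, assuming the mysql command-line client is installed and the statements are saved to a file (the file name and credentials are illustrative):

BASH
mysql -u root -p < create-names.sql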

Building and Running the Demo


1. Register
(https://github.com/spring-cloud/spring-cloud-dataflow/blob/master/spring-cloud-dataflow-
docs/src/main/asciidoc/streams.adoc#register-a-stream-app)
the out-of-the-box applications for the Kafka binder


These samples assume that the Data Flow Server can access a remote Maven
repository, repo.spring.io/libs-release by default. If your Data Flow
server is running behind a firewall, or you are using a maven proxy
preventing access to public repositories, you will need to install the sample
apps in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-
started-maven-configuration)
the server accordingly. The sample applications are typically registered using
Data Flow’s bulk import facility. For example, the Shell command
dataflow:>app import --uri bit.ly/Celsius-SR1-stream-
applications-rabbit-maven (The actual URI is release and binder specific so
refer to the sample instructions for the actual URL). The bulk import URI
references a plain text file containing entries for all of the publicly available
Spring Cloud Stream and Task applications published to repo.spring.io .
For example,
source.http=maven://org.springframework.cloud.stream.app:http-
source-rabbit:1.3.1.RELEASE registers the http source app at the
corresponding Maven address, relative to the remote repository(ies)
configured for the Data Flow server. The format is maven://<groupId>:
<artifactId>:<version> You will need to download
(https://repo.spring.io/libs-release/org/springframework/cloud/stream/app/spring-cloud-
stream-app-descriptor/Bacon.RELEASE/spring-cloud-stream-app-descriptor-
Bacon.RELEASE.rabbit-apps-maven-repo-url.properties)
the required apps or build (https://github.com/spring-cloud-stream-app-starters)
them and then install them in your Maven repository, using whatever group,
artifact, and version you choose. If you do this, register individual apps using
dataflow:>app register… using the maven:// resource URI format
corresponding to your installed app.

dataflow:>app import --uri http://bit.ly/Celsius-SR1-stream-applications-kafka-10-maven

2. Create the stream


dataflow:>stream create --name mysqlstream --definition "http --server.port=8787 | jdbc --tableName=names --columns=name --spring.datasource.driver-class-name=org.mariadb.jdbc.Driver --spring.datasource.url='jdbc:mysql://localhost:3306/test'" --deploy

Created and deployed new stream 'mysqlstream'

If MySQL isn’t running on the default port on localhost, or if you need a
username and password to connect, use the following options to specify the
necessary connection parameters:
--spring.datasource.url='jdbc:mysql://<HOST>:<PORT>/<NAME>'
--spring.datasource.username=<USERNAME>
--spring.datasource.password=<PASSWORD>
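
For example, a stream definition with an explicit datasource might look like the following (host, database name, and credentials are illustrative):

dataflow:>stream create --name mysqlstream --definition "http --server.port=8787 | jdbc --tableName=names --columns=name --spring.datasource.driver-class-name=org.mariadb.jdbc.Driver --spring.datasource.url='jdbc:mysql://mysql.example.com:3306/test' --spring.datasource.username=myuser --spring.datasource.password=secret" --deploy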

3. Verify the stream is successfully deployed

dataflow:>stream list

4. Notice that mysqlstream-http and mysqlstream-jdbc Spring Cloud Stream


(https://github.com/spring-cloud-stream-app-starters//) applications are running as Spring Boot
applications within the Local server as collocated processes.
CONSOLE
2016-05-03 09:29:55.918 INFO 65162 --- [nio-9393-exec-3] o.s.c.d.spi.local.LocalAppDeplo
Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-dataflow
2016-05-03 09:29:55.939 INFO 65162 --- [nio-9393-exec-3] o.s.c.d.spi.local.LocalAppDeplo
Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-dataflow

5. Post sample data pointing to the http endpoint: localhost:8787 [ 8787 is the
server.port we specified for the http source in this case]

dataflow:>http post --contentType 'application/json' --target http://localhost:8787 --data "{\"name\": \"Foo\"}"
> POST (application/json;charset=UTF-8) http://localhost:8787 {"name": "Foo"}
> 202 ACCEPTED

6. Connect to the MySQL instance and query the table test.names to list the new rows:

select * from test.names;
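
If you prefer the command line to a GUI tool, a quick check with the mysql client would be (credentials are illustrative):

BASH
mysql -u root -p -e "SELECT * FROM test.names;"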


7. You’re done!

3.2.3. Using the Cloud Foundry Server

Additional Prerequisites
Cloud Foundry instance

Running instance of rabbit in Cloud Foundry

Running instance of mysql in Cloud Foundry

A database utility tool such as DBeaver (http://dbeaver.jkiss.org/) or DbVisualizer


(https://www.dbvis.com/)

Create the names table (in MySQL) using:

CREATE TABLE names
(
    name varchar(255)
);

The Spring Cloud Data Flow Cloud Foundry Server

The Cloud Foundry Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations/) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow-server-cloudfoundry) it yourself. If you build it
yourself, the executable jar will be in spring-cloud-dataflow-server-cloudfoundry/target

Although you can run the Data Flow Cloud Foundry Server locally and configure
it to deploy to any Cloud Foundry instance, we will deploy the server to Cloud
Foundry as recommended.

1. Verify that the CF instance is reachable (your endpoint URLs will be different from those shown
here).

$ cf api
API endpoint: https://api.system.io (API version: ...)

$ cf apps
Getting apps in org [your-org] / space [your-space] as user...
OK

No apps found


2. Follow the instructions to deploy the Spring Cloud Data Flow Cloud Foundry server
(https://docs.spring.io/spring-cloud-dataflow-server-cloudfoundry/docs/current/reference/htmlsingle).
Don’t worry about creating a Redis service. We won’t need it. If you are familiar with Cloud
Foundry application manifests, we recommend creating a manifest for the Data Flow
server as shown here
(https://docs.spring.io/spring-cloud-dataflow-server-
cloudfoundry/docs/current/reference/htmlsingle/#sample-manifest-template)
.

As of this writing, there is a typo on the SPRING_APPLICATION_JSON entry in


the sample manifest. SPRING_APPLICATION_JSON must be followed by : and
the JSON string must be wrapped in single quotes. Alternatively, you can
replace that line with MAVEN_REMOTE_REPOSITORIES_REPO1_URL:
repo.spring.io/libs-snapshot . If your Cloud Foundry installation is
behind a firewall, you may need to install the stream apps used in this sample
in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/1.3.0.M2/reference/htmlsingle/#getting-
started-maven-configuration)
the server to access that repository.

3. Once you have successfully executed cf push , verify the dataflow server is running

$ cf apps
Getting apps in org [your-org] / space [your-space] as user...
OK

name              requested state   instances   memory   disk   urls
dataflow-server   started           1/1         1G       1G     dataflow-server.app.io

4. Notice that the dataflow-server application is started and ready for interaction via the url
endpoint

5. Connect the shell to the server running on Cloud Foundry, e.g., dataflow-server.app.io


$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar

____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
server-unknown:>

server-unknown:>dataflow config server http://dataflow-server.app.io
Successfully targeted http://dataflow-server.app.io
dataflow:>

Building and Running the Demo


1. Register
(https://github.com/spring-cloud/spring-cloud-dataflow/blob/master/spring-cloud-dataflow-
docs/src/main/asciidoc/streams.adoc#register-a-stream-app)
the out-of-the-box applications for the Rabbit binder


These samples assume that the Data Flow Server can access a remote Maven
repository, repo.spring.io/libs-release by default. If your Data Flow
server is running behind a firewall, or you are using a maven proxy
preventing access to public repositories, you will need to install the sample
apps in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-
started-maven-configuration)
the server accordingly. The sample applications are typically registered using
Data Flow’s bulk import facility. For example, the Shell command
dataflow:>app import --uri bit.ly/Celsius-SR1-stream-
applications-rabbit-maven (The actual URI is release and binder specific so
refer to the sample instructions for the actual URL). The bulk import URI
references a plain text file containing entries for all of the publicly available
Spring Cloud Stream and Task applications published to repo.spring.io .
For example,
source.http=maven://org.springframework.cloud.stream.app:http-
source-rabbit:1.3.1.RELEASE registers the http source app at the
corresponding Maven address, relative to the remote repository(ies)
configured for the Data Flow server. The format is maven://<groupId>:
<artifactId>:<version> You will need to download
(https://repo.spring.io/libs-release/org/springframework/cloud/stream/app/spring-cloud-
stream-app-descriptor/Bacon.RELEASE/spring-cloud-stream-app-descriptor-
Bacon.RELEASE.rabbit-apps-maven-repo-url.properties)
the required apps or build (https://github.com/spring-cloud-stream-app-starters)
them and then install them in your Maven repository, using whatever group,
artifact, and version you choose. If you do this, register individual apps using
dataflow:>app register… using the maven:// resource URI format
corresponding to your installed app.

dataflow:>app import --uri http://bit.ly/Celsius-SR1-stream-applications-rabbit-maven

2. Create the stream


dataflow:>stream create --name mysqlstream --definition "http | jdbc --tableName=names --columns=name"
Created new stream 'mysqlstream'

dataflow:>stream deploy --name mysqlstream --properties "deployer.jdbc.cloudfoundry.services=mysql"
Deployed stream 'mysqlstream'

By supplying the deployer.jdbc.cloudfoundry.services=mysql property,
we deploy the stream with the jdbc sink automatically bound to the mysql
service; only this application in the stream gets the service binding. This
also eliminates the need to supply datasource credentials in the stream
definition.
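
The same mechanism works for any app in a stream. The general form of the deployment property is shown below (the service instance names are whatever you have created in your space):

deployer.<app name>.cloudfoundry.services=<comma-separated list of service instance names>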

3. Verify the stream is successfully deployed

dataflow:>stream list

4. Notice that mysqlstream-http and mysqlstream-jdbc Spring Cloud Stream


(https://github.com/spring-cloud-stream-app-starters/) applications are running as cloud-native
(microservice) applications in Cloud Foundry

$ cf apps
Getting apps in org user-dataflow / space development as user...
OK

name               requested state   instances   memory   disk   urls
mysqlstream-http   started           1/1         1G       1G     mysqlstream-http.app.io
mysqlstream-jdbc   started           1/1         1G       1G     mysqlstream-jdbc.app.io
dataflow-server    started           1/1         1G       1G     dataflow-server.app.io

5. Look up the URL for the mysqlstream-http application from the list above. Post sample data
pointing to the http endpoint: <YOUR-mysqlstream-http-APP-URL>

http post --contentType 'application/json' --data "{\"name\": \"Bar\"}" --target http://mysqlstream-http.app.io
> POST (application/json;charset=UTF-8) http://mysqlstream-http.app.io {"name": "Bar"}
> 202 ACCEPTED


6. Connect to the MySQL instance and query the table names to list the new rows:

select * from names;

7. Now, let’s take advantage of Pivotal Cloud Foundry’s platform capability. Let’s scale the
mysqlstream-http application from 1 to 3 instances

$ cf scale mysqlstream-http -i 3
Scaling app mysqlstream-http in org user-dataflow / space development as user...
OK

8. Verify App instances (3/3) running successfully

$ cf apps
Getting apps in org user-dataflow / space development as user...
OK

name               requested state   instances   memory   disk   urls
mysqlstream-http   started           3/3         1G       1G     mysqlstream-http.app.io
mysqlstream-jdbc   started           1/1         1G       1G     mysqlstream-jdbc.app.io
dataflow-server    started           1/1         1G       1G     dataflow-server.app.io

9. You’re done!

3.2.4. Summary
In this sample, you have learned:

How to use Spring Cloud Data Flow’s Local and Cloud Foundry servers

How to use Spring Cloud Data Flow’s shell

How to create a streaming data pipeline to connect and write to MySQL

How to scale applications on Pivotal Cloud Foundry

3.3. HTTP to Gemfire Demo


In this demonstration, you will learn how to build a data pipeline using Spring Cloud Data Flow
(http://cloud.spring.io/spring-cloud-dataflow/) to consume data from an http endpoint and write to
Gemfire using the gemfire sink.


We will take you through the steps to configure and run a Spring Cloud Data Flow server in either a
local (https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started/) or
Cloud Foundry
(https://docs.spring.io/spring-cloud-dataflow-server-cloudfoundry/docs/current/reference/htmlsingle/#getting-
started)
environment.

For legacy reasons the gemfire Spring Cloud Stream Apps are named after
Pivotal GemFire. The code base for the commercial product has since been
open sourced as Apache Geode. These samples should work with compatible
versions of Pivotal GemFire or Apache Geode. Herein we will refer to the
installed IMDG simply as Geode.

3.3.1. Prerequisites
A Running Data Flow Shell

The Spring Cloud Data Flow Shell is available for download


(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-deploying-
spring-cloud-dataflow)
or you can build (https://github.com/spring-cloud/spring-cloud-dataflow) it yourself.

The Spring Cloud Data Flow Shell and Local server implementation are in the
same repository and are both built by running ./mvnw install from the project
root directory. If you have already run the build, use the jar in spring-cloud-
dataflow-shell/target

To run the Shell open a new terminal session:


$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar
____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>

The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the
Data Flow Server’s REST API and supports a DSL that simplifies the process of
defining a stream or task and managing its lifecycle. Most of these samples use
the shell. If you prefer, you can use the Data Flow UI at localhost:9393/dashboard
(or wherever the server is hosted) to perform equivalent operations.

A Geode installation with a locator and cache server running

If you do not have access to an existing Geode installation, install Apache Geode
(http://geode.apache.org) or Pivotal GemFire (http://geode.apache.org/) and start the gfsh CLI in a
separate terminal.

_________________________ __
/ _____/ ______/ ______/ /____/ /
/ / __/ /___ /_____ / _____ /
/ /__/ / ____/ _____/ / / / /
/______/_/ /______/_/ /_/ 1.2.1

Monitor and Manage Apache Geode


gfsh>

3.3.2. Using the Local Server

Additional Prerequisites
A running local Data Flow Server


The Local Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow) it yourself. If you build it yourself, the
executable jar will be in spring-cloud-dataflow-server-local/target

To run the Local Data Flow server, open a new terminal session:

$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-LOCAL-JAR>
$ java -jar spring-cloud-dataflow-server-local-<VERSION>.jar

A running instance of Rabbit MQ (https://www.rabbitmq.com)

Building and Running the Demo


1. Use gfsh to start a locator and server

gfsh>start locator --name=locator1
gfsh>start server --name=server1

2. Create a region called Stocks

gfsh>create region --name Stocks --type=REPLICATE
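
You can confirm the region exists before wiring up the stream with a quick check from the same gfsh session:

gfsh>list regions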

Use the Shell to create the sample stream

3. Register
(https://github.com/spring-cloud/spring-cloud-dataflow/blob/master/spring-cloud-dataflow-
docs/src/main/asciidoc/streams.adoc#register-a-stream-app)
the out-of-the-box applications for the Rabbit binder


These samples assume that the Data Flow Server can access a remote Maven
repository, repo.spring.io/libs-release by default. If your Data Flow
server is running behind a firewall, or you are using a maven proxy
preventing access to public repositories, you will need to install the sample
apps in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-
started-maven-configuration)
the server accordingly. The sample applications are typically registered using
Data Flow’s bulk import facility. For example, the Shell command
dataflow:>app import --uri bit.ly/Celsius-SR1-stream-
applications-rabbit-maven (The actual URI is release and binder specific so
refer to the sample instructions for the actual URL). The bulk import URI
references a plain text file containing entries for all of the publicly available
Spring Cloud Stream and Task applications published to repo.spring.io .
For example,
source.http=maven://org.springframework.cloud.stream.app:http-
source-rabbit:1.3.1.RELEASE registers the http source app at the
corresponding Maven address, relative to the remote repository(ies)
configured for the Data Flow server. The format is maven://<groupId>:
<artifactId>:<version> You will need to download
(https://repo.spring.io/libs-release/org/springframework/cloud/stream/app/spring-cloud-
stream-app-descriptor/Bacon.RELEASE/spring-cloud-stream-app-descriptor-
Bacon.RELEASE.rabbit-apps-maven-repo-url.properties)
the required apps or build (https://github.com/spring-cloud-stream-app-starters)
them and then install them in your Maven repository, using whatever group,
artifact, and version you choose. If you do this, register individual apps using
dataflow:>app register… using the maven:// resource URI format
corresponding to your installed app.

dataflow:>app import --uri http://bit.ly/Celsius-SR1-stream-applications-rabbit-maven

4. Create the stream

This example creates an http endpoint to which we will post stock prices as a JSON document
containing symbol and price fields. The property --json=true enables Geode’s JSON
support and configures the sink to convert JSON String payloads to PdxInstance
(https://geode.apache.org/releases/latest/javadoc/org/apache/geode/pdx/PdxInstance.html), the
recommended way to store JSON documents in Geode. The keyExpression property is a
SpEL expression used to extract the symbol value from the PdxInstance to use as an entry key.

PDX serialization is very efficient and supports OQL queries without
requiring a custom domain class. Use of custom domain types requires these
classes to be in the class path of both the stream apps and the cache server.
For this reason, the use of custom payload types is generally discouraged.

dataflow:>stream create --name stocks --definition "http --port=9090 | gemfire --json=true --regionName=Stocks --keyExpression=payload.getField('symbol')" --deploy
Created and deployed new stream 'stocks'

If the Geode locator isn’t running on the default port on localhost, add the
options --connect-type=locator --host-addresses=<host>:<port>. If
there are multiple locators, you can provide a comma separated list of locator
addresses. This is not necessary for the sample but is typical for production
environments to enable fail-over.

5. Verify the stream is successfully deployed

dataflow:>stream list

6. Post sample data pointing to the http endpoint: localhost:9090 ( 9090 is the port we
specified for the http source)

dataflow:>http post --target http://localhost:9090 --contentType application/json --data '{"symbol":"VMW","price":117.06}'
> POST (application/json) http://localhost:9090 {"symbol":"VMW","price":117.06}
> 202 ACCEPTED

7. Using gfsh , connect to the locator if not already connected, and verify the cache entry was
created.


gfsh>get --key='VMW' --region=/Stocks


Result : true
Key Class : java.lang.String
Key : VMW
Value Class : org.apache.geode.pdx.internal.PdxInstanceImpl

symbol | price
------ | ------
VMW | 117.06

8. You’re done!

Using the Cloud Foundry Server


Additional Prerequisites
A Cloud Foundry instance

Running instance of a rabbit service in Cloud Foundry

Running instance of the Pivotal Cloud Cache for PCF


(https://docs.pivotal.io/p-cloud-cache/1-0/developer.html) (PCC) service cloudcache in Cloud
Foundry.

The Spring Cloud Data Flow Cloud Foundry Server

The Cloud Foundry Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations/) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow-server-cloudfoundry) it yourself. If you build it
yourself, the executable jar will be in spring-cloud-dataflow-server-cloudfoundry/target

Although you can run the Data Flow Cloud Foundry Server locally and configure
it to deploy to any Cloud Foundry instance, we will deploy the server to Cloud
Foundry as recommended.

1. Verify that the CF instance is reachable (your endpoint URLs will be different from those shown
here).


$ cf api
API endpoint: https://api.system.io (API version: ...)

$ cf apps
Getting apps in org [your-org] / space [your-space] as user...
OK

No apps found

2. Follow the instructions to deploy the Spring Cloud Data Flow Cloud Foundry server
(https://docs.spring.io/spring-cloud-dataflow-server-cloudfoundry/docs/current/reference/htmlsingle).
Don’t worry about creating a Redis service. We won’t need it. If you are familiar with Cloud
Foundry application manifests, we recommend creating a manifest for the Data Flow
server as shown here
(https://docs.spring.io/spring-cloud-dataflow-server-
cloudfoundry/docs/current/reference/htmlsingle/#sample-manifest-template)
.

As of this writing, there is a typo on the SPRING_APPLICATION_JSON entry in


the sample manifest. SPRING_APPLICATION_JSON must be followed by : and
the JSON string must be wrapped in single quotes. Alternatively, you can
replace that line with MAVEN_REMOTE_REPOSITORIES_REPO1_URL:
repo.spring.io/libs-snapshot . If your Cloud Foundry installation is
behind a firewall, you may need to install the stream apps used in this sample
in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/1.3.0.M2/reference/htmlsingle/#getting-
started-maven-configuration)
the server to access that repository.

3. Once you have successfully executed cf push , verify the dataflow server is running

$ cf apps
Getting apps in org [your-org] / space [your-space] as user...
OK

name              requested state   instances   memory   disk   urls
dataflow-server   started           1/1         1G       1G     dataflow-server.app.io

4. Notice that the dataflow-server application is started and ready for interaction via the url
endpoint


5. Connect the shell to the server running on Cloud Foundry, e.g., dataflow-server.app.io

$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar

____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
server-unknown:>

server-unknown:>dataflow config server http://dataflow-server.app.io


Successfully targeted http://dataflow-server.app.io
dataflow:>

Building and Running the Demo


1. Register
(https://github.com/spring-cloud/spring-cloud-dataflow/blob/master/spring-cloud-dataflow-
docs/src/main/asciidoc/streams.adoc#register-a-stream-app)
the out-of-the-box applications for the Rabbit binder


These samples assume that the Data Flow Server can access a remote Maven
repository, repo.spring.io/libs-release by default. If your Data Flow
server is running behind a firewall, or you are using a maven proxy
preventing access to public repositories, you will need to install the sample
apps in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-
started-maven-configuration)
the server accordingly. The sample applications are typically registered using
Data Flow’s bulk import facility. For example, the Shell command
dataflow:>app import --uri bit.ly/Celsius-SR1-stream-
applications-rabbit-maven (The actual URI is release and binder specific so
refer to the sample instructions for the actual URL). The bulk import URI
references a plain text file containing entries for all of the publicly available
Spring Cloud Stream and Task applications published to repo.spring.io .
 For example,
source.http=maven://org.springframework.cloud.stream.app:http-
source-rabbit:1.3.1.RELEASE registers the http source app at the
corresponding Maven address, relative to the remote repository(ies)
configured for the Data Flow server. The format is maven://<groupId>:
<artifactId>:<version> You will need to download
(https://repo.spring.io/libs-release/org/springframework/cloud/stream/app/spring-cloud-
stream-app-descriptor/Bacon.RELEASE/spring-cloud-stream-app-descriptor-
Bacon.RELEASE.rabbit-apps-maven-repo-url.properties)
the required apps or build (https://github.com/spring-cloud-stream-app-starters)
them and then install them in your Maven repository, using whatever group,
artifact, and version you choose. If you do this, register individual apps using
dataflow:>app register… using the maven:// resource URI format
corresponding to your installed app.

dataflow:>app import --uri http://bit.ly/Celsius-SR1-stream-applications-rabbit-maven

2. Get the PCC connection information


$ cf service-key cloudcache my-service-key


Getting key my-service-key for service instance cloudcache as <user>...

{
"locators": [
"10.0.16.9[55221]",
"10.0.16.11[55221]",
"10.0.16.10[55221]"
],
"urls": {
"gfsh": "http://...",
"pulse": "http://.../pulse"
},
"users": [
{
"password": <password>,
"username": "cluster_operator"
},
{
"password": <password>,
"username": "developer"
}
]
}

3. Using gfsh , connect to the PCC instance as cluster_operator using the service key values
and create the Stocks region.

gfsh>connect --use-http --url=<gfsh-url> --user=cluster_operator --password=<cluster_operator_password>
gfsh>create region --name Stocks --type=REPLICATE

4. Create the stream, connecting to the PCC instance as developer

This example creates an http endpoint to which we will post stock prices as a JSON document
containing symbol and price fields. The --json=true property enables Geode’s JSON
support and configures the sink to convert JSON String payloads to PdxInstance
(https://geode.apache.org/releases/latest/javadoc/org/apache/geode/pdx/PdxInstance.html), the
recommended way to store JSON documents in Geode. The keyExpression property is a
SpEL expression used to extract the symbol value from the PdxInstance to use as an entry key.

PDX serialization is very efficient and supports OQL queries without


requiring a custom domain class. Use of custom domain types requires these
 classes to be in the class path of both the stream apps and the cache server.
For this reason, the use of custom payload types is generally discouraged.


dataflow:>stream create --name stocks --definition "http --security.basic.enabled=false | gemfire --username=developer --password=<developer-password> --connect-type=locator --host-addresses=10.0.16.9:55221 --json=true --regionName=Stocks --keyExpression=payload.getField('symbol')" --deploy

5. Verify the stream is successfully deployed

dataflow:>stream list

6. Post sample data pointing to the http endpoint

Get the url of the http source using cf apps

dataflow:>http post --target http://<http source url> --contentType application/json --data '{"symbol":"VMW","price":117.06}'
> POST (application/json) http://... {"symbol":"VMW","price":117.06}
> 202 ACCEPTED

7. Using gfsh , connect to the PCC instance as cluster_operator using the service key values.

gfsh>connect --use-http --url=<gfsh-url> --user=cluster_operator --password=<cluster_operator_password>
gfsh>get --key='VMW' --region=/Stocks
Result : true
Key Class : java.lang.String
Key : VMW
Value Class : org.apache.geode.pdx.internal.PdxInstanceImpl

symbol | price
------ | ------
VMW | 117.06
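
Because the entries are stored as PDX (see the note in step 4), you can also verify the data with an OQL query from gfsh . This optional check is a sketch, not part of the original steps; adjust the region name if yours differs:

gfsh>query --query="SELECT s.symbol, s.price FROM /Stocks s WHERE s.symbol = 'VMW'"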

8. You’re done!

3.3.3. Summary
In this sample, you have learned:

How to use Spring Cloud Data Flow’s Local and Cloud Foundry servers

How to use Spring Cloud Data Flow’s shell

How to create a streaming data pipeline to connect and write to gemfire


3.4. Gemfire CQ to Log Demo


In this demonstration, you will learn how to build a data pipeline using Spring Cloud Data Flow
(http://cloud.spring.io/spring-cloud-dataflow/) to consume data from a gemfire-cq (Continuous
Query) endpoint and write to a log using the log sink. The gemfire-cq source creates a
Continuous Query to monitor events for a region that match the query’s result set and publish a
message whenever such an event is emitted. In this example, we simulate monitoring orders to
trigger a process whenever the quantity ordered is above a defined limit.

We will take you through the steps to configure and run Spring Cloud Data Flow server in either a
local (https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started/) or
Cloud Foundry
(https://docs.spring.io/spring-cloud-dataflow-server-cloudfoundry/docs/current/reference/htmlsingle/#getting-
started)
environment.

For legacy reasons the gemfire Spring Cloud Stream Apps are named after
Pivotal GemFire . The code base for the commercial product has since been

 open sourced as Apache Geode . These samples should work with compatible
versions of Pivotal GemFire or Apache Geode. Herein we will refer to the
installed IMDG simply as Geode .

3.4.1. Prerequisites
A Running Data Flow Shell

The Spring Cloud Data Flow Shell is available for download


(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-deploying-
spring-cloud-dataflow)
or you can build (https://github.com/spring-cloud/spring-cloud-dataflow) it yourself.

the Spring Cloud Data Flow Shell and Local server implementation are in the
same repository and are both built by running ./mvnw install from the project
 root directory. If you have already run the build, use the jar in spring-cloud-
dataflow-shell/target

To run the Shell open a new terminal session:


$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar
____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>

The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the
Data Flow Server’s REST API and supports a DSL that simplifies the process of

 defining a stream or task and managing its lifecycle. Most of these samples use
the shell. If you prefer, you can use the Data Flow UI (localhost:9393/dashboard, or
wherever the server is hosted) to perform equivalent operations.

A Geode installation with a locator and cache server running

If you do not have access to an existing Geode installation, install Apache Geode
(http://geode.apache.org) or Pivotal Gemfire (http://geode.apache.org/) and start the gfsh CLI in a
separate terminal.

_________________________ __
/ _____/ ______/ ______/ /____/ /
/ / __/ /___ /_____ / _____ /
/ /__/ / ____/ _____/ / / / /
/______/_/ /______/_/ /_/ 1.2.1

Monitor and Manage Apache Geode


gfsh>

3.4.2. Using the Local Server

Additional Prerequisites
A Running Data Flow Server


The Local Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow) it yourself. If you build it yourself, the
executable jar will be in spring-cloud-dataflow-server-local/target

To run the Local Data Flow server, open a new terminal session:

$cd <PATH/TO/SPRING-CLOUD-DATAFLOW-LOCAL-JAR>
$java -jar spring-cloud-dataflow-server-local-<VERSION>.jar

A running instance of Rabbit MQ (https://www.rabbitmq.com)

Building and Running the Demo


1. Use gfsh to start a locator and server

gfsh>start locator --name=locator1


gfsh>start server --name=server1

2. Create a region called Orders

gfsh>create region --name Orders --type=REPLICATE

Use the Shell to create the sample stream

3. Register
(https://github.com/spring-cloud/spring-cloud-dataflow/blob/master/spring-cloud-dataflow-
docs/src/main/asciidoc/streams.adoc#register-a-stream-app)
the out-of-the-box applications for the Rabbit binder


These samples assume that the Data Flow Server can access a remote Maven
repository, repo.spring.io/libs-release by default. If your Data Flow
server is running behind a firewall, or you are using a maven proxy
preventing access to public repositories, you will need to install the sample
apps in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-
started-maven-configuration)
the server accordingly. The sample applications are typically registered using
Data Flow’s bulk import facility. For example, the Shell command
dataflow:>app import --uri bit.ly/Celsius-SR1-stream-
applications-rabbit-maven (The actual URI is release and binder specific so
refer to the sample instructions for the actual URL). The bulk import URI
references a plain text file containing entries for all of the publicly available
Spring Cloud Stream and Task applications published to repo.spring.io .
 For example,
source.http=maven://org.springframework.cloud.stream.app:http-
source-rabbit:1.3.1.RELEASE registers the http source app at the
corresponding Maven address, relative to the remote repository(ies)
configured for the Data Flow server. The format is maven://<groupId>:
<artifactId>:<version> You will need to download
(https://repo.spring.io/libs-release/org/springframework/cloud/stream/app/spring-cloud-
stream-app-descriptor/Bacon.RELEASE/spring-cloud-stream-app-descriptor-
Bacon.RELEASE.rabbit-apps-maven-repo-url.properties)
the required apps or build (https://github.com/spring-cloud-stream-app-starters)
them and then install them in your Maven repository, using whatever group,
artifact, and version you choose. If you do this, register individual apps using
dataflow:>app register… using the maven:// resource URI format
corresponding to your installed app.

dataflow:>app import --uri http://bit.ly/Celsius-SR1-stream-applications-rabbit-maven

4. Create the stream

This example creates a gemfire-cq source which will publish events matching a query
criterion on a region. In this case we will monitor the Orders region. For simplicity, we will
avoid creating a data structure for the order. Each cache entry contains an integer value
representing the quantity of the ordered item. This stream will fire a message whenever the
value is greater than 999. By default, the source emits only the value. Here we will override that using the


cq-event-expression property. This accepts a SpEL expression bound to a CQEvent


(https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/query/CqEvent.html). To
reference the entire CQEvent instance, we use #this . In order to display the contents in the
log, we will invoke toString() on the instance.

dataflow:>stream create --name orders --definition " gemfire-cq --query='SELECT * from /Orders o where o > 999' --cq-event-expression=#this.toString() | log" --deploy

If the Geode locator isn’t running on the default port on localhost , add the
options --connect-type=locator --host-addresses=<host>:<port> . If
there are multiple locators, you can provide a comma-separated list of locator
addresses. This is not necessary for the sample but is typical for production
environments to enable fail-over.

5. Verify the stream is successfully deployed

dataflow:>stream list

6. Monitor stdout for the log sink. When you deploy the stream, you will see log messages in the
Data Flow server console like this

2017-10-30 09:39:36.283 INFO 8167 --- [nio-9393-exec-5] o.s.c.d.spi.local.LocalAppDeployer : Deploying app with deploymentId orders.log instance 0.
Logs will be in /var/folders/hd/5yqz2v2d3sxd3n879f4sg4gr0000gn/T/spring-cloud-dataflow-5375107584795488581/orders-1509370775940/orders.log

Copy the location of the log sink logs. This is a directory that ends in orders.log . The log
files will be in stdout_0.log under this directory. You can monitor the output of the log sink
using tail , or something similar:

$tail -f /var/folders/hd/5yqz2v2d3sxd3n879f4sg4gr0000gn/T/spring-cloud-dataflow-
5375107584795488581/orders-1509370775940/orders.log/stdout_0.log

7. Using gfsh , create and update some cache entries


gfsh>put --region Orders --value-class java.lang.Integer --key 01234 --value 1000


gfsh>put --region Orders --value-class java.lang.Integer --key 11234 --value 1005
gfsh>put --region Orders --value-class java.lang.Integer --key 21234 --value 100
gfsh>put --region Orders --value-class java.lang.Integer --key 31234 --value 999
gfsh>put --region Orders --value-class java.lang.Integer --key 21234 --value 1000

8. Observe the log output. You should see messages like:

2017-10-30 09:53:02.231 INFO 8563 --- [ire-cq.orders-1] log-sink : CqEvent [CqName=GfCq1; base operation=CREATE; cq operation=CREATE; key=01234; value=1000]
2017-10-30 09:53:19.732 INFO 8563 --- [ire-cq.orders-1] log-sink : CqEvent [CqName=GfCq1; base operation=CREATE; cq operation=CREATE; key=11234; value=1005]
2017-10-30 09:53:53.242 INFO 8563 --- [ire-cq.orders-1] log-sink : CqEvent [CqName=GfCq1; base operation=UPDATE; cq operation=CREATE; key=21234; value=1000]

9. Another interesting demonstration combines gemfire-cq with the http-gemfire example.

dataflow:> stream create --name stocks --definition "http --port=9090 | gemfire-json-server --regionName=Stocks --keyExpression=payload.getField('symbol')" --deploy
dataflow:> stream create --name stock_watch --definition "gemfire-cq --query='Select * from /Stocks where symbol=''VMW''' | log" --deploy

10. You’re done!

3.4.3. Using the Cloud Foundry Server

Additional Prerequisites
A Cloud Foundry instance

Running instance of a rabbit service in Cloud Foundry

Running instance of the Pivotal Cloud Cache for PCF


(https://docs.pivotal.io/p-cloud-cache/1-0/developer.html) (PCC) service cloudcache in Cloud
Foundry.

The Spring Cloud Data Flow Cloud Foundry Server

The Cloud Foundry Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations/) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow-server-cloudfoundry) it yourself. If you build it
yourself, the executable jar will be in spring-cloud-dataflow-server-cloudfoundry/target


Although you can run the Data Flow Cloud Foundry Server locally and configure

 it to deploy to any Cloud Foundry instance, we will deploy the server to Cloud
Foundry as recommended.

1. Verify that the CF instance is reachable (Your endpoint urls will be different from what is shown
here).

$ cf api
API endpoint: https://api.system.io (API version: ...)

$ cf apps
Getting apps in org [your-org] / space [your-space] as user...
OK

No apps found

2. Follow the instructions to deploy the Spring Cloud Data Flow Cloud Foundry server
(https://docs.spring.io/spring-cloud-dataflow-server-cloudfoundry/docs/current/reference/htmlsingle).
Don’t worry about creating a Redis service. We won’t need it. If you are familiar with Cloud
Foundry application manifests, we recommend creating a manifest for the Data Flow
server as shown here
(https://docs.spring.io/spring-cloud-dataflow-server-
cloudfoundry/docs/current/reference/htmlsingle/#sample-manifest-template)
.

As of this writing, there is a typo on the SPRING_APPLICATION_JSON entry in


the sample manifest. SPRING_APPLICATION_JSON must be followed by : and
the JSON string must be wrapped in single quotes. Alternatively, you can
replace that line with MAVEN_REMOTE_REPOSITORIES_REPO1_URL:
repo.spring.io/libs-snapshot . If your Cloud Foundry installation is
 behind a firewall, you may need to install the stream apps used in this sample
in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/1.3.0.M2/reference/htmlsingle/#getting-
started-maven-configuration)
the server to access that repository.

3. Once you have successfully executed cf push , verify the dataflow server is running


$ cf apps
Getting apps in org [your-org] / space [your-space] as user...
OK

name              requested state   instances   memory   disk   urls
dataflow-server   started           1/1         1G       1G     dataflow-server.app.io

4. Notice that the dataflow-server application is started and ready for interaction via the url
endpoint

5. Connect the shell to the server running on Cloud Foundry, e.g., dataflow-server.app.io

$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar

____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
server-unknown:>

server-unknown:>dataflow config server http://dataflow-server.app.io


Successfully targeted http://dataflow-server.app.io
dataflow:>

Building and Running the Demo


1. Register
(https://github.com/spring-cloud/spring-cloud-dataflow/blob/master/spring-cloud-dataflow-
docs/src/main/asciidoc/streams.adoc#register-a-stream-app)
the out-of-the-box applications for the Rabbit binder


These samples assume that the Data Flow Server can access a remote Maven
repository, repo.spring.io/libs-release by default. If your Data Flow
server is running behind a firewall, or you are using a maven proxy
preventing access to public repositories, you will need to install the sample
apps in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-
started-maven-configuration)
the server accordingly. The sample applications are typically registered using
Data Flow’s bulk import facility. For example, the Shell command
dataflow:>app import --uri bit.ly/Celsius-SR1-stream-
applications-rabbit-maven (The actual URI is release and binder specific so
refer to the sample instructions for the actual URL). The bulk import URI
references a plain text file containing entries for all of the publicly available
Spring Cloud Stream and Task applications published to repo.spring.io .
 For example,
source.http=maven://org.springframework.cloud.stream.app:http-
source-rabbit:1.3.1.RELEASE registers the http source app at the
corresponding Maven address, relative to the remote repository(ies)
configured for the Data Flow server. The format is maven://<groupId>:
<artifactId>:<version> You will need to download
(https://repo.spring.io/libs-release/org/springframework/cloud/stream/app/spring-cloud-
stream-app-descriptor/Bacon.RELEASE/spring-cloud-stream-app-descriptor-
Bacon.RELEASE.rabbit-apps-maven-repo-url.properties)
the required apps or build (https://github.com/spring-cloud-stream-app-starters)
them and then install them in your Maven repository, using whatever group,
artifact, and version you choose. If you do this, register individual apps using
dataflow:>app register… using the maven:// resource URI format
corresponding to your installed app.

dataflow:>app import --uri http://bit.ly/Celsius-SR1-stream-applications-rabbit-maven

2. Get the PCC connection information


$ cf service-key cloudcache my-service-key


Getting key my-service-key for service instance cloudcache as <user>...

{
"locators": [
"10.0.16.9[55221]",
"10.0.16.11[55221]",
"10.0.16.10[55221]"
],
"urls": {
"gfsh": "http://...",
"pulse": "http://.../pulse"
},
"users": [
{
"password": <password>,
"username": "cluster_operator"
},
{
"password": <password>,
"username": "developer"
}
]
}

3. Using gfsh , connect to the PCC instance as cluster_operator using the service key values
and create the Orders region.

gfsh>connect --use-http --url=<gfsh-url> --user=cluster_operator --password=<cluster_operator_password>
gfsh>create region --name Orders --type=REPLICATE

4. Create the stream using the Data Flow Shell

This example creates a gemfire-cq source which will publish events matching a query
criterion on a region. In this case we will monitor the Orders region. For simplicity, we will
avoid creating a data structure for the order. Each cache entry contains an integer value
representing the quantity of the ordered item. This stream will fire a message whenever the
value is greater than 999. By default, the source emits only the value. Here we will override that using the
cq-event-expression property. This accepts a SpEL expression bound to a CQEvent
(https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/query/CqEvent.html). To
reference the entire CQEvent instance, we use #this . In order to display the contents in the
log, we will invoke toString() on the instance.


dataflow:>stream create --name orders --definition " gemfire-cq --username=developer --password=<developer-password> --connect-type=locator --host-addresses=10.0.16.9:55221 --query='SELECT * from /Orders o where o > 999' --cq-event-expression=#this.toString() | log" --deploy
Created and deployed new stream 'orders'

5. Verify the stream is successfully deployed

dataflow:>stream list

6. Monitor stdout for the log sink

cf logs <log-sink-app-name>

7. Using gfsh , create and update some cache entries

gfsh>connect --use-http --url=<gfsh-url> --user=cluster_operator --password=<cluster_operator_password>
gfsh>put --region Orders --value-class java.lang.Integer --key 01234 --value 1000
gfsh>put --region Orders --value-class java.lang.Integer --key 11234 --value 1005
gfsh>put --region Orders --value-class java.lang.Integer --key 21234 --value 100
gfsh>put --region Orders --value-class java.lang.Integer --key 31234 --value 999
gfsh>put --region Orders --value-class java.lang.Integer --key 21234 --value 1000

8. Observe the log output. You should see messages like:

2017-10-30 09:53:02.231 INFO 8563 --- [ire-cq.orders-1] log-sink : CqEvent [CqName=GfCq1; base operation=CREATE; cq operation=CREATE; key=01234; value=1000]
2017-10-30 09:53:19.732 INFO 8563 --- [ire-cq.orders-1] log-sink : CqEvent [CqName=GfCq1; base operation=CREATE; cq operation=CREATE; key=11234; value=1005]
2017-10-30 09:53:53.242 INFO 8563 --- [ire-cq.orders-1] log-sink : CqEvent [CqName=GfCq1; base operation=UPDATE; cq operation=CREATE; key=21234; value=1000]

9. Another interesting demonstration combines gemfire-cq with the http-gemfire example.


dataflow:>stream create --name stocks --definition "http --security.basic.enabled=false | gemfire --json=true --username=developer --password=<developer-password> --connect-type=locator --host-addresses=10.0.16.25:55221 --regionName=Stocks --keyExpression=payload.getField('symbol')" --deploy

dataflow:>stream create --name stock_watch --definition "gemfire-cq --username=developer --password=<developer-password> --connect-type=locator --host-addresses=10.0.16.25:55221 --query='SELECT * from /Stocks where symbol=''VMW''' --cq-event-expression=#this.toString() | log" --deploy

dataflow:>http post --target http://data-flow-server-dpvuo77-stocks-http.apps.scdf-gcp.springapps.io/ --contentType application/json --data '{"symbol":"VMW","price":117.06}'

10. You’re done!

3.4.4. Summary
In this sample, you have learned:

How to use Spring Cloud Data Flow’s Local and Cloud Foundry servers

How to use Spring Cloud Data Flow’s shell

How to create a streaming data pipeline to connect and publish CQ events from gemfire

3.5. Gemfire to Log Demo


In this demonstration, you will learn how to build a data pipeline using Spring Cloud Data Flow
(http://cloud.spring.io/spring-cloud-dataflow/) to consume data from a gemfire endpoint and write to
a log using the log sink. The gemfire source creates a CacheListener
(https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/CacheListener.html) to monitor
events for a region and publish a message whenever an entry is changed.

We will take you through the steps to configure and run Spring Cloud Data Flow server in either a
local (https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started/) or
Cloud Foundry
(https://docs.spring.io/spring-cloud-dataflow-server-cloudfoundry/docs/current/reference/htmlsingle/#getting-
started)
environment.


For legacy reasons the gemfire Spring Cloud Stream Apps are named after
Pivotal GemFire . The code base for the commercial product has since been

 open sourced as Apache Geode . These samples should work with compatible
versions of Pivotal GemFire or Apache Geode. Herein we will refer to the
installed IMDG simply as Geode .

3.5.1. Prerequisites
A Running Data Flow Shell

The Spring Cloud Data Flow Shell is available for download


(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-deploying-
spring-cloud-dataflow)
or you can build (https://github.com/spring-cloud/spring-cloud-dataflow) it yourself.

the Spring Cloud Data Flow Shell and Local server implementation are in the
same repository and are both built by running ./mvnw install from the project
 root directory. If you have already run the build, use the jar in spring-cloud-
dataflow-shell/target

To run the Shell open a new terminal session:

$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar
____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>


The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the
Data Flow Server’s REST API and supports a DSL that simplifies the process of

 defining a stream or task and managing its lifecycle. Most of these samples use
the shell. If you prefer, you can use the Data Flow UI (localhost:9393/dashboard, or
wherever the server is hosted) to perform equivalent operations.

A Geode installation with a locator and cache server running

If you do not have access to an existing Geode installation, install Apache Geode
(http://geode.apache.org) or Pivotal Gemfire (http://geode.apache.org/) and start the gfsh CLI in a
separate terminal.

_________________________ __
/ _____/ ______/ ______/ /____/ /
/ / __/ /___ /_____ / _____ /
/ /__/ / ____/ _____/ / / / /
/______/_/ /______/_/ /_/ 1.2.1

Monitor and Manage Apache Geode


gfsh>

3.5.2. Using the Local Server

Additional Prerequisites
A Running Data Flow Server

The Local Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow) it yourself. If you build it yourself, the
executable jar will be in spring-cloud-dataflow-server-local/target

To run the Local Data Flow server, open a new terminal session:

$cd <PATH/TO/SPRING-CLOUD-DATAFLOW-LOCAL-JAR>
$java -jar spring-cloud-dataflow-server-local-<VERSION>.jar

A running instance of Rabbit MQ (https://www.rabbitmq.com)

Building and Running the Demo


1. Use gfsh to start a locator and server


gfsh>start locator --name=locator1


gfsh>start server --name=server1

2. Create a region called Test

gfsh>create region --name Test --type=REPLICATE

Use the Shell to create the sample stream

3. Register
(https://github.com/spring-cloud/spring-cloud-dataflow/blob/master/spring-cloud-dataflow-
docs/src/main/asciidoc/streams.adoc#register-a-stream-app)
the out-of-the-box applications for the Rabbit binder


These samples assume that the Data Flow Server can access a remote Maven
repository, repo.spring.io/libs-release by default. If your Data Flow
server is running behind a firewall, or you are using a maven proxy
preventing access to public repositories, you will need to install the sample
apps in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-
started-maven-configuration)
the server accordingly. The sample applications are typically registered using
Data Flow’s bulk import facility. For example, the Shell command
dataflow:>app import --uri bit.ly/Celsius-SR1-stream-
applications-rabbit-maven (The actual URI is release and binder specific so
refer to the sample instructions for the actual URL). The bulk import URI
references a plain text file containing entries for all of the publicly available
Spring Cloud Stream and Task applications published to repo.spring.io .
 For example,
source.http=maven://org.springframework.cloud.stream.app:http-
source-rabbit:1.3.1.RELEASE registers the http source app at the
corresponding Maven address, relative to the remote repository(ies)
configured for the Data Flow server. The format is maven://<groupId>:
<artifactId>:<version> You will need to download
(https://repo.spring.io/libs-release/org/springframework/cloud/stream/app/spring-cloud-
stream-app-descriptor/Bacon.RELEASE/spring-cloud-stream-app-descriptor-
Bacon.RELEASE.rabbit-apps-maven-repo-url.properties)
the required apps or build (https://github.com/spring-cloud-stream-app-starters)
them and then install them in your Maven repository, using whatever group,
artifact, and version you choose. If you do this, register individual apps using
dataflow:>app register… using the maven:// resource URI format
corresponding to your installed app.

dataflow:>app import --uri http://bit.ly/Celsius-SR1-stream-applications-rabbit-maven

4. Create the stream

This example creates a gemfire source which will publish events on a region

dataflow:>stream create --name events --definition " gemfire --regionName=Test | log" --deploy
Created and deployed new stream 'events'


If the Geode locator isn’t running on the default port on localhost , add the
options --connect-type=locator --host-addresses=<host>:<port> . If
there are multiple locators, you can provide a comma-separated list of locator
addresses. This is not necessary for the sample but is typical for production
environments to enable fail-over.

5. Verify the stream is successfully deployed

dataflow:>stream list

6. Monitor stdout for the log sink. When you deploy the stream, you will see log messages in the
Data Flow server console like this
2017-10-28 17:28:23.275 INFO 15603 --- [nio-9393-exec-2] o.s.c.d.spi.local.LocalAppDeplo
Logs will be in /var/folders/hd/5yqz2v2d3sxd3n879f4sg4gr0000gn/T/spring-cloud-dataflow
2017-10-28 17:28:23.277 INFO 15603 --- [nio-9393-exec-2] o.s.c.d.s.c.StreamDeploymentCon
2017-10-28 17:28:23.311 INFO 15603 --- [nio-9393-exec-2] o.s.c.d.s.c.StreamDeploymentCon
2017-10-28 17:28:23.318 INFO 15603 --- [nio-9393-exec-2] o.s.c.d.spi.local.LocalAppDeplo
Logs will be in /var/folders/hd/5yqz2v2d3sxd3n879f4sg4gr0000gn/T/spring-cloud-dataflow

Copy the location of the log sink logs. This is a directory that ends in events.log . The log
files will be in stdout_0.log under this directory. You can monitor the output of the log sink
using tail , or something similar:
$tail -f /var/folders/hd/5yqz2v2d3sxd3n879f4sg4gr0000gn/T/spring-cloud-dataflow-409399206

7. Using gfsh , create and update some cache entries

gfsh>put --region /Test --key 1 --value "value 1"


gfsh>put --region /Test --key 2 --value "value 2"
gfsh>put --region /Test --key 3 --value "value 3"
gfsh>put --region /Test --key 1 --value "new value 1"

8. Observe the log output. You should see messages like:


2017-10-28 17:28:52.893 INFO 18986 --- [emfire.events-1] log sink
2017-10-28 17:28:52.893 INFO 18986 --- [emfire.events-1] log sink
2017-10-28 17:28:52.893 INFO 18986 --- [emfire.events-1] log sink
2017-10-28 17:28:52.893 INFO 18986 --- [emfire.events-1] log sink

By default, the message payload contains the updated value. Depending on your application,
you may need additional information. The data comes from EntryEvent
(https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/EntryEvent.html). You can
access any fields using the source’s cache-event-expression property. This takes a SpEL
expression bound to the EntryEvent. Try something like --cache-event-expression='{key:'+key+',new_value:'+newValue+'}'
(HINT: You will need to destroy the stream and recreate it to add this property, an exercise left
to the reader; a sketch is shown after the sample output below). Now you should see log messages like:
2017-10-28 17:28:52.893 INFO 18986 --- [emfire.events-1] log-sink
2017-10-28 17:41:24.466 INFO 18986 --- [emfire.events-1] log-sink
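
For reference, a sketch of that destroy-and-recreate cycle for the local stream defined in step 4, with the suggested cache-event-expression added (the exact quoting of the SpEL expression may need adjustment in your shell):

dataflow:>stream destroy --name events
dataflow:>stream create --name events --definition " gemfire --regionName=Test --cache-event-expression='{key:'+key+',new_value:'+newValue+'}' | log" --deploy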

9. You’re done!

3.5.3. Using the Cloud Foundry Server

Additional Prerequisites
A Cloud Foundry instance

Running instance of a rabbit service in Cloud Foundry

Running instance of the Pivotal Cloud Cache for PCF


(https://docs.pivotal.io/p-cloud-cache/1-0/developer.html) (PCC) service cloudcache in Cloud
Foundry.

The Spring Cloud Data Flow Cloud Foundry Server

The Cloud Foundry Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations/) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow-server-cloudfoundry) it yourself. If you build it
yourself, the executable jar will be in spring-cloud-dataflow-server-cloudfoundry/target


Although you can run the Data Flow Cloud Foundry Server locally and configure

 it to deploy to any Cloud Foundry instance, we will deploy the server to Cloud
Foundry as recommended.

1. Verify that the CF instance is reachable (Your endpoint urls will be different from what is shown
here).

$ cf api
API endpoint: https://api.system.io (API version: ...)

$ cf apps
Getting apps in org [your-org] / space [your-space] as user...
OK

No apps found

2. Follow the instructions to deploy the Spring Cloud Data Flow Cloud Foundry server
(https://docs.spring.io/spring-cloud-dataflow-server-cloudfoundry/docs/current/reference/htmlsingle).
Don’t worry about creating a Redis service. We won’t need it. If you are familiar with Cloud
Foundry application manifests, we recommend creating a manifest for the Data Flow
server as shown here
(https://docs.spring.io/spring-cloud-dataflow-server-
cloudfoundry/docs/current/reference/htmlsingle/#sample-manifest-template)
.

As of this writing, there is a typo on the SPRING_APPLICATION_JSON entry in


the sample manifest. SPRING_APPLICATION_JSON must be followed by : and
the JSON string must be wrapped in single quotes. Alternatively, you can
replace that line with MAVEN_REMOTE_REPOSITORIES_REPO1_URL:
repo.spring.io/libs-snapshot . If your Cloud Foundry installation is
 behind a firewall, you may need to install the stream apps used in this sample
in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/1.3.0.M2/reference/htmlsingle/#getting-
started-maven-configuration)
the server to access that repository.

3. Once you have successfully executed cf push , verify the dataflow server is running


$ cf apps
Getting apps in org [your-org] / space [your-space] as user...
OK

name              requested state   instances   memory   disk   urls
dataflow-server   started           1/1         1G       1G     dataflow-server.app.io

4. Notice that the dataflow-server application is started and ready for interaction via the url
endpoint

5. Connect the shell to the server running on Cloud Foundry, e.g., dataflow-server.app.io

$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar

____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
server-unknown:>

server-unknown:>dataflow config server http://dataflow-server.app.io


Successfully targeted http://dataflow-server.app.io
dataflow:>

Building and Running the Demo


1. Register
(https://github.com/spring-cloud/spring-cloud-dataflow/blob/master/spring-cloud-dataflow-
docs/src/main/asciidoc/streams.adoc#register-a-stream-app)
the out-of-the-box applications for the Rabbit binder


These samples assume that the Data Flow Server can access a remote Maven
repository, repo.spring.io/libs-release by default. If your Data Flow
server is running behind a firewall, or you are using a maven proxy
preventing access to public repositories, you will need to install the sample
apps in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-
started-maven-configuration)
the server accordingly. The sample applications are typically registered using
Data Flow’s bulk import facility. For example, the Shell command
dataflow:>app import --uri bit.ly/Celsius-SR1-stream-
applications-rabbit-maven (The actual URI is release and binder specific so
refer to the sample instructions for the actual URL). The bulk import URI
references a plain text file containing entries for all of the publicly available
Spring Cloud Stream and Task applications published to repo.spring.io .
 For example,
source.http=maven://org.springframework.cloud.stream.app:http-
source-rabbit:1.3.1.RELEASE registers the http source app at the
corresponding Maven address, relative to the remote repository(ies)
configured for the Data Flow server. The format is maven://<groupId>:
<artifactId>:<version> You will need to download
(https://repo.spring.io/libs-release/org/springframework/cloud/stream/app/spring-cloud-
stream-app-descriptor/Bacon.RELEASE/spring-cloud-stream-app-descriptor-
Bacon.RELEASE.rabbit-apps-maven-repo-url.properties)
the required apps or build (https://github.com/spring-cloud-stream-app-starters)
them and then install them in your Maven repository, using whatever group,
artifact, and version you choose. If you do this, register individual apps using
dataflow:>app register… using the maven:// resource URI format
corresponding to your installed app.

dataflow:>app import --uri http://bit.ly/Celsius-SR1-stream-applications-rabbit-maven

2. Get the PCC connection information


$ cf service-key cloudcache my-service-key


Getting key my-service-key for service instance cloudcache as <user>...

{
"locators": [
"10.0.16.9[55221]",
"10.0.16.11[55221]",
"10.0.16.10[55221]"
],
"urls": {
"gfsh": "http://...",
"pulse": "http://.../pulse"
},
"users": [
{
"password": <password>,
"username": "cluster_operator"
},
{
"password": <password>,
"username": "developer"
}
]
}

3. Using gfsh , connect to the PCC instance as cluster_operator using the service key values
and create the Test region.

gfsh>connect --use-http --url=<gfsh-url> --user=cluster_operator --password=<cluster_operator_password>
gfsh>create region --name Test --type=REPLICATE

4. Create the stream, connecting to the PCC instance as developer. This example creates a
gemfire source which will publish events on a region

dataflow:>stream create --name events --definition " gemfire --username=developer --password=<developer-password> --connect-type=locator --host-addresses=10.0.16.9:55221 --regionName=Test | log" --deploy

5. Verify the stream is successfully deployed

dataflow:>stream list

6. Monitor stdout for the log sink


cf logs <log-sink-app-name>

7. Using gfsh , create and update some cache entries

gfsh>connect --use-http --url=<gfsh-url> --user=cluster_operator --password=<cluster_operator_password>
gfsh>put --region /Test --key 1 --value "value 1"
gfsh>put --region /Test --key 2 --value "value 2"
gfsh>put --region /Test --key 3 --value "value 3"
gfsh>put --region /Test --key 1 --value "new value 1"

8. Observe the log output

You should see messages like:


2017-10-28 17:28:52.893 INFO 18986 --- [emfire.events-1] log sink
2017-10-28 17:28:52.893 INFO 18986 --- [emfire.events-1] log sink
2017-10-28 17:28:52.893 INFO 18986 --- [emfire.events-1] log sink
2017-10-28 17:28:52.893 INFO 18986 --- [emfire.events-1] log sink

By default, the message payload contains the updated value. Depending on your application,
you may need additional information. The data comes from EntryEvent
(https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/EntryEvent.html). You can
access any fields using the source’s cache-event-expression property. This takes a SpEL
expression bound to the EntryEvent. Try something like --cache-event-expression='{key:'+key+',new_value:'+newValue+'}'
(HINT: You will need to destroy the stream and recreate it to add this property, an exercise left
to the reader; a sketch is shown after the sample output below). Now you should see log messages like:
2017-10-28 17:28:52.893 INFO 18986 --- [emfire.events-1] log-sink
2017-10-28 17:41:24.466 INFO 18986 --- [emfire.events-1] log-sink
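
For reference, a sketch of that destroy-and-recreate cycle, reusing the connection options from step 4, with the suggested cache-event-expression added (the exact quoting of the SpEL expression may need adjustment in your shell):

dataflow:>stream destroy --name events
dataflow:>stream create --name events --definition " gemfire --username=developer --password=<developer-password> --connect-type=locator --host-addresses=10.0.16.9:55221 --regionName=Test --cache-event-expression='{key:'+key+',new_value:'+newValue+'}' | log" --deploy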

9. You’re done!

3.5.4. Summary
In this sample, you have learned:


How to use Spring Cloud Data Flow’s Local and Cloud Foundry servers

How to use Spring Cloud Data Flow’s shell

How to create a streaming data pipeline to connect and publish events from gemfire

3.6. Custom Spring Cloud Stream Processor


3.6.1. Prerequisites
A Running Data Flow Shell

The Spring Cloud Data Flow Shell is available for download


(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-deploying-
spring-cloud-dataflow)
or you can build (https://github.com/spring-cloud/spring-cloud-dataflow) it yourself.

the Spring Cloud Data Flow Shell and Local server implementation are in the
same repository and are both built by running ./mvnw install from the project
 root directory. If you have already run the build, use the jar in spring-cloud-
dataflow-shell/target

To run the Shell open a new terminal session:

$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar
____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>


The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the
Data Flow Server’s REST API and supports a DSL that simplifies the process of

 defining a stream or task and managing its lifecycle. Most of these samples use
the shell. If you prefer, you can use the Data Flow UI (localhost:9393/dashboard, or
wherever the server is hosted) to perform equivalent operations.

A running local Data Flow Server

The Local Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow) it yourself. If you build it yourself, the
executable jar will be in spring-cloud-dataflow-server-local/target

To run the Local Data Flow server, open a new terminal session:

$cd <PATH/TO/SPRING-CLOUD-DATAFLOW-LOCAL-JAR>
$java -jar spring-cloud-dataflow-server-local-<VERSION>.jar

A Java IDE

Maven (https://maven.apache.org/) Installed

A running instance of Rabbit MQ (https://www.rabbitmq.com/)

3.6.2. Creating the Custom Stream App


We will create a custom Spring Cloud Stream (https://cloud.spring.io/spring-cloud-stream/) application
and run it on Spring Cloud Data Flow. We’ll go through the steps to make a simple processor that
converts temperature from Fahrenheit to Celsius. We will be running the demo locally, but all the
steps will work in a Cloud Foundry environment as well.

1. Create a new spring cloud stream project

Create a Spring Initializr (http://start.spring.io/) project

Set the group to demo.celsius.converter and the artifact name as celsius-converter-processor

Choose a message transport binding as a dependency for the custom app There are options
for choosing Rabbit MQ or Kafka as the message transport. For this demo, we will use
rabbit . Type rabbit in the search bar under Search for dependencies and select Stream
Rabbit .


Hit the generate project button and open the new project in an IDE of your choice

2. Develop the app

We can now create our custom app. Our Spring Cloud Stream application is a Spring Boot
application that runs as an executable jar. The application will include two Java classes:

CelsiusConverterProcessorApplication.java - the main Spring Boot application class, generated by Spring Initializr

CelsiusConverterProcessorConfiguration.java - the Spring Cloud Stream code that we will write

We are creating a transformer that takes a Fahrenheit input and converts it to Celsius.
Following the same naming convention as the application file, create a new Java class in
the same package called CelsiusConverterProcessorConfiguration.java .

CelsiusConverterProcessorConfiguration.java

import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.integration.annotation.Transformer;

@EnableBinding(Processor.class)
public class CelsiusConverterProcessorConfiguration {

    @Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
    public int convertToCelsius(String payload) {
        // parse the incoming Fahrenheit value and convert it to Celsius
        int fahrenheitTemperature = Integer.parseInt(payload);
        return (fahrenheitTemperature - 32) * 5 / 9;
    }
}

Here we introduced two important Spring annotations. First we annotated the class with
@EnableBinding(Processor.class) . Second we created a method and annotated it with
@Transformer(inputChannel = Processor.INPUT, outputChannel =
Processor.OUTPUT) . By adding these two annotations we have configured this stream app
as a Processor (as opposed to a Source or a Sink ). This means that the application
receives input from an upstream application via the Processor.INPUT channel and sends
its output to a downstream application via the Processor.OUTPUT channel.

The convertToCelsius method takes a String as input for Fahrenheit and then returns
the converted Celsius as an integer. This method is very simple, but that is also the beauty
of this programming style. We can add as much logic as we want to this method to enrich
this processor. As long as we annotate it properly and return valid output, it works as a
Spring Cloud Stream Processor. Also note that it is straightforward to unit test this code.
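
Because the conversion logic is a plain method, it can be verified with a simple unit test. The following is a minimal sketch (assuming JUnit 4 is on the test classpath; this test class is not part of the generated project):

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class CelsiusConverterProcessorConfigurationTest {

    @Test
    public void convertsFahrenheitToCelsius() {
        CelsiusConverterProcessorConfiguration processor = new CelsiusConverterProcessorConfiguration();

        // 76 degrees Fahrenheit converts to 24 degrees Celsius using integer arithmetic
        assertEquals(24, processor.convertToCelsius("76"));
    }
}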

3. Build the Spring Boot application with Maven


$cd <PROJECT_DIR>
$./mvnw clean package

4. Run the Application standalone

java -jar target/celsius-converter-processor-0.0.1-SNAPSHOT.jar

If all goes well, we should have a running standalone Spring Boot Application. Once we verify
that the app is started and running without any errors, we can stop it.

3.6.3. Deploying the App to Spring Cloud Data Flow


1. Register
(https://github.com/spring-cloud/spring-cloud-dataflow/blob/master/spring-cloud-dataflow-
docs/src/main/asciidoc/streams.adoc#register-a-stream-app)
the out-of-the-box applications for the Rabbit binder


These samples assume that the Data Flow Server can access a remote Maven
repository, repo.spring.io/libs-release by default. If your Data Flow
server is running behind a firewall, or you are using a maven proxy
preventing access to public repositories, you will need to install the sample
apps in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-
started-maven-configuration)
the server accordingly. The sample applications are typically registered using
Data Flow’s bulk import facility. For example, the Shell command
dataflow:>app import --uri bit.ly/Celsius-SR1-stream-
applications-rabbit-maven (The actual URI is release and binder specific so
refer to the sample instructions for the actual URL). The bulk import URI
references a plain text file containing entries for all of the publicly available
Spring Cloud Stream and Task applications published to repo.spring.io .
 For example,
source.http=maven://org.springframework.cloud.stream.app:http-
source-rabbit:1.3.1.RELEASE registers the http source app at the
corresponding Maven address, relative to the remote repository(ies)
configured for the Data Flow server. The format is maven://<groupId>:
<artifactId>:<version> You will need to download
(https://repo.spring.io/libs-release/org/springframework/cloud/stream/app/spring-cloud-
stream-app-descriptor/Bacon.RELEASE/spring-cloud-stream-app-descriptor-
Bacon.RELEASE.rabbit-apps-maven-repo-url.properties)
the required apps or build (https://github.com/spring-cloud-stream-app-starters)
them and then install them in your Maven repository, using whatever group,
artifact, and version you choose. If you do this, register individual apps using
dataflow:>app register… using the maven:// resource URI format
corresponding to your installed app.

dataflow:>app import --uri http://bit.ly/Celsius-SR1-stream-applications-rabbit-maven

2. Register the custom processor

app register --type processor --name convertToCelsius --uri <File URL of the jar file
on the local filesystem where you built the project above> --force
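
For example, if you built the project with Maven as shown earlier, the registration might look
like the following; the file path here is illustrative, so substitute the actual location of the
jar on your machine:

app register --type processor --name convertToCelsius --uri file:///path/to/celsius-converter-processor/target/celsius-converter-processor-0.0.1-SNAPSHOT.jar --force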

3. Create the stream


We will create a stream that uses the out-of-the-box http source and log sink and our
custom convertToCelsius processor.

dataflow:>stream create --name convertToCelsiusStream --definition "http --port=9090 | convertToCelsius | log" --deploy

Created and deployed new stream 'convertToCelsiusStream'

4. Verify the stream is successfully deployed

dataflow:>stream list

5. Verify that the apps have successfully deployed

dataflow:>runtime apps

CONSOLE
2016-09-27 10:03:11.988 INFO 95234 --- [nio-9393-exec-9] o.s.c.d.spi.local.LocalAppDeplo
Logs will be in /var/folders/2q/krqwcbhj2d58csmthyq_n1nw0000gp/T/spring-cloud-dataflow
2016-09-27 10:03:12.397 INFO 95234 --- [nio-9393-exec-9] o.s.c.d.spi.local.LocalAppDeplo
Logs will be in /var/folders/2q/krqwcbhj2d58csmthyq_n1nw0000gp/T/spring-cloud-dataflow
2016-09-27 10:03:14.445 INFO 95234 --- [nio-9393-exec-9] o.s.c.d.spi.local.LocalAppDeplo
Logs will be in /var/folders/2q/krqwcbhj2d58csmthyq_n1nw0000gp/T/spring-cloud-dataflow

6. Post sample data to the http endpoint: localhost:9090 (9090 is the port we specified
for the http source in this case)

dataflow:>http post --target http://localhost:9090 --data 76


> POST (text/plain;Charset=UTF-8) http://localhost:9090 76
> 202 ACCEPTED

7. Open the log file for the convertToCelsiusStream.log app to see the output of our stream
CONSOLE
tail -f /var/folders/2q/krqwcbhj2d58csmthyq_n1nw0000gp/T/spring-cloud-dataflow-7563139704

You should see the temperature you posted converted to Celsius: 76 Fahrenheit becomes (76 - 32) * 5 / 9 = 24 Celsius (using integer division).


2016-09-27 10:05:34.933 INFO 95616 --- [CelsiusStream-1] log.sink : 24

3.6.4. Summary
In this sample, you have learned:

How to write a custom Processor stream application

How to use Spring Cloud Data Flow’s Local server

How to use Spring Cloud Data Flow’s shell application


4. Task / Batch
4.1. Batch Job on Cloud Foundry
In this demonstration, you will learn how to orchestrate short-lived data processing applications
(e.g., Spring Batch jobs) using Spring Cloud Task (http://cloud.spring.io/spring-cloud-task/) and Spring
Cloud Data Flow (http://cloud.spring.io/spring-cloud-dataflow/) on Cloud Foundry.

4.1.1. Prerequisites
Local PCFDev (https://pivotal.io/pcf-dev) instance

Local install of cf CLI (https://github.com/cloudfoundry/cli) command line tool

Running instance of mysql in PCFDev

A Running Data Flow Shell

The Spring Cloud Data Flow Shell is available for download


(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-deploying-
spring-cloud-dataflow)
or you can build (https://github.com/spring-cloud/spring-cloud-dataflow) it yourself.

the Spring Cloud Data Flow Shell and Local server implementation are in the
same repository and are both built by running ./mvnw install from the project
 root directory. If you have already run the build, use the jar in spring-cloud-
dataflow-shell/target

To run the Shell open a new terminal session:


$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar
____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>

The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the
Data Flow Server’s REST API and supports a DSL that simplifies the process of

 defining a stream or task and managing its lifecycle. Most of these samples use
the shell. If you prefer, you can use the Data Flow UI at localhost:9393/dashboard
(or wherever the server is hosted) to perform equivalent operations.

The Spring Cloud Data Flow Cloud Foundry Server running in PCFDev

The Cloud Foundry Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations/) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow-server-cloudfoundry) it yourself. If you build it
yourself, the executable jar will be in spring-cloud-dataflow-server-cloudfoundry/target

Although you can run the Data Flow Cloud Foundry Server locally and configure

 it to deploy to any Cloud Foundry instance, we will deploy the server to Cloud
Foundry as recommended.

1. Verify that the CF instance is reachable (your endpoint URLs will be different from what is shown
here).


$ cf api
API endpoint: https://api.system.io (API version: ...)

$ cf apps
Getting apps in org [your-org] / space [your-space] as user...
OK

No apps found

2. Follow the instructions to deploy the Spring Cloud Data Flow Cloud Foundry server
(https://docs.spring.io/spring-cloud-dataflow-server-cloudfoundry/docs/current/reference/htmlsingle).
Don’t worry about creating a Redis service. We won’t need it. If you are familiar with Cloud
Foundry application manifests, we recommend creating a manifest for the Data Flow
server as shown here
(https://docs.spring.io/spring-cloud-dataflow-server-
cloudfoundry/docs/current/reference/htmlsingle/#sample-manifest-template)
.

As of this writing, there is a typo on the SPRING_APPLICATION_JSON entry in


the sample manifest. SPRING_APPLICATION_JSON must be followed by : and
the JSON string must be wrapped in single quotes. Alternatively, you can
replace that line with MAVEN_REMOTE_REPOSITORIES_REPO1_URL:
repo.spring.io/libs-snapshot . If your Cloud Foundry installation is
 behind a firewall, you may need to install the stream apps used in this sample
in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/1.3.0.M2/reference/htmlsingle/#getting-
started-maven-configuration)
the server to access that repository.

3. Once you have successfully executed cf push , verify the dataflow server is running

$ cf apps
Getting apps in org [your-org] / space [your-space] as user...
OK

name requested state instances memory disk urls


dataflow-server started 1/1 1G 1G dataflow-
server.app.io

4. Notice that the dataflow-server application is started and ready for interaction via the url
endpoint


5. Connect the shell with server running on Cloud Foundry, e.g., dataflow-server.app.io

$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar

____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
server-unknown:>

server-unknown:>dataflow config server http://dataflow-server.app.io


Successfully targeted http://dataflow-server.app.io
dataflow:>

4.1.2. Building and Running the Demo

PCF 1.7.12 or greater is required to run Tasks on Spring Cloud Data Flow. As of
 this writing, PCFDev and PWS are built on versions that meet this requirement.

1. Task support needs to be enabled on pcf-dev. While logged in as admin , issue the following
command:

cf enable-feature-flag task_creation
Setting status of task_creation as admin...

OK

Feature task_creation Enabled.

For this sample, all you need is the mysql service; in PCFDev, the mysql
service comes with a different plan. From the CF CLI, create the service with: cf
 create-service p-mysql 512mb mysql and bind this service to dataflow-
server with: cf bind-service dataflow-server mysql .


All the apps deployed to PCFDev start with low memory by default. It is
 recommended to change it to at least 768MB for dataflow-server . Ditto for
every app spawned by Spring Cloud Data Flow. Change the memory by:
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_MEMORY 512 .
Likewise, we would have to skip SSL validation by:
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION true .

2. Tasks in Spring Cloud Data Flow require an RDBMS to host the task repository (see here
(http://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#spring-cloud-dataflow-task-
repository)
for more details), so let’s instruct the Spring Cloud Data Flow server to bind the mysql service
to each deployed task:

$ cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES mysql


$ cf restage dataflow-server

 We only need the mysql service for this sample.

3. As a recap, here is what you should see as configuration for the Spring Cloud Data Flow
server:

cf env dataflow-server

....
User-Provided:
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN: local.pcfdev.io
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_MEMORY: 512
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG: pcfdev-org
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD: pass
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION: false
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE: pcfdev-space
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES: mysql
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL: https://api.local.pcfdev.io
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME: user

No running env variables have been set

No staging env variables have been set


4. Notice that the dataflow-server application is started and ready for interaction via the
dataflow-server.local.pcfdev.io endpoint

5. Build and register the batch-job example


(https://github.com/spring-cloud/spring-cloud-task/tree/master/spring-cloud-task-samples/batch-job) from
Spring Cloud Task samples. For convenience, the final uber-jar artifact
(https://github.com/spring-cloud/spring-cloud-dataflow-samples/raw/master/src/main/asciidoc/tasks/simple-
batch-job/batch-job-1.3.0.BUILD-SNAPSHOT.jar)
is provided with this sample.

dataflow:>app register --type task --name simple_batch_job --uri https://github.com/spring-cloud/spring-cloud-dataflow-samples/raw/master/src/main/asciidoc/tasks/simple-batch-job/batch-job-1.3.0.BUILD-SNAPSHOT.jar

6. Create the task with the simple_batch_job application

dataflow:>task create foo --definition "simple_batch_job"

Unlike Streams, the Task definitions don’t require explicit deployment. They
 can be launched on-demand, scheduled, or triggered by streams.

7. Verify there are still no Task applications running on PCFDev - they are listed only after the
initial launch/staging attempt on PCF

$ cf apps
Getting apps in org pcfdev-org / space pcfdev-space as user...
OK

name requested state instances memory disk urls


dataflow-server started 1/1 768M 512M dataflow-
server.local.pcfdev.io

8. Let’s launch foo

dataflow:>task launch foo

9. Verify the execution of foo by tailing the logs


CONSOLE
$ cf logs foo
Retrieving logs for app foo in org pcfdev-org / space pcfdev-space as user...

2016-08-14T18:48:54.22-0700 [APP/TASK/foo/0]OUT Creating container


2016-08-14T18:48:55.47-0700 [APP/TASK/foo/0]OUT

2016-08-14T18:49:06.59-0700 [APP/TASK/foo/0]OUT 2016-08-15 01:49:06.598 INFO 14 --- [

...
...

2016-08-14T18:49:06.78-0700 [APP/TASK/foo/0]OUT 2016-08-15 01:49:06.785 INFO 14 --- [

...
...

2016-08-14T18:49:07.36-0700 [APP/TASK/foo/0]OUT 2016-08-15 01:49:07.363 INFO 14 --- [

...
...

2016-08-14T18:49:07.53-0700 [APP/TASK/foo/0]OUT 2016-08-15 01:49:07.536 INFO 14 --- [

...
...

2016-08-14T18:49:07.71-0700 [APP/TASK/foo/0]OUT Exit status 0


2016-08-14T18:49:07.78-0700 [APP/TASK/foo/0]OUT Destroying container
2016-08-14T18:49:08.47-0700 [APP/TASK/foo/0]OUT Successfully destroyed container

Verify that the job1 and job2 operations embedded in the simple-batch-job
 application are launched independently and that they return with the status
COMPLETED .

Unlike LRPs in Cloud Foundry, tasks are short-lived, so the logs aren't always
available. They are generated only when the Task application runs; at the end
 of the Task operation, the container that ran the Task application is destroyed to
free up resources.

10. List Tasks in Cloud Foundry


$ cf apps
Getting apps in org pcfdev-org / space pcfdev-space as user...
OK

name requested state instances memory disk urls


dataflow-server started 1/1 768M 512M dataflow-
server.local.pcfdev.io
foo stopped 0/1 1G 1G

11. Verify Task execution details


CONSOLE
dataflow:>task execution list
╔══════════════════════════╤══╤════════════════════════════╤════════════════════════════╤════════
║ Task Name │ID│ Start Time │ End Time │
╠══════════════════════════╪══╪════════════════════════════╪════════════════════════════╪════════
║foo │1 │Sun Aug 14 18:49:05 PDT 2016│Sun Aug 14 18:49:07 PDT 2016│
╚══════════════════════════╧══╧════════════════════════════╧════════════════════════════╧════════

12. Verify Job execution details


CONSOLE
dataflow:>job execution list
╔═══╤═══════╤═════════╤════════════════════════════╤═════════════════════╤══════════════════╗
║ID │Task ID│Job Name │ Start Time │Step Execution Count │Definition Stat
╠═══╪═══════╪═════════╪════════════════════════════╪═════════════════════╪══════════════════╣
║2 │1 │job2 │Sun Aug 14 18:49:07 PDT 2016│1 │Destroyed
║1 │1 │job1 │Sun Aug 14 18:49:06 PDT 2016│1 │Destroyed
╚═══╧═══════╧═════════╧════════════════════════════╧═════════════════════╧══════════════════╝

4.1.3. Summary
In this sample, you have learned:

How to register and orchestrate Spring Batch jobs in Spring Cloud Data Flow

How to use the cf CLI in the context of Task applications orchestrated by Spring Cloud Data
Flow

How to verify task executions and task repository

4.2. Batch File Ingest


In this demonstration, you will learn how to create a data processing application using Spring
Batch (http://projects.spring.io/spring-batch/) which will then be run within Spring Cloud Data Flow
(http://cloud.spring.io/spring-cloud-dataflow/).

4.2.1. Prerequisites
A Running Data Flow Server

The Local Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow) it yourself. If you build it yourself, the
executable jar will be in spring-cloud-dataflow-server-local/target

To run the Local Data Flow server, open a new terminal session:

$cd <PATH/TO/SPRING-CLOUD-DATAFLOW-LOCAL-JAR>
$java -jar spring-cloud-dataflow-server-local-<VERSION>.jar

A Running Data Flow Shell

The Spring Cloud Data Flow Shell is available for download


(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-deploying-
spring-cloud-dataflow)
or you can build (https://github.com/spring-cloud/spring-cloud-dataflow) it yourself.

the Spring Cloud Data Flow Shell and Local server implementation are in the
same repository and are both built by running ./mvnw install from the project
 root directory. If you have already run the build, use the jar in spring-cloud-
dataflow-shell/target

To run the Shell open a new terminal session:


$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar
____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>

The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the
Data Flow Server’s REST API and supports a DSL that simplifies the process of

 defining a stream or task and managing its lifecycle. Most of these samples use
the shell. If you prefer, you can use the Data Flow UI at localhost:9393/dashboard
(or wherever the server is hosted) to perform equivalent operations.

4.2.2. Batch File Ingest Demo Overview


The source for the demo project is located here
(https://github.com/spring-cloud/spring-cloud-dataflow-samples/tree/master/batch/file-ingest). The sample is
a Spring Boot application that demonstrates how to read data from a flat file, perform processing
on the records, and store the transformed data into a database using Spring Batch.

The key classes for creating the batch job are:

BatchConfiguration.java - this is where we define our batch job, the step and components
that are used to read, process, and write our data. In the sample we use a FlatFileItemReader
which reads a delimited file, a custom PersonItemProcessor to transform the data, and a
JdbcBatchItemWriter to write our data to a database.

Person.java - the domain object representing the data we are reading and processing in our
batch job. The sample data contains records made up of a person's first and last name.

PersonItemProcessor.java - this class is an ItemProcessor implementation which
receives records after they have been read and before they are written. This allows us to
transform the data between these two steps. In our sample ItemProcessor implementation,
we simply transform the first and last name of each Person to uppercase characters (see
the sketch after this list).

Application.java - the main entry point into the Spring Boot application which is used to
launch the batch job
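
As a point of reference, here is a minimal sketch of such an ItemProcessor. It assumes a
Person class with firstName and lastName properties and a two-argument constructor; the
actual class in the sample may differ in detail.

import org.springframework.batch.item.ItemProcessor;

public class PersonItemProcessor implements ItemProcessor<Person, Person> {

    @Override
    public Person process(Person person) {
        // Upper-case both names before the writer inserts the record into the database.
        // Returning null here would instead filter the item out of the step.
        return new Person(person.getFirstName().toUpperCase(),
                person.getLastName().toUpperCase());
    }
}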

Resource files are included to set up the database and provide sample data:

schema-all.sql - this is the database schema that will be created when the application starts
up. In this sample, an in-memory database is created on start up and destroyed when the
application exits.

data.csv - sample data file containing person records used in the demo

This example expects to use the Spring Cloud Data Flow Server’s embedded H2

 database. If you wish to use another database, be sure to add the correct
dependencies to the pom.xml and update the schema-all.sql.

4.2.3. Building and Running the Demo


1. Build the demo JAR

$ mvn clean package

2. Register the task

dataflow:>app register --name fileIngest --type task --uri file:///path/to/target/ingest-X.X.X.jar
Successfully registered application 'task:fileIngest'
dataflow:>

3. Create the task

dataflow:>task create fileIngestTask --definition fileIngest


Created new task 'fileIngestTask'
dataflow:>

4. Launch the task

dataflow:>task launch fileIngestTask --arguments "localFilePath=classpath:data.csv"


Launched task 'fileIngestTask'
dataflow:>


5. Inspect logs

The log file path for the launched task can be found in the local server output, for example:
CONSOLE
2017-10-27 14:58:18.112 INFO 19485 --- [nio-9393-exec-6] o.s.c.d.spi.local.LocalTaskLaun
Logs will be in /var/folders/6x/tgtx9xbn0x16xq2sx1j2rld80000gn/T/spring-cloud-dataflow

6. Verify Task execution details


CONSOLE
dataflow:>task execution list
╔══════════════╤══╤════════════════════════════╤════════════════════════════╤═════════╗
║ Task Name │ID│ Start Time │ End Time │Exit Code║
╠══════════════╪══╪════════════════════════════╪════════════════════════════╪═════════╣
║fileIngestTask│1 │Fri Oct 27 14:58:20 EDT 2017│Fri Oct 27 14:58:20 EDT 2017│0 ║
╚══════════════╧══╧════════════════════════════╧════════════════════════════╧═════════╝

7. Verify Job execution details


CONSOLE
dataflow:>job execution list
╔═══╤═══════╤═════════╤════════════════════════════╤═════════════════════╤══════════════════╗
║ID │Task ID│Job Name │ Start Time │Step Execution Count │Definition Stat
╠═══╪═══════╪═════════╪════════════════════════════╪═════════════════════╪══════════════════╣
║1 │1 │ingestJob│Fri Oct 27 14:58:20 EDT 2017│1 │Created
╚═══╧═══════╧═════════╧════════════════════════════╧═════════════════════╧══════════════════╝

4.2.4. Summary
In this sample, you have learned:

How to create a data processing batch job application

How to register and orchestrate Spring Batch jobs in Spring Cloud Data Flow

How to verify status via logs and shell commands


5. Stream Launching Batch Job


5.1. Batch File Ingest - SFTP Demo
In the Batch File Ingest sample we built a Spring Batch (https://projects.spring.io/spring-batch)
application that Spring Cloud Data Flow (https://cloud.spring.io/spring-cloud-dataflow) launched as a
task to process a file. This time we will build on that sample to create and deploy a stream
(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#spring-cloud-dataflow-
streams)
that launches that task. The stream will poll an SFTP server and, for each new file that it finds,
will download the file and launch the batch job to process it.

The source for the demo project is located in the batch/file-ingest directory at the top-level
of this repository.

5.1.1. Prerequisites
A Running Data Flow Shell

The Spring Cloud Data Flow Shell is available for download


(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-deploying-
spring-cloud-dataflow)
or you can build (https://github.com/spring-cloud/spring-cloud-dataflow) it yourself.

the Spring Cloud Data Flow Shell and Local server implementation are in the
same repository and are both built by running ./mvnw install from the project
 root directory. If you have already run the build, use the jar in spring-cloud-
dataflow-shell/target

To run the Shell open a new terminal session:


$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar
____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>

The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the
Data Flow Server’s REST API and supports a DSL that simplifies the process of

 defining a stream or task and managing its lifecycle. Most of these samples use
the shell. If you prefer, you can use the Data Flow UI at localhost:9393/dashboard
(or wherever the server is hosted) to perform equivalent operations.

5.1.2. Using the Local Server

Additional Prerequisites
A running local Data Flow Server

The Local Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow) it yourself. If you build it yourself, the
executable jar will be in spring-cloud-dataflow-server-local/target

To run the Local Data Flow server, open a new terminal session:

$cd <PATH/TO/SPRING-CLOUD-DATAFLOW-LOCAL-JAR>
$java -jar spring-cloud-dataflow-server-local-<VERSION>.jar

Running instance of Kafka (http://kafka.apache.org/downloads.html)

Either a remote or local host accepting SFTP connections.


A database tool such as DBeaver (https://dbeaver.jkiss.org/download/) to inspect the database


contents

To simplify the dependencies and configuration in this example, we will use our
 local machine acting as an SFTP server.

Building and Running the Demo


1. Build the demo JAR

From the root of this project:

$ cd batch/file-ingest
$ mvn clean package

For convenience, you can skip this step. The jar is published to the Spring
Maven repository
 (https://repo.spring.io/libs-snapshot-
local/io/spring/cloud/dataflow/ingest/ingest/1.0.0.BUILD-SNAPSHOT/)

2. Create the data directories

Now we create a remote directory on the SFTP server and a local directory where the batch
job expects to find files.

If you are using a remote SFTP server, create the remote directory on the

 SFTP server. Since we are using the local machine as the SFTP server, we will
create both the local and remote directories on the local machine.

$ mkdir -p /tmp/remote-files /tmp/local-files

3. Register the sftp-dataflow source and the task-launcher-dataflow sink

With our Spring Cloud Data Flow server running, we register the sftp-dataflow source and
task-launcher-dataflow sink. The sftp-dataflow source application will do the work of
polling the remote directory for new files and downloading them to the local directory. As
each file is received, it emits a message for the task-launcher-dataflow sink to launch the
task to process the data from that file.

In the Spring Cloud Data Flow shell:


CONSOLE
dataflow:>app register --name sftp --type source --uri maven://org.springframework.cloud
Successfully registered application 'source:sftp'
dataflow:>app register --name task-launcher --type sink --uri maven://org.springframework
Successfully registered application 'sink:task-launcher'

4. Register and create the file ingest task. If you’re using the published jar, set --uri
maven://io.spring.cloud.dataflow.ingest:ingest:1.0.0.BUILD-SNAPSHOT :

CONSOLE
dataflow:>app register --name fileIngest --type task --uri file:///path/to/target/ingest-
Successfully registered application 'task:fileIngest'
dataflow:>task create fileIngestTask --definition fileIngest
Created new task 'fileIngestTask'

5. Create and deploy the stream

Now let's create and deploy the stream. Once deployed, the stream will start polling the SFTP
server and, when new files arrive, launch the batch job.

Replace <user> and <pass> below. These values are the credentials for the
local (or remote) user. If not using a local SFTP server, specify the host using
 the --host , and optionally --port , parameters. If not defined, host
defaults to 127.0.0.1 and port defaults to 22 .

CONSOLE
dataflow:>stream create --name inboundSftp --definition "sftp --username=<user> --passwor
Created new stream 'inboundSftp'
Deployment request has been sent

6. Verify Stream deployment

We can see the status of the streams to be deployed with stream list , for example:


CONSOLE
dataflow:>stream list
╔═══════════╤════════════════════════════════════════════════════════════════════════════════════
║Stream Name│ Stream Definition
╠═══════════╪════════════════════════════════════════════════════════════════════════════════════
║inboundSftp│sftp --password='******' --remote-dir=/tmp/remote-files/ --local-dir=/tmp/lo
║ │--allow-unknown-keys=true --username=<user> | task-launcher
╚═══════════╧════════════════════════════════════════════════════════════════════════════════════

7. Inspect logs

In the event the stream failed to deploy, or you would like to inspect the logs for any reason,
you can get the location of the logs for the applications created for the inboundSftp stream using
the runtime apps command:
CONSOLE
dataflow:>runtime apps
╔═══════════════════════════╤═══════════╤════════════════════════════════════════════════════════
║ App Id / Instance Id │Unit Status│
╠═══════════════════════════╪═══════════╪════════════════════════════════════════════════════════
║inboundSftp.sftp │ deployed │
╟┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┼┈┈┈┈┈┈┈┈┈┈┈┼┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
║ │ │ guid = 23057
║ │ │ pid = 71927
║ │ │ port = 23057
║inboundSftp.sftp-0 │ deployed │ stderr = /var/folders/hd/5yqz2v2d3sxd3n879f
║ │ │ stdout = /var/folders/hd/5yqz2v2d3sxd3n879f
║ │ │ url = http://192.168.64.1:23057
║ │ │working.dir = /var/folders/hd/5yqz2v2d3sxd3n879f
╟───────────────────────────┼───────────┼────────────────────────────────────────────────────────
║inboundSftp.task-launcher │ deployed │
╟┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┼┈┈┈┈┈┈┈┈┈┈┈┼┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
║ │ │ guid = 60081
║ │ │ pid = 71926
║ │ │ port = 60081
║inboundSftp.task-launcher-0│ deployed │ stderr = /var/folders/hd/5yqz2v2d3sxd3n879f
║ │ │ stdout = /var/folders/hd/5yqz2v2d3sxd3n879f
║ │ │ url = http://192.168.64.1:60081
║ │ │working.dir = /var/folders/hd/5yqz2v2d3sxd3n879f
╚═══════════════════════════╧═══════════╧════════════════════════════════════════════════════════

8. Add data


Normally data would be uploaded to an SFTP server. We will simulate this by copying a file
into the directory specified by --remote-dir . Sample data can be found in the data/
directory of the Batch File Ingest project.

Copy data/name-list.csv into the /tmp/remote-files directory which the SFTP source is
monitoring. When this file is detected, the sftp source will download it to the /tmp/local-
files directory specified by --local-dir , and emit a Task Launch Request. The Task
Launch Request includes the name of the task to launch along with the local file path, given as
the command line argument localFilePath . Spring Batch binds each command line
argument to a corresponding JobParameter. The FileIngestTask job processes the file given by
the JobParameter named localFilePath . The task-launcher sink polls for messages using
an exponential back-off. Since there have not been any recent requests, the task will launch
within 30 seconds after the request is published.

$ cp data/name-list.csv /tmp/remote-files

When the batch job launches, you will see something like this in the SCDF console log:
CONSOLE
2018-10-26 16:47:24.879 INFO 86034 --- [nio-9393-exec-7] o.s.c.d.spi.local.LocalTaskLaun
2018-10-26 16:47:25.100 INFO 86034 --- [nio-9393-exec-7] o.s.c.d.spi.local.LocalTaskLaun
Logs will be in /var/folders/hd/5yqz2v2d3sxd3n879f4sg4gr0000gn/T/fileIngestTask3100511

9. Inspect Job Executions

After data is received and the batch job runs, it will be recorded as a Job Execution. We can
view job executions by, for example, issuing the following command in the Spring Cloud Data
Flow shell:
CONSOLE
dataflow:>job execution list
╔═══╤═══════╤═════════╤════════════════════════════╤═════════════════════╤══════════════════╗
║ID │Task ID│Job Name │ Start Time │Step Execution Count │Definition Stat
╠═══╪═══════╪═════════╪════════════════════════════╪═════════════════════╪══════════════════╣
║1 │1 │ingestJob│Tue May 01 23:34:05 EDT 2018│1 │Created
╚═══╧═══════╧═════════╧════════════════════════════╧═════════════════════╧══════════════════╝

As well as list more details about that specific job execution:


CONSOLE
dataflow:>job execution display --id 1
╔═══════════════════════════════════════╤══════════════════════════════╗
║ Key │ Value ║
╠═══════════════════════════════════════╪══════════════════════════════╣
║Job Execution Id │1 ║
║Task Execution Id │1 ║
║Task Instance Id │1 ║
║Job Name │ingestJob ║
║Create Time │Fri Oct 26 16:57:51 EDT 2018 ║
║Start Time │Fri Oct 26 16:57:51 EDT 2018 ║
║End Time │Fri Oct 26 16:57:53 EDT 2018 ║
║Running │false ║
║Stopping │false ║
║Step Execution Count │1 ║
║Execution Status │COMPLETED ║
║Exit Status │COMPLETED ║
║Exit Message │ ║
║Definition Status │Created ║
║Job Parameters │ ║
║-spring.cloud.task.executionid(STRING) │1 ║
║run.id(LONG) │1 ║
║localFilePath(STRING) │/tmp/local-files/name-list.csv║
╚═══════════════════════════════════════╧══════════════════════════════╝

10. Verify data

When the batch job runs, it processes the file in the local directory /tmp/local-files ,
transforming each item to an uppercase name and inserting it into the database.

You may use any database tool that supports the H2 database to inspect the data. In this
example we use the database tool DBeaver. Let's inspect the table to ensure our data was
processed correctly.

Within DBeaver, create a connection to the database using the JDBC URL
jdbc:h2:tcp://localhost:19092/mem:dataflow , and user sa with no password. When
connected, expand the PUBLIC schema, then expand Tables and then double click on the
table PEOPLE . When the table data loads, click the "Data" tab to view the data.
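
Alternatively, you can run a simple query against the table from the SQL editor; only the
PEOPLE table name is taken from this sample, so adjust the query if your schema differs:

SELECT * FROM PEOPLE;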

11. You’re done!

5.1.3. Using the Cloud Foundry Server

Additional Prerequisites


Running this demo in Cloud Foundry requires a shared file system that is
accessed by apps running in different containers. This feature is provided by NFS
Volume Services

 (https://docs.pivotal.io/pivotalcf/2-3/devguide/services/using-vol-services.html). To use
Volume Services with SCDF, it is required that we provide nfs configuration via
cf create-service rather than cf bind-service . Cloud Foundry introduced
the cf create-service configuration option for Volume Services in version 2.3.

A Cloud Foundry instance v2.3+ with NFS Volume Services enabled


(https://docs.pivotal.io/pivotalcf/2-3/opsguide/enable-vol-services.html)

An SFTP server accessible from the Cloud Foundry instance

An nfs service instance properly configured

For this example, we use an NFS host configured to allow read-write access
(https://www.tldp.org/HOWTO/NFS-HOWTO/server.html) to the Cloud Foundry instance.
Create the nfs service instance using a command as below, where share
 specifies the NFS host and shared directory ( /export ), uid and gid specify an
account that has read-write access to the shared directory, and mount is the
container's mount path for each application bound to nfs :

$ cf create-service nfs Existing nfs -c '{"share":"<nfs_host_ip>/export","uid":"<uid>","gid":"<gid>","mount":"/var/scdf"}'

A mysql service instance

A rabbit service instance

PivotalMySQLWeb (https://github.com/pivotal-cf/PivotalMySQLWeb) or another database tool to view


the data

The Spring Cloud Data Flow Cloud Foundry Server

The Cloud Foundry Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations/) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow-server-cloudfoundry) it yourself. If you build it
yourself, the executable jar will be in spring-cloud-dataflow-server-cloudfoundry/target


Although you can run the Data Flow Cloud Foundry Server locally and configure

 it to deploy to any Cloud Foundry instance, we will deploy the server to Cloud
Foundry as recommended.

1. Verify that the CF instance is reachable (your endpoint URLs will be different from what is shown
here).

$ cf api
API endpoint: https://api.system.io (API version: ...)

$ cf apps
Getting apps in org [your-org] / space [your-space] as user...
OK

No apps found

2. Follow the instructions to deploy the Spring Cloud Data Flow Cloud Foundry server
(https://docs.spring.io/spring-cloud-dataflow-server-cloudfoundry/docs/current/reference/htmlsingle).
Don’t worry about creating a Redis service. We won’t need it. If you are familiar with Cloud
Foundry application manifests, we recommend creating a manifest for the Data Flow
server as shown here
(https://docs.spring.io/spring-cloud-dataflow-server-
cloudfoundry/docs/current/reference/htmlsingle/#sample-manifest-template)
.

As of this writing, there is a typo on the SPRING_APPLICATION_JSON entry in


the sample manifest. SPRING_APPLICATION_JSON must be followed by : and
the JSON string must be wrapped in single quotes. Alternatively, you can
replace that line with MAVEN_REMOTE_REPOSITORIES_REPO1_URL:
repo.spring.io/libs-snapshot . If your Cloud Foundry installation is
 behind a firewall, you may need to install the stream apps used in this sample
in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/1.3.0.M2/reference/htmlsingle/#getting-
started-maven-configuration)
the server to access that repository.

3. Once you have successfully executed cf push , verify the dataflow server is running


$ cf apps
Getting apps in org [your-org] / space [your-space] as user...
OK

name requested state instances memory disk urls


dataflow-server started 1/1 1G 1G dataflow-
server.app.io

4. Notice that the dataflow-server application is started and ready for interaction via the url
endpoint

5. Connect the shell with server running on Cloud Foundry, e.g., dataflow-server.app.io

$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar

____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
server-unknown:>

server-unknown:>dataflow config server http://dataflow-server.app.io


Successfully targeted http://dataflow-server.app.io
dataflow:>

Configuring the SCDF server


For convenience, we will configure the SCDF server to bind all stream and task apps to the nfs
service. Using the Cloud Foundry CLI, set the following environment variables (or set them in the
manifest):

cf set-env <dataflow-server-app-name> SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES rabbitmq,nfs
cf set-env <dataflow-server-app-name> SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES mysql,nfs


Normally, for security and operational efficiency, we may want more fine-grained
 control of which apps bind to the nfs service. One way to do this is to set
deployment properties when creating and deploying the stream, as shown below.

Running the Demo


The source code for the Batch File Ingest batch job is located in batch/file-ingest . The
resulting executable jar file must be available in a location that is accessible to your Cloud
Foundry instance, such as an HTTP server or Maven repository. For convenience, the jar is
published to the Spring Maven repository
(https://repo.spring.io/libs-snapshot-local/io/spring/cloud/dataflow/ingest/ingest/1.0.0.BUILD-SNAPSHOT/)

1. Create the remote directory

Create a directory on the SFTP server where the sftp source will detect files and download
them for processing. This path must exist prior to running the demo and can be any location
that is accessible by the configured SFTP user. On the SFTP server create a directory called
remote-files , for example:

sftp> mkdir remote-files

2. Create a shared NFS directory

Create a directory on the NFS server that is accessible to the user, specified by uid and gid ,
used to create the nfs service:

$ sudo mkdir /export/shared-files


$ sudo chown <uid>:<gid> /export/shared-files

3. Register the sftp-dataflow source and the task-launcher-dataflow sink

With our Spring Cloud Data Flow server running, we register the sftp-dataflow source and
task-launcher-dataflow sink. The sftp-dataflow source application will do the work of
polling the remote directory for new files and downloading them to the local directory. As
each file is received, it emits a message for the task-launcher-dataflow sink to launch the
task to process the data from that file.

In the Spring Cloud Data Flow shell:


CONSOLE
dataflow:>app register --name sftp --type source --uri maven://org.springframework.cloud
Successfully registered application 'source:sftp'
dataflow:>app register --name task-launcher --type sink --uri maven://org.springframework
Successfully registered application 'sink:task-launcher'

4. Register and create the file ingest task:


CONSOLE
dataflow:>app register --name fileIngest --type task --uri maven://io.spring.cloud.datafl
Successfully registered application 'task:fileIngest'
dataflow:>task create fileIngestTask --definition fileIngest
Created new task 'fileIngestTask'

5. Create and deploy the stream

Now let's create and deploy the stream. Once deployed, the stream will start polling the SFTP
server and, when new files arrive, launch the batch job.

Replace <user> , <pass> , and <host> below. The <host> is the SFTP
server host, and the <user> and <pass> values are the credentials for the
remote user. Additionally, replace
 --spring.cloud.dataflow.client.server-uri=http://<dataflow-server-route>
with the URL of your dataflow server, as shown by cf apps . If you
have security enabled for the SCDF server, set the appropriate
spring.cloud.dataflow.client options.


CONSOLE
dataflow:> app info --name task-launcher --type sink
╔══════════════════════════════╤══════════════════════════════╤══════════════════════════════╤═══
║ Option Name │ Description │ Default
╠══════════════════════════════╪══════════════════════════════╪══════════════════════════════╪═══
║spring.cloud.dataflow.client.a│The login username. │<none>
║uthentication.basic.username │ │
║spring.cloud.dataflow.client.a│The login password. │<none>
║uthentication.basic.password │ │
║trigger.max-period │The maximum polling period in │30000
║ │milliseconds. Will be set to │
║ │period if period > maxPeriod. │
║trigger.period │The polling period in │1000
║ │milliseconds. │
║trigger.initial-delay │The initial delay in │1000
║ │milliseconds. │
║spring.cloud.dataflow.client.s│Skip Ssl validation. │true
║kip-ssl-validation │ │
║spring.cloud.dataflow.client.e│Enable Data Flow DSL access. │false
║nable-dsl │ │
║spring.cloud.dataflow.client.s│The Data Flow server URI. │http://localhost:9393
║erver-uri │ │
╚══════════════════════════════╧══════════════════════════════╧══════════════════════════════╧═══

Since we configured the SCDF server to bind all stream and task apps to the nfs service, no
deployment parameters are required.

CONSOLE
dataflow:>stream create inboundSftp --definition "sftp --username=<user> --password=<pass
Created new stream 'inboundSftp'
dataflow:>stream deploy inboundSftp
Deployment request has been sent for stream 'inboundSftp'

Alternatively, we can bind the nfs service to the fileIngestTask by passing deployment
properties to the task via the task launch request in the stream definition: --
task.launch.request.deployment-
properties=deployer.*.cloudfoundry.services=nfs

CONSOLE
dataflow:>stream deploy inboundSftp --properties "deployer.sftp.cloudfoundry.services=nfs

6. Verify Stream deployment


The status of the stream to be deployed can be queried with stream list , for example:
CONSOLE
dataflow:>stream list
╔═══════════╤════════════════════════════════════════════════════════════════════════════════════
║Stream Name│ Stream Defi
╠═══════════╪════════════════════════════════════════════════════════════════════════════════════
║inboundSftp│sftp --task.launch.request.deployment-properties='deployer.*.cloudfoundry.se
║ │--remote-dir=remote-files --local-dir=/var/scdf/shared-files/ --task.launch.
║ │--username=<user> | task-launcher --spring.cloud.dataflow.client.server-uri=
╚═══════════╧════════════════════════════════════════════════════════════════════════════════════

7. Inspect logs

In the event the stream failed to deploy, or you would like to inspect the logs for any reason,
the logs can be obtained from individual applications. First list the deployed apps:

CONSOLE
$ cf apps
Getting apps in org cf_org / space cf_space as cf_user...
OK

name requested state instan


dataflow-server started 1/1
dataflow-server-N5RYLDj-inboundSftp-sftp started 1/1
dataflow-server-N5RYLDj-inboundSftp-task-launcher-cloudfoundry started 1/1

In this example, the logs for the SFTP application can be viewed by:

CONSOLE
cf logs dataflow-server-N5RYLDj-inboundSftp-sftp --recent

The log files of this application would be useful to debug issues such as SFTP connection
failures.

Additionally, the logs for the task-launcher application can be viewed by:

cf logs dataflow-server-N5RYLDj-inboundSftp-task-launcher --recent

8. Add data


Sample data can be found in the data/ directory of the Batch File Ingest project. Connect to
the SFTP server and upload data/name-list.csv into the remote-files directory which the
SFTP source is monitoring. When this file is detected, the sftp source will download it to the
/var/scdf/shared-files directory specified by --local-dir , and emit a Task Launch
Request. The Task Launch Request includes the name of the task to launch along with the
local file path, given as a command line argument. Spring Batch binds each command line
argument to a corresponding JobParameter. The FileIngestTask job processes the file given by
the JobParameter named localFilePath . The task-launcher sink polls for messages using
an exponential back-off. Since there have not been any recent requests, the task will launch
within 30 seconds after the request is published.

9. Inspect Job Executions

After data is received and the batch job runs, it will be recorded as a Job Execution. We can
view job executions by, for example, issuing the following command in the Spring Cloud Data
Flow shell:
CONSOLE
dataflow:>job execution list
╔═══╤═══════╤═════════╤════════════════════════════╤═════════════════════╤══════════════════╗
║ID │Task ID│Job Name │ Start Time │Step Execution Count │Definition Stat
╠═══╪═══════╪═════════╪════════════════════════════╪═════════════════════╪══════════════════╣
║1 │1 │ingestJob│Thu Jun 07 13:46:42 EDT 2018│1 │Created
╚═══╧═══════╧═════════╧════════════════════════════╧═════════════════════╧══════════════════╝

As well as list more details about that specific job execution:


CONSOLE
dataflow:>job execution display --id 1
╔═══════════════════════════════════════════╤════════════════════════════════════╗
║ Key │ Value ║
╠═══════════════════════════════════════════╪════════════════════════════════════╣
║Job Execution Id │1 ║
║Task Execution Id │1 ║
║Task Instance Id │1 ║
║Job Name │ingestJob ║
║Create Time │Wed Oct 31 03:17:34 EDT 2018 ║
║Start Time │Wed Oct 31 03:17:34 EDT 2018 ║
║End Time │Wed Oct 31 03:17:34 EDT 2018 ║
║Running │false ║
║Stopping │false ║
║Step Execution Count │1 ║
║Execution Status │COMPLETED ║
║Exit Status │COMPLETED ║
║Exit Message │ ║
║Definition Status │Created ║
║Job Parameters │ ║
║-spring.cloud.task.executionid(STRING) │1 ║
║run.id(LONG) │1 ║
║localFilePath(STRING) │/var/scdf/shared-files/name_list.csv║
╚═══════════════════════════════════════════╧════════════════════════════════════╝

10. Verify data

When the batch job runs, it processes the file in the local directory /var/scdf/shared-files ,
transforming each item to an uppercase name and inserting it into the database.

Use PivotalMySQLWeb (https://github.com/pivotal-cf/PivotalMySQLWeb) to inspect the data.

5.1.4. Limiting Concurrent Task Executions


The Batch File Ingest - SFTP Demo processes a single file with 5000+ items. What if we copy 100
files to the remote directory? The sftp source will process them immediately, generating 100 task
launch requests. The Dataflow Server launches tasks asynchronously so this could potentially
overwhelm the resources of the runtime platform. For example, when running the Data Flow
server on your local machine, each launched task creates a new JVM. In Cloud Foundry, each task
creates a new container instance.

Fortunately, Spring Cloud Data Flow 1.7 introduced new features to manage concurrently
running tasks, including a new configuration parameter,
spring.cloud.dataflow.task.maximum-concurrent-tasks , to limit the number of


concurrently running tasks


(http://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#spring-cloud-dataflow-task-
limit-concurrent-executions)
. We can use this demo to see how this works.

Configuring the SCDF server


Set the maximum concurrent tasks to 3. For the local server, restart the server, adding a
command line argument --spring.cloud.dataflow.task.maximum-concurrent-tasks=3 .

For the Cloud Foundry server, cf set-env <dataflow-server>


SPRING_CLOUD_DATAFLOW_TASK_MAXIMUM_CONCURRENT_TASKS 3 , and restage.

Running the demo


Follow the main demo instructions but change the Add Data step, as described below.

1. Monitor the task launcher

Tail the logs on the task-launcher app.

If there are no requests in the input queue, you will see something like:

CONSOLE
07:42:51.760 INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : No task launch request rec
07:42:53.768 INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : No task launch request rec
07:42:57.780 INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : No task launch request rec
07:43:05.791 INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : No task launch request rec
07:43:21.801 INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : No task launch request rec
07:43:51.811 INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : No task launch request rec
07:44:21.824 INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : No task launch request rec
07:44:51.834 INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : No task launch request rec

The first three messages show the exponential backoff at start up or after processing the final
request. The last three messages show the task launcher in a steady state of polling for
messages every 30 seconds. Of course, these values are configurable (see the example at the
end of this step).

The task launcher sink polls the input destination. The polling period adjusts according to the
presence of task launch requests and also to the number of currently running tasks reported
via the Data Flow server’s tasks/executions/current REST endpoint. The sink queries this
endpoint and will pause polling the input for new requests if the number of concurrent tasks
is at its limit. This introduces a 1-30 second lag between the creation of the task launch
request and the execution of the request, sacrificing some performance for resilience. Task


launch requests will never be sent to a dead letter queue because the server is busy or
unavailable. The exponential backoff also prevents the app from querying the server
excessively when there are no task launch requests.

You can also monitor the Data Flow server:

CONSOLE
$ watch curl <dataflow-server-url>/tasks/executions/current
Every 2.0s: curl http://localhost:9393/tasks/executions/current

% Total % Received % Xferd


Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0100 53
{"maximumTaskExecutions":3,"runningExecutionCount":0}

2. Add Data

The directory batch/file-ingest/data/split contains the contents of batch/file-


ingest/data/name-list.csv split into 20 files, not 100 but enough to illustrate the concept.
Upload these files to the SFTP remote directory, e.g.,

sftp>cd remote-files
sftp>lcd batch/file-ingest/data/split
sftp>mput *

Or if using the local machine as the SFTP server:

>cp * /tmp/remote-files

In the task-launcher logs, you should now see:


CONSOLE
INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : Polling period reset to 1000 ms.
INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : Launching Task fileIngestTask
WARN o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : Data Flow server has reached its concurrent
INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : Polling paused- increasing polling period to
INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : Polling resumed
INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : Launching Task fileIngestTask
INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : Polling period reset to 1000 ms.
WARN o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : Data Flow server has reached its concurrent
INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : Polling paused- increasing polling period to
INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : Polling resumed
INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : Launching Task fileIngestTask
INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : Polling period reset to 1000 ms.
INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : Launching Task fileIngestTask
INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : Launching Task fileIngestTask
WARN o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : Data Flow server has reached its concurrent
INFO o.s.c.s.a.t.l.d.s.LaunchRequestConsumer : Polling paused- increasing polling period to
...

5.1.5. Avoid Duplicate Processing


The sftp source will not process files that it has already seen. It uses a Metadata Store
(https://docs.spring.io/spring-integration/docs/current/reference/html/system-management-
chapter.html#metadata-store)
to keep track of files by extracting content from messages at runtime. Out of the box, it uses an in-
memory Metadata Store. Thus, if we re-deploy the stream, this state is lost and files will be
reprocessed. Thanks to the magic of Spring, we can inject one of the available persistent Metadata
Stores.

In this example, we will use the JDBC Metadata Store


(https://github.com/spring-cloud-stream-app-starters/core/tree/master/common/app-starters-metadata-store-
common#jdbc)
since we are already using a database.
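Conceptually, a persistent store simply replaces the in-memory map with a table-backed implementation keyed by file name. The snippet below is a rough sketch of what a JDBC-backed metadata store bean amounts to; the metadata-store support referenced above is expected to wire this up for you once the JDBC dependencies below are on the classpath, so the class shown here is for illustration only.

JAVA
import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.jdbc.metadata.JdbcMetadataStore;
import org.springframework.integration.metadata.ConcurrentMetadataStore;

// Illustrative sketch only: a JDBC-backed metadata store using the application's DataSource.
@Configuration
public class MetadataStoreSketch {

    @Bean
    public ConcurrentMetadataStore metadataStore(DataSource dataSource) {
        // Persists the processed-file keys in the INT_METADATA_STORE table instead of in memory.
        return new JdbcMetadataStore(dataSource);
    }
}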

1. Configure and Build the SFTP source

For this we add some JDBC dependencies to the sftp-dataflow source.

Clone the sftp (https://github.com/spring-cloud-stream-app-starters/sftp) stream app starter. From the sftp directory, run the following, replacing <binder> below with kafka or rabbit as appropriate for your configuration:

$ ./mvnw clean install -DskipTests -PgenerateApps


$ cd apps/sftp-dataflow-source-<binder>


Add the following dependencies to pom.xml :

<dependency>
<groupId>org.springframework.integration</groupId>
<artifactId>spring-integration-jdbc</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
</dependency>

If you are running on a local server with the in-memory H2 database, set the JDBC URL in
src/main/resources/application.properties to use the Data Flow server’s database:

spring.datasource.url=jdbc:h2:tcp://localhost:19092/mem:dataflow

If you are running in Cloud Foundry, we will bind the source to the mysql service. Add the
following property to src/main/resources/application.properties :

spring.integration.jdbc.initialize-schema=always

Build the app:

$./mvnw clean package

2. Register the jar

If running in Cloud Foundry, the resulting executable jar file must be available in a location
that is accessible to your Cloud Foundry instance, such as an HTTP server or Maven
repository. If running on a local server:

dataflow>app register --name sftp --type source --uri file:<project-directory>/sftp/apps/sftp-dataflow-source-kafka/target/sftp-dataflow-source-kafka-X.X.X.jar --force

3. Run the Demo


Follow the instructions for building and running the main SFTP File Ingest demo, for your
preferred platform, up to the Add Data Step . If you have already completed the main
exercise, restore the data to its initial state, and redeploy the stream:

Clean the data directories (e.g., tmp/local-files and tmp/remote-files )

Execute the SQL command DROP TABLE PEOPLE; in the database

Undeploy the stream, and deploy it again to run the updated sftp source

If you are running in Cloud Foundry, set the deployment properties to bind sftp to the
mysql service. For example:

dataflow>stream deploy inboundSftp --properties "deployer.sftp.cloudfoundry.services=nfs,mysql"

4. Add Data

Let’s use one small file for this. The directory batch/file-ingest/data/split contains the
contents of batch/file-ingest/data/name-list.csv split into 20 files. Upload one of
them:

sftp>cd remote-files
sftp>lcd batch/file-ingest/data/split
sftp>put names_aa.csv

Or if using the local machine as the SFTP server:

$cp names_aa.csv /tmp/remote-files

5. Inspect data

Using a Database browser, as described in the main demo, view the contents of the
INT_METADATA_STORE table.

Figure 1. JDBC Metadata Store


Note that there is a single key-value pair, where the key identifies the file name (the prefix
sftpSource/ provides a namespace for the sftp source app) and the value is a timestamp
indicating when the message was received. The metadata store tracks files that have already
been processed. This prevents the same files from being pulled from the remote directory on
every polling cycle. Only new files, or files that have been updated, will be processed. Since
there are no uniqueness constraints on the data, a file processed multiple times by our batch
job will result in duplicate entries.

If we view the PEOPLE table, it should look something like this:

Figure 2. People Data

Now let’s update the remote file, using SFTP put or if using the local machine as an SFTP
server:

$touch /tmp/remote-files/names_aa.csv

Now the PEOPLE table will have duplicate data. If you ORDER BY FIRST_NAME , you will see
something like this:


Figure 3. People Data with Duplicates

Of course, if we drop another one of the files into the remote directory, it will be processed and we
will see another entry in the Metadata Store.

5.1.6. Summary
In this sample, you have learned:

How to process SFTP files with a batch job

How to create a stream to poll files on an SFTP server and launch a batch job

How to verify job status via logs and shell commands

How the Data Flow Task Launcher limits concurrent task executions

How to avoid duplicate processing of files


6. Analytics
6.1. Twitter Analytics
In this demonstration, you will learn how to build a data pipeline using Spring Cloud Data Flow
(http://cloud.spring.io/spring-cloud-dataflow/) to consume data from TwitterStream and compute
simple analytics over data-in-transit using Field-Value-Counter.

We will take you through the steps to configure Spring Cloud Data Flow’s Local server.

6.1.1. Prerequisites
A Running Data Flow Shell

The Spring Cloud Data Flow Shell is available for download


(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-deploying-
spring-cloud-dataflow)
or you can build (https://github.com/spring-cloud/spring-cloud-dataflow) it yourself.

the Spring Cloud Data Flow Shell and Local server implementation are in the
same repository and are both built by running ./mvnw install from the project
 root directory. If you have already run the build, use the jar in spring-cloud-
dataflow-shell/target

To run the Shell open a new terminal session:

$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar
____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>


The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the
Data Flow Server’s REST API and supports a DSL that simplifies the process of

 defining a stream or task and managing its lifecycle. Most of these samples use
the shell. If you prefer, you can use the Data Flow UI localhost:9393/dashboard,
(or wherever the server is hosted) to perform equivalent operations.

A running local Data Flow Server

The Local Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow) it yourself. If you build it yourself, the
executable jar will be in spring-cloud-dataflow-server-local/target

To run the Local Data Flow server, open a new terminal session:

$cd <PATH/TO/SPRING-CLOUD-DATAFLOW-LOCAL-JAR>
$java -jar spring-cloud-dataflow-server-local-<VERSION>.jar

Running instance of Redis (http://redis.io/)

Running instance of Kafka (http://kafka.apache.org/downloads.html)

Twitter credentials from Twitter Developers (https://apps.twitter.com/) site

6.1.2. Building and Running the Demo


1. Register
(https://github.com/spring-cloud/spring-cloud-dataflow/blob/master/spring-cloud-dataflow-
docs/src/main/asciidoc/streams.adoc#register-a-stream-app)
the out-of-the-box applications for the Kafka binder


These samples assume that the Data Flow Server can access a remote Maven
repository, repo.spring.io/libs-release by default. If your Data Flow
server is running behind a firewall, or you are using a maven proxy
preventing access to public repositories, you will need to install the sample
apps in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-
started-maven-configuration)
the server accordingly. The sample applications are typically registered using
Data Flow’s bulk import facility. For example, the Shell command
dataflow:>app import --uri bit.ly/Celsius-SR1-stream-
applications-rabbit-maven (The actual URI is release and binder specific so
refer to the sample instructions for the actual URL). The bulk import URI
references a plain text file containing entries for all of the publicly available
Spring Cloud Stream and Task applications published to repo.spring.io .
 For example,
source.http=maven://org.springframework.cloud.stream.app:http-
source-rabbit:1.3.1.RELEASE registers the http source app at the
corresponding Maven address, relative to the remote repository(ies)
configured for the Data Flow server. The format is maven://<groupId>:
<artifactId>:<version> You will need to download
(https://repo.spring.io/libs-release/org/springframework/cloud/stream/app/spring-cloud-
stream-app-descriptor/Bacon.RELEASE/spring-cloud-stream-app-descriptor-
Bacon.RELEASE.rabbit-apps-maven-repo-url.properties)
the required apps or build (https://github.com/spring-cloud-stream-app-starters)
them and then install them in your Maven repository, using whatever group,
artifact, and version you choose. If you do this, register individual apps using
dataflow:>app register… using the maven:// resource URI format
corresponding to your installed app.

dataflow:>app import --uri http://bit.ly/Celsius-SR1-stream-applications-kafka-10-maven

2. Create and deploy the following streams

dataflow:>stream create tweets --definition "twitterstream --consumerKey=<CONSUMER_KEY> --consumerSecret=<CONSUMER_SECRET> --accessToken=<ACCESS_TOKEN> --accessTokenSecret=<ACCESS_TOKEN_SECRET> | log"
Created new stream 'tweets'


dataflow:>stream create tweetlang --definition ":tweets.twitterstream > field-value-counter --fieldName=lang --name=language" --deploy
Created and deployed new stream 'tweetlang'

dataflow:>stream create tagcount --definition ":tweets.twitterstream > field-value-counter --fieldName=entities.hashtags.text --name=hashtags" --deploy
Created and deployed new stream 'tagcount'

dataflow:>stream deploy tweets
Deployed stream 'tweets'

To get a consumerKey and consumerSecret you need to register a twitter
application. If you don’t already have one set up, you can create an app at the
Twitter Developers (https://apps.twitter.com/) site to get these credentials. The
tokens <CONSUMER_KEY> , <CONSUMER_SECRET> , <ACCESS_TOKEN> , and
<ACCESS_TOKEN_SECRET> must be replaced with your account credentials.

3. Verify the streams are successfully deployed. Where: (1) is the primary pipeline; (2) and (3)
are tapping the primary pipeline with the DSL syntax <stream-name>.<label/app name>
[e.g. :tweets.twitterstream ]; and (4) is the final deployment of the primary pipeline

dataflow:>stream list

4. Notice that tweetlang.field-value-counter , tagcount.field-value-counter ,
tweets.log and tweets.twitterstream Spring Cloud Stream
(https://github.com/spring-cloud-stream-app-starters/) applications are running as Spring Boot
applications within the local-server .
CONSOLE
2016-02-16 11:43:26.174 INFO 10189 --- [nio-9393-exec-2] o.s.c.d.d.l.OutOfProcessModuleD
Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-data-flo
2016-02-16 11:43:26.206 INFO 10189 --- [nio-9393-exec-3] o.s.c.d.d.l.OutOfProcessModuleD
Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-data-flo
2016-02-16 11:43:26.806 INFO 10189 --- [nio-9393-exec-4] o.s.c.d.d.l.OutOfProcessModuleD
Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-data-flo
2016-02-16 11:43:26.813 INFO 10189 --- [nio-9393-exec-4] o.s.c.d.d.l.OutOfProcessModuleD
Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-data-flo


5. Verify that the two field-value-counters named hashtags and language are listed
successfully
CONSOLE
dataflow:>field-value-counter list
╔════════════════════════╗
║Field Value Counter name║
╠════════════════════════╣
║hashtags ║
║language ║
╚════════════════════════╝

6. Verify you can query individual field-value-counter results successfully

CONSOLE
dataflow:>field-value-counter display hashtags
Displaying values for field value counter 'hashtags'
╔══════════════════════════════════════╤═════╗
║ Value │Count║
╠══════════════════════════════════════╪═════╣
║KCA │ 40║
║PENNYSTOCKS │ 17║
║TEAMBILLIONAIRE │ 17║
║UCL │ 11║
║... │ ..║
║... │ ..║
║... │ ..║
╚══════════════════════════════════════╧═════╝

dataflow:>field-value-counter display language


Displaying values for field value counter 'language'
╔═════╤═════╗
║Value│Count║
╠═════╪═════╣
║en │1,171║
║es │ 337║
║ar │ 296║
║und │ 251║
║pt │ 175║
║ja │ 137║
║.. │ ...║
║.. │ ...║
║.. │ ...║
╚═════╧═════╝

7. Go to the Dashboard accessible at localhost:9393/dashboard and launch the Analytics tab.
From the default Dashboard menu, select the following combinations to visualize real-time
updates on field-value-counter .

For real-time updates on language tags, select:


a. Metric Type as Field-Value-Counters

b. Stream as language

c. Visualization as Bubble-Chart or Pie-Chart

For real-time updates on hashtags tags, select:

a. Metric Type as Field-Value-Counters

b. Stream as hashtags

c. Visualization as Bubble-Chart or Pie-Chart


6.1.3. Summary
In this sample, you have learned:

How to use Spring Cloud Data Flow’s Local server

How to use Spring Cloud Data Flow’s shell application

How to create a streaming data pipeline to compute simple analytics using the Twitter Stream
and Field Value Counter applications


7. Data Science
7.1. Species Prediction
In this demonstration, you will learn how to use a PMML
(https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language) model in the context of a streaming
data pipeline orchestrated by Spring Cloud Data Flow (http://cloud.spring.io/spring-cloud-dataflow/).

We will present the steps to prep, configure and run Spring Cloud Data Flow’s Local server, a
Spring Boot application.

7.1.1. Prerequisites
A Running Data Flow Shell

The Spring Cloud Data Flow Shell is available for download


(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-deploying-
spring-cloud-dataflow)
or you can build (https://github.com/spring-cloud/spring-cloud-dataflow) it yourself.

the Spring Cloud Data Flow Shell and Local server implementation are in the
same repository and are both built by running ./mvnw install from the project
 root directory. If you have already run the build, use the jar in spring-cloud-
dataflow-shell/target

To run the Shell open a new terminal session:


$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar
____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>

The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the
Data Flow Server’s REST API and supports a DSL that simplifies the process of

 defining a stream or task and managing its lifecycle. Most of these samples use
the shell. If you prefer, you can use the Data Flow UI localhost:9393/dashboard,
(or wherever the server is hosted) to perform equivalent operations.

A running local Data Flow Server

The Local Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow) it yourself. If you build it yourself, the
executable jar will be in spring-cloud-dataflow-server-local/target

To run the Local Data Flow server, open a new terminal session:

$cd <PATH/TO/SPRING-CLOUD-DATAFLOW-LOCAL-JAR>
$java -jar spring-cloud-dataflow-server-local-<VERSION>.jar

Running instance of Kafka (http://kafka.apache.org/downloads.html)

7.1.2. Building and Running the Demo


1. Register
(https://github.com/spring-cloud/spring-cloud-dataflow/blob/master/spring-cloud-dataflow-
docs/src/main/asciidoc/streams.adoc#register-a-stream-app)
the out-of-the-box applications for the Kafka binder


These samples assume that the Data Flow Server can access a remote Maven
repository, repo.spring.io/libs-release by default. If your Data Flow
server is running behind a firewall, or you are using a maven proxy
preventing access to public repositories, you will need to install the sample
apps in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-
started-maven-configuration)
the server accordingly. The sample applications are typically registered using
Data Flow’s bulk import facility. For example, the Shell command
dataflow:>app import --uri bit.ly/Celsius-SR1-stream-
applications-rabbit-maven (The actual URI is release and binder specific so
refer to the sample instructions for the actual URL). The bulk import URI
references a plain text file containing entries for all of the publicly available
Spring Cloud Stream and Task applications published to repo.spring.io .
 For example,
source.http=maven://org.springframework.cloud.stream.app:http-
source-rabbit:1.3.1.RELEASE registers the http source app at the
corresponding Maven address, relative to the remote repository(ies)
configured for the Data Flow server. The format is maven://<groupId>:
<artifactId>:<version> You will need to download
(https://repo.spring.io/libs-release/org/springframework/cloud/stream/app/spring-cloud-
stream-app-descriptor/Bacon.RELEASE/spring-cloud-stream-app-descriptor-
Bacon.RELEASE.rabbit-apps-maven-repo-url.properties)
the required apps or build (https://github.com/spring-cloud-stream-app-starters)
them and then install them in your Maven repository, using whatever group,
artifact, and version you choose. If you do this, register individual apps using
dataflow:>app register… using the maven:// resource URI format
corresponding to your installed app.

dataflow:>app import --uri http://bit.ly/Celsius-SR1-stream-applications-kafka-10-maven

2. Create and deploy the following stream


dataflow:>stream create --name pmmlTest --definition "http --server.port=9001 | pmml --modelLocation=https://raw.githubusercontent.com/spring-cloud/spring-cloud-stream-modules/master/pmml-processor/src/test/resources/iris-flower-classification-naive-bayes-1.pmml.xml --inputs='Sepal.Length=payload.sepalLength,Sepal.Width=payload.sepalWidth,Petal.Length=payload.petalLength,Petal.Width=payload.petalWidth' --outputs='Predicted_Species=payload.predictedSpecies' --inputType='application/x-spring-tuple' --outputType='application/json'| log" --deploy
Created and deployed new stream 'pmmlTest'

The built-in pmml processor will load the given PMML model definition and
create an internal object representation that can be evaluated quickly. When
the stream receives the data, it will be used as the input for the evaluation of
the analytical model iris-flower-classifier-1 contained in the PMML
document. The result of this evaluation is a new field predictedSpecies
that is created by the pmml processor by applying a classifier that uses
the naiveBayes algorithm.

3. Verify the stream is successfully deployed

dataflow:>stream list

4. Notice that pmmlTest.http , pmmlTest.pmml , and pmmlTest.log Spring Cloud Stream
(https://github.com/spring-cloud-stream-app-starters/) applications are running within the local-server .

CONSOLE
2016-02-18 06:36:45.396 INFO 31194 --- [nio-9393-exec-1] o.s.c.d.d.l.OutOfProcessModuleD
Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-data-flo
2016-02-18 06:36:45.402 INFO 31194 --- [nio-9393-exec-1] o.s.c.d.d.l.OutOfProcessModuleD
Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-data-flo
2016-02-18 06:36:45.407 INFO 31194 --- [nio-9393-exec-1] o.s.c.d.d.l.OutOfProcessModuleD
Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-data-flo

5. Post sample data to the http endpoint: localhost:9001 ( 9001 is the port we specified
for the http source in this case)


dataflow:>http post --target http://localhost:9001 --contentType application/json --data "{ \"sepalLength\": 6.4, \"sepalWidth\": 3.2, \"petalLength\":4.5, \"petalWidth\":1.5 }"
> POST (application/json;charset=UTF-8) http://localhost:9001 { "sepalLength": 6.4, "sepalWidth": 3.2, "petalLength":4.5, "petalWidth":1.5 }
> 202 ACCEPTED
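If you prefer to exercise the endpoint outside of the Data Flow shell, an equivalent request can be sent with plain curl (assuming the http source is still listening on port 9001):

$ curl -X POST -H "Content-Type: application/json" \
    -d '{"sepalLength": 6.4, "sepalWidth": 3.2, "petalLength": 4.5, "petalWidth": 1.5}' \
    http://localhost:9001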

6. Verify the predicted outcome by tailing <PATH/TO/LOGAPP/pmmlTest.log/stdout_0.log
file. The predictedSpecies in this case is versicolor .

{
"sepalLength": 6.4,
"sepalWidth": 3.2,
"petalLength": 4.5,
"petalWidth": 1.5,
"Species": {
"result": "versicolor",
"type": "PROBABILITY",
"categoryValues": [
"setosa",
"versicolor",
"virginica"
]
},
"predictedSpecies": "versicolor",
"Probability_setosa": 4.728207706362856E-9,
"Probability_versicolor": 0.9133639504608079,
"Probability_virginica": 0.0866360448109845
}

7. Let’s post with a slight variation in data.

dataflow:>http post --target http://localhost:9001 --contentType application/json --data "{ \"sepalLength\": 6.4, \"sepalWidth\": 3.2, \"petalLength\":4.5, \"petalWidth\":1.8 }"
> POST (application/json;charset=UTF-8) http://localhost:9001 { "sepalLength": 6.4, "sepalWidth": 3.2, "petalLength":4.5, "petalWidth":1.8 }
> 202 ACCEPTED

 petalWidth value changed from 1.5 to 1.8

8. The predictedSpecies will now be listed as virginica .


{
"sepalLength": 6.4,
"sepalWidth": 3.2,
"petalLength": 4.5,
"petalWidth": 1.8,
"Species": {
"result": "virginica",
"type": "PROBABILITY",
"categoryValues": [
"setosa",
"versicolor",
"virginica"
]
},
"predictedSpecies": "virginica",
"Probability_setosa": 1.0443898084700813E-8,
"Probability_versicolor": 0.1750120333571921,
"Probability_virginica": 0.8249879561989097
}

7.1.3. Summary
In this sample, you have learned:

How to use Spring Cloud Data Flow’s Local server

How to use Spring Cloud Data Flow’s shell application

How to use pmml processor to compute real-time predictions


8. Functions
8.1. Functions in Spring Cloud Data Flow
This is an experiment to run a Spring Cloud Function workload in Spring Cloud Data Flow. The
current release of function-runner used in this sample is 1.0.0.M1 and it is not
recommended for use in production.

In this sample, you will learn how to use Spring Cloud Function
(https://github.com/spring-cloud/spring-cloud-function) based streaming applications in Spring Cloud
Data Flow. To learn more about Spring Cloud Function, check out the project page
(http://cloud.spring.io/spring-cloud-function/).

8.1.1. Prerequisites
A Running Data Flow Shell

The Spring Cloud Data Flow Shell is available for download


(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-deploying-
spring-cloud-dataflow)
or you can build (https://github.com/spring-cloud/spring-cloud-dataflow) it yourself.

the Spring Cloud Data Flow Shell and Local server implementation are in the
same repository and are both built by running ./mvnw install from the project
 root directory. If you have already run the build, use the jar in spring-cloud-
dataflow-shell/target

To run the Shell open a new terminal session:


$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar
____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>

The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the
Data Flow Server’s REST API and supports a DSL that simplifies the process of

 defining a stream or task and managing its lifecycle. Most of these samples use
the shell. If you prefer, you can use the Data Flow UI localhost:9393/dashboard,
(or wherever the server is hosted) to perform equivalent operations.

A running local Data Flow Server

The Local Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow) it yourself. If you build it yourself, the
executable jar will be in spring-cloud-dataflow-server-local/target

To run the Local Data Flow server, open a new terminal session:

$cd <PATH/TO/SPRING-CLOUD-DATAFLOW-LOCAL-JAR>
$java -jar spring-cloud-dataflow-server-local-<VERSION>.jar

This sample requires access to both Spring’s snapshot and milestone repos. Please
follow how-to-guides

 (https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#howto) on
how to set repo.spring.io/libs-release and repo.spring.io/libs-
milestone as remote repositories in SCDF.


A local build of Spring Cloud Function (https://github.com/spring-cloud/spring-cloud-function)

A running instance of Rabbit MQ (https://www.rabbitmq.com/)

General understanding of the out-of-the-box function-runner


(https://github.com/spring-cloud-stream-app-starters/function/blob/master/spring-cloud-starter-stream-app-
function/README.adoc)
application

8.1.2. Building and Running the Demo


1. Register
(https://github.com/spring-cloud/spring-cloud-dataflow/blob/master/spring-cloud-dataflow-
docs/src/main/asciidoc/streams.adoc#register-a-stream-app)
the out-of-the-box applications for the Rabbit binder


These samples assume that the Data Flow Server can access a remote Maven
repository, repo.spring.io/libs-release by default. If your Data Flow
server is running behind a firewall, or you are using a maven proxy
preventing access to public repositories, you will need to install the sample
apps in your internal Maven repository and configure
(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-
started-maven-configuration)
the server accordingly. The sample applications are typically registered using
Data Flow’s bulk import facility. For example, the Shell command
dataflow:>app import --uri bit.ly/Celsius-SR1-stream-
applications-rabbit-maven (The actual URI is release and binder specific so
refer to the sample instructions for the actual URL). The bulk import URI
references a plain text file containing entries for all of the publicly available
Spring Cloud Stream and Task applications published to repo.spring.io .
 For example,
source.http=maven://org.springframework.cloud.stream.app:http-
source-rabbit:1.3.1.RELEASE registers the http source app at the
corresponding Maven address, relative to the remote repository(ies)
configured for the Data Flow server. The format is maven://<groupId>:
<artifactId>:<version> You will need to download
(https://repo.spring.io/libs-release/org/springframework/cloud/stream/app/spring-cloud-
stream-app-descriptor/Bacon.RELEASE/spring-cloud-stream-app-descriptor-
Bacon.RELEASE.rabbit-apps-maven-repo-url.properties)
the required apps or build (https://github.com/spring-cloud-stream-app-starters)
them and then install them in your Maven repository, using whatever group,
artifact, and version you choose. If you do this, register individual apps using
dataflow:>app register… using the maven:// resource URI format
corresponding to your installed app.

dataflow:>app import --uri http://bit.ly/Celsius-SR1-stream-applications-rabbit-maven

2. Register the out-of-the-box function-runner


(https://github.com/spring-cloud-stream-app-starters/function/blob/master/spring-cloud-starter-stream-app-
function/README.adoc)
application (current release is at 1.0.0.M1)


dataflow:>app register --name function-runner --type processor --uri maven://org.springframework.cloud.stream.app:function-app-rabbit:1.0.0.M1 --metadata-uri maven://org.springframework.cloud.stream.app:function-app-rabbit:jar:metadata:1.0.0.M1

3. Create and deploy the following stream

dataflow:>stream create foo --definition "http --server.port=9001 | function-runner --function.className=com.example.functions.CharCounter --function.location=file:///<PATH/TO/SPRING-CLOUD-FUNCTION>/spring-cloud-function-samples/function-sample/target/spring-cloud-function-sample-1.0.0.BUILD-SNAPSHOT.jar | log" --deploy

 Replace the <PATH/TO/SPRING-CLOUD-FUNCTION> with the correct path.

The source code of the CharCounter function is in Spring Cloud Function’s samples repo
(https://github.com/spring-cloud/spring-cloud-function/blob/master/spring-cloud-function-samples/function-sample/src/main/java/com/example/functions/CharCounter.java).
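For orientation, a function of this kind is just a plain java.util.function.Function bean that maps the incoming payload to its character count. The class below is only a rough approximation of that idea, not the actual CharCounter source; refer to the linked samples repo for the real implementation.

JAVA
import java.util.function.Function;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Illustrative sketch only: a function bean in the spirit of CharCounter, mapping a String to its length.
@Configuration
public class CharCounterSketch {

    @Bean
    public Function<String, Integer> charCounter() {
        return payload -> payload.length();
    }
}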

4. Verify the stream is successfully deployed.

dataflow:>stream list

5. Notice that foo-http , foo-function-runner , and foo-log Spring Cloud Stream
(https://github.com/spring-cloud-stream-app-starters/) applications are running as Spring Boot
applications and the log locations will be printed in the Local-server console.


BASH
....
....
2017-10-17 11:43:03.714 INFO 18409 --- [nio-9393-exec-7] o.s.c.d.s.s.AppDeployerStreamDe
2017-10-17 11:43:04.379 INFO 18409 --- [nio-9393-exec-7] o.s.c.d.spi.local.LocalAppDeplo
Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gs/T/spring-cloud-dataflow
2017-10-17 11:43:04.380 INFO 18409 --- [nio-9393-exec-7] o.s.c.d.s.s.AppDeployerStreamDe
2017-10-17 11:43:04.384 INFO 18409 --- [nio-9393-exec-7] o.s.c.d.spi.local.LocalAppDeplo
Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gs/T/spring-cloud-dataflow
2017-10-17 11:43:04.385 INFO 18409 --- [nio-9393-exec-7] o.s.c.d.s.s.AppDeployerStreamDe
2017-10-17 11:43:04.391 INFO 18409 --- [nio-9393-exec-7] o.s.c.d.spi.local.LocalAppDeplo
Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gs/T/spring-cloud-dataflow
....
....

6. Post sample data to the http endpoint: localhost:9001 ( 9001 is the port we specified
for the http source in this case)

dataflow:>http post --target http://localhost:9001 --data "hello world"


> POST (text/plain) http://localhost:9001 hello world
> 202 ACCEPTED

dataflow:>http post --target http://localhost:9001 --data "hmm, yeah, it works now!"


> POST (text/plain) http://localhost:9001 hmm, yeah, it works now!
> 202 ACCEPTED

7. Tail the log-sink’s standard-out logs to see the character counts

BASH
$ tail -f /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gs/T/spring-cloud-dataflow-65490254

....
....
....
....
2017-10-17 11:45:39.363 INFO 19193 --- [on-runner.foo-1] log-sink : 11
2017-10-17 11:46:40.997 INFO 19193 --- [on-runner.foo-1] log-sink : 24
....
....

8.1.3. Summary
In this sample, you have learned:


How to use Spring Cloud Data Flow’s Local server

How to use Spring Cloud Data Flow’s shell application

How to use the out-of-the-box function-runner application in Spring Cloud Data Flow


9. Micrometer
9.1. SCDF metrics with In uxDB and Grafana
In this demonstration, you will learn how Micrometer (http://micrometer.io) can help to monitor
your Spring Cloud Data Flow (http://cloud.spring.io/spring-cloud-dataflow/) (SCDF) streams using
InfluxDB (https://docs.influxdata.com/influxdb/v1.5/) and Grafana (https://grafana.com/grafana).

InfluxDB (https://docs.influxdata.com/influxdb/v1.5/) is a real-time storage for time-series data, such as
SCDF metrics. It supports downsampling, automatically expiring and deleting unwanted data, as
well as backup and restore. Analysis of data is done via a SQL-like query
(https://docs.influxdata.com/influxdb/v1.5/query_language/) language.

Grafana (https://grafana.com/grafana) is an open source metrics dashboard platform. It supports
multiple backend time-series databases, including InfluxDB.

The architecture (Fig.1) builds on the Spring Boot Micrometer
(https://docs.spring.io/spring-boot/docs/2.0.1.RELEASE/reference/htmlsingle/#production-ready-metrics-getting-started)
functionality. When a micrometer-registry-influx (http://micrometer.io/docs/registry/influx)
dependency is found on the classpath, Spring Boot auto-configures the metrics export for
InfluxDB .
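In practice this means the application's pom.xml carries a dependency along the following lines (shown here for orientation; the version is typically managed by Spring Boot's dependency management):

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-influx</artifactId>
</dependency>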

The Spring Cloud Stream (https://cloud.spring.io/spring-cloud-stream-app-starters/) (SCSt) applications
inherit the Micrometer functionality, allowing them to compute and send various application
metrics to the configured time-series database.


Figure 4. SCDF metrics analysis with InfluxDB and Grafana

Out of the box, SCSt sends core metrics
(https://docs.spring.io/spring-boot/docs/2.0.1.RELEASE/reference/htmlsingle/#production-ready-metrics-meter)
such as CPU , Memory , MVC and Health , to name a few. Among those, the Spring Integration
metrics
(https://docs.spring.io/spring-integration/docs/current/reference/html/system-management-chapter.html#micrometer-integration)
allow computing the Rate and the Latency of the messages in the SCDF streams.

Unlike the Spring Cloud Data Flow Metrics Collector, metrics here are sent
 synchronously over HTTP, not through a Binder channel topic.

All Spring Cloud Stream App Starters enrich the standard dimensional tags
(http://micrometer.io/docs/concepts#_supported_monitoring_systems) with the following SCDF-specific
tags:


tag name SCDF property default value

stream.name spring.cloud.dataflow.stream.name unknown

application.name spring.cloud.dataflow.stream.app.label unknown

instance.index instance.index 0

application.guid spring.cloud.application.guid unknown

application.type spring.cloud.dataflow.stream.app.type unknown

For custom app starters that don’t extend from the core
(https://github.com/spring-cloud-stream-app-starters/core) parent, you should add the
 app-starters-common : org.springframework.cloud.stream.app
dependency to enable the SCDF tags.
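For reference, common tags like these can also be contributed manually through Spring Boot's MeterRegistryCustomizer. The sketch below only shows the general mechanism, with placeholder tag values; it is not the app-starters-common implementation.

JAVA
import io.micrometer.core.instrument.MeterRegistry;

import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Illustrative sketch only: adding common dimensional tags to every meter in the registry.
@Configuration
public class CommonTagsSketch {

    @Bean
    public MeterRegistryCustomizer<MeterRegistry> commonTags() {
        return registry -> registry.config().commonTags(
                "stream.name", "t2",          // placeholder values; in the app starters these are
                "application.name", "time2",  // resolved from the SCDF deployment properties
                "application.type", "source");
    }
}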

Below we will present the steps to prepare and configure the demo of Spring Cloud Data Flow’s Local
server integration with InfluxDB . For other deployment environments, such as Cloud Foundry
or Kubernetes , additional configuration might be required.

9.1.1. Prerequisites
A Running Data Flow Shell

The Spring Cloud Data Flow Shell is available for download


(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-deploying-
spring-cloud-dataflow)
or you can build (https://github.com/spring-cloud/spring-cloud-dataflow) it yourself.

the Spring Cloud Data Flow Shell and Local server implementation are in the
same repository and are both built by running ./mvnw install from the project
 root directory. If you have already run the build, use the jar in spring-cloud-
dataflow-shell/target

To run the Shell open a new terminal session:


$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar
____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>

The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the
Data Flow Server’s REST API and supports a DSL that simplifies the process of

 defining a stream or task and managing its lifecycle. Most of these samples use
the shell. If you prefer, you can use the Data Flow UI localhost:9393/dashboard,
(or wherever the server is hosted) to perform equivalent operations.

A running local Data Flow Server

The Local Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow) it yourself. If you build it yourself, the
executable jar will be in spring-cloud-dataflow-server-local/target

To run the Local Data Flow server, open a new terminal session:

$cd <PATH/TO/SPRING-CLOUD-DATAFLOW-LOCAL-JAR>
$java -jar spring-cloud-dataflow-server-local-<VERSION>.jar

Running instance of Kafka (http://kafka.apache.org/downloads.html)

Spring Cloud Stream 2.x based Time
(https://github.com/spring-cloud-stream-app-starters/time/blob/master/spring-cloud-starter-stream-source-time/README.adoc)
and Log
(https://github.com/spring-cloud-stream-app-starters/log/blob/master/spring-cloud-starter-stream-sink-log/README.adoc)
application starters, pre-built with the io.micrometer:micrometer-registry-influx
dependency.

 Next versions of the SCSt App Initializr (https://start-scs.cfapps.io/) utility will
add support for Micrometer dependencies to facilitate the injection of
micrometer-registries with SCSt apps.

9.1.2. Building and Running the Demo


1. Register time and log applications that are pre-built with io.micrometer:micrometer-
registry-influx . The next version of SCSt App Initializr (https://start-scs.cfapps.io/) allows
adding Micrometer registry dependencies as well.

BASH
app register --name time2 --type source --uri file://<path-to-your-time-app>/time-source-kafka-2.0.0.BUILD-SNAPSHOT.jar --metadata-uri file://<path-to-your-time-app>/time-source-kafka-2.0.0.BUILD-SNAPSHOT-metadata.jar

app register --name log2 --type sink --uri file://<path-to-your-log-app>/log-sink-kafka-2.0.0.BUILD-SNAPSHOT.jar --metadata-uri file://<path-to-your-log-app>/log-sink-kafka-2.0.0.BUILD-SNAPSHOT-metadata.jar

2. Create InfluxDB and Grafana Docker containers

BASH
docker run -d --name grafana -p 3000:3000 grafana/grafana:5.1.0

docker run -d --name influxdb -p 8086:8086 influxdb:1.5.2-alpine

3. Create and deploy the following stream


BASH
dataflow:>stream create --name t2 --definition "time2 | log2"

dataflow:>stream deploy --name t2 --properties "app.*.management.metrics.export.influx.db=myinfluxdb"

The app.*.management.metrics.export.influx.db=myinfluxdb property instructs the time2 and
log2 apps to use the myinfluxdb database (created automatically).

By default, the InfluxDB server runs on localhost:8086 . You can add the
app.*.management.metrics.export.influx.uri={influxdb-server-url} property to
alter the default location.


You can connect to the InfluxDB and explore the measurements

BASH
docker exec -it influxdb /bin/bash
root:/# influx
> show databases
> use myinfluxdb
> show measurements
> select * from spring_integration_send limit 10

4. Configure Grafana

Open Grafana UI (localhost:3000) and log-in (user: admin , password: admin ).

Create an InfluxDB datasource called influx_auto_DataFlowMetricsCollector that
connects to our myinfluxdb influx database.

Table 1. DataSource Properties

Name influx_auto_DataFlowMetricsCollector
Type InfluxDB
Host localhost:8086
Access Browser
Database myinfluxdb
User (DB) admin
Password (DB) admin

 For previous Grafana 4.x set the Access property to direct instead.

Import the scdf-influxdb-dashboard.json dashboard


9.1.3. Summary
In this sample, you have learned:

How to use Spring Cloud Data Flow’s Local server

How to use Spring Cloud Data Flow’s shell application

How to use InfluxDB and Grafana to monitor and visualize Spring Cloud Stream
application metrics.

9.2. SCDF metrics with Prometheus and Grafana


In this demonstration, you will learn how Micrometer (http://micrometer.io) can help to monitor
your Spring Cloud Data Flow (http://cloud.spring.io/spring-cloud-dataflow/) Streams using Prometheus
(http://prometheus.io) and Grafana (https://grafana.com/grafana).


Prometheus is a time-series database used for monitoring highly dynamic service-oriented
architectures. In a world of microservices, its support for multi-dimensional data collection and
querying is a particular strength.

Grafana (https://grafana.com/grafana) is an open source metrics dashboard platform. It supports
multiple backend time-series databases, including Prometheus.

The architecture (Fig.1) builds on the Spring Boot Micrometer
(https://docs.spring.io/spring-boot/docs/2.0.1.RELEASE/reference/htmlsingle/#production-ready-metrics-getting-started)
functionality. When a micrometer-registry-prometheus (http://micrometer.io/docs/registry/prometheus)
dependency is found on the classpath, Spring Boot auto-configures the metrics export for
Prometheus .

The Spring Cloud Stream (https://cloud.spring.io/spring-cloud-stream-app-starters/) (SCSt) applications
inherit the Micrometer functionality, allowing them to compute and send various application
metrics to the configured time-series database.

Figure 5. SCDF metrics analysis with Prometheus and Grafana


Out of the box, SCSt sends core metrics
(https://docs.spring.io/spring-boot/docs/2.0.1.RELEASE/reference/htmlsingle/#production-ready-metrics-meter)
such as CPU , Memory , MVC and Health , to name a few. Among those, the Spring Integration
metrics
(https://docs.spring.io/spring-integration/docs/current/reference/html/system-management-chapter.html#micrometer-integration)
allow computing the Rate and the Latency of the messages in the SCDF streams.

Unlike the Spring Cloud Data Flow Metrics Collector, metrics here are sent
 synchronously over HTTP, not through a Binder channel topic.

All Spring Cloud Stream App Starters enrich the standard dimensional tags
(http://micrometer.io/docs/concepts#_supported_monitoring_systems) with the following SCDF-specific
tags:

tag name SCDF property default value

stream.name spring.cloud.dataflow.stream.name unknown

application.name spring.cloud.dataflow.stream.app.label unknown

instance.index instance.index 0

application.guid spring.cloud.application.guid unknown

application.type spring.cloud.dataflow.stream.app.type unknown

For custom app starters that don’t extend from the core
(https://github.com/spring-cloud-stream-app-starters/core) parent, you should add the
 app-starters-common : org.springframework.cloud.stream.app
dependency to enable the SCDF tags.

Prometheus employs a pull-based metrics model, called metrics scraping. Spring Boot provides an
actuator endpoint at /actuator/prometheus that presents the metrics in the Prometheus scrape
format.
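You can inspect such a scrape directly by hitting the actuator endpoint of any running SCSt application; the placeholder below stands for whatever port the local deployer assigned to the app:

$ curl http://localhost:<app-port>/actuator/prometheus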


Furthermore, Prometheus requires a mechanism to discover the target applications to be
monitored (e.g. the URLs of the SCSt app instances). Targets may be statically configured via the
static_configs parameter or dynamically discovered using one of the supported service-discovery
mechanisms.

The SCDF Prometheus Service Discovery
(https://github.com/tzolov/spring-cloud-dataflow-prometheus-service-discovery) is a standalone (Spring
Boot) service that uses the runtime/apps (https://goo.gl/kE4eLV) endpoint to retrieve the URLs of the
running SCDF applications and generate a targets.json file. The targets.json file is compliant
with the <file_sd_config>
(https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cfile_sd_config%3E)
Prometheus discovery format.
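A generated targets.json file is simply a list of target groups in the standard file_sd format; the host and port values below are placeholders for whatever the discovery service finds at runtime:

[
  { "targets": ["localhost:20041", "localhost:20042"] }
]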

Below we will present the steps to prepare and configure the demo of Spring Cloud Data Flow’s
Local server integration with Prometheus . For other deployment environments, such as Cloud
Foundry or Kubernetes , additional configuration might be required.

9.2.1. Prerequisites
A Running Data Flow Shell

The Spring Cloud Data Flow Shell is available for download


(https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-deploying-
spring-cloud-dataflow)
or you can build (https://github.com/spring-cloud/spring-cloud-dataflow) it yourself.

the Spring Cloud Data Flow Shell and Local server implementation are in the
same repository and are both built by running ./mvnw install from the project
 root directory. If you have already run the build, use the jar in spring-cloud-
dataflow-shell/target

To run the Shell open a new terminal session:


$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar
____ ____ _ __
/ ___| _ __ _ __(_)_ __ __ _ / ___| | ___ _ _ __| |
\___ \| '_ \| '__| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
___) | |_) | | | | | | | (_| | | |___| | (_) | |_| | (_| |
|____/| .__/|_| |_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
____ |_| _ __|___/ __________
| _ \ __ _| |_ __ _ | ___| | _____ __ \ \ \ \ \ \
| | | |/ _` | __/ _` | | |_ | |/ _ \ \ /\ / / \ \ \ \ \ \
| |_| | (_| | || (_| | | _| | | (_) \ V V / / / / / / /
|____/ \__,_|\__\__,_| |_| |_|\___/ \_/\_/ /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>

The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the
Data Flow Server’s REST API and supports a DSL that simplifies the process of

 defining a stream or task and managing its lifecycle. Most of these samples use
the shell. If you prefer, you can use the Data Flow UI localhost:9393/dashboard,
(or wherever the server is hosted) to perform equivalent operations.

A running local Data Flow Server

The Local Data Flow Server is a Spring Boot application available for download
(http://cloud.spring.io/spring-cloud-dataflow/#platform-implementations) or you can build
(https://github.com/spring-cloud/spring-cloud-dataflow) it yourself. If you build it yourself, the
executable jar will be in spring-cloud-dataflow-server-local/target

To run the Local Data Flow server, open a new terminal session:

$cd <PATH/TO/SPRING-CLOUD-DATAFLOW-LOCAL-JAR>
$java -jar spring-cloud-dataflow-server-local-<VERSION>.jar

Running instance of Kafka (http://kafka.apache.org/downloads.html)

Spring Cloud Stream 2.x based Time
(https://github.com/spring-cloud-stream-app-starters/time/blob/master/spring-cloud-starter-stream-source-time/README.adoc)
and Log
(https://github.com/spring-cloud-stream-app-starters/log/blob/master/spring-cloud-starter-stream-sink-log/README.adoc)
application starters, pre-built with the io.micrometer:micrometer-registry-prometheus
dependency.

 Next versions of the SCSt App Initializr (https://start-scs.cfapps.io/) utility will
add support for Micrometer dependencies to facilitate the injection of
micrometer-registries with SCSt apps.

9.2.2. Building and Running the Demo


1. Register time and log applications that are pre-built with io.micrometer:micrometer-
registry-prometheus . The next version of SCSt App Initializr (https://start-scs.cfapps.io/) allows
adding Micrometer registry dependencies as well.
BASH
app register --name time2 --type source --uri file://<path-to-your-time-app>/time-source-kafka-2.0.0.BUILD-SNAPSHOT.jar --metadata-uri file://<path-to-your-time-app>/time-source-kafka-2.0.0.BUILD-SNAPSHOT-metadata.jar

app register --name log2 --type sink --uri file://<path-to-your-log-app>/log-sink-kafka-2.0.0.BUILD-SNAPSHOT.jar --metadata-uri file://<path-to-your-log-app>/log-sink-kafka-2.0.0.BUILD-SNAPSHOT-metadata.jar

2. Create and deploy the following stream


BASH
dataflow:>stream create --name t2 --definition "time2 | log2"

dataflow:>stream deploy --name t2 --properties "app.*.management.endpoints.web.exposure.include=prometheus,app.*.spring.autoconfigure.exclude=org.springframework.boot.autoconfigure.security.servlet.SecurityAutoConfiguration"

The deployment properties make sure that the prometheus actuator endpoint is enabled and that
Spring Boot security is disabled.

3. Build and start the SCDF Prometheus Service Discovery application

Build the spring-cloud-dataflow-prometheus-service-discovery project from:
github.com/spring-cloud/spring-cloud-dataflow-samples/micrometer/spring-cloud-dataflow-prometheus-service-discovery

BASH
cd ./spring-cloud-dataflow-samples/micrometer/spring-cloud-dataflow-prometheus-
service-discovery
./mvnw clean install


For convenience, the final spring-cloud-dataflow-prometheus-service-discovery-0.0.1-SNAPSHOT.jar
(https://github.com/spring-cloud/spring-cloud-dataflow-samples/raw/master/src/main/asciidoc/micrometer/prometheus/spring-cloud-dataflow-prometheus-service-discovery-0.0.1-SNAPSHOT.jar)
artifact is provided with this sample.

Start the service discovery application:


BASH
java -jar ./target/spring-cloud-dataflow-prometheus-service-discovery-0.0.1-
SNAPSHOT.jar \
--metrics.prometheus.target.discovery.url=http://localhost:9393/runtime/apps \
--metrics.prometheus.target.file.path=/tmp/targets.json \
--metrics.prometheus.target.refresh.rate=10000 \
--metrics.prometheus.target.mode=local

It will connect to the SCDF runtime URL and regenerate the /tmp/targets.json file every 10 seconds.

4. Create Prometheus configuration file (prometheus-local-file.yml)

YAML
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# A scrape configuration containing exactly one endpoint to scrape:
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'scdf'
    metrics_path: '/actuator/prometheus'
    file_sd_configs:
    - files:
      - targets.json
      refresh_interval: 30s

Configure the file_sd_config discovery mechanism using the generated targets.json:

5. Start Prometheus

BASH
docker run -d --name prometheus \
-p 9090:9090 \
-v <full-path-to>/prometheus-local-file.yml:/etc/prometheus/prometheus.yml \
-v /tmp/targets.json:/etc/prometheus/targets.json \
prom/prometheus:v2.2.1


Pass the prometheus.yml and map the /tmp/targets.json into /etc/prometheus/targets.json

Use the management UI: localhost:9090/graph to verify that SCDF apps metrics have been
collected:

# Throughput
rate(spring_integration_send_seconds_count{type="channel"}[60s])

# Latency
rate(spring_integration_send_seconds_sum{type="channel"}[60s])/rate(spring_integration_send_seconds_count{type="channel"}[60s])

6. Start Grafana Docker containers

BASH
docker run -d --name grafana -p 3000:3000 grafana/grafana:5.1.0

7. Configure Grafana

Open Grafana UI (localhost:3000) and log-in (user: admin , password: admin ).

Create a Prometheus datasource called ScdfPrometheus

Table 2. DataSource Properties

Name ScdfPrometheus

Type Prometheus

Host localhost:9090

Access Browser

 For previous Grafana 4.x set the Access property to direct instead.

Import the scdf-prometheus-grafana-dashboard.json dashboard


9.2.3. Summary
In this sample, you have learned:

How to use Spring Cloud Data Flow’s Local server

How to use Spring Cloud Data Flow’s shell application

How to use Prometheus and Grafana to monitor and visualize Spring Cloud Stream
application metrics.

Last updated 2018-11-06 15:19:02 UTC
