Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
SlideShare a Scribd company logo
Confidential and Proprietary to Daugherty Business Solutions
Spark: Building an application from Start
to Finish
Adam Doyle
STLHUG August 2017
Confidential & Proprietary to Daugherty Business Solutions.
EIM and Analytics
Data Science
• Predictive and Prescriptive Analytics
• Social, Text and Sentiment Analytics
• Natural Language Processing
• Machine Learning, Artificial Intelligence
• SPSS, SAS, R, IBM Watson™
Strategy and Competency Building
• Build the right, comprehensive solution blueprint across
12 Domains
• Establish, specific, actionable plan and ROIs
• Protecting your investments
• Organization, Talent, Competency
• Processes, Methods, Techniques, Tools
• Speed – Agile EIM Transformation
• Governance processes
Customer and Business Analytics
• Customer/Buyer/Channel Segmentation
• Persona Development, Customer Scoring (Value, Potential)
• Attrition Modeling, Engagement and Response Modeling
• Inventory Management, Marketing Campaigns
• Product Design Analytics, Workforce Planning, Location
Based Advertising
• Data Monetization
Traditional Data Warehouse and Business
Intelligence
• EDW, ODS, Data Mart and Integration
• Master Data Management
• Data Governance
• Dashboards, Scorecards,
• Reports , Alerts
• Multidimensional Analysis
• Ad hoc slicing and dicing
• Self Service Enablement
• Cloud Migration and Agile EIM
ANALYTICS
STRATEGY
EIM and
ANALYTICS
400+ employees
strong
Digital Engagement/Analytics
• Customer Engagement Strategies
• Omni-channel and Integrated Marketing
• Strategic Planning, Building and Executing
Digital and Customer Engagement Solutions.
Big Data and Next
Generation Technologies
• Data Lab Development Centers
• Data Lakes, Analytic Platforms
• Hadoop (Cloudera, Hortonworks)
• NoSQL / Graph DB (MongoDB, DataStax
• Cloud platforms (AWS, Google, Azure)
• Spark, Sqoop, Hive, Pig, Kafka, etc.
Confidential and Proprietary to Daugherty Business Solutions
• 20 year veteran of the St. Louis
IT community
• Co-Organizer, St. Louis Hadoop
User Group
• Big Data Community Lead,
Daugherty Business Solutions
• Formerly Big Data Solution
Architect at Amitech, Lead Big
Data developer at Mercy
• Speaker at local and national
Big Data conferences
Meet Adam Doyle
3
Confidential and Proprietary to Daugherty Business Solutions
• GDPR
• Why Spark
• Infrastructure
• Language
• Components
• Development
• Debugging
• Deployment
• Monitoring
• Questions
4
Agenda
Confidential and Proprietary to Daugherty Business Solutions
• If your company stores data about citizens from any of the countries in
the European Union, you should be preparing for GDPR.
• Here is the official link http://www.eugdpr.org/ Here is why:
(Financial) Penalties: Under GDPR organizations in breach of GDPR
can be fined up to 4% of annual global turnover (Revenue) or €20
Million (whichever is greater). This is the maximum fine that can
be imposed for the most serious infringements e.g. not having
sufficient customer consent to process data or violating the core of
Privacy by Design concepts.
5
GDPR
Confidential and Proprietary to Daugherty Business Solutions
• Privacy by Design is an approach to systems engineering which
takes privacy into account throughout the whole engineering process.
• Privacy by Design is not about data protection but designing so data
doesn't need protection. The root principle therefore is based on
enabling service without data control transfer from the citizen to the
system (the citizen become identifiable or recognizable).
• How Privacy by Design is achieved depends on the application,
technologies and choice of approach.
• Whether the methodology actually achieve Privacy by Design is not to
be evaluated based on intent or approach, but outcome. I.e. if data
do not need protection to not represent a risk to the citizens, the
principle of Privacy by Design can be said to be achieved.
• Even "anonymized" data are still personal data if data are derived from
personal data outside the control of the individual citizen in question
or any means to re-identify or recognize citizens or citizen devices
exist. Anonymous isn’t anonymous if you can reverse it.
6
Privacy by Design
Confidential and Proprietary to Daugherty Business Solutions
• One simple example is Dynamic Host Configuration Protocol (DHCP) where
devices based on random identifiers gets an IP from the server and thus is enabled
to communicate without having leaked personal identifiers per se.
• A more advanced example is Global Positioning System where devices client-side
can detect their geographical location without leaking identity or location.
• Another example in Internet of Things is RFID where citizens' ability to
communicate with their devices without leaking identifiers can be achieved
using Zero-knowledge proof.
7
Good Privacy by Design
Confidential and Proprietary to Daugherty Business Solutions
• Is your company doing business with European entities?
• Have you analyzed your IT infrastructure for privacy leakage?
• Can someone reverse your anonymity algorithms?
• How can you better address privacy concerns?
• “The cloud industry is aware of GDPR. We're actually scrambling to find
more IT consulting firms/partners who could handle this to come on
board with us. Many of the U.S. companies, some in St. Louis, that do
business with EU, have data stored in EU, or are acquired by/acquiring E
• https://www.datanami.com/this-just-in/mapr-talend-collaborate-deliver-
governed-gdpr-data-lake-solution/
• https://hortonworks.com/webinar/apache-atlas-ranger-can-help-become-
gdpr-compliant/
8
Discussion
Confidential and Proprietary to Daugherty Business Solutions 9
Why Spark in Pictures
https://www.sigmoid.com/apache-spark-internals/
Confidential and Proprietary to Daugherty Business Solutions
• The client wants to get a real-time view of where the tweets about
them are coming from.
10
Problem statement
Confidential and Proprietary to Daugherty Business Solutions
When deploying Spark for use, you have a couple of options
11
Infrastructure
Confidential and Proprietary to Daugherty Business Solutions
• Scala
• Java
• Python
12
Language
val lines = sc.textFile("data.txt")
val lineLengths = lines.map(s => s.length)
val totalLength = lineLengths.reduce((a, b) => a + b)
JavaRDD<String> lines = sc.textFile("data.txt");
JavaRDD<Integer> lineLengths = lines.map(s -> s.length());
int totalLength = lineLengths.reduce((a, b) -> a + b);
lines = sc.textFile("data.txt")
lineLengths = lines.map(lambda s: len(s))
totalLength = lineLengths.reduce(lambda a, b: a + b)
Confidential and Proprietary to Daugherty Business Solutions 13
Components
Confidential and Proprietary to Daugherty Business Solutions
• Spark Batch (or Core) is used to perform batch computations on sets
of data.
• Becoming the base language for much of the current stack of Hadoop
processing
– Mahout moves from MapReduce engine to Spark
• Expressed as a series of transformations and actions on your RDDs.
• Transformations are lazily applied once an action is invoked.
14
Spark Batch
Confidential and Proprietary to Daugherty Business Solutions
• Module for structured data processing.
• Interaction modes include
– SQL
– DataFrames API
– Datasets API
• Can join sets of objects with tables
• Can be used to expose data sets to external applications
15
Spark SQL
Confidential and Proprietary to Daugherty Business Solutions 16
Spark Streaming
Confidential and Proprietary to Daugherty Business Solutions
• Combination of Spark SQL and Spark Streaming.
• Provides
– Fast
– Scalable
– Fault-tolerant
– End to end
– Exactly One
stream processing without the user having to reason about streaming.
• New in v2.1
• Hadoop vendors may not have implemented this functionality
17
Structured Streaming
Confidential and Proprietary to Daugherty Business Solutions
• Spark’s Machine Learning Library
• ML Algorithms
– Classification
– Regression
– Clustering
– Collaborative Filtering
• Featurization
• Pipelines
• Persistence
• Data Science Utilities
18
Spark MLIB
Confidential and Proprietary to Daugherty Business Solutions
• Graph processing
• Includes
– Graph abstraction
– Graph operations
– Pregel API
– Graph algorithms
– Graph Builders
19
GraphX
Confidential and Proprietary to Daugherty Business Solutions
• Transformations
• Actions
20
Spark Development
T:filter
T:map
T:flatMap
A:forEach
T:map
A:save
Confidential and Proprietary to Daugherty Business Solutions
• Start the REPL from the command line:
– spark-shell
• Creates Interactive Scala interpreter with Spark libraries
• Can add additional libraries into REPL on Launch
21
REPL
Confidential and Proprietary to Daugherty Business Solutions
// Create a local StreamingContext with two working thread and batch
// interval of 60 second
SparkConf conf = new
SparkConf().setMaster("local[2]").setAppName("SparkDemo");
JavaStreamingContext jssc = new JavaStreamingContext(conf,
Durations.seconds(60));
SparkSession spark = SparkSession
.builder()
.appName("Java Spark SQL basic example")
.getOrCreate();
// Get Data
// Process Data
// Act on the Data
jssc.start();
try {
jssc.awaitTermination();
} catch (InterruptedException e) {
e.printStackTrace();
}
22
Shell of our application
Confidential and Proprietary to Daugherty Business Solutions 23
Get the Data
// Get Data
JavaReceiverInputDStream<Status> stream =
TwitterUtils.createStream(jssc, filters);
Confidential and Proprietary to Daugherty Business Solutions
Transformations Actions
filter(func) reduce(func)
flatMap(func) collect()
map(func) count()
mapPartitions(func) first()
sample(withReplacement, fraction, seed) take(n)
union(otherDataset) takeSample(withReplacement, num, [seed])
intersection(otherDataset) takeOrdered(n, [ordering])
pipe(command, [envVars]) saveAsSequenceFile(path) (Java and Scala)
coalesce(numPartitions) saveAsObjectFile(path) (Java and Scala)
repartition(numPartitions) saveAsTextFile(path)
repartitionAndSortWithinPartitions(partitioner) countByKey()
join(otherDataset, [numTasks]) foreach(func)
cogroup(otherDataset, [numTasks])
groupByKey([numTasks])
reduceByKey(func, [numTasks])
aggregateByKey(zeroValue)(seqOp, combOp,
[numTasks])
sortByKey([ascending], [numTasks])
24
Transformations and Actions
Confidential and Proprietary to Daugherty Business Solutions
// Get Data
JavaReceiverInputDStream<Status> stream =
TwitterUtils.createStream(jssc, filters);
// Process Data
JavaDStream<Status> filteredStream = stream.filter(new
FilterNullGeoLocation());
JavaPairDStream<String, Long> geoHashCounts =
filteredStream
.mapToPair(new StatusToGeoHashCountPair());
// Act on the Data
geoHashCounts = geoHashCounts.reduceByKey(new
CombineCounts());
geoHashCounts.print();
geoHashCounts.foreachRDD(new SaveAsPlaces(spark));
25
Process the Data
Confidential and Proprietary to Daugherty Business Solutions 26
Bring the Func
JavaPairDStream<String, Long> geoHashCounts =
filteredStream.mapToPair(new StatusToGeoHashCountPair());
public class StatusToGeoHashCountPair implements
PairFunction<Status, String, Long> {
@Override
public Tuple2<String, Long> call(Status status) throws
Exception {
return new Tuple2<String,
Long>(GeoHash.geoHashStringWithCharacterPrecision(
status.getGeoLocation().getLatitude(),
status.getGeoLocation().getLongitude(), 5),
1L);
}
}
Confidential and Proprietary to Daugherty Business Solutions
public class StatusToGeoHashCountPair implements PairFunction<Status, String,
Long> {
@Override
public Tuple2<String, Long> call(Status status) throws Exception {
return new Tuple2<String, Long>
( GeoHash.geoHashStringWithCharacterPrecision(
status.getGeoLocation().getLatitude(),
status.getGeoLocation().getLongitude(), 5)
, 1L);
}
}
public class StatusToGeoHashCountPairTest {
@Test
public void getsExpectedResult() throws Exception {
Status status = mock(Status.class);
when(status.getGeoLocation()).thenReturn(new GeoLocation(10, -20));
Tuple2<String, Long> tuple = new StatusToGeoHashCountPair().call(status);
assertEquals("e9cbb", tuple._1());
assertEquals(new Long(1L), tuple._2());
}
}
27
Testing Your functions example
Confidential and Proprietary to Daugherty Business Solutions
• Understanding closures
• ForEach; Connections
• Stream Start
28
Lessons Learned
Confidential and Proprietary to Daugherty Business Solutions
• Non-deterministic run-time
• Testing a distributed application
29
Debugging
Confidential and Proprietary to Daugherty Business Solutions
• Deploy each jar to each client
• Create a Maven Uberjar using Shade
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.0.0</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
30
Deployment
Confidential and Proprietary to Daugherty Business Solutions
• Finding errors
• Keeping streaming timeline below processing line
31
Monitoring
#Trump
#McDonalds
Confidential and Proprietary to Daugherty Business Solutions
Join Our Team
Contact:
Adam.doyle@daugherty.com
Confidential and Proprietary to Daugherty Business Solutions

More Related Content

Spark: Building an application from Start to Finish

  • 1. Confidential and Proprietary to Daugherty Business Solutions Spark: Building an application from Start to Finish Adam Doyle STLHUG August 2017
  • 2. Confidential & Proprietary to Daugherty Business Solutions. EIM and Analytics Data Science • Predictive and Prescriptive Analytics • Social, Text and Sentiment Analytics • Natural Language Processing • Machine Learning, Artificial Intelligence • SPSS, SAS, R, IBM Watson™ Strategy and Competency Building • Build the right, comprehensive solution blueprint across 12 Domains • Establish, specific, actionable plan and ROIs • Protecting your investments • Organization, Talent, Competency • Processes, Methods, Techniques, Tools • Speed – Agile EIM Transformation • Governance processes Customer and Business Analytics • Customer/Buyer/Channel Segmentation • Persona Development, Customer Scoring (Value, Potential) • Attrition Modeling, Engagement and Response Modeling • Inventory Management, Marketing Campaigns • Product Design Analytics, Workforce Planning, Location Based Advertising • Data Monetization Traditional Data Warehouse and Business Intelligence • EDW, ODS, Data Mart and Integration • Master Data Management • Data Governance • Dashboards, Scorecards, • Reports , Alerts • Multidimensional Analysis • Ad hoc slicing and dicing • Self Service Enablement • Cloud Migration and Agile EIM ANALYTICS STRATEGY EIM and ANALYTICS 400+ employees strong Digital Engagement/Analytics • Customer Engagement Strategies • Omni-channel and Integrated Marketing • Strategic Planning, Building and Executing Digital and Customer Engagement Solutions. Big Data and Next Generation Technologies • Data Lab Development Centers • Data Lakes, Analytic Platforms • Hadoop (Cloudera, Hortonworks) • NoSQL / Graph DB (MongoDB, DataStax • Cloud platforms (AWS, Google, Azure) • Spark, Sqoop, Hive, Pig, Kafka, etc.
  • 3. Confidential and Proprietary to Daugherty Business Solutions • 20 year veteran of the St. Louis IT community • Co-Organizer, St. Louis Hadoop User Group • Big Data Community Lead, Daugherty Business Solutions • Formerly Big Data Solution Architect at Amitech, Lead Big Data developer at Mercy • Speaker at local and national Big Data conferences Meet Adam Doyle 3
  • 4. Confidential and Proprietary to Daugherty Business Solutions • GDPR • Why Spark • Infrastructure • Language • Components • Development • Debugging • Deployment • Monitoring • Questions 4 Agenda
  • 5. Confidential and Proprietary to Daugherty Business Solutions • If your company stores data about citizens from any of the countries in the European Union, you should be preparing for GDPR. • Here is the official link http://www.eugdpr.org/ Here is why: (Financial) Penalties: Under GDPR organizations in breach of GDPR can be fined up to 4% of annual global turnover (Revenue) or €20 Million (whichever is greater). This is the maximum fine that can be imposed for the most serious infringements e.g. not having sufficient customer consent to process data or violating the core of Privacy by Design concepts. 5 GDPR
  • 6. Confidential and Proprietary to Daugherty Business Solutions • Privacy by Design is an approach to systems engineering which takes privacy into account throughout the whole engineering process. • Privacy by Design is not about data protection but designing so data doesn't need protection. The root principle therefore is based on enabling service without data control transfer from the citizen to the system (the citizen become identifiable or recognizable). • How Privacy by Design is achieved depends on the application, technologies and choice of approach. • Whether the methodology actually achieve Privacy by Design is not to be evaluated based on intent or approach, but outcome. I.e. if data do not need protection to not represent a risk to the citizens, the principle of Privacy by Design can be said to be achieved. • Even "anonymized" data are still personal data if data are derived from personal data outside the control of the individual citizen in question or any means to re-identify or recognize citizens or citizen devices exist. Anonymous isn’t anonymous if you can reverse it. 6 Privacy by Design
  • 7. Confidential and Proprietary to Daugherty Business Solutions • One simple example is Dynamic Host Configuration Protocol (DHCP) where devices based on random identifiers gets an IP from the server and thus is enabled to communicate without having leaked personal identifiers per se. • A more advanced example is Global Positioning System where devices client-side can detect their geographical location without leaking identity or location. • Another example in Internet of Things is RFID where citizens' ability to communicate with their devices without leaking identifiers can be achieved using Zero-knowledge proof. 7 Good Privacy by Design
  • 8. Confidential and Proprietary to Daugherty Business Solutions • Is your company doing business with European entities? • Have you analyzed your IT infrastructure for privacy leakage? • Can someone reverse your anonymity algorithms? • How can you better address privacy concerns? • “The cloud industry is aware of GDPR. We're actually scrambling to find more IT consulting firms/partners who could handle this to come on board with us. Many of the U.S. companies, some in St. Louis, that do business with EU, have data stored in EU, or are acquired by/acquiring E • https://www.datanami.com/this-just-in/mapr-talend-collaborate-deliver- governed-gdpr-data-lake-solution/ • https://hortonworks.com/webinar/apache-atlas-ranger-can-help-become- gdpr-compliant/ 8 Discussion
  • 9. Confidential and Proprietary to Daugherty Business Solutions 9 Why Spark in Pictures https://www.sigmoid.com/apache-spark-internals/
  • 10. Confidential and Proprietary to Daugherty Business Solutions • The client wants to get a real-time view of where the tweets about them are coming from. 10 Problem statement
  • 11. Confidential and Proprietary to Daugherty Business Solutions When deploying Spark for use, you have a couple of options 11 Infrastructure
  • 12. Confidential and Proprietary to Daugherty Business Solutions • Scala • Java • Python 12 Language val lines = sc.textFile("data.txt") val lineLengths = lines.map(s => s.length) val totalLength = lineLengths.reduce((a, b) => a + b) JavaRDD<String> lines = sc.textFile("data.txt"); JavaRDD<Integer> lineLengths = lines.map(s -> s.length()); int totalLength = lineLengths.reduce((a, b) -> a + b); lines = sc.textFile("data.txt") lineLengths = lines.map(lambda s: len(s)) totalLength = lineLengths.reduce(lambda a, b: a + b)
  • 13. Confidential and Proprietary to Daugherty Business Solutions 13 Components
  • 14. Confidential and Proprietary to Daugherty Business Solutions • Spark Batch (or Core) is used to perform batch computations on sets of data. • Becoming the base language for much of the current stack of Hadoop processing – Mahout moves from MapReduce engine to Spark • Expressed as a series of transformations and actions on your RDDs. • Transformations are lazily applied once an action is invoked. 14 Spark Batch
  • 15. Confidential and Proprietary to Daugherty Business Solutions • Module for structured data processing. • Interaction modes include – SQL – DataFrames API – Datasets API • Can join sets of objects with tables • Can be used to expose data sets to external applications 15 Spark SQL
  • 16. Confidential and Proprietary to Daugherty Business Solutions 16 Spark Streaming
  • 17. Confidential and Proprietary to Daugherty Business Solutions • Combination of Spark SQL and Spark Streaming. • Provides – Fast – Scalable – Fault-tolerant – End to end – Exactly One stream processing without the user having to reason about streaming. • New in v2.1 • Hadoop vendors may not have implemented this functionality 17 Structured Streaming
  • 18. Confidential and Proprietary to Daugherty Business Solutions • Spark’s Machine Learning Library • ML Algorithms – Classification – Regression – Clustering – Collaborative Filtering • Featurization • Pipelines • Persistence • Data Science Utilities 18 Spark MLIB
  • 19. Confidential and Proprietary to Daugherty Business Solutions • Graph processing • Includes – Graph abstraction – Graph operations – Pregel API – Graph algorithms – Graph Builders 19 GraphX
  • 20. Confidential and Proprietary to Daugherty Business Solutions • Transformations • Actions 20 Spark Development T:filter T:map T:flatMap A:forEach T:map A:save
  • 21. Confidential and Proprietary to Daugherty Business Solutions • Start the REPL from the command line: – spark-shell • Creates Interactive Scala interpreter with Spark libraries • Can add additional libraries into REPL on Launch 21 REPL
  • 22. Confidential and Proprietary to Daugherty Business Solutions // Create a local StreamingContext with two working thread and batch // interval of 60 second SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("SparkDemo"); JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(60)); SparkSession spark = SparkSession .builder() .appName("Java Spark SQL basic example") .getOrCreate(); // Get Data // Process Data // Act on the Data jssc.start(); try { jssc.awaitTermination(); } catch (InterruptedException e) { e.printStackTrace(); } 22 Shell of our application
  • 23. Confidential and Proprietary to Daugherty Business Solutions 23 Get the Data // Get Data JavaReceiverInputDStream<Status> stream = TwitterUtils.createStream(jssc, filters);
  • 24. Confidential and Proprietary to Daugherty Business Solutions Transformations Actions filter(func) reduce(func) flatMap(func) collect() map(func) count() mapPartitions(func) first() sample(withReplacement, fraction, seed) take(n) union(otherDataset) takeSample(withReplacement, num, [seed]) intersection(otherDataset) takeOrdered(n, [ordering]) pipe(command, [envVars]) saveAsSequenceFile(path) (Java and Scala) coalesce(numPartitions) saveAsObjectFile(path) (Java and Scala) repartition(numPartitions) saveAsTextFile(path) repartitionAndSortWithinPartitions(partitioner) countByKey() join(otherDataset, [numTasks]) foreach(func) cogroup(otherDataset, [numTasks]) groupByKey([numTasks]) reduceByKey(func, [numTasks]) aggregateByKey(zeroValue)(seqOp, combOp, [numTasks]) sortByKey([ascending], [numTasks]) 24 Transformations and Actions
  • 25. Confidential and Proprietary to Daugherty Business Solutions // Get Data JavaReceiverInputDStream<Status> stream = TwitterUtils.createStream(jssc, filters); // Process Data JavaDStream<Status> filteredStream = stream.filter(new FilterNullGeoLocation()); JavaPairDStream<String, Long> geoHashCounts = filteredStream .mapToPair(new StatusToGeoHashCountPair()); // Act on the Data geoHashCounts = geoHashCounts.reduceByKey(new CombineCounts()); geoHashCounts.print(); geoHashCounts.foreachRDD(new SaveAsPlaces(spark)); 25 Process the Data
  • 26. Confidential and Proprietary to Daugherty Business Solutions 26 Bring the Func JavaPairDStream<String, Long> geoHashCounts = filteredStream.mapToPair(new StatusToGeoHashCountPair()); public class StatusToGeoHashCountPair implements PairFunction<Status, String, Long> { @Override public Tuple2<String, Long> call(Status status) throws Exception { return new Tuple2<String, Long>(GeoHash.geoHashStringWithCharacterPrecision( status.getGeoLocation().getLatitude(), status.getGeoLocation().getLongitude(), 5), 1L); } }
  • 27. Confidential and Proprietary to Daugherty Business Solutions public class StatusToGeoHashCountPair implements PairFunction<Status, String, Long> { @Override public Tuple2<String, Long> call(Status status) throws Exception { return new Tuple2<String, Long> ( GeoHash.geoHashStringWithCharacterPrecision( status.getGeoLocation().getLatitude(), status.getGeoLocation().getLongitude(), 5) , 1L); } } public class StatusToGeoHashCountPairTest { @Test public void getsExpectedResult() throws Exception { Status status = mock(Status.class); when(status.getGeoLocation()).thenReturn(new GeoLocation(10, -20)); Tuple2<String, Long> tuple = new StatusToGeoHashCountPair().call(status); assertEquals("e9cbb", tuple._1()); assertEquals(new Long(1L), tuple._2()); } } 27 Testing Your functions example
  • 28. Confidential and Proprietary to Daugherty Business Solutions • Understanding closures • ForEach; Connections • Stream Start 28 Lessons Learned
  • 29. Confidential and Proprietary to Daugherty Business Solutions • Non-deterministic run-time • Testing a distributed application 29 Debugging
  • 30. Confidential and Proprietary to Daugherty Business Solutions • Deploy each jar to each client • Create a Maven Uberjar using Shade <build> <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-shade-plugin</artifactId> <version>3.0.0</version> <executions> <execution> <phase>package</phase> <goals> <goal>shade</goal> </goals> </execution> </executions> </plugin> </plugins> </build> 30 Deployment
  • 31. Confidential and Proprietary to Daugherty Business Solutions • Finding errors • Keeping streaming timeline below processing line 31 Monitoring #Trump #McDonalds
  • 32. Confidential and Proprietary to Daugherty Business Solutions Join Our Team Contact: Adam.doyle@daugherty.com
  • 33. Confidential and Proprietary to Daugherty Business Solutions