4 Building Blocks of a Streaming Data Architecture
Whitepaper
Streaming data is becoming a core component of enterprise data architecture.
Streaming technologies are not new, but they have considerably matured over
the past year. The industry is moving from painstaking integration of
technologies like Kafka and Storm, towards full stack solutions that provide an
end-to-end streaming data architecture.
• Able to deal with never-ending streams of events. Some data is naturally
structured this way. Traditional batch processing tools require stopping the
stream of events, capturing batches of data, and combining the batches to
draw overall conclusions. Stream processing makes it challenging to combine
and capture data from multiple streams, but it lets you derive immediate
insights from large volumes of streaming data.
The Components of a Traditional Streaming Architecture
1. The Message Broker
This is the element that takes data from a source, called a producer, translates it
into a standard message format, and streams it on an ongoing basis. Other
components can then listen in and consume the messages passed on by the
broker.
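The producer-broker-consumer flow described above can be sketched as a toy in-memory topic log. This is an illustration only; `MiniBroker` and its methods are hypothetical names, not the API of Kafka or any real broker:

```python
from collections import defaultdict

class MiniBroker:
    """Toy in-memory message broker: producers publish messages to
    named topics; consumers read from a topic starting at an offset
    they track themselves (as in Kafka's consumer model)."""

    def __init__(self):
        self._topics = defaultdict(list)  # topic name -> ordered message log

    def publish(self, topic, message):
        # Append to the topic's log; a real broker would persist this to disk.
        self._topics[topic].append(message)

    def consume(self, topic, offset=0):
        # Return every message from the given offset onward.
        return self._topics[topic][offset:]

broker = MiniBroker()
broker.publish("clicks", {"user": "a", "page": "/home"})
broker.publish("clicks", {"user": "b", "page": "/docs"})
recent = broker.consume("clicks", offset=1)
```

Keeping an append-only log per topic, with consumers tracking their own offsets, is the core design idea that lets streaming brokers decouple producers from consumers.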
Unlike older message-oriented middleware (MoM) brokers, streaming brokers
support very high performance with persistence, offer massive capacity of a
gigabyte per second or more of message traffic, and are tightly focused on
streaming, with no support for data transformations or task scheduling. You can
learn more about message brokers in our article on analyzing Apache Kafka data.
A few examples of stream processors are Apache Storm, Spark Streaming and
WSO2 Stream Processor. While stream processors work in different ways, they
are all capable of listening to message streams, processing the data and saving
it to storage. Some stream processors, including Spark and WSO2, provide a SQL
syntax for querying and manipulating the data.
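The kind of processing these engines perform can be illustrated with a simplified tumbling-window aggregation. This is a sketch of the concept, not how Storm, Spark Streaming, or WSO2 are implemented; the function name and event format are assumptions for illustration:

```python
from collections import Counter

def tumbling_window_counts(events, window_ms):
    """Group a time-ordered stream of (timestamp_ms, key) events into
    fixed-size, non-overlapping (tumbling) windows and count events
    per key within each window."""
    windows = {}
    for ts, key in events:
        bucket = ts - (ts % window_ms)  # start time of the window
        windows.setdefault(bucket, Counter())[key] += 1
    return windows

# Events at 0, 400, and 900 ms fall in the first 1-second window;
# the event at 1200 ms starts a new window.
events = [(0, "a"), (400, "b"), (900, "a"), (1200, "a")]
result = tumbling_window_counts(events, window_ms=1000)
```

Real stream processors add distribution, fault tolerance, and handling of late or out-of-order events on top of this basic windowing idea.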
Kafka Connect can be used to stream topics directly into Elasticsearch. If you
use the Avro data format and a schema registry, Elasticsearch mappings with
correct datatypes are created automatically. You can then perform rapid text
search or analytics within Elasticsearch.
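As an illustration, a sink from a Kafka topic into Elasticsearch might be configured along these lines, assuming Confluent's Elasticsearch sink connector. The connector name, topic, and URL below are placeholders:

```json
{
  "name": "pageviews-to-elasticsearch",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "pageviews",
    "connection.url": "http://localhost:9200",
    "key.ignore": "true",
    "schema.ignore": "false"
  }
}
```

With `schema.ignore` set to `false` and Avro records registered in a schema registry, the connector can map record fields to correctly typed Elasticsearch fields.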
Streaming Data Storage Options

Option: In a database or data warehouse (for example, PostgreSQL or Amazon Redshift)
Pros: Easy SQL-based data analysis
Cons: Hard to scale and manage; if cloud-based, storage is expensive
A data lake is the most flexible and inexpensive option for storing event data,
but it has several limitations for streaming data applications. Upsolver provides
a data lake platform that ingests streaming data into a data lake, creates
schema-on-read, and extracts metadata. This allows data consumers to easily
prepare data for analytics tools and real time analytics.
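The schema-on-read idea mentioned above can be sketched as inferring field names and types from raw JSON events at query time, rather than enforcing a schema when data is written. This is an illustrative toy, not Upsolver's implementation; it ignores schema evolution, nesting, and type conflicts:

```python
import json

def infer_schema(raw_events):
    """Schema-on-read sketch: scan raw JSON event lines and derive a
    flat mapping of field name -> Python type name, unioning fields
    seen across events."""
    schema = {}
    for line in raw_events:
        for field, value in json.loads(line).items():
            # First type seen for a field wins in this simplified version.
            schema.setdefault(field, type(value).__name__)
    return schema

raw = [
    '{"user_id": 1, "event": "click"}',
    '{"user_id": 2, "event": "view", "duration_ms": 120}',
]
schema = infer_schema(raw)
```

Because the schema is derived on read, producers can add fields (like `duration_ms` above) without coordinating a schema change with every consumer first.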
Benefits of a modern streaming architecture:
• Newer platforms are cloud-based and can be deployed very quickly with no
upfront investment
• Automation of data plumbing. Organizations are becoming reluctant to
spend precious data engineering time on data plumbing instead of activities
that add value, such as data cleansing or enrichment. Increasingly, data
teams prefer full-stack platforms that reduce time-to-value over tailored
home-grown solutions.
You can read more of our predictions for streaming data trends here.
By using Upsolver, you get the best of both worlds: low-cost storage on a data
lake, easy transformation to tabular formats, and real-time support. Begin your
free trial to start building a next-gen streaming data architecture.