Using Apache NiFi, we read various open data REST APIs and camera feeds to ingest crime and related data in real time, streaming it into HBase and Phoenix tables. HBase makes an excellent storage option for our real-time, time-series data sources. We can immediately query our data with Apache Zeppelin against Phoenix tables, as well as through Hive external tables over HBase.
Apache Phoenix tables are also a great option since we can easily put microservices on top of them for application usage. I have an example Spring Boot application that reads from our Philadelphia crime table to serve front-end web applications as well as RESTful APIs.
Apache NiFi makes it easy to push records with schemas to HBase and insert into Phoenix SQL tables.
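As a concrete sketch of that last step, the helper below builds the parameterized UPSERT statement a NiFi flow (for example, PutSQL fed by ConvertJSONToSQL) would execute against Phoenix for each crime record. The table and column names are illustrative assumptions, not the actual schema from this deck.

```java
import java.util.Collections;
import java.util.List;

// Sketch: build the parameterized UPSERT that a NiFi flow would issue
// against Phoenix for each incoming crime record.
// PHILLY_CRIME and its columns are illustrative, not the deck's schema.
public class PhoenixUpsertSketch {

    // Phoenix uses UPSERT VALUES instead of INSERT; each statement
    // writes a row directly into the underlying HBase table.
    static String buildUpsert(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String params = String.join(", ",
                Collections.nCopies(columns.size(), "?"));
        return "UPSERT INTO " + table + " (" + cols + ") VALUES (" + params + ")";
    }

    public static void main(String[] args) {
        String sql = buildUpsert("PHILLY_CRIME",
                List.of("DC_KEY", "DISPATCH_DATE", "TEXT_GENERAL_CODE"));
        System.out.println(sql);
        // → UPSERT INTO PHILLY_CRIME (DC_KEY, DISPATCH_DATE, TEXT_GENERAL_CODE) VALUES (?, ?, ?)
    }
}
```

In a NiFi flow the `?` parameters are bound from FlowFile attributes, so the same prepared statement is reused for every record.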
Resources:
https://community.hortonworks.com/articles/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
https://community.hortonworks.com/articles/56642/creating-a-spring-boot-java-8-microservice-to-read.html
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html
1. Tracking Crime as It Occurs with Apache Phoenix,
Apache HBase and Apache NiFi
TIMOTHY SPANN
Field Engineer, Data in Motion
Cloudera
2. Introduction
Tim Spann has been running meetups in Princeton on Big Data technologies since 2015.
Tim has spoken at many international conferences on Apache NiFi, Deep Learning and
Streaming.
https://community.hortonworks.com/users/9304/tspann.html
https://dzone.com/users/297029/bunkertor.html
https://www.meetup.com/futureofdata-princeton/
https://dzone.com/articles/integrating-keras-tensorflow-yolov3-into-apache-ni
3. Introduction
Using Apache NiFi we can ingest various sources of crime data in real time as incidents happen, as well as monitor
live traffic cameras (Source: TrafficLand).
We can do a lot of alerting and routing, and react to crime data as it arrives, but we need more. We need to update
totals, store this data for future machine learning analytics, and make it available for instantly updating dashboards
and reports.
The best destination for this data is Apache HBase and Apache Phoenix. We’ll populate tables with ease and speed!
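To illustrate the kind of schema this approach implies, here is a hedged sketch of a Phoenix time-series table and a dashboard aggregate over it. The DDL, table name, columns, and salt-bucket count are assumptions for illustration; the talk's actual schema may differ.

```java
// Sketch of a Phoenix schema for time-series crime events, plus the
// kind of running-totals query a Zeppelin notebook or dashboard tile
// would run. All names and options here are illustrative assumptions.
public class CrimeSchemaSketch {

    // The composite primary key (event time + incident id) becomes the
    // HBase row key, so time-range scans read contiguous rows.
    // SALT_BUCKETS spreads monotonically increasing timestamps across
    // region servers to avoid write hotspotting.
    static final String DDL = """
            CREATE TABLE IF NOT EXISTS PHILLY_CRIME (
              DISPATCH_TIME  TIMESTAMP NOT NULL,
              DC_KEY         VARCHAR   NOT NULL,
              CRIME_TYPE     VARCHAR,
              LOCATION_BLOCK VARCHAR,
              CONSTRAINT PK PRIMARY KEY (DISPATCH_TIME, DC_KEY)
            ) SALT_BUCKETS = 4""";

    // Running totals per crime type over roughly the last day.
    static final String TOTALS = """
            SELECT CRIME_TYPE, COUNT(*) AS INCIDENTS
            FROM PHILLY_CRIME
            WHERE DISPATCH_TIME > CURRENT_DATE() - 1
            GROUP BY CRIME_TYPE
            ORDER BY INCIDENTS DESC""";

    public static void main(String[] args) {
        System.out.println(DDL);
        System.out.println(TOTALS);
    }
}
```

Because Phoenix maps this table straight onto HBase, the same rows NiFi streams in are immediately visible to the aggregate query.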
11. Apache Phoenix-5.0
• Expect similar timeframe for Phoenix-5.0
• We are working on HBase-2.0 support
• Re-write internals using Apache Calcite
• SQL parser, planner, and optimizer
• Cost-based optimizer used by Hive, Drill, etc.
• Pluggable rules, with default rules and Phoenix-specific ones
• SQL-92 support
• Apache NiFi calls Apache Calcite Avatica JDBC
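The last bullet can be sketched in code: any JDBC client, including NiFi's database connection pool, reaches Phoenix through the Phoenix Query Server's Avatica endpoint via the thin-client driver. The host, port, and table name below are assumptions for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch of the Avatica JDBC path: the thin client speaks protobuf
// over HTTP to the Phoenix Query Server (default port 8765).
// Host, port, and PHILLY_CRIME are illustrative assumptions.
public class AvaticaClientSketch {

    // Build the Phoenix thin-client JDBC URL for a Query Server.
    static String thinUrl(String host, int port) {
        return "jdbc:phoenix:thin:url=http://" + host + ":" + port
                + ";serialization=PROTOBUF";
    }

    // Would run against a live Query Server; shown here but not
    // invoked, since it needs a running cluster.
    static void queryTotal(String url) throws Exception {
        try (Connection conn = DriverManager.getConnection(url);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT COUNT(*) FROM PHILLY_CRIME")) {
            while (rs.next()) {
                System.out.println("rows: " + rs.getLong(1));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(thinUrl("localhost", 8765));
    }
}
```

In NiFi, the same URL would go into a DBCPConnectionPool controller service, which ExecuteSQL and PutSQL processors then share.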