How to Use Apache Zeppelin with Hortonworks HDB
Dan Baskette
December 2016
Hortonworks HDB, powered by Apache HAWQ
© 2016 Pivotal Software, Inc. All rights reserved.
Agenda
● Hortonworks HDB/HAWQ
● Apache Zeppelin
● Demo
● Resources
What is HDB / Apache HAWQ?
A Hadoop-native SQL query engine and advanced-analytics MPP (massively parallel processing) database that offers high-performance interactive query execution and machine learning to data analysts and data scientists who want to find insights in large, complex datasets.
Pivotal HDB | Hortonworks HDB, powered by Apache HAWQ
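As a rough mental model of what "MPP" means here, the sketch below splits rows across hypothetical segments, has each segment compute a partial aggregate concurrently, and lets a coordinator merge the partials. This is the general two-phase aggregation pattern, not HAWQ source code; the segment count, the helper names (`partition`, `partial_avg`, `merge_avgs`, `parallel_avg`), and the sample data are all illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical illustration of two-phase (partial, then final) aggregation,
# the general pattern MPP engines use; this is not HAWQ source code.


def partition(rows, num_segments):
    """Distribute rows round-robin across hypothetical segments."""
    return [rows[i::num_segments] for i in range(num_segments)]


def partial_avg(segment_rows):
    """Phase 1: each segment returns (sum, count) for its local rows."""
    return sum(segment_rows), len(segment_rows)


def merge_avgs(partials):
    """Phase 2: the coordinator combines the partial (sum, count) pairs."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


def parallel_avg(rows, num_segments=4):
    segments = partition(rows, num_segments)
    # Threads stand in for segment hosts working concurrently.
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(partial_avg, segments))
    return merge_avgs(partials)


if __name__ == "__main__":
    data = list(range(1, 101))  # sample values 1..100
    print(parallel_avg(data))   # 50.5
```

Merging (sum, count) pairs, rather than averaging each segment's own average, keeps the result exact even when segments hold different numbers of rows.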
HDB / HAWQ Advantages
• Advanced Analytics Performance: exceptional MPP performance, low latency, ACID reliability, data federation
• ANSI SQL Compliance: a high degree of SQL compatibility (SQL-92, SQL-99, SQL:2003, OLAP), so existing SQL skills carry over
• Advanced Query Optimizer: maximize performance and run advanced queries with confidence
• Elastic Architecture for Scalability: scale up/down or in/out, expanding or shrinking clusters on the fly
• Integrated with MADlib Machine Learning: advanced MPP analytics and data science at scale, directly on Hadoop data
Apache MADlib: In-Database Machine Learning
• Apache™ MADlib® (incubating) is an open-source library for scalable in-database analytics
• Provides parallel implementations of mathematical, statistical, and machine learning methods for structured and unstructured data
• Supports Apache HAWQ, Greenplum Database, and PostgreSQL
• Runs analytics on all of the data in-database, without sampling (more accurate results, less effort)
http://madlib.incubator.apache.org
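To make "in-database, without sampling" concrete, the sketch below fits a simple linear regression the way a parallel SQL aggregate can: each segment folds its own rows into running sums (transition), the sums from all segments are combined (merge), and the coefficients are solved once at the end (final). It is only loosely modeled on the idea behind MADlib; in MADlib itself, training is invoked from SQL (for example through its `linregr_train` function). The helper names and the three-segment data split are hypothetical.

```python
from functools import reduce

# Hypothetical sketch of in-database simple linear regression using the
# aggregate pattern: transition (per row), merge (across segments), final.
# This mirrors the general idea behind MADlib, not its actual implementation.

# State: (n, sum_x, sum_y, sum_xx, sum_xy)
EMPTY = (0, 0.0, 0.0, 0.0, 0.0)


def transition(state, row):
    """Fold one (x, y) row into a segment's running statistics."""
    n, sx, sy, sxx, sxy = state
    x, y = row
    return (n + 1, sx + x, sy + y, sxx + x * x, sxy + x * y)


def merge(a, b):
    """Combine the statistics from two segments."""
    return tuple(p + q for p, q in zip(a, b))


def final(state):
    """Solve the least-squares (slope, intercept) from merged statistics."""
    n, sx, sy, sxx, sxy = state
    denom = n * sxx - sx * sx
    if n == 0 or denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


def fit(segments):
    """Each segment scans all of its rows; no sampling is involved."""
    partials = [reduce(transition, rows, EMPTY) for rows in segments]
    return final(reduce(merge, partials, EMPTY))


if __name__ == "__main__":
    # Rows from y = 2x + 1, split across three hypothetical segments.
    segments = [[(0, 1), (1, 3)], [(2, 5), (3, 7)], [(4, 9)]]
    print(fit(segments))  # (2.0, 1.0)
```

Because only five numbers travel from each segment to the coordinator, the cost of the merge step does not grow with the number of rows.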
Apache Zeppelin
• A web-based notebook that enables interactive data analytics
• Used to build data-driven, interactive, and collaborative documents with SQL, Scala, and more
• Used for data ingestion, discovery, analytics, visualization, and collaboration
• Very extensible
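Zeppelin's display system renders paragraph output that begins with `%table` as a table that its built-in charts can visualize: columns are tab-separated, rows are newline-separated, and the first row is the header. The sketch below formats query results in that shape. The function name `to_zeppelin_table` and the sample rows are hypothetical, and replacing embedded tabs or newlines with spaces is an assumption made for this sketch.

```python
def to_zeppelin_table(header, rows):
    """Format rows as text for Zeppelin's %table display system.

    Columns are tab-separated, rows are newline-separated, and the first
    row is the header. Tabs or newlines inside a value would break the
    layout, so they are replaced with spaces (a choice made for this sketch).
    """
    def clean(value):
        return str(value).replace("\t", " ").replace("\n", " ")

    lines = ["\t".join(clean(h) for h in header)]
    lines.extend("\t".join(clean(v) for v in row) for row in rows)
    return "%table " + "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical query result, e.g. rows fetched from HAWQ.
    print(to_zeppelin_table(["region", "sales"], [("east", 120), ("west", 95)]))
```

Printing this string from a paragraph whose engine otherwise returns plain text lets Zeppelin chart the result; SQL interpreters such as psql typically produce table output on their own.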
Apache Zeppelin Interpreters
• Any language or data-processing engine can be plugged into Zeppelin
• Supports many engines out of the box
• Supports Apache HAWQ, Greenplum Database, and PostgreSQL via the psql interpreter. This interpreter will be merged into the JDBC interpreter.
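The pluggable-interpreter idea can be pictured as a registry keyed by the binding at the start of each paragraph: a leading `%name` selects the engine, and the rest of the paragraph is handed to it as code. The toy model below also accepts a `%name(prefix)` form, loosely modeled on how the JDBC interpreter selects among configured connections. It is not Zeppelin's implementation; the class name, the handlers, and the bindings `%psql` and `%jdbc(hawq)` used in the demo are hypothetical.

```python
import re

# Toy model of Zeppelin's pluggable interpreters: a paragraph's leading
# "%name" (optionally "%name(prefix)") picks the engine; the rest is code.
# Hypothetical names throughout; this is not Zeppelin's implementation.

BINDING = re.compile(r"^%([\w.]+)(?:\(([\w.-]+)\))?\s*")


class InterpreterRegistry:
    def __init__(self, default):
        self.default = default
        self.handlers = {}

    def register(self, name, handler):
        """Plug in any engine: a callable taking (code, prefix)."""
        self.handlers[name] = handler

    def run(self, paragraph):
        match = BINDING.match(paragraph)
        if match:
            name, prefix = match.group(1), match.group(2)
            code = paragraph[match.end():]
        else:
            # No binding: fall back to the default interpreter.
            name, prefix, code = self.default, None, paragraph
        if name not in self.handlers:
            raise KeyError(f"no interpreter bound for %{name}")
        return self.handlers[name](code, prefix)


if __name__ == "__main__":
    registry = InterpreterRegistry(default="md")
    registry.register("md", lambda code, prefix: f"[markdown] {code}")
    registry.register("psql", lambda code, prefix: f"[psql] {code}")
    registry.register("jdbc", lambda code, prefix: f"[jdbc:{prefix or 'default'}] {code}")
    print(registry.run("%psql SELECT count(*) FROM sales;"))
    print(registry.run("%jdbc(hawq) SELECT 1;"))
    print(registry.run("Just some notes"))
```

Adding an engine means registering one more handler; nothing in the dispatch logic changes, which is the property the slide calls "very extensible".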
Apache Zeppelin Example
Learn more
http://hortonworks.com/apache/hawq/
Recording: http://hortonworks.com/webinar/use-apache-zeppelin-hortonworks-hdb/