APACHE HIVE
(Apache Hadoop Sub Project)


Agenda:
• Story – Making of Apache Hive
• What is Apache Hive
• Physical Layout
• Hive CLI
• Hive QL
Introduction to Apache Hive
Can Elephants Fly?




Concern: Can Hadoop be used more efficiently and fruitfully by developers?

                 © 2012 Sabre Holdings Pvt. Ltd. | All rights reserved   3
Thinking…. ?
Step 1. Give him Wings




                                                        Mr. Hadoop energizing himself.




Thinking… ?
Step 2. Pray to Gravity

Thanks to gravity, the sky never fell down on us ;)
But wait, 2012 is not yet over. Keep praying.




                     Mr. Hadoop enjoying his first air ride.

   “God did not create the universe, gravity did” - Stephen Hawking

Upshot of the down-fall




              Victims                                        Mr. Hadoop – The Flying Elephant


Blame Gravity! The Fall will have a huge impact.




Saving Life…
Step 1. Shrink


BEFORE -




          ACME Elephant Shrinker


AFTER -


Saving Life…
Step 2. Genetic Engineering & a bit of magic
         BEFORE                                                     AFTER




                                             Mr. Hadoop

                                                                    Ms. Hive




                    Injecting Insecto-receptors



Behind the scenes…?




Hive was initially developed by Facebook.


• Hive is a data warehouse infrastructure built
  on top of Hadoop.
• It supports analysis of large datasets stored in
  Hadoop-compatible file systems such as HDFS
  and the Amazon S3 file system.
• It provides a SQL-like query language called
  HiveQL.
• To accelerate queries, it provides indexing.


• Warehouse directory in HDFS:
    /user/hive/warehouse
• Tables ~ subdirectories of the warehouse directory.
• Partitions ~ subdirectories of the corresponding
  table directory.
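
The layout above can be sketched as a small path-building helper. This is only an illustration, not Hive code: the function name `table_path`, the table `bookings` and the partition column `dt` are hypothetical, and the `column=value` naming of partition subdirectories is Hive's usual convention rather than something stated on the slide.

```python
from posixpath import join

WAREHOUSE_DIR = "/user/hive/warehouse"  # warehouse directory named on the slide


def table_path(table, partitions=None):
    """Return the HDFS directory for a table, or for one of its partitions."""
    path = join(WAREHOUSE_DIR, table)
    # Each partition column adds one more subdirectory level
    # (assumed "column=value" naming).
    for column, value in (partitions or {}).items():
        path = join(path, f"{column}={value}")
    return path


print(table_path("bookings"))
print(table_path("bookings", {"dt": "2012-06-01"}))
```

The first call gives the table directory and the second gives a partition directory nested inside it, which is why browsing the warehouse directory shows tables and partitions as ordinary folders.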




• Hive queries are implicitly converted to MapReduce
  code by the Hive engine.
• The compiler translates each query into a
  directed acyclic graph of MapReduce jobs.
• These MapReduce jobs are sent to Hadoop
  for execution.
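
To make the idea of a directed acyclic graph of jobs concrete, here is a small sketch using Python's standard `graphlib` module (Python 3.9 or later). The job names and their dependencies are invented for illustration; this is not Hive's compiler or its plan for any real query, only the ordering constraint that such a graph imposes.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each job maps to the set of jobs whose output it reads.
plan = {
    "join_job": set(),
    "group_by_job": {"join_job"},
    "order_by_job": {"group_by_job"},
}


def execution_order(dag):
    """Return one valid order in which the jobs could be submitted."""
    return list(TopologicalSorter(dag).static_order())


print(execution_order(plan))  # ['join_job', 'group_by_job', 'order_by_job']
```

A job can start only after every job it depends on has finished, so a graph with a cycle could never be scheduled.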



• The /user/hive directory is created automatically the
  first time a Hive session is started.
• The /user/hive/warehouse directory must be accessible
  to all users:
    hadoop dfs -chmod -R 1777 /user/hive/warehouse
• Activating the sticky bit is recommended if the Hadoop
  version installed on the cluster supports it.
• The /tmp directory should also be made a sticky
  directory:
    hadoop dfs -chmod -R 1777 /tmp
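
The mode 1777 used above is read/write/execute for everyone (777) plus the sticky bit (the leading 1), which stops users from deleting each other's files in a shared directory. The following sketch decodes the mode with Python's standard `stat` module; it runs locally and does not touch HDFS, and the helper name `describe` is just for illustration.

```python
import stat

MODE = 0o1777  # octal mode from the chmod commands above


def describe(mode):
    """Render a directory mode the way `ls -l` would."""
    return stat.filemode(stat.S_IFDIR | mode)


print(describe(MODE))             # drwxrwxrwt
print(bool(MODE & stat.S_ISVTX))  # True: the sticky bit is set
```

The trailing `t` in `drwxrwxrwt` is how the sticky bit shows up in a directory listing.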

• The Hive CLI (Command Line Interface) can be
  invoked with the hive command:
    % hive




• DML statements
  ▪ SELECT
• DDL statements
  ▪ SHOW TABLES
  ▪ CREATE TABLE
  ▪ ALTER TABLE
  ▪ DROP TABLE




• Normal (managed) tables are created under the warehouse
  directory. (The source data is moved into the warehouse.)
• Normal tables are directly visible when browsing the
  warehouse directory in HDFS.
• Dropping a normal table deletes both the source data
  and the table metadata.
• External tables read directly from HDFS files.
• External tables are not visible in the warehouse
  directory.
• Dropping an external table deletes only the
  metadata, not the source data.
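
The difference in drop behaviour can be shown with a toy model. Everything here is hypothetical (the class `TinyMetastore`, the table names and the paths); it imitates the rule described above and is not how Hive's metastore is implemented.

```python
class TinyMetastore:
    """Toy model of table metadata plus the data directories tables point at."""

    def __init__(self):
        self.tables = {}    # table name -> (location, is_external)
        self.files = set()  # data directories that currently exist

    def create(self, name, location, external=False):
        self.tables[name] = (location, external)
        self.files.add(location)

    def drop(self, name):
        # The metadata entry is removed for every kind of table.
        location, external = self.tables.pop(name)
        # The data is removed only for normal (non-external) tables.
        if not external:
            self.files.discard(location)


store = TinyMetastore()
store.create("sales", "/user/hive/warehouse/sales")
store.create("logs", "/data/raw/logs", external=True)
store.drop("sales")
store.drop("logs")
print(store.tables)  # {}
print(store.files)   # {'/data/raw/logs'}
```

After both drops no metadata is left, but the external table's data directory is still there.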

• HiveQL supports joins only on equality
  conditions. Complex Boolean expressions and
  inequality conditions are not supported.
• More than two tables can be joined.
• The number of MapReduce jobs (n) generated for a
  join depends on the join columns being used.
  ▪ If the same column is used for all the tables, then n=1
  ▪ Otherwise n>1
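
The rule above can be approximated with a simple counting model. It assumes a chain of joins, and that `join_keys` lists the column of the shared table used by each successive join clause; consecutive clauses on the same column are merged into one job. The function name `estimate_join_jobs` is hypothetical, and this is a simplification for illustration, not Hive's planner.

```python
from itertools import groupby


def estimate_join_jobs(join_keys):
    """Count runs of consecutive join clauses that share a join column."""
    return sum(1 for _ in groupby(join_keys))


print(estimate_join_jobs(["key1", "key1"]))  # 1
print(estimate_join_jobs(["key1", "key2"]))  # 2
```

Joining three tables on the same column gives n=1, while switching to a different column for the second join gives n=2.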


• HiveQL does not follow the SQL-92 standard.
• Features that are missing or limited:
  ▪ No materialized views
  ▪ No transaction-level support
  ▪ Limited sub-query support




Hadoop – Entering into the new world!




Reach me




                    Tapan Avasthi
Associate Software Developer Intern, Travelocity Global
           tapan.avasthi@travelocity.com
             tapan.k.avasthi@gmail.com


Hadoop's Impact on the Future of Data Management | Amr Awadallah
Cloudera, Inc.
 




Introduction to Apache Hive

  • 1. APACHE HIVE (Apache Hadoop Sub Project)
      Agenda:
      - Story – Making of Apache Hive
      - What is Apache Hive
      - Physical Layout
      - Hive CLI
      - Hive QL
  • 3. Can Elephants Fly? Concern: Can Hadoop be used more efficiently and fruitfully by developers? © 2012 Sabre Holdings Pvt. Ltd. | All rights reserved
  • 5. Thinking…? Step 1. Give him wings. Mr. Hadoop energizing himself.
  • 6. Thinking…? Step 2. Pray to gravity. Thanks to gravity, the sky never fell down on us ;) But wait, 2012 is not yet over. Keep praying. Mr. Hadoop enjoying his first air ride. “God did not create the universe, gravity did” - Stephen Hawking
  • 8. Upshot of the downfall. Victims: Mr. Hadoop – The Flying Elephant. Blame gravity! The fall will have a huge impact.
  • 10. Saving Life… Step 1. Shrink. BEFORE and AFTER, using the ACME Elephant Shrinker.
  • 11. Saving Life… Step 2. Genetic engineering and a bit of magic: injecting insecto-receptors. BEFORE: Mr. Hadoop. AFTER: Ms. Hive.
  • 13. Behind the scenes…? Hive was initially developed by Facebook.
  • 14.
      - Hive is a data warehouse infrastructure built on top of Hadoop.
      - It supports analysis of large datasets stored in Hadoop-compatible file systems such as HDFS and the Amazon S3 file system.
      - It provides a SQL-like query language called HiveQL.
      - To accelerate queries, it provides indexing.
  • 15. Warehouse directory in HDFS:
      - /user/hive/warehouse
      - Tables: subdirectories of the warehouse directory.
      - Partitions: subdirectories of the corresponding table directory.
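The layout above can be sketched as a small path builder. This is only an illustration: the table and column names are hypothetical, and the `column=value` naming of partition directories is Hive's usual convention rather than something stated on the slide.

```python
from posixpath import join

# Default warehouse root named on the slide.
WAREHOUSE_ROOT = "/user/hive/warehouse"


def table_dir(table):
    """Return the warehouse subdirectory that holds a table's files."""
    return join(WAREHOUSE_ROOT, table)


def partition_dir(table, partition):
    """Return the directory of one partition.

    `partition` maps partition column -> value, in partitioning order.
    Each level becomes a `column=value` subdirectory (assumed convention).
    """
    levels = [f"{col}={val}" for col, val in partition.items()]
    return join(table_dir(table), *levels)


# Hypothetical table partitioned by date and then by country.
print(table_dir("page_views"))
print(partition_dir("page_views", {"dt": "2012-01-01", "country": "US"}))
```

Because every partition is its own directory, a query that filters on a partition column only needs to read the matching subdirectories.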
  • 16.
      - Hive queries are implicitly converted to map-reduce code by the Hive engine.
      - The compiler translates each query into a directed acyclic graph of map-reduce jobs.
      - These map-reduce jobs are sent to Hadoop for execution.
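To make the "directed acyclic graph of map-reduce jobs" idea concrete, here is a minimal sketch of how such a graph could be walked in dependency order. It is not Hive's planner: the job names and the plan are invented for illustration, and the scheduling uses Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each job maps to the set of jobs whose output it reads.
plan = {
    "agg_orders": set(),
    "agg_clicks": set(),
    "join_aggs": {"agg_orders", "agg_clicks"},
    "sort_result": {"join_aggs"},
}


def stages(dag):
    """Group jobs into stages; jobs within a stage do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all prerequisites are finished
        result.append(ready)
        sorter.done(*ready)
    return result


for number, jobs in enumerate(stages(plan), start=1):
    print(number, jobs)
```

The two aggregation jobs land in the same stage because neither reads the other's output, while the join has to wait for both.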
  • 17.
      - The /user/hive directory is created automatically the first time a Hive session is started.
      - The /user/hive/warehouse directory must be accessible to all users:
          hadoop dfs -chmod -R 1777 /user/hive/warehouse
      - Activating the sticky bit is recommended if the Hadoop version installed on the cluster supports it.
      - The /tmp directory should also be made a sticky directory:
          hadoop dfs -chmod -R 1777 /tmp
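For readers unfamiliar with the octal mode 1777 used in the commands above, the following sketch decodes it with Python's standard `stat` module. It only interprets the number locally and does not talk to HDFS; the helper name `describe_mode` is made up for this example.

```python
import stat


def describe_mode(octal_text):
    """Decode an octal permission string as it would apply to a directory."""
    mode = int(octal_text, 8)
    return {
        # Symbolic form, as shown by `ls -l` for a directory.
        "symbolic": stat.filemode(stat.S_IFDIR | mode),
        # Sticky bit: only a file's owner (or the superuser) may delete it.
        "sticky": bool(mode & stat.S_ISVTX),
        # Anyone may create files in the directory.
        "world_writable": bool(mode & stat.S_IWOTH),
    }


print(describe_mode("1777"))
print(describe_mode("777"))
```

The leading 1 is the sticky bit: everyone can still write into the directory, but users cannot delete or rename each other's files.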
  • 18. The Hive CLI (Command Line Interface) is invoked with the hive command:
          % hive
  • 21.
      - DML:
          ▪ SELECT
      - DDL:
          ▪ SHOW TABLES
          ▪ CREATE TABLE
          ▪ ALTER TABLE
          ▪ DROP TABLE
  • 24.
      - Normal tables are created under the warehouse directory (the source data is moved into the warehouse).
      - Normal tables are directly visible when browsing the HDFS warehouse directory.
      - Dropping a normal table deletes both the source data and the table metadata.
      - External tables read directly from files in HDFS.
      - External tables are not visible in the warehouse directory.
      - Dropping an external table deletes only the metadata, not the source data.
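The difference in drop behaviour can be modelled with a toy, in-memory stand-in for the metastore and the file system. Nothing here calls Hive; the class, table names, and paths are all hypothetical and exist only to make the two rules above explicit.

```python
class ToyWarehouse:
    """In-memory model of table metadata plus the data locations on disk."""

    def __init__(self):
        self.metastore = {}  # table name -> (data location, is_external)
        self.files = set()   # locations that currently hold data

    def create_table(self, name, location, external=False):
        self.metastore[name] = (location, external)
        self.files.add(location)

    def drop_table(self, name):
        # Metadata is always removed.
        location, external = self.metastore.pop(name)
        # Data is removed only for normal (non-external) tables.
        if not external:
            self.files.discard(location)


wh = ToyWarehouse()
wh.create_table("sales", "/user/hive/warehouse/sales")
wh.create_table("logs", "/data/raw/logs", external=True)
wh.drop_table("sales")
wh.drop_table("logs")
print(wh.metastore)  # both table definitions are gone
print(wh.files)      # only the external table's data remains
```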
  • 28.
      - HiveQL supports joins only on equality conditions; complex boolean expressions and inequality conditions are not supported.
      - More than two tables can be joined.
      - The number of map-reduce jobs (n) generated for a join depends on the join columns used:
          ▪ If every table is joined on the same column, then n = 1.
          ▪ Otherwise n > 1.
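The single-job rule above can be expressed as a small check. This is a sketch of the rule as stated on the slide, not Hive's optimizer: it reads "same column" as "each table contributes exactly one column to the join conditions", and the table and column names are hypothetical.

```python
from collections import defaultdict


def join_uses_single_job(conditions):
    """Return True if every table appears with just one join column.

    `conditions` is a list of equality conditions, each written as
    ((left_table, left_column), (right_table, right_column)).
    """
    columns_by_table = defaultdict(set)
    for (left_table, left_col), (right_table, right_col) in conditions:
        columns_by_table[left_table].add(left_col)
        columns_by_table[right_table].add(right_col)
    return all(len(cols) == 1 for cols in columns_by_table.values())


# a JOIN b ON a.key = b.key1 JOIN c ON c.key = b.key1
same_key = [(("a", "key"), ("b", "key1")), (("c", "key"), ("b", "key1"))]
# a JOIN b ON a.key = b.key1 JOIN c ON c.key = b.key2
mixed_keys = [(("a", "key"), ("b", "key1")), (("c", "key"), ("b", "key2"))]

print(join_uses_single_job(same_key))    # n = 1
print(join_uses_single_job(mixed_keys))  # n > 1
```

In the second query table b is joined on two different columns, so the rows cannot all be shuffled on one key in a single job.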
  • 30.
      - HiveQL does not follow the SQL-92 standard.
      - Features it lacks:
          ▪ No materialized views
          ▪ No transaction-level support
          ▪ Limited sub-query support
  • 31. Hadoop – Entering into the new world!
  • 32. Reach me: Tapan Avasthi, Associate Software Developer Intern, Travelocity Global
      tapan.avasthi@travelocity.com
      tapan.k.avasthi@gmail.com