Learning HBase
About this ebook
Apache HBase is a nonrelational NoSQL database management system that runs on top of HDFS. It is an open source, distributed, versioned, column-oriented store that provides random, real-time read/write access to Big Data, with the benefit of linear scalability on the fly.
This book will take you through a series of core tasks in HBase. The introductory chapter gives you all the information you need about the HBase ecosystem. You will then learn how to configure, create, verify, and test clusters. The book also explores the different Hadoop and HBase parameters that need to be considered for optimization and trouble-free operation of the cluster, and it focuses on HBase's data model, storage, and structure layout. You will also get to know the different options that can be used to speed up the operation and functioning of HBase. The book also teaches basic- and advanced-level Java coding for HBase. By the end of the book, you will have learned how to use HBase with large data sets and integrate it with Hadoop.
Learning HBase - Shashwat Shriparv
Table of Contents
Learning HBase
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Understanding the HBase Ecosystem
HBase layout on top of Hadoop
Comparing architectural differences between RDBMs and HBase
HBase features
HBase in the Hadoop ecosystem
Data representation in HBase
Hadoop
Core daemons of Hadoop
Comparing HBase with Hadoop
Comparing functional differences between RDBMs and HBase
Logical view of row-oriented databases
Logical view of column-oriented databases
Pros and cons of column-oriented databases
About the internal storage architecture of HBase
Getting started with HBase
When it started
HBase components and functionalities
ZooKeeper
Why an odd number of ZooKeepers?
HMaster
If a master node goes down
RegionServer
Components of a RegionServer
Client
Catalog tables
Who is using HBase and why?
When should we think of using HBase?
When not to use HBase
Understanding some open source HBase tools
The Hadoop-HBase version compatibility table
Applications of HBase
HBase pros and cons
Summary
2. Let's Begin with HBase
Understanding HBase components in detail
HFile
Region
Scalability – understanding the scale up and scale out processes
Scale in
Scale out
Reading and writing cycle
Write-Ahead Logs
MemStore
HBase housekeeping
Compaction
Minor compaction
Major compaction
Region split
Region assignment
Region merge
RegionServer failovers
The HBase delete request
The reading and writing cycle
List of available HBase distributions
Prerequisites and capacity planning for HBase
The forward DNS resolution
The reverse DNS resolution
Java
SSH
Domain Name Server
Using Network Time Protocol to keep your node on time
OS-level changes and tuning up OS for HBase
Summary
3. Let's Start Building It
Downloading Java on Ubuntu
Considering host configurations
Host file based
Command based
File based
DNS based
Installing and configuring SSH
Installing SSH on Ubuntu/Red Hat/CentOS
Configuring SSH
Installing and configuring NTP
Performing capacity planning
Installing and configuring Hadoop
core-site.xml
hdfs-site.xml
yarn-site.xml
mapred-site.xml
hadoop-env.sh
yarn-env.sh
Slaves file
Hadoop start up steps
Configuring Apache HBase
Configuring HBase in the standalone mode
Configuring HBase in the distributed mode
hbase-site.xml
HBase-env.sh
regionservers
Installing and configuring ZooKeeper
Installing Cloudera Hadoop and HBase
Downloading the required RPM packages
Installing Cloudera in an easier way
Installing the Hadoop and MapReduce packages
Installing Hadoop on Windows
Summary
4. Optimizing the HBase/Hadoop Cluster
Setup types for Hadoop and HBase clusters
Recommendations for CDH cluster configuration
Capacity planning
Hadoop optimization
General optimization tips
Optimizing Java GC
Optimizing Linux OS
Optimizing the Hadoop parameter
Optimizing MapReduce
Rack awareness in Hadoop
Number of Map and Reduce limits in configuration files
Considering and deciding the maximum number of Map and Reduce tasks
Optimizing HBase
Hadoop
Memory
Java
OS
HBase
Optimizing ZooKeeper
Important files in Hadoop
Important files in HBase
Summary
5. The Storage, Structure Layout, and Data Model of HBase
Data types in HBase
Storing data in HBase – logical view versus actual physical view
Namespace
Commands available for namespaces
Services of HBase
Row key
Column family
Column
Cell
Version
Timestamp
Data model operations
Get
Put
Scan
Delete
Versioning and why
Deciding the number of the version
Lower bound of versions
Upper bound of versions
Schema designing
Types of table designs
Benefits of Short Wide and Tall-Thin design patterns
Composite key designing
Real-time use case of schema in an HBase table
Schema change operations
Calculating the data size stored in HBase
Summary
6. HBase Cluster Maintenance and Troubleshooting
Hadoop shell commands
Types of Hadoop shell commands
Administration commands
User commands
File system-related commands
Difference between copyToLocal/copyFromLocal and get/put
HBase shell commands
HBase administration tools
hbck – HBase check
HBase health check script
Writing HBase shell scripts
Using the Hadoop tool or JARs for HBase
Connecting HBase with Hive
HBase region management
Compaction
Merge
HBase node management
Commissioning
Decommissioning
Implementing security
Secure access
Requirement
Kerberos KDC
Client-side security configuration
Client-side security configuration for thrift requests
Server-side security configuration
Simple security
Server-side configuration
Client-side configuration
The tag security feature
Access control in HBase
Server-side access control
Cell-level access using tags
Configuring ZooKeeper for security
Troubleshooting the most frequent HBase errors and their explanations
What might fail in cluster
Monitoring HBase health
HBase web UI
Master
RegionServer
ZooKeeper command line
Linux tools
Summary
7. Scripting in HBase
HBase backup and restore techniques
Offline backup / full-shutdown backup
Backup
Restore
Online backup
The HBase snapshot
Online
Offline
The HBase replication method
Setting up cluster replication
Backup and restore using Export and Import commands
Export
Import
Miscellaneous utilities
CopyTable
HTable API
Backup using a Mozilla tool
HBase on Windows
Scripting in HBase
The .irbrc file
Getting the HBase timestamp from HBase shell
Enabling debugging shell
Enabling the debug level in HBase shell
Enabling SQL in HBase
Contributing to HBase
Summary
8. Coding HBase in Java
Setting up the environment for development
Building a Java client to code in HBase
Data types
Data model Java operations
Read
Get()
Constructors
Supported methods
Scan()
Constructors
Methods
Write
Put()
Constructors
Methods
Modify
Delete()
Constructors
Methods
HBase filters
Types of filters
Client APIs
Summary
9. Advance Coding in Java for HBase
Interfaces, classes, and exceptions
Code related to administrative tasks
Data operation code
MapReduce and HBase
RESTful services and Thrift services interface
REST service interfaces
Thrift
Coding for HDFS operations
Some advance topics in brief
Coprocessors
Types of coprocessors
Bloom filters
The Lily project
Features
Summary
10. HBase Use Cases
HBase in industry today
The future of HBase against relational databases
Some real-world project examples' use cases
HBase at Facebook
Choosing HBase
Storing in HBase
The architecture of a Facebook message
Facts and figures
HBase at Pinterest
The layout architecture
HBase at Groupon
The layout architecture
HBase at LongTail Video
The layout architecture
HBase at Aadhaar (UIDAI)
The layout architecture
Useful links and references
Summary
Index
Learning HBase
Copyright © 2014 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2014
Production reference: 1181114
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78398-594-4
www.packtpub.com
Credits
Author
Shashwat Shriparv
Reviewers
Ashutosh Bijoor
Chhavi Gangwal
Henry Garner
Nitin Pawar
Jing Song
Arun Vasudevan
Commissioning Editor
Akram Hussain
Acquisition Editor
Kevin Colaco
Content Development Editor
Prachi Bisht
Technical Editor
Pankaj Kadam
Copy Editors
Janbal Dharmaraj
Sayanee Mukherjee
Project Coordinator
Sageer Parkar
Proofreaders
Bridget Braund
Maria Gould
Lucy Rowland
Indexer
Tejal Soni
Graphics
Ronak Dhruv
Production Coordinator
Aparna Bhagat
Cover Work
Aparna Bhagat
About the Author
Shashwat Shriparv was born in Muzaffarpur, Bihar. He did his schooling from Muzaffarpur and Shillong, Meghalaya. He received his BCA degree from IGNOU, Delhi and his MCA degree from Cochin University of Science and Technology, Kerala (C-DAC Trivandrum).
He was introduced to Big Data technologies in early 2010 when he was asked to perform a proof of concept (POC) on Big Data technologies in storing and processing logs. He was also given another project, where he was required to store huge binary files with variable headers and process them. At this time, he started configuring, setting up, and testing Hadoop HBase clusters and writing sample code for them. After performing a successful POC, he initiated serious development using Java REST and SOAP web services, building a system to store and process logs to Hadoop using web services, and then storing these logs in HBase using homemade schema and reading data using HBase APIs and HBase-Hive mapped queries. Shashwat successfully implemented the project, and then moved on to work on huge binary files of size 1 to 3 TB, processing the header and storing metadata to HBase and files on HDFS.
Shashwat started his career as a software developer at C-DAC Cyber Forensics, Trivandrum, building mobile-related software for forensics analysis. Then, he moved to Genilok Computer Solutions, where he worked on cluster computing, HPC technologies, and web technologies. After this, he moved from Trivandrum to Bangalore and joined PointCross, where he started working with Big Data technologies, developing software using Java, web services, and Big Data platforms. He worked on many projects revolving around Big Data technologies, such as Hadoop, HBase, Hive, Pig, Sqoop, and Flume, at PointCross. From here, he moved to HCL Infosystems Ltd. to work on the UIDAI project, one of the most prestigious projects in India, which provides a unique identification number to every resident of India. Here, he worked on technologies such as HBase, Hive, Hadoop, Pig, and Linux: scripting, managing HBase Hadoop clusters, automating tasks and processes, and building dashboards for monitoring clusters.
Currently, he is working with Cognilytics, Inc. on Big Data technologies, HANA, and other high-performance technologies.
You can find out more about him at https://github.com/shriparv and http://helpmetocode.blogspot.com. You can connect with him on LinkedIn at http://www.linkedin.com/pub/shashwat-shriparv/19/214/2a9. You can also e-mail him at
Shashwat has worked as a reviewer on the book Pig Design Pattern, Pradeep Pasupuleti, Packt Publishing. He also contributed to his college magazine, InfinityTech, as an editor.
Acknowledgments
First, I would like to thank a few people from Packt Publishing: Kevin for encouraging me to write this book, Prachi for assisting and guiding me throughout the writing process, Pankaj for helping me out in technical editing, and all other contributors to this book.
I would like to thank all the developers, contributors, and forums of Hadoop, HBase, and Big Data technologies for giving the industry such awesome technologies and contributing to it continuously. Thanks to Lars and Noll for their contribution towards HBase and Hadoop, respectively.
I would like to thank some people who helped me to learn from life, including teachers at my college—Roshani ma'am (Principal), Namboothari sir, Santosh sir, Manjush ma'am, Hudlin Leo ma'am, and my seniors Jitesh sir, Nilanchal sir, Vaidhath sir, Jwala sir, Ashutosh sir, Anzar sir, Kishor sir, and all my friends in Batch 6. I dedicate this book to my friend, Nikhil, who is not in this world now. Special thanks to Ratnakar Mishra and Chandan Jha for always being with me and believing in me. Thanks also go out to Vineet, Shashi bhai, Shailesh, Rajeev, Pintu, Darshna, Priya, Amit, Manzar, Sunil, Ashok bhai, Pradeep, Arshad, Sujith, Vinay, Rachana, Ashwathi, Rinku, Pheona, Lizbeth, Arun, Kalesh, Chitra, Fatima, Rajesh, Jasmin, and all my friends from C-DAC Trivendrum college. I thank all my juniors, seniors, and friends in college. Thanks to all my colleagues at C-DAC Cyber Forensic: Sateesh sir, my project manager; Anwer Reyaz. J, an enthusiast who is always encouraging; Bibin bhai sahab; Ramani sir; Potty sir; Bhadran sir; Thomas sir; Satish sir; Nabeel sir; Balan sir; Abhin sir; and others. I would also like to thank Mani sir; Raja sir; my friends and teammates: Maruthi Kiran, Chethan, Alok, Tariq, Sujatha, Bhagya, and Mukesh; Sri Gopal sir, my team leader; and all my other colleagues from PointCross. I thank Ramesh Krishnan sir, Manoj sir, Vinod sir, Nand Kishor sir, and my teammates Varun bhai sahab, Preeti Gupta, Kuldeep bhai sahab, and all my colleagues at HCL Infosystems Ltd. and UIDAI. I would also like to thank Satish sir; Sudipta sir; my manager, Atul sir; Pradeep; Nikhil; Mohit; Brijesh; Kranth; Ashish Chopara; Sudhir; and all my colleagues at Cognilytics, Inc.
Last but not the least, I would like to thank papa, Dr. Rameshwar Dwivedi; mummy, Smt. Rewa Dwivedi; bhai, Vishwas Priambud; sister-in-law, Ragini Dwivedi; sweet sister, Bhumika; brother-in-law, Chandramauli Dwivedi; and new members of my family, Vasu and Atmana.
If I missed any names, it does not mean that I am not thankful to them; they are all in my heart, and I am thankful to everyone who has come into my life and left their mark. Also, the thanks are not in any particular order.
About the Reviewers
Ashutosh Bijoor (Ash) is Chief Technology Officer at Accion Labs India Private Limited. He has over 20 years of experience in the technology industry with customers ranging from start-ups to large multinationals in a wide range of industries, including high tech, engineering, software, insurance, banking, chemicals, pharmaceuticals, healthcare, media, and entertainment. He is experienced in leading and managing cross-functional teams through an entire product development life cycle.
Ashutosh is skilled in emerging technologies, software architectures, framework design, and agile process definition. He has implemented enterprise solutions as well as commercial products in domains such as Big Data, business intelligence, graphics and image processing, sound and video processing, and advanced text search and analytics.
His e-mail ID is <ashutosh.bijoor@accionlabs.com>. You can also visit his website at http://bijoor.me.
Chhavi Gangwal is currently associated with Impetus Infotech (India) Pvt. Ltd. as a technical lead. With over 7 years of experience in the IT industry, she has worked on various dimensions of social media and the Web and witnessed the rise of Big Data firsthand.
Presently, Chhavi is leading the development of Kundera, a JPA 2.0-compliant object-datastore mapping library for NoSQL data stores. She is also actively involved in the product management and development of a multitude of Big Data tools. Apart from a working knowledge of several NoSQL data stores, Java, PHP, and different JavaScript frameworks, her passion lies in product design and learning the latest technologies. Connect with Chhavi at https://www.linkedin.com/profile/view?id=58308893.
Nitin Pawar started his career as a release engineer with Veritas Systems, so the quality of software systems has always been the main goal in his approach to work. He was lucky to work in multiple roles at companies such as Yahoo!, where he spent almost 5 years and learned a lot about the Hadoop ecosystem. After this, he worked with start-ups in the analytics and Big Data domains, helping them design backend analytics infrastructures and platforms.
He enjoys solving problems and helping others facing technical issues. Reviewing this book gave him a better understanding of the HBase system, and he hopes that the readers will like it too.
He has also reviewed the book Securing Hadoop, Sudheesh Narayanan, and a video, Building Hadoop Clusters [Video], Sean Mikha, both by Packt Publishing.
Jing Song has been working in the software industry as an engineer for more than 14 years. She enjoys solving problems and learning about new technologies in the Computer Science domain. Her interests and experience span multiple tiers, from web frontend GUIs to middleware, and from middleware to backend SQL RDBMS and NoSQL data storage. In the last 5 years, she has mainly focused on enterprise application performance and cloud computing. Jing currently works for Apple as a tech lead, leading various Java applications from design through implementation and performance tuning.
Arun Vasudevan is a technical lead at Accion Labs India Private Limited. He specializes in Business Analytics and Visualization and has worked on solutions in various industry verticals, including insurance, telecom, and retail. He specializes in developing applications on Big Data technologies, including Hadoop stack, Cloud technologies, and NoSQL databases. He also has expertise on cloud infrastructure setup and management using OpenStack and AWS APIs.
Arun is skilled in Java J2EE, JavaScript, relational databases, NoSQL technologies, and visualization using custom-built JavaScript visualization tools such as D3JS. Arun manages a team that delivers business analytics and visualization solutions.
His e-mail address is <arun.vasudevan@accionlabs.com>. You can also visit his LinkedIn account at https://www.linkedin.com/profile/view?id=40201159.
www.PacktPub.com
Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Free access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
I would like to thank god for giving me this opportunity. I dedicate this book to baba, dadi, nana, and nani.
Preface
This book provides a top-down approach to learning HBase that will be useful for both novices and experts. You will start with configuration, move on to coding, and finish with maintenance and troubleshooting: a kind of all-in-one HBase knowledge bank. It is a step-by-step guide to working with HBase, covering day-to-day HBase administration tasks and a ground-up implementation of a Hadoop plus HBase cluster setup. The book covers a complete list of use cases, with explanations, for implementing HBase as an effective Big Data tool, and it will also help you understand the layout and structure of HBase. There are many HBase books on the market, but most of them focus on either configuration or coding alone; this book takes a start-to-end approach that will be useful to anyone from a complete beginner to a proficient HBase user. It is a complete guide to HBase administration and development, with real-time scenarios and an operations guide.
This book explains what HBase is, where it came from, who is involved, why you should consider using it, why people are using it, when to use it, and how to use it. It gives an overall picture of the HBase ecosystem; it is an HBase-confusion-buster of sorts, a book to read and then apply in real life. It combines in-depth theory with practical examples of HBase features, an approach that clears up doubts about Hadoop and HBase. It provides complete guidance on the configuration, management, and troubleshooting of HBase clusters and their operations. The book targets both the administration and development aspects of HBase: administration with troubleshooting and setup, and development with the client and server APIs. It also shows you how to design schemas, code in Java, and write shell scripts that work with HBase.
What this book covers
Chapter 1, Understanding the HBase Ecosystem, introduces HBase in detail and discusses its features, evolution, and architecture. We will compare HBase with traditional databases and look at its add-on features, its various underlying components, and its uses in industry.
Chapter 2, Let's Begin with HBase, deals with the HBase components in detail: their internal architecture, the communication between different components, and how they provide scalability. It also covers the HBase reading and writing cycle, HBase housekeeping tasks, region-related operations, the different components needed for an HBase cluster configuration, and some basic OS tuning.
Chapter 3, Let's Start Building It, proceeds with building an HBase cluster. In this chapter, you will find information on the various components and where to obtain them. We will configure the cluster, considering all the relevant parameters and optimization tweaks while building the Hadoop and HBase cluster. One section of the chapter focuses on the various component-level and OS-level parameters for an optimized cluster.
Chapter 4, Optimizing the HBase/Hadoop Cluster, teaches us to optimize the HBase cluster for a production environment and to troubleshoot a running cluster. We will look at optimizing hardware, OS, software, and network parameters. This chapter also teaches us how to optimize Hadoop for a better-performing HBase.
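To give a flavor of the tuning Chapter 4 discusses, here is a small hbase-site.xml fragment. The property names are standard HBase settings, but the values shown are purely illustrative starting points, not recommendations from this book; every cluster needs its own numbers.

```xml
<!-- Illustrative hbase-site.xml fragment; values are examples to tune per cluster. -->
<configuration>
  <property>
    <!-- Number of RPC handler threads per RegionServer -->
    <name>hbase.regionserver.handler.count</name>
    <value>30</value>
  </property>
  <property>
    <!-- Fraction of the heap given to the block cache for reads -->
    <name>hfile.block.cache.size</name>
    <value>0.4</value>
  </property>
  <property>
    <!-- Maximum HFile size before a region splits (10 GB here) -->
    <name>hbase.hregion.max.filesize</name>
    <value>10737418240</value>
  </property>
</configuration>
```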
Chapter 5, The Storage, Structure Layout, and Data Model of HBase, discusses HBase's data model and its various data model operations for fetching and writing data in HBase tables. We will also consider some use cases in order to design schema in HBase.
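Chapter 5 also covers calculating the data size stored in HBase. As a rough sketch of that arithmetic, the per-cell (KeyValue) size can be estimated in plain Java; the field widths below follow the commonly documented KeyValue layout, and the example row, family, qualifier, and value lengths are made up for illustration.

```java
public class CellSizeEstimate {
    // Rough per-cell size in bytes, following the classic HBase KeyValue layout:
    // 4 (key length) + 4 (value length) + 2 (row length) + row + 1 (family length)
    // + family + qualifier + 8 (timestamp) + 1 (key type) + value.
    static long cellSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        long keyLen = 2 + rowLen + 1 + familyLen + qualifierLen + 8 + 1;
        return 4 + 4 + keyLen + valueLen;
    }

    public static void main(String[] args) {
        // e.g. row "user123" (7), family "cf" (2), qualifier "name" (4), value of 8 bytes
        System.out.println(cellSize(7, 2, 4, 8)); // 41 bytes for this cell
    }
}
```

Multiplying such a per-cell estimate by the expected cell count (and version count) gives a first approximation of raw table size before compression.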
Chapter 6, HBase Cluster Maintenance and Troubleshooting, covers all aspects of HBase cluster management, operation, and maintenance. Once a cluster is built and in operation, we need to look after it, continuously tune it, and troubleshoot it in order to have a healthy HBase cluster. We will also study the commands available in the HBase and Hadoop shells.
Chapter 7, Scripting in HBase, explains an automation process using HBase and shell scripts. We will learn to write scripts as an administrator or developer to automate various data-model-related tasks. We will also read about various backup and restore options available in HBase and how to perform them.
Chapter 8, Coding HBase in Java, teaches Java coding in HBase. We will start with basic Java coding in HBase and learn about Java APIs available for client requests. You will also learn to build a basic client in Java, which can be used to contact an HBase cluster for various operations using Java code.
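Since the Java client API ultimately deals in byte arrays for row keys, qualifiers, and values, here is a minimal, cluster-free sketch of the round trip that helpers such as org.apache.hadoop.hbase.util.Bytes perform. The helper names mimic that class, but this code uses only the JDK, so it is an illustration of the idea rather than the actual HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BytesDemo {
    // Mimics Bytes.toBytes(String): HBase stores everything as raw bytes.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Mimics Bytes.toString(byte[]): converting back for display.
    static String toString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] rowKey = toBytes("user123");
        System.out.println(toString(rowKey));                          // user123
        System.out.println(Arrays.equals(rowKey, toBytes("user123"))); // true
    }
}
```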
Chapter 9, Advance Coding in Java for HBase, focuses more on Java coding in HBase. It is a more detailed learning about all the different kind of APIs, classes, methods,