Learning HBase
About this ebook
Apache HBase is a nonrelational NoSQL database management system that runs on top of HDFS. It is an open source, distributed, versioned, column-oriented store that provides random, real-time read/write access to Big Data, with the benefit of linear scalability on the fly.
This book will take you through a series of core tasks in HBase. The introductory chapter gives you all the information you need about the HBase ecosystem. You will then learn how to configure, create, verify, and test clusters. The book also explores the different Hadoop and HBase parameters that need to be considered for optimization and trouble-free operation of the cluster, and it focuses on HBase's data model, storage, and structure layout. You will also get to know the different options that can be used to speed up the operation and functioning of HBase. The book also teaches basic- and advanced-level Java coding for HBase. By the end of the book, you will have learned how to use HBase with large data sets and integrate it with Hadoop.
Learning HBase - Shashwat Shriparv
Table of Contents
Learning HBase
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Understanding the HBase Ecosystem
HBase layout on top of Hadoop
Comparing architectural differences between RDBMs and HBase
HBase features
HBase in the Hadoop ecosystem
Data representation in HBase
Hadoop
Core daemons of Hadoop
Comparing HBase with Hadoop
Comparing functional differences between RDBMs and HBase
Logical view of row-oriented databases
Logical view of column-oriented databases
Pros and cons of column-oriented databases
About the internal storage architecture of HBase
Getting started with HBase
When it started
HBase components and functionalities
ZooKeeper
Why an odd number of ZooKeepers?
HMaster
If a master node goes down
RegionServer
Components of a RegionServer
Client
Catalog tables
Who is using HBase and why?
When should we think of using HBase?
When not to use HBase
Understanding some open source HBase tools
The Hadoop-HBase version compatibility table
Applications of HBase
HBase pros and cons
Summary
2. Let's Begin with HBase
Understanding HBase components in detail
HFile
Region
Scalability – understanding the scale up and scale out processes
Scale in
Scale out
Reading and writing cycle
Write-Ahead Logs
MemStore
HBase housekeeping
Compaction
Minor compaction
Major compaction
Region split
Region assignment
Region merge
RegionServer failovers
The HBase delete request
The reading and writing cycle
List of available HBase distributions
Prerequisites and capacity planning for HBase
The forward DNS resolution
The reverse DNS resolution
Java
SSH
Domain Name Server
Using Network Time Protocol to keep your node on time
OS-level changes and tuning up OS for HBase
Summary
3. Let's Start Building It
Downloading Java on Ubuntu
Considering host configurations
Host file based
Command based
File based
DNS based
Installing and configuring SSH
Installing SSH on Ubuntu/Red Hat/CentOS
Configuring SSH
Installing and configuring NTP
Performing capacity planning
Installing and configuring Hadoop
core-site.xml
hdfs-site.xml
yarn-site.xml
mapred-site.xml
hadoop-env.sh
yarn-env.sh
Slaves file
Hadoop start up steps
Configuring Apache HBase
Configuring HBase in the standalone mode
Configuring HBase in the distributed mode
hbase-site.xml
HBase-env.sh
regionservers
Installing and configuring ZooKeeper
Installing Cloudera Hadoop and HBase
Downloading the required RPM packages
Installing Cloudera in an easier way
Installing the Hadoop and MapReduce packages
Installing Hadoop on Windows
Summary
4. Optimizing the HBase/Hadoop Cluster
Setup types for Hadoop and HBase clusters
Recommendations for CDH cluster configuration
Capacity planning
Hadoop optimization
General optimization tips
Optimizing Java GC
Optimizing Linux OS
Optimizing the Hadoop parameter
Optimizing MapReduce
Rack awareness in Hadoop
Number of Map and Reduce limits in configuration files
Considering and deciding the maximum number of Map and Reduce tasks
Optimizing HBase
Hadoop
Memory
Java
OS
HBase
Optimizing ZooKeeper
Important files in Hadoop
Important files in HBase
Summary
5. The Storage, Structure Layout, and Data Model of HBase
Data types in HBase
Storing data in HBase – logical view versus actual physical view
Namespace
Commands available for namespaces
Services of HBase
Row key
Column family
Column
Cell
Version
Timestamp
Data model operations
Get
Put
Scan
Delete
Versioning and why
Deciding the number of the version
Lower bound of versions
Upper bound of versions
Schema designing
Types of table designs
Benefits of Short Wide and Tall-Thin design patterns
Composite key designing
Real-time use case of schema in an HBase table
Schema change operations
Calculating the data size stored in HBase
Summary
6. HBase Cluster Maintenance and Troubleshooting
Hadoop shell commands
Types of Hadoop shell commands
Administration commands
User commands
File system-related commands
Difference between copyToLocal/copyFromLocal and get/put
HBase shell commands
HBase administration tools
hbck – HBase check
HBase health check script
Writing HBase shell scripts
Using the Hadoop tool or JARs for HBase
Connecting HBase with Hive
HBase region management
Compaction
Merge
HBase node management
Commissioning
Decommissioning
Implementing security
Secure access
Requirement
Kerberos KDC
Client-side security configuration
Client-side security configuration for thrift requests
Server-side security configuration
Simple security
Server-side configuration
Client-side configuration
The tag security feature
Access control in HBase
Server-side access control
Cell-level access using tags
Configuring ZooKeeper for security
Troubleshooting the most frequent HBase errors and their explanations
What might fail in cluster
Monitoring HBase health
HBase web UI
Master
RegionServer
ZooKeeper command line
Linux tools
Summary
7. Scripting in HBase
HBase backup and restore techniques
Offline backup / full-shutdown backup
Backup
Restore
Online backup
The HBase snapshot
Online
Offline
The HBase replication method
Setting up cluster replication
Backup and restore using Export and Import commands
Export
Import
Miscellaneous utilities
CopyTable
HTable API
Backup using a Mozilla tool
HBase on Windows
Scripting in HBase
The .irbrc file
Getting the HBase timestamp from HBase shell
Enabling debugging shell
Enabling the debug level in HBase shell
Enabling SQL in HBase
Contributing to HBase
Summary
8. Coding HBase in Java
Setting up the environment for development
Building a Java client to code in HBase
Data types
Data model Java operations
Read
Get()
Constructors
Supported methods
Scan()
Constructors
Methods
Write
Put()
Constructors
Methods
Modify
Delete()
Constructors
Methods
HBase filters
Types of filters
Client APIs
Summary
9. Advance Coding in Java for HBase
Interfaces, classes, and exceptions
Code related to administrative tasks
Data operation code
MapReduce and HBase
RESTful services and Thrift services interface
REST service interfaces
Thrift
Coding for HDFS operations
Some advance topics in brief
Coprocessors
Types of coprocessors
Bloom filters
The Lily project
Features
Summary
10. HBase Use Cases
HBase in industry today
The future of HBase against relational databases
Some real-world project examples' use cases
HBase at Facebook
Choosing HBase
Storing in HBase
The architecture of a Facebook message
Facts and figures
HBase at Pinterest
The layout architecture
HBase at Groupon
The layout architecture
HBase at LongTail Video
The layout architecture
HBase at Aadhaar (UIDAI)
The layout architecture
Useful links and references
Summary
Index
Learning HBase
Copyright © 2014 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2014
Production reference: 1181114
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78398-594-4
www.packtpub.com
Credits
Author
Shashwat Shriparv
Reviewers
Ashutosh Bijoor
Chhavi Gangwal
Henry Garner
Nitin Pawar
Jing Song
Arun Vasudevan
Commissioning Editor
Akram Hussain
Acquisition Editor
Kevin Colaco
Content Development Editor
Prachi Bisht
Technical Editor
Pankaj Kadam
Copy Editors
Janbal Dharmaraj
Sayanee Mukherjee
Project Coordinator
Sageer Parkar
Proofreaders
Bridget Braund
Maria Gould
Lucy Rowland
Indexer
Tejal Soni
Graphics
Ronak Dhruv
Production Coordinator
Aparna Bhagat
Cover Work
Aparna Bhagat
About the Author
Shashwat Shriparv was born in Muzaffarpur, Bihar. He did his schooling from Muzaffarpur and Shillong, Meghalaya. He received his BCA degree from IGNOU, Delhi and his MCA degree from Cochin University of Science and Technology, Kerala (C-DAC Trivandrum).
He was introduced to Big Data technologies in early 2010 when he was asked to perform a proof of concept (POC) on Big Data technologies in storing and processing logs. He was also given another project, where he was required to store huge binary files with variable headers and process them. At this time, he started configuring, setting up, and testing Hadoop HBase clusters and writing sample code for them. After performing a successful POC, he initiated serious development using Java REST and SOAP web services, building a system to store and process logs to Hadoop using web services, and then storing these logs in HBase using homemade schema and reading data using HBase APIs and HBase-Hive mapped queries. Shashwat successfully implemented the project, and then moved on to work on huge binary files of size 1 to 3 TB, processing the header and storing metadata to HBase and files on HDFS.
Shashwat started his career as a software developer at C-DAC Cyber Forensics, Trivandrum, building mobile-related software for forensics analysis. Then, he moved to Genilok Computer Solutions, where he worked on cluster computing, HPC technologies, and web technologies. After this, he moved from Trivandrum to Bangalore and joined PointCross, where he started working with Big Data technologies, developing software using Java, web services, and Big Data platforms. He worked on many projects revolving around Big Data technologies, such as Hadoop, HBase, Hive, Pig, Sqoop, and Flume, at PointCross. From here, he moved to HCL Infosystems Ltd. to work on the UIDAI project, one of the most prestigious projects in India, which provides a unique identification number to every resident of India. Here, he worked on technologies such as HBase, Hive, Hadoop, Pig, and Linux: scripting, managing HBase Hadoop clusters, automating tasks and processes, and building dashboards for monitoring clusters.
Currently, he is working with Cognilytics, Inc. on Big Data technologies, HANA, and other high-performance technologies.
You can find out more about him at https://github.com/shriparv and http://helpmetocode.blogspot.com. You can connect with him on LinkedIn at http://www.linkedin.com/pub/shashwat-shriparv/19/214/2a9. You can also e-mail him at
Shashwat has worked as a reviewer on the book Pig Design Pattern, Pradeep Pasupuleti, Packt Publishing. He also contributed to his college magazine, InfinityTech, as an editor.
Acknowledgments
First, I would like to thank a few people from Packt Publishing: Kevin for encouraging me to write this book, Prachi for assisting and guiding me throughout the writing process, Pankaj for helping me out in technical editing, and all other contributors to this book.
I would like to thank all the developers, contributors, and forums of Hadoop, HBase, and Big Data technologies for giving the industry such awesome technologies and contributing to it continuously. Thanks to Lars and Noll for their contribution towards HBase and Hadoop, respectively.
I would like to thank some people who helped me to learn from life, including teachers at my college—Roshani ma'am (Principal), Namboothari sir, Santosh sir, Manjush ma'am, Hudlin Leo ma'am, and my seniors Jitesh sir, Nilanchal sir, Vaidhath sir, Jwala sir, Ashutosh sir, Anzar sir, Kishor sir, and all my friends in Batch 6. I dedicate this book to my friend, Nikhil, who is not in this world now. Special thanks to Ratnakar Mishra and Chandan Jha for always being with me and believing in me. Thanks also go out to Vineet, Shashi bhai, Shailesh, Rajeev, Pintu, Darshna, Priya, Amit, Manzar, Sunil, Ashok bhai, Pradeep, Arshad, Sujith, Vinay, Rachana, Ashwathi, Rinku, Pheona, Lizbeth, Arun, Kalesh, Chitra, Fatima, Rajesh, Jasmin, and all my friends from C-DAC Trivendrum college. I thank all my juniors, seniors, and friends in college. Thanks to all my colleagues at C-DAC Cyber Forensic: Sateesh sir, my project manager; Anwer Reyaz. J, an enthusiast who is always encouraging; Bibin bhai sahab; Ramani sir; Potty sir; Bhadran sir; Thomas sir; Satish sir; Nabeel sir; Balan sir; Abhin sir; and others. I would also like to thank Mani sir; Raja sir; my friends and teammates: Maruthi Kiran, Chethan, Alok, Tariq, Sujatha, Bhagya, and Mukesh; Sri Gopal sir, my team leader; and all my other colleagues from PointCross. I thank Ramesh Krishnan sir, Manoj sir, Vinod sir, Nand Kishor sir, and my teammates Varun bhai sahab, Preeti Gupta, Kuldeep bhai sahab, and all my colleagues at HCL Infosystems Ltd. and UIDAI. I would also like to thank Satish sir; Sudipta sir; my manager, Atul sir; Pradeep; Nikhil; Mohit; Brijesh; Kranth; Ashish Chopara; Sudhir; and all my colleagues at Cognilytics, Inc.
Last but not the least, I would like to thank papa, Dr. Rameshwar Dwivedi; mummy, Smt. Rewa Dwivedi; bhai, Vishwas Priambud; sister-in-law, Ragini Dwivedi; sweet sister, Bhumika; brother-in-law, Chandramauli Dwivedi; and new members of my family, Vasu and Atmana.
If I missed any names, it does not mean that I am not thankful to them; they are all in my heart, and I am thankful to everyone who has come into my life and left their mark. Also, the thanks are not in any particular order.
About the Reviewers
Ashutosh Bijoor (Ash) is Chief Technology Officer at Accion Labs India Private Limited. He has over 20 years of experience in the technology industry with customers ranging from start-ups to large multinationals in a wide range of industries, including high tech, engineering, software, insurance, banking, chemicals, pharmaceuticals, healthcare, media, and entertainment. He is experienced in leading and managing cross-functional teams through an entire product development life cycle.
Ashutosh is skilled in emerging technologies, software architectures, framework design, and agile process definition. He has implemented enterprise solutions as well as commercial products in domains such as Big Data, business intelligence, graphics and image processing, sound and video processing, and advanced text search and analytics.
His e-mail ID is <ashutosh.bijoor@accionlabs.com>. You can also visit his website at http://bijoor.me.
Chhavi Gangwal is currently associated with Impetus Infotech (India) Pvt. Ltd. as a technical lead. With over 7 years of experience in the IT industry, she has worked on various dimensions of social media and the Web and witnessed the rise of Big Data firsthand.
Presently, Chhavi is leading the development of Kundera, a JPA 2.0-compliant object-datastore mapping library for NoSQL data stores. She is also actively involved in the product management and development of a multitude of Big Data tools. Apart from a working knowledge of several NoSQL data stores, Java, PHP, and different JavaScript frameworks, her passion lies in product design and learning the latest technologies. Connect with Chhavi at https://www.linkedin.com/profile/view?id=58308893.
Nitin Pawar started his career as a release engineer with Veritas Systems, so the quality of software systems has always been the main goal in his approach to work. He was lucky to work in multiple roles at companies such as Yahoo!, where he spent almost 5 years and learned a lot about the Hadoop ecosystem. After this, he worked with start-ups in the analytics and Big Data domains, helping them design backend analytics infrastructures and platforms.
He enjoys solving problems and helping others facing technical issues. Reviewing this book gave him a better understanding of the HBase system, and he hopes that the readers will like it too.
He has also reviewed the book Securing Hadoop, Sudheesh Narayanan, and a video, Building Hadoop Clusters [Video], Sean Mikha, both by Packt Publishing.
Jing Song has been working in the software industry as an engineer for more than 14 years. She enjoys solving problems and learning about new technologies in the Computer Science domain. Her interests and experience span multiple tiers, from web frontend GUIs to middleware, and from middleware to backend SQL RDBMS and NoSQL data storage. In the last 5 years, she has mainly focused on enterprise application performance and cloud computing. Jing currently works for Apple as a tech lead, leading various Java applications from design through implementation and performance tuning.
Arun Vasudevan is a technical lead at Accion Labs India Private Limited. He specializes in Business Analytics and Visualization and has worked on solutions in various industry verticals, including insurance, telecom, and retail. He specializes in developing applications on Big Data technologies, including Hadoop stack, Cloud technologies, and NoSQL databases. He also has expertise on cloud infrastructure setup and management using OpenStack and AWS APIs.
Arun is skilled in Java J2EE, JavaScript, relational databases, NoSQL technologies, and visualization using custom-built JavaScript visualization tools such as D3JS. Arun manages a team that delivers business analytics and visualization solutions.
His e-mail address is <arun.vasudevan@accionlabs.com>. You can also visit his LinkedIn account at https://www.linkedin.com/profile/view?id=40201159.
www.PacktPub.com
Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Free access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
I would like to thank god for giving me this opportunity. I dedicate this book to baba, dadi, nana, and nani.
Preface
This book provides a top-down approach to learning HBase that will be useful for both novices and experts. You will start with configuration, move on to coding, and finish with maintenance and troubleshooting: a kind of all-in-one HBase knowledge bank. It is a step-by-step guide to working with HBase, covering day-to-day HBase administration tasks and a ground-up implementation of a Hadoop plus HBase cluster setup. The book covers a complete list of use cases, with explanations, for implementing HBase as an effective Big Data tool, and it will also help you understand the layout and structure of HBase. There are many HBase books on the market, but most of them focus on either configuration or coding alone; this book takes a start-to-end approach that will be useful to anyone from a complete beginner to a proficient HBase user. It is a complete guide to HBase administration and development, with real-time scenarios and an operations guide.
This book explains what HBase is, where it came from, who is involved, why you should consider using it, why people are using it, when to use it, and how to use it. It gives an overall picture of the HBase ecosystem; it is an HBase-confusion-buster of sorts, a book to read and then apply in real life. It combines in-depth theory with practical examples of HBase features, an approach that clears up doubts about Hadoop and HBase. It provides complete guidance on the configuration, management, and troubleshooting of HBase clusters and their operations. The book targets both the administration and development aspects of HBase: administration with troubleshooting and setup, and development with the client and server APIs. It also shows you how to design schemas, code in Java, and write shell scripts that work with HBase.
What this book covers
Chapter 1, Understanding the HBase Ecosystem, introduces HBase in detail and discusses its features, evolution, and architecture. We will compare HBase with traditional databases and look at its add-on features, its various underlying components, and its uses in industry.
Chapter 2, Let's Begin with HBase, deals with the HBase components in detail: their internal architecture, the communication between different components, and how they provide scalability. It also covers the HBase reading and writing cycle, HBase housekeeping tasks, region-related operations, the different components needed for an HBase cluster configuration, and some basic OS tuning.
Chapter 3, Let's Start Building It, proceeds with building an HBase cluster. In this chapter, you will find information on the various components and where to obtain them. We will configure the cluster, considering all the relevant parameters and optimization tweaks while building the Hadoop and HBase cluster. One section of the chapter focuses on the various component-level and OS-level parameters for an optimized cluster.
Chapter 4, Optimizing the HBase/Hadoop Cluster, teaches us to optimize the HBase cluster for a production environment and to troubleshoot a running cluster. We will look at optimizing hardware, OS, software, and network parameters. This chapter also teaches us how to optimize Hadoop for a better-performing HBase.
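To give a flavor of the tuning Chapter 4 discusses, here is a small hbase-site.xml fragment. The property names are standard HBase settings, but the values shown are purely illustrative starting points, not recommendations from this book; every cluster needs its own numbers.

```xml
<!-- Illustrative hbase-site.xml fragment; values are examples to tune per cluster. -->
<configuration>
  <property>
    <!-- Number of RPC handler threads per RegionServer -->
    <name>hbase.regionserver.handler.count</name>
    <value>30</value>
  </property>
  <property>
    <!-- Fraction of the heap given to the block cache for reads -->
    <name>hfile.block.cache.size</name>
    <value>0.4</value>
  </property>
  <property>
    <!-- Maximum HFile size before a region splits (10 GB here) -->
    <name>hbase.hregion.max.filesize</name>
    <value>10737418240</value>
  </property>
</configuration>
```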
Chapter 5, The Storage, Structure Layout, and Data Model of HBase, discusses HBase's data model and its various data model operations for fetching and writing data in HBase tables. We will also consider some use cases in order to design schema in HBase.
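Chapter 5 also covers calculating the data size stored in HBase. As a rough sketch of that arithmetic, the per-cell (KeyValue) size can be estimated in plain Java; the field widths below follow the commonly documented KeyValue layout, and the example row, family, qualifier, and value lengths are made up for illustration.

```java
public class CellSizeEstimate {
    // Rough per-cell size in bytes, following the classic HBase KeyValue layout:
    // 4 (key length) + 4 (value length) + 2 (row length) + row + 1 (family length)
    // + family + qualifier + 8 (timestamp) + 1 (key type) + value.
    static long cellSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        long keyLen = 2 + rowLen + 1 + familyLen + qualifierLen + 8 + 1;
        return 4 + 4 + keyLen + valueLen;
    }

    public static void main(String[] args) {
        // e.g. row "user123" (7), family "cf" (2), qualifier "name" (4), value of 8 bytes
        System.out.println(cellSize(7, 2, 4, 8)); // 41 bytes for this cell
    }
}
```

Multiplying such a per-cell estimate by the expected cell count (and version count) gives a first approximation of raw table size before compression.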
Chapter 6, HBase Cluster Maintenance and Troubleshooting, covers all aspects of HBase cluster management, operation, and maintenance. Once a cluster is built and in operation, we need to look after it, continuously tune it, and troubleshoot it in order to have a healthy HBase cluster. We will also study the commands available in the HBase and Hadoop shells.
Chapter 7, Scripting in HBase, explains an automation process using HBase and shell scripts. We will learn to write scripts as an administrator or developer to automate various data-model-related tasks. We will also read about various backup and restore options available in HBase and how to perform them.
Chapter 8, Coding HBase in Java, teaches Java coding in HBase. We will start with basic Java coding in HBase and learn about Java APIs available for client requests. You will also learn to build a basic client in Java, which can be used to contact an HBase cluster for various operations using Java code.
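Since the Java client API ultimately deals in byte arrays for row keys, qualifiers, and values, here is a minimal, cluster-free sketch of the round trip that helpers such as org.apache.hadoop.hbase.util.Bytes perform. The helper names mimic that class, but this code uses only the JDK, so it is an illustration of the idea rather than the actual HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BytesDemo {
    // Mimics Bytes.toBytes(String): HBase stores everything as raw bytes.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Mimics Bytes.toString(byte[]): converting back for display.
    static String toString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] rowKey = toBytes("user123");
        System.out.println(toString(rowKey));                          // user123
        System.out.println(Arrays.equals(rowKey, toBytes("user123"))); // true
    }
}
```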
Chapter 9, Advance Coding in Java for HBase, focuses more on Java coding in HBase. It is a more detailed learning about all the different kind of APIs, classes, methods,