Elasticsearch 8.x Cookbook: Over 180 recipes to perform fast, scalable, and reliable searches for your enterprise
Ebook · 1,798 pages · 9 hours

About this ebook

Elasticsearch is a Lucene-based distributed search engine at the heart of the Elastic Stack that allows you to index and search unstructured content with petabytes of data. With this updated fifth edition, you'll cover comprehensive recipes relating to what's new in Elasticsearch 8.x and see how to create and run complex queries and analytics.
The recipes will guide you through performing index mapping, aggregation, working with queries, and scripting using Elasticsearch. You'll focus on numerous solutions and quick techniques for performing both common and uncommon tasks such as deploying Elasticsearch nodes, using the ingest module, working with X-Pack, and creating different visualizations. As you advance, you'll learn how to manage various clusters, restore data, and install Kibana to monitor a cluster and extend it using a variety of plugins. Furthermore, you'll understand how to integrate your Java, Scala, Python, and big data applications such as Apache Spark and Pig with Elasticsearch and create efficient data applications powered by enhanced functionalities and custom plugins.
By the end of this Elasticsearch cookbook, you'll have gained in-depth knowledge of implementing the Elasticsearch architecture and be able to manage, search, and store data efficiently and effectively using Elasticsearch.

Language: English
Release date: May 27, 2022
ISBN: 9781801072885

    Book preview

    Elasticsearch 8.x Cookbook - Alberto Paro

    Fifth Edition

    BIRMINGHAM—MUMBAI

    Elasticsearch 8.x Cookbook

    Fifth Edition

    Copyright © 2022 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Publishing Product Manager: Devika Battike

    Senior Editor: Nathanya Dias

    Content Development Editor: Sean Lobo

    Technical Editor: Rahul Limbachiya

    Copy Editor: Safis Editing

    Project Coordinator: Aparna Ravikumar Nair

    Proofreader: Safis Editing

    Indexer: Manju Arasan

    Production Designer: Ponraj Dhandapani

    Marketing Coordinator: Priyanka Mhatre

    First published: December 2013

    Second edition: January 2015

    Third edition: February 2017

    Fourth edition: April 2019

    Fifth edition: May 2022

    Production reference: 1280422

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham

    B3 2PB, UK.

    ISBN 978-1-80107-981-5

    www.packt.com

    Contributors

    About the author

    Alberto Paro is an engineer, manager, and software developer. He currently works as the technology architecture delivery associate director of the Accenture Cloud First data and AI team in Italy. He loves to study emerging solutions and applications, mainly related to cloud and big data processing, NoSQL, Natural Language Processing (NLP), software development, and machine learning. In 2000, he graduated in computer science engineering from Politecnico di Milano. Then, he worked with many companies, mainly using Scala/Java and Python on knowledge management solutions and advanced data mining products, using state-of-the-art big data software. A lot of his time is spent teaching others how to effectively use big data solutions, NoSQL data stores, and related technologies.

    About the reviewers

    Kyle Davis is the senior developer advocate with OpenSearch and Open Distro for Elasticsearch at Amazon Web Services (AWS). Kyle has a long history of working in software development, starting in the late 1990s. His experience runs the gamut from frontend development to microcontrollers, but his most passionate area of interest is NoSQL databases. He has blogged and presented extensively about technology and is the author of Redis Microservices for Dummies. Kyle is based out of Edmonton, Alberta, Canada.

    Mahipalsinh Rana is currently chief technology officer (CTO) of Inexture Solutions LLP. At Inexture, he specializes in enterprise searching, Python, Java, and ML/AI. He has 15 years of experience. His stint with search technologies started in 2010 when he started working with Solr. He then started working with Elastic and has done various large-scale implementations and consultations. At the start of his career, he worked for Sun Microsystems, where he worked on internationalization (i18n). He likes exploring emerging technology trends such as NLP and intuitive searching for e-commerce. He plans to develop a search engine for people who are still in the early stages of technological advancement to provide them with information at ease. He has also worked on Liferay Beginner's Guide by Packt.

    Arpit Dubey is a big data engineer with over 14 years of experience in building large-scale, data-intensive applications. He has experience in envisioning enterprise-wide data strategies, roadmaps, and architecture for large internet companies, with varied use cases. He specializes in building event-driven architectures and real-time analytical solutions, using distributed systems such as Kafka, Flink, Spark, the Hadoop stack, NoSQL databases, and graph databases. He has been an active public speaker on various technology topics and has spoken at Kafka Summit, Druid Summit, and several other technology meetups.

    I would like to thank my entire family for always being my guiding light for every path I choose and every step I take.

    Table of Contents

    Preface

    Who this book is for

    What this book covers

    To get the most out of this book

    Download the example code files

    Download the color images

    Conventions used

    Sections

    Getting ready

    How to do it…

    How it works…

    There's more…

    See also

    Get in touch

    Share Your Thoughts

    Chapter 1: Getting Started

    Technical requirements

    Downloading and installing Elasticsearch

    Getting ready

    How to do it…

    How it works…

    There's more…

    See also

    Setting up networking

    Getting ready

    How to do it…

    How it works…

    See also

    Setting up a node

    Getting ready

    How to do it…

    How it works…

    See also

    Setting up Linux systems

    Getting ready

    How to do it…

    How it works…

    There's more…

    Setting up different node roles

    Getting ready

    How to do it…

    How it works…

    There's more…

    See also

    Setting up a coordinating-only node

    Getting ready

    How to do it…

    How it works…

    Setting up an ingestion node

    Getting ready

    How to do it…

    How it works…

    There's more…

    Installing plugins in Elasticsearch

    Getting ready

    How to do it…

    How it works…

    There's more…

    See also

    Removing a plugin

    Getting ready

    How to do it…

    How it works…

    Changing logging settings

    Getting ready

    How to do it…

    How it works…

    Setting up a node via Docker

    Getting ready

    How to do it…

    How it works…

    There's more…

    See also

    Deploying on Elastic Cloud Enterprise

    Getting ready

    How to do it…

    How it works…

    See also

    Chapter 2: Managing Mappings

    Technical requirements

    Using explicit mapping creation

    Getting ready

    How to do it…

    How it works…

    There's more…

    See also

    Mapping base types

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Mapping arrays

    Getting ready

    How to do it…

    How it works…

    Mapping an object

    Getting ready

    How to do it…

    How it works…

    See also

    Mapping a document

    Getting ready

    How to do it…

    How it works…

    See also

    Using dynamic templates in document mapping

    Getting ready

    How to do it…

    How it works…

    There's more...

    See also

    Managing nested objects

    Getting ready

    How to do it…

    How it works…

    There's more...

    See also

    Managing a child document with a join field

    Getting ready

    How to do it…

    How it works…

    There's more...

    See also

    Adding a field with multiple mappings

    Getting ready

    How to do it…

    How it works…

    There's more...

    See also

    Mapping a GeoPoint field

    Getting ready

    How to do it…

    How it works…

    There's more...

    Mapping a GeoShape field

    Getting ready

    How to do it…

    How it works…

    See also

    Mapping an IP field

    Getting ready

    How to do it…

    How it works…

    Mapping an Alias field

    Getting ready

    How to do it...

    How it works…

    Mapping a Percolator field

    Getting ready

    How to do it...

    How it works…

    Mapping the Rank Feature and Feature Vector fields

    Getting ready

    How to do it…

    How it works…

    Mapping the Search as you type field

    Getting ready

    How to do it…

    How it works…

    See also

    Using the Range Fields type

    Getting ready

    How to do it...

    How it works…

    See also

    Using the Flattened field type

    Getting ready

    How to do it…

    How it works…

    See also

    Using the Point and Shape field types

    Getting ready

    How to do it…

    How it works…

    See also

    Using the Dense Vector field type

    Getting ready

    How to do it…

    How it works...

    Using the Histogram field type

    Getting ready

    How to do it…

    How it works…

    See also

    Adding metadata to a mapping

    Getting ready

    How to do it…

    How it works…

    Specifying different analyzers

    Getting ready

    How to do it…

    How it works…

    See also

    Using index components and templates

    Getting ready

    How to do it…

    How it works…

    See also

    Chapter 3: Basic Operations

    Technical requirements

    Creating an index

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Deleting an index

    Getting ready

    How to do it...

    How it works...

    See also

    Opening or closing an index

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Putting a mapping in an index

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Getting a mapping

    Getting ready

    How to do it...

    How it works...

    See also

    Reindexing an index

    Getting ready

    How to do it...

    How it works...

    See also

    Refreshing an index

    Getting ready

    How to do it...

    How it works...

    See also

    Flushing an index

    Getting ready

    How to do it...

    How it works...

    See also

    Using ForceMerge on an index

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Shrinking an index

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Checking whether an index exists

    Getting ready

    How to do it...

    How it works...

    Managing index settings

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Using index aliases

    Getting ready

    How to do it...

    How it works...

    There's more...

    Managing dangling indices

    Getting ready

    How to do it…

    How it works...

    See also

    Resolving index names

    Getting ready

    How to do it…

    How it works...

    See also

    Rolling over an index

    Getting ready

    How to do it…

    How it works...

    There's more...

    See also

    Indexing a document

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Getting a document

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Deleting a document

    Getting ready

    How to do it...

    How it works...

    See also

    Updating a document

    Getting ready

    How to do it...

    How it works...

    See also

    Speeding up atomic operations (bulk operations)

    Getting ready

    How to do it...

    How it works...

    Speeding up GET operations (multi-GET)

    Getting ready

    How to do it...

    How it works...

    See also

    Chapter 4: Exploring Search Capabilities

    Technical requirements

    Executing a search

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Sorting results

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Highlighting results

    Getting ready

    How to do it...

    How it works...

    See also

    Executing a scrolling query

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Using the search_after functionality

    Getting ready

    How to do it…

    How it works...

    See also

    Returning inner hits in results

    Getting ready

    How to do it...

    How it works...

    See also

    Suggesting a correct query

    Getting ready

    How to do it...

    How it works...

    See also

    Counting matched results

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Explaining a query

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Query profiling

    Getting ready

    How to do it...

    How it works...

    Deleting by query

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Updating by query

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Matching all of the documents

    Getting ready

    How to do it...

    How it works...

    See also

    Using a Boolean query

    Getting ready

    How to do it...

    How it works...

    There's more...

    Using the search template

    Getting ready

    How to do it...

    How it works...

    See also

    Chapter 5: Text and Numeric Queries

    Technical requirements

    Using a term query

    Getting ready

    How to do it...

    How it works...

    There's more...

    Using a terms query

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Using a terms set query

    Getting ready

    How to do it...

    How it works...

    See also

    Using a prefix query

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Using a wildcard query

    Getting ready

    How to do it...

    How it works...

    See also

    Using a regexp query

    Getting ready

    How to do it...

    How it works...

    See also

    Using span queries

    Getting ready

    How to do it...

    How it works...

    See also

    Using a match query

    Getting ready

    How to do it...

    How it works...

    See also

    Using a query string query

    Getting ready

    How to do it...

    How it works...

    There's more…

    See also

    Using a simple query string query

    Getting ready

    How to do it...

    How it works...

    See also

    Using the range query

    Getting ready

    How to do it...

    How it works...

    There's more...

    Using an IDs query

    Getting ready

    How to do it...

    How it works...

    See also

    Using the function score query

    Getting ready

    How to do it...

    How it works...

    See also

    Using the exists query

    Getting ready

    How to do it...

    How it works...

    See also

    Using a pinned query (XPACK)

    Getting ready

    How to do it...

    How it works...

    See also

    Chapter 6: Relationships and Geo Queries

    Technical requirements

    Using the has_child query

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Using the has_parent query

    Getting ready

    How to do it...

    How it works...

    See also

    Using the nested query

    Getting ready

    How to do it...

    How it works...

    See also

    Using the geo_bounding_box query

    Getting ready

    How to do it...

    How it works...

    See also

    Using the geo_shape query

    Getting ready

    How to do it...

    How it works...

    See also

    Using the geo_distance query

    Getting ready

    How to do it...

    How it works...

    See also

    Chapter 7: Aggregations

    Executing an aggregation

    Getting ready

    How to do it...

    How it works...

    See also

    Executing a stats aggregation

    Getting ready

    How to do it...

    How it works...

    See also

    Executing a terms aggregation

    Getting ready

    How to do it...

    How it works...

    There’s more...

    See also

    Executing a significant terms aggregation

    Getting ready

    How to do it...

    How it works...

    Executing a range aggregation

    Getting ready

    How to do it...

    How it works...

    There’s more...

    See also

    Executing a histogram aggregation

    Getting ready

    How to do it...

    How it works...

    There’s more...

    See also

    Executing a date histogram aggregation

    Getting ready

    How to do it...

    How it works...

    There’s more...

    See also

    Executing a filter aggregation

    Getting ready

    How to do it...

    How it works...

    There’s more...

    See also

    Executing a filters aggregation

    Getting ready

    How to do it...

    How it works...

    Executing a global aggregation

    Getting ready

    How to do it...

    How it works...

    Executing a geo distance aggregation

    Getting ready

    How to do it...

    How it works...

    See also

    Executing a children aggregation

    Getting ready

    How to do it...

    How it works...

    Executing a nested aggregation

    Getting ready

    How to do it...

    How it works...

    There’s more...

    Executing a top hit aggregation

    Getting ready

    How to do it...

    How it works...

    See also

    Executing a matrix stats aggregation

    Getting ready

    How to do it...

    How it works...

    Executing a geo bounds aggregation

    Getting ready

    How to do it...

    How it works...

    See also

    Executing a geo centroid aggregation

    Getting ready

    How to do it...

    How it works...

    See also

    Executing a geotile grid aggregation

    Getting ready

    How to do it...

    How it works...

    See also

    Executing a sampler aggregation

    Getting ready

    How to do it...

    How it works...

    Executing a pipeline aggregation

    Getting ready

    How to do it...

    How it works...

    See also

    Chapter 8: Scripting in Elasticsearch

    Painless scripting

    Getting ready

    How to do it...

    How it works...

    There’s more...

    See also

    Installing additional scripting languages

    Getting ready

    How to do it...

    How it works...

    There’s more...

    Managing scripts

    Getting ready

    How to do it...

    How it works...

    There’s more...

    See also

    Sorting data using scripts

    Getting ready

    How to do it...

    How it works...

    There’s more...

    Computing return fields with scripting

    Getting ready

    How to do it...

    How it works...

    See also

    Filtering a search using scripting

    Getting ready

    How to do it...

    How it works...

    See also

    Using scripting in aggregations

    Getting ready

    How to do it...

    How it works...

    Updating a document using scripts

    Getting ready

    How to do it...

    How it works...

    There’s more...

    Reindexing with a script

    Getting ready

    How to do it...

    How it works...

    Scripting in ingest processors

    Getting ready

    How to do it...

    How it works...

    See also

    Chapter 9: Managing Clusters

    Controlling the cluster health using the health API

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Controlling the cluster state using the API

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Getting cluster node information using the API

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Getting node statistics using the API

    Getting ready

    How to do it...

    How it works...

    There's more...

    Using the task management API

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Using the hot threads API

    Getting ready

    How to do it...

    How it works...

    Managing the shard allocation

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Monitoring segments with the segment API

    Getting ready

    How to do it...

    How it works...

    See also

    Cleaning the cache

    Getting ready

    How to do it...

    How it works...

    Chapter 10: Backups and Restoring Data

    Managing repositories

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Executing a snapshot

    Getting ready

    How to do it...

    How it works...

    There's more...

    Restoring a snapshot

    Getting ready

    How to do it...

    How it works...

    Setting up an NFS share for backups

    Getting ready

    How to do it...

    How it works...

    Reindexing from a remote cluster

    Getting ready

    How to do it...

    How it works...

    See also

    Chapter 11: User Interfaces

    Installing Kibana

    Getting ready

    How to do it...

    How it works...

    See also

    Managing Kibana Discover

    Getting ready

    How to do it...

    How it works...

    Visualizing data with Kibana

    Getting ready

    How to do it...

    How it works...

    Using Kibana Dev Tools

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Chapter 12: Using the Ingest Module

    Pipeline definition

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Inserting an ingest pipeline

    Getting ready

    How to do it...

    How it works...

    Getting an ingest pipeline

    Getting ready

    How to do it...

    How it works...

    There's more...

    Deleting an ingest pipeline

    Getting ready

    How to do it...

    How it works...

    Simulating an ingest pipeline

    Getting ready

    How to do it...

    How it works...

    There's more...

    Built-in processors

    Getting ready

    How to do it...

    How it works...

    See also

    The grok processor

    Getting ready

    How to do it...

    How it works...

    See also

    Using the ingest attachment plugin

    Getting ready

    How to do it...

    How it works...

    Using the ingest GeoIP processor

    Getting ready

    How to do it...

    How it works...

    See also

    Using the enrichment processor

    Getting ready

    How to do it...

    How it works...

    See also

    Chapter 13: Java Integration

    Creating a standard Java HTTP client

    Getting ready

    How to do it...

    How it works...

    See also

    Creating a low-level Elasticsearch client

    Getting ready

    How to do it...

    How it works...

    See also

    Using the Elasticsearch official Java client

    Getting ready

    How to do it...

    How it works...

    See also

    Managing indices

    Getting ready

    How to do it...

    How it works...

    See also

    Managing mappings

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Managing documents

    Getting ready

    How to do it...

    How it works...

    See also

    Managing bulk actions

    Getting ready

    How to do it...

    How it works...

    Building a query

    Getting ready

    How to do it...

    How it works...

    There's more...

    Executing a standard search

    Getting ready

    How to do it...

    How it works...

    See also

    Executing a search with aggregations

    Getting ready

    How to do it...

    How it works...

    See also

    Executing a scroll search

    Getting ready

    How to do it...

    How it works...

    See also

    Integrating with DeepLearning4j

    Getting ready

    How to do it...

    How it works...

    See also

    Chapter 14: Scala Integration

    Creating a client in Scala

    Getting ready

    How to do it…

    How it works...

    See also

    Managing indices

    Getting ready

    How to do it...

    How it works...

    See also

    Managing mappings

    Getting ready

    How to do it...

    How it works...

    See also

    Managing documents

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Executing a standard search

    Getting ready

    How to do it...

    How it works...

    See also

    Executing a search with aggregations

    Getting ready

    How to do it...

    How it works...

    See also

    Integrating with DeepLearning.scala

    Getting ready

    How to do it...

    How it works...

    See also

    Chapter 15: Python Integration

    Creating a client

    Getting ready

    How to do it...

    How it works…

    See also

    Managing indices

    Getting ready

    How to do it…

    How it works…

    There's more…

    See also

    Managing mappings

    Getting ready

    How to do it…

    How it works…

    See also

    Managing documents

    Getting ready

    How to do it…

    How it works…

    See also

    Executing a standard search

    Getting ready

    How to do it…

    How it works…

    See also

    Executing a search with aggregations

    Getting ready

    How to do it…

    How it works…

    See also

    Integrating with NumPy and scikit-learn

    Getting ready

    How to do it...

    How it works...

    See also

    Using AsyncElasticsearch

    Getting ready

    How to do it...

    How it works...

    See also

    Using Elasticsearch with FastAPI

    Getting ready

    How to do it...

    How it works...

    See also

    Chapter 16: Plugin Development

    Creating a plugin

    Getting ready

    How to do it...

    How it works...

    There's more...

    Creating an analyzer plugin

    Getting ready

    How to do it...

    How it works...

    There's more...

    Creating a REST plugin

    Getting ready

    How to do it...

    How it works...

    See also

    Creating a cluster action

    Getting ready

    How to do it...

    How it works...

    See also

    Creating an ingest plugin

    Getting ready

    How to do it...

    How it works...

    See also

    Chapter 17: Big Data Integration

    Installing Apache Spark

    Getting ready

    How to do it...

    How it works...

    There's more...

    Indexing data using Apache Spark

    Getting ready

    How to do it...

    How it works...

    See also

    Indexing data with meta using Apache Spark

    Getting ready

    How to do it...

    How it works...

    There's more...

    Reading data with Apache Spark

    Getting ready

    How to do it...

    How it works...

    Reading data using Spark SQL

    Getting ready

    How to do it...

    How it works...

    Indexing data with Apache Pig

    Getting ready

    How to do it...

    How it works...

    Using Elasticsearch with Alpakka

    Getting ready

    How to do it...

    How it works...

    See also

    Using Elasticsearch with MongoDB

    Getting ready

    How to do it...

    How it works...

    See also

    Chapter 18: X-Pack

    ILM – managing the index life cycle

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    ILM – automating rollover

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Using the SQL Rest API

    Getting ready

    How to do it…

    How it works...

    There's more...

    See also

    Using SQL via JDBC

    Getting ready

    How to do it…

    How it works...

    See also

    Using X-Pack Security

    Getting ready

    How to do it…

    How it works...

    See also

    Using alerting to monitor data events

    Getting ready

    How to do it…

    How it works...

    See also

    Why subscribe?

    Other Books You May Enjoy

    Preface

    Welcome to the fifth edition of Elasticsearch Cookbook, which targets Elasticsearch 8.x. I have been on a long journey (about 12 years) with both Elasticsearch and the readers of my books. Every version of Elasticsearch brings breaking changes and new functionality, and the existing components keep evolving in a continuous cycle driven by both product and marketing needs.

    Elasticsearch, once a very niche product, is now one of the most used databases in the world (ranked seventh in April 2022; source: https://db-engines.com/en/ranking). Its availability both on-premises (bare metal, Docker, or K8S) and as a multi-cloud offering provided by Elastic on Amazon, Azure, and Google is likely to make it one of the next leading solutions for cloud searching and storage.

    The growth of Elasticsearch is mainly due to it being one of the best solutions for searching, storing, and analyzing unstructured content in petabyte-sized datasets, capabilities that are the main pillars of modern data-centered companies.

    In this book, you'll be guided through comprehensive recipes on Elasticsearch 8.x and see how you can create and run complex queries and analytics.

    Packed with recipes on performing index mapping, aggregation, and scripting using Elasticsearch, this fifth edition of Elasticsearch Cookbook will get you acquainted with numerous solutions and quick techniques to perform both everyday and uncommon tasks, such as how to deploy Elasticsearch nodes, integrate other tools into Elasticsearch, and create different visualizations with Kibana. Finally, you will integrate your Java, Scala, Python, and big data applications, such as Apache Spark and Pig, and create efficient data applications powered by enhanced functionalities and custom plugins.

    By the end of this book, you will have gained in-depth knowledge of implementing Elasticsearch architecture, and you'll be able to manage, search, and store data efficiently and effectively using Elasticsearch.

    IMHO, this book is the last of a long series and, due to continuous refinements, technical/stylistic improvements, and the suggestions of about 10 years of readers, it's probably one of the most complete and effective books on Elasticsearch.

    Dear reader, this is a technical book, but I hope you'll enjoy it wholeheartedly!

    Sincerely,

    Alberto

    Who this book is for

    If you're a software engineer, big data infrastructure engineer, or Elasticsearch developer, you'll find this book useful. This Elasticsearch book will also help data professionals working in the e-commerce and FMCG industries who use Elasticsearch for metrics evaluation and search analytics to get deeper insights for better business decisions.

    Prior experience with Elasticsearch will help you get the most out of this book in the latter chapters, which cover more advanced topics.

    What this book covers

    Chapter 1, Getting Started, covers the basic steps to start using Elasticsearch, from the simple installation to the cloud. We also cover several setup cases.

    Chapter 2, Managing Mappings, covers the correct definition of the data fields to improve both indexing and searching quality.

    Chapter 3, Basic Operations, introduces the most common actions that are required to ingest data in Elasticsearch and manage it.

    Chapter 4, Exploring Search Capabilities, talks about executing searches, sorting, and related API calls. The APIs discussed in this chapter are the essential ones.

    Chapter 5, Text and Numeric Queries, talks about the search DSL part of text and numeric fields – the core of the search functionalities of Elasticsearch.

    Chapter 6, Relationships and Geo Queries, talks about queries that work on related documents (child/parent and nested) and geo-located fields.

    Chapter 7, Aggregations, covers another capability of Elasticsearch, the possibility to execute analytics on search results to improve both the user experience and to drill down on the information contained in Elasticsearch.

    Chapter 8, Scripting in Elasticsearch, shows how to customize Elasticsearch with scripting and how to use the scripting capabilities in different parts of Elasticsearch (search, aggregation, and ingestion) using different languages. The chapter is mainly focused on Painless, the new scripting language developed by the Elastic team.

    Chapter 9, Managing Clusters, shows how to analyze the behavior of a cluster/node to understand common pitfalls.

    Chapter 10, Backups and Restoring Data, covers one of the most important components in managing data: backing up. It shows how to manage a distributed backup and the restoration of snapshots.

    Chapter 11, User Interfaces, describes two of the most common user interfaces for Elasticsearch: Cerebro, mainly used for admin activities, and Kibana, with X-Pack as a common UI extension for Elasticsearch.

    Chapter 12, Using the Ingest Module, talks about the ingest functionality for importing data into Elasticsearch via an ingestion pipeline.

    Chapter 13, Java Integration, describes how to integrate Elasticsearch in a Java application using both REST and native protocols.

    Chapter 14, Scala Integration, describes how to integrate Elasticsearch in Scala using elastic4s – an advanced type-safe and feature-rich Scala library based on the native Java API.

    Chapter 15, Python Integration, covers the usage of the official Elasticsearch Python client.

    Chapter 16, Plugin Development, describes how to create native plugins to extend Elasticsearch functionalities. Some examples show the plugin skeletons, the setup process, and the building of them.

    Chapter 17, Big Data Integration, covers how to integrate Elasticsearch in common big data tools, such as Apache Spark and Apache Pig.

    Chapter 18, X-Pack, covers the extra functionalities provided by X-Pack, including security, machine learning, SQL, and reporting.

    To get the most out of this book

    Basic knowledge of Java, Scala, and Python would be beneficial.

    If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

    Download the example code files

    You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Elasticsearch-8.x-Cookbook. In case there's an update to the code, it will be updated on the existing GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Download the color images

    We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781801079815_ColorImages.pdf.

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in the text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.

    A block of code is set as follows:

    html, body, #map {

    height: 100%;

    margin: 0;

    padding: 0

    }

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    [default]

    exten => s,1,Dial(Zap/1|30)

    exten => s,2,Voicemail(u100)

    exten => s,102,Voicemail(b100)

    exten => i,1,Voicemail(s0)

    Any command-line input or output is written as follows:

    $ mkdir css

    $ cd css

    Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: Select System info from the Administration panel.

    Tips or Important Notes

    Appear like this.

    Sections

    In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There's more..., and See also).

    To give clear instructions on how to complete a recipe, use these sections as follows:

    Getting ready

    This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.

    How to do it…

    This section contains the steps required to follow the recipe.

    How it works…

    This section usually consists of a detailed explanation of what happened in the previous section.

    There's more…

    This section consists of additional information about the recipe in order to enhance your knowledge of it.

    See also

    This section provides helpful links to other useful information for the recipe.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

    Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

    Share Your Thoughts

    Once you've read Elasticsearch 8.x Cookbook, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

    Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.

    Chapter 1: Getting Started

    In this chapter, we will start using Elasticsearch by downloading the correct version for our operating system, configuring it to perform at its best, and extending it via plugins. By the end of the chapter, we will see how to set it up on Docker, and in a cluster using Elastic Cloud Enterprise (Docker/Kubernetes).

    We will cover the following recipes:

    Downloading and installing Elasticsearch

    Setting up networking

    Setting up a node

    Setting up Linux systems

    Setting up different node roles

    Setting up a coordinating-only node

    Setting up an ingestion node

    Installing plugins in Elasticsearch

    Removing a plugin

    Changing logging settings

    Setting up a node via Docker

    Deploying on Elastic Cloud Enterprise

    Technical requirements

    Elasticsearch runs on Linux, macOS, and Windows. You will also need a web browser to access Kibana.

    All the examples and code in this book are available at https://github.com/PacktPublishing/Elasticsearch-8.x-Cookbook.

    If you don't want to go into the details of installing and configuring your Elasticsearch instance, and instead want to quickly set up an environment for development or experimentation, you can skip straight to the Setting up a node via Docker recipe and fire it up via Docker Compose. This will quickly give you an Elasticsearch instance with Kibana and other tools.

    Downloading and installing Elasticsearch

    Elasticsearch has an active community, and the release cycles are very fast; generally, new minor releases are available every 2 or 3 weeks.

    Since Elasticsearch depends on many common Java libraries (Lucene, Guice, and Jackson are the most famous ones), the Elasticsearch community tries to keep them updated and fix bugs that are discovered in them and in the Elasticsearch core.

    The large user base is also a source of new ideas and features for improving Elasticsearch use cases.

    For these reasons, if possible, it's best to use the latest available release; it usually offers the richest feature set and includes the most recent bug fixes. At the time of writing this book, the version is 8.0.0.

    Getting ready

    To install Elasticsearch, you need a supported operating system (Linux/macOS X/Windows) and a web browser, which is required to download the Elasticsearch binary release. At least 1 GB of free disk space is required to install Elasticsearch.

    How to do it…

    The following steps will show how Elasticsearch can be downloaded and successfully installed:

    We will start by downloading Elasticsearch from the web.

    Elasticsearch 8.x is distributed as a single package with X-Pack integrated; the latest version is always downloadable at https://www.elastic.co/downloads/elasticsearch.

    The versions that are available for different operating systems are as follows:

    elasticsearch-{version-number}-windows-x86_64.zip and elasticsearch-{version-number}.msi are for the Windows operating systems.

    elasticsearch-{version-number}-darwin-x86_64.tar.gz is for macOS X.

    elasticsearch-{version-number}-linux-x86_64.tar.gz is for Linux.

    elasticsearch-{version-number}-amd64.deb is for Debian-based Linux distributions (this also covers the Ubuntu family); it is installable on Debian with the dpkg -i elasticsearch-*.deb command.

    elasticsearch-{version-number}-x86_64.rpm is for Red Hat-based Linux distributions (this also covers the CentOS family). This is installable with the rpm -i elasticsearch-*.rpm command.

    The preceding packages contain everything needed to start Elasticsearch (the application and a bundled Java Virtual Machine (JVM) for running it). This book targets version 8.x or higher. At the time of writing, the latest stable version of Elasticsearch is 8.0.0. To check whether this is still the latest version when you read this, visit https://www.elastic.co/downloads/elasticsearch.
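
    The archive naming scheme listed above can be sketched in a few lines of Python. This is only an illustration: it assumes an x86_64 machine, covers just the three archive formats (not the .deb, .rpm, or .msi packages), and the function name archive_name is made up for this example.

```python
import platform

# Archive name patterns copied from the list above; {version} is a placeholder.
ARCHIVE_PATTERNS = {
    "Windows": "elasticsearch-{version}-windows-x86_64.zip",
    "Darwin": "elasticsearch-{version}-darwin-x86_64.tar.gz",
    "Linux": "elasticsearch-{version}-linux-x86_64.tar.gz",
}


def archive_name(version, system=None):
    """Return the archive filename for an OS (defaults to the current one)."""
    system = system or platform.system()
    try:
        return ARCHIVE_PATTERNS[system].format(version=version)
    except KeyError:
        raise ValueError(f"no archive pattern known for {system!r}") from None


print(archive_name("8.0.0", "Linux"))
```

    Calling archive_name("8.0.0") without a second argument picks the pattern for the machine the script runs on.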

    Extract the binary content. After downloading the correct release for your platform, the installation involves expanding the archive in a working directory.

    Choose a working directory that is safe from charset problems and does not have a long path. This prevents problems when Elasticsearch creates its directories to store index data.

    On the Windows platform, a good directory in which to install Elasticsearch could be c:\es; on Unix and macOS X, it could be /opt/es.

    Let's start Elasticsearch to check whether everything is working. To start your Elasticsearch server, just access the directory, and for Linux and macOS X execute the following command:

    # bin/elasticsearch

    Alternatively, you can type the following command line for Windows:

    # bin\elasticsearch.bat

    Your server should now start up and show logs similar to the following (I have annotated the most important parts; pay attention to the credentials part, which is needed to access Elasticsearch/Kibana):

    [2022-02-13T11:18:17,230][INFO ][o.e.n.Node               ] [iMacParo] version[8.0.0], pid[57579], build[default/tar/1b6a7ece17463df5ff54a3e1302d825889aa1161/2022-02-03T16:47:57.507843096Z], OS[Mac OS X/11.1/x86_64], JVM[Eclipse Adoptium/OpenJDK 64-Bit Server VM/17.0.1/17.0.1+12]

    [2022-02-13T11:18:17,235][INFO ][o.e.n.Node               ] [iMacParo] JVM home [/opt/elasticsearch-8.x-cookbook/elasticsearch/jdk.app/Contents/Home], using bundled JDK [true] …

    Module and plugin loading:

    [2022-02-13T11:18:20,382][INFO ][o.e.p.PluginsService     ] [iMacParo] loaded module [aggs-matrix-stats] …

    Setup node networking functionalities:

    [2022-02-13T11:18:20,454][INFO ][o.e.e.NodeEnvironment    ] [iMacParo] using [1] data paths, mounts [[/System/Volumes/Data (/dev/disk1s1)]], net usable_space [141.7gb], net total_space [931.6gb], types [apfs]

    [2022-02-13T11:18:20,454][INFO ][o.e.e.NodeEnvironment    ] [iMacParo] heap size [31gb], compressed ordinary object pointers [true] …

    Current license:

    [2022-02-13T11:18:26,646][INFO ][o.e.x.s.a.Realms         ] [iMacParo] license mode is [trial], currently licensed security realms are [reserved/reserved,file/default_file,native/default_native] …

    Binding Transport Protocol Network address:

    [2022-02-13T11:18:29,642][INFO ][o.e.t.TransportService   ] [iMacParo] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300} …

    Binding HTTP Protocol Network address:

    [2022-02-13T11:18:30,550][INFO ][o.e.h.AbstractHttpServerTransport] [iMacParo] publish_address {192.168.1.31:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}, {192.168.1.31:9200}

    [2022-02-13T11:18:30,551][INFO ][o.e.n.Node               ] [iMacParo] started …

    Registering new index patterns:

    [2022-02-13T11:18:30,972][INFO ][o.e.c.m.MetadataIndexTemplateService] [iMacParo] adding template [.monitoring-kibana] for index patterns [.monitoring-kibana-7-*]  …

    Registering license check:

    [2022-02-13T11:18:35,079][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [iMacParo] adding index lifecycle policy [.fleet-actions-results-ilm-policy]

    [2022-02-13T11:18:35,335][INFO ][o.e.l.LicenseService     ] [iMacParo] license [880f6db9-75b6-4106-8e2e-0c06cb0e8b30] mode [basic] - valid

    [2022-02-13T11:18:35,336][INFO ][o.e.x.s.a.Realms         ] [iMacParo] license mode is [basic], currently licensed security realms are [reserved/reserved,file/default_file,native/default_native]

    [2022-02-13T11:18:36,244][INFO ][o.e.c.m.MetadataCreateIndexService] [iMacParo] [.geoip_databases] creating index, cause [auto(bulk api)], templates [], shards [1]/[0]

    Generation of token to connect other nodes:

    [2022-02-13T11:18:39,862][INFO ][o.e.x.s.e.InternalEnrollmentTokenGenerator] [iMacParo] Will not generate node enrollment token because node is only bound on localhost for transport and cannot connect to nodes from other hosts

    [2022-02-13T11:18:39,950][INFO ][o.e.c.m.MetadataCreateIndexService] [iMacParo] [.security-7] creating index, cause [api], templates [], shards [1]/[0]…

    Credentials:

    Elasticsearch security features have been automatically configured!

    Authentication is enabled and cluster connections are encrypted.    … truncated…

    i  Configure Kibana to use this cluster:

    • Run Kibana and click the configuration link in the terminal when Kibana starts.

    • Copy the following enrollment token and paste it into Kibana in your browser (valid for the next 30 minutes):

    eyJ2ZXIiOiI4LjAuMCIsImFkciI6WyIxOTIuMTY4LjEuMzE6OTIwMCJdLCJmZ3IiOiJjNDRkMTZmNWEzODljODhkMDhlY2MxNjNmZDEyMGQyNGUzMzYwOTBlOTRmNTc3NjQ1MWVhNzU5MDY4MWE1MTAyIiwia2V5IjoiREt1WDhuNEJRYl9MRXFtN2Q5YkY6UnZzNVU1Wk1UY3l1Qm9SZHRtTG5DdyJ9 … truncated…

    Download of geoip data:

    [2022-02-13T11:18:41,922][INFO ][o.e.i.g.GeoIpDownloader  ] [iMacParo] successfully downloaded geoip database [GeoLite2-City.mmdb]

    … truncated…
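
    The enrollment token printed in the credentials part of the log appears to be Base64-encoded JSON: its first characters decode to {"ver":"8.0.0", followed by the node address. The following Python sketch decodes such a token under that assumption. The field names (ver, adr, fgr, key) are simply read off the decoded token shown above, not taken from a documented contract, and the sample token is built for this example with placeholder values.

```python
import base64
import json


def decode_enrollment_token(token):
    """Decode an enrollment token into a dict (assumes Base64-encoded JSON)."""
    # Drop whitespace that line wrapping may have introduced.
    compact = "".join(token.split())
    # Restore Base64 padding in case it was stripped.
    padded = compact + "=" * (-len(compact) % 4)
    return json.loads(base64.b64decode(padded))


# Hypothetical token built for this example; the values are placeholders.
sample = base64.b64encode(json.dumps({
    "ver": "8.0.0",
    "adr": ["192.168.1.31:9200"],
    "fgr": "<certificate fingerprint>",
    "key": "<key id>:<key secret>",
}).encode()).decode()

info = decode_enrollment_token(sample)
print(info["ver"], info["adr"])
```

    Because the token carries a key, treat a decoded token as a secret for as long as it is valid (30 minutes in the log above).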

    How it works…

    The Elasticsearch package generally contains the following directories:

    bin: This contains the scripts to start and manage Elasticsearch.

    elasticsearch (elasticsearch.bat on Windows): This is the main executable script to start Elasticsearch.

    elasticsearch-plugin (elasticsearch-plugin.bat on Windows): This is a script to manage plugins.

    config: This contains the Elasticsearch configurations. The most important ones are as follows:

    elasticsearch.yml: This is the main config file for Elasticsearch.

    log4j2.properties: This is the logging config file.

    data: This stores all the ingested data in Elasticsearch.

    jdk.app: The name of this directory can change based on the operating system. It contains the bundled JVM used to run Elasticsearch (version 17 in the startup log shown earlier).

    lib: This contains all the libraries required to run Elasticsearch.

    logs: This directory is empty at installation time, but it will contain the application logs once Elasticsearch has been started.

    modules: This contains the Elasticsearch default plugin modules.

    plugins: This directory is empty at installation time, but it's the place where custom plugins will be installed.
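
    As a quick sanity check after extracting the archive, the directory layout described above can be verified with a short Python sketch. The helper name missing_dirs is hypothetical. The JDK directory is left out because its name depends on the operating system, and data is left out on the assumption that it may only appear after the first start.

```python
from pathlib import Path

# Directory names taken from the list above (jdk and data are left out,
# see the assumptions in the accompanying text).
EXPECTED_DIRS = ("bin", "config", "lib", "logs", "modules", "plugins")


def missing_dirs(es_home):
    """Return the expected directories that are absent under es_home."""
    home = Path(es_home)
    return [name for name in EXPECTED_DIRS if not (home / name).is_dir()]


# /opt/es is the example installation directory suggested in this recipe.
print(missing_dirs("/opt/es"))
```

    An empty list means that every expected directory was found.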

    During Elasticsearch startup, the following events happen:

    A node name is taken from the hostname of the machine.

    The default installed modules are loaded. The most important ones are as follows:

    aggs-matrix-stats: This provides support for aggregation matrix statistics.

    analysis-common: This is a common analyzer that extends the language processing capabilities of Elasticsearch.

    ingest-common/ingest-geoip/ingest-user-agent: These include common functionalities for the ingest module plus geo/user agent management.

    kibana: This sets up special indices for Kibana functionalities, including .kibana*, .reporting*, and .apm*.

    lang-expression/lang-mustache/lang-painless: These are the default supported scripting languages of Elasticsearch. 

    mapper-extras/mapper-version: These provide extra mapper types to be used, such as token_count and scaled_float.

    parent-join: This provides extra queries, such as has_child and has_parent.

    percolator: This provides percolator capabilities.

    rank-eval: This provides support for the experimental rank evaluation Application Programming Interfaces (APIs). These are used to evaluate hit scoring based on queries.

    reindex: This provides support for reindex actions (reindex/update by query).

    repository-*: These modules allow the use of external cloud services as repository storage (Azure, Google Cloud Storage, and S3).

    x-pack-*: All the X-Pack modules depend on a subscription for their activation.

    If there are plugins, they are loaded.

    If not configured, Elasticsearch binds the following two ports on the 127.0.0.1 localhost automatically:

    9300: This port is used for internal internode communication.

    9200: This port is used for the HTTP REST API.

    After starting, if indices are available, they are restored and ready to be used.

    There are more events that are fired during the Elasticsearch startup. We'll see them in detail in other recipes.
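
    The startup events described above can be followed in the log output, where every line in this recipe's listing has the layout [timestamp][level][logger] [node] message. A minimal Python sketch for splitting such lines into fields follows. The regular expression is inferred from the sample lines in this recipe only, so other log configurations may need a different pattern, and parse_log_line is a name chosen for this example.

```python
import re
from datetime import datetime

# Layout inferred from the sample lines: [timestamp][LEVEL][logger] [node] message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]"
    r"\[(?P<level>[A-Z]+)\s*\]"
    r"\[(?P<logger>[^\]]+?)\s*\]"
    r" \[(?P<node>[^\]]+)\]"
    r" (?P<message>.*)$"
)


def parse_log_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # The timestamp uses a comma before the milliseconds.
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%Y-%m-%dT%H:%M:%S,%f"
    )
    return fields


line = "[2022-02-13T11:18:30,551][INFO ][o.e.n.Node               ] [iMacParo] started"
print(parse_log_line(line))
```

    Lines that are not log records, such as the credentials banner, make parse_log_line return None, so they are easy to skip.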

    There's more…

    During a node's startup, a lot of required services are automatically started. The most important ones are as follows:

    Cluster services: These help you manage the cluster state and internode communication and synchronization.

    Indexing service: This helps you manage all the index operations, initializing all active indices and shards.

    Mapping service: This helps you manage the document types stored in the cluster (we'll discuss mapping in Chapter 2, Managing Mappings).

    Network services: These include services such as HTTP REST services (default on port 9200), and the internal Elasticsearch protocol (port 9300).

    Plugin service: This manages the loading of the plugins. 

    Aggregation services: These provide advanced analytics on stored Elasticsearch documents, such as statistics, histograms, and document grouping.

    Ingesting services: These provide support for document preprocessing before ingestion, such as field enrichment, Natural Language Processing (NLP), type conversion, and automatic field population.

    Language scripting services: These allow you to add new language scripting support to Elasticsearch.
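
    Of the services listed above, the network services are the easiest to observe from outside the process. The following Python sketch checks whether the two default ports mentioned in this recipe accept TCP connections. It assumes a node bound to 127.0.0.1 with default ports, port_is_open is a name chosen for this example, and an open port only shows that something is listening, not that it is Elasticsearch.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Default ports from this recipe: 9200 (HTTP REST) and 9300 (internal protocol).
for name, port in (("HTTP REST", 9200), ("transport", 9300)):
    state = "open" if port_is_open("127.0.0.1", port) else "closed"
    print(f"{name} port {port}: {state}")
```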

    See also

    The Setting up networking recipe we're going to cover next will help you with the initial network setup. Check the official Elasticsearch download page at https://www.elastic.co/downloads/elasticsearch to get the latest version.

    Setting up networking

    Correctly setting up networking is very important for your nodes and cluster.

    There are a lot of different installation scenarios and networking issues. The first step for configuring the nodes in order to build a cluster is to correctly set the node discovery.

    Getting ready

    To change configuration files, you will need a working Elasticsearch installation and a simple text editor, as well as your current networking configuration (your IP address).

    How to do it…

    To set up the networking, use the following steps:

    With the standard config/elasticsearch.yml configuration file, your node binds to the localhost interface by default, so it can't be accessed by external machines or nodes.

    To allow another machine to connect to our node, we need to set network.host to our IP address (for example, I have 192.168.1.164).

    To be able to discover other nodes, we need to list them in the discovery.seed_hosts parameter (called discovery.zen.ping.unicast.hosts before Elasticsearch 7.0). Elasticsearch contacts each machine in this list and waits for a response. If a node responds, it can join the cluster.
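
    The two settings discussed in these steps can be sketched as a small Python helper that renders the corresponding lines for config/elasticsearch.yml. It assumes the setting name discovery.seed_hosts, which replaced discovery.zen.ping.unicast.hosts from Elasticsearch 7.0 onward. The helper name network_settings and the seed host addresses are hypothetical.

```python
def network_settings(bind_address, seed_hosts):
    """Return elasticsearch.yml lines for the bind address and discovery list."""
    lines = [f"network.host: {bind_address}", "discovery.seed_hosts:"]
    # Each seed host is a "host" or "host:port" string, quoted for YAML.
    lines += [f'  - "{host}"' for host in seed_hosts]
    return "\n".join(lines)


# 192.168.1.164 is the example address used above; the seed hosts are made up.
print(network_settings("192.168.1.164", ["192.168.1.165", "192.168.1.166:9300"]))
```

    The printed lines can be pasted into config/elasticsearch.yml. The node has to be restarted for them to take effect.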

    In general, since Elasticsearch version
