Elasticsearch 8.x Cookbook: Over 180 recipes to perform fast, scalable, and reliable searches for your enterprise
By Alberto Paro
About this ebook
Elasticsearch is a Lucene-based distributed search engine at the heart of the Elastic Stack that allows you to index and search unstructured content across petabytes of data. With this updated fifth edition, you'll work through comprehensive recipes covering what's new in Elasticsearch 8.x and see how to create and run complex queries and analytics.
The recipes will guide you through performing index mapping, aggregation, working with queries, and scripting using Elasticsearch. You'll focus on numerous solutions and quick techniques for performing both common and uncommon tasks such as deploying Elasticsearch nodes, using the ingest module, working with X-Pack, and creating different visualizations. As you advance, you'll learn how to manage various clusters, restore data, and install Kibana to monitor a cluster and extend it using a variety of plugins. Furthermore, you'll understand how to integrate your Java, Scala, Python, and big data applications such as Apache Spark and Pig with Elasticsearch and create efficient data applications powered by enhanced functionalities and custom plugins.
By the end of this Elasticsearch cookbook, you'll have gained in-depth knowledge of implementing the Elasticsearch architecture and be able to manage, search, and store data efficiently and effectively using Elasticsearch.
Elasticsearch 8.x Cookbook - Alberto Paro
Fifth Edition
BIRMINGHAM—MUMBAI
Elasticsearch 8.x Cookbook
Fifth Edition
Copyright © 2022 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Product Manager: Devika Battike
Senior Editor: Nathanya Dias
Content Development Editor: Sean Lobo
Technical Editor: Rahul Limbachiya
Copy Editor: Safis Editing
Project Coordinator: Aparna Ravikumar Nair
Proofreader: Safis Editing
Indexer: Manju Arasan
Production Designer: Ponraj Dhandapani
Marketing Coordinator: Priyanka Mhatre
First published: December 2013
Second edition: January 2015
Third edition: February 2017
Fourth edition: April 2019
Fifth edition: May 2022
Production reference: 1280422
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80107-981-5
www.packt.com
Contributors
About the author
Alberto Paro is an engineer, manager, and software developer. He currently works as the technology architecture delivery associate director of the Accenture Cloud First data and AI team in Italy. He loves to study emerging solutions and applications, mainly related to cloud and big data processing, NoSQL, Natural Language Processing (NLP), software development, and machine learning. In 2000, he graduated in computer science engineering from Politecnico di Milano. Then, he worked with many companies, mainly using Scala/Java and Python on knowledge management solutions and advanced data mining products, using state-of-the-art big data software. A lot of his time is spent teaching others how to effectively use big data solutions, NoSQL data stores, and related technologies.
About the reviewers
Kyle Davis is the senior developer advocate with OpenSearch and Open Distro for Elasticsearch at Amazon Web Services (AWS). Kyle has a long history of working in software development, starting in the late 1990s. His experience runs the gamut from frontend development to microcontrollers, but his most passionate area of interest is NoSQL databases. He has blogged and presented extensively about technology and is the author of Redis Microservices for Dummies. Kyle is based out of Edmonton, Alberta, Canada.
Mahipalsinh Rana is currently chief technology officer (CTO) of Inexture Solutions LLP. At Inexture, he specializes in enterprise searching, Python, Java, and ML/AI. He has 15 years of experience. His work with search technologies began in 2010, when he started using Solr. He then moved on to Elastic and has done various large-scale implementations and consultations. At the start of his career, he worked for Sun Microsystems, where he worked on internationalization (i18n). He likes exploring emerging technology trends such as NLP and intuitive searching for e-commerce. He plans to develop a search engine for people who are still in the early stages of technological advancement, to provide them with information with ease. He has also worked on Liferay Beginner's Guide by Packt.
Arpit Dubey is a big data engineer with over 14 years of experience in building large-scale, data-intensive applications. He has experience in envisioning enterprise-wide data strategies, roadmaps, and architecture for large internet companies, with varied use cases. He specializes in building event-driven architectures and real-time analytical solutions, using distributed systems such as Kafka, Flink, Spark, the Hadoop stack, NoSQL databases, and graph databases. He has been an active public speaker on various technology topics and has spoken at Kafka Summit, Druid Summit, and several other technology meetups.
I would like to thank my entire family for always being my guiding light for every path I choose and every step I take.
Table of Contents
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Sections
Getting ready
How to do it…
How it works…
There's more…
See also
Get in touch
Share Your Thoughts
Chapter 1: Getting Started
Technical requirements
Downloading and installing Elasticsearch
Getting ready
How to do it…
How it works…
There's more…
See also
Setting up networking
Getting ready
How to do it…
How it works…
See also
Setting up a node
Getting ready
How to do it…
How it works…
See also
Setting up Linux systems
Getting ready
How to do it…
How it works…
There's more…
Setting up different node roles
Getting ready
How to do it…
How it works…
There's more…
See also
Setting up a coordinating-only node
Getting ready
How to do it…
How it works…
Setting up an ingestion node
Getting ready
How to do it…
How it works…
There's more…
Installing plugins in Elasticsearch
Getting ready
How to do it…
How it works…
There's more…
See also
Removing a plugin
Getting ready
How to do it…
How it works…
Changing logging settings
Getting ready
How to do it…
How it works…
Setting up a node via Docker
Getting ready
How to do it…
How it works…
There's more…
See also
Deploying on Elastic Cloud Enterprise
Getting ready
How to do it…
How it works…
See also
Chapter 2: Managing Mappings
Technical requirements
Using explicit mapping creation
Getting ready
How to do it…
How it works…
There's more…
See also
Mapping base types
Getting ready
How to do it...
How it works...
There's more...
See also
Mapping arrays
Getting ready
How to do it…
How it works…
Mapping an object
Getting ready
How to do it…
How it works…
See also
Mapping a document
Getting ready
How to do it…
How it works…
See also
Using dynamic templates in document mapping
Getting ready
How to do it…
How it works…
There's more...
See also
Managing nested objects
Getting ready
How to do it…
How it works…
There's more...
See also
Managing a child document with a join field
Getting ready
How to do it…
How it works…
There's more...
See also
Adding a field with multiple mappings
Getting ready
How to do it…
How it works…
There's more...
See also
Mapping a GeoPoint field
Getting ready
How to do it…
How it works…
There's more...
Mapping a GeoShape field
Getting ready
How to do it…
How it works…
See also
Mapping an IP field
Getting ready
How to do it…
How it works…
Mapping an Alias field
Getting ready
How to do it...
How it works…
Mapping a Percolator field
Getting ready
How to do it...
How it works…
Mapping the Rank Feature and Feature Vector fields
Getting ready
How to do it…
How it works…
Mapping the Search as you type field
Getting ready
How to do it…
How it works…
See also
Using the Range Fields type
Getting ready
How to do it...
How it works…
See also
Using the Flattened field type
Getting ready
How to do it…
How it works…
See also
Using the Point and Shape field types
Getting ready
How to do it…
How it works…
See also
Using the Dense Vector field type
Getting ready
How to do it…
How it works...
Using the Histogram field type
Getting ready
How to do it…
How it works…
See also
Adding metadata to a mapping
Getting ready
How to do it…
How it works…
Specifying different analyzers
Getting ready
How to do it…
How it works…
See also
Using index components and templates
Getting ready
How to do it…
How it works…
See also
Chapter 3: Basic Operations
Technical requirements
Creating an index
Getting ready
How to do it...
How it works...
There's more...
See also
Deleting an index
Getting ready
How to do it...
How it works...
See also
Opening or closing an index
Getting ready
How to do it...
How it works...
There's more...
See also
Putting a mapping in an index
Getting ready
How to do it...
How it works...
There's more...
See also
Getting a mapping
Getting ready
How to do it...
How it works...
See also
Reindexing an index
Getting ready
How to do it...
How it works...
See also
Refreshing an index
Getting ready
How to do it...
How it works...
See also
Flushing an index
Getting ready
How to do it...
How it works...
See also
Using ForceMerge on an index
Getting ready
How to do it...
How it works...
There's more...
See also
Shrinking an index
Getting ready
How to do it...
How it works...
There's more...
See also
Checking whether an index exists
Getting ready
How to do it...
How it works...
Managing index settings
Getting ready
How to do it...
How it works...
There's more...
See also
Using index aliases
Getting ready
How to do it...
How it works...
There's more...
Managing dangling indices
Getting ready
How to do it…
How it works...
See also
Resolving index names
Getting ready
How to do it…
How it works...
See also
Rolling over an index
Getting ready
How to do it…
How it works...
There's more...
See also
Indexing a document
Getting ready
How to do it...
How it works...
There's more...
See also
Getting a document
Getting ready
How to do it...
How it works...
There's more...
See also
Deleting a document
Getting ready
How to do it...
How it works...
See also
Updating a document
Getting ready
How to do it...
How it works...
See also
Speeding up atomic operations (bulk operations)
Getting ready
How to do it...
How it works...
Speeding up GET operations (multi-GET)
Getting ready
How to do it...
How it works...
See also...
Chapter 4: Exploring Search Capabilities
Technical requirements
Executing a search
Getting ready
How to do it...
How it works...
There's more...
See also
Sorting results
Getting ready
How to do it...
How it works...
There's more...
See also
Highlighting results
Getting ready
How to do it...
How it works...
See also
Executing a scrolling query
Getting ready
How to do it...
How it works...
There's more...
See also
Using the search_after functionality
Getting ready
How to do it…
How it works...
See also
Returning inner hits in results
Getting ready
How to do it...
How it works...
See also
Suggesting a correct query
Getting ready
How to do it...
How it works...
See also
Counting matched results
Getting ready
How to do it...
How it works...
There's more...
See also
Explaining a query
Getting ready
How to do it...
How it works...
There's more...
See also
Query profiling
Getting ready
How to do it...
How it works...
Deleting by query
Getting ready
How to do it...
How it works...
There's more...
See also
Updating by query
Getting ready
How to do it...
How it works...
There's more...
See also
Matching all of the documents
Getting ready
How to do it...
How it works...
See also
Using a Boolean query
Getting ready
How to do it...
How it works...
There's more...
Using the search template
Getting ready
How to do it...
How it works...
See also
Chapter 5: Text and Numeric Queries
Technical requirements
Using a term query
Getting ready
How to do it...
How it works...
There's more...
Using a terms query
Getting ready
How to do it...
How it works...
There's more...
See also
Using a terms set query
Getting ready
How to do it...
How it works...
See also
Using a prefix query
Getting ready
How to do it...
How it works...
There's more...
See also
Using a wildcard query
Getting ready
How to do it...
How it works...
See also
Using a regexp query
Getting ready
How to do it...
How it works...
See also
Using span queries
Getting ready
How to do it...
How it works...
See also
Using a match query
Getting ready
How to do it...
How it works...
See also
Using a query string query
Getting ready
How to do it...
How it works...
There's more…
See also
Using a simple query string query
Getting ready
How to do it...
How it works...
See also
Using the range query
Getting ready
How to do it...
How it works...
There's more...
Using an IDs query
Getting ready
How to do it...
How it works...
See also
Using the function score query
Getting ready
How to do it...
How it works...
See also
Using the exists query
Getting ready
How to do it...
How it works...
See also
Using a pinned query (XPACK)
Getting ready
How to do it...
How it works...
See also
Chapter 6: Relationships and Geo Queries
Technical requirements
Using the has_child query
Getting ready
How to do it...
How it works...
There's more...
See also
Using the has_parent query
Getting ready
How to do it...
How it works...
See also
Using the nested query
Getting ready
How to do it...
How it works...
See also
Using the geo_bounding_box query
Getting ready
How to do it...
How it works...
See also
Using the geo_shape query
Getting ready
How to do it...
How it works...
See also
Using the geo_distance query
Getting ready
How to do it...
How it works...
See also
Chapter 7: Aggregations
Executing an aggregation
Getting ready
How to do it...
How it works...
See also
Executing a stats aggregation
Getting ready
How to do it...
How it works...
See also
Executing a terms aggregation
Getting ready
How to do it...
How it works...
There’s more...
See also
Executing a significant terms aggregation
Getting ready
How to do it...
How it works...
Executing a range aggregation
Getting ready
How to do it...
How it works...
There’s more...
See also
Executing a histogram aggregation
Getting ready
How to do it...
How it works...
There’s more...
See also
Executing a date histogram aggregation
Getting ready
How to do it...
How it works...
There’s more...
See also
Executing a filter aggregation
Getting ready
How to do it...
How it works...
There’s more...
See also
Executing a filters aggregation
Getting ready
How to do it...
How it works...
Executing a global aggregation
Getting ready
How to do it...
How it works...
Executing a geo distance aggregation
Getting ready
How to do it...
How it works...
See also
Executing a children aggregation
Getting ready
How to do it...
How it works...
Executing a nested aggregation
Getting ready
How to do it...
How it works...
There’s more...
Executing a top hit aggregation
Getting ready
How to do it...
How it works...
See also
Executing a matrix stats aggregation
Getting ready
How to do it...
How it works...
Executing a geo bounds aggregation
Getting ready
How to do it...
How it works...
See also
Executing a geo centroid aggregation
Getting ready
How to do it...
How it works...
See also
Executing a geotile grid aggregation
Getting ready
How to do it...
How it works...
See also
Executing a sampler aggregation
Getting ready
How to do it...
How it works...
Executing a pipeline aggregation
Getting ready
How to do it...
How it works...
See also
Chapter 8: Scripting in Elasticsearch
Painless scripting
Getting ready
How to do it...
How it works...
There’s more...
See also
Installing additional scripting languages
Getting ready
How to do it...
How it works...
There’s more...
Managing scripts
Getting ready
How to do it...
How it works...
There’s more...
See also
Sorting data using scripts
Getting ready
How to do it...
How it works...
There’s more...
Computing return fields with scripting
Getting ready
How to do it...
How it works...
See also
Filtering a search using scripting
Getting ready
How to do it...
How it works...
See also
Using scripting in aggregations
Getting ready
How to do it...
How it works...
Updating a document using scripts
Getting ready
How to do it...
How it works...
There’s more...
Reindexing with a script
Getting ready
How to do it...
How it works...
Scripting in ingest processors
Getting ready
How to do it...
How it works...
See also
Chapter 9: Managing Clusters
Controlling the cluster health using the health API
Getting ready
How to do it...
How it works...
There's more...
See also
Controlling the cluster state using the API
Getting ready
How to do it...
How it works...
There's more...
See also
Getting cluster node information using the API
Getting ready
How to do it...
How it works...
There's more...
See also
Getting node statistics using the API
Getting ready
How to do it...
How it works...
There's more...
Using the task management API
Getting ready
How to do it...
How it works...
There's more...
See also
Using the hot threads API
Getting ready
How to do it...
How it works...
Managing the shard allocation
Getting ready
How to do it...
How it works...
There's more...
See also
Monitoring segments with the segment API
Getting ready
How to do it...
How it works...
See also
Cleaning the cache
Getting ready
How to do it...
How it works...
Chapter 10: Backups and Restoring Data
Managing repositories
Getting ready
How to do it...
How it works...
There's more...
See also
Executing a snapshot
Getting ready
How to do it...
How it works...
There's more...
Restoring a snapshot
Getting ready
How to do it...
How it works...
Setting up an NFS share for backups
Getting ready
How to do it...
How it works...
Reindexing from a remote cluster
Getting ready
How to do it...
How it works...
See also
Chapter 11: User Interfaces
Installing Kibana
Getting ready
How to do it...
How it works...
See also
Managing Kibana Discover
Getting ready
How to do it...
How it works...
Visualizing data with Kibana
Getting ready
How to do it...
How it works...
Using Kibana Dev Tools
Getting ready
How to do it...
How it works...
There's more...
See also
Chapter 12: Using the Ingest Module
Pipeline definition
Getting ready
How to do it...
How it works...
There's more...
See also
Inserting an ingest pipeline
Getting ready
How to do it...
How it works...
Getting an ingest pipeline
Getting ready
How to do it...
How it works...
There's more...
Deleting an ingest pipeline
Getting ready
How to do it...
How it works...
Simulating an ingest pipeline
Getting ready
How to do it...
How it works...
There's more...
Built-in processors
Getting ready
How to do it...
How it works...
See also
The grok processor
Getting ready
How to do it...
How it works...
See also
Using the ingest attachment plugin
Getting ready
How to do it...
How it works...
Using the ingest GeoIP processor
Getting ready
How to do it...
How it works...
See also
Using the enrichment processor
Getting ready
How to do it...
How it works...
See also
Chapter 13: Java Integration
Creating a standard Java HTTP client
Getting ready
How to do it...
How it works...
See also
Creating a low-level Elasticsearch client
Getting ready
How to do it...
How it works...
See also
Using the Elasticsearch official Java client
Getting ready
How to do it...
How it works...
See also
Managing indices
Getting ready
How to do it...
How it works...
See also
Managing mappings
Getting ready
How to do it...
How it works...
There's more...
See also
Managing documents
Getting ready
How to do it...
How it works...
See also
Managing bulk actions
Getting ready
How to do it...
How it works...
Building a query
Getting ready
How to do it...
How it works...
There's more...
Executing a standard search
Getting ready
How to do it...
How it works...
See also
Executing a search with aggregations
Getting ready
How to do it...
How it works...
See also
Executing a scroll search
Getting ready
How to do it...
How it works...
See also
Integrating with DeepLearning4j
Getting ready
How to do it...
How it works...
See also
Chapter 14: Scala Integration
Creating a client in Scala
Getting ready
How to do it…
How it works...
See also
Managing indices
Getting ready
How to do it...
How it works...
See also
Managing mappings
Getting ready
How to do it...
How it works...
See also
Managing documents
Getting ready
How to do it...
How it works...
There's more...
See also
Executing a standard search
Getting ready
How to do it...
How it works...
See also
Executing a search with aggregations
Getting ready
How to do it...
How it works...
See also
Integrating with DeepLearning.scala
Getting ready
How to do it...
How it works...
See also
Chapter 15: Python Integration
Creating a client
Getting ready
How to do it...
How it works…
See also
Managing indices
Getting ready
How to do it…
How it works…
There's more…
See also
Managing mappings
Getting ready
How to do it…
How it works…
See also
Managing documents
Getting ready
How to do it…
How it works…
See also
Executing a standard search
Getting ready
How to do it…
How it works…
See also
Executing a search with aggregations
Getting ready
How to do it…
How it works…
See also
Integrating with NumPy and scikit-learn
Getting ready
How to do it...
How it works...
See also
Using AsyncElasticsearch
Getting ready
How to do it...
How it works...
See also
Using Elasticsearch with FastAPI
Getting ready
How to do it...
How it works...
See also
Chapter 16: Plugin Development
Creating a plugin
Getting ready
How to do it...
How it works...
There's more...
Creating an analyzer plugin
Getting ready
How to do it...
How it works...
There's more...
Creating a REST plugin
Getting ready
How to do it...
How it works...
See also
Creating a cluster action
Getting ready
How to do it...
How it works...
See also
Creating an ingest plugin
Getting ready
How to do it...
How it works...
See also
Chapter 17: Big Data Integration
Installing Apache Spark
Getting ready
How to do it...
How it works...
There's more...
Indexing data using Apache Spark
Getting ready
How to do it...
How it works...
See also
Indexing data with meta using Apache Spark
Getting ready
How to do it...
How it works...
There's more...
Reading data with Apache Spark
Getting ready
How to do it...
How it works...
Reading data using Spark SQL
Getting ready
How to do it...
How it works...
Indexing data with Apache Pig
Getting ready
How to do it...
How it works...
Using Elasticsearch with Alpakka
Getting ready
How to do it...
How it works...
See also
Using Elasticsearch with MongoDB
Getting ready
How to do it...
How it works...
See also
Chapter 18: X-Pack
ILM – managing the index life cycle
Getting ready
How to do it...
How it works...
There's more...
See also
ILM – automating rollover
Getting ready
How to do it...
How it works...
There's more...
See also
Using the SQL Rest API
Getting ready
How to do it…
How it works...
There's more...
See also
Using SQL via JDBC
Getting ready
How to do it…
How it works...
See also
Using X-Pack Security
Getting ready
How to do it…
How it works...
See also
Using alerting to monitor data events
Getting ready
How to do it…
How it works...
See also
Why subscribe?
Other Books You May Enjoy
Preface
Welcome to the fifth edition of Elasticsearch Cookbook, which targets Elasticsearch 8.x. I have been on a long journey (about 12 years) with both Elasticsearch and the readers of my books. Every version of Elasticsearch brings breaking changes and new functionality, and its existing components evolve in a continuous cycle of product and marketing changes.
Elasticsearch, once a very niche product, is now one of the most used databases in the world (ranked seventh in April 2022; source: https://db-engines.com/en/ranking). Both the on-premises options (bare metal, Docker, or K8S) and the multi-cloud offerings provided by Elastic on Amazon, Azure, and Google are likely to make it one of the leading solutions for cloud searching and storage.
The growth of Elasticsearch is mainly due to it being one of the best solutions for searching, storing, and analyzing unstructured content in petabyte-sized datasets. These capabilities are the main pillars of modern data-centered companies.
In this book, you'll be guided through comprehensive recipes on Elasticsearch 8.x and see how you can create and run complex queries and analytics.
Packed with recipes on performing index mapping, aggregation, and scripting using Elasticsearch, this fifth edition of Elasticsearch Cookbook will get you acquainted with numerous solutions and quick techniques to perform both everyday and uncommon tasks, such as how to deploy Elasticsearch nodes, integrate other tools into Elasticsearch, and create different visualizations with Kibana. Finally, you will integrate your Java, Scala, Python, and big data applications, such as Apache Spark and Pig, with Elasticsearch and create efficient data applications powered by enhanced functionalities and custom plugins.
By the end of this book, you will have gained in-depth knowledge of implementing Elasticsearch architecture, and you'll be able to manage, search, and store data efficiently and effectively using Elasticsearch.
In my humble opinion, this book is the latest in a long series and, thanks to continuous refinements, technical and stylistic improvements, and about 10 years of suggestions from readers, it is probably one of the most complete and effective books on Elasticsearch.
Dear reader, this is a technical book, and I hope with all my heart that you'll enjoy it!
Sincerely,
Alberto
Who this book is for
If you're a software engineer, big data infrastructure engineer, or Elasticsearch developer, you'll find this book useful. This Elasticsearch book will also help data professionals working in the e-commerce and FMCG industries who use Elasticsearch for metrics evaluation and search analytics to get deeper insights for better business decisions.
Prior experience with Elasticsearch will help you get the most out of the later chapters of this book, which cover more advanced topics.
What this book covers
Chapter 1, Getting Started, covers the basic steps to start using Elasticsearch, from the simple installation to the cloud. We also cover several setup cases.
Chapter 2, Managing Mappings, covers the correct definition of the data fields to improve both indexing and searching quality.
Chapter 3, Basic Operations, introduces the most common actions that are required to ingest data in Elasticsearch and manage it.
Chapter 4, Exploring Search Capabilities, talks about executing searches, sorting, and related API calls. The APIs discussed in this chapter are the essential ones.
Chapter 5, Text and Numeric Queries, talks about the search DSL part of text and numeric fields – the core of the search functionalities of Elasticsearch.
Chapter 6, Relationships and Geo Queries, talks about queries that work on related documents (child/parent and nested) and geo-located fields.
Chapter 7, Aggregations, covers another capability of Elasticsearch: executing analytics on search results, both to improve the user experience and to drill down into the information contained in Elasticsearch.
Chapter 8, Scripting in Elasticsearch, shows how to customize Elasticsearch with scripting and how to use the scripting capabilities in different parts of Elasticsearch (search, aggregation, and ingestion) using different languages. The chapter is mainly focused on Painless, the new scripting language developed by the Elastic team.
Chapter 9, Managing Clusters, shows how to analyze the behavior of a cluster/node to understand common pitfalls.
Chapter 10, Backups and Restoring Data, covers one of the most important components in managing data: backing up. It shows how to manage a distributed backup and the restoration of snapshots.
Chapter 11, User Interfaces, describes two of the most common user interfaces for Elasticsearch: Cerebro, mainly used for admin activities, and Kibana, with X-Pack as a common UI extension for Elasticsearch.
Chapter 12, Using the Ingest Module, talks about the ingest functionality for importing data into Elasticsearch via an ingestion pipeline.
Chapter 13, Java Integration, describes how to integrate Elasticsearch in a Java application using both REST and native protocols.
Chapter 14, Scala Integration, describes how to integrate Elasticsearch in Scala using elastic4s – an advanced type-safe and feature-rich Scala library based on the native Java API.
Chapter 15, Python Integration, covers the usage of the official Elasticsearch Python client.
Chapter 16, Plugin Development, describes how to create native plugins to extend Elasticsearch functionalities. Some examples show the plugin skeletons, the setup process, and how to build them.
Chapter 17, Big Data Integration, covers how to integrate Elasticsearch in common big data tools, such as Apache Spark and Apache Pig.
Chapter 18, X-Pack, covers the extra functionalities provided by X-Pack, including security, machine learning, SQL, and reporting.
To get the most out of this book
Basic knowledge of Java, Scala, and Python would be beneficial.
If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
Download the example code files
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Elasticsearch-8.x-Cookbook. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781801079815_ColorImages.pdf.
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in the text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.
A block of code is set as follows:
html, body, #map {
height: 100%;
margin: 0;
padding: 0
}
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)
Any command-line input or output is written as follows:
$ mkdir css
$ cd css
Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: Select System info from the Administration panel.
Tips or Important Notes
Appear like this.
Sections
In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There's more..., and See also).
To give clear instructions on how to complete a recipe, use these sections as follows:
Getting ready
This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.
How to do it…
This section contains the steps required to follow the recipe.
How it works…
This section usually consists of a detailed explanation of what happened in the previous section.
There's more…
This section consists of additional information about the recipe in order to enhance your knowledge of it.
See also
This section provides helpful links to other useful information for the recipe.
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Share Your Thoughts
Once you've read Elasticsearch 8.x Cookbook, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.
Chapter 1: Getting Started
In this chapter, we will start using Elasticsearch by downloading the correct version for our operating system, configuring it to perform at its best, and extending it via plugins. By the end of the chapter, we will see how to set it up on Docker, and in a cluster using Elastic Cloud Enterprise (Docker/Kubernetes).
We will cover the following recipes:
Downloading and installing Elasticsearch
Setting up networking
Setting up a node
Setting up Linux systems
Setting up different node roles
Setting up a coordinating-only node
Setting up an ingestion node
Installing plugins in Elasticsearch
Removing a plugin
Changing logging settings
Setting up a node via Docker
Deploying on Elastic Cloud Enterprise
Technical requirements
Elasticsearch runs on Linux, macOS, and Windows. You will also need a browser to access Kibana.
All the examples and code in this book are available at https://github.com/PacktPublishing/Elasticsearch-8.x-Cookbook.
If you don't want to go into the details of installing and configuring your Elasticsearch instance, and instead want to quickly set up an environment for development or experimentation, you can skip ahead to the Setting up a node via Docker recipe and fire it up via Docker Compose. This will quickly install an Elasticsearch instance with Kibana and other tools.
Downloading and installing Elasticsearch
Elasticsearch has an active community, and the release cycles are very fast; generally, new minor releases are available every 2 or 3 weeks.
Since Elasticsearch depends on many common Java libraries (Lucene, Guice, and Jackson are the most famous ones), the Elasticsearch community tries to keep them updated and fix bugs that are discovered in them and in the Elasticsearch core.
The large user base is also a source of new ideas and features for improving Elasticsearch use cases.
For these reasons, if possible, it's best to use the latest available release; this is usually the most stable, offers the richest feature set, and includes the most recent bug fixes. At the time of writing this book, the version is 8.0.0.
Getting ready
To install Elasticsearch, you need a supported operating system (Linux/macOS X/Windows) and a web browser, which is required to download the Elasticsearch binary release. At least 1 GB of free disk space is required to install Elasticsearch.
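The prerequisites above can be checked with a few lines of Python before downloading anything. This is only a minimal sketch: the function names are hypothetical, and the 1 GB threshold is simply the figure quoted in this recipe.

```python
import platform
import shutil

# 1 GB, the minimum free space quoted in this recipe.
REQUIRED_BYTES = 1 * 1024**3


def has_enough_space(path: str, required: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding `path` has at least `required` free bytes."""
    return shutil.disk_usage(path).free >= required


def is_supported_os() -> bool:
    """Return True for the three operating system families named in this recipe."""
    return platform.system() in {"Linux", "Darwin", "Windows"}


if __name__ == "__main__":
    print("supported OS:", is_supported_os())
    print("enough free disk space:", has_enough_space("."))
```

Pass the directory where you plan to extract Elasticsearch instead of "." so that the check runs against the right filesystem.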
How to do it…
The following steps will show how Elasticsearch can be downloaded and successfully installed:
We will start by downloading Elasticsearch from the web.
Elasticsearch 8.x is distributed as a single default package with X-Pack integrated; the latest version is always downloadable at https://www.elastic.co/downloads/elasticsearch.
The versions that are available for different operating systems are as follows:
elasticsearch-{version-number}-windows-x86_64.zip and elasticsearch-{version-number}.msi are for the Windows operating systems.
elasticsearch-{version-number}-darwin-x86_64.tar.gz is for macOS X.
elasticsearch-{version-number}-linux-x86_64.tar.gz is for Linux.
elasticsearch-{version-number}-amd64.deb is for Debian-based Linux distributions (this also covers the Ubuntu family); it is installable with the dpkg -i elasticsearch-*.deb command.
elasticsearch-{version-number}-x86_64.rpm is for Red Hat-based Linux distributions (this also covers the CentOS family). It is installable with the rpm -i elasticsearch-*.rpm command.
The preceding packages contain everything to start Elasticsearch (the application and a bundled Java Virtual Machine (JVM) for running it). This book targets version 8.x or higher. The latest and most stable version of Elasticsearch is 8.0.0. To check out whether this is the latest version when you read this, visit https://www.elastic.co/downloads/elasticsearch.
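As an illustration of the naming scheme listed above, the following sketch maps the current operating system to the matching archive name. The function archive_name is hypothetical, and it only covers the x86_64 archive names given in this recipe, not the .msi, .deb, or .rpm packages or other CPU architectures.

```python
import platform
from typing import Optional

# Archive name templates taken from the list in this recipe (x86_64 only).
ARCHIVE_TEMPLATES = {
    "Windows": "elasticsearch-{version}-windows-x86_64.zip",
    "Darwin": "elasticsearch-{version}-darwin-x86_64.tar.gz",
    "Linux": "elasticsearch-{version}-linux-x86_64.tar.gz",
}


def archive_name(version: str, system: Optional[str] = None) -> str:
    """Return the archive file name for `version` on the given (or current) OS."""
    system = system or platform.system()
    try:
        return ARCHIVE_TEMPLATES[system].format(version=version)
    except KeyError:
        raise ValueError(f"no archive template known for {system!r}") from None


if __name__ == "__main__":
    print(archive_name("8.0.0"))
```

Keeping the templates in a dictionary makes it easy to add further entries if you need other package types.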
Extract the binary content. After downloading the correct release for your platform, the installation involves expanding the archive in a working directory.
Choose a working directory that is safe from charset problems and does not have a long path. This prevents problems when Elasticsearch creates its directories to store index data.
For the Windows platform, a good directory in which to install Elasticsearch could be c:\es; on Unix and macOS X, a good choice is /opt/es.
Let's start Elasticsearch to check whether everything is working. To start your Elasticsearch server, just access the directory, and for Linux and macOS X execute the following command:
# bin/elasticsearch
Alternatively, you can type the following command line for Windows:
# bin\elasticsearch.bat
Your server should now start up and show logs similar to the following (I have annotated the most important parts; pay attention to the credentials section, which you need for accessing Elasticsearch/Kibana):
[2022-02-13T11:18:17,230][INFO ][o.e.n.Node ] [iMacParo] version[8.0.0], pid[57579], build[default/tar/1b6a7ece17463df5ff54a3e1302d825889aa1161/2022-02-03T16:47:57.507843096Z], OS[Mac OS X/11.1/x86_64], JVM[Eclipse Adoptium/OpenJDK 64-Bit Server VM/17.0.1/17.0.1+12]
[2022-02-13T11:18:17,235][INFO ][o.e.n.Node ] [iMacParo] JVM home [/opt/elasticsearch-8.x-cookbook/elasticsearch/jdk.app/Contents/Home], using bundled JDK [true] …
Module and plugin loading:
[2022-02-13T11:18:20,382][INFO ][o.e.p.PluginsService ] [iMacParo] loaded module [aggs-matrix-stats] …
Setup node networking functionalities:
[2022-02-13T11:18:20,454][INFO ][o.e.e.NodeEnvironment ] [iMacParo] using [1] data paths, mounts [[/System/Volumes/Data (/dev/disk1s1)]], net usable_space [141.7gb], net total_space [931.6gb], types [apfs]
[2022-02-13T11:18:20,454][INFO ][o.e.e.NodeEnvironment ] [iMacParo] heap size [31gb], compressed ordinary object pointers [true] …
Current license:
[2022-02-13T11:18:26,646][INFO ][o.e.x.s.a.Realms ] [iMacParo] license mode is [trial], currently licensed security realms are [reserved/reserved,file/default_file,native/default_native] …
Binding Transport Protocol Network address:
[2022-02-13T11:18:29,642][INFO ][o.e.t.TransportService ] [iMacParo] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300} …
Binding HTTP Protocol Network address:
[2022-02-13T11:18:30,550][INFO ][o.e.h.AbstractHttpServerTransport] [iMacParo] publish_address {192.168.1.31:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}, {192.168.1.31:9200}
[2022-02-13T11:18:30,551][INFO ][o.e.n.Node ] [iMacParo] started …
Registering new index patterns:
[2022-02-13T11:18:30,972][INFO ][o.e.c.m.MetadataIndexTemplateService] [iMacParo] adding template [.monitoring-kibana] for index patterns [.monitoring-kibana-7-*] …
Registering license check:
[2022-02-13T11:18:35,079][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [iMacParo] adding index lifecycle policy [.fleet-actions-results-ilm-policy]
[2022-02-13T11:18:35,335][INFO ][o.e.l.LicenseService ] [iMacParo] license [880f6db9-75b6-4106-8e2e-0c06cb0e8b30] mode [basic] - valid
[2022-02-13T11:18:35,336][INFO ][o.e.x.s.a.Realms ] [iMacParo] license mode is [basic], currently licensed security realms are [reserved/reserved,file/default_file,native/default_native]
[2022-02-13T11:18:36,244][INFO ][o.e.c.m.MetadataCreateIndexService] [iMacParo] [.geoip_databases] creating index, cause [auto(bulk api)], templates [], shards [1]/[0]
Generation of token to connect other nodes:
[2022-02-13T11:18:39,862][INFO ][o.e.x.s.e.InternalEnrollmentTokenGenerator] [iMacParo] Will not generate node enrollment token because node is only bound on localhost for transport and cannot connect to nodes from other hosts
[2022-02-13T11:18:39,950][INFO ][o.e.c.m.MetadataCreateIndexService] [iMacParo] [.security-7] creating index, cause [api], templates [], shards [1]/[0]…
Credentials:
Elasticsearch security features have been automatically configured!
Authentication is enabled and cluster connections are encrypted. … truncated…
ℹ️ Configure Kibana to use this cluster:
• Run Kibana and click the configuration link in the terminal when Kibana starts.
• Copy the following enrollment token and paste it into Kibana in your browser (valid for the next 30 minutes):
eyJ2ZXIiOiI4LjAuMCIsImFkciI6WyIxOTIuMTY4LjEuMzE6OTIwMCJdLCJmZ3IiOiJjNDRkMTZmNWEzODljODhkMDhlY2MxNjNmZDEyMGQyNGUzMzYwOTBlOTRmNTc3NjQ1MWVhNzU5MDY4MWE1MTAyIiwia2V5IjoiREt1WDhuNEJRYl9MRXFtN2Q5YkY6UnZzNVU1Wk1UY3l1Qm9SZHRtTG5DdyJ9 … truncated…
Download of geoip data:
[2022-02-13T11:18:41,922][INFO ][o.e.i.g.GeoIpDownloader ] [iMacParo] successfully downloaded geoip database [GeoLite2-City.mmdb]
… truncated…
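When you need to inspect startup logs like the ones above programmatically, a small parser can help. The following is a minimal sketch: the regular expression is inferred from the sample lines in this recipe only, the name parse_log_line is hypothetical, and other log layouts (for example, JSON logging) would need a different approach.

```python
import re
from datetime import datetime
from typing import Optional

# Pattern inferred from the sample lines in this recipe.
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]"   # [2022-02-13T11:18:30,551]
    r"\[(?P<level>[A-Z]+)\s*\]"     # [INFO ]
    r"\[(?P<logger>[^\]]+?)\s*\]"   # [o.e.n.Node ]
    r"\s*\[(?P<node>[^\]]+)\]"      # [iMacParo]
    r"\s*(?P<message>.*)$"          # the rest of the line
)


def parse_log_line(line: str) -> Optional[dict]:
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Timestamps use a comma before the milliseconds.
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%Y-%m-%dT%H:%M:%S,%f"
    )
    return record


if __name__ == "__main__":
    sample = "[2022-02-13T11:18:30,551][INFO ][o.e.n.Node ] [iMacParo] started"
    print(parse_log_line(sample))
```

Lines that are not log records, such as the credentials banner, simply return None, so they can be filtered out or handled separately.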
How it works…
The Elasticsearch package generally contains the following directories:
bin: This contains the scripts to start and manage Elasticsearch.
elasticsearch (elasticsearch.bat on Windows): This is the main executable script to start Elasticsearch.
elasticsearch-plugin (elasticsearch-plugin.bat on Windows): This is a script to manage plugins.
config: This contains the Elasticsearch configurations. The most important ones are as follows:
elasticsearch.yml: This is the main config file for Elasticsearch.
log4j2.properties: This is the logging config file.
data: This stores all the ingested data in Elasticsearch.
jdk.app: The name of this directory can change based on the operating system. It contains the bundled JVM to be used with Elasticsearch (version 17 in the 8.0.0 startup log shown earlier).
lib: This contains all the libraries required to run Elasticsearch.
logs: This directory is empty at installation time, but in the future, it will contain the application logs.
modules: This contains the Elasticsearch default plugin modules.
plugins: This directory is empty at installation time, but it's the place where custom plugins will be installed.
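To confirm that an extracted archive has the layout described above, you could use a short check such as the following. It is a sketch under stated assumptions: the function name missing_directories is hypothetical, the JDK directory is left out because its name varies by operating system, and some directories (for example, data) may only appear after the first start, so a missing entry is not necessarily an error.

```python
from pathlib import Path
from typing import List

# Directories named in this recipe; the JDK directory is omitted because
# its name depends on the operating system.
EXPECTED_DIRS = ("bin", "config", "data", "lib", "logs", "modules", "plugins")


def missing_directories(es_home: str) -> List[str]:
    """Return the expected directories that do not exist under `es_home`."""
    home = Path(es_home)
    return [name for name in EXPECTED_DIRS if not (home / name).is_dir()]


if __name__ == "__main__":
    # /opt/es is the example install location suggested in this recipe.
    print("missing:", missing_directories("/opt/es"))
```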
During Elasticsearch startup, the following events happen:
A node name is taken from the hostname of the machine.
The default installed modules are loaded. The most important ones are as follows:
aggs-matrix-stats: This provides support for aggregation matrix statistics.
analysis-common: This is a common analyzer that extends the language processing capabilities of Elasticsearch.
ingest-common/ingest-geoip/ingest-user-agent: These include common functionalities for the ingest module plus geo/user agent management.
kibana: This sets up special indices for Kibana functionalities, including .kibana*, .reporting*, and .apm*.
lang-expression/lang-mustache/lang-painless: These are the default supported scripting languages of Elasticsearch.
mapper-extras/mapper-version: These provide extra mapper types to be used, such as token_count and scaled_float.
parent-join: This provides extra queries, such as has_child and has_parent.
percolator: This provides percolator capabilities.
rank-eval: This provides support for the experimental rank evaluation Application Programming Interfaces (APIs). These are used to evaluate hit scoring based on queries.
reindex: This provides support for reindex actions (reindex/update by query).
repository-*: These modules allow the use of external cloud services as repository storage (Azure, Google Cloud Storage, and S3).
x-pack-*: All the xpack modules depend on a subscription for their activation.
If there are plugins, they are loaded.
If not configured otherwise, Elasticsearch automatically binds the following two ports on the localhost interface (127.0.0.1):
9300: This port is used for internal internode communication.
9200: This port is used for the HTTP REST API.
After starting, if indices are available, they are restored and ready to be used.
There are more events that are fired during the Elasticsearch startup. We'll see them in detail in other recipes.
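A quick way to confirm that the two default ports mentioned above are reachable is to attempt a TCP connection to each. The following is a minimal sketch using only the Python standard library; the helper name is_port_open is hypothetical, and the check only tells you that something is listening, not that it is Elasticsearch.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Default ports from this recipe: 9200 (HTTP REST API), 9300 (transport).
    for port in (9200, 9300):
        state = "open" if is_port_open("127.0.0.1", port) else "closed"
        print(f"127.0.0.1:{port} is {state}")
```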
There's more…
During a node's startup, a lot of required services are automatically started. The most important ones are as follows:
Cluster services: These help you manage the cluster state and internode communication and synchronization.
Indexing service: This helps you manage all the index operations, initializing all active indices and shards.
Mapping service: This helps you manage the document types stored in the cluster (we'll discuss mapping in Chapter 2, Managing Mappings).
Network services: These include services such as HTTP REST services (default on port 9200), and the internal Elasticsearch protocol (port 9300).
Plugin service: This manages the loading of the plugins.
Aggregation services: These provide advanced analytics on stored Elasticsearch documents, such as statistics, histograms, and document grouping.
Ingesting services: These provide support for document preprocessing before ingestion, such as field enrichment, Natural Language Processing (NLP), type conversion, and automatic field population.
Language scripting services: These allow you to add new language scripting support to Elasticsearch.
See also
The Setting up networking recipe we're going to cover next will help you with the initial network setup. Check the official Elasticsearch download page at https://www.elastic.co/downloads/elasticsearch to get the latest version.
Setting up networking
Correctly setting up networking is very important for your nodes and cluster.
There are a lot of different installation scenarios and networking issues. The first step for configuring the nodes in order to build a cluster is to correctly set the node discovery.
Getting ready
To change configuration files, you will need a working Elasticsearch installation and a simple text editor, as well as your current networking configuration (your IP address).
How to do it…
To set up the networking, use the following steps:
With the standard Elasticsearch configuration file, config/elasticsearch.yml, your node is configured to bind on the localhost interface by default, so it can't be accessed by external machines or nodes.
To allow another machine to connect to our node, we need to set network.host to our IP address (for example, I have 192.168.1.164).
To be able to discover other nodes, we need to list them in the discovery.seed_hosts parameter (which replaced discovery.zen.ping.unicast.hosts from version 7.0 onward). The node sends signals to the machines in this unicast list and waits for a response. If a node responds, it can join the cluster.
In general, since Elasticsearch version