Elasticsearch Quick Start: An Introduction To Elasticsearch in Tutorial Form
Getting Started
Overview
ElasticSearch is an open source project, under the Apache License version 2, built on top of Lucene
and Java. The source code is located on GitHub at https://github.com/elastic/elasticsearch.
Documentation, download links and other useful information can be found
at https://www.elastic.co/products/elasticsearch.
Behind the ElasticSearch product is a company named Elastic. Its website is located
at https://www.elastic.co/. Elastic is the core developer of the open source project and owns the
copyright for it. Additionally, the company provides training, support and a number of commercial
add-ons for ElasticSearch.
In other words, ElasticSearch is free to use but there is a company that supports its development. This
company also provides services and add-ons which are not free. It's entirely up to you whether you
pay anything in conjunction with using ElasticSearch or not. If you don't, you will still have access to
the full ElasticSearch product, but if you do pay you'll be able to get training, support and/or
nice add-ons.
Apart from ElasticSearch there are a number of other projects within the same ecosystem. Two of
those are LogStash and Kibana. LogStash can be used to store logs from various sources in
ElasticSearch. Kibana provides functionality to visualize data stored in ElasticSearch in dashboards.
Together, ElasticSearch, LogStash and Kibana are referred to as the "ELK stack".
Installing ElasticSearch
ElasticSearch is a Java application built for Java 7 or higher. Therefore, the first step in setting up
ElasticSearch is to ensure that you have Java installed and the JAVA_HOME environment variable
correctly configured.
To check that you have a compatible version of Java installed, open up a terminal window and
type java -version. The output should look something like this:
Running java -version in a console where Java 8 is installed.
$ java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
Once you have made sure that you have Java 7 or higher installed and the java executable in your path,
ensure that you have the JAVA_HOME environment variable configured by typing echo
$JAVA_HOME into your terminal. The output should look something like this:
Verifying that the JAVA_HOME environment variable is set on a computer where it indeed is set.
$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home
If you don't have Java installed and/or the JAVA_HOME environment variable set up, fix that prior to
proceeding. Java can be downloaded from https://java.com/en/download/. We won't go into details
about setting up Java here as there is plenty of documentation online.
Next, with Java correctly set up, you're ready to download and install ElasticSearch. This can be done
using various package managers such as Homebrew on OS X. However, it can also be done manually
by downloading from www.elastic.co, which we'll cover here.
Take a look inside the unzipped folder. You should find a few text files and some directories.
Inspecting the contents of the elasticsearch folder.
~/elasticsearch$ ls -p
LICENSE.txt NOTICE.txt README.textile bin/ config/ lib/
The "lib" directory contains the compiled JAR files that make up ElasticSearch and the "config"
directory contains configuration files. For running ElasticSearch the most interesting directory though
is the "bin" directory. In there you'll find a shell script named "elasticsearch" and a Windows batch
file named "elasticsearch.bat". These provide the recommended ways for starting ElasticSearch on
*nix and Windows environments respectively.
If you're on Linux or OS X, execute the "bin/elasticsearch" shell script to start ElasticSearch. If you're
on Windows, execute the "bin/elasticsearch.bat" batch file instead. The output should look something
like this:
[Screenshot: console output from starting ElasticSearch.]
As you can see from the timestamps in the console output above, it took a few seconds, but
ElasticSearch is now up and running. To verify this, open up a browser and make a request to
http://localhost:9200. The response should be in the form of JSON, looking something like this:
Example response from ElasticSearch when making a request to its / endpoint.
{
  "status" : 200,
  "name" : "Tyger Tiger",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.5.0",
    "build_hash" : "927caff6f05403e936c20bf4529f144f0c89fd8c",
    "build_timestamp" : "2015-03-23T14:30:58Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
If your browser can connect to http://localhost:9200 and you see a response similar to the one
above, ElasticSearch is running fine. The exact response isn't very interesting. ElasticSearch's "/"
endpoint, which is what we've requested, responds with some basic information about the cluster,
such as which version of ElasticSearch it's running.
To shut down ElasticSearch, simply press CTRL+C. To start it again, execute the same command as
you previously used.
However, there are a number of tools that can aid you beyond what generic HTTP clients provide.
One such tool is Sense. Sense is a "JSON aware developer console to ElasticSearch" that offers auto
completion and nice formatting of requests and responses.
These days[1] Sense is shipped as a part of Marvel, a commercial plug-in for ElasticSearch. Marvel
provides management and monitoring dashboards for an ElasticSearch cluster. And, Sense.
While Marvel requires a paid-for license for production use, it's free for development use. I
recommend that you install it now, as it's a good tool to get to know and as Sense will make it more
convenient to play with ElasticSearch. To do so, you can use the "bin/plugin" tool. From the
ElasticSearch home directory, run bin/plugin -i elasticsearch/marvel/latest. The output should look
something like this:
Installing Marvel.
Once the installation is complete, restart (or start) ElasticSearch. You can now navigate
to http://localhost:9200/_plugin/marvel/ where Marvel resides.
[Screenshot: the Marvel dashboard.]
In order to access Sense, use the "Dashboards" drop-down menu in the top right part of Marvel and
click on Sense. Alternatively, you can navigate directly to Sense by directing your browser
to http://localhost:9200/_plugin/marvel/sense/.
[Screenshot: Sense.]
[1]
In "the early days" Sense was a Chrome plug-in but these days it's a part of Marvel. However, there
have been some efforts to bring it back as a Chrome plug-in. If you're interested in that, search the
Chrome web store for Sense.
Use of content on this site is subject to the restrictions set forth in the Terms of Use.
Page Layout and Design © 2018 Skillsoft Ireland Limited - All rights reserved, individual content is
owned by respective copyright holder.
Curl syntax
We use HTTP requests to talk to ElasticSearch. An HTTP request is made up of several components,
such as the URL to make the request to, the HTTP verb (GET, POST etc.) and headers. In order to
succinctly and consistently describe HTTP requests, the ElasticSearch documentation uses cURL
command line syntax. This is also the standard practice for describing requests made to ElasticSearch
within the user community and the standard that we'll use throughout this book.
The above snippet, when executed in a console, runs the curl program with three arguments. The first
argument, -XPOST, means that the request that cURL makes should use the POST HTTP verb. The
second argument, "http://localhost:9200/_search" is the URL that the request should be made to. The
final argument, -d'{...}' uses the -d flag which instructs cURL to send what follows the flag as the
HTTP POST data.
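If you translate such requests into code rather than running cURL, the three parts described above map onto a request object roughly as in the Python sketch below. It uses only the standard library and builds the request without sending it. The match_all body is an assumed stand-in, since the JSON in the snippet is elided as {...}, and send is a hypothetical helper.

```python
import json
import urllib.request

# Assumed stand-in for the elided {...} body of the curl snippet.
body = {"query": {"match_all": {}}}

request = urllib.request.Request(
    "http://localhost:9200/_search",          # the URL argument
    data=json.dumps(body).encode("utf-8"),    # -d: the HTTP POST data
    method="POST",                            # -XPOST: the HTTP verb
    headers={"Content-Type": "application/json"},
)


def send(req):
    """Send a request and decode the JSON reply (needs a running node)."""
    with urllib.request.urlopen(req) as response:
        return json.loads(response.read().decode("utf-8"))


# Inspect the request without touching the network.
print(request.get_method(), request.full_url)
```

Calling send(request) would perform the actual HTTP call, which only works while ElasticSearch is running on localhost:9200.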
Whenever you see a request formatted using cURL syntax you can either:
- Copy it and execute it in a console (given that you have cURL installed).
- Read it and translate it into whatever HTTP client you are using.
- Paste it into the left part of Sense.
When using the last option, pasting cURL formatted requests into Sense, Sense will recognize the
cURL syntax and automatically transform it to a request formatted the Sense way. Sense also offers
functionality for doing the opposite. When you have a request in Sense you can click the wrench icon
to bring up a dialog offering an option to "Copy as cURL".
[Screenshot: the "Copy as cURL" option in Sense.]
Hello world
Now that you have ElasticSearch up and running let's end this chapter with a quick example. We
won't go into any details about what we're doing here as we'll cover that in the coming chapters. For
now, just run the below HTTP requests in Sense and take them at face value. Or, if you prefer to know
what you're doing, skip to the next chapter.
The response from ElasticSearch to the second HTTP request should look like the one below,
containing a single hit.
Example response from ElasticSearch to the above request.
{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 12,
    "successful": 12,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.19178301,
    "hits": [
      {
        "_index": "my-first-index",
        "_type": "message",
        "_id": "AUqiBnvdK4Rpq0ZV4-Wp",
        "_score": 0.19178301,
        "_source": {
          "text": "Hello world!"
        }
      }
    ]
  }
}
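As a rough sketch, two requests that could produce a response like the one above are built below in Python, using only the standard library. The index name, type name and document text are taken from the response. The search query (a query_string search for "hello") is an assumption, and json_request is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"


def json_request(method, path, body):
    """Build (but do not send) an HTTP request with a JSON body."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


# First request: index a document of type "message" into "my-first-index".
# POST without an ID lets ElasticSearch generate one.
index_request = json_request(
    "POST", "/my-first-index/message", {"text": "Hello world!"}
)

# Second request: search for the document. The query itself is an assumption.
search_request = json_request(
    "POST", "/_search", {"query": {"query_string": {"query": "hello"}}}
)

for req in (index_request, search_request):
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```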
Curl syntax is programming language agnostic, making it perfect to show HTTP interactions in a way
that is both succinct and independent of any programming language. However, in the real world,
except when debugging, we usually interact with ElasticSearch from our programming language of
choice. Let's look at a couple of examples of how the above requests could be implemented in actual
applications.
Node.JS example
For Node.JS we use the official JavaScript client which can be installed in a Node.JS application
using npm install elasticsearch. A simple application that indexes a single document and then
proceeds to search for it, printing the search results to the console, looks like this:
A simple implementation of the Hello World example in Node.JS.
Note that we add a waiting period between indexing and searching, 1.1 seconds to be exact. We do this
because an indexed document won't immediately be searchable after indexing. We have to wait for
the index to be refreshed, which by default happens every second. It's possible to require ElasticSearch
to immediately refresh the index when indexing a document, but that's bad performance-wise and
therefore we opt to wait a little.
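For reference, the immediate-refresh alternative mentioned above can be sketched as follows, assuming the refresh query string parameter of the indexing API. The helper index_url is hypothetical and only builds the URL; it doesn't contact ElasticSearch.

```python
import urllib.parse


def index_url(index, doc_type, refresh=False, base="http://localhost:9200"):
    """Return the URL for indexing a document with a generated ID.

    With refresh=True the URL asks ElasticSearch to refresh right after
    indexing, trading indexing performance for immediate searchability.
    """
    url = "{}/{}/{}".format(base, index, doc_type)
    if refresh:
        url += "?" + urllib.parse.urlencode({"refresh": "true"})
    return url


print(index_url("my-first-index", "message"))
print(index_url("my-first-index", "message", refresh=True))
```

Waiting, as the example above does, keeps the default refresh behavior and is usually the gentler choice for anything beyond a quick experiment.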
.NET example
In the Node.JS example we (naturally) used JavaScript and the official ElasticSearch client which
more or less maps directly to ElasticSearch's HTTP/JSON API. Therefore the code for our Node.JS
application looked quite similar to the original cURL based example. Now, let's look at how we can
interact with ElasticSearch from a strongly typed language, C#, using a client library that introduces
more abstractions, NEST.
In order to implement the Hello World example in C# we start by creating a new console application
to which we add the NEST ElasticSearch client using NuGet (PM > Install-Package NEST). Next we
create a class which we'll index and search for instances of.
A C# class representing a message.
namespace HelloElasticSearch
{
    public class Message
    {
        public Message(string text)
        {
            Text = text;
        }

        // The message text, set through the constructor
        public string Text { get; private set; }
    }
}
The entry point of our application which implements the Hello World example by indexing a message
and then searching for it looks like this:
Basic example of indexing and searching using C# and the NEST client library.
using System;
using System.Threading;
using Nest;

namespace HelloElasticSearch
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a client that will talk to our ES cluster
            var node = new Uri("http://localhost:9200");
            var settings = new ConnectionSettings(
                node,
                defaultIndex: "my-first-index"
            );
            var client = new ElasticClient(settings);

            // Create and index a message object
            var theMessage = new Message("Hello world!");
            client.Index(theMessage);

            // The search step of the example is not included in this excerpt.
        }
    }
}
BASIC CRUD
Overview
In order to use ElasticSearch for anything useful, such as searching, the first step is to populate an
index with some data, a process known as indexing. In this chapter we'll look at how to do that as
well as how to read, update and delete indexed documents. In the process we'll see that while
ElasticSearch is a search engine it's also possible to use it as a general purpose data store.
Indexing
In ElasticSearch, indexing corresponds to both "Create" and "Update" in CRUD. If we index a
document with a given type and ID that doesn't already exist, it's inserted. If a document with the
same type and ID already exists, it's overwritten.
What is a document? Under the covers a document in ElasticSearch is a Lucene document. However,
from our perspective as users of ElasticSearch a document is a JSON object. As such, a document
can have fields in the form of JSON properties. Such properties can be values such as strings or
numbers, but they can also be other JSON objects.
In order to create a document we make a PUT request to the REST API to a URL made up of the
index name, type name and ID. That is: http://localhost:9200/<index>/<type>/[<id>] and include a
JSON object as the PUT data.
Index and type are required while the id part is optional. If we don't specify an ID ElasticSearch will
generate one for us. However, if we don't specify an id we should use POST instead of PUT. The
index name is arbitrary. If there isn't an index with that name on the server already one will be created
using default configuration.
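The URL pattern and the PUT/POST rule described above can be sketched in Python as follows. This is only an illustration using the standard library: build_index_request is a hypothetical helper, the index, type and document values are examples, and the request is built without being sent.

```python
import json
import urllib.parse
import urllib.request


def build_index_request(index, doc_type, document, doc_id=None,
                        base="http://localhost:9200"):
    """Build an indexing request: PUT when an ID is given, POST otherwise."""
    url = "{}/{}/{}".format(base, index, doc_type)
    if doc_id is None:
        method = "POST"   # ElasticSearch generates the ID
    else:
        method = "PUT"    # we choose the ID ourselves
        url += "/" + urllib.parse.quote(str(doc_id), safe="")
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


with_id = build_index_request("movies", "movie", {"title": "The Godfather"}, 1)
without_id = build_index_request("movies", "movie", {"title": "The Godfather"})
print(with_id.get_method(), with_id.full_url)
print(without_id.get_method(), without_id.full_url)
```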
As for the type name it too is arbitrary. It serves several purposes, including:
- Each type has its own ID space.
- Different types can have different mappings ("schema" that defines how properties/fields
  should be indexed).
- Although it's possible, and common, to search over multiple types, it's easy to search only for
  one or more specific type(s).
Let's index something! We can put just about anything into our index as long as it can be represented
as a single JSON object. For the sake of having something to work with we'll be indexing, and later
searching for, movies. Here's a classic one:
Sample JSON object
{
  "title": "The Godfather",
  "director": "Francis Ford Coppola",
  "year": 1972
}
To index the above JSON object we decide on an index name ("movies"), a type name ("movie") and
an ID ("1") and make a request following the pattern described above with the JSON object in the
body.
A request that indexes the sample JSON object as a document of type 'movie' in an index named
'movies'.
Execute the above request using cURL or paste it into Sense and hit the green arrow to run it. After
doing so, given that ElasticSearch is running, you should see a response looking like this:
Response from ElasticSearch to the indexing request.
{
  "_index": "movies",
  "_type": "movie",
  "_id": "1",
  "_version": 1,
  "created": true
}
[Screenshot] The request for, and result of, indexing the movie in Sense.
As you see, the response from
ElasticSearch is also a JSON object. Its properties describe the result of the operation. The first three
properties simply echo the information that we specified in the URL that we made the request to.
While this can be convenient in some cases it may seem redundant. However, remember that the ID
part of the URL is optional and if we don't specify an ID the _id property will be generated for us and
its value may then be of great interest to us.
The fourth property, _version, tells us that this is the first version of this document (the document
with type "movie" with ID "1") in the index. This is also confirmed by the fifth property, "created",
whose value is true.
Now that we've got a movie in our index let's look at how we can update it, adding a list of genres to
it. In order to do that we simply index it again using the same ID. In other words, we make the exact
same indexing request as before but with an extended JSON object containing genres.
Indexing request with the same URL as before but with an updated JSON payload.
{
  "_index": "movies",
  "_type": "movie",
  "_id": "1",
  "_version": 2,
  "created": false
}
Not surprisingly, the first three properties are the same as before. However, the _version property now
reflects that the document has been updated, as it now has 2 as its version number. The created property is
also different, now having the value false. This tells us that the document already existed and
therefore wasn't created from scratch.
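In application code the same two properties can be used to tell an insert from an overwrite. The Python sketch below applies a hypothetical helper, describe_index_result, to the two responses shown in this section.

```python
def describe_index_result(result):
    """Summarize an indexing response as a short human-readable string."""
    action = "created" if result["created"] else "overwritten"
    return "{}/{}/{} {} (version {})".format(
        result["_index"], result["_type"], result["_id"],
        action, result["_version"],
    )


first = {"_index": "movies", "_type": "movie", "_id": "1",
         "_version": 1, "created": True}
second = {"_index": "movies", "_type": "movie", "_id": "1",
          "_version": 2, "created": False}

print(describe_index_result(first))   # movies/movie/1 created (version 1)
print(describe_index_result(second))  # movies/movie/1 overwritten (version 2)
```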
If we supply a version in an indexing request, ElasticSearch will only overwrite the document if the
supplied version is the same as that of the document in the index. To try this out, add a version query
string parameter to the URL of the request with "1" as value, making it look like this:
Indexing request with a 'version' query string parameter.
Now the response from ElasticSearch is different. This time it contains an error property with a
message explaining that the indexing didn't happen due to a version conflict.
Response from ElasticSearch indicating a version conflict.
{
  "error": "VersionConflictEngineException[[movies][2] [movie][1]: version conflict, current [2], provided [1]]",
  "status": 409
}
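A Python sketch of the versioned request and of a simple conflict check follows. Both helper names are hypothetical, the request is built without being sent, and the document body is the movie with its genres as used in this chapter.

```python
import json
import urllib.parse
import urllib.request


def build_versioned_index_request(index, doc_type, doc_id, document, version,
                                  base="http://localhost:9200"):
    """Build a PUT request that only succeeds for the expected version."""
    url = "{}/{}/{}/{}?{}".format(
        base, index, doc_type, doc_id,
        urllib.parse.urlencode({"version": version}),
    )
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def is_version_conflict(response_body):
    """True when a decoded response body reports a conflict (status 409)."""
    return response_body.get("status") == 409


movie = {
    "title": "The Godfather",
    "director": "Francis Ford Coppola",
    "year": 1972,
    "genres": ["Crime", "Drama"],
}
request = build_versioned_index_request("movies", "movie", 1, movie, version=1)
print(request.full_url)  # http://localhost:9200/movies/movie/1?version=1
```

When sending such a request with urllib.request.urlopen, a 409 response is raised as urllib.error.HTTPError; the JSON body can then be read from the exception object and passed to is_version_conflict.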
ElasticSearch Quick Start: An Introduction to ElasticSearch in Tutorial Form
Basic CRUD
Joel Abrahamsson © 2015
Getting by ID
We've seen how to index documents, both new ones and existing ones, and have looked at how
ElasticSearch responds to such requests. However, we haven't actually confirmed that the documents
exist, only that ES tells us so.
So, how do we retrieve a document from an ElasticSearch index? Of course we could search for it.
However, that's overkill if we only want to retrieve a single document with a known ID. A simpler and
faster approach is to retrieve it by ID.
In order to do that we make a GET request to the same URL as when we indexed it, only this time the
ID part of the URL is mandatory. In other words, in order to retrieve a document by ID from
ElasticSearch we make a GET request to http://localhost:9200/<index>/<type>/<id>. Let's try it with our
movie using the following request:
GET request for retrieving the movie with ID 1.
{
  "_index": "movies",
  "_type": "movie",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "title": "The Godfather",
    "director": "Francis Ford Coppola",
    "year": 1972,
    "genres": [
      "Crime",
      "Drama"
    ]
  }
}
As you can see, the result object contains metadata similar to what we saw when indexing, such as index,
type and version. Last but not least, it has a property named _source which contains the actual
document body. There's not much more to say about GET as it's pretty straightforward. Let's move on
to the final CRUD operation.
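In application code we're usually only interested in the document body. A minimal Python sketch, with a hypothetical helper named extract_source applied to the response shown above, could look like this:

```python
def extract_source(response_body):
    """Return the document body from a GET response, or None if not found."""
    if response_body.get("found"):
        return response_body["_source"]
    return None


# The response to the GET request, as a Python dictionary.
sample = {
    "_index": "movies",
    "_type": "movie",
    "_id": "1",
    "_version": 2,
    "found": True,
    "_source": {
        "title": "The Godfather",
        "director": "Francis Ford Coppola",
        "year": 1972,
        "genres": ["Crime", "Drama"],
    },
}

movie = extract_source(sample)
print(movie["title"], movie["genres"])
```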
Deleting documents
In order to remove a single document from the index by ID we again use the same URL as for
indexing and retrieving it, only this time we change the HTTP verb to DELETE.
Request for deleting the movie with ID 1.
The response object contains some of the usual suspects in terms of metadata, along with a property
named "found" indicating that the document was indeed found and that the operation was successful.
Response to the DELETE request.
{
  "found": true,
  "_index": "movies",
  "_type": "movie",
  "_id": "1",
  "_version": 3
}
If we, after executing the DELETE request, switch back to GET, we can verify that the document has
indeed been deleted:
Response when making a GET request after the document has been deleted.
{
  "_index": "movies",
  "_type": "movie",
  "_id": "1",
  "found": false
}
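Since deleting and verifying differ only in the HTTP verb, both requests can be sketched with one small Python helper. The name document_request is hypothetical and the requests are built without being sent.

```python
import urllib.request


def document_request(method, index, doc_type, doc_id,
                     base="http://localhost:9200"):
    """Build a body-less request for a single document addressed by ID."""
    url = "{}/{}/{}/{}".format(base, index, doc_type, doc_id)
    return urllib.request.Request(url, method=method)


delete_request = document_request("DELETE", "movies", "movie", 1)
verify_request = document_request("GET", "movies", "movie", 1)

print(delete_request.get_method(), delete_request.full_url)
print(verify_request.get_method(), verify_request.full_url)
```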
{
  "title": "The Godfather",
  "director": {
    "givenName": "Francis Ford",
    "surName": "Coppola"
  },
  "year": 1972
}
{
  "title": "The Godfather",
  "director": {
    "givenNames": ["Francis", "Ford"],
    "surNames": ["Coppola"]
  },
  "year": 1972
}
Searching
Overview
So, we've covered the basics of working with data in an ElasticSearch index and it's time to move on
to more exciting things - searching. However, considering the last thing we did was to delete the only
document we had from our index, we'll first need some sample data. Below are a number of indexing
requests that we'll use.
Indexing request for sample data.
It's worth pointing out that ElasticSearch has an endpoint (_bulk) for indexing multiple documents
with a single request. We'll cover that in a later chapter. For now we keep it simple and use six
separate requests.
Let's give searching a try by making a GET request to the second URL above.
A search request limited to the 'movies' index but without any other criteria.
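As a rough Python sketch, the scope of a search is decided by the URL alone. The helper search_url is hypothetical and assumes the URL patterns /_search, /<index>/_search and /<index>/<type>/_search; the middle one is the request limited to the 'movies' index.

```python
def search_url(index=None, doc_type=None, base="http://localhost:9200"):
    """Return the _search URL for the whole cluster, an index, or a type."""
    if doc_type is not None and index is None:
        raise ValueError("a type can only be given together with an index")
    parts = [base]
    if index is not None:
        parts.append(index)
    if doc_type is not None:
        parts.append(doc_type)
    parts.append("_search")
    return "/".join(parts)


print(search_url())                   # http://localhost:9200/_search
print(search_url("movies"))           # http://localhost:9200/movies/_search
print(search_url("movies", "movie"))  # http://localhost:9200/movies/movie/_search
```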