Introduction to Elasticsearch: Ruslan Zavacky
elasticsearch.
Ruslan Zavacky
@ruslanzavacky | ruslan.zavacky@gmail.com
Released in 2010
In 2014, $70 million in Series C funding
2
real time data
Data flows into your system all the time. The question is … how quickly can that data become an insight? With Elasticsearch, real-time is the only time.
real time analytics
Search isn’t just free text search anymore - it’s about exploring your data. Understanding it. Gaining insights that will make your business better or improve your product.
3
full text search
Elasticsearch uses Lucene under the covers to provide the most powerful full text search capabilities available in any open source product. Search comes with multi-language support, a powerful query language, support for geolocation, context aware did-you-mean suggestions, autocomplete and search snippets.
document oriented
Store complex real world entities in Elasticsearch as structured JSON documents. All fields are indexed by default, and all the indices can be used in a single query, to return results at breathtaking speed.
4
restful api
Elasticsearch is API driven. Almost any action can be performed using a simple RESTful API using JSON over HTTP. An API already exists in the language of your choice.
per-operation persistence
Elasticsearch puts your data safety first. Document changes are recorded in transaction logs on multiple nodes in the cluster to minimise the chance of any data loss.
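To make "JSON over HTTP" concrete, here is a minimal sketch that builds (but does not send) an index request with Python's standard library. The `books/book/1` path and the document fields are borrowed from the search response shown later in this deck, the host is the `localhost:9200` used in the curl examples, and `build_index_request` is a hypothetical helper name.

```python
import json
from urllib.request import Request


def build_index_request(index, doc_type, doc_id, document,
                        host="http://localhost:9200"):
    """Build a PUT request carrying `document` as JSON. Nothing is sent here."""
    url = f"{host}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


req = build_index_request(
    "books", "book", 1,
    {"title": "Elasticsearch - The definitive guide", "pages": 230},
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it, assuming a node is listening on that address.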
5
who
6
I
7
8
Unstructured search
9
Structured search
10
Enrichment
11
Sorting
12
Pagination
13
Aggregation
14
Suggestions
15
Elasticsearch in 10 seconds
• Zero configuration
16
The most
important question
17
18
Exploding Kittens
on Kickstarter
> 195,794 backers
> $7,840,830 pledged
… and yes, Kickstarter uses
Elasticsearch
19
Capabilities
20
Capabilities
Store schemaless data
Or create a schema for your data
Manipulate your data record by record
Or use multi-document APIs for bulk operations
Perform queries and filters on your data for insights
Or, if you are a DevOps person, use the APIs to monitor
Do not forget about built-in full-text search and analysis
Document API, Search APIs, Indices API, Cat APIs, Cluster API, Query DSL,
Validate API, Search API, More Like This API, Mapping, Analysis, Modules
21
Auto Completion
SELECT name
FROM product
WHERE name LIKE 'd%'
22
Auto Completion
Yea, sure…
23
Auto Completion: FST
24
Auto Completion
Multiple Inputs
Single Unified Output
Scoring
Payloads
Synonyms
Ignoring stopwords
Going fuzzy
Statistics
25
Auto Completion
curl -X PUT localhost:9200/hotels/hotel/2 -d '
{
"name" : "Hotel Monaco",
"city" : "Munich",
"name_suggest" : {
"input" : [
"Monaco Munich",
"Hotel Monaco"
],
"output": "Hotel Monaco",
"weight": 10
}
}'
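The document above only becomes searchable as a suggestion once `name_suggest` is mapped as a completion field and a suggest request is issued. The sketch below builds both request bodies in Python, assuming the 1.x-era completion suggester (a `"type": "completion"` mapping and the `_suggest` endpoint); later Elasticsearch versions changed this API, so treat the shapes as illustrative. The suggestion name `hotels` and the prefix `"m"` are arbitrary choices.

```python
import json

# Mapping body for PUT localhost:9200/hotels/hotel/_mapping (1.x-style, assumed).
mapping_body = {
    "hotel": {
        "properties": {
            "name": {"type": "string"},
            "city": {"type": "string"},
            "name_suggest": {"type": "completion"},
        }
    }
}


def suggest_body(prefix, field="name_suggest", name="hotels"):
    """Body for POST localhost:9200/hotels/_suggest: complete `prefix` on `field`."""
    return {name: {"text": prefix, "completion": {"field": field}}}


print(json.dumps(mapping_body, indent=2))
print(json.dumps(suggest_body("m"), indent=2))
```

With the hotel document indexed, a prefix such as "m" would be expected to match the input "Monaco Munich" and return the single output "Hotel Monaco".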
26
Faceted Navigation
27
Aggregation & Filtering
Documents
28
Aggregation & Filtering
Documents
Query
29
Aggregation & Filtering
Documents
Query
Buckets
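The Documents, Query, Buckets progression can be imitated in a few lines of plain Python. This is a toy stand-in, not Elasticsearch code: the product data is invented, the "query" is a simple substring match, and the buckets are per-brand counts of the matching documents. In Elasticsearch the bucket step would roughly correspond to a `terms` aggregation placed next to the query in the search body.

```python
from collections import Counter

# Invented sample documents.
products = [
    {"name": "Red shirt", "brand": "Acme"},
    {"name": "Blue shirt", "brand": "Acme"},
    {"name": "Green shirt", "brand": "Globex"},
    {"name": "Black shoes", "brand": "Acme"},
]


def search_with_buckets(docs, text, bucket_field):
    """Filter docs with a naive query, then count the hits per bucket value."""
    hits = [d for d in docs if text.lower() in d["name"].lower()]
    buckets = Counter(d[bucket_field] for d in hits)
    return hits, dict(buckets)


hits, buckets = search_with_buckets(products, "shirt", "brand")
print(len(hits), buckets)
```

The bucket counts are what a faceted navigation sidebar displays next to each filter option.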
30
Faceted Navigation
33
Snapshot / Restore
Snapshot
curl -XPUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"
Restore
curl -XPOST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore"
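Both calls above assume that a repository named `my_backup` has already been registered. The sketch below lists the three requests in order; the `fs` repository type and the `location` path are assumptions for illustration, and `snapshot_workflow` is a hypothetical helper that only builds the requests.

```python
import json

HOST = "http://localhost:9200"  # same host as the curl examples


def snapshot_workflow(repo="my_backup", snapshot="snapshot_1",
                      location="/mount/backups/my_backup"):  # hypothetical path
    """Return (method, url, body) tuples: register repository, snapshot, restore."""
    return [
        ("PUT", f"{HOST}/_snapshot/{repo}",
         json.dumps({"type": "fs", "settings": {"location": location}})),
        ("PUT", f"{HOST}/_snapshot/{repo}/{snapshot}?wait_for_completion=true", None),
        ("POST", f"{HOST}/_snapshot/{repo}/{snapshot}/_restore", None),
    ]


for method, url, body in snapshot_workflow():
    print(method, url, body or "")
```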
34
Percolate API
Store queries in Elasticsearch.
Pass documents as queries.
Observe matched queries.
WUT?
35
Percolate API
Use Case
You tell customers that you will notify them
when a plane ticket becomes available or
cheaper.
Solution
Store each customer's criteria for the desired flight:
departure, destination, max price
When you store flight data, match it against
the saved percolator queries.
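The "reverse search" idea can be illustrated without Elasticsearch at all. In this toy sketch the saved criteria play the role of stored queries and each incoming flight plays the role of the percolated document; the customer ids, airport codes, and prices are invented.

```python
# Saved "queries": one set of flight criteria per customer (invented data).
saved_criteria = {
    "customer-1": {"departure": "RIX", "destination": "LHR", "max_price": 100},
    "customer-2": {"departure": "RIX", "destination": "CDG", "max_price": 80},
}


def percolate(flight, criteria=saved_criteria):
    """Return the ids of saved criteria that the incoming flight satisfies."""
    return sorted(
        cid for cid, c in criteria.items()
        if flight["departure"] == c["departure"]
        and flight["destination"] == c["destination"]
        and flight["price"] <= c["max_price"]
    )


print(percolate({"departure": "RIX", "destination": "LHR", "price": 89}))
# → ['customer-1']
```

Every matched id is a customer to notify. The point of the percolator is that Elasticsearch does this matching against indexed queries instead of a Python loop.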
36
Percolate API
Store Query
curl -XPUT 'localhost:9200/my-index/.percolator/1' -d '{
"query" : {
"match" : {
"message" : "bonsai tree"
}
}
}'
Match document
curl -XGET 'localhost:9200/my-index/my-type/_percolate' -d '{
"doc" : {
"message" : "A new bonsai tree in the office"
}
}'
37
Percolate API
{
"took" : 19,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"total" : 1,
"matches" : [
{
"_index" : "my-index",
"_id" : "1"
}
]
}
38
More like this API
curl -XGET 'http://localhost:9200/memes/meme/1/_mlt?mlt_fields=face&min_doc_freq=1'
39
scalability
40
Distributed & scalable
Replication
• Read scalability
• Removing the single point of failure (SPOF)
Sharding
• Split logical data over several machines
• Write scalability
• Control data flows
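How a document ends up on a particular shard can be sketched as "hash the routing key, take it modulo the number of primary shards". The snippet below is a simplification under stated assumptions: `zlib.crc32` stands in for Elasticsearch's internal hash function, the routing key is taken to be the document id, and `pick_shard` is a hypothetical name.

```python
import zlib


def pick_shard(routing_key: str, number_of_shards: int) -> int:
    """Map a routing key to a shard number in [0, number_of_shards)."""
    # crc32 is only a stand-in for the real hash function.
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_shards


for order_id in ["order-1", "order-2", "order-3", "order-4"]:
    print(order_id, "-> shard", pick_shard(order_id, 4))
```

Because the result depends on the shard count, changing the number of primary shards would send existing ids to different shards, which is one reason that number is chosen when the index is created.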
41
Distributed & scalable
node 1
orders
1 2
3 4
curl -X PUT localhost:9200/orders -d '{
"settings.index.number_of_shards" : 4,
"settings.index.number_of_replicas": 1
}'
42
Distributed & scalable
node 1 node 2
orders orders
1 2 1 2
3 4 3 4
products products
1 2
43
Distributed & scalable
1 2 2 1
4 3 3 4
1 2
44
API tour
45
Create
46
Update
47
Delete
Get
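For reference, the four document operations map onto HTTP verbs and paths roughly as follows. This is a sketch assuming the 1.x-style `index/type/id` URLs used elsewhere in this deck, with the `books/book/1` document from the search example; `crud_requests` is a hypothetical helper that only returns the method and URL pairs.

```python
HOST = "localhost:9200"  # same host as the curl examples


def crud_requests(index="books", doc_type="book", doc_id="1"):
    """Return the HTTP method and URL for each basic document operation."""
    base = f"{HOST}/{index}/{doc_type}/{doc_id}"
    return {
        "create": ("PUT", base),                # body: the full JSON document
        "update": ("POST", f"{base}/_update"),  # body: a partial document under "doc"
        "delete": ("DELETE", base),
        "get": ("GET", base),
    }


for name, (method, url) in crud_requests().items():
    print(f"{name:<6} curl -X{method} '{url}'")
```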
48
Search
{
"took" : 2, "timed_out" : false,
"_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
"hits" : {
"total" : 1, "max_score" : 0.076713204,
"hits" : [ {
"_index" : "books", "_type" : "book", "_id" : "1",
"_score" : 0.076713204, "_source" : {
"title" : "Elasticsearch - The definitive guide",
"authors" : [ "Clinton Gormley", "Zachary Tong" ],
"started" : "2013-02-04", "pages" : 230
}
}]
}
}
49
Search Query DSL
curl -XGET 'localhost:9200/books/book/_search' -d '{
  "query": {
    "filtered" : {
      "query" : {
        "match": {
          "text" : {
            "query" : "To Be Or Not To Be",
            "cutoff_frequency" : 0.01
          }
        }
      },
      "filter" : {
        "range": {
          "price": {
            "gte": 20.0,
            "lte": 50.0
……
}
}'
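The slide elides the end of the request, so the following is a sketch under assumptions: it closes the braces in the obvious way and builds the body as a Python dict. The `filtered` query shown above was deprecated in Elasticsearch 2.0 and removed in 5.0 in favour of a `bool` query with a `filter` clause, so a second builder shows that shape with a simplified `match` clause. Both function names are hypothetical.

```python
import json


def filtered_query(text, min_price, max_price):
    """1.x-style body: full-text match restricted by a price range filter."""
    return {"query": {"filtered": {
        "query": {"match": {"text": {"query": text, "cutoff_frequency": 0.01}}},
        "filter": {"range": {"price": {"gte": min_price, "lte": max_price}}},
    }}}


def bool_query(text, min_price, max_price):
    """Later-style body: the same idea expressed as bool + filter."""
    return {"query": {"bool": {
        "must": {"match": {"text": {"query": text}}},
        "filter": {"range": {"price": {"gte": min_price, "lte": max_price}}},
    }}}


print(json.dumps(filtered_query("To Be Or Not To Be", 20.0, 50.0), indent=2))
print(json.dumps(bool_query("To Be Or Not To Be", 20.0, 50.0), indent=2))
```

In both forms the range filter narrows the candidate set without affecting the relevance score, while the `match` part does the scoring.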
50
Use case: Product Search Engine
51
Product Search Engine
52
Neutrality? Really?
Is full-text search relevancy really your
preferred scoring algorithm?
54
Neutrality? Really?
55
ecosystem
56
Ecosystem
• Plugins
• Clients for many languages
• Kibana
• Logstash
• Hadoop integration
• Marvel
57
spoiler alert!
59
what is data?
60
Whatever provides value for your business.
61
Internal
• Domain data: orders, products
• Application data: log files, metrics
External
• Social media streams
• Email
62
63
Logstash
• Managing events and logs
• Collect data
• Parse data
• Enrich data
64
Why collect and centralise data?
65
Unify dates
• apache [19/Feb/2015:19:00:00 +0000]
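Unifying dates means parsing each source's timestamp format into one common representation. Here is a minimal Python sketch for the Apache access-log format shown above, normalising to ISO 8601 in UTC. In Logstash itself this job is done by its filters; `unify_apache_date` is a hypothetical helper used only to show the transformation.

```python
from datetime import datetime, timezone

# Apache access-log timestamp, e.g. 19/Feb/2015:19:00:00 +0000
APACHE_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def unify_apache_date(raw: str) -> str:
    """Parse an Apache log timestamp and return it as ISO 8601 in UTC."""
    parsed = datetime.strptime(raw.strip("[]"), APACHE_FORMAT)
    return parsed.astimezone(timezone.utc).isoformat()


print(unify_apache_date("[19/Feb/2015:19:00:00 +0000]"))
# → 2015-02-19T19:00:00+00:00
```

Once every source is normalised this way, events from different systems can be sorted and correlated on a single timeline.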
66
Logstash
• Managing events and logs
Input
• Collect data
• Parse data
68
Kibana
69
Thank You!
73
Feedback
☺ ! ☹
Sponsors of XXVIII DevClub.lv