Introduction To Elasticsearch.: Ruslan Zavacky

introduction to elasticsearch.
Ruslan Zavacky

@ruslanzavacky | ruslan.zavacky@gmail.com
Released in 2010

In 2014, $70 million in Series C funding

2
real time data
Data flows into your system all the time. The question is … how quickly can that data become an insight? With Elasticsearch, real-time is the only time.

real time analytics
Search isn't just free text search anymore: it's about exploring your data, understanding it, and gaining insights that will make your business better or improve your product.

high availability
Elasticsearch clusters are resilient: they will detect and remove failed nodes, and reorganise themselves to ensure that your data is safe and accessible.

multi-tenancy
A cluster can host multiple indices which can be queried independently or as a group. Index aliases allow you to add indexes on the fly, while being transparent to your application.
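Aliases are changed through the `_aliases` endpoint, which applies a list of `add`/`remove` actions atomically. The Python sketch below only builds such a request body and does not contact a cluster. It shows how an application that queries the alias can be switched to a freshly built index. The index names `products_v1` and `products_v2` and the alias `products` are hypothetical.

```python
import json


def swap_alias_actions(alias, old_index, new_index):
    """Build a body for POST /_aliases that moves `alias` from old_index to new_index.

    Both actions are sent in one request, so clients querying the alias
    never observe a moment where it points at no index.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    body = swap_alias_actions("products", "products_v1", "products_v2")
    # Send with e.g.: curl -XPOST localhost:9200/_aliases -d '<printed JSON>'
    print(json.dumps(body, indent=2))
```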

3
full text search
Elasticsearch uses Lucene under the covers to provide the most powerful full text search capabilities available in any open source product. Search comes with multi-language support, a powerful query language, support for geolocation, context aware did-you-mean suggestions, autocomplete and search snippets.

document oriented
Store complex real world entities in Elasticsearch as structured JSON documents. All fields are indexed by default, and all the indices can be used in a single query, to return results at breathtaking speed.

conflict management
Optimistic version control can be used where needed to ensure that data is never lost due to conflicting changes from multiple processes.

schema free
Elasticsearch allows you to get started easily. Toss it a JSON document and it will try to detect the data structure, index the data and make it searchable. Later, apply your domain specific knowledge of your data to customise how your data is indexed.
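Optimistic version control works like this: every write carries the version the writer last read, and a write based on a stale version is rejected instead of silently overwriting someone else's change. In Elasticsearch 1.x this check is requested with the `version` parameter on an index request (later releases use `if_seq_no`/`if_primary_term`). The sketch below is a toy, in-memory model of the idea. It is not Elasticsearch's implementation, and the class and exception names are hypothetical.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version (hypothetical name)."""


class ToyDocumentStore:
    """In-memory model of optimistic concurrency control; not Elasticsearch code."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def index(self, doc_id, source, expected_version=None):
        current = self._docs.get(doc_id)
        current_version = current[0] if current else 0
        # Reject the write if the caller read an older version than the stored one.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"doc {doc_id}: expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(source))
        return new_version


if __name__ == "__main__":
    store = ToyDocumentStore()
    store.index("1", {"pages": 230})
    version, _ = store.get("1")               # two processes both read version 1
    store.index("1", {"pages": 231}, version)  # first writer wins -> version 2
    try:
        store.index("1", {"pages": 232}, version)  # second writer is stale
    except VersionConflict as err:
        print("conflict:", err)
```

The losing process would re-read the document and retry, rather than taking a lock up front.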

4
restful api
Elasticsearch is API driven. Almost any action can be performed through a simple RESTful API using JSON over HTTP. A client already exists in the language of your choice.

per-operation persistence
Elasticsearch puts your data safety first. Document changes are recorded in transaction logs on multiple nodes in the cluster to minimise the chance of any data loss.

apache 2 open source license
Elasticsearch can be downloaded, used and modified free of charge. It is available under the Apache 2 license, one of the most flexible open source licenses available.

built on top of apache lucene™
Apache Lucene is a high performance, full-featured Information Retrieval library, written in Java. Elasticsearch uses Lucene internally to build its state of the art distributed search and analytics capabilities.

5
who

6
I
7
8
Unstructured search

9
Structured search

10
Enrichment

11
Sorting

12
Pagination

13
Aggregation

14
Suggestions

15
Elasticsearch in 10 seconds

• Schema-free, REST & JSON based distributed document store

• Open Source: Apache License 2.0

• Zero configuration

• Written in Java, extensible
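"Schema-free" means Elasticsearch inspects each JSON value of a new document and picks a field type for it (dynamic mapping). The function below gives a rough idea of that decision. It is written for illustration only. The type names follow Elasticsearch 1.x conventions (`string`, `long`, `double`, `boolean`, `date`). The date rule, which accepts only a plain `YYYY-MM-DD` string, is an assumption and is far narrower than the real date detection.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # assumption: date-only ISO strings


def _looks_like_date(value):
    if not ISO_DATE.match(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def guess_field_type(value):
    """Simplified guess at the type a schema-free store might give a JSON value."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "date" if _looks_like_date(value) else "string"
    if isinstance(value, list):
        # An array takes the type of its elements (assumed homogeneous here).
        return guess_field_type(value[0]) if value else None
    if isinstance(value, dict):
        return "object"
    return None  # e.g. null: no type can be inferred yet


def guess_mapping(document):
    return {field: guess_field_type(value) for field, value in document.items()}


if __name__ == "__main__":
    book = {
        "title": "Elasticsearch - The definitive guide",
        "authors": ["Clinton Gormley", "Zachary Tong"],
        "started": "2013-02-04",
        "pages": 230,
    }
    print(guess_mapping(book))
```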

16
The most important question

17
18
Exploding Kittens on Kickstarter
> 195,794 bakers
> $7,840,830 pledged
… and yes, Kickstarter uses Elasticsearch

19
Capabilities

20
Capabilities
Store schemaless data
Or create a schema for your data
Manipulate your data record by record
Or use Multi-document APIs to do bulk operations
Perform Queries/Filters on your data for insights
Or, if you are a DevOps person, use the APIs to monitor your cluster
Do not forget about built-in full-text search and analysis
Document API, Search APIs, Indices API, Cat APIs, Cluster API, Query DSL
Validate API, Search API, More Like This API, Mapping, Analysis, Modules
21
Auto Completion

SELECT name
FROM product
WHERE name LIKE 'd%'

1k records 500k records 20m records

22
Auto Completion

Yeah, sure…

23
Auto Completion: FST
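Lucene's completion support is built on an FST (finite state transducer), a compact automaton that maps input strings to outputs. The Python sketch below uses a plain trie as a simpler stand-in. A trie shares only common prefixes, whereas an FST also shares suffixes and needs far less memory. The sketch models the features listed on the next slides: several inputs lead to one output, and results are ranked by weight. Apart from the hotel values taken from the document, all names here are hypothetical.

```python
_OUTPUTS = None  # sentinel key; real trie keys are single characters, never None


class CompletionTrie:
    """Prefix tree mapping many inputs to one weighted output.

    Simplified stand-in for Lucene's FST: a trie shares only common prefixes,
    while an FST also shares suffixes and is much more compact in memory.
    """

    def __init__(self):
        self.root = {}

    def add(self, inputs, output, weight=0):
        """Register every input string as a path that leads to `output`."""
        for text in inputs:
            node = self.root
            for ch in text.lower():
                node = node.setdefault(ch, {})
            node.setdefault(_OUTPUTS, set()).add((output, weight))

    def suggest(self, prefix, size=5):
        """Return up to `size` outputs whose inputs start with `prefix`, highest weight first."""
        node = self.root
        for ch in prefix.lower():
            if ch not in node:
                return []
            node = node[ch]
        best = {}  # output -> highest weight seen (one output may have several inputs)
        stack = [node]
        while stack:
            current = stack.pop()
            for key, child in current.items():
                if key is _OUTPUTS:
                    for output, weight in child:
                        best[output] = max(weight, best.get(output, weight))
                else:
                    stack.append(child)
        ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
        return [output for output, _ in ranked[:size]]


if __name__ == "__main__":
    trie = CompletionTrie()
    # Inputs/output/weight as in the hotel document; "Hotel Marina" is hypothetical.
    trie.add(["Monaco Munich", "Hotel Monaco"], "Hotel Monaco", weight=10)
    trie.add(["Hotel Marina"], "Hotel Marina", weight=5)
    print(trie.suggest("hotel m"))  # ['Hotel Monaco', 'Hotel Marina']
    print(trie.suggest("mun"))      # []: only prefixes of an input match
```

The empty result for `"mun"` explains why the hotel document lists "Monaco Munich" as a separate input: matching starts at the beginning of an input, not in the middle of one.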

24
Auto Completion
Multiple Inputs
Single Unified Output
Scoring
Payloads
Synonyms
Ignoring stopwords
Going fuzzy
Statistics

25
Auto Completion
curl -X PUT localhost:9200/hotels/hotel/2 -d '
{
  "name" : "Hotel Monaco",
  "city" : "Munich",
  "name_suggest" : {
    "input" : [
      "Monaco Munich",
      "Hotel Monaco"
    ],
    "output" : "Hotel Monaco",
    "weight" : 10
  }
}'
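Indexing the hotel document only works if `name_suggest` was mapped as a `completion` field beforehand, and suggestions are then requested separately. The sketch below builds both bodies in the Elasticsearch 1.x style (mapping under the `hotel` type, queries through the `_suggest` endpoint). In later versions, `string` was split into `text`/`keyword` and suggestions moved into `_search`. The suggestion name `hotel-suggest` is arbitrary and hypothetical.

```python
import json

# Mapping for hotels/hotel: name_suggest must be a "completion" field
# before documents like hotel 2 are indexed (Elasticsearch 1.x syntax).
HOTEL_MAPPING = {
    "hotel": {
        "properties": {
            "name": {"type": "string"},
            "city": {"type": "string"},
            "name_suggest": {"type": "completion"},
        }
    }
}


def suggest_body(prefix, field="name_suggest", name="hotel-suggest"):
    """Body for POST /hotels/_suggest asking for completions of `prefix`."""
    return {name: {"text": prefix, "completion": {"field": field}}}


if __name__ == "__main__":
    # curl -X PUT localhost:9200/hotels   (create the index first)
    # curl -X PUT localhost:9200/hotels/hotel/_mapping -d '<mapping below>'
    print(json.dumps(HOTEL_MAPPING, indent=2))
    # curl -X POST localhost:9200/hotels/_suggest -d '<body below>'
    print(json.dumps(suggest_body("m"), indent=2))
```

With the document above indexed, both the prefix `m` (input "Monaco Munich") and `h` (input "Hotel Monaco") should lead to the single output "Hotel Monaco".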

26
Faceted Navigation

27
Aggregation & Filtering

Documents

Query

Buckets

Metrics 123 344 545

32
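The diagram's pipeline is documents → query → buckets → metrics. The Python function below is a toy model of those steps, not Elasticsearch's implementation. For reference, it is paired with a roughly equivalent 1.x request body: a `terms` bucket aggregation with a nested `avg` metric. The field names `brand`, `price` and `in_stock` and the sample products are hypothetical.

```python
from collections import defaultdict

# Roughly equivalent Elasticsearch 1.x request (hypothetical field names).
EQUIVALENT_REQUEST = {
    "query": {"filtered": {"filter": {"term": {"in_stock": True}}}},
    "aggs": {
        "by_brand": {
            "terms": {"field": "brand"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        }
    },
}


def aggregate(documents, query, bucket_field, metric_field):
    """Toy model of query -> buckets -> metrics."""
    buckets = defaultdict(list)
    for doc in documents:
        if query(doc):  # step 1: the query narrows the document set
            buckets[doc[bucket_field]].append(doc[metric_field])  # step 2: bucket it
    # step 3: compute a metric for each bucket
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    products = [
        {"brand": "acme", "price": 10.0, "in_stock": True},
        {"brand": "acme", "price": 30.0, "in_stock": True},
        {"brand": "globex", "price": 25.0, "in_stock": True},
        {"brand": "globex", "price": 99.0, "in_stock": False},
    ]
    print(aggregate(products, lambda d: d["in_stock"], "brand", "price"))
```

Faceted navigation is this same computation: the bucket keys and counts become the clickable facets next to the result list.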
Faceted Navigation

33
Snapshot / Restore
Snapshot
curl -XPUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"

Restore
curl -XPOST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore"
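Before the snapshot command can run, a repository named `my_backup` has to be registered. The sketch below lists the three calls in order as (method, path, body) triples and prints them as curl commands. It assumes a shared-filesystem (`fs`) repository at a hypothetical path that every node can reach. Newer releases also require that path to be whitelisted under `path.repo` in `elasticsearch.yml`.

```python
import json


def snapshot_requests(repository, snapshot, location):
    """Return (method, path, body) triples for register -> snapshot -> restore.

    `location` is a hypothetical shared-filesystem path reachable by all nodes.
    """
    register = (
        "PUT",
        f"/_snapshot/{repository}",
        {"type": "fs", "settings": {"location": location, "compress": True}},
    )
    take = ("PUT", f"/_snapshot/{repository}/{snapshot}?wait_for_completion=true", None)
    restore = ("POST", f"/_snapshot/{repository}/{snapshot}/_restore", None)
    return [register, take, restore]


if __name__ == "__main__":
    for method, path, body in snapshot_requests("my_backup", "snapshot_1", "/mount/backups/my_backup"):
        line = f"curl -X{method} 'localhost:9200{path}'"
        if body is not None:
            line += f" -d '{json.dumps(body)}'"
        print(line)
```

Note that an existing index can only be restored while it is closed (or after it has been deleted).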

34
Percolate API
Store queries in Elasticsearch.
Pass documents as queries.

Observe matched queries.

WUT?

35
Percolate API
Use Case
You tell a customer that you will notify them when a plane ticket becomes available and cheaper.
Solution
Store the customer's criteria for the desired flight: departure, destination, max price.
When you store flight data, match it against the saved percolators.
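Percolation inverts the usual roles. Normally documents are stored and queries are run against them. Here the queries (the customers' criteria) are stored, and each new document (a flight) is run against all of them. The Python sketch below models that inversion with plain predicates. In Elasticsearch, each alert would instead be stored as a query under `.percolator`, as in the example that follows. The customer names, airport codes and prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlightAlert:
    """A stored customer criterion; plays the role of a registered percolator query."""
    customer: str
    departure: str
    destination: str
    max_price: float

    def matches(self, flight):
        return (
            flight["departure"] == self.departure
            and flight["destination"] == self.destination
            and flight["price"] <= self.max_price
        )


def percolate(flight, alerts):
    """Return the customers whose stored criteria match a newly stored flight."""
    return [alert.customer for alert in alerts if alert.matches(flight)]


if __name__ == "__main__":
    alerts = [
        FlightAlert("alice", "RIX", "LHR", 100.0),
        FlightAlert("bob", "RIX", "LHR", 50.0),
        FlightAlert("carol", "RIX", "CDG", 200.0),
    ]
    new_flight = {"departure": "RIX", "destination": "LHR", "price": 79.0}
    print(percolate(new_flight, alerts))  # ['alice'] -> notify her
```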
36
Percolate API
Store Query
curl -XPUT 'localhost:9200/my-index/.percolator/1' -d '{
  "query" : {
    "match" : {
      "message" : "bonsai tree"
    }
  }
}'

Match document
curl -XGET 'localhost:9200/my-index/my-type/_percolate' -d '{
  "doc" : {
    "message" : "A new bonsai tree in the office"
  }
}'

37
Percolate API
{
  "took" : 19,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "total" : 1,
  "matches" : [
    {
      "_index" : "my-index",
      "_id" : "1"
    }
  ]
}

38
More like this API
curl -XGET 'http://localhost:9200/memes/meme/1/_mlt?mlt_fields=face&min_doc_freq=1'

39
scalability

40
Distributed & scalable
Replication
Read scalability
Removing SPOF

Sharding
Split logical data over several machines
Write scalability
Control data flows
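Sharding splits an index's documents across machines by routing each document to one primary shard, using `shard = hash(routing) % number_of_primary_shards`, where the routing value defaults to the document `_id`. The sketch below illustrates that formula. `zlib.crc32` is only a stand-in for Elasticsearch's own hash function, so the concrete shard numbers will not match a real cluster.

```python
import zlib
from collections import Counter


def shard_for(doc_id, number_of_primary_shards):
    """hash(routing) % number_of_primary_shards, with routing = the document _id.

    zlib.crc32 stands in for Elasticsearch's real hash function.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


def distribution(doc_ids, number_of_primary_shards):
    """How many of the given documents land on each primary shard."""
    return Counter(shard_for(doc_id, number_of_primary_shards) for doc_id in doc_ids)


if __name__ == "__main__":
    ids = [str(i) for i in range(1000)]
    print(distribution(ids, 4))  # roughly even spread over shards 0..3
```

The formula also explains why the number of primary shards is fixed when an index is created. Changing it would change the modulus and send most existing documents to a different shard.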

41
Distributed & scalable

[Diagram: node 1 holds orders shards 1, 2, 3, 4 and products shards 1, 2.]

curl -X PUT localhost:9200/orders -d '{
  "settings" : {
    "number_of_shards" : 4,
    "number_of_replicas" : 1
  }
}'

curl -X PUT localhost:9200/products -d '{
  "settings" : {
    "number_of_shards" : 2,
    "number_of_replicas" : 0
  }
}'

42
Distributed & scalable

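[Diagram: node 1 and node 2. Each node now holds a copy of all four orders shards, because every orders primary has its replica on the other node; products shards 1 and 2 (no replicas) remain single copies.]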

43
Distributed & scalable

[Diagram: a third node joins; the orders shard copies and the products shards are redistributed across node 1, node 2 and node 3.]

44
API tour

45
Create

» curl -X PUT localhost:9200/books/book/1 -d '
{
  "title" : "Elasticsearch - The definitive guide",
  "authors" : "Clinton Gormley",
  "started" : "2013-02-04",
  "pages" : 230
}'

46
Update

» curl -X PUT localhost:9200/books/book/1 -d '
{
  "title" : "Elasticsearch - The definitive guide",
  "authors" : [ "Clinton Gormley", "Zachary Tong" ],
  "started" : "2013-02-04",
  "pages" : 230
}'

47
Delete

» curl -X DELETE localhost:9200/books/book/1

Get

» curl -X GET localhost:9200/books/book/1
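Every call in this tour is plain JSON over HTTP, so it can be issued from any language. As an illustration, the sketch below uses the Python standard library's `urllib.request` to build the same create, get and delete requests without sending them. The base URL `http://localhost:9200` mirrors the curl examples and assumes a local node. The `/books/book/1` path contains a mapping type (`book`), which later Elasticsearch versions removed in favour of `_doc`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node, as in the curl examples


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request against the REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


book = {
    "title": "Elasticsearch - The definitive guide",
    "authors": ["Clinton Gormley", "Zachary Tong"],
    "started": "2013-02-04",
    "pages": 230,
}

create_req = es_request("PUT", "/books/book/1", book)
get_req = es_request("GET", "/books/book/1")
delete_req = es_request("DELETE", "/books/book/1")

if __name__ == "__main__":
    # Actually sending a request needs a running node:
    # with urllib.request.urlopen(get_req) as response:
    #     print(json.load(response))
    for req in (create_req, get_req, delete_req):
        print(req.get_method(), req.full_url)
```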

48
Search

» curl -X GET 'localhost:9200/books/_search?q=elasticsearch'

{
  "took" : 2, "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1, "max_score" : 0.076713204,
    "hits" : [ {
      "_index" : "books", "_type" : "book", "_id" : "1",
      "_score" : 0.076713204, "_source" : {
        "title" : "Elasticsearch - The definitive guide",
        "authors" : [ "Clinton Gormley", "Zachary Tong" ],
        "started" : "2013-02-04", "pages" : 230
      }
    } ]
  }
}
49
Search Query DSL
» curl -XGET 'localhost:9200/books/book/_search' -d '{
  "query": {
    "filtered" : {
      "query" : {
        "match": {
          "text" : {
            "query" : "To Be Or Not To Be",
            "cutoff_frequency" : 0.01
          }
        }
      },
      "filter" : {
        "range": {
          "price": {
            "gte": 20.0,
            "lte": 50.0
          }
        }
      }
    }
  }
}'
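The `filtered` query combines two parts. The `query` part is full-text and contributes to the relevance score. The `filter` part only includes or excludes documents (here, by price range) without affecting scoring. The sketch below builds that body for arbitrary text and price bounds, alongside the equivalent `bool` form used from Elasticsearch 2.x onward, where `filtered` was deprecated. The function names are hypothetical.

```python
import json


def filtered_search_body(text, min_price, max_price, cutoff_frequency=0.01):
    """1.x `filtered` query: scored full-text match + unscored price range filter."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "match": {"text": {"query": text, "cutoff_frequency": cutoff_frequency}}
                },
                "filter": {"range": {"price": {"gte": min_price, "lte": max_price}}},
            }
        }
    }


def bool_search_body(text, min_price, max_price, cutoff_frequency=0.01):
    """Same intent in the `bool` form: `must` is scored, `filter` is not."""
    return {
        "query": {
            "bool": {
                "must": {
                    "match": {"text": {"query": text, "cutoff_frequency": cutoff_frequency}}
                },
                "filter": {"range": {"price": {"gte": min_price, "lte": max_price}}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(filtered_search_body("To Be Or Not To Be", 20.0, 50.0), indent=2))
```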

50
Use case: Product Search Engine

51
Product Search Engine

Just index all your products and be happy?


Search is not that easy

Synonyms, Suggestions, Faceting, De-compounding, Custom scoring, Analytics, Price agents, Query optimisation, beyond search

52
Neutrality? Really?
Is full-text search relevancy really your
preferred scoring algorithm?

Possible influential factors

Age of the product, ordered in the last 24h
In stock?
Special offer
Commission
No shipping costs
Rating (product, seller)
Returns
…
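One way to stop relying on pure text relevance is to multiply the full-text score by boosts derived from business signals like those listed above. Elasticsearch offers the `function_score` query for this. The Python sketch below only models the arithmetic and does not use that query's syntax. The signal names, weights and the rating formula are hypothetical choices a shop would tune itself.

```python
def business_score(relevance, product, weights):
    """Combine text relevance with business signals (hypothetical weighting scheme).

    relevance: the full-text score for this product
    weights:   multiplicative boosts per signal, chosen by the shop
    """
    score = relevance
    if product.get("in_stock"):
        score *= weights["in_stock"]
    if product.get("special_offer"):
        score *= weights["special_offer"]
    if product.get("free_shipping"):
        score *= weights["free_shipping"]
    # A rating on a 0-5 scale raises the score by up to +50% here.
    score *= 1 + 0.1 * product.get("rating", 0)
    return score


def rank(results, weights):
    """results: list of (relevance, product) pairs; highest business score first."""
    return sorted(results, key=lambda r: business_score(r[0], r[1], weights), reverse=True)


if __name__ == "__main__":
    weights = {"in_stock": 2.0, "special_offer": 1.5, "free_shipping": 1.2}
    results = [
        (2.0, {"name": "lamp-a", "in_stock": False, "rating": 3}),
        (1.5, {"name": "lamp-b", "in_stock": True, "free_shipping": True, "rating": 4}),
    ]
    print([product["name"] for _, product in rank(results, weights)])
```

Here the less relevant lamp-b (1.5 × 2.0 × 1.2 × 1.4 ≈ 5.04) outranks lamp-a (2.0 × 1.3 = 2.6) because it is in stock and ships for free.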
53
Neutrality? Really?

54
Neutrality? Really?

55
ecosystem

56
Ecosystem

• Plugins
• Clients for many languages
• Kibana
• Logstash
• Hadoop integration
• Marvel

58
spoiler alert!

59
what is data?

60
Whatever provides value for your business.

61
Internal: orders, products (domain data); log files, metrics (application data)

External: social media streams, email

62
63
Logstash
• Managing events and logs

• Collect data

• Parse data

• Enrich data

• Store data (search and visualisation)

64
Why collect and centralise data?

• Access log files without system access

• Shell scripting: Too limited or slow

• Use unique IDs for errors and aggregate them across your stack

• Reporting (everyone can create their own report)

• Bonus points: Unify your data to make it easily searchable

65
Unify dates
• apache [19/Feb/2015:19:00:00 +0000]

• unix timestamp 1424372400

• log4j [2015-02-19 19:00:00,000]

• postfix.log Feb 19 19:00:00

• ISO 8601 2015-02-19T19:00:00+02:00
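Unifying dates means parsing each source format and writing every timestamp out in one canonical form, such as ISO 8601 in UTC. The sketch below does this with Python's `datetime` for the five formats listed. It makes two assumptions: the log4j and postfix lines carry no time zone, so they are treated as UTC, and postfix lines carry no year, so a caller-supplied year (2015 by default) is used.

```python
from datetime import datetime, timezone


def to_utc_iso(raw, kind, assumed_year=2015):
    """Normalise one timestamp to ISO 8601 in UTC.

    Assumptions: log4j and postfix lines have no zone (treated as UTC);
    postfix lines have no year (assumed_year is used).
    """
    if kind == "apache":       # 19/Feb/2015:19:00:00 +0000
        dt = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")
    elif kind == "unix":       # 1424372400
        dt = datetime.fromtimestamp(int(raw), tz=timezone.utc)
    elif kind == "log4j":      # 2015-02-19 19:00:00,000
        dt = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S,%f").replace(tzinfo=timezone.utc)
    elif kind == "postfix":    # Feb 19 19:00:00
        dt = datetime.strptime(f"{assumed_year} {raw}", "%Y %b %d %H:%M:%S")
        dt = dt.replace(tzinfo=timezone.utc)
    elif kind == "iso8601":    # 2015-02-19T19:00:00+02:00
        dt = datetime.fromisoformat(raw)
    else:
        raise ValueError(f"unknown format: {kind}")
    return dt.astimezone(timezone.utc).isoformat()


if __name__ == "__main__":
    samples = [
        ("19/Feb/2015:19:00:00 +0000", "apache"),
        ("1424372400", "unix"),
        ("2015-02-19 19:00:00,000", "log4j"),
        ("Feb 19 19:00:00", "postfix"),
        ("2015-02-19T19:00:00+02:00", "iso8601"),
    ]
    for raw, kind in samples:
        print(f"{kind:8} {to_utc_iso(raw, kind)}")
```

The first four samples all normalise to `2015-02-19T19:00:00+00:00`. Because of its `+02:00` offset, the ISO 8601 sample becomes `2015-02-19T17:00:00+00:00`, which is why keeping the zone information matters.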

66
Logstash

• Managing events and logs

• Input: collect data

• Filter: parse data, enrich data

• Output: store data (search and visualise)
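The input → filter → output structure can be sketched in a few Python functions. This is not Logstash, which is configured through its own pipeline configuration files rather than written in Python. The sketch only mirrors the three stages: events are collected, parsed with a simplified Apache access-log regular expression (standing in for a Logstash grok pattern), enriched with a derived field, and stored, here in a plain list. The `_parsefailure` tag and the `is_error` field are hypothetical names.

```python
import re

# Simplified Apache access-log pattern (stand-in for a Logstash grok pattern).
ACCESS_LINE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def collect(lines):
    """Input stage: yield raw events (here, from an in-memory list)."""
    for line in lines:
        yield {"message": line}


def parse(event):
    """Filter stage, part 1: split the raw message into named fields."""
    match = ACCESS_LINE.match(event["message"])
    if match:
        event.update(match.groupdict())
    else:
        event["tags"] = ["_parsefailure"]  # hypothetical failure tag
    return event


def enrich(event):
    """Filter stage, part 2: derive extra fields from the parsed ones."""
    if "status" in event:
        event["status"] = int(event["status"])
        event["is_error"] = event["status"] >= 500
    return event


def store(events):
    """Output stage: collect into a list; Logstash would ship events to Elasticsearch."""
    return list(events)


def pipeline(lines):
    return store(enrich(parse(event)) for event in collect(lines))


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - - [19/Feb/2015:19:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
        "not an access log line",
    ]
    for event in pipeline(sample):
        print(event)
```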
67
kibana

68
Kibana

69
Kibana

70
Kibana

71
Kibana

72
Thank You!

73
Feedback

☺ ! ☹
Sponsors of XXVIII DevClub.lv
