Use Cases for Elasticsearch
Florian Hopf
@fhopf
http://www.florian-hopf.de 15.07.2014
Agenda
Preparation
curl -XGET http://localhost:9200
{
"status" : 200,"name" : "Hawkeye",
"version" : {
"number" : "1.2.1",
"build_hash" : "6c95b759f9e7ef0f8e17f77d850da43ce8a4b364",
"build_timestamp" : "2014-06-03T15:02:52Z",
"build_snapshot" : false,
"lucene_version" : "4.8"
},
"tagline" : "You Know, for Search"
}
Installation
# download archive
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.zip
# zip is for windows and linux
unzip elasticsearch-1.2.1.zip
# on windows: elasticsearch.bat
elasticsearch-1.2.1/bin/elasticsearch
Access
curl -XGET http://localhost:9200
{
"status" : 200,"name" : "Hawkeye",
"version" : {
"number" : "1.2.1",
"build_hash" : "6c95b759f9e7ef0f8e17f77d850da43ce8a4b364",
"build_timestamp" : "2014-06-03T15:02:52Z",
"build_snapshot" : false,
"lucene_version" : "4.8"
},
"tagline" : "You Know, for Search"
}
Document Store
Document
{
"title" : "Anwendungsfälle für Elasticsearch",
"speaker" : "Florian Hopf",
"date" : "2014-07-15T16:30:00.000Z",
"tags" : ["Java", "Lucene"],
"conference" : {
"name" : "Developer Week",
"city" : "Nürnberg"
}
}
Storing
curl -XPOST http://localhost:9200/conferences/talk/ \
--data-binary @talk-example.json
{
"_index":"conferences",
"_type":"talk",
"_id":"GqjY7l8sTxa3jLaFx67_aw",
"_version":1,
"created":true
}
In the URL above, conferences is the index and talk is the type.
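The same store operation can be sketched from Python. This is only a minimal illustration using the standard library: the helper name build_index_request is hypothetical, and the request is built but not sent, so it runs without a node on localhost:9200.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node, as in the curl examples


def build_index_request(index, doc_type, document, base_url=BASE_URL):
    """Build (but do not send) a POST that stores a document under index/type."""
    body = json.dumps(document).encode("utf-8")
    url = "{}/{}/{}/".format(base_url, index, doc_type)
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


talk = {
    "title": "Anwendungsfälle für Elasticsearch",
    "speaker": "Florian Hopf",
    "date": "2014-07-15T16:30:00.000Z",
    "tags": ["Java", "Lucene"],
    "conference": {"name": "Developer Week", "city": "Nürnberg"},
}

request = build_index_request("conferences", "talk", talk)
# Sending requires a running node: urllib.request.urlopen(request)
```

Because no id is given in the URL, Elasticsearch generates one, as seen in the "_id" field of the response above.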
Reading
curl -XGET "http://localhost:9200/conferences/talk/GqjY7l8sTxa3jLaFx67_aw?pretty=true"
{
"_index" : "conferences",
[...]
"_source":{
"title" : "Anwendungsfälle für Elasticsearch",
"speaker" : "Florian Hopf",
"date" : "2014-07-15T16:30:00.000Z",
"tags" : ["Java", "Lucene"],
"conference" : {
"name" : "Developer Week",
"city" : "Nürnberg"
}
}
}
Sharding
● Splitting an index into several parts
– Default: 5 shards per Elasticsearch index
● Multiple Elasticsearch instances can form a cluster
– Automatic distribution across the nodes in the cluster
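The idea behind sharding can be sketched in a few lines of Python. This is a simplified illustration, not Elasticsearch's implementation: zlib.crc32 stands in for the internal hash function, and the round-robin placement and node names are assumptions made for the example.

```python
import zlib

NUM_SHARDS = 5  # the default number of shards mentioned above


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard by hashing the document id (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def assign_shards(nodes, num_shards=NUM_SHARDS):
    """Spread shards over nodes round-robin (a simplification of real allocation)."""
    placement = {node: [] for node in nodes}
    for shard in range(num_shards):
        placement[nodes[shard % len(nodes)]].append(shard)
    return placement


shard = shard_for("GqjY7l8sTxa3jLaFx67_aw")
placement = assign_shards(["node-1", "node-2"])  # hypothetical node names
```

The same id always maps to the same shard, which is why a document can be found again without asking every shard.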
Recap
● Simple storage of JSON documents
● Index and type
● Sharding for large data volumes
● Distribution is a first-class citizen
Users
● HipChat
– http://highscalability.com/blog/2014/1/6/how-hipchat-stores-and-indexes-billions-of-messages-using-el.html
● Engagor
– http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
– http://www.elasticsearch.org/case-study/engagor/
Full-Text Search
Search via URL Parameter
curl -XGET "http://localhost:9200/conferences/talk/_search?q=elasticsearch&pretty=true"
{"took" : 73,
[…]
"hits" : {
[…]
"hits" : [ {
[…]
"_score" : 0.076713204,
"_source":{
"title" : "Anwendungsfälle für Elasticsearch",
"tags" : ["Java", "Lucene"],
[…]
} } ]
}
}
Query DSL
curl -XPOST "http://localhost:9200/conferences/_search" -d'
{
"query": {
"match": {
"title" : {
"query": "elasticsaerch",
"fuzziness": 2
}
}
},
"filter": {
"term": {
"conference.city": "nürnberg"
}
}
}'
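The query above deliberately misspells the search term and relies on "fuzziness": 2 to still match. What an edit distance of 2 means can be sketched with a plain Levenshtein implementation. This is an illustration only: Lucene's fuzzy matching works differently internally and can also count a transposition as a single edit, which this sketch does not model.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term, indexed_term, fuzziness=2):
    """True if the two terms are at most `fuzziness` edits apart."""
    return levenshtein(term, indexed_term) <= fuzziness


distance = levenshtein("elasticsaerch", "elasticsearch")
matches = fuzzy_match("elasticsaerch", "elasticsearch", fuzziness=2)
```

Swapping "ea" for "ae" costs two substitutions here, so the misspelled term stays within the allowed fuzziness.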
Language
curl -XGET "http://localhost:9200/conferences/talk/_search?q=title:anwendungsfall&pretty=true"
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Analyzing
Document 1: Anwendungsfälle für Elasticsearch
Document 2: Verteiltes Suchen mit Elasticsearch
1. Tokenization
2. Lowercasing
3. Stemming
Term            Document Id
anwendungsfall  1
elasticsearch   1,2
fur             1
mit             2
such            2
verteilt        2
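The three analysis steps and the resulting inverted index can be sketched as follows. The stemmer is a toy that only folds umlauts and strips a few suffixes, so that it reproduces the terms shown for these two titles. It is an assumption made for illustration and not the algorithm behind the german analyzer.

```python
import re
from collections import defaultdict

UMLAUTS = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
SUFFIXES = ("es", "en", "e")  # toy suffix list, far from a real German stemmer


def analyze(text):
    """Tokenize, lowercase and (very crudely) stem a text."""
    terms = []
    for token in re.findall(r"\w+", text):        # 1. tokenization
        term = token.lower().translate(UMLAUTS)   # 2. lowercasing (plus umlaut folding)
        for suffix in SUFFIXES:                   # 3. toy stemming
            if term.endswith(suffix) and len(term) - len(suffix) >= 4:
                term = term[: -len(suffix)]
                break
        terms.append(term)
    return terms


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


docs = {
    1: "Anwendungsfälle für Elasticsearch",
    2: "Verteiltes Suchen mit Elasticsearch",
}
index = build_index(docs)
hits = index[analyze("anwendungsfall")[0]]
```

A query term only matches if it goes through the same analysis as the indexed text, which is why the search for anwendungsfall finds nothing until the title field is mapped with a German analyzer.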
Mapping
curl -XDELETE "http://localhost:9200/conferences/"
curl -XPUT "http://localhost:9200/conferences/"
curl -XPUT "http://localhost:9200/conferences/talk/_mapping" -d'
{
"properties": {
"tags": {
"type": "string",
"index": "not_analyzed"
},
"title": {
"type": "string",
"analyzer": "german"
}
}
}'
Language
curl -XGET "http://localhost:9200/conferences/talk/_search?q=title:anwendungsfall&pretty=true"
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
[…]
}
}
What else?
● Faceting/Aggregations
● Suggestions
● Highlighting
● Sorting
● Pagination
● ...
Recap
● Expressive searches via the Query DSL
● Analysis as a core feature
● All Lucene goodies available
Users
● GitHub
– http://exploringelasticsearch.com/github_interview.html
– http://www.elasticsearch.org/case-study/github/
● Stack Overflow
– http://meta.stackexchange.com/questions/160100/a-new-search-engine-for-stack-exchange
– http://nickcraver.com/blog/2013/11/22/what-it-takes-to-run-stack-overflow/
● SoundCloud
– http://developers.soundcloud.com/blog/architecture-behind-our-new-search-and-explore-experience
– http://www.elasticsearch.org/case-study/soundcloud/
● XING
– http://www.elasticsearch.org/case-study/xing/
Flexible Cache
Search Setup
[Diagram: application, DB]
Search Only?
[Diagram: application, DB, queries]
Listing
curl -XPOST "http://localhost:9200/conferences/_search" -d'
{
"filter": {
"term": {
"conference.city": "nürnberg"
}
}
}'
Geo Search
Structured Search
● Not just full text
– Structured data: geo and numeric data, dates
● geo_point as a data type
● Sorting
● Filtering
Applications
● Show the nearest store
● Store locator
● Sorting classified ads
● Sorting locations
● Filtering by proximity
● Social media analysis
Document
{
"title" : "Anwendungsfälle für Elasticsearch",
"speaker" : "Florian Hopf",
"date" : "2014-07-15T16:30:00.000Z",
"tags" : ["Java", "Lucene"],
"conference" : {
"name" : "Developer Week",
"city" : "Nürnberg",
"coordinates": {
"lon": "11.115358",
"lat": "49.417175"
}
}
}
Mapping
curl -XPUT "http://localhost:9200/conferences/talk/_mapping" -d'
{
"properties": {
[…],
"conference": {
"type": "object",
"properties": {
"coordinates": {
"type": "geo_point"
}
}
}
}
}'
Sorting
curl -XPOST "http://localhost:9200/conferences/_search" -d'
{
"sort" : [
{
"_geo_distance" : {
"conference.coordinates" : {
"lon": 8.403697,
"lat": 49.006616
},
"order" : "asc",
"unit" : "km"
}
}
]
}'
Filtering
curl -XPOST "http://localhost:9200/conferences/_search" -d'
{
"filter": {
"geo_distance": {
"conference.coordinates": {
"lon": 8.403697,
"lat": 49.006616
},
"distance": "200km",
"distance_type": "arc"
}
}
}'
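Sorting and filtering by distance both come down to computing the distance between two points on the globe. A minimal sketch using the haversine formula is shown below. The Earth radius of 6371 km and the function names are assumptions, and Elasticsearch's own "arc" calculation may differ in detail.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(distance_km, limit_km):
    """Distance filter: keep only points no farther away than limit_km."""
    return distance_km <= limit_km


origin = (49.006616, 8.403697)   # lat, lon used in the sort and filter requests
talk = (49.417175, 11.115358)    # lat, lon stored in the example document
distance = haversine_km(origin[0], origin[1], talk[0], talk[1])
```

Sorting by distance then amounts to ordering documents by this value, and a distance filter keeps only those for which within(distance, limit) holds.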
Recap
● Elasticsearch can do more than full text
● Sophisticated geo algorithms
● Sorting by distance
● Filtering by distance or area
● Distance calculation
Users
● FourSquare
– http://engineering.foursquare.com/2012/08/09/foursquare-now-uses-elastic-search-and-on-a-related-note-slashem-also-works-with-elastic-search/
● Gild
– http://www.elasticsearch.org/case-study/gild/
Log File Analysis
● Centralizing logs from applications
● Centralizing logs across machines
– Even without access to the machines
● Easy searchability
● Real-time analysis / visualization
● Data for everyone!
Log File Analysis
● Ingestion
– Logstash
● Storage
– Elasticsearch
● Analysis
– Kibana
Logstash Config
input {
file {
path => "/var/log/apache2/access.log"
}
}
filter {
grok {
match => { message => "%{COMBINEDAPACHELOG}" }
}
}
output {
elasticsearch_http {
host => "localhost"
}
}
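What the grok filter does to each log line can be approximated with a regular expression. The pattern below is a simplified stand-in for %{COMBINEDAPACHELOG}, not the real grok definition. The field names are chosen to resemble those grok emits, and the sample log line is made up for the example.

```python
import re

# Simplified pattern for the Apache combined log format (an approximation).
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[^"]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Turn one access-log line into a dict of fields, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Hypothetical log line, for illustration only.
SAMPLE = ('127.0.0.1 - - [15/Jul/2014:16:30:00 +0200] '
          '"GET /index.html HTTP/1.1" 200 1234 "-" "curl/7.35.0"')

event = parse_line(SAMPLE)
```

The resulting dict corresponds to the structured event that is then stored in Elasticsearch, where each field can be searched and aggregated on its own.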
Kibana
Recap
● Reading, enriching, and storing log events
● Numerous inputs in Logstash
● Consolidation
● Centralization
● Analysis
Users
● Mailgun
– http://www.elasticsearch.org/blog/using-elasticsearch-and-logstash-to-serve-billions-of-searchable-events-for-customers/
● CERN
– https://medium.com/@ghoranyi/needle-in-a-haystack-873c97a99983
● Bloomberg
– http://www.elasticsearch.org/videos/using-elasticsearch-logstash-kibana-techologies-centralized-viewing-logs-bloomberg/
Analytics
● Aggregations on fields
● Analysis of even large data volumes
– Social media
– Data warehouse
● Consolidating data from different sources
● Visualization
Aggregations
curl -XGET "http://localhost:9200/devoxx/tweet/_search" -d'
{
"aggs" : {
"hashtags" : {
"terms" : {
"field" : "hashtag.text"
}
}
}
}'
Aggregations
"aggregations": {
"hashtags": {
"buckets": [
{
"key": "dartlang",
"doc_count": 229
},
{
"key": "java",
"doc_count": 216
},
[...]
Aggregations
curl -XGET "http://localhost:9200/devoxx/tweet/_search" -d'
{
"aggs" : {
"hashtags" : {
"terms" : {
"field" : "hashtag.text"
},
"aggs" : {
"hashtagusers" : {
"terms" : {
"field" : "user.screen_name"
}
}
}
}
}
}'
Aggregations
"key": "scala",
"doc_count": 130,
"hashtagusers": {
"buckets": [
{
"key": "jaceklaskowski",
"doc_count": 74
},
{
"key": "ManningBooks",
"doc_count": 3
},
[...]
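Conceptually, a terms aggregation counts documents per field value, and a nested aggregation repeats that inside each bucket. The sketch below mimics this in plain Python. The flat tweet structure (one hashtag per tweet), the user names and the function name are assumptions for illustration, not the data or API behind the slides.

```python
from collections import Counter


def terms_agg(docs, field, sub_field=None, size=10):
    """Group docs by `field`, largest bucket first; optionally nest a second
    terms aggregation on `sub_field` inside every bucket."""
    counts = Counter(doc[field] for doc in docs)
    buckets = []
    for key, doc_count in counts.most_common(size):
        bucket = {"key": key, "doc_count": doc_count}
        if sub_field is not None:
            members = [doc for doc in docs if doc[field] == key]
            bucket[sub_field] = {"buckets": terms_agg(members, sub_field, size=size)}
        buckets.append(bucket)
    return buckets


# Hypothetical tweets, simplified to one hashtag per tweet.
tweets = [
    {"hashtag": "scala", "user": "alice"},
    {"hashtag": "scala", "user": "alice"},
    {"hashtag": "scala", "user": "bob"},
    {"hashtag": "java", "user": "carol"},
]

result = terms_agg(tweets, "hashtag", sub_field="user")
```

The output has the same shape as the response fragments above: a list of buckets with key and doc_count, each carrying its own nested buckets.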
Aggregations
● Bucket Aggregations
– terms
– (date_)histogram
– range
– significant_terms
– ...
● Metrics Aggregations
– min, max, sum, avg
– stats
– percentiles
– value_count
– ...
Aggregations
Tweets
Recap
● Analysis of large data volumes
● Visualization
● Numerous aggregations
– Calculations: max, min, mean
– Terms, SignificantTerms
Users
● Engagor
● The Guardian
– http://www.elasticsearch.org/blog/using-elasticsearch-and-logstash-to-serve-billions-of-searchable-events-for-customers/
– http://www.infoq.com/presentations/elasticsearch-guardian
● Cogenta
– http://www.elasticsearch.org/case-study/cogenta/
Agenda
@fhopf
mail@florian-hopf.de
http://blog.florian-hopf.de
Thank you!
Images
● http://www.morguefile.com/archive/display/685952
● http://www.morguefile.com/archive/display/2359
● http://www.morguefile.com/archive/display/615356
● http://www.morguefile.com/archive/display/914733
● http://www.morguefile.com/archive/display/826258
● http://www.morguefile.com/archive/display/170605
● http://www.morguefile.com/archive/display/181488
