The user dictionary functionality of Solr's Kuromoji tokenizer is extremely useful and easy to use, but it isn't always the right tool for the job. In my case, I'm migrating our system off the MeCab tokenizer. MeCab also allows you to customize the tokenization, but the two models are completely different. In Kuromoji's user dictionary, you take an untokenized phrase and provide the custom tokeniz
In the latest release of Apache Solr (4.2), support for DocValue field types was added. DocValues have been in the works in Lucene for a while (~5 years), also under the name “column stride fields”, but have only recently become stable enough to incorporate into Solr. For those of you not familiar with the Lucene lingo, DocValues are column-oriented fields. In other words, values of DocValue fields ar
Better synonym handling in Solr. Posted October 31, 2012 by Nolan Lawson in NLP. Tagged: information retrieval, lucene, nlp, query expansion, solr, synonyms. Update: Download the plugin on Github. It’s a pretty common scenario when working with a Solr-powered search engine: you have a list of synonyms, and you want user queries to match documents with synonymous terms. Sounds eas
This reference guide describes Apache Solr 4, an open source solution for search. You can download Apache Solr from the Solr website at http://lucene.apache.org/solr/. This Guide contains the following sections: Getting Started: This section guides you through the installation and setup of Solr. Using the Solr Administration User Interface: This section introduces the Solr Web-based user interface
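I attended the 9th Solr Study Group (第9回Solr勉強会). The main topics were Japanese language analysis and running Solr in production web services. The session really drove home how solidly Solr is being used in real services. Summaries and reports (as quick as ever): 2012/11/26 (#solrjp) 9th Solr Study Group (togetter summary); "I hosted the 9th Solr Study Group. #SolrJP" (@johtani's diary). Who we are, what we do, and a little bit about Kuromoji, by Christian Moen of Atilika Inc. Atilika's core business rests on three pillars: search engine, big data analysis, and NLP. They appear to develop products and also do consulting. Something they call customer-driven innovation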
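We have developed a system that automatically builds a synonym dictionary for Lucene/Solr from "dictionary-type corpora" such as the Japanese Wikipedia, so here is a brief introduction. Reference material (SlideShare): Automatic Acquisition of Synonym Knowledge from Dictionary-Type Corpora (SlideShare). Lucene/Solr and synonym search: In Lucene/Solr, synonym search is easy to implement with SynonymFilter. For example, prepare a synonyms.txt with the following content: 自動車損害賠償責任保険, 自賠責保険 and then define a field type like the following in Solr's schema.xml file: <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.Japane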
“Solr or Elasticsearch?”…well, at least that is the common question we hear from Sematext’s consulting services clients and prospects. Which one is better, Solr or Elasticsearch? Which one is faster? Which one scales better? Which one is easier to manage? Which one should we use? Is there any advantage to migrating from Solr to Elasticsearch? – and the list goes on. These are all great questions,