Compass Reference
2.2.0 GA
Copies of this document may be made for your own use and for distribution to others, provided that you do not
charge any fee for such copies and further provided that each copy contains this Copyright Notice, whether
distributed in print or electronically.
Table of Contents
Preface
1. Introduction
    1.1. Overview
    1.2. I use ...
        1.2.1. ... Lucene
        1.2.2. ... Domain Model
        1.2.3. ... Xml Model
        1.2.4. ... No Model
        1.2.5. ... ORM Framework
        1.2.6. ... Spring Framework
I. Compass Core
    2. Introduction
        2.1. Overview
        2.2. Session Types
        2.3. Session Lifecycle
        2.4. Template and Callback
    3. Configuration
        3.1. Programmatic Configuration
        3.2. XML/JSON Configuration
            3.2.1. Schema Based Configuration
            3.2.2. JSON Based Configuration
            3.2.3. DTD Based Configuration
        3.3. Obtaining a Compass reference
        3.4. Rebuilding Compass
        3.5. Configuring Callback Events
    4. Connection
        4.1. File System Store
        4.2. RAM Store
        4.3. Jdbc Store
            4.3.1. Managed Environment
            4.3.2. Data Source Provider
            4.3.3. File Entry Handler
            4.3.4. DDL
        4.4. Lock Factory
        4.5. Local Directory Cache
        4.6. Lucene Directory Wrapper
            4.6.1. SyncMemoryMirrorDirectoryWrapperProvider
            4.6.2. AsyncMemoryMirrorDirectoryWrapperProvider
    5. Search Engine
        5.1. Introduction
        5.2. Alias, Resource and Property
            5.2.1. Using Resource/Property
        5.3. Analyzers
            5.3.1. Configuring Analyzers
            5.3.2. Analyzer Filter
            5.3.3. Handling Synonyms
        5.4. Similarity
        5.5. Query Parser
            6.6.4. id
            6.6.5. property
            6.6.6. analyzer
            6.6.7. boost
            6.6.8. meta-data
            6.6.9. dynamic-meta-data
            6.6.10. component
            6.6.11. reference
            6.6.12. parent
            6.6.13. constant
    7. XSEM - Xml to Search Engine Mapping
        7.1. Introduction
        7.2. Xml Object
        7.3. Xml Content Handling
        7.4. Raw Xml Object
        7.5. Mapping Definition
            7.5.1. Converters
            7.5.2. xml-object
            7.5.3. xml-id
            7.5.4. xml-property
            7.5.5. xml-analyzer
            7.5.6. xml-boost
            7.5.7. xml-content
    8. JSEM - JSON to Search Engine Mapping
        8.1. Introduction
        8.2. JSON API Abstraction
        8.3. Content Mapping
        8.4. Raw Json Object
        8.5. Mapping Definitions
            8.5.1. root-json-object
            8.5.2. json-id
            8.5.3. json-property
            8.5.4. json-object
            8.5.5. json-array
            8.5.6. json-content
            8.5.7. json-boost
            8.5.8. json-analyzer
    9. RSEM - Resource/Search Engine Mapping
        9.1. Introduction
        9.2. Mapping Declaration
            9.2.1. resource
            9.2.2. resource-contract
            9.2.3. resource-id
            9.2.4. resource-property
            9.2.5. resource-analyzer
            9.2.6. resource-boost
    10. Common Meta Data
        10.1. Introduction
        10.2. Common Meta Data Definition
        10.3. Using the Definition
        10.4. Common Meta Data Ant Task
    11. Transaction
Compass is a powerful, transactional Java Search Engine framework. Compass allows you to declaratively map
your Object domain model to the underlying Search Engine and to synchronize data changes between the search
engine index and different datasources. Compass provides a high level abstraction on top of the low level
Lucene API, implements fast index operations and optimization, and introduces transaction capabilities to the
Search Engine. In short, it aims to be the simplest solution for enabling search capabilities within your
application stack.
As you will see, Compass is geared towards integrating Search Engine functionality into any type of
application (web app, rich client, middle-ware, ...). We ask you, the user, to give feedback on how complex it
was to integrate Compass into your application, and on places where Compass can be enhanced to make things
even simpler.
This document provides a reference guide to Compass's features. Since this document is still very much a
work in progress, if you have any requests or comments, please post them on the Compass forum, or in
Compass JIRA (issue tracking).
Before we continue, the Compass team would like to thank the Hibernate and Spring Framework teams for
providing the template DocBook definition and configuration, which helped us create this reference guide.
Shay Banon (kimchy), the creator of Compass, decided to write a simple Java based recipe management
application for his wife (who happens to be a chef). The main requirement for the tool, since it was going to
hold substantial cooking knowledge, was the ability to get to the relevant information fast. Using the Spring
Framework, Hibernate, and all the other tools out there that make a developer's life simple, he was surprised
to find nothing similar in the search engine department. Don't get him wrong, Lucene is an amazing search
engine (or IR library), but developers want simplicity, and Lucene comes with an added complexity that caused
Shay to start Compass.
In today's applications, search is becoming a "must have" requirement. Users expect applications (rich clients,
web based, server side, ...) to provide snappy and relevant search results the same way Google does for the web.
Whether it is a recipe management application, a trading application, or a content management driven web site,
users expect search results across the whole application business domain model.
Java developers, on the other hand, need to implement it. While Java developers have become used to a
simplified development model, with Hibernate, Spring Framework, and EJB3 to name a few, until now there has
been no simple to use Java Search Engine solution. Compass aims to fill this gap.
Many applications, once they start using a search engine to implement that elusive search box, find that
the search engine can then be used for many data extraction related operations. Once a search engine holds a
valid representation of the application business model, it often makes sense to execute simple queries
against it instead of going to the actual data store (usually a database). Two prime examples are Jira and
Confluence, which perform many of their reporting and search (naturally) operations using a search engine
instead of the usual database operations.
1.1. Overview
Compass provides a breadth of features geared towards integrating search engine functionality. The next
diagram shows the different Compass modules, followed by a short description of each one.
Overview of Compass
Compass Core is the most fundamental part of Compass. It holds Lucene extensions for a transactional index,
the search engine abstraction, an ORM like API, transaction management integration, different mapping
technologies (OSEM, XSEM and RSEM), and more. The aim of Compass Core is to be usable within different
scenarios and environments, and to simplify the core operations done with a search engine.
Compass Gps aims to integrate with different content sources. The prime feature is the integration with
different ORM frameworks (Hibernate, JPA, JDO, OJB), allowing for almost transparent integration between a
search engine and an ORM view of content that resides in a database. Other features include a Jdbc integration,
which allows indexing database content using configurable SQL expressions responsible for extracting the
content.
Compass Spring integrates Compass with the Spring Framework. Spring, being an easy to use application
framework, provides a simpler development model (based on dependency injection and much more). Compass
integrates with Spring in the same manner ORM framework integration is done within the Spring Framework
code-base. It also integrates with Spring's transaction abstraction layer, AOP support, and MVC library.
Direct Lucene
Compass tries to be a good Lucene citizen, allowing most Lucene classes to be used directly within Compass. If
your application has a specialized Query, Analyzer, or Filter, you can use them directly with Compass.
Compass does have its own index structure, divided into sub indexes, but each sub index is a fully functional
Lucene index.
Compass created a search engine abstraction, with its main (and only) implementation using Lucene. Lucene is
an amazing, fast, and stable search engine (or IR library), yet the main problem with integrating Lucene into
an application is its low level usage and API.
For people who use or know Lucene, it is important to explain the new terms that are introduced by Compass.
Resource is the Compass abstraction on top of a Lucene Document, and Property is the Compass abstraction on
top of a Lucene Field. Neither adds much on top of the actual Lucene implementation, except for Resource,
which is associated with an Alias. For more information, please read Chapter 5, Search Engine.
Resource is the lowest level data object used in Compass, with all the different mapping technologies geared
towards generating it. Compass comes with a low level mapping technology called RSEM (Resource/Search
Engine Mapping), which allows resource mapping definitions to be defined declaratively. RSEM can be used
when an existing system already uses Lucene (the upgrade to Compass should be minimal), or when an
application does not have a rich domain model (Object or XML).
An additional feature built on top of the Compass converter framework is that a Property value does not have to
be a String (as with a Lucene Field). Objects can be used as values, with specific or default converters applied
to them. For more information, please read Chapter 9, RSEM - Resource/Search Engine Mapping.
Simple API
Compass exposes a very simple API. If you have experience with an ORM tool (Hibernate, JPA, ...), you
should feel very comfortable with the Compass API. Lucene, on the other hand, has three main classes:
IndexReader, Searcher and IndexWriter. It is difficult, especially for developers unfamiliar with Lucene, to
understand how to perform operations against the index (while still having a performant system). Compass has
a single interface, with all operations available through it. Compass also abstracts the user away from the gory
details of opening and closing readers/searchers/writers, as well as caching and invalidating them. For more
information, please read Chapter 2, Introduction, and Chapter 12, Working with objects.
Lucene is not transactional. This causes problems when trying to integrate Lucene with other transactional
resources (like a database or messaging). Compass provides support for two phase commit transactions
(read_committed and serializable), implemented on top of Lucene index segmentation. The implementation
provides fast commits (faster than Lucene), though it does require the concept of Optimizers to keep the
index optimized. For more information, please read Section 5.7, “Transaction”, and Section 5.10, “Optimizers”.
On top of providing support for a transactional index, Compass provides integration with different transaction
managers (like JTA), and provides a local one. For more information, please read Chapter 11, Transaction.
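For example, a sketch of selecting the transaction factory through settings (the compass.transaction.factory setting name is an assumption; the factory class shown is the JTA synchronization one used later in this guide):

compass.transaction.factory=org.compass.core.transaction.JTASyncTransactionFactory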
Fast Updates
In Lucene, in order to perform an update, you must first delete the old Document and then create a new one.
This is not trivial, especially because of the use of two different interfaces to perform the delete
(IndexReader) and create (IndexWriter) operations, and it is also very delicate in terms of performance.
Compass support for a transactional index, together with the fact that each saved Resource in Compass must be
identifiable (through the use of mapping definitions), makes executing an update using Compass both simple
and fast.
All Support
When working with Lucene, there is no way to search on all the fields stored in a Document. One must
programmatically create synthetic fields that aggregate all the other fields in order to provide an "all" field, as
well as provide it when querying the index. Compass does it all for you: by default, Compass creates that "all"
field and it acts as the default search field. Of course, in the spirit of being as configurable as possible, the "all"
property can be enabled or disabled, have a different name, or not act as the default search property. One can
also exclude certain mappings from participating in the all property.
Index Fragmentation
When building a Lucene enabled application, sometimes (for performance reasons) the index actually consists
of several indexes. Compass will automatically fragment the index into several sub indexes using a
configurable sub index hashing function, allowing different searchable objects (Resource, mapped object, or
an XmlObject) to be hashed into a sub index (or several of them). For more information, please read
Section 5.6, “Index Structure”.
One of the main Compass features is OSEM (Object/Search Engine Mapping). Using either annotations or xml
definitions (or a combination of both), mapping definitions from a rich domain model into a search engine can
be defined. For more information, please read Chapter 6, OSEM - Object/Search Engine Mapping.
Another main Compass feature is XSEM (Xml/Search Engine Mapping). If your application is built around
Xml data, you can map it directly to the search engine using simple xml based mapping definitions based on
xpath expressions. For more information, please read Chapter 7, XSEM - Xml to Search Engine Mapping.
If no specific domain model is defined for the application (for example, in a messaging system based on
properties), RSEM (Resource/Search Engine Mapping) can be used. A Resource can be considered a fancy
hash map, allowing completely open data to be saved in Compass. A resource mapping definition needs to be
defined for each "type" of resource, with at least one resource id definition (a resource must be identifiable).
Additional resource property mappings can be defined, with declarative definitions of their characteristics
(search engine, converter, ...). For more information, please read Chapter 9, RSEM - Resource/Search Engine
Mapping.
Built on top of Compass Core, Compass Gps (which is aimed at integrating Compass with other datasources)
integrates with most popular ORM frameworks. The integration consists of two main features:
Index Operation
Automatically index data from the database using the ORM framework into the search engine using Compass
(and OSEM). Objects that have both OSEM and ORM definitions will be indexed, with the ability to provide
custom filters.
Mirror Operation
For ORM frameworks that support event registration (most do), Compass will automatically register its own
event listeners to reflect any changes made to the database using the ORM API into the search engine.
For more information, please read Chapter 15, Introduction. Some of the supported ORM frameworks are
covered in Chapter 17, Embedded Hibernate, Chapter 19, JPA (Java Persistence API), Chapter 20, Embedded
OpenJPA, and Chapter 23, iBatis.
The aim of the Compass::Spring module is to provide seamless integration with the Spring Framework (as if a
Spring developer wrote it :)).
The first level of integration is very similar to the Spring provided ORM integrations, with a LocalCompassBean
which allows Compass to be configured within a Spring context, and a CompassDaoSupport class. For more
information, please read Chapter 24, Introduction and Chapter 25, DAO Support.
The Spring AOP integration provides simple advices which help to mirror data changes done within a Spring
powered application. For applications with a data source or a tool that no Gps device works with (or one that
does not have mirroring capabilities - like iBatis), the mirror advices can make synchronizing changes made to
the data source and the Compass index simpler. For more information, please read Chapter 29, Spring AOP.
For web applications that use Spring MVC, Compass provides search and index controllers. The index
controller can automatically perform the index operation on a CompassGps; only the initiator view and result
view need to be written. The search controller can automatically perform the search operation (with
pagination), requiring only the search initiator and search results view (usually the same one). For more
information, please read Chapter 30, Spring MVC Support.
Lastly, the LocalCompassBean can be configured using Spring 2's schema based configuration.
The two most important chapters are Chapter 2, Introduction, which explains the high level Compass API
(similar to your usual ORM framework), and Chapter 3, Configuration, which explains how to configure a
Compass instance. Chapter 5, Search Engine dives into the details of the Compass search engine abstraction,
explaining concepts, index structure, and Lucene extensions (both Compass extensions and Compass simply
enabling Lucene features). Reading this chapter is not required in order to start using Compass, though it does
provide a good overview of how Compass abstracts Lucene, and it explains how to configure advanced search
engine related features (Analyzers, Optimizers, Sub Index Hashing, and so on).
The following chapters dive into the details of Compass's different mapping technologies. Chapter 6, OSEM -
Object/Search Engine Mapping explains how to use Compass OSEM (Object/Search Engine Mapping),
Chapter 7, XSEM - Xml to Search Engine Mapping goes into the details of how to use Compass XSEM
(Xml/Search Engine Mapping), and Chapter 9, RSEM - Resource/Search Engine Mapping dives into Compass
RSEM (Resource/Search Engine Mapping), which is a low level, Lucene like, mapping support.
Chapter 10, Common Meta Data explains Compass support for creating a semantic model defined outside of
the mapping definitions. Using it is, in the spirit of Compass, completely optional, and it is up to developers
whether they wish to use it within their Compass enabled application.
Chapter 11, Transaction goes into the details of the different ways to integrate Compass transaction support
with different transaction managers. It explains both local (Compass managed) transactions and JTA integration.
2.1. Overview
Compass API
As you will learn in this chapter, the Compass high level API looks strangely familiar. If you have used an
ORM framework (Hibernate, JDO or JPA), you should feel right at home. This is, of course, intentional. The
aim is to have the developer learn as little as possible in terms of interaction with Compass. Also, there are so
many design patterns for integrating this type of API with different application models that it would be a
shame if they were not used with Compass as well.
For Hibernate users, Compass maps to SessionFactory, CompassSession maps to Session, and
CompassTransaction maps to Transaction.
Compass is built using a layered architecture. Applications interact with the underlying Search Engine through
three main Compass interfaces: Compass, CompassSession and optionally CompassTransaction. These
interfaces hide the implementation details of the Compass Search Engine abstraction layer.
Compass provides access to search engine management functionality and to CompassSessions for managing
data within the Search Engine. It is created using CompassConfiguration (which loads configuration and
mapping files). When Compass is created, it will either join an existing index or create a new one if none is
available. After this, an application will use Compass to obtain a CompassSession in order to start managing
the data with the Search Engine. Compass is a heavyweight object, usually created at application startup and
shared within an application for CompassSession creation.
CompassSession, as the name suggests, represents a working lightweight session within Compass (it is not
thread safe). With a CompassSession, applications can save and retrieve any searchable data (declared in the
Compass mapping files) from the Search Engine. Applications work with a CompassSession at either the Object
level or the Compass Resource level to save and retrieve data. In order to work with Objects within Compass,
they must be specified using either OSEM, XSEM (with the XSEM XmlObject) or JSEM. In order to work with
Resources, they must be specified using RSEM (a Resource can still be used with OSEM/XSEM/JSEM to
display search results, since Objects/Xml/Json end up being converted to Resources). Compass will then
automatically retrieve the declared searchable data from the Object when saving Objects within Compass.
When querying the Search Engine, Compass provides a CompassHits interface which can be used to work with
the search engine results (getting scores, resources and mapped objects).
CompassConfiguration conf =
    new CompassConfiguration().configure().addClass(Author.class);
Compass compass = conf.buildCompass();
CompassSession session = compass.openSession();
try {
    ...
    session.save(author);
    CompassHits hits = session.find("jack london");
    Author a = (Author) hits.data(0);
    Resource r = hits.resource(0);
    ...
    session.commit();
} catch (CompassException ce) {
    session.rollback();
}
CompassTransaction, retrieved from the CompassSession, is used to optionally manage transactions within
Compass (if it is not used, as in the above example, they will be managed automatically). You can configure
Compass Core to use either local transactions, JTA synchronization, XA, Spring synchronization, or embedded
JPA provider synchronization (Hibernate, OpenJPA, EclipseLink, TopLink). Note, CompassTransaction is
completely optional and is used when fine grained transactional control is required.
CompassConfiguration conf =
    new CompassConfiguration().configure().addClass(Author.class);
Compass compass = conf.buildCompass();
CompassSession session = compass.openSession();
CompassTransaction tx = null;
try {
    tx = session.beginTransaction();
    ...
    session.save(author);
    CompassHits hits = session.find("jack london");
    Author a = (Author) hits.data(0);
    Resource r = hits.resource(0);
    ...
    tx.commit();
} catch (CompassException ce) {
    if (tx != null) tx.rollback();
} finally {
    session.close();
}
The Compass interface manages the creation of CompassSession instances via the openSession() method.
When beginTransaction() is called on the CompassTransaction (or when the first operation is performed on the
session, if no explicit transaction is used), the session is bound to the created transaction (JTA, Spring or
Local) and used throughout the life-cycle of the transaction. This means that if an additional session is opened
within the current transaction, the originating session will be returned by the openSession() method.
When using the openSession method, Compass will automatically try to join an already running outer
transaction. An outer transaction can be an already running local Compass transaction, a JTA transaction, or a
Spring managed transaction. If Compass manages to join an existing outer transaction, the application does not
need to call CompassSession#beginTransaction() or use CompassTransaction to manage the transaction
(since it is already managed). This simplifies the usage of Compass within managed environments (CMT or
Spring) where a transaction is already in progress, by not requiring explicit Compass code to manage a
Compass transaction.
For repetitive operations, CompassTemplate and CompassCallback remove the need to manage the session and
transaction explicitly, as the following example shows:
CompassConfiguration conf =
    new CompassConfiguration().configure().addClass(Author.class);
Compass compass = conf.buildCompass();
CompassTemplate template = new CompassTemplate(compass);
template.save(author); // opens a session and transaction, and closes both

Author a = (Author) template.execute(new CompassCallback() {
    public Object doInCompass(CompassSession session) {
        // all the actions here are within the same session
        // and transaction
        session.save(author);
        CompassHits hits = session.find("london");
        ...
        return session.load(Author.class, id);
    }
});
Throughout this manual, we will use the schema based configuration file to show examples of how to configure
certain features. This does not mean that they cannot be expressed using a settings based configuration (either
programmatic or a DTD based configuration file). For a complete list of all the different settings in Compass,
please refer to Appendix A, Configuration Settings.
Compass must be configured to work with a specific application's domain model. There is a large number of
configuration parameters available (with default settings) which control how Compass works internally and
with the underlying Search Engine. This section describes the configuration API.
In order to create a Compass instance, it must first be configured. CompassConfiguration is used to configure
a Compass instance: it can add different mapping definitions, configure Compass based on xml or JSON
configuration files, and expose programmatic configuration options.
CompassConfiguration provides several APIs for adding mapping definitions (xml mapping files suffixed
.cpm.xml, or annotated classes), as well as Common Meta Data definitions (xml mapping files suffixed
.cmd.xml). The following table summarizes the most important APIs:
API: addScan(String basePackage, String pattern)
Description: Scans recursively for all the mappings that exist within the base package. An optional ant style
pattern can be provided as well. The mappings detected are all the xml based mappings. Annotation based
mappings will be detected automatically if either ASM or Javassist exists within the classpath.
Other than the mapping file configuration API, Compass can be configured through the CompassSettings
interface. CompassSettings is similar to the Java Properties class and is accessible via the
CompassConfiguration.getSettings() or the CompassConfiguration.setSetting(String setting,
String value) methods. Compass's many different settings are explained in Appendix A, Configuration
Settings.
In terms of required settings, Compass only requires the compass.engine.connection (which maps to
CompassEnvironment.CONNECTION) parameter to be defined.
Global Converters (classes that implement the Compass Converter interface) can also be registered with the
configuration, to be used by the created Compass instances. The converters are registered under a logical name,
and can be referenced in the mapping definitions. The method to register a global converter is
registerConverter.
Again, so many words and so little code... The following code example shows a minimal
CompassConfiguration with programmatic control:
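(A minimal sketch; the index path, the Author class, and the TestConverter are placeholders, and the
registerConverter (name, converter instance) signature is an assumption.)

CompassConfiguration conf = new CompassConfiguration()
    .setSetting(CompassEnvironment.CONNECTION, "target/test-index")
    .addClass(Author.class);
conf.registerConverter("test", new TestConverter()); // optional: a global custom converter
Compass compass = conf.buildCompass();
// ... open sessions and work with the index ...
compass.close();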
An important aspect of settings (properties like) configuration is the notion of group settings. Similar to the way
the log4j properties configuration file works, different aspects of Compass need to be configured in a grouped
nature. Taking Compass converter configuration as an example, here is a set of settings that configures a
custom converter called test:
org.compass.converter.test.type=eg.TestConverter
org.compass.converter.test.param1=value1
org.compass.converter.test.param2=value2
The Compass defined prefix for all converter configurations is org.compass.converter. The segment that comes
afterwards (up until the next '.') is the converter (group) name, which here is set to test. This is the name that
the converter will be registered under (and referenced by in different mapping definitions). Within the group,
the following settings are defined: type, param1, and param2. type is one of the required settings for a custom
Compass converter, and holds the fully qualified class name of the converter implementation. param1 and
param2 are custom settings that can be used by the custom converter (it should implement
CompassConfigurable).
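To make this concrete, here is a sketch of what such a converter might look like. The marshall/unmarshall
signatures are assumptions based on the org.compass.core.converter.Converter interface; the point of the
example is the CompassConfigurable part, which receives the group's settings:

public class TestConverter implements Converter, CompassConfigurable {

    private String param1;
    private String param2;

    // CompassConfigurable: receives the settings of this converter group
    // (org.compass.converter.test.* in the example above)
    public void configure(CompassSettings settings) throws CompassException {
        param1 = settings.getSetting("param1");
        param2 = settings.getSetting("param2");
    }

    public boolean marshall(Resource resource, Object root, Mapping mapping,
                            MarshallingContext context) throws ConversionException {
        // write "root" into one or more properties of the resource here
        return false;
    }

    public Object unmarshall(Resource resource, Mapping mapping,
                             MarshallingContext context) throws ConversionException {
        // rebuild the original value from the resource here
        return null;
    }
}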
Compass uses the schema based configuration as a different view on top of its support for settings based
configuration (properties like). Compass translates all the different, more expressive, xml structures into their
equivalent settings, as described in Appendix A, Configuration Settings.
The preferred (and simplest) way to configure Compass is to use an Xml configuration file that validates
against a Schema. It allows for a richer, more descriptive, and less error prone configuration of Compass. The
schema is fully annotated, with each element and attribute documented within the schema. Note that some
additional information is explained in the Configuration Settings appendix; even where it does not apply in
terms of the name of the setting to be used, it is advisable to read the appropriate section for a fuller
explanation (such as converters, highlighters, analyzers, and so on).
Here are a few sample configuration files, just to get a feel for the structure and nature of the configuration
file. The first is a simple file based index with the OSEM definitions for the Author class.
<compass-core-config xmlns="http://www.compass-project.org/schema/core-config"
                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                     xsi:schemaLocation="http://www.compass-project.org/schema/core-config
                                         http://www.compass-project.org/schema/compass-core-config-2.2.xsd">

    <compass name="default">
        <connection>
            <file path="target/test-index"/>
        </connection>
        <mappings>
            <class name="test.Author" />
        </mappings>
    </compass>
</compass-core-config>
The next sample configures a jdbc based index, with a bigger buffer size for default file entries:
<compass-core-config xmlns="http://www.compass-project.org/schema/core-config"
                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                     xsi:schemaLocation="http://www.compass-project.org/schema/core-config
                                         http://www.compass-project.org/schema/compass-core-config-2.2.xsd">

    <compass name="default">
        <connection>
            <jdbc dialect="org.apache.lucene.store.jdbc.dialect.HSQLDialect">
                <dataSourceProvider>
                    <driverManager url="jdbc:hsqldb:mem:test" username="sa" password=""
                                   driverClass="org.hsqldb.jdbcDriver" />
                </dataSourceProvider>
                <fileEntries>
                    <fileEntry name="__default__">
                        <indexInput bufferSize="4096" />
                        <indexOutput bufferSize="4096" />
                    </fileEntry>
                </fileEntries>
            </jdbc>
        </connection>
    </compass>
</compass-core-config>
The next sample configures a jdbc based index, with a JTA transaction (note the managed="true" and
commitBeforeCompletion="true"):
<compass-core-config xmlns="http://www.compass-project.org/schema/core-config"
                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                     xsi:schemaLocation="http://www.compass-project.org/schema/core-config
                                         http://www.compass-project.org/schema/compass-core-config-2.2.xsd">

    <compass name="default">
        <connection>
            <jdbc dialect="org.apache.lucene.store.jdbc.dialect.HSQLDialect" managed="true">
                <dataSourceProvider>
                    <driverManager url="jdbc:hsqldb:mem:test" username="sa" password=""
                                   driverClass="org.hsqldb.jdbcDriver" />
                </dataSourceProvider>
            </jdbc>
        </connection>
        <transaction factory="org.compass.core.transaction.JTASyncTransactionFactory"
                     commitBeforeCompletion="true">
        </transaction>
    </compass>
</compass-core-config>
Here is another sample that configures an additional analyzer and a specialized Converter, and changes the
default date format for all Java Dates (date is an internal name that maps to the default Compass date Converter).
<compass-core-config xmlns="http://www.compass-project.org/schema/core-config"
                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                     xsi:schemaLocation="http://www.compass-project.org/schema/core-config
                                         http://www.compass-project.org/schema/compass-core-config-2.2.xsd">

    <compass name="default">
        <connection>
            <file path="target/test-index"/>
        </connection>
        <converters>
            <converter name="date" type="org.compass.core.converter.basic.DateConverter">
                <setting name="format" value="yyyy-MM-dd" />
            </converter>
            <converter name="myConverter" type="test.Myconverter" />
        </converters>
        <searchEngine>
            <analyzer name="test" type="Snowball" snowballType="Lovins">
                <stopWords>
                    <stopWord value="test" />
                </stopWords>
            </analyzer>
        </searchEngine>
    </compass>
</compass-core-config>
The next configuration uses Lucene transaction isolation, with a higher ram buffer size for faster indexing.
<compass-core-config xmlns="http://www.compass-project.org/schema/core-config"
                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                     xsi:schemaLocation="http://www.compass-project.org/schema/core-config
                                         http://www.compass-project.org/schema/compass-core-config-2.2.xsd">

    <compass name="default">
        <connection>
            <file path="target/test-index"/>
        </connection>
        <settings>
            <setting name="compass.engine.ramBufferSize" value="64" />
        </settings>
    </compass>
</compass-core-config>
Compass can be configured using JSON based configuration. Basically, the JSON configuration breaks the
settings into JSON elements. The Configuration Settings are explained in Appendix A, Configuration Settings.
Here is an example:
{
    compass : {
        engine : {
            connection : "test-index"
        },
        event : {
            preCreate : {
                event1 : {
                    type : "test.MyEvent1"
                },
                event2 : {
                    type : "test.MyEvent2"
                }
            }
        }
    }
}
Compass can be configured using a DTD based xml configuration. The DTD configuration is less expressive
than the schema based one, allowing only mappings and Compass settings to be configured. The Configuration
Settings are explained in Appendix A, Configuration Settings.
<compass-core-configuration>
    <compass>
        <setting name="compass.engine.connection">my/index/dir</setting>
    </compass>
</compass-core-configuration>
Note: It is possible to have multiple Compass instances within the same application, each with a different
configuration.
The "old" Compass instance is closed after a graceful time (default to 60 seconds). The time can be controlled
using the following setting: compass.rebuild.sleepBeforeClose.
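For example (the value format is an assumption; see Appendix A, Configuration Settings for the exact setting
definition):

compass.rebuild.sleepBeforeClose=10000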
Configuring event listeners is done using settings. For example, to configure a pre save event listener, the
following setting should be used: compass.event.preSave.mylistener.type, with its value set to the actual
class name of the listener.
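For example, here is how a (hypothetical) test.MyPreSaveListener listener class would be registered:

compass.event.preSave.mylistener.type=test.MyPreSaveListener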
More information on each listener can be found in the javadoc under the org.compass.events package. An
important note with regards to pre listeners is the fact that they can filter out certain operations.
Lucene comes with a Directory abstraction on top of the actual index storage. Compass uses Lucene's built in
directory implementations, as well as custom implementations built on top of them.
The only required configuration for a Compass instance (using the CompassConfiguration) is its connection.
The connection controls where the index will be saved or, in other words, the storage location of the index.
This chapter reviews the different options for index storage that come with Compass, and tries to expand on
some of the important aspects of using a certain storage (like clustering support). Here is an example of a
simple file system based connection configuration that stores the index in the target/test-index path:
<compass name="default">
<connection>
<file path="target/test-index"/>
</connection>
</compass>
compass.engine.connection=file://target/test-index
Another option for file system based configuration is using the Java 1.4 NIO mmap feature. It may perform
better than the default file based store under certain environments/loads. We recommend performing some
performance tests (preferably as close to your production system configuration as possible) and checking which
one performs better. Here is an example of an mmap based connection configuration that stores the index in the
target/test-index path:
<compass name="default">
<connection>
<mmap path="target/test-index"/>
</connection>
</compass>
compass.engine.connection=mmap://target/test-index
Yet another option for file system based configuration is using the Java 1.4 NIO feature. It should perform
better than the default file based store under certain environments/loads (especially in heavily concurrent
systems). We recommend performing some performance tests (preferably as close to your production system
configuration as possible) and checking which one performs better. Note, there are known problems when
using this store under the Windows operating system due to a Sun JVM bug. Here is an example of an NIO
based connection configuration that stores the index in the target/test-index path:
<compass name="default">
<connection>
<niofs path="target/test-index"/>
</connection>
</compass>
compass.engine.connection=niofs://target/test-index
When using file system based index storage, locking (for transaction support) is done using lock files. The
existence of such a file means that a certain sub index is locked. The default lock file directory is the
java.io.tmpdir system property.
Clustering support for file system based storage usually means sharing the file system between different
machines (running different Compass instances). The current locking mechanism requires setting the locking
directory on the shared file system; see the Lock Factory section for more information.
Another important note regarding shared file system based index storage: do not use NFS. For best
performance, a SAN based solution is recommended.
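For a purely in-memory index (which will not survive an application restart), Compass can use a RAM based
store. Here is an example of a RAM based connection configuration: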
<compass name="default">
<connection>
<ram path="/index"/>
</connection>
</compass>
compass.engine.connection=ram://index
Compass's JdbcDirectory implementation, built on top of the Lucene Directory abstraction, is completely
decoupled from the rest of Compass, and can be used with pure Lucene applications. For more information,
please read Appendix B, Lucene Jdbc Directory. Naturally, when using it within Compass, it allows for simpler
configuration, especially in terms of transaction management and Jdbc DataSource management.
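As a sketch of the standalone (pure Lucene) usage, a JdbcDirectory can be instantiated directly; the
constructor arguments shown (a DataSource, a dialect, and a table name) and the create() call are
assumptions based on Appendix B, Lucene Jdbc Directory:

JdbcDirectory dir = new JdbcDirectory(dataSource,
        new org.apache.lucene.store.jdbc.dialect.HSQLDialect(), "test_index");
dir.create(); // creates the database table that will hold the index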
Here is a simple example of using Jdbc to store the index. The example configuration assumes a standalone
configuration, with no data source pooling.
<compass name="default">
<connection>
<jdbc>
<dataSourceProvider>
<driverManager url="jdbc:hsqldb:mem:test"
username="sa" password=""
driverClass="org.hsqldb.jdbcDriver" />
</dataSourceProvider>
</jdbc>
</connection>
</compass>
The above configuration does not define a dialect attribute on the jdbc element. Compass will try to auto-detect
the database dialect based on the database meta-data. If it fails to find one, a dialect can be set explicitly; in our
case it would be dialect="org.apache.lucene.store.jdbc.dialect.HSQLDialect".
It is important to understand whether Compass is working within a "managed" environment or not when it
comes to Jdbc index storage. A managed environment is an environment where Compass is not in control of the
transaction management (as when configuring Compass with JTA or Spring transaction management). If
Compass is in control of the transaction, i.e. when using the Local transaction factory, it is not considered a
managed environment.
When working in a non managed environment, Compass will wrap the data source with a
TransactionAwareDataSourceProxy, and will commit/rollback the Jdbc connection itself. When working within
a managed environment, no wrapping is performed, and Compass lets the external transaction manager
commit/rollback the connection.
Usually, but not always, when working in a managed environment, the Jdbc data source used will come from an
external system/configuration. Most of the time it will be either JNDI or an external data source provider (like
Spring). For more information about the different data source providers, read the next section.
By default, Compass works as if within a non managed environment. The managed attribute on the jdbc
element should be set to true otherwise.
All the different data sources supported by Compass allow configuring the autoCommit flag. There are three
allowed values for autoCommit: false, true and external (don't set autoCommit explicitly; assume it is
configured elsewhere). The autoCommit mode defaults to false, which is the recommended value (external can
also be used, but make sure to set the actual data source to false).
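For example, here is a sketch of setting autoCommit on the driver manager provider (assuming the autoCommit
attribute is exposed on the provider element; the connection details are the ones used earlier):

<compass name="default">
    <connection>
        <jdbc>
            <dataSourceProvider>
                <driverManager url="jdbc:hsqldb:mem:test" username="sa" password=""
                               driverClass="org.hsqldb.jdbcDriver" autoCommit="false" />
            </dataSourceProvider>
        </jdbc>
    </connection>
</compass>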
The driver manager is the simplest of all providers. It does not require any external libraries or systems. Its
main drawback is performance, since it performs no pooling of any kind. The first sample of a Jdbc
configuration earlier in this chapter used the driver manager as its data source provider.
Compass can be configured to use Jakarta Commons DBCP as a data source provider. It is preferred over the
driver manager provider for performance reasons (whether to use it or c3p0, explained later in this section, is
up to you). Here is an example of using it:
<compass name="default">
<connection>
<jdbc>
<dataSourceProvider>
<dbcp url="jdbc:hsqldb:mem:test"
username="sa" password=""
driverClass="org.hsqldb.jdbcDriver"
maxActive="10" maxWait="5" maxIdle="2" initialSize="3" minIdle="4"
poolPreparedStatements="true" />
</dataSourceProvider>
</jdbc>
</connection>
</compass>
The configuration shows the different settings that can be used on the dbcp data source provider. They are, by
no means, the recommended values for a typical system. For more information, please consult Jakarta
Commons DBCP documentation.
4.3.2.3. c3p0
Compass can be configured to use c3p0 as a data source provider. It is preferred over the driver manager
provider for performance reasons (whether to use it or Jakarta Commons DBCP, explained previously in this
section, is up to you). Here is an example of using it:
<compass name="default">
<connection>
<jdbc>
<dataSourceProvider>
<c3p0 url="jdbc:hsqldb:mem:test"
username="testusername" password="testpassword"
driverClass="org.hsqldb.jdbcDriver" />
</dataSourceProvider>
</jdbc>
</connection>
</compass>
The c3p0 data source provider uses the c3p0 ComboPooledDataSource; additional settings can be set using a
c3p0.properties file stored as a top-level resource in the same CLASSPATH / classloader that loads c3p0's jar
file. Please consult the c3p0 documentation for additional settings.
4.3.2.4. JNDI
Compass can be configured to look up the data source using JNDI. Here is an example of using it:
<compass name="default">
<connection>
<jdbc>
<dataSourceProvider>
<jndi lookup="testds" username="testusername" password="testpassword" />
</dataSourceProvider>
</jdbc>
</connection>
</compass>
4.3.2.5. External
Compass can be configured to use an external data source using the ExternalDataSourceProvider. It uses a
Java thread local to store the DataSource for later use by the data source provider. Setting the data source is
done using the static setDataSource(DataSource dataSource) method on ExternalDataSourceProvider. Here
is an example of how it can be configured:
<compass name="default">
<connection>
<jdbc>
<dataSourceProvider>
<external username="testusername" password="testpassword"/>
</dataSourceProvider>
</jdbc>
</connection>
</compass>
Note, the username and password are used for the DataSource, and are completely optional.
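For example, a sketch of wiring an externally managed DataSource (dataSource here would come from your
environment, e.g. JNDI or Spring):

// make the externally managed DataSource visible to the provider
ExternalDataSourceProvider.setDataSource(dataSource);
Compass compass = new CompassConfiguration().configure().buildCompass();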
Configuring the Jdbc store with Compass also allows defining FileEntryHandler settings for different file
entries in the database. FileEntryHandlers are explained in Appendix B, Lucene Jdbc Directory (and require
some Lucene knowledge). The Lucene Jdbc Directory implementation already comes with sensible defaults,
but they can be changed using the Compass configuration.
One of the things that comes free with Compass is automatic use of the more performant
FetchPerTransactionJdbcIndexInput where possible (based on the dialect). Special care needs to be taken when
using this index input, and Compass handles that automatically.
File entry configurations are associated with a name. The name can be either __default__, which is used for all
unmapped files, the full name of the stored file, or the suffix of the file (its last 3 characters).
Here is an example of the most common configuration of file entries, changing their buffer size for both index
input (used for reading data) and index output (used for writing data):
<compass name="default">
<connection>
<jdbc>
<dataSourceProvider>
<external username="testusername" password="testpassword"/>
</dataSourceProvider>
<fileEntries>
<fileEntry name="__default__">
<indexInput bufferSize="4096" />
<indexOutput bufferSize="4096" />
</fileEntry>
</fileEntries>
</jdbc>
</connection>
</compass>
4.3.4. DDL
By default, Compass can create the database schema, and has defaults for the column names, types, sizes and
so on. The schema definition is configurable as well; here is an example of how to configure it:
<compass name="default">
<connection>
<jdbc>
<dataSourceProvider>
<external username="testusername" password="testpassword"/>
</dataSourceProvider>
<ddl>
<nameColumn name="myname" length="70" />
<sizeColumn name="mysize" />
</ddl>
</jdbc>
</connection>
</compass>
Compass by default will drop the tables when deleting the index, and create them when creating the index. If
performing schema based operations is not allowed, the disableSchemaOperations flag can be set to true.
This will cause Compass not to perform any schema based operations.
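The index lock factory is configured on the connection element. For example, here is how a native file system
based lock factory with a custom lock directory can be configured (shown in both schema and settings form):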
<compass name="default">
<connection>
<file path="target/test-index" />
<lockFactory type="nativefs" path="test/#subContext#/#subindex#" />
</connection>
</compass>
compass.engine.connection=file://target/test-index
compass.engine.store.lockFactory.type=nativefs
compass.engine.store.lockFactory.path=test/#subContext#/#subindex#
The lock factory type can have the following values: simplefs, nativefs (both file system based locks),
nolock, and singleinstance. A fully qualified class name of a LockFactory implementation or of a
LockFactoryProvider can also be provided. Note, when using nativefs locks, it is recommended to specify a
lock directory different from the index directory, especially when using Compass Gps.
The path allows providing a path parameter to the file system based locks. This is an optional parameter that
defaults to the sub index location. The specialized keywords #subindex# and #subContext# can be used; they
are replaced with the actual sub index and sub context.
The Local Directory Cache fully supports several Compass instances running against the same directory
(unlike the directory wrappers explained in the next section) and periodically keeps its local cache state
synchronized with external changes.
There are several types of local cache implementations. The regular connection based types (ram:// or
file://) actually create a full replica of the directory they are built on. The memory:// based type keeps an
evictable memory based local cache. The memory based local cache accepts a size parameter that controls the
maximum memory size that will be used before entries start to be evicted (defaults to 64m). It also accepts a
bucketSize parameter that controls the size of each cache entry (defaults to 1024 bytes). Note, the
configuration applies to each sub index separately.
Here is an example configuring a ram based local cache for sub index called a:
<compass name="default">
<connection>
<file path="target/test-index" />
<localCache subIndex="a" connection="ram://" />
</connection>
</compass>
compass.engine.connection=target/test-index
compass.engine.localCache.a.connection=ram://
And here is an example of how it can be configured to use local file system cache for all different sub indexes
(using the special __default__ keyword):
<compass name="default">
<connection>
<file path="target/test-index" />
<localCache subIndex="__default__" connection="file://tmp/cache" />
</connection>
</compass>
compass.engine.connection=target/test-index
compass.engine.localCache.__default__.connection=file://tmp/cache
<compass name="default">
<connection>
<file path="target/test-index" />
<localCache subIndex="__default__" connection="memory://size=128m&amp;bucketSize=2k" />
</connection>
</compass>
compass.engine.connection=target/test-index
compass.engine.localCache.__default__.connection=memory://size=128m&bucketSize=2k
Other than using a faster local cache directory implementation, Compass also improves compound file structure
performance by performing the compound operation on the local cache and only flushing the already
compounded index structure.
4.6. Lucene Directory Wrapper
The actual Lucene Directory used can be wrapped with directory wrappers, configured using the
directoryWrapperProvider element shown in the following examples.
4.6.1. SyncMemoryMirrorDirectoryWrapperProvider
Wraps the given Lucene directory with SyncMemoryMirrorDirectoryWrapper (which is also provided by
Compass). The wrapper wraps the directory with an in memory directory which mirrors it synchronously.
The original directory is read into memory when the wrapper is constructed. All read related operations are
performed against the in memory directory. All write related operations are performed both against the in
memory directory and the original directory. Locking is performed using the in memory directory.
The wrapper allows for the performance gains that come with an in memory index (for read/search
operations), while still maintaining a synchronized actual directory which usually uses a more persistent store
than memory (i.e. file system).
This wrapper will only work in cases when either the index is read only (i.e. only search operations are
performed against it), or when there is a single instance which updates the directory.
<compass name="default">
<connection>
<file path="target/test-index"/>
<directoryWrapperProvider name="test"
type="org.compass.core.lucene.engine.store.wrapper.SyncMemoryMirrorDirectoryWrapperProvider">
</directoryWrapperProvider>
</connection>
</compass>
4.6.2. AsyncMemoryMirrorDirectoryWrapperProvider
Wraps the given Lucene directory with AsyncMemoryMirrorDirectoryWrapper (which is also provided by
Compass). The wrapper wraps the directory with an in memory directory which mirrors it asynchronously.
The original directory is read into memory when the wrapper is constructed. All read related operations are
performed against the in memory directory. All write related operations are performed against the in memory
directory and are scheduled to be performed against the original directory (in a separate thread). Locking is
performed using the in memory directory.
The wrapper allows for the performance gains that come with an in memory index (for read/search
operations), while still maintaining an asynchronous actual directory which usually uses a more persistent store
than memory (i.e. file system).
This wrapper will only work in cases when either the index is read only (i.e. only search operations are
performed against it), or when there is a single instance which updates the directory.
<compass name="default">
<connection>
<file path="target/test-index"/>
<directoryWrapperProvider name="test"
type="org.compass.core.lucene.engine.store.wrapper.AsyncMemoryMirrorDirectoryWrapperProvider">
<setting name="awaitTermination">10</setting>
<setting name="sharedThread">true</setting>
</directoryWrapperProvider>
</connection>
</compass>
awaitTermination controls how long the wrapper will wait for the async write tasks to finish. When closing
Compass, there might still be async tasks pending to be written to the actual directory, and this setting controls
how long (in seconds) Compass will wait for those tasks to be executed against the actual directory.
sharedThread controls the threading model: when set to false, each sub index will have its own thread to
perform pending "write" operations; when set to true, a single thread is shared among all the sub indexes.
5.1. Introduction
Compass Core provides an abstraction layer on top of the wonderful Lucene Search Engine. Compass also
provides several additional features on top of Lucene, like two phase transaction management, fast updates, and
optimizers. When trying to explain how Compass works with the Search Engine, first we need to understand
the Search Engine domain model.
5.2. Alias, Resource and Property
Every Resource is associated with one or more id properties. They are required for Compass to manage
Resource loading based on ids and Resource updates (a well known difficulty when using Lucene directly). Id
properties are defined either explicitly in RSEM definitions or implicitly in OSEM/XSEM definitions.
For Lucene users, Compass Resource maps to Lucene Document and Compass Property maps to Lucene
Field.
When working with RSEM, resources act as your prime data model. They are used to construct searchable
content, as well as to manipulate it. When performing a search, resources are used to display the search results.
Another important place where resources can be used, which is often ignored, is with OSEM/XSEM. When
manipulating search content through the use of the application domain model (in the case of OSEM), or through
the use of xml data structures (in the case of XSEM), resources are rarely used. They can, however, be used when
performing search operations. Based on your mapping definition, the semantic model can be accessed in a
uniform way through resources and properties.
Let's simplify this statement by using an example. If our application has two object types, Recipe and
Ingredient, we can map both the recipe title and the ingredient title into the same semantic meta-data name, title
(the Resource Property name). This allows us, when searching, to display the search results (hits) on the
Resource level alone, presenting the value of the property title from the list of resources returned.
5.3. Analyzers
Analyzers are components that pre-process input text. They are also used when searching (the search string has
to be processed the same way that the indexed text was processed). Therefore, it is usually important to use the
same Analyzer for both indexing and searching.
Lucene itself comes with several Analyzers, and you can configure Compass to work with any one of them. If we
take the following sentence: "The quick brown fox jumped over the lazy dogs", we can see how the different
Analyzers handle it:
whitespace (org.apache.lucene.analysis.WhitespaceAnalyzer):
[The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]
simple (org.apache.lucene.analysis.SimpleAnalyzer):
[the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]
stop (org.apache.lucene.analysis.StopAnalyzer):
[quick] [brown] [fox] [jumped] [over] [lazy] [dogs]
standard (org.apache.lucene.analysis.standard.StandardAnalyzer):
[quick] [brown] [fox] [jumped] [over] [lazy] [dogs]
Lucene also comes with an extension library, holding many more analyzer implementations (including
language specific analyzers). Compass can be configured to work with all of them as well.
A Compass instance acts as a registry of analyzers, with each analyzer bound to a lookup name. Two internal
analyzer names within Compass are: default and search. default is the default analyzer that is used when no
other analyzer is configured (configuration of using different analyzer is usually done in the mapping definition
by referencing a different analyzer lookup name). search is the analyzer used on a search query string when no
other analyzer is configured (configuring a different analyzer when executing a search based on a query string
is done through the query builder API). By default, when nothing is configured, Compass will use Lucene
standard analyzer as the default analyzer.
The following is an example of configuring two analyzers, one that will replace the default analyzer, and
another one registered against myAnalyzer (it will probably later be referenced from within the different
mapping definitions).
<compass name="default">
<connection>
<file path="target/test-index" />
</connection>
<searchEngine>
<analyzer name="default" type="Snowball" snowballType="Lovins">
<stopWords>
<stopWord value="no" />
</stopWords>
</analyzer>
<analyzer name="myAnalyzer" type="Standard" />
</searchEngine>
</compass>
Compass also supports custom implementations of the Lucene Analyzer class (note, the same goal might be
achieved by implementing an analyzer filter, described later). If the implementation also implements
CompassConfigurable, additional settings (parameters) can be injected into it using the configuration file. Here is
an example configuration that registers a custom analyzer implementation that accepts a parameter named
threshold:
<compass name="default">
<connection>
<file path="target/test-index" />
</connection>
<searchEngine>
<!-- eg.MyAnalyzer is a placeholder custom Analyzer implementation -->
<analyzer name="myAnalyzer" type="CustomAnalyzer" analyzerClass="eg.MyAnalyzer">
<setting name="threshold" value="5" />
</analyzer>
</searchEngine>
</compass>
Filters are provided for simpler support for additional filtering (or enrichment) of analyzed streams, without the
hassle of creating your own analyzer. Also, filters can be shared across different analyzers, potentially having
different analyzer types.
<compass name="default">
<connection>
<file path="target/test-index" />
</connection>
<searchEngine>
<analyzer name="default" type="Standard" filters="test1, test2" />
<!-- the test1 and test2 filter definitions go here -->
</searchEngine>
</compass>
Since synonyms are a common requirement in a search application, Compass comes with a simple synonym
analyzer filter: SynonymAnalyzerTokenFilterProvider. The implementation requires as a parameter (setting)
an implementation of a SynonymLookupProvider, which can return all the synonyms for a given value. No
implementation is provided, though one that goes to a public synonym database, or to a file input structure, is
simple to implement. Here is an example of how to configure it:
<compass name="default">
<connection>
<file path="target/test-index" />
</connection>
<searchEngine>
<analyzer name="default" type="Standard" filters="synonymFilter" />
<!-- eg.MySynonymLookupProvider is a placeholder SynonymLookupProvider implementation -->
<analyzerFilter name="synonymFilter" type="synonym">
<setting name="lookup" value="eg.MySynonymLookupProvider" />
</analyzerFilter>
</searchEngine>
</compass>
Note that we did not set the fully qualified class name for the type, and used synonym instead. This is a
simplification that comes with Compass (naturally, you can still use the fully qualified class name of the
synonym token filter provider).
5.4. Similarity
Compass can be configured with a Lucene Similarity for both indexing and searching. This is an advanced
configuration level. By default, Lucene DefaultSimilarity is used for both searching and indexing.
In order to globally change the Similarity, the type of the similarity can be set using
compass.engine.similarity.default.type. The type can either be the actual class name of a Similarity
implementation, or the class name of a SimilarityFactory implementation. Both can optionally implement
CompassConfigurable in order to be injected with CompassSettings.
Specifically, the index similarity can be set using compass.engine.similarity.index.type. The search
similarity can be set using compass.engine.similarity.search.type.
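For example, using the settings based configuration (eg.MySimilarity is a hypothetical Similarity implementation):
compass.engine.similarity.default.type=eg.MySimilarity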
5.5. Query Parser
Compass allows registering custom Lucene query parsers. Here is an example of configuring a custom query parser registered under the name test:
<compass name="default">
<connection>
<file path="target/test-index" />
</connection>
<searchEngine>
<queryParser name="test" type="eg.MyQueryParser">
<setting name="param1" value="value1" />
</queryParser>
</searchEngine>
</compass>
5.6. Index Structure
Every sub-index has its own fully functional index structure (which maps to a single Lucene index). The
Lucene index part holds a "meta data" file about the index (called segments) and 0 to N segment files. The
segments can be a single file (if the compound setting is enabled) or multiple files (if the compound setting is
disabled). A segment is close to a fully functional index, which holds the actual inverted index data (see the
Lucene documentation for a detailed description of these concepts).
Index partitioning is one of the main Compass features, allowing for a flexible and configurable way to manage
complex indexes and performance considerations. The next sections will explain in more detail why this
feature is important, especially in terms of transaction management.
5.7. Transaction
The Compass Search Engine abstraction provides support for transaction management on top of Lucene. The
abstraction supports the following transaction processors: read_committed, lucene and async. Compass
provides two phase commit support for the common transaction levels only.
5.7.1. Locking
Compass utilizes the Lucene inter and outer process locking mechanisms and uses them to establish its transaction
locking. Note that the transaction locking is at the "sub-index" level (the sub index based index), which means
that dirty operations only lock their respective sub-index. So, the more aliases / searchable content that map to
the same index (the next section will explain how to do it, called sub index hashing), the more aliases / searchable
content will be locked when performing dirty operations, yet the faster the searches will be. Lucene uses a
special lock file to manage the inter and outer process locking, which can be set in the Compass configuration.
You can manage the transaction timeout and polling interval using the Compass configuration.
A Compass transaction acquires a lock only when a dirty operation occurs against the index (for
read_committed and lucene this means: create, save or delete), which makes "read only" transactions as fast
as they should and can be. The following configuration file shows how to control the two main settings for
locking: the locking timeout (which defaults to 10 seconds) and the locking polling interval, i.e. how often
Compass will check and see if a lock has been released (defaults to 100 milliseconds):
<compass name="default">
<connection>
<file path="target/test-index" />
</connection>
<settings>
<!-- lock timeout in seconds, poll interval in milliseconds (setting names assumed here) -->
<setting name="compass.transaction.lockTimeout" value="15" />
<setting name="compass.transaction.lockPollInterval" value="200" />
</settings>
</compass>
5.7.2. Transaction Processors
5.7.2.1. read_committed
The read committed transaction processor isolates changes done during a transaction from other
transactions until commit. It also allows load/get/find operations to take into account changes done during
the current transaction. This means that a delete that occurs during a transaction will be filtered out if a search
is executed within the same transaction just after the delete.
When starting a read_committed transaction, no locks are obtained. Read operations will not obtain a lock
either. A lock will be obtained only when a dirty operation is performed. The lock is obtained only on the index
of the alias / searchable content that is associated with the dirty operation, i.e. the sub-index, and it will lock all
other aliases / searchable content that map to that sub-index.
The read committed transaction processor supports concurrent commits: if operations are performed against several
sub indexes, the commit process will happen concurrently on the different sub indexes. It uses the Compass
internal Execution Manager, where the number of threads as well as the type of the execution manager
(concurrent or work manager) can be configured. Concurrent commits are used only if the relevant store
supports it.
By default, read committed transactions work in an asynchronous manner (if allowed by the search engine
store). The Compass executor service is used to execute the destructive operations using several threads. This
allows for much faster execution and the ability to continue processing application level logic while objects are
being indexed or removed from the search engine. In order to disable this, the
compass.transaction.processor.read_committed.concurrentOperations setting should be set to false.
The read committed semantics are still maintained by waiting for all current operations to complete once a
read/search operation occurs. Note, when asynchronous operation is enabled, problems with indexing objects
will be raised not as a result of the actual operation that caused them, but at commit, flush or read/search time.
The CompassSession flush operation can also be used to wait for all current operations to be performed.
There are several settings that control the asynchronous execution (javadoc at
LuceneEnvironment.Transaction.Processor.ReadCommitted). The
compass.transaction.processor.read_committed.concurrencyLevel setting controls the number of threads used
(per transaction) to process dirty operations (defaults to 5).
compass.transaction.processor.read_committed.hashing controls how operations are hashed to a
respective processing thread, and can be either uid or subindex (defaults to uid).
The read committed transaction processor works against a transaction log (which is simply another Lucene index).
The transaction log can either be stored in memory or on the file system. By default, the transaction log is stored
in memory. Here is how it can be configured:
<compass name="default">
<connection>
<file path="target/test-index" />
</connection>
<transaction processor="read_committed">
<processors>
<readCommitted transLog="ram://" />
</processors>
</transaction>
</compass>
compass.engine.connection=target/test-index
compass.transaction.readcommitted.translog.connection=ram://
Using the read committed transaction processor can also be done in runtime using the following code:
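A minimal sketch of doing so, assuming the session level settings API (CompassSession#getSettings(), mentioned later in this chapter) and the string form of the compass.transaction.processor setting:
CompassSession session = compass.openSession();
// switch this session to the read_committed transaction processor at runtime
session.getSettings().setSetting("compass.transaction.processor", "read_committed");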
The FS transaction log stores the transactional data on the file system. This allows bigger transactions
(bigger in terms of data) to be run compared with the ram transaction log, though at the expense of
performance. The fs transaction log can be configured with a path where the transaction log is stored (defaults to
the java.io.tmpdir system property). The path is then appended with compass/translog, and for each transaction a
new unique directory is created. Here is an example of how the FS transaction log can be configured:
<compass name="default">
<connection>
<file path="target/test-index" />
</connection>
<transaction processor="read_committed">
<processors>
<readCommitted transLog="file://" />
</processors>
</transaction>
</compass>
compass.engine.connection=target/test-index
compass.transaction.readcommitted.translog.connection=file://
CompassSession and CompassIndexSession provide the flushCommit operation. When used
with the read_committed transaction processor, it means that the current changes to the search engine will be
flushed and committed. The changes will be visible to other sessions / compass instances, and a rollback
operation on the transaction will not roll them back. flushCommit is handy when there is a long running
session that performs the indexing and transactionality is not as important as making the changes
available to other sessions intermittently.
Transaction log settings are among the session level settings that can be set. This allows changing how
Compass saves the transaction log per session, instead of globally at the Compass instance level
configuration. Note, this only applies to the session that is responsible for creating the transaction. The
following is an example of how it can be done:
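A minimal sketch, assuming the session level settings API (CompassSession#getSettings()) and the translog setting shown in its properties form above:
CompassSession session = compass.openSession();
// this session (which creates the transaction) will use a file system based transaction log
session.getSettings().setSetting("compass.transaction.readcommitted.translog.connection", "file://");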
5.7.2.2. lucene
A special transaction processor, lucene (previously known as batch_insert), is similar to
the read_committed processor except that dirty operations done during a transaction are not visible to
get/load/find operations that occur within the same transaction. This isolation level is very handy for long
running batch dirty operations and can be faster than read_committed. Most usage patterns of Compass (such
as integration with ORM tools) can work perfectly well with the lucene isolation level.
It is important to understand this transaction isolation level in terms of merging done during commit time.
Lucene might perform some merges during commit time depending on the merge factor configured using
compass.engine.mergeFactor. This is different from the read_committed processor where no merges are
performed during commit time. Possible merges can cause commits to take some time, so one option is to
configure a large merge factor and let the optimizer do its magic (you can configure a different merge factor for
the optimizer).
Another important parameter when using this transaction isolation level is compass.engine.ramBufferSize
(defaults to 16.0 Mb) which replaces the max buffered docs parameter and controls the amount of transactional
data stored in memory. Larger values will yield better performance and it is best to allocate as much as
possible.
By default, lucene transactions work in an asynchronous manner (if allowed by the index store). The Compass
executor service is used to execute these destructive operations using several threads. This allows for much
faster execution and the ability to continue processing application level logic while objects are being indexed or
removed from the search engine. In order to disable this, the
compass.transaction.processor.lucene.concurrentOperations setting should be set to false. Note, when
asynchronous operation is enabled, problems with indexing objects will be raised not as a result of the actual
operation that caused them, but at commit or flush time. The CompassSession flush operation can also be used to
wait for all current operations to be performed. Also note that, unlike the read committed transaction (and because
of the semantics of the lucene transaction processor, which does not reflect current transaction operations in
read/search operations), there is no need to wait for asynchronous dirty operations to be processed during
search/read operations.
There are several settings that control the asynchronous execution (javadoc at
LuceneEnvironment.Transaction.Processor.Lucene). The
compass.transaction.processor.lucene.concurrencyLevel setting controls the number of threads used (per
transaction) to process dirty operations (defaults to 5). compass.transaction.processor.lucene.hashing
controls how operations are hashed to a respective processing thread, and can be either uid or subindex
(defaults to uid). The compass.transaction.processor.lucene.backlog setting controls the number of pending
destructive operations allowed; if full, dirty operations will block until space is available (defaults to 100). Last,
compass.transaction.processor.lucene.addTimeout controls the time to wait in order to add dirty
operations to the backlog if the backlog is full. It defaults to 10 seconds and accepts the Compass time format
(10millis, 30s, ...). Note, all these settings are also runtime settings that can control a specific transaction, using
CompassSession#getSettings() and then setting the respective ones.
The lucene transaction processor supports concurrent commits: if operations are performed against several sub
indexes, the commit process will happen concurrently on the different sub indexes. It uses the Compass internal
Execution Manager, where the number of threads as well as the type of the execution manager (concurrent or
work manager) can be configured.
CompassSession and CompassIndexSession provide the flushCommit operation. When used
with the lucene transaction processor, it means that the current changes to the search engine will be flushed and
committed. The changes will be visible to other sessions / compass instances, and a rollback operation on the
transaction will not roll them back. flushCommit is handy when there is a long running session that
performs the indexing and transactionality is not as important as making the changes available to other
sessions intermittently.
Here is how the transaction isolation level can be configured to be used as the default one:
<compass name="default">
<connection>
<file path="target/test-index" />
</connection>
<transaction processor="lucene" />
</compass>
compass.engine.connection=target/test-index
compass.transaction.processor=lucene
Using the lucene transaction processor can also be done in runtime using the following code:
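A minimal sketch, again assuming the session level settings API:
CompassSession session = compass.openSession();
// use the lucene transaction processor for this session only
session.getSettings().setSetting("compass.transaction.processor", "lucene");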
5.7.2.3. async
The async transaction processor allows processing transactions asynchronously, without incurring any overhead
on the thread that actually executes the transaction. Transactions (each transaction consists of one or more
destructive operations - create/update/delete) are accumulated and processed asynchronously and concurrently.
Note, the async transaction processor does not obtain a lock when a dirty operation occurs within a transaction.
Note, when several instances of Compass are running using this transaction processor, the order of transactions is
not maintained, which might result in out of order transactions being applied to the index. When there is a single
instance running, by default, the order of transactions is maintained by obtaining an order lock once a dirty
operation occurs against the sub index (similar to the Lucene write lock), which is released once the transaction
commits / rolls back. In order to disable even this ordering,
compass.transaction.processor.async.maintainOrder can be set to false.
The number of transactions that have not been processed (the backlog) is bounded and defaults to 10. If the
processor falls behind in processing transactions, commit operations will block until the backlog drops
below its threshold. The backlog can be set using the compass.transaction.processor.async.backlog
setting. Commit operations will block by default for 10 seconds waiting for the backlog to drop below its
threshold. This can be changed using the compass.transaction.processor.async.addTimeout setting (which
accepts the time format setting).
Processing of transactions is done by a background thread that waits for transactions. Once there is a
transaction to process, it will first try to wait for additional transactions. It will block for 100 milliseconds
(configurable using compass.transaction.processor.async.batchJobTimeout), and if one was added, will
wait again, up to 5 times (configurable using the compass.transaction.processor.async.batchJobSize setting).
Once the timeout based batching is done, the processor will try to get up to 5 more transactions in a non
blocking manner.
When all transaction jobs are accumulated, the processor starts up to 5 threads (configurable using
compass.transaction.processor.async.concurrencyLevel) in order to process all the transaction jobs
against the index. Hashing of actual operation (create/update/delete) can either be done based on uid (of the
resource) or subindex. By default, hashing is done based on uid and can be configured using
compass.transaction.processor.async.hashing setting.
CompassSession and CompassIndexSession provide the flushCommit operation. When used
with the async transaction processor, it means that all the changes accumulated up to this point will be passed on
to be processed (similar to commit), except that the session remains open for additional changes. This allows
long running indexing sessions to periodically flush and commit the changes (otherwise memory consumption
will continue to grow), instead of committing and closing the current session and opening a new one.
When the transaction processor closes, by default it will wait for all pending transactions to finish. In order to
disable this, the compass.transaction.processor.async.processBeforeClose setting should be set to false.
Here is how the transaction isolation level can be configured to be used as the default one:
<compass name="default">
<connection>
<file path="target/test-index" />
</connection>
<transaction processor="async" />
</compass>
compass.engine.connection=target/test-index
compass.transaction.processor=async
Using the async transaction processor can also be done in runtime using the following code:
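A minimal sketch, assuming the session level settings API:
CompassSession session = compass.openSession();
// process this session's transactions asynchronously
session.getSettings().setSetting("compass.transaction.processor", "async");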
5.7.2.4. mt
The mt transaction processor is a thread safe transaction processor, effectively allowing the same
CompassIndexSession (or CompassSession) to be used across multiple threads. The transaction processor is very
handy when several threads are updating the index during a single logical indexing session.
Once a dirty operation is performed using the mt transaction processor, its respective sub index is locked, and no
dirty operation from another session is allowed until the session is committed, rolled back, or closed. Because of
the nature of the mt transaction processor, the flushCommit operation can be very handy. It effectively commits
all the changes up to the point it was called, without closing or committing the session, making them visible to
other sessions when searching. Note, it also means that operations that were flush committed will not be rolled
back by a rollback operation.
It is important to understand this transaction isolation level in terms of merging done during commit and
indexing time. Lucene might perform some merges during commit time depending on the merge factor
configured using compass.engine.mergeFactor. This is different from the read_committed processor where
no merges are performed during commit time. Possible merges can cause commits to take some time, so one
option is to configure a large merge factor and let the optimizer do its magic (you can configure a different
merge factor for the optimizer).
Another important parameter when using this transaction isolation level is compass.engine.ramBufferSize
(defaults to 16.0 Mb) which replaces the max buffered docs parameter and controls the amount of transactional
data stored in memory. Larger values will yield better performance and it is best to allocate as much as
possible.
When using the search / read API with the mt transaction processor, it basically acts in a similar manner to the
lucene transaction processor, meaning that search operations will see the latest version of the index, without the
current changes done during the transaction. There is an exception to this rule: if the flushCommit
operation was called, and the internal cache of Compass was invalidated (either asynchronously or explicitly by
calling the refresh or clear cache API), then the search results will reflect the changes done up to the point
when flushCommit was called.
Here is how the transaction isolation level can be configured to be used as the default one:
<compass name="default">
<connection>
<file path="target/test-index" />
</connection>
<transaction processor="mt" />
</compass>
compass.engine.connection=target/test-index
compass.transaction.processor=mt
Using the mt transaction processor can also be done in runtime using the following code:
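A minimal sketch, assuming the session level settings API:
CompassSession session = compass.openSession();
// allow this session to be safely shared across multiple indexing threads
session.getSettings().setSetting("compass.transaction.processor", "mt");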
5.7.2.5. search
An optimized transaction processor that only provides search capabilities. The transaction processor is
automatically used when setting the session to read only (using CompassSession#setReadOnly()), or when
opening a search session (using Compass#openSearchSession()).
5.7.2.6. tc (Terracotta)
Compass supports master / worker like processing of transactions using Terracotta. For more information, see
the Compass Terracotta integration documentation.
5.7.2.7. Custom
Implementing a custom transaction processor is very simple. Compass provides a simple API and simple base
classes that should make implementing one a snap. For more information on how to do it, please check the
currently implemented transaction processors and ask questions on the forum. Proper documentation will arrive
soon.
5.8. All Property
The all property provides advanced features, such as using the declared mappings of given properties. For example, if a
property is marked with a certain analyzer, that analyzer will be used to add the property to the all property. If it
is untokenized, it will be added without analyzing it. If it is configured with a certain boost value, that part of
the all property, when "hit", will result in a higher ranking of the result.
The all property allows for global configuration and per mapping configuration. The global configuration
allows disabling the all feature completely (compass.property.all.enabled=false). It allows excluding the
alias from the all property (compass.property.all.excludeAlias=true), and can set the term vector for the
all property (compass.property.all.termVector=yes, for example).
The per mapping definitions allow to configure the above settings on a mapping level (they override the global
ones). They are included in an all tag that should be the first one within the different mappings. Here is an
example for OSEM:
<compass-core-mapping>
<[mapping] alias="test-alias">
<all enable="true" exclude-alias="true" term-vector="yes" omit-norms="yes" />
</[mapping]>
</compass-core-mapping>
5.9. Sub Index Hashing
In the above diagram, A, B, C, and D represent aliases, which in turn stand for the mapping definitions of the
searchable content. A1, B2, and so on, are actual instances of the mentioned searchable content. The diagram
shows the different options of mapping searchable content into different sub indexes.
The simplest way to map aliases (standing for the mapping definitions of searchable content) is by mapping all
of their searchable content instances into the same sub index. Defining how searchable content maps to the
search engine (OSEM/XSEM/RSEM) is done within the respective mapping definitions. There are two ways
to define a constant mapping to a sub index; the first one (which is simpler) is:
<compass-core-mapping>
<[mapping] alias="test-alias" sub-index="test-subindex">
<!-- ... -->
</[mapping]>
</compass-core-mapping>
The mentioned [mapping] that is represented by the alias test-alias will map all its instances to
test-subindex. Note, if sub-index is not defined, it will default to the alias value.
Another option, which probably will not be used to define constant sub index hashing, but shown here for
completeness, is by specifying the constant implementation of SubIndexHash within the mapping definition
(explained in details later in this section):
<compass-core-mapping>
<[mapping] alias="test-alias">
<sub-index-hash type="org.compass.core.engine.subindex.ConstantSubIndexHash">
<setting name="subIndex" value="test-subindex" />
</sub-index-hash>
<!-- ... -->
</[mapping]>
</compass-core-mapping>
Here is an example of how three different aliases: A, B and C can be mapped using constant sub index hashing:
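A sketch of such a mapping (the sub index names are illustrative): A and B share one sub index, while C has its own:
<compass-core-mapping>
<[mapping] alias="A" sub-index="index1">
<!-- ... -->
</[mapping]>
<[mapping] alias="B" sub-index="index1">
<!-- ... -->
</[mapping]>
<[mapping] alias="C" sub-index="index2">
<!-- ... -->
</[mapping]>
</compass-core-mapping>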
Constant sub index hashing allows mapping an alias (and all the searchable instances it represents) into the same
sub index. The modulo sub index hashing allows partitioning an alias into several sub indexes. The
partitioning is done by hashing the alias value with all the string values of the searchable content ids, and then
applying the modulo operation against a specified size. It also allows setting a constant prefix for the generated sub
index value. This is shown in the following diagram:
Here, A1, A2 and A3 represent different instances of alias A (let it be a mapped Java class in OSEM, a
Resource in RSEM, or an XmlObject in XSEM), with a single id mapping with the value of 1, 2, and 3. A
modulo hashing is configured with a prefix of test, and a size of 2. This resulted in the creation of 2 sub
indexes, called test_0 and test_1. Based on the hashing function (the alias String hash code and the different
ids string hash code), instances of A will be directed to their respective sub index. Here is how A alias would be
configured:
<compass-core-mapping>
<[mapping] alias="A">
<sub-index-hash type="org.compass.core.engine.subindex.ModuloSubIndexHash">
<setting name="prefix" value="test" />
<setting name="size" value="2" />
</sub-index-hash>
<!-- ... -->
</[mapping]>
</compass-core-mapping>
Naturally, more than one mapping definition can map to the same sub indexes using the same modulo
configuration:
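A sketch following the pattern of the previous example, with both A and B partitioned across the shared test_0 and test_1 sub indexes:
<compass-core-mapping>
<[mapping] alias="A">
<sub-index-hash type="org.compass.core.engine.subindex.ModuloSubIndexHash">
<setting name="prefix" value="test" />
<setting name="size" value="2" />
</sub-index-hash>
</[mapping]>
<[mapping] alias="B">
<sub-index-hash type="org.compass.core.engine.subindex.ModuloSubIndexHash">
<setting name="prefix" value="test" />
<setting name="size" value="2" />
</sub-index-hash>
</[mapping]>
</compass-core-mapping>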
An implementation of SubIndexHash must provide two operations. The first, getSubIndexes, must return all
the possible sub indexes the sub index hash implementation can produce. The second, mapSubIndex(String
alias, Property[] ids) uses the provided aliases and ids in order to compute the given sub index. If the sub
index hash implementation also implements the CompassConfigurable interface, different settings can be
injected to it. Here is an example of a mapping definition with custom sub index hash implementation:
<compass-core-mapping>
<[mapping] alias="A">
<sub-index-hash type="eg.MySubIndexHash">
<setting name="param1" value="value1" />
<setting name="param2" value="value2" />
</sub-index-hash>
<!-- ... -->
</[mapping]>
</compass-core-mapping>
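A sketch of what eg.MySubIndexHash might look like; the operation signatures follow the description above, the settings injection assumes CompassConfigurable, and the hashing logic itself is illustrative:

package eg;

import org.compass.core.CompassException;
import org.compass.core.Property;
import org.compass.core.config.CompassConfigurable;
import org.compass.core.config.CompassSettings;
import org.compass.core.engine.SearchEngineException;
import org.compass.core.engine.subindex.SubIndexHash;

public class MySubIndexHash implements SubIndexHash, CompassConfigurable {

    private String param1;
    private String param2; // unused in this sketch, shown for the injection only

    // called by Compass with the <setting> elements from the mapping definition
    public void configure(CompassSettings settings) throws CompassException {
        param1 = settings.getSetting("param1");
        param2 = settings.getSetting("param2");
    }

    // all the possible sub indexes this implementation can produce
    public String[] getSubIndexes() {
        return new String[]{param1 + "_0", param1 + "_1"};
    }

    // compute the sub index for the given alias and id properties
    public String mapSubIndex(String alias, Property[] ids) throws SearchEngineException {
        int hash = alias.hashCode();
        for (Property id : ids) {
            hash = hash * 31 + id.getStringValue().hashCode();
        }
        return param1 + "_" + (Math.abs(hash) % 2);
    }
}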
5.10. Optimizers
As mentioned in the read_committed section, every dirty transaction that is committed successfully creates
another segment in the respective sub index. The more segments the index has, the slower fetching
operations become. That's why it is important to keep the index optimized, with a controlled number of
segments. We do this by merging small segments into larger segments.
The optimization process works on a sub index level, performing the optimization for each one. During the
optimization process, optimizers will lock the sub index for dirty operations (only if optimization is required).
This causes a tradeoff between having an optimized index, and spending less time on the optimization process
in order to allow for other dirty operations.
Compass comes with a single default optimizer implementation. The default optimizer will try and maintain no
more than a configured maximum number of segments (defaults to 10). This applies when using the
optimize() and optimize(subIndex) API, as well as when the optimizer is scheduled to run periodically
(which is the default). It also provides support for specific, API level, optimization with a provided maximum
number of segments.
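For example, invoking the optimizer through the API (a sketch; it assumes the optimizer is obtained from Compass using getSearchEngineOptimizer()):
// optimize all sub indexes down to the configured maximum number of segments
compass.getSearchEngineOptimizer().optimize();
// optimize only a single sub index ("author" is a hypothetical sub index name)
compass.getSearchEngineOptimizer().optimize("author");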
Here is an example of configuring the default optimizer to run with a maximum of 20 segments,
working in a scheduled manner with an interval of 90 seconds (the default is 10 seconds):
<compass name="default">
<connection>
<file path="target/test-index" />
</connection>
<searchEngine>
<optimizer scheduleInterval="90" schedule="true" maxNumberOfSegments="20" />
</searchEngine>
</compass>
compass.engine.connection=target/test-index
compass.engine.optimizer.schedule=true
compass.engine.optimizer.schedule.period=90
compass.engine.optimizer.maxNumberOfSegments=20
5.11. Merge
Lucene performs merges of different segments after certain operations are done on the index. The fewer merges
you have, the faster searching is. The more merges you do, the slower certain operations will be. Compass
allows fine control over when merges occur. This depends greatly on the transaction isolation level and
the optimizer used, and how they are configured.
Merge policy controls which merges are supposed to happen for a certain index. Compass allows simple
configuration of the two merge policies that come with Lucene, the LogByteSize (the default) and LogDoc, as well as
configuring custom implementations. Configuring the type can be done using
compass.engine.merge.policy.type, which has possible values of logbytesize, logdoc, or the fully qualified
class name of a MergePolicyProvider.
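For example, using the settings based configuration:
compass.engine.merge.policy.type=logbytesize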
Merge scheduler controls how merge operations are executed once a merge is needed. Lucene comes with a built in
ConcurrentMergeScheduler (executes merges concurrently on newly created threads) and a
SerialMergeScheduler that executes the merge operations on the same thread. Compass extends Lucene and
provides an ExecutorMergeScheduler, allowing to utilize the Compass internal executor pool (either concurrent or
work manager backed) with no overhead of creating new threads. This is the default merge scheduler that
comes with Compass.
Configuring the type of the merge scheduler can be done using compass.engine.merge.scheduler.type with
the following possible values: executor (the default), concurrent (the Lucene concurrent merge scheduler), and
serial (the Lucene serial merge scheduler). It can also be the fully qualified name of an implementation of
MergeSchedulerProvider.
5.12. Index Deletion Policy
Here is an example of configuring the index deletion policy, using the schema based configuration, to keep only
the last commit point:
<compass name="default">
<connection>
<file path="target/test-index" />
</connection>
<searchEngine>
<indexDeletionPolicy>
<keepLastCommit />
</indexDeletionPolicy>
</searchEngine>
</compass>
And here is the same configuration using the settings based configuration:
<compass name="default">
<connection>
<file path="target/test-index" />
</connection>
<settings>
<setting name="compass.engine.store.indexDeletionPolicy.type" value="keeplastcommit" />
</settings>
</compass>
Compass comes built in with several additional deletion policies: keepall, which keeps all commit
points; keeplastn, which keeps the last N commit points; and expirationtime, which keeps commit points for X
seconds (with a default expiration time of "cache invalidation interval * 3").
By default, the index deletion policy is controlled by the actual index storage. For most (ram, file) the deletion
policy is to keep the last commit (which should be changed when working over a shared disk). For distributed ones
(such as coherence, gigaspaces, terracotta), the index deletion policy is the expiration time one.
5.13. Spell Check / Did You Mean
Compass comes with spell check ("did you mean") support, allowing to suggest queries based on the index
content. In order to enable it, the following setting should be used:
compass.engine.spellcheck.enable=true
Once spell check is enabled, a special spell check index will be built based on the "all" property (more on that
later). It can then be used in the following simple manner:
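A minimal sketch of such usage; it assumes the suggested query is exposed on CompassHits via getSuggestedQuery():
CompassHits hits = session.find("jack londn");
if (hits.getSuggestedQuery().isSuggested()) {
    // for example: "did you mean: jack london?"
    System.out.println("did you mean: " + hits.getSuggestedQuery() + "?");
}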
In order to perform spell check index level operations, Compass now exposes a getSpellCheckManager()
method. Note, this method will return null in case spell check is disabled. The spell check manager also
allows getting suggestions for a given word.
By default, when the spell check index is enabled, two scheduled tasks kick in. The first scheduled task is
responsible for monitoring the spell check index, and if it has changed (for example, by a different Compass
instance), will reload the latest changes into the index. The interval for this scheduled task can be controlled
using the setting compass.engine.cacheIntervalInvalidation (which is used by Compass for the actual
index as well), and defaults to 5 seconds (it is set in milliseconds).
The second scheduler is responsible for identifying that the actual index has changed, and rebuilding the spell
check index for the relevant sub indexes that were changed. It is important to understand that the spell check
index will not be updated when operations are performed against the actual index. It will only be updated if
explicitly called for rebuild or concurrentRebuild using the Spell Check Manager, or through the scheduler
(which calls the same methods). By default, the scheduler will run every 10 minutes (no sense in rebuilding the
spell check index very often), and can be controlled using the following setting:
compass.engine.spellcheck.scheduleInterval (resolution in seconds).
Compass by default will build a spell index using the same configured index storage simply under a different
"sub context" name called spellcheck (the compass index is built under sub context index). For each sub
index in Compass, a spell check sub index will be created. By default, a scheduler will kick in (by default each
10 minutes) and will check if the spell index needs to be rebuilt, and if it does, it will rebuild it. The spell check
manager also exposes API in order to perform the rebuild operations as well as checking if the spell index
needs to be rebuilt. Here is an example of how the scheduler can be configured:
compass.engine.spellcheck.enable=true
# the default it true, just showing the setting
compass.engine.spellcheck.schedule=true
# the schedule, in minutes (defaults to 10)
compass.engine.spellcheck.scheduleInterval=10
The spell check index can be configured to be stored in a different location than the Compass index. Any index
related parameters can be set as well. Here is an example (useful, for instance, when the index is stored in the
database and the spell check index should be stored on the file system):
compass.engine.spellcheck.enable=true
compass.engine.spellcheck.engine.connection=file://target/spellindex
compass.engine.spellcheck.engine.ramBufferSize=40
In the above example we also configure the indexing process of the spell check index to use more memory (40)
so that the indexing process will be faster. As seen here, settings that control the index (compass.engine.
settings) can be applied to the spell check index by prepending compass.engine.spellcheck to the
setting.
So, what is actually included in the spell check index? Out of the box, by just enabling spell check, the all
field is used to get the terms for the spell check index. In this case, things that are excluded from the
all field will be excluded from the spell check index as well.
Compass allows for great flexibility in what is going to be included or excluded in the spell check index. The
first two important settings are: compass.engine.spellcheck.defaultMode and the spell-check resource
mapping level definition (for class/resource/xml-object). By default, both are set to NA, which results in
including the all property. The all property can be excluded by setting the spell-check to exclude on the all
mapping definition.
Each resource mapping (resource/class/xml-object) can have a spell-check definition of include, exclude,
and na. If set to na, the global default mode will be used for it (which can be set to include, exclude and na as
well).
When the resource mapping ends up with spell-check of include, it will automatically include all the
properties for the given mapping, except for the "all" property. Properties can be excluded by specifically
setting their respective spell-check to exclude.
When the resource mapping ends up with spell-check of exclude, it will automatically exclude all the
properties for the given mapping, as well as the "all" property. Properties can be included by specifically setting
their respective spell-check to include.
If you wish to know which properties end up being included for a certain sub index, turn on the debug logging
level for org.compass.core.lucene.engine.spellcheck.DefaultLuceneSpellCheckManager and it will print
out the list of properties that will be used for each sub index.
5.14. Direct Lucene Access
5.14.1. Wrappers
Compass wraps some Lucene classes, like Query and Filter. There are cases where a Compass wrapper needs
to be created out of an actual Lucene class, or an actual Lucene class needs to be accessed from within a wrapper.
Here is an example of wrapping a custom implementation of a Lucene Query with a CompassQuery:
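A sketch of such wrapping, assuming the LuceneHelper utility class (org.compass.core.lucene.util.LuceneHelper) with its createCompassQuery method; "session" is an open CompassSession, and "title" is a hypothetical property name:
Query luceneQuery = new TermQuery(new Term("title", "compass"));
CompassQuery compassQuery = LuceneHelper.createCompassQuery(session, luceneQuery);
CompassHits hits = compassQuery.hits();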
The next sample shows how to get a Lucene Explanation, which is useful to understand how a query works and
executes:
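A sketch for obtaining the Explanation; it assumes LuceneHelper exposes the underlying Lucene hits (the exact accessor name may differ):
CompassHits hits = session.find("london");
// explain how the score of the first hit was computed
Explanation explanation = LuceneHelper.getLuceneSearchEngineHits(hits).explain(0);
System.out.println(explanation.toString());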
When performing read operations against the index, most of the time the Compass abstraction layer is enough.
Sometimes, direct access to Lucene's own IndexReader and Searcher is required. Here is an example of using
the reader to get all the available terms for the category property name (note, this is a prime candidate for
future inclusion as part of the Compass API):
// a reconstruction: the IndexReader can be obtained, for example, through
// LuceneHelper.getLuceneInternalSearch(session).getReader()
TermEnum termEnum = indexReader.terms(new Term("category", ""));
try {
    while (termEnum.term() != null && "category".equals(termEnum.term().field())) {
        // termEnum.term().text() holds the current term value of "category"
        if (!termEnum.next()) {
            break;
        }
    }
} finally {
    termEnum.close();
}
6.1. Introduction
Compass provides the ability to map Java Objects to the underlying Search Engine using Java 5 Annotations or
simple XML mapping files. We call this technology OSEM (Object Search Engine Mapping). OSEM provides
a rich syntax for describing Object attributes and relationships. The OSEM files/annotations are used by
Compass to extract the required properties from the Object model at run-time and to insert the required
meta-data into the Search Engine index.
The process of saving an Object into the search engine is called marshaling, and the process of retrieving an
object from the search engine is called un-marshaling. As described in Section 5.2, “Alias, Resource and
Property”, Compass uses Resources when working against a search engine, and OSEM is the process of
marshaling and un-marshaling an Object tree to a Resource (for simplicity, think of a Resource as a Map).
6.2. Searchable Classes
The following example defines a searchable Author class using annotations:
import java.util.Date;
import java.util.Set;
@Searchable
@SearchableConstant(name = "type", values = {"person", "author"})
public class Author {
private Long id; // identifier
private String name;
private Date birthday;
@SearchableId
public Long getId() {
return this.id;
}
@SearchableProperty(name = "name")
@SearchableMetaData(name = "authorName")
public String getName() {
return this.name;
}
@SearchableProperty(format = "yyyy-MM-dd")
public Date getBirthday() {
return this.birthday;
}
}
The Author class is mapped using Java 5 annotations. The following shows how to map the same class using XML:
<?xml version="1.0"?>
<!DOCTYPE compass-core-mapping PUBLIC
"-//Compass/Compass Core Mapping DTD 2.2//EN"
"http://www.compass-project.org/dtd/compass-core-mapping-2.2.dtd">
<compass-core-mapping package="eg">
<class name="Author" alias="author">
<id name="id" />
<constant>
<meta-data>type</meta-data>
<meta-data-value>person</meta-data-value>
<meta-data-value>author</meta-data-value>
</constant>
<property name="name">
<meta-data>name</meta-data>
<meta-data>authorName</meta-data>
</property>
<property name="birthday">
<meta-data format="yyyy-MM-dd">birthday</meta-data>
</property>
</class>
</compass-core-mapping>
Last, the mapping can also be configured using JSON, here is the same mapping definition using JSON:
{
"compass-core-mapping" : {
package : "eg",
class : [
{
name : "Author",
alias : "author",
id : { name : "id" },
constant : {
"meta-data" : { name : "type" },
"meta-data-value" : [ "person", "author" ]
},
property : [
{ name : "name", "meta-data" : [{ name : "name" }, { name : "authorName"}]},
{ name : "birthday", "meta-data" : { name : "birthday", format : "yyyy-MM-dd" }}
]
}
]
}
}
Compass works non-intrusively with application Objects; these Objects must follow several rules:
• Implement a Default Constructor: Author has an implicit default (no-argument) constructor. All persistent
classes must have a default constructor (which may be non-public) so Compass::Core can instantiate them using
Constructor.newInstance().
• Provide Property Identifier(s): OSEM requires that a root searchable Object define one or more
properties (JavaBean properties) that identify the class.
• Declare Accessors and Mutators (Optional): Even though Compass can directly persist instance variables, it
is usually better to decouple this implementation detail from the Search Engine mechanism. Compass
recognizes JavaBean style properties (getFoo, isFoo, and setFoo). This mechanism works with any level of
visibility.
• It is recommended to override the equals() and hashCode() methods if you intend to mix objects of
persistent classes (e.g. in a Set). You can implement them by using the identifiers of both objects, but note that
Compass works best with surrogate identifiers (and will provide a way to automatically generate them), thus
it is best to implement the methods using business keys.
The above example defines the mapping for Author class. It introduces some key Compass mapping concepts
and syntax. Before explaining the concepts, it is essential that the terminology used is clearly understood.
The first issue to address is the usage of the term Property. Because of its common usage as a concept in both Java
and Compass (to express Search Engine and Semantic terminology), special care has been taken to clearly
prefix the meaning. A class property refers to a Java class attribute. A resource property refers in Compass to
Search Engine meta-data, which contains the value of the mapped class property. In the previous OSEM
example, the value of the class property "name" is mapped to two resource property instances called "name" and
"authorName", each containing the value of the class property "name".
6.2.1. Alias
Each mapping definition in Compass is registered under an alias. The alias is used as the link between the
OSEM definitions of a class, and the class itself. The alias can then be used to reference the mapping, both in
other mapping definitions and when working directly with Compass API. When using annotations mappings,
the alias defaults to the short class name.
6.2.2. Root
There are two types of searchable classes in Compass: root searchable classes and non-root searchable classes.
Root searchable classes are best defined as classes that are returned as hits when a search is performed. For example,
in a scenario where we have a Customer class with a Name class, Customer will be a root searchable class,
and Name would have root="false" since it does not "stand on its own". Another way of looking at root
searchable classes is as searchable classes that end up marshaled into their own Resource (which is then used to
work against the search engine).
By default, each root searchable class will have its own sub index defaulting to the alias name. The sub index
name can be controlled, allowing to join several root searchable classes into the same sub index, or using
different sub index hashing functions. Please read Section 5.9, “Sub Index Hashing” for more information.
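For example, using annotations (this sketch assumes the subIndex attribute of the Searchable annotation):
@Searchable(subIndex = "authors")
public class Author {
    @SearchableId
    private Long id;
    // several root searchable classes can use the same subIndex value
    // in order to share a single sub index
}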
Each root searchable class must define at least one searchable id. The searchable id(s) are used to uniquely
identify the object within its alias context. More than one searchable id can be defined, and user defined
classes can act as searchable ids as well (they must register their own converter or use a searchable id component
mapping).
Searchable Id does not require the definition of a searchable meta-data. If none is defined, Compass will
automatically create an internal meta-data id (explained later), which most times is perfectly fine (usually, text
searching based on the surrogate id is not required). If the searchable id does need to be searched, a searchable
meta-data needs to be defined for it. When using xml mapping, one or more meta-data elements need to be added
to the id element. When using annotations, there are three options: the first is to provide a name for the
SearchableId (Compass will automatically act as if a SearchableMetaData was defined on the SearchableId and
add it), the second is to add a SearchableMetaData annotation, and the last is to add a SearchableMetaDatas
annotation (for multiple meta-datas). Of course, all three can be combined. The reason why SearchableId
automatically creates a SearchableMetaData when a name is provided is to reduce the number of annotations
required (and not get to annotation hell).
Here is an example of defining a Searchable Id using annotations. This example will not create any visible
meta-data (as the SearchableId has no name to it, or SearchableMetaData(s) annotation).
@Searchable
public class Author {
@SearchableId
private Long id;
// ...
}
The following is another example, now with actually defining a meta-data on the id for its values to be
searchable:
@Searchable
public class Author {
@SearchableId(name = "id")
private Long id;
// ...
}
Which is the same as defining the following mapping using SearchableMetaData explicitly:
@Searchable
public class Author {
@SearchableId
@SearchableMetaData(name = "id")
private Long id;
// ...
}
A searchable id component represents a composite object acting as the id of an object. It works in a similar
manner to a searchable component, except that it acts as the id of the class.
Here is an example of defining a Searchable Id Component using annotations (note, in this case, B is not a root
searchable class, and it needs to define only ids):
@Searchable
public class A {
@SearchableIdComponent
private B b;
// ...
}
@Searchable(root = false)
public class B {
@SearchableId
private long id1;
@SearchableId
private long id2;
}
Searchable Parent mapping provides support for cyclic mappings for components (though bi-directional
component mappings are also supported). If the component class mapping wishes to map the enclosing class, the
parent mapping can be used. The parent mapping will not marshal (persist to the search
engine) the parent object; it will only initialize it when loading the parent object from the search engine.
Here is an example of defining a Searchable Component and Searchable Parent using annotations (note, in this
case, B is not a root searchable class, and need not define any ids):
@Searchable
public class A {
@SearchableId
private Long id;
@SearchableComponent
private B b;
// ...
}
@Searchable(root = false)
public class B {
@SearchableParent
private A a;
// ...
}
A Searchable Property maps a Class attribute/property of a simple type. The searchable property
maps to a class attribute that ends up as a String within the search engine. This includes primitive types,
primitive wrapper types, java.util.Date, java.util.Calendar and many more types that are automatically
supported by Compass (please see the converter section). A user defined type can be used as well with a
custom converter (though most times, a component relationship is more suitable - explained later). A Searchable
Meta Data takes the Searchable Property value (a String value converted using its registered converter) and stores
it in the index against a name.
When using xml mapping, one or more meta-data elements can be defined for a property mapping. When using
annotations, a SearchableProperty needs to be defined on the mapped class attribute. A SearchableMetaData
annotation can be explicitly defined, as well as SearchableMetaDatas (for multiple meta data). A
SearchableProperty will automatically create a SearchableMetaData (in order not to get annotation hell) if no
SearchableMetaData(s) annotation is defined, or if its name is explicitly defined (note, all the
SearchableMetaData options are also available on the SearchableProperty; they apply to the automatically
created SearchableMetaData).
Here is an example of defining a Searchable Property using annotations. This example will automatically create
a Searchable Meta Data with the name of value (the class field name).
@Searchable
public class Author {
// ...
@SearchableProperty
private String value;
// ...
}
This mapping is the same as defining the following annotation using SearchableMetaData explicitly:
@Searchable
public class Author {
// ...
@SearchableProperty
@SearchableMetaData(name = "value")
private String value;
// ...
}
A Searchable Dynamic Property maps a Class attribute/property that has both its Resource Property name and
value evaluated dynamically (the Searchable Property mapping only allows the value to be evaluated, the name
being a constant). It is mainly used with two kinds of types: user defined classes (and arrays/collections of
them), and java.util.Map.
Note: The Searchable Dynamic Property does not support unmarshalling (due to the overhead in terms of data
to store in the index for correct unmarshalling). It means that when loading the object from the index, the
property will be null. Note, this might be supported in the future, but with an explicit setting for it.
A user defined class marks one class attribute/property to be used as the Resource Property name, and another
to be used as the Resource Property value. Here is an example:
public class Tag {
@SearchableDynamicName
String name;
@SearchableDynamicValue
String value;
}
@Searchable
public class Book {
// ....
@SearchableDynamicProperty
List<Tag> tags;
}
The tags dynamic property will automatically scan the Tag class for @SearchableDynamicName and
@SearchableDynamicValue in order to find the class properties that will be used as the Resource Property name
and Resource Property value. The annotations on the Tag class are not required; the class property names
can also be set on the SearchableDynamicProperty itself. Here is an example of an xml based mapping that
overrides the annotations defined on the Tag class:
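A minimal sketch, assuming a dynamic-property element whose name-property/value-property attributes mirror the annotation attributes documented in this section (the book alias is illustrative):
<class name="Book" alias="book">
<id name="id" />
<dynamic-property name="tags" name-property="name" value-property="value" />
</class>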
Both the value and the name support formatting configuration, as well as custom converter lookup name
setting. The dynamic property can be of the user type, as well as a collection or array of it. The value itself can
be of simple type, or a collection/array of it (in which case there will be several dynamic values against the
dynamic name).
java.util.Map is also supported, allowing for Resource Property name to be derived from the Map key, and
the Resource Property Value to be derived from the Map value. Simple Map key/value types are supported
(with formatting and custom converters), as well as custom user classes for either the Map key or the value
(with explicit setting of the class attribute/property to use, either using name-property/value-property or using
annotations).
Here is a simple example of dynamic property mapping a java.util.Map (with formattable values):
@Searchable
public class Book {
// ....
@SearchableDynamicProperty(valueFormat = "00000")
Map<String, Integer> tags;
}
Here is a more complex Map example, with two custom classes, one for the key, and one for the value. From
the key, we use the value1 property as the Resource Property name, and from the value we use value2 as the
Resource Property value:
@Searchable
public class Book {
// ....
@SearchableDynamicProperty(nameProperty = "value1", valueProperty = "value2")
Map<Key, Value> tags;
}
Searchable Constant allows defining constant meta data associated with a searchable class: a list of values set
against a constant name. This is useful for adding static meta-data to a Searchable Class, allowing you to
create semantic groups across the searchable classes.
@Searchable
@SearchableConstant(name = "type", values = {"person", "author"})
public class Author {
}
The dynamic meta data mapping allows defining meta-data saved into the search engine as a result of
evaluating an expression. The mapping does not map to any class property and acts as a synthetic meta-data
(similar to the constant mapping). The value of the dynamic meta-data tag is the expression evaluated by a
Dynamic Converter. Compass comes with several built in dynamic converters: el (Jakarta commons el), jexl
(Jakarta commons jexl), velocity, ognl, mvel, and groovy. When defining the expression, the root class is
registered under the data key (for libraries that require it).
Here is an example of how to define a searchable dynamic meta-data (with jakarta commons jexl) using
annotations (assuming class A has value1 and value2 as class fields):
@Searchable
@SearchableDynamicMetaData(name = "test", expression = "data.value1 + data.value2", converter = "jexl")
public class A {
}
A searchable reference mapping maps a relationship between one root searchable class and another. The
mapping is only used for keeping the relationship "alive" when performing un-marshalling. The marshalling
process marshals only the referenced object ids (based on its id mappings) and uses them later in the
un-marshalling process to load the referenced object from the index.
Cascading is supported when using reference mappings. Cascading can be configured to cascade any
combination of create/save/delete operations, or all of them. By default, no cascading will be performed on the
referenced object.
In order to identify the referenced class mapping, Compass needs access to its class mapping definition. In most
cases there is no need to define the referenced alias that defines the class mapping, as Compass can
automatically detect it. If required, it can be explicitly set on the reference mapping (an example of when
Compass needs this is when using a Collection without generics, or when a class has more than one class
mapping).
Lazy loading is supported when using reference mapping over a collection (either a Set or a List). In such a
case, during the un-marshalling process, a lazy collection will be created that will load referenced objects on
demand (using the current session). Any dirty operation on the collection (such as add or remove) will cause
Compass to load the whole collection. In order to enable lazy loading, the lazy attribute should be set to true. A
global setting, compass.osem.lazyReference, controls the lazy nature of all reference collection mappings that
do not set it explicitly. It defaults to false.
@Searchable
public class A {
@SearchableId
private Long id;
@SearchableReference
private B b;
// ...
}
@Searchable
public class B {
@SearchableId
private Long id;
// ...
}
A searchable component mapping embeds a searchable class within its owning searchable class. The mapping
is used to allow searches that "hit" the component referenced searchable class to return the owning searchable
class (or its parent if it also acts as a component mapping, up to the root object that was saved).
The component referenced searchable class can be either root or not. An example for a non root component can
be a Person class (which is root) with a component mapping to a non root searchable class Name (with
firstName and lastName fields). An example for a root component can be a Customer root searchable class and
an Account searchable class, where when searching for account details, both Account and Customer should
return as hits.
Cascading is supported when using component mappings. Cascading can be configured to cascade any
combination of create/save/delete operations, or all of them. By default, no cascading will be performed on the
referenced object. Cascading can be performed on non root objects as well, which means that a non root object
can be "created/saved/deleted" in Compass (using save operation) and Compass will only cascade the operation
on its referenced objects without actually performing the operation on the non root object.
In order to identify the referenced component class mapping, Compass needs access to its class mapping
definition. In most cases there is no need to define the referenced alias that defines the class mapping, as
Compass can automatically detect it. If required, it can be explicitly set on the component mapping (an
example of when Compass needs this is when using a Collection without generics, or when a class has more
than one class mapping).
Here is an example of defining a Searchable Component using annotations (note, in this case, B is not a root
searchable class, and need not define any ids):
@Searchable
public class A {
@SearchableId
private Long id;
@SearchableComponent
private B b;
// ...
}
@Searchable(root = false)
public class B {
// ...
}
Many times, the same class mapping acts as a component mapping for several fields within the same class, and
the searchable property names need to be distinguished between the properties. The Searchable Component
prefix mapping attribute can be used for exactly that. Here is an example of prefix mappings:
@Searchable(root = false)
public class Address {
@SearchableId
int id;
@SearchableProperty
String location;
}
@Searchable(root = false)
public class Customer {
@SearchableId
int id;
@SearchableProperty
String name;
@SearchableComponent(prefix = "home_")
Address homeAddress;
@SearchableComponent(prefix = "work_")
Address workAddress;
}
@Searchable
public class Order {
@SearchableId
int id;
@SearchableComponent(prefix = "first_")
Customer firstCustomer;
@SearchableComponent(prefix = "second_")
Customer secondCustomer;
}
In the above case, if we save a fully constructed Order object, the location of the home address of the first
customer will be first_home_location. This means that we can search for
first_home_location:mylocation (which is equivalent to
Order.firstCustomer.homeAddress.location:mylocation).
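For illustration, such a search might be issued as follows (a minimal sketch, assuming an open CompassSession named session):
CompassHits hits = session.find("first_home_location:mylocation");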
The searchable cascading mapping allows defining cascading operations on certain properties without
explicitly using component/reference/parent mappings (which have a cascading option on them). Cascading
results in a certain operation (save/delete/create) being cascaded to and performed on the referenced objects.
@Searchable
public class A {
@SearchableId
private Long id;
@SearchableCascading(cascade = {Cascade.ALL})
private B b;
// ...
}
The searchable analyzer mapping dynamically controls the analyzer that will be used when indexing the class
data. If the mapping is defined, it will override the class mapping analyzer attribute setting.
If, for example, Compass is configured with two additional analyzers, one called an1 (with settings in the
form of compass.engine.analyzer.an1.*) and another called an2, the values that the searchable analyzer
property can hold are: default (an internal Compass analyzer, which can be configured as well), an1 and an2. If
the analyzer property may have a null value, and that is valid for the application, a null-analyzer can be
configured that will be used in that case. If the class property has a value, but there is no matching analyzer, an
exception will be thrown.
@Searchable
public class A {
@SearchableId
private Long id;
@SearchableAnalyzer
private String language;
// ...
}
The searchable boost mapping dynamically controls the boost value associated with the Resource stored. If the
mapping is defined, it will override the class mapping boost attribute setting. The value of the property should
be convertible to a float value.
@Searchable
public class A {
@SearchableId
private Long id;
@SearchableBoost(defaultValue = 2.0f)
private Float value;
// ...
}
6.4. Specifics
6.4.1. Collections
Collection (java.util.Collection) based types can be mapped using Searchable Property, Searchable Component
and Searchable Reference. The same mapping declaration should be used, with Compass automatically
detecting that a java.util.Collection is being mapped, and applying the mapping definition to the collection
elements.
When mapping a Collection with a Searchable Property, Compass will try to automatically identify the
collection element type if Java 5 Generics are used. If Generics are not used, the class attribute should be set to
the FQN of the element class. With Searchable Component or Reference, Compass will try to automatically
identify the referenced mapping if Generics are used. If Generics are not used, the ref-alias should be explicitly
set.
6.4.2. Managed Id
When marshalling an Object into the search engine, Compass might add internal meta-data ids for certain
Searchable Properties in order to un-marshall it correctly. Here is an example mapping where an internal
meta-data id will be created for the firstName and lastName searchable properties:
@Searchable
public class A {
@SearchableId
private Long id;
@SearchableProperty(name = "name")
private String lastName;
@SearchableProperty(name = "name")
private String firstName;
@SearchableProperty
private String birthdate;
}
In the above mapping we map firstName and lastName into "name". Compass will automatically create internal
meta-data for both firstName and lastName, since without it, Compass would not be able to identify which
value belongs to which property. Compass comes with several strategies for creating internal meta-data:
• AUTO: Compass will automatically identify if a searchable property requires an internal meta-data, and
create one for it.
• TRUE: Compass will always create an internal meta-data id for the searchable property.
• FALSE: Compass will not create an internal meta-data id, and will use the first searchable meta-data as the
searchable property meta-data identifier.
• NO: Compass will not create an internal meta-data id, and will not try to un-marshall this property at all.
• NO_STORE: Compass will not create an internal meta-data id if all of its meta-data mappings have
store="no". Otherwise, it will be treated as AUTO.
Setting the managed id can be done on several levels. It can be set explicitly on the property mapping level. It
can be set on the class mapping level, which will then apply to all the properties that do not set it explicitly.
And it can also be set globally, by setting the compass.osem.managedId setting, which will apply to all the
classes and properties that do not set it explicitly. By default, it is set to NO_STORE.
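For example, the global default can be changed through the configuration (a sketch using programmatic settings; the value shown is illustrative):
CompassConfiguration conf = new CompassConfiguration();
conf.setSetting("compass.osem.managedId", "false");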
There are different strategies for mapping an inheritance tree with Compass. The first applies when the
inheritance tree is known in advance. If we take a simple inheritance of class A and class B that extends it, here
is the annotation mapping that can be used for it:
@Searchable
public class A {
@SearchableId
private Long id;
@SearchableProperty
private String aValue;
}
@Searchable
public class B extends A {
@SearchableProperty
private String bValue;
}
Compass will automatically identify that B extends A, and will include all of A's mapping definitions (note that
Searchable attributes will not be inherited). When using annotations, Compass will automatically interrogate
interfaces as well for possible Searchable annotations, and it is also possible to explicitly define which
mappings to extend using the extend attribute (the mappings to extend need not be annotation driven
mappings).
When using xml mapping definitions, the above inheritance tree can be mapped as follows:
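A minimal sketch, using the class element described in the reference below (the a and b aliases and meta-data names are illustrative):
<class name="A" alias="a">
<id name="id">
<meta-data>id</meta-data>
</id>
<property name="aValue">
<meta-data>aValue</meta-data>
</property>
</class>

<class name="B" alias="b" extends="a">
<property name="bValue">
<meta-data>bValue</meta-data>
</property>
</class>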
When using extends explicitly (as needed when using xml mappings), a comma separated list of aliases to
extend can be provided. All the extended mapping definitions will be inherited, except for class mapping
attributes.
If the inheritance tree is not known in advance, a poly flag should be set on all the known mapped classes of the
inheritance tree. Compass will then be able to persist unknown classes that are part of the mapped inheritance
tree, using the closest searchable mapping definition. Here is an example with three classes: A and B are
searchable classes, with B extending A. C extends B but is not a searchable class, and we would still like to
persist it in the search engine. The following are the annotation mappings for such a relationship:
@Searchable(poly = true)
public class A {
// ...
}
@Searchable(poly = true)
public class B extends A {
// ...
}
When saving an Object of class C, B mapping definitions will be used to map it to the search engine. When
loading it, an instance of class C will be returned, with all of its B level attributes initialized.
Polymorphic relationships are applicable when using component or reference mappings. If we take the following
polymorphic relationship of a Father class to a Child class, with Son and Daughter sub classes, the
component/reference mapping relationship between Father and Child is actually a relationship between Father
and each of Child, Son and Daughter. The following is how to map it using annotations:
@Searchable
public class Father {
// ...
@SearchableComponent
private Child child;
}
@Searchable(poly = true)
public class Child {
// ...
}
@Searchable(poly = true)
public class Son extends Child {
// ...
}
@Searchable(poly = true)
public class Daughter extends Child {
// ...
}
Compass will automatically identify that the Child mapping has Son and Daughter sub mappings, and will add
them to the ref-alias definition of the SearchableComponent (similar to automatically identifying the mapping
of Child). Explicit definition of the referenced aliases can be done by providing a comma separated list of
aliases (this will disable Compass automatic detection of related classes and will only use the provided list).
Note as well that the Child hierarchy had to be defined as poly.
Compass OSEM fully supports cyclic relationships, both for reference and component mappings. Reference
mappings are simple: they are simply defined, and Compass will handle everything if they happen to form a
cyclic relationship.
Bi directional component mappings are simple as well, with Compass automatically identifying the cyclic
relationship. A tree based cyclic relationship is a bit more complex (think of a file system tree like relationship).
In such a case, the depth Compass will traverse with the component mapping is controlled using the max-depth
attribute (defaults to 1).
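For example, a deeper traversal of a tree like structure might be declared as follows (a sketch; the Folder class is illustrative, and maxDepth is assumed to be the annotation counterpart of the max-depth attribute):
@Searchable
public class Folder {
@SearchableId
private Long id;
@SearchableComponent(maxDepth = 2)
private List<Folder> children;
}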
Compass allows Annotations and Xml mapping definitions to be used together. Annotation mappings can
extend/override the usual cpm.xml mapping definitions (even extending xml contract mappings). When using
annotations, a .cpm.ann.xml file can be defined that will override annotation definitions using xml definitions.
Compass adds overhead in terms of memory consumption, processing speed and index size (managed ids)
when it works in a mode that needs to support un-marshalling (i.e. getting objects back from the search
engine). Compass can be configured not to support un-marshalling. In such a mode it will not add any internal
Compass information to the index, and will use less memory. This can be a global setting (set within the
Compass configuration), or set per searchable class definition.
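For example, disabling un-marshalling globally can be done with the compass.osem.supportUnmarshall setting documented later in this chapter (a sketch in properties form):
compass.osem.supportUnmarshall=false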
Though initially this mode may sound unusable, it is important to remember that when working with support
unmarshall set to false, the application can still use Compass Resource level access to the search engine. An
application that works against the database using an ORM tool, for example, might only need Compass to
index its domain model into the search engine and display search results. Displaying search results can be done
using Resources (many times this is done even when unmarshalling is supported). Create/Delete/Update
operations will be done based on ORM fetched objects, and mirrored (either explicitly or implicitly) to the
search engine.
Compass also allows using annotations for certain configuration settings. The annotations are defined at the
package level (package-info.java). Some of the configuration annotations are @SearchAnalyzer,
@SearchAnalyzerFilter, and @SearchConverter. Please see the javadocs for more information.
6.6.1. compass-core-mapping
The main element, which holds all the rest of the mapping definitions.
<compass-core-mapping package="packageName"/>
Attribute Description
package (optional) Specifies a package prefix for unqualified class names in the
mapping document.
6.6.2. class
<class
name="className"
alias="alias"
sub-index="sub index name"
analyzer="name of the analyzer"
root="true|false"
poly="false|true"
poly-class="the class name that will be used to instantiate poly mapping (optional)"
extends="a comma separated list of aliases to extend"
support-unmarshall="true|false"
boost="boost value for the class"
converter="converter lookup name"
>
all?,
sub-index-hash?,
(id)*,
parent?,
(analyzer?),
(boost?),
(property|dynamic-meta-data|component|reference|constant)*
</class>
Attribute Description
name The fully qualified class name (or relative if the package is declared in compass-core-mapping).
alias The alias of the Resource that will be mapped to the class.
sub-index (optional, defaults to the alias value) The name of the sub-index that the alias will map to. When joining several searchable classes into the same index, the search will be much faster, but updates perform locks on the sub index level, so it might slow them down.
analyzer (optional, defaults to the default analyzer) The name of the analyzer that will be used to analyze ANALYZED properties. Defaults to the default analyzer, which is one of the internal analyzers that comes with Compass. Note that when using the analyzer mapping (a child mapping of the class mapping, for a property value that controls the analyzer), the analyzer attribute will have no effect.
root (optional, defaults to true) Specifies if the class is a "root" class or not. You should define the searchable class with false if it only acts as a mapping definition for a component mapping.
poly (optional, defaults to false) Specifies if the class will be enabled to support polymorphism. This is the less preferable way to map an inheritance tree, since the extends attribute can be used to statically extend base classes or contracts.
poly-class (optional) If poly is set to true, the actual class name of the indexed object will be saved to the index as well (to be used later to instantiate the Object). If poly-class is set, the class name will not be saved to the index, and the value of poly-class will be used to instantiate all the classes in the inheritance tree.
extends (optional) A comma separated list of aliases to extend. Can extend a class mapping or a contract mapping. Note that more than one class/contract can be extended.
support-unmarshall (optional) Controls if the searchable class will support unmarshalling from the search engine, or if using Resource is enough. Un-marshalling is the process of converting a raw Resource into the actual domain object. If support unmarshall is enabled, extra information will be stored within the search engine and extra memory will be consumed. Defaults to the Compass global setting compass.osem.supportUnmarshall (which in turn defaults to true).
boost (optional, defaults to 1.0) Specifies the boost level for the class.
converter (optional) The global converter lookup name registered with the configuration. Responsible for converting the ClassMapping definition. Defaults to the Compass internal ClassMappingConverter.
Root classes have their own index within the search engine index directory (by default). Classes with a
dependency on a root class that don't require their own index (i.e. components) should set root to false. You can
control the sub-index that a root class will map to using the sub-index attribute or the sub-index-hash element;
otherwise a sub-index based on the alias name will be created.
The class mapping can extend other class mappings (more than one), as well as contract mappings. All the
mappings that are defined within the extended class or contract mappings will be inherited. You can add to the
inherited mappings by defining the same mappings in the class mapping, except for id mappings, which will be
overridden. Note that any xml attributes (like root, sub-index, ...) that are defined within the extended mappings
are not inherited.
The default behavior of the searchable class supports the "all" feature, which means that Compass will create
an "all" meta-data that represents all the other meta-data (with several exceptions, like Reader class
properties). The name of the "all" meta-data defaults to the Compass setting, but you can also set it using the
all-metadata attribute.
6.6.3. contract
<contract
alias="alias"
>
(id)*,
(analyzer?),
(boost?),
(property|dynamic-meta-data|component|reference|constant)*
</contract>
Attribute Description
alias The alias of the contract. Will be used as the alias name in the class mapping extends attribute.
A contract acts like an interface in the Java language. You can define within it the same mappings that you can
define in a class mapping, without defining the class that it will map to.
If you have several classes that share similar properties, you can define a contract that joins the property
definitions, and then extend the contract within the mapped classes (even if you don't have a concrete interface
or class in your Java definition).
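A minimal sketch (the nameable contract and the property names are illustrative):
<contract alias="nameable">
<property name="name">
<meta-data>name</meta-data>
</property>
</contract>

<class name="Author" alias="author" extends="nameable">
<id name="id">
<meta-data>id</meta-data>
</id>
</class>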
6.6.4. id
Declaring a searchable id class property (a.k.a JavaBean property) of a class using the id element.
<id
name="property name"
accessor="property|field"
boost="boost value for the class property"
class="explicit declaration of the property class"
managed-id="auto|true|false|no|no_store"
managed-id-converter="managed id converter lookup name"
exclude-from-all="no|yes|no_analyzed"
converter="converter lookup name"
>
(meta-data)*
</id>
Attribute Description
name The class property (a.k.a JavaBean property) name, with initial lowercase letter.
accessor (optional, defaults to property) The strategy to access the class property value. property accesses using the Java Bean accessor methods, while field directly accesses the class fields.
boost (optional, defaults to 1.0f) The boost level that will be propagated to all the meta-data defined within the id.
class (optional) An explicit definition of the class of the property, which helps certain converters.
managed-id (optional, defaults to auto) The strategy for creating or using a class property meta-data id (which maps to a ResourceProperty).
managed-id-converter (optional) The global converter lookup name applied to the generated managed id (if generated).
exclude-from-all (optional, defaults to no) Excludes the class property from participating in the "all" meta-data, unless specified at the meta-data level. If set to no_analyzed, not_analyzed properties will be analyzed when added to the all property (the analyzer can be controlled using the analyzer attribute).
converter (optional) The global converter lookup name registered with the configuration.
The id mapping is used to map the class property that identifies the class. You can define several id properties,
even though we recommend using one. You can use the id mapping for all the Java primitive types (i.e. int),
Java primitive wrapper types (i.e. Integer), the String type, and many other custom types, with the only
requirement being that the type can be converted to a single String.
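For example, a single id property can be mapped as follows (a sketch; the names are illustrative):
<class name="Book" alias="book">
<id name="isbn">
<meta-data>isbn</meta-data>
</id>
</class>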
6.6.5. property
Declaring a searchable class property (a.k.a JavaBean property) of a class using the property element.
<property
name="property name"
accessor="property|field"
boost="boost value for the property"
class="explicit declaration of the property class"
analyzer="name of the analyzer"
override="true|false"
managed-id="auto|true|false|no|no_store"
managed-id-index="[compass.managedId.index setting]|no|not_analyzed"
managed-id-converter="managed id converter lookup name"
exclude-from-all="no|yes|no_analyzed"
converter="converter lookup name"
>
(meta-data)*
</property>
Attribute Description
name The class property (a.k.a JavaBean property) name, with initial lowercase letter.
accessor (optional, defaults to property) The strategy to access the class property value. property means accessing using the Java Bean accessor methods, while field directly accesses the class fields.
boost (optional, defaults to 1.0f) The boost level that will be propagated to all the meta-data defined within the class property.
class (optional) An explicit definition of the class of the property, which helps certain converters (especially for java.util.Collection type properties, since it applies to the collection elements).
analyzer (optional, defaults to the class mapping analyzer decision scheme) The name of the analyzer that will be used to analyze ANALYZED meta-data mappings defined for the given property. Defaults to the class mapping analyzer decision scheme based on the analyzer set, or the analyzer mapping property.
override (optional, defaults to true) Whether another definition with the same mapping name will be overridden or added as an additional mapping. Mainly used to override definitions made in extended mappings.
managed-id (optional, defaults to auto) The strategy for creating or using a class property meta-data id (which maps to a ResourceProperty).
managed-id-index (optional, defaults to the compass.managedId.index setting, which defaults to no) Can be either not_analyzed or no. It is the index setting that will be used when creating an internal managed id for a class property mapping (unless it is a property id, in which case it will always be not_analyzed).
managed-id-converter (optional) The global converter lookup name applied to the generated managed id (if generated).
exclude-from-all (optional, defaults to no) Excludes the class property from participating in the "all" meta-data, unless specified at the meta-data level. If set to no_analyzed, not_analyzed properties will be analyzed when added to the all property (the analyzer can be controlled using the analyzer attribute).
converter (optional) The global converter lookup name registered with the configuration.
You can map all internal Java primitive data types, primitive wrappers and most of the common Java classes
(i.e. Date and Calendar). You can also map Arrays and Collections of these data types. When mapping a
Collection, you must specify the object class (like java.lang.String) in the class mapping property (unless
you are using generics).
Note that you can define a property with no meta-data mapping within it. This means that it will not be
searchable, but the property value will be stored when persisting the object to the search engine, and it will be
loaded from it as well (unless it is of type java.io.Reader).
6.6.6. analyzer
Declaring an analyzer controller property (a.k.a JavaBean property) of a class using the analyzer element.
<analyzer
name="property name"
null-analyzer="analyzer name if value is null"
accessor="property|field"
converter="converter lookup name"
>
</analyzer>
Attribute Description
name The class property (a.k.a JavaBean property) name, with initial lowercase letter.
accessor (optional, defaults to property) The strategy to access the class property value. property means accessing using the Java Bean accessor methods, while field directly accesses the class fields.
null-analyzer (optional, defaults to an error in case of a null value) The name of the analyzer that will be used if the property has a null value.
converter (optional) The global converter lookup name registered with the configuration.
The analyzer class property mapping controls the analyzer that will be used when indexing the class data (the
underlying Resource). If the mapping is defined, it will override the class mapping analyzer attribute setting.
If, for example, Compass is configured with two additional analyzers, one called an1 (with settings in the
form of compass.engine.analyzer.an1.*) and another called an2, the values that the class property can hold
are: default (an internal Compass analyzer, which can be configured as well), an1 and an2. If the property may
have a null value, and that is valid for the application, a null-analyzer can be configured that will be used in
that case. If the class property has a value, but there is no matching analyzer, an exception will be thrown.
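A sketch of such a mapping, mirroring the earlier annotation example (the language property is illustrative):
<class name="A" alias="a">
<id name="id" />
<analyzer name="language" null-analyzer="default" />
</class>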
6.6.7. boost
Declaring a boost property (a.k.a JavaBean property) of a class using the boost element.
<boost
name="property name"
default="the boost default value when no property value is present"
accessor="property|field"
converter="converter lookup name"
>
</boost>
Attribute Description
name The class property (a.k.a JavaBean property) name, with initial lowercase letter.
accessor (optional, defaults to property) The strategy to access the class property value. property means accessing using the Java Bean accessor methods, while field directly accesses the class fields.
default (optional, defaults to 1.0f) The default boost value if the property has a null value.
converter (optional) The global converter lookup name registered with the configuration.
The boost class property mapping controls the boost associated with the Resource created based on the mapped
property. The value of the property should be convertible to a float.
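A sketch mirroring the earlier annotation example:
<class name="A" alias="a">
<id name="id" />
<boost name="value" default="2.0" />
</class>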
6.6.8. meta-data
<meta-data
store="yes|no|compress"
index="analyzed|not_analyzed|no"
boost="boost value for the meta-data"
analyzer="name of the analyzer"
reverse="no|reader|string"
null-value="String value that will be stored when the property value is null"
exclude-from-all="[parent's exclude-from-all]|no|yes|no_analyzed"
converter="converter lookup name"
term-vector="no|yes|positions|offsets|positions_offsets"
format="the format string (only applies to formatted elements)"
>
</meta-data>
Attribute Description
store (optional, defaults to yes) If the value of the class property that the meta-data maps to is going to be stored in the index.
index (optional, defaults to analyzed) If the value of the class property that the meta-data maps to is going to be indexed (searchable). If it is, then controls if the value is going to be broken down and analyzed (analyzed), or used as is (not_analyzed).
boost (optional, defaults to 1.0f) Controls the boost level for the meta-data.
analyzer (optional, defaults to the parent analyzer) The name of the analyzer that will be used to analyze ANALYZED meta-data. Defaults to the parent property mapping, which in turn defaults to the class mapping analyzer decision scheme based on the analyzer set, or the analyzer mapping property.
term-vector (optional, defaults to no) The term vector value of the meta data.
reverse (optional, defaults to no) The meta-data will have its value reversed. Can have the values no (no reverse will happen), string (the reverse will happen and the value stored will be a reversed string) and reader (a special reader will wrap the string and reverse it). The reader option is more performant, but the store and index settings will be discarded.
exclude-from-all (optional, defaults to the parent's exclude-from-all value) Excludes the meta-data from participating in the "all" meta-data. If set to no_analyzed, not_analyzed properties will be analyzed when added to the all property (the analyzer can be controlled using the analyzer attribute).
null-value (optional, defaults to not storing anything on null) A String value that will be stored when the property evaluates to null.
converter (optional) The global converter lookup name registered with the configuration. Note that in case of a Collection property, the converter will be applied to the collection elements (Compass has its own converter for Collections).
format (optional) Allows for quickly setting a format for format-able types (dates and numbers), without creating/registering a specialized converter under a lookup name.
You can control the format of the marshalled values when mapping a java.lang.Number (or the equivalent
primitive value) using the format provided by the java.text.DecimalFormat. You can also format a
java.util.Date using the format provided by java.text.SimpleDateFormat. You set the format string in the
format attribute.
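For example (a sketch; the property and meta-data names are illustrative):
<property name="birthdate">
<meta-data format="yyyy-MM-dd">birthdate</meta-data>
</property>
<property name="price">
<meta-data format="#,##0.00">price</meta-data>
</property>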
6.6.9. dynamic-meta-data
<dynamic-meta-data
name="The name the meta data will be saved under"
store="yes|no|compress"
index="analyzed|not_analyzed|no"
boost="boost value for the meta-data"
analyzer="name of the analyzer"
reverse="no|reader|string"
null-value="optional String value when expression is null"
exclude-from-all="[parent's exclude-from-all]|no|yes|no_analyzed"
converter="the Dynamic Converter lookup name (required)"
format="the format string (only applies to formatted elements)"
>
</dynamic-meta-data>
Attribute Description
name The name the dynamic meta data will be saved under (similar to the tag name of the meta-data mapping).
store (optional, defaults to yes) If the value that the meta-data maps to is going to be stored in the index.
index (optional, defaults to analyzed) If the value that the meta-data maps to is going to be indexed (searchable). If it is, then controls if the value is going to be broken down and analyzed (analyzed), or used as is (not_analyzed).
boost (optional, defaults to 1.0f) Controls the boost level for the meta-data.
analyzer (optional, defaults to the parent analyzer) The name of the analyzer that will be used to analyze ANALYZED meta-data. Defaults to the parent property mapping, which in turn defaults to the class mapping analyzer decision scheme based on the analyzer set, or the analyzer mapping property.
reverse (optional, defaults to no) The meta-data will have its value reversed. Can have the values no (no reverse will happen), string (the reverse will happen and the value stored will be a reversed string) and reader (a special reader will wrap the string and reverse it). The reader option is more performant, but the store and index settings will be discarded.
exclude-from-all (optional, defaults to the parent's exclude-from-all value) Excludes the meta-data from participating in the "all" meta-data. If set to no_analyzed, not_analyzed properties will be analyzed when added to the all property (the analyzer can be controlled using the analyzer attribute).
null-value (optional, defaults to not saving the value) The String value that will be stored if the expression evaluates to null.
converter (required) The global dynamic converter lookup name registered with the configuration. Built in dynamic converters include: el, jexl, velocity, ognl and groovy.
format (optional) Allows for quickly setting a format for format-able types (dates and numbers), without creating/registering a specialized converter under a lookup name. Applies when the dynamic expression evaluates to a formatable object. The type attribute must be set as well.
type (optional) The fully qualified class name of the object evaluated as a result of the dynamic expression. Applies when using formats.
The dynamic meta data mapping allows defining meta-data saved into the search engine as a result of
evaluating an expression. The mapping does not map to any class property and acts as a synthetic meta-data
(similar to the constant mapping). The value of the dynamic meta-data tag is the expression evaluated by a
Dynamic Converter. Compass comes with several built in dynamic converters: el (Jakarta commons el), jexl
(Jakarta commons jexl), velocity, ognl, and groovy. When defining the expression, the root class is registered
under the data key (for libraries that require it).
6.6.10. component
<component
name="the class property name"
ref-alias="name of the alias"
max-depth="the depth of cyclic component mappings allowed"
accessor="property|field"
converter="converter lookup name"
cascade="comma separated list of create,save,delete or all"
>
</component>
Attribute Description
name The class property (a.k.a JavaBean property) name, with initial lowercase letter.
ref-alias (optional) The class mapping alias that defines the component. This is an optional attribute since under most conditions Compass can infer the referenced alias (it can't infer it when using a Collection without generics, or when a class has more than one mapping). In case of a polymorphic relationship, a list of aliases can be provided (though again, Compass will try to auto detect the list of aliases if none is defined).
override (optional, defaults to true) Whether another definition with the same mapping name will be overridden or added as an additional mapping. Mainly used to override definitions made in extended mappings.
accessor (optional, defaults to property) The strategy to access the class property value. property accesses using the Java Bean accessor methods, while field directly accesses the class fields.
converter (optional) The global converter lookup name registered with the configuration.
cascade (optional, defaults to none) A comma separated list of operations to cascade. The operation names are: create, save and delete. all can be used as well to mark cascading for all operations.
The component element defines a class dependency within the root class. The dependency is identified by the
ref-alias, which can be non-rootable or have no id mappings.
An embedded class means that all the mappings (meta-data values) defined in the referenced class are stored
within the alias of the root class. It means that a search that hits one of the component mapped meta-datas will
return its owning class.
The type of the JavaBean property can be the class mapping class itself, an Array or a Collection.
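A minimal sketch, mirroring the annotation example from earlier in this chapter:
<class name="A" alias="a">
<id name="id" />
<component name="b" ref-alias="b" />
</class>

<class name="B" alias="b" root="false">
<property name="value">
<meta-data>value</meta-data>
</property>
</class>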
6.6.11. reference
<reference
name="the class property name"
ref-alias="name of the alias"
ref-comp-alias="name of an optional alias mapped as component"
accessor="property|field"
converter="converter lookup name"
cascade="comma separated list of create,save,delete or all"
>
</reference>
Attribute Description
name The class property (a.k.a JavaBean property) name, with initial lowercase letter.
ref-alias (optional) The class mapping alias that defines the reference. This is an optional attribute since under most conditions Compass can infer the referenced alias (it can't infer it when using a Collection without generics, or when a class has more than one mapping). In case of a polymorphic relationship, a list of aliases can be provided (though again, Compass will try to auto detect the list of aliases if none is defined).
ref-comp-alias (optional) The class mapping alias that defines a "shadow component". Will marshal a component like mapping based on the alias into the current class. Note, it's best to create a dedicated class mapping (with root="false") that only holds the required information. Based on that information, if you search for it, you will be able to get the encompassing class as part of your hits. Note as well that when changing the referenced class, for the change to be reflected as part of the ref-comp-alias, you will have to save all the relevant encompassing classes.
accessor (optional, defaults to property) The strategy to access the class property value. property accesses using the Java Bean accessor methods, while field directly accesses the class fields.
converter (optional) The global converter lookup name registered with the configuration.
cascade (optional, defaults to none) A comma separated list of operations to cascade. The operation names are: create, save and delete. all can be used as well to mark cascading for all operations.
The type of the JavaBean property can be the class mapping class itself, an Array of it, or a Collection.
Cascading is configured using the cascade attribute described above, and lazy loading of reference collections
is discussed earlier in this chapter.
Compass supports cyclic references, which means that two classes can have a cyclic reference defined between
them.
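A minimal sketch of two root classes with a cascading reference between them:
<class name="A" alias="a">
<id name="id" />
<reference name="b" ref-alias="b" cascade="save, delete" />
</class>

<class name="B" alias="b">
<id name="id" />
</class>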
6.6.12. parent
<parent
name="the class property name"
accessor="property|field"
converter="converter lookup name"
>
</parent>
Attribute Description
name The class property (a.k.a JavaBean property) name, with initial lowercase letter.
accessor (optional, defaults to property) The strategy to access the class property value. property accesses using the Java Bean accessor methods, while field directly accesses the class fields.
converter (optional) The global converter lookup name registered with the configuration.
The parent mapping provides support for cyclic mappings for components (bi directional component
mappings are also supported). If the component class mapping wishes to map the enclosing class, the parent
mapping can be used to map to it. The parent mapping will not marshal (persist the data to the search engine)
the parent object; it will only initialize it when loading the parent object from the search engine.
6.6.13. constant
<constant
exclude-from-all="no|yes|no_analyzed"
converter="converter lookup name"
>
meta-data,
meta-data-value+
</constant>
Attribute Description
exclude-from-all (optional, defaults to no) Excludes the constant meta-data and all its values from participating in the "all" feature. If set to no_analyzed, not_analyzed properties will be analyzed when added to the all property (the analyzer can be controlled using the analyzer attribute).
override (optional, defaults to true) Whether another definition with the same mapping name will be overridden or added as an additional mapping. Mainly used to override definitions made in extended mappings.
converter (optional) The global converter lookup name registered with the configuration.
If you wish to define a set of constant meta data that will be embedded within the searchable class (Resource),
you can use the constant element. You define the usual meta-data element, followed by one or more
meta-data-value elements with the values that map to the meta-data.
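For example, the xml counterpart of the annotation example shown earlier in this chapter:
<constant>
<meta-data>type</meta-data>
<meta-data-value>person</meta-data-value>
<meta-data-value>author</meta-data-value>
</constant>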
7. XSEM - Xml to Search Engine Mapping
7.1. Introduction
Compass provides the ability to map XML structures to the underlying Search Engine through simple XML
mapping files; we call this technology XSEM (XML to Search Engine Mapping). XSEM provides a rich syntax
for describing XML mappings using Xpath expressions. The XSEM files are used by Compass to extract the
required xml elements from the xml structure at run-time and insert the required meta-data into the Search
Engine index.
An extension of the XmlObject interface is the AliasedXmlObject interface. It represents an xml object that is
also associated with an alias. This means that saving the object does not require explicitly specifying the alias
that it will be saved under.
Compass comes with support for dom4j and JSE 5 xml libraries. Here is an example of how to use the dom4j
API in order to create a dom4j xml object:
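A minimal sketch (assuming dom4j's SAXReader, an xml String named xml, and an open Compass session named session):
SAXReader saxReader = new SAXReader();
Document doc = saxReader.read(new StringReader(xml));
AliasedXmlObject xmlObject = new Dom4jAliasedXmlObject("data", doc.getRootElement());
session.save(xmlObject);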
Up until now, Compass has no knowledge of how to parse and create an actual XmlObject implementation, or
how to convert an XmlObject into its xml representation. This is perfectly fine, but it also means that systems
will not be able to work with XmlObject for read/search operations. Again, this is perfectly ok for some
applications, since they can always work with the underlying Resource representation, but some applications
would still like to store the actual xml content in the search engine, and work with the XmlObject for
read/search operations.
Compass XSEM support allows defining the xml-content mapping (defined below), which will cause
Compass to store the xml representation in the search engine as well. It also means that for read/search
operations, the application will be able to get an XmlObject back (for example, using the CompassSession#get
operation).
In order to support this, Compass must be configured with how to parse the xml content into an XmlObject, and
how to convert an XmlObject into an xml string. Compass comes with built in converters that do exactly that:
org.compass.core.xml.javax.converter.
javax-node Support for JSE 5 xml libraries. Not
NodeXmlContentConverter recommended on account of performance.
org.compass.core.xml.javax.converter.
javax-stax Support for JSE 5 xml libraries. Parses the
StaxNodeXmlContentConverter Document model using StAX. Not
recommended on account of performance.
org.compass.core.xml.dom4j.converter.
dom4j-sax Support dom4j SAXReader for parsing, and
SAXReaderXmlContentConverter XMLWriter to write the raw xml data.
org.compass.core.xml.dom4j.converter.
dom4j-xpp Support dom4j XPPReader for parsing, and
XPPReaderXmlContentConverter XMLWriter to write the raw xml data.
org.compass.core.xml.dom4j.converter.
dom4j-xpp3 Support dom4j XPP3Reader for parsing,
XPP3ReaderXmlContentConverter and XMLWriter to write the raw xml data.
org.compass.core.xml.dom4j.converter.
dom4j-stax Support dom4j STAXEventReader for
STAXReaderXmlContentConverter parsing, and XMLWriter to write the raw
xml data.
org.compass.core.xml.jdom.converter.
jdom-sax Support JDOM SAXBuilder for parsing,
SAXBuilderXmlContentConverter and XMLOutputter to write the raw xml
data.
org.compass.core.xml.jdom.converter.
jdom-stax Support JDOM STAXBuilder for parsing,
STAXBuilderXmlContentConverter and XMLOutputter to write the raw xml
data.
Most of the time, better performance can be achieved by pooling XmlContentConverter implementations.
Compass handling of XmlContentConverter allows for three different instantiation models: prototype, pool,
and singleton. prototype will create a new XmlContentConverter each time, singleton will use a shared
XmlContentConverter for all operations, and pool will pool XmlContentConverter instances. The default is
prototype.
Here is an example of a Compass schema based configuration that registers a global Xml Content converter:
<compass-core-config ...
<compass name="default">
<connection>
<file path="target/test-index" />
</connection>
<settings>
<setting name="compass.xsem.contentConverter.type" value="jdom-stax" />
<setting name="compass.xsem.contentConverter.wrapper" value="pool" />
</settings>
</compass>
</compass-core-config>
The same can be configured using properties based settings:
compass.xsem.contentConverter.type=jdom-stax
compass.xsem.contentConverter.wrapper=pool
or programmatically:
settings.setSetting(CompassEnvironment.Xsem.XmlContent.TYPE, CompassEnvironment.Xsem.XmlContent.Jdom.TYPE_STAX);
Note that specific converters can be associated with a specific xml-object mapping. To do so, simply
register the converter under a different name (compass.converter.xmlContentMapping is the default name that
Compass will use when nothing is configured), and use that name in the converter attribute of the xml-content
mapping.
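A sketch of registering such a converter under a custom lookup name using settings (the myXmlContent name is illustrative, and the setting key follows the compass.converter.[name].type convention):
compass.converter.myXmlContent.type=dom4j-stax
The xml-content mapping can then reference it with converter="myXmlContent".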
Based on internal performance testing, the preferable configuration is a pooled converter that uses either dom4j
or JDOM with a pull parser (StAX or XPP).
When saving a RawAliasedXmlObject, Compass will identify that it is a RawAliasedXmlObject and will use
the registered converter (or the one configured against the xml-content mapping for the given alias) to convert
it to the appropriate XmlObject implementation. Note that when performing any read/search operation, the
actual XmlObject that will be returned is the one the registered converter creates, and not the raw xml object.
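A minimal sketch (xmlAsString is assumed to hold raw xml, such as the sample below):
AliasedXmlObject xmlObject = new RawAliasedXmlObject("data", xmlAsString);
session.save(xmlObject);
The following sample xml document will be used to demonstrate the different mapping options: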
<xml-fragment>
<data>
<id value="1"/>
<data1 value="data11attr">data11</data1>
<data1 value="data12attr">data12</data1>
</data>
<data>
<id value="2"/>
<data1 value="data21attr">data21</data1>
<data1 value="data22attr">data22</data1>
</data>
</xml-fragment>
We can map it using the following XSEM xml mapping definition file:
<?xml version="1.0"?>
<!DOCTYPE compass-core-mapping PUBLIC
"-//Compass/Compass Core Mapping DTD 2.2//EN"
"http://www.compass-project.org/dtd/compass-core-mapping-2.2.dtd">
<compass-core-mapping>
<xml-object alias="data1" xpath="/xml-fragment/data[1]">
<xml-id name="id" xpath="id/@value" />
<xml-property xpath="data1/@value" />
<xml-property name="eleText" xpath="data1" />
</xml-object>

<xml-object alias="data2" xpath="/xml-fragment/data">
<xml-id name="id" xpath="id/@value" />
<xml-property xpath="data1/@value" />
<xml-property name="eleText" xpath="data1" />
</xml-object>

<xml-object alias="data3" xpath="/xml-fragment/data">
<xml-id name="id" xpath="id/@value" />
<xml-property xpath="data1/@value" />
<xml-property name="eleText" xpath="data1" />
<xml-content name="content" />
</xml-object>
</compass-core-mapping>
{
"compass-core-mapping" : {
xml : [
{
alias : "data1",
xpath : "/xml-fragment/data[1]",
id : { name : "id", xpath : "id/@value" },
property : [
{ xpath : "data1/@value" },
{ name : "eleText", xpath : "data1" }
]
},
{
alias : "data2",
xpath : "/xml-fragment/data",
id : { name : "id", xpath : "id/@value" },
property : [
{ xpath : "data1/@value" },
{ name : "eleText", xpath : "data1" }
]
},
{
alias : "data3",
xpath : "/xml-fragment/data",
id : { name : "id", xpath : "id/@value" },
property : [
{ xpath : "data1/@value" },
{ name : "eleText", xpath : "data1" }
],
content : { name : "content" }
}
]
}
}
conf.addMapping(
xml("data1").xpath("/xml-fragment/data[1]")
.add(id("id/@value").indexName("id"))
.add(property("data1/@value"))
.add(property("data1").indexName("eleText"))
);
conf.addMapping(
xml("data2").xpath("/xml-fragment/data")
.add(id("id/@value").indexName("id"))
.add(property("data1/@value"))
.add(property("data1").indexName("eleText"))
);
conf.addMapping(
xml("data3").xpath("/xml-fragment/data")
.add(id("id/@value").indexName("id"))
.add(property("data1/@value"))
.add(property("data1").indexName("eleText"))
.add(content("content"))
);
The mapping definition here shows three different mappings (that will work with the sample xml). The
different mappings are registered under different aliases, where the alias acts as the connection between the
actual XML saved and the mappings definition.
The xml mapping also supports xpath with namespaces easily. For example, if we have the following xml
fragment:
<xml-fragment>
<data xmlns="http://mynamespace.org">
<id value="1"/>
<data1 value="data11attr">data11</data1>
<data1 value="data12attr">data12</data1>
</data>
</xml-fragment>
It can be mapped using the following definition (note the mynamespace prefix used in the xpath expressions):
<?xml version="1.0"?>
<!DOCTYPE compass-core-mapping PUBLIC
"-//Compass/Compass Core Mapping DTD 2.2//EN"
"http://www.compass-project.org/dtd/compass-core-mapping-2.2.dtd">
<compass-core-mapping>
<xml-object alias="data1" xpath="/xml-fragment/mynamespace:data">
<xml-id name="id" xpath="mynamespace:id/@value" />
<xml-property xpath="mynamespace:data1/@value" />
<xml-property name="eleText" xpath="mynamespace:data1" />
</xml-object>
</compass-core-mapping>
In this case, we need to define the mapping between the mynamespace prefix used in the xpath definition and
the http://mynamespace.org URI. Within the Compass configuration, a simple setting should be set:
compass.xsem.namespace.mynamespace.uri=http://mynamespace.org. Other namespaces can be added
using similar settings: compass.xsem.namespace.[prefix].uri=[uri].
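In schema based configuration this is a single setting element, for example:
<setting name="compass.xsem.namespace.mynamespace.uri" value="http://mynamespace.org" />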
An xml-object mapping can have an associated xpath expression, which will narrow down the actual xml
elements that represent the top level xml object that will be mapped to the search engine. A nice benefit here is
that the xpath can return multiple xml objects, which in turn will result in multiple Resources saved to the
search engine.
Each xml object mapping must have at least one xml-id mapping definition associated with it. It is used in
order to update/delete existing xml objects.
In the mapping definition associated with the data3 alias, the xml-content mapping is used, which stores the
actual xml content in the search engine as well. This allows unmarshalling the xml back into an XmlObject
representation. For the first two mappings (data1 and data2), search/read operations will only be able to work
on the Resource level.
7.5.1. Converters
Actual value mappings (the xml-property) can use the extensive converters that come built in with Compass.
Xml "value converters" are a special case, since in their case normalization needs to be done by converting a
String to a normalized String. For example, a number with a certain format may need to be normalized into a
number with a different (for example, padded) format. Take the following sample xml:
<xml-fragment>
<data>
<id value="1"/>
<data1 value="21.2">03-12-2001</data1>
</data>
</xml-fragment>
It can be mapped using the following definition (the value-converter and format values are reconstructed from the discussion that follows):
<?xml version="1.0"?>
<!DOCTYPE compass-core-mapping PUBLIC
"-//Compass/Compass Core Mapping DTD 2.2//EN"
"http://www.compass-project.org/dtd/compass-core-mapping-2.2.dtd">
<compass-core-mapping>
<xml-object alias="data6" xpath="/xml-fragment/data">
<xml-id name="id" xpath="id/@value" />
<xml-property xpath="data1/@value" value-converter="float" format="000000.00" />
<xml-property name="eleText" xpath="data1" value-converter="date" format="yyyy-MM-dd||dd-MM-yyyy" />
</xml-object>
</compass-core-mapping>
In this case, we create a mapping under the alias name data6. We can see that the value attribute of the data1
element is a float number. When we index it, we would like to convert it to the following format: 000000.00.
We can see, within the mappings, that we defined the converter to be of a float type with the requested format.
The actual data1 element text is of type date, and again we configure the converter to be of type date, and we
use the support for "format" based converters in Compass to accept several formats (the first is the one used
when converting from an Object). So, in the date case, the converter will try to convert the 03-12-2001 date
and will succeed thanks to the fact that the second date format matches it. It will then convert the created Date
back to a String using the first format, giving us a searchable date format in the index.
7.5.2. xml-object
You may declare a xml object mapping using the xml-object element:
<xml-object
alias="aliasName"
sub-index="sub index name"
xpath="optional xpath expression"
analyzer="name of the analyzer"
/>
Attribute Description
sub-index (optional, defaults to the alias value) The name of the sub-index that the alias will map to.
xpath (optional, will not execute an xpath expression if not specified) An optional xpath expression to narrow down the actual xml elements that will represent the top level xml object which will be mapped to the search engine. A nice benefit here is that the xpath can return multiple xml objects, which in turn will result in multiple Resources saved to the search engine.
analyzer (optional, defaults to the default analyzer) The name of the analyzer that will be used to analyze ANALYZED properties. Defaults to the default analyzer, which is one of the internal analyzers that comes with Compass. Note that when using the xml-analyzer mapping (a child mapping of the xml object mapping, for an xml element that controls the analyzer), the analyzer attribute will have no effect.
7.5.3. xml-id
Mapped XmlObject's must declare at least one xml-id. The xml-id element defines the XmlObject (element,
attribute, ...) that identifies the root XmlObject for the specified alias.
<xml-id
name="the name of the xml id"
xpath="xpath expression"
value-converter="value converter lookup name"
converter="converter lookup name"
/>
name: The name of the xml-id. Will be used when constructing the xml-id internal path.
xpath: The xpath expression used to identify the xml-id. Must return a single xml element.
value-converter (optional, defaults to the Compass SimpleXmlValueConverter): The global converter lookup name registered with the configuration. This is a converter associated with converting the actual value of the xml-id. Acts as a convenient extension point for custom value converter implementations (for example, date formatters). SimpleXmlValueConverter will usually act as a base class for such extensions.
converter (optional): The global converter lookup name registered with the configuration. The converter is responsible for converting the xml-id mapping.
An important note regarding the xml-id mapping is that it will always act as an internal Compass Property.
This means that if one wishes to have it as part of the searchable content, it will have to be mapped with
xml-property as well.
7.5.4. xml-property
<xml-property
xpath="xpath expression"
name="optionally the name of the xml property"
store="yes|no|compress"
index="analyzed|not_analyzed|no"
boost="boost value for the property"
analyzer="name of the analyzer"
reverse="no|reader|string"
override="true|false"
exclude-from-all="no|yes|no_analyzed"
value-converter="value converter lookup name"
format="a format string for value converters that support this"
converter="converter lookup name"
/>
name (optional, defaults to the xml object (element, attribute, ...) name): The name that the value will be saved under. If not set, the name of the xml object resulting from the xpath expression will be used.
xpath: The xpath expression used to identify the xml-property. Can return no xml objects, one xml object, or many xml objects.
store (optional, defaults to yes): Whether the value of the xml property is going to be stored in the index.
index (optional, defaults to analyzed): Whether the value of the xml property is going to be indexed (searchable). If it is, controls whether the value is going to be broken down and analyzed (analyzed), or used as is (not_analyzed).
boost (optional, defaults to 1.0f): Controls the boost level for the xml property.
analyzer (optional, defaults to the xml mapping analyzer decision scheme): The name of the analyzer that will be used to analyze ANALYZED xml property mappings defined for the given property. Defaults to the xml mapping analyzer decision scheme, based on the analyzer set or the xml-analyzer mapping.
exclude-from-all (optional, defaults to no): Excludes the property from participating in the "all" meta-data. If set to no_analyzed, not_analyzed properties will be analyzed when added to the all property (the analyzer can be controlled using the analyzer attribute).
override (optional, defaults to true): Whether another definition with the same mapping name will be overridden or added as an additional mapping. Mainly used to override definitions made in extended mappings.
reverse (optional, defaults to no): The meta-data will have its value reversed. Can have the values no (no reverse will happen), string (the reverse will happen and the value stored will be a reversed string), and reader (a special reader will wrap the string and reverse it). The reader option is more performant, but the store and index settings will be discarded.
value-converter (optional, defaults to the Compass SimpleXmlValueConverter): The global converter lookup name registered with the configuration. This is a converter associated with converting the actual value of the xml-property. Acts as a convenient extension point for custom value converter implementations (for example, date formatters). SimpleXmlValueConverter will usually act as a base class for such extensions.
converter (optional): The global converter lookup name registered with the configuration. The converter is responsible for converting the xml-property mapping.
7.5.5. xml-analyzer
<xml-analyzer
name="property name"
xpath="xpath expression"
null-analyzer="analyzer name if value is null"
converter="converter lookup name"
>
</xml-analyzer>
xpath: The xpath expression used to identify the xml-analyzer. Must return a single xml element.
null-analyzer (optional, defaults to an error in case of a null value): The name of the analyzer that will be used if the property has a null value, or the xpath expression returned no elements.
converter (optional): The global converter lookup name registered with the configuration.
The analyzer xml property mapping controls the analyzer that will be used when indexing the XmlObject. If
the mapping is defined, it will override the xml object mapping analyzer attribute setting.
If, for example, Compass is configured with two additional analyzers, called an1 (with settings in the
form of compass.engine.analyzer.an1.*) and an2, then the values that the xml property can hold are:
default (an internal Compass analyzer, which can be configured as well), an1, and an2. If the analyzer
element has a null value, and that is acceptable for the application, a null-analyzer can be configured
that will be used in that case. If the resource property has a value but there is no matching analyzer, an
exception will be thrown.
7.5.6. xml-boost
Declaring a dynamic boost mapping, controlling the boost level, using the xml-boost element:
<xml-boost
    name="property name"
    xpath="xpath expression"
    default="the boost default value when no property value is present"
    converter="converter lookup name"
>
</xml-boost>
xpath: The xpath expression used to identify the xml-boost. Must return a single xml element.
default (optional, defaults to 1.0): The default boost value if no value is found.
converter (optional): The global converter lookup name registered with the configuration.
The boost xml property mapping controls the boost associated with the Resource created based on the mapped
property. The value of the property should be convertible to float.
7.5.7. xml-content
<xml-content
name="property name"
store="yes|compress"
converter="converter lookup name"
>
</xml-content>
store (optional, defaults to yes): How to store the actual xml content.
converter (optional): The global converter lookup name registered with the configuration.
The xml-content mapping causes Compass to store the actual xml content in the search engine as well. This
allows unmarshalling the xml back into an XmlObject representation. For an xml-object mapping without an
xml-content mapping, search/read operations will only work at the Resource level.
8.1. Introduction
Compass provides the ability to map JSON to the underlying Search Engine through simple XML mapping
files; we call this technology JSEM (JSON to Search Engine Mapping). The JSEM files are used by Compass
to extract the required JSON elements at run-time and insert the required meta-data into the Search Engine
index. Mappings can be done explicitly for each JSON element, or Compass can be left to dynamically add all
JSON elements from a certain JSON element recursively.
Let's start with a simple example. The following is a sample JSON that we will work with:
{
    "id": 1,
    "name": "Mary Lebow",
    "address": {
        "street": "5 Main Street",
        "city": "San Diego, CA",
        "zip": 91912
    },
    "phoneNumbers": [
        "619 332-3452",
        "664 223-4667"
    ]
}
Now, let's see different ways in which we can map this JSON into the search engine. The first option is to
use fully explicit mappings:
<root-json-object alias="addressbook">
<json-id name="id" />
<json-property name="name" />
<json-object name="address">
<json-property name="street" />
<json-property name="city" />
<json-property name="zip" index="not_analyzed" />
<json-array name="phoneNumbers" index-name="phoneNumber">
<json-property />
</json-array>
</json-object>
</root-json-object>
{
    "compass-core-mapping" : {
        "json" : [
            {
                alias : "addressbook",
                id : {
                    name : "id"
                },
                property : [
                    {name : "name"}
                ],
                object : [
                    {
                        name : "address",
                        property : [
                            {name : "street"},
                            {name : "city"},
                            {name : "zip", index : "not_analyzed"}
                        ],
                        array : {
                            name : "phoneNumbers",
                            "index-name" : "phoneNumber",
                            property : {}
                        }
                    }
                ]
            }
        ]
    }
}
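The same mapping can also be built programmatically, as shown next (a sketch: it assumes the builder methods
json, id, property, object and array are statically imported from Compass's JSEM mapping builder class):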
conf.addMapping(
json("addressbook")
.add(id("id"))
.add(property("name"))
.add(object("address")
.add(property("street"))
.add(property("city"))
.add(property("zip").index(Property.Index.NOT_ANALYZED))
.add(array("phoneNumbers").indexName("phoneNumber").element(property()))
)
);
The above explicit mapping defines how each JSON element will be mapped to the search engine. In the above
case, we will have several searchable properties named after their respective JSON element names (the name
can be changed by using the index-name attribute). We can now perform search queries such as street:diego, or
phoneNumber:619*, or even (using dot path notation): addressbook.address.street:diego.
Many times though, explicit mapping of all the JSON elements is a bit of a pain, and does not work when
wanting to create a generic indexing service. In this case, Compass allows you to dynamically and recursively
map JSON elements. Here is an example where the JSON address element is mapped dynamically, thus adding
any element within it dynamically to the search engine:
<root-json-object alias="addressbook">
<json-id name="id" />
<json-property name="name" />
<json-object name="address" dynamic="true" />
</root-json-object>
The dynamic aspect can even be set on the root-json-object, which allows creating a completely generic JSON
indexing service that requires only setting the id JSON element.
Now, in order to index, search, and load JSON objects, we can use the JsonObject API abstraction. Here is a
simple example that uses a JsonObject implementation bundled with Compass, called JSONObject, which is
based on the json.org site:
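A minimal sketch of saving such an object (the JSONObject constructor shown follows the json.org convention
of parsing a JSON string; treat the exact signature as an assumption):
// create the JSON object from its string form and save it under the
// addressbook alias defined in the mapping
JsonObject jsonObject = new JSONObject("{\"id\": 1, \"name\": \"Mary Lebow\"}");
session.save("addressbook", jsonObject);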
// we can also get back the JSON content and actual object when using content mapping (see later)
jsonObject = (JsonObject) session.load("addressbook", 1);
Implementing support for another framework that bundles its own JSON object implementation should
be very simple. It should basically follow the API requirements (probably by wrapping the actual object). The
jettison implementation can be used as a reference for how this can be done.
The following mapping definition shows how to map JSON to also store its content:
<root-json-object alias="addressbook">
<json-id name="id" />
<json-property name="name" />
<json-object name="address" dynamic="true" />
<json-content name="content" />
</root-json-object>
This will cause Compass to store the actual JSON content under a Resource Property named content. Here is an
example of how it can be retrieved back from the search engine:
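A minimal sketch using the Resource API (the property name content matches the mapping above):
Resource resource = session.loadResource("addressbook", 1);
String jsonContent = resource.getValue("content");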
In order to convert back to the actual JSON object, a converter instructing Compass how to convert the JSON
string back to your favorite JSON object model should be registered with Compass. The converter is registered
under the compass.jsem.contentConverter.type setting:
• jettison: org.compass.core.json.jettison.converter.JettisonContentConverter
• grails: org.compass.core.json.grails.converter.GrailsContentConverter
• jackson: org.compass.core.json.jackson.converter.JacksonContentConverter
• the built-in Compass implementation: org.compass.core.json.impl.converter.DefaultJSONContentConverterImpl
By default, the content converter registered with Compass is the built-in one.
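For example, registering the jettison converter programmatically might look like the following sketch (the
setting can equally be provided in a configuration file):
CompassConfiguration conf = new CompassConfiguration();
conf.setSetting("compass.jsem.contentConverter.type",
        "org.compass.core.json.jettison.converter.JettisonContentConverter");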
8.5.1. root-json-object
<root-json-object
    alias="aliasName"
    sub-index="sub index name"
    analyzer="name of the analyzer"
    dynamic="false|true"
    dynamic-naming-type="plain|full"
    spell-check="optional spell check setting"
/>
all?,
sub-index-hash?,
json-id*,
(json-analyzer?),
(json-boost?),
(json-property|json-array|json-object)*,
(json-content?)
sub-index (optional, defaults to the alias value): The name of the sub-index that the alias will map to.
analyzer (optional, defaults to the default analyzer): The name of the analyzer that will be used to analyze ANALYZED properties. Defaults to the default analyzer, which is one of the internal analyzers that comes with Compass. Note that when using the json-analyzer mapping (a child mapping of the root json object mapping, for a json element that controls the analyzer), the analyzer attribute will have no effect.
dynamic (optional, defaults to false): Should unmapped json elements be added to the search engine automatically (and recursively).
8.5.2. json-id
The JSON element within the json object that represents the id of the resource
<json-id
name="the name of the json id element"
value-converter="value converter lookup name"
converter="converter lookup name"
format="an optional format string"
omit-norms="true|false"
spell-check="spell check setting"
/>
name: The name of the JSON element within the JSON object whose value is the id of the element/resource.
value-converter (optional, defaults to the Compass SimpleJsonValueConverter): The global converter lookup name registered with the configuration. This is a converter associated with converting the actual value of the json-id. Acts as a convenient extension point for custom value converter implementations (for example, date formatters). SimpleJsonValueConverter will usually act as a base class for such extensions. The value of this converter can also reference one of Compass's built-in converters, such as int (in which case the format can also be used).
converter (optional): The global converter lookup name registered with the configuration. The converter is responsible for converting the json-id mapping.
8.5.3. json-property
The JSON element within the json object that represents a property of the resource
<json-property
    name="the name of the json property element"
    index-name="the name it will be stored under, defaults to the element name"
    naming-type="plain|full"
    store="yes|no|compress"
    index="analyzed|not_analyzed|no"
    omit-norms="false|true"
    null-value="a value to store in the index in case the element value is null"
    boost="boost value for the property"
    analyzer="name of the analyzer"
    reverse="no|reader|string"
    override="true|false"
    exclude-from-all="no|yes|no_analyzed"
    value-converter="value converter lookup name"
    format="a format string for value converters that support this"
    converter="converter lookup name"
    spell-check="spell check setting"
/>
name: The name of the JSON element within the JSON object whose value is the property value of the element/resource.
index-name (optional, defaults to the element name): The name of the resource property that will be stored in the index. Defaults to the element name.
store (optional, defaults to yes): Whether the value of the json property is going to be stored in the index.
index (optional, defaults to analyzed): Whether the value of the json property is going to be indexed (searchable). If it is, controls whether the value is going to be broken down and analyzed (analyzed), or used as is (not_analyzed).
boost (optional, defaults to 1.0f): Controls the boost level for the json property.
analyzer (optional, defaults to the json mapping analyzer decision scheme): The name of the analyzer that will be used to analyze ANALYZED json property mappings defined for the given property. Defaults to the json mapping analyzer decision scheme, based on the analyzer set or the json-analyzer mapping.
exclude-from-all (optional, defaults to no): Excludes the property from participating in the "all" meta-data. If set to no_analyzed, not_analyzed properties will be analyzed when added to the all property (the analyzer can be controlled using the analyzer attribute).
override (optional, defaults to false): Whether another definition with the same mapping name will be overridden or added as an additional mapping. Mainly used to override definitions made in extended mappings.
reverse (optional, defaults to no): The meta-data will have its value reversed. Can have the values no (no reverse will happen), string (the reverse will happen and the value stored will be a reversed string), and reader (a special reader will wrap the string and reverse it). The reader option is more performant, but the store and index settings will be discarded.
value-converter (optional, defaults to the Compass SimpleJsonValueConverter): The global converter lookup name registered with the configuration. This is a converter associated with converting the actual value of the json-property. Acts as a convenient extension point for custom value converter implementations (for example, date formatters). SimpleJsonValueConverter will usually act as a base class for such extensions. The value of this converter can also reference one of Compass's built-in converters, such as int (in which case the format can also be used).
converter (optional): The global converter lookup name registered with the configuration. The converter is responsible for converting the json-property mapping.
8.5.4. json-object
<json-object
name="the name of the json object element"
converter="optional converter lookup name"
dynamic="false|true"
dynamic-naming-type="plain|full"
/>
(json-property|json-array|json-object)*
name: The name of the json object element. Not required when mapping json-object within a json-array.
dynamic (optional, defaults to false): Should unmapped json elements be added to the search engine automatically (and recursively).
8.5.5. json-array
<json-array
    name="the name of the json array element"
    index-name="optional, the name the internal mapping will be stored under"
    converter="optional converter lookup name"
    dynamic="false|true"
    dynamic-naming-type="plain|full"
/>
(json-property|json-array|json-object)*
index-name: The name the json array internal mapping will be stored under. Note that when using a json array there is no need to name its internal element; it is controlled by the json-array name/index-name.
dynamic (optional, defaults to false): Should unmapped json elements be added to the search engine automatically (and recursively).
8.5.6. json-content
Maps the actual JSON string into a resource property to be stored in the search engine.
<json-content
    name="the name to store the json content under"
    store="yes|compress"
    converter="optional converter lookup name"
/>
name: The name to store the JSON string under in the resource.
store: How the JSON content will be stored. yes for plain storing, compress for compressed storing.
8.5.7. json-boost
Declaring a dynamic boost mapping controlling the boost level using the json-boost element.
<json-boost
name="the json element that holds the boost value"
default="the boost default value when no property value is present"
converter="converter lookup name"
/>
name: The name of the json element whose value will be used as the boost value.
default (optional, defaults to 1.0): The default boost value if no value is found.
8.5.8. json-analyzer
<json-analyzer
name="the json element that holds the analyzer value"
null-analyzer="analyzer name if value is null"
converter="converter lookup name"
/>
name: The name of the json element whose value will be used as the analyzer lookup value.
null-analyzer (optional, defaults to an error in case of a null value): The name of the analyzer that will be used if the property has a null value.
The analyzer json property mapping controls the analyzer that will be used when indexing the JsonObject. If
the mapping is defined, it will override the json object mapping analyzer attribute setting.
If, for example, Compass is configured with two additional analyzers, called an1 (with settings in the
form of compass.engine.analyzer.an1.*) and an2, then the values that the json property can hold are:
default (an internal Compass analyzer, which can be configured as well), an1, and an2. If the analyzer
element has a null value, and that is acceptable for the application, a null-analyzer can be configured
that will be used in that case. If the resource property has a value but there is no matching analyzer, an
exception will be thrown.
9.1. Introduction
Compass provides OSEM technology for use with an application's Object domain model, or XSEM when
working with xml data structures. Compass also provides Resource Mapping technology for resources other
than Objects/XML (that do not benefit from OSEM). The benefits of using Resources can be summarized as:
• Your application does not have a domain model (and therefore cannot use OSEM), but you still want to use
the functionality of Compass.
• Your application already works with Lucene, but you want to add Compass's additional features (i.e.
transactions, fast updates). Working with Resources makes your migration easy (as it is similar to working
with Lucene's Document).
• You execute a query and want to update all the meta-data (Resource Property) with a certain value. You
use OSEM in your application, but you do not wish to iterate through the results, performing run-time
object type checking and casting to the appropriate object type before a method call. You can simply use the
Resource interface and treat all the results in the same abstracted way.
<?xml version="1.0"?>
<!DOCTYPE compass-core-mapping PUBLIC
"-//Compass/Compass Core Mapping DTD 2.2//EN"
"http://www.compass-project.org/dtd/compass-core-mapping-2.2.dtd">
<compass-core-mapping>
<resource alias="a">
<resource-id name="id" />
</resource>
<resource alias="b">
<resource-id name="id1" />
<resource-id name="id2" />
</resource>
<resource alias="c">
<resource-id name="id" />
<resource-property name="value1" />
<resource-property name="value2" store="yes" index="analyzed" />
<resource-property name="value3" store="compress" index="analyzed" />
<resource-property name="value4" store="yes" index="not_analyzed" />
<resource-property name="value5" store="yes" index="no" converter="my-date" />
</resource>
</compass-core-mapping>
{
"compass-core-mapping" : {
"resource" : [
{
alias : "a",
id : {name : "id"}
},
{
alias : "b",
id : [
{name : "id1"},
{name : "id2"}
]
},
{
alias : "c",
property : [
{ name : "value1"},
{ name : "value2", store : "yes", index : "analyzed"},
{ name : "value3", store : "compress", index : "analyzed"},
{ name : "value4", store : "yes", index : not_analyzed""},
{ name : "value5", store : "yes", index : "no", converter : "my-date"}
]
}
]
}
}
Now that the Resource Mapping has been declared, you can create the Resource in the application. In the
following code example the Resource is created with an alias and id property matching the Resource Mapping
declaration.
// a sketch; createResource is assumed to be available on the session,
// and the alias "a" matches the mapping above
Resource r = session.createResource("a");
r.addProperty("id", 1);
session.save(r);
The Resource Mapping file example above defines mappings for three resource types (each identified with a
different alias). Each resource has a set of resource ids that are associated with it. The value for the
resource-id tag is the name of the Property that is associated with the primary property for the Resource.
The third mapping (alias "c") defines resource-property mappings as well as resource-id mappings. The
resource-property mapping works with the Resource#addProperty(String name, Object value)
operation. It provides definitions for the resource properties that are added (index, store, and so on), and they
are then looked up when using the mentioned add method. Using the resource-property mapping helps clean
up the code when constructing a Resource, since all the Property characteristics are defined in the mapping
definition; it also provides auto conversion from different objects, and the ability to define new conversions.
Note that the resource-property definition will only work with the mentioned addProperty method, and no
other addProperty method.
Here is an example of how resource-property mappings can simplify Resource construction code:
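A minimal sketch, assuming the "c" mapping shown above (the ids and values are illustrative, and
createResource is assumed to be available on the session):
Resource r = session.createResource("c");
r.addProperty("id", 1);
// store/index/converter characteristics are looked up from the
// resource-property mappings defined for alias "c"
r.addProperty("value1", "some searchable text");
r.addProperty("value5", new Date()); // converted using the my-date converter
session.save(r);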
All XML mappings should declare the doctype shown. The actual DTD may be found at the URL above or in
the compass core distribution. Compass will always look for the DTD in the classpath first.
There are no compass-core-mapping attributes that are applicable when working with resource mappings.
9.2.1. resource
<resource
alias="aliasName"
sub-index="sub index name"
extends="a comma separated list of aliases to extend"
analyzer="name of the analyzer"
/>
all?,
sub-index-hash?,
resource-id*,
(resource-analyzer?),
(resource-boost?),
(resource-property)*
Table 9.1.
sub-index (optional, defaults to the alias value): The name of the sub-index that the alias will map to.
extends (optional): A comma separated list of aliases to extend. Can extend a resource mapping or a resource-contract mapping. Note that more than one resource/resource-contract can be extended.
analyzer (optional, defaults to the default analyzer): The name of the analyzer that will be used to analyze ANALYZED properties. Defaults to the default analyzer, which is one of the internal analyzers that comes with Compass. Note that when using the resource-analyzer mapping (a child mapping of the resource mapping, for a resource property value that controls the analyzer), the analyzer attribute will have no effect.
9.2.2. resource-contract
You may declare a resource mapping contract using the resource-contract element:
<resource-contract
alias="aliasName"
extends="a comma separated list of aliases to extend"
analyzer="name of the analyzer"
/>
resource-id*,
(resource-analyzer?),
(resource-property)*
Table 9.2.
extends (optional): A comma separated list of aliases to extend. Can extend a resource mapping or a resource-contract mapping. Note that more than one resource/resource-contract can be extended.
analyzer (optional, defaults to the default analyzer): The name of the analyzer that will be used to analyze ANALYZED properties. Defaults to the default analyzer, which is one of the internal analyzers that comes with Compass. Note that when using the resource-analyzer mapping (a child mapping of the resource mapping, for a resource property value that controls the analyzer), the analyzer attribute will have no effect.
9.2.3. resource-id
Mapped Resources must declare at least one resource-id. The resource-id element defines the Property
that identifies the Resource for the specified alias.
<resource-id
name="idName"
/>
Table 9.3.
name: The name of the Property (also known as the name of the meta-data) that is the id of the Resource.
9.2.4. resource-property
<resource-property
name="property name"
store="yes|no|compress"
index="analyzed|not_analyzed|no"
boost="boost value for the property"
analyzer="name of the analyzer"
reverse="no|reader|string"
override="true|false"
exclude-from-all="no|yes|no_analyzed"
converter="converter lookup name"
>
</resource-property>
Table 9.4.
name: The name of the Property (also known as the name of the meta-data).
store (optional, defaults to yes): Whether the value of the resource property is going to be stored in the index.
index (optional, defaults to analyzed): Whether the value of the resource property is going to be indexed (searchable). If it is, controls whether the value is going to be broken down and analyzed (analyzed), or used as is (not_analyzed).
boost (optional, defaults to 1.0f): Controls the boost level for the resource property.
analyzer (optional, defaults to the resource mapping analyzer decision scheme): The name of the analyzer that will be used to analyze ANALYZED resource property mappings defined for the given property. Defaults to the resource mapping analyzer decision scheme, based on the analyzer set or the resource-analyzer mapping.
exclude-from-all (optional, defaults to no): Excludes the property from participating in the "all" meta-data. If set to no_analyzed, not_analyzed properties will be analyzed when added to the all property (the analyzer can be controlled using the analyzer attribute).
override (optional, defaults to true): Whether another definition with the same mapping name will be overridden or added as an additional mapping. Mainly used to override definitions made in extended mappings.
reverse (optional, defaults to no): The meta-data will have its value reversed. Can have the values no (no reverse will happen), string (the reverse will happen and the value stored will be a reversed string), and reader (a special reader will wrap the string and reverse it). The reader option is more performant, but the store and index settings will be discarded.
converter (optional): The global converter lookup name registered with the configuration.
Defines the characteristics of a Resource Property identified by the name mapping. The definition only applies
when using the Resource#addProperty(String name, Object value) operation, and the operation can only
be used with the resource-property mapping.
Note that other Resource Properties that are not defined in the resource mapping can be added using the
createProperty operation.
9.2.5. resource-analyzer
<resource-analyzer
name="property name"
null-analyzer="analyzer name if value is null"
converter="converter lookup name"
>
</resource-analyzer>
Table 9.5.
name: The name of the Property (also known as the name of the meta-data).
null-analyzer (optional, defaults to an error in case of a null value): The name of the analyzer that will be used if the property has a null value.
converter (optional): The global converter lookup name registered with the configuration.
The analyzer resource property mapping controls the analyzer that will be used when indexing the Resource. If
the mapping is defined, it will override the resource mapping analyzer attribute setting.
If, for example, Compass is configured with two additional analyzers, called an1 (with settings in the
form of compass.engine.analyzer.an1.*) and an2, then the values that the resource property can hold are:
default (an internal Compass analyzer, which can be configured as well), an1, and an2. If the analyzer
element has a null value, and that is acceptable for the application, a null-analyzer can be configured
that will be used in that case. If the resource property has a value but there is no matching analyzer, an
exception will be thrown.
9.2.6. resource-boost
Declaring a dynamic property to control the resource boost value using the resource-boost element.
<resource-boost
name="property name"
default="the boost default value when no property value is present"
converter="converter lookup name"
>
</resource-boost>
Table 9.6.
name: The name of the Property (also known as the name of the meta-data).
default (optional, defaults to 1.0): The default value if the property has a null value.
converter (optional): The global converter lookup name registered with the configuration.
The boost resource property mapping controls the boost associated with the Resource created based on the
mapped property. The value of the property should be convertible to float.
10.1. Introduction
The common meta-data feature of Compass::Core provides a way to externalize the definition of meta-data
names and aliases used in OSEM files, which is especially useful if your application has a large domain model
with many OSEM files. Another advantage of this mechanism is the ability to add extra information to the
meta-data (i.e. a long description) and the ability to specify the format for the meta-data definition, removing
the need to explicitly define formats in the OSEM file (like format="yyyy/MM/dd").
By centralizing your meta-data, other tools can take advantage of this information and extend this knowledge
(i.e. adding semantic meaning to the data). Compass::Core provides a common meta-data Ant task that
generates a Java class containing constant values of the information described in the Common meta-data file,
allowing programmatic access to this information from within the application (see Library class in sample
application).
Note, the common meta-data support in Compass is completely optional for applications.
<?xml version="1.0"?>
<!DOCTYPE compass-core-meta-data PUBLIC
"-//Compass/Compass Core Meta Data DTD 2.2//EN"
"http://www.compass-project.org/dtd/compass-core-meta-data-2.2.dtd">
<compass-core-meta-data>
</meta-data>
...
</meta-data-group>
</compass-core-meta-data>
<meta-data resource="org/compass/sample/library/library.cmd.xml" />
Note: The common meta-data reference needs to come BEFORE the mapping files that use it.
To use common meta-data within an OSEM file, you use the familiar ${...} label (similar to Ant). An example of
using the common meta-data definitions in the mapping file is:
<?xml version="1.0"?>
<!DOCTYPE compass-core-mapping PUBLIC
"-//Compass/Compass Core Mapping DTD 2.2//EN"
"http://www.compass-project.org/dtd/compass-core-mapping-2.2.dtd">
<compass-core-mapping package="org.compass.sample.library">
<class name="Author" alias="author"> <!-- opening class element restored; name/alias follow the library sample -->
<constant>
<meta-data>${library.type}</meta-data>
<meta-data-value>${library.type.mdPerson}</meta-data-value>
<meta-data-value>${library.type.mdAuthor}</meta-data-value>
</constant>
<property name="keywords">
<meta-data boost="2">${library.keyword}</meta-data>
</property>
<property name="birthdate">
<meta-data>${library.birthdate}</meta-data>
</property>
</class>
</compass-core-mapping>
The following is a snippet from an Ant build script (or Maven) which uses the common meta-data ant task.
<taskdef name="mdtask"
classname="org.compass.core.metadata.ant.MetaDataTask"
classpathref="classpathhref"/>
<mdtask destdir="${java.src.dir}">
<fileset dir="${java.src.dir}">
<include name="**/*"/>
</fileset>
</mdtask>
11.1. Introduction
As we explained in the overview page, Compass provides an abstraction layer on top of the actual transaction
handling using the CompassTransaction interface. Compass has a transaction handling framework in place to
support different transaction strategies and comes built in with LocalTransaction and JTA synchronization
support.
The CompassTransaction API is completely optional and can be managed by Compass automatically. This
means that once a session is obtained, you can start working with it (it will automatically start a
transaction if needed, internally using the CompassTransaction), and then call CompassSession#commit()
or CompassSession#rollback(). It is usually much simpler to just use the session API.
When using the openSession method, Compass will automatically try to join an already running outer
transaction. An outer transaction can be an already running local Compass transaction, a JTA transaction, a
Hibernate transaction, or a Spring managed transaction. If Compass manages to join an existing outer
transaction, the application does not need to call CompassSession#beginTransaction() or use
CompassTransaction to manage the transaction (since it is already managed). This simplifies the usage
of Compass within managed environments (CMT or Spring) where a transaction is already in progress, by not
requiring explicit Compass code to manage a Compass transaction. In fact, calling beginTransaction will not
actually begin a transaction in such a case, but will simply join it (with only the rollback method used).
A local transaction which starts within the boundaries of a compass local transaction will share the same
session and transaction context and will be controlled by the outer transaction.
In order to configure Compass to work with the Local Transaction, you must set the
compass.transaction.factory to org.compass.core.transaction.LocalTransactionFactory.
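For example (a sketch; the setting can also be provided declaratively in the configuration file):
CompassConfiguration conf = new CompassConfiguration()
        .setSetting("compass.transaction.factory",
                "org.compass.core.transaction.LocalTransactionFactory");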
The support for JTA also includes support for suspend and resume provided by the JTA transaction manager
(or REQUIRES_NEW in CMT when there is already a transaction running).
JTA transaction support is best used when wishing to join with other transactional resources (like DataSource).
The current implementation performs the full transaction commit (first and second phase) at the
afterCompletion method and any exception is logged but not propagated. It can be configured to perform the
commit in the beforeCompletion phase, which is useful when storing the index in the database.
In order to configure Compass to work with the JTA Sync Transaction, you must set the
compass.transaction.factory to org.compass.core.transaction.JTASyncTransactionFactory. You can
also set the transaction manager lookup based on the environment your application will be running in (Compass
will try to identify it automatically).
11.5. XA Transaction
Compass provides support for JTA transactions by enlisting an XAResource with a currently active
Transaction. This allows Compass to participate in a two phase commit process. A JTA transaction will be
joined if already started (by CMT, for example) or will be started if none was initiated.
The support for JTA also includes support for suspend and resume provided by the JTA transaction manager
(or REQUIRES_NEW in CMT when there is already a transaction running).
The XA support provided allows for proper two phase commit transaction operations, but does not provide a
full implementation such as a JCA implementation (mostly for recovery).
In order to configure Compass to work with the JTA XA Transaction, you must set the
compass.transaction.factory to org.compass.core.transaction.XATransactionFactory. You can also
set the transaction manager lookup based on the environment your application will be running in (Compass will
try to identify it automatically).
12.1. Introduction
Let's assume you have downloaded and configured Compass within your application and created some
RSEM/OSEM/XSEM mappings. This section provides the basics of how you will use Compass from within the
application to load, search and delete Compass searchable objects. All operations within Compass are accessed
through the CompassSession interface. The interface provides Object and Resource method APIs, giving the
developer the choice to work directly with Compass's internal representation (Resource) or application domain
Objects.
When using OSEM and defining cascading on component/reference mappings, Compass will cascade save
operations to the target referenced objects (if they are marked with save cascade). Non root objects are allowed
to be saved in Compass if they have a cascading save relationship defined.
load() will throw an exception if no object exists in the index. If you are not sure that there is an object that
maps to the supplied id, use the get method instead.
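A minimal sketch (the Author class and id are illustrative):
// get returns null instead of throwing an exception when no object matches
Author author = (Author) session.get(Author.class, 12);
if (author == null) {
    // nothing is indexed under this id
}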
session.delete(Author.class, 12);
// or :
session.delete(Author.class, new Author(12));
// or :
session.delete(Author.class, "12"); // Everything in the search engine is a String at the end
When using OSEM and defining cascading on component/reference mappings, Compass will cascade delete
operations to the target referenced objects (if they are marked with delete cascade). Non root objects are
allowed to be deleted in Compass if they have a cascading delete relationship defined. Note, deleting objects by
their id will not cause cascaded relationships to be deleted; that only happens when the actual object is passed
to be deleted, with the relationships initialized (the object can be loaded from the search engine).
12.5. Searching
For a quick way to query the index, use the find() method. The find() method returns a CompassHits object,
which is an interface that encapsulates the search results. For more control over how the query will be executed,
use the CompassQuery interface, explained later in this section.
The free text query string has a specific syntax. The syntax is the same one Lucene uses, and is summarized
here:
Table 12.1.
jack london (jack AND london): Contains the terms jack and london in the default search field
jack OR london: Contains the term jack or london, or both, in the default search field
+jack +london (jack AND london): Contains both jack and london in the default search field
name:jack -city:london (name:jack AND NOT city:london): Has jack in the name property and does not have london in the city property
name:"jack london": Contains the exact phrase jack london in the name property
name:"jack london"~5: Contains the terms jack and london within five positions of one another
birthday:[1870/01/01 TO 1920/01/01]: Has birthday values between the specified values. Note that it is a lexicographic range
The default search field can be controlled using the Compass::Core configuration parameters, and defaults to
the all meta-data.
Compass simplifies the usage of range queries when working with dates and numbers. When using numbers it
is preferable to store the number in a lexicographically correct form (such as 00001, usually using the format
attribute). When using range queries, Compass allows you to execute the following query: value:[1 TO 3], and
internally Compass will automatically translate it to value:[0001 TO 0003].
When using dates, Compass allows using several different formats for the same property. The format of the
Date object should be sortable in order to perform range queries. This means, for example, that the format
attribute should be: format="yyyy-MM-dd". This allows range queries such as date:[1980-01-01 TO
1985-01-01] to work. Compass also allows using different formats for range queries. It can be configured
within the format configuration: format="yyyy-MM-dd||dd-MM-yyyy" (the first format is the one used to store
the String). Now the following range query can be executed: date:[01-01-1980 TO 01-01-1985].
Compass also allows for math-like date formats using the now keyword. For example: "now+1year" will
translate to a date one year from now. For more information please refer to the DateMathParser javadoc.
All the search results are accessible using the CompassHits interface. It provides efficient access to the
search results and will only hit the index for "hit number N" when requested. Results are ordered by relevance
(if no sorting is provided), in other words, by how well each resource matches the query.
CompassHits can only be used within a transactional context. If hits need to be accessed outside of a
transactional context (like in a jsp view page), they have to be "detached" using one of the CompassHits#detach
methods. The detached hits are of type CompassDetachedHits, and it is guaranteed that the index will not be
accessed by any operation of the detached hits. CompassHits and CompassDetachedHits both share the same
operations interface, called CompassHitsOperations.
The following table lists some of the CompassHitsOperations methods (note that there are many more, please
view the javadoc):
Table 12.2.
length() (or getLength()): The number of hits in the result
score(n): The score of the n-th hit
resource(n): The Resource of the n-th hit
data(n): The object (when using OSEM/XSEM) of the n-th hit
Compass::Core comes with the CompassQueryBuilder interface, which provides a programmatic API for
building a query. The query builder creates a CompassQuery which can then be used to add sorting and
execute the query.
Using the CompassQueryBuilder, simple queries can be created (i.e. eq, between, prefix, fuzzy), and more
complex query builders can be used as well (such as a boolean query, multi-phrase, and query string).
The following code shows how to use a query string query builder and, using the CompassQuery, add sorting to
the result.
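A sketch of what this can look like (the property names and sort type are illustrative):
CompassQueryBuilder queryBuilder = session.queryBuilder();
CompassHits hits = queryBuilder
        .queryString("+name:jack +familyName:london").toQuery()
        .addSort("familyName", CompassQuery.SortPropertyType.STRING)
        .hits();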
Another example for building a query that requires the name to be jack, and the familyName not to be london:
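Under the same assumptions, a boolean query sketch:
CompassQueryBuilder queryBuilder = session.queryBuilder();
CompassQuery query = queryBuilder.bool()
        .addMust(queryBuilder.term("name", "jack"))
        .addMustNot(queryBuilder.term("familyName", "london"))
        .toQuery();
CompassHits hits = query.hits();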
CompassQuery can also be created using the Compass instance, without the need to construct a CompassSession.
Such queries can then be stored and used safely by multiple sessions (in a multi-threaded environment) by
attaching them to the current session using the CompassQuery#attach(CompassSession) API.
Note that sorted resource properties / meta-data must be stored and not_analyzed. Also sorting requires more
memory to keep sorting properties available. For numeric types, each property sorted requires four bytes to be
cached for each resource in the index. For String types, each unique term needs to be cached.
When a query is built, most of the queries can accept an Object as a parameter, and the name part can be more
than just a simple string value of the meta-data / resource-property. If we take the following mapping for
example:
<property name="familyName">
<meta-data>family-name</meta-data>
</property>
<property name="date">
<meta-data converter-param="YYYYMMDD">date-sem</meta-data>
</property>
</class>
The mapping defines a simple class mapping, with a simple string property called familyName and a date
property called date. With the CompassQueryBuilder, most of the queries can directly work with either level of
the mappings. Here are some samples:
// The following search will use the class property meta-data id, which in this case
// is the first one (family-name). If there was another meta-data with the family-name value,
// the internal meta-data that is created will be used ($/a/familyName).
CompassHits hits = queryBuilder.term("a.familyName", "london").hits();
// Here, we provide the Date object as a parameter, the query builder will use the
// converter framework to convert the value (and use the given parameter)
CompassHits hits = queryBuilder.term("a.date.date-sem", new Date()).hits();
When using query strings and query parsers, Compass enhances the Lucene query parser to support custom
formats (for dates and numbers, for example) as well as to support dot path notation. The query
a.familyName.family-name:london will result in a query matching familyName to london, as well as
wrapping the query with one that will only match the a alias.
Compass allows you to easily get all the terms (possible values) for a property / meta-data name and their
respective frequencies. This can be used to build a frequency based list of terms showing how popular different
tags are (as different blogging sites do, for example). Here is a simple example of how it can be used:
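A minimal sketch (the tag property name is illustrative; treat the exact builder methods as an assumption):
CompassTermFreq[] termFreqs = session.termFreqsBuilder("tag").toTermFreqs();
for (CompassTermFreq termFreq : termFreqs) {
    System.out.println(termFreq.getTerm() + ": " + termFreq.getFreq());
}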
12.5.6. CompassSearchHelper
Compass provides a simple search helper providing support for pagination and automatic hits detach. The
search helper can be used mainly to simplify search results display and can be easily integrated with different
MVC frameworks. CompassSearchHelper is thread safe. Here is an example of how it can be used:
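A minimal sketch (the page size and query are illustrative):
CompassSearchHelper helper = new CompassSearchHelper(compass, 10); // page size 10
CompassSearchResults results = helper.search(new CompassSearchCommand("jack london", 0));
// results hold the detached hits for the requested page, plus paging information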
12.5.7. CompassHighlighter
Compass::Core comes with the CompassHighlighter interface. It provides ways to highlight matched text
fragments based on a query executed. The following code fragment shows a simple usage of the highlighter
functionality (please consult the javadoc for more information):
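A minimal sketch (the name property is illustrative):
CompassHits hits = session.find("london");
// highlight the best fragment of the name property for the first hit
String fragment = hits.highlighter(0).fragment("name");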
Highlighting can only be used with CompassHits, whose operations can only be used within a transactional
context. When working with pure hits results, CompassHits can be detached and then used outside of a
transactional context; the question is: what can be done with highlighting?
Each highlighting operation (as seen in the previous code) is also cached within the hits object. When detaching
the hits, the cache is passed to the detached hits, which can then be used outside of a transaction. Here is an
example:
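A minimal sketch (the property name is illustrative):
CompassHits hits = session.find("london");
for (int i = 0; i < hits.length(); i++) {
    // each call caches the highlighted fragment within the hits object
    hits.highlighter(i).fragment("name");
}
// the cache travels with the detached hits and can be used outside
// of a transactional context
CompassDetachedHits detachedHits = hits.detach();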
Built on top of the general support for common meta-data provided by Compass::Core, Compass::Vocabulary
provides both a set of common meta-data xml definition files (*.cmd.xml) and the compiled Java version of
them (using the common meta-data ant task).
Compass::Gps provides an API for registering GPS devices and controlling their lifecycle, along with a set of
base classes that implement popular data accessing technologies (e.g. JDBC, JDO, Hibernate ORM and OJB).
Developers can simply create their own GPS devices, extending the capability of Compass::Gps.
15.1. Overview
Compass Gps provides integration with different indexable data sources using two interfaces: CompassGps and
CompassGpsDevice. Both interfaces are very abstract, since different data sources are usually different in the
way they work or the API they expose.
A device is considered to be any type of indexable data source imaginable, from a database (maybe through the
use of an ORM mapping tool), file system, ftp site, or a web site.
The main contract that a device is required to provide is the ability to index its data (using the index()
operation). You can think of it as batch indexing the datasource data, providing access for future search queries.
An additional possible operation that a device can implement is mirroring of data changes, either actively or
passively.
Compass Gps is built on top of the Compass Core module, utilizing all its features, such as transactions
(including the important batch_insert level for batch indexing), OSEM, and the simple API that comes with
Compass Core.
When performing the index operation, it is very important NOT to perform it within an already running
transaction. For LocalTransactionFactory, no outer LocalTransaction should be started. For
JTATransactionFactory, no JTA transaction must be started, or no CMT transaction defined for the method
level (on EJB Session Bean for example). For SpringSyncTransactionFactory, no spring transaction should
be wrapping the index code, and the executing method should not be wrapped with a transaction (using
transaction proxy for example).
15.2. CompassGps
CompassGps is the main interface within the Compass Gps module. It holds a list of CompassGpsDevices, and
manages their lifecycle.
CompassGpsInterfaceDevice is an extension of CompassGps, and provides the needed abstraction between the
Compass instance(s) and the given devices. Every implementation of CompassGps must also implement
CompassGpsInterfaceDevice. The Compass Gps module comes with two implementations of CompassGps:
15.2.1. SingleCompassGps
Holds a single Compass instance. The Compass instance is used for both the index operation and the mirror
operation. When executing the index operation Single Compass Gps will clone the provided Compass instance.
Additional or overriding settings can be provided using indexSettings. By default, default overriding settings
are: batch_insert as transaction isolation mode, and disabling of any cascading operations (as they usually do
not make sense for index operations). A prime example for overriding setting of the index operation can be
when using a database as the index storage, but define a file based storage for the index operation (the index
will be built on the file system and then copied to the database).
When calling the index operation on the SingleCompassGps, it will gracefully replace the current index
(pointed to by the initialized single Compass instance) with the content of the index operation. Gracefully means
that while the index operation is executing and building a temporary index, no write operations will be allowed
on the actual index, and while the actual index is replaced by the temporary index, no read operations are
allowed either.
15.2.2. DualCompassGps
Holds two Compass instances. One, called indexCompass is responsible for index operation. The other, called
mirrorCompass is responsible for mirror operations. The main reason why we have two different instances is
because the transaction isolation level can greatly affect the performance of each operation. Usually the
indexCompass instance will be configured with the batch_insert isolation level, while the mirrorCompass
instance will use the default transaction isolation level (read_committed).
When calling the index operation on the DualCompassGps, it will gracefully replace the mirror index (pointed
to by the initialized mirrorCompass instance) with the content of the index (pointed to by the initialized
indexCompass instance). Gracefully means that while the index operation is executing and building the index,
no write operations will be allowed on the mirror index, and while the mirror index is replaced by the index, no
read operations are allowed either.
Both implementations of CompassGps allow setting / overriding settings of the Compass instance that will be
responsible for the index process. One sample use of this feature, which might yield performance improvements, can be when
storing the index within a database. The indexing process can be done on the local file system (on a temporary
location), in a compound format (or non compound format), by setting the indexing compass connection setting
to point to a file system location. Both implementations will perform "hot replace" of the file system index into
the database location, automatically compounding / uncompounding based on the settings of both the index and
the mirror compass instances.
15.3. CompassGpsDevice
A Gps device must implement the CompassGpsDevice interface in order to provide device indexing. It is
responsible for interacting with a data source and reflecting its data in the Compass index. Two examples of
devices are a file system and a database, accessed through the use of an ORM tool (like Hibernate).
A device will provide the ability to index the data source (using the index() operation), which usually means
iterating through the device data and indexing it. It might also provide "real time" monitoring of changes in the
device, and applying them to the index as well.
A CompassGpsDevice cannot operate standalone, and must be a part of a CompassGps instance (even if we have
only one device), since the device requires the Compass instance(s) in order to apply the changes to the index.
Each device has a name associated with it. A device name must be unique across all the devices within a single
CompassGps instance.
15.3.1. MirrorDataChangesGpsDevice
As mentioned, the main operation in CompassGpsDevice is index(), which is responsible for batch indexing all
the relevant data in the data source. Gps devices can also mirror real time data changes made to the data source
by implementing the MirrorDataChangesGpsDevice interface (which extends the CompassGpsDevice interface).
There are two types of devices for mirroring data. ActiveMirrorGpsDevice provides data mirroring of the
datasource by explicit programmatic calls to performMirroring. PassiveMirrorGpsDevice is a GPS device
that gets notified of data changes made to the data source, and does not require user intervention in order to
reflect data changes to the compass index.
The following code snippet shows how to configure Compass Gps as well as manage its lifecycle.
// a sketch; assumes an initialized Compass instance and a configured device
CompassGps gps = new SingleCompassGps(compass);
gps.addGpsDevice(device); // register the device(s)
gps.start();
// ...
// on application shutdown
gps.stop();
The first step during the parallel device startup (start operation) is to ask its derived class for its indexable
entities (the parallel device support defines an index entity as an entity "template" about to be indexed
associated with a name and a set of sub indexes). In our case, the following are the indexed entities:
Then, still during the startup process, the index entities are partitioned using an IndexEntitiesPartitioner
implementation. The default (and the only one provided built in) is the SubIndexIndexEntitiesPartitioner
that partitions the entities based on their sub index allocation (this is also usually the best partitioning possible,
as locking is performed on the sub index level). Here are the index entities partitioned:
During the index operation, a ParallelIndexExecutor implementation will then execute the index operation
using the partitioned index entities, and an IndexEntitiesIndexer implementation (which is provided by the
derived class). The default implementation is ConcurrentParallelIndexExecutor which creates N threads
during the index operation based on the number of partitioned entities and then executes the index process in
parallel on the partitioned index entities. In our case, the following diagram shows the index process:
Compass also comes with a simple SameThreadParallelIndexExecutor which basically uses the same thread
of execution to execute the index operation sequentially.
If you wish to perform real time mirroring of data changes from the data source to the index, you can control
the lifecycle of the mirroring using the start() and stop() operations, and must implement either the
ActiveMirrorGpsDevice or the PassiveMirrorGpsDevice interface.
Compass::Gps comes with a set of base classes for gps devices that can help the development of new gps
devices.
16.1. Introduction
The Jdbc Gps Device provides support for database indexing through the use of JDBC. The Jdbc device maps a
Jdbc ResultSet to a set of Compass Resources (sharing the same resource mapping). Each Resource maps one
to one with a ResultSet row. The Jdbc device can hold multiple ResultSet to Resource mappings. The Jdbc
Gps device class is ResultSetJdbcGpsDevice. The core configuration is the mapping definitions of a Jdbc
ResultSet and a Compass Resource.
The Jdbc Gps device does not use OSEM, since no POJOs are defined that map the ResultSet to objects. For
applications that use ORM tools, Compass::Gps provides several devices that integrate with popular ORM tools
such as Hibernate, JDO, and OJB. For more information about Compass Resource, Resource Property and
resource mapping, please read the Search Engine and Resource Mapping sections.
The Jdbc Gps device also provides support for ActiveMirrorGpsDevice, meaning that data changes done to the
database can be automatically detected by the defined mappings and device.
For the rest of the chapter, we will use the following database tables:
The PARENT.ID is the primary key of the PARENT table, and the CHILD.ID is the primary key of the CHILD
table. There is a one to many relationship between PARENT and CHILD using the CHILD.PARENT_ID column.
The VERSION columns will be explained later, as they are used for the data change mirroring option.
16.2. Mapping
To enable the Jdbc device to index a database, a set of mappings must be defined between the database and the
compass index. The main mapping definition maps a generic Jdbc ResultSet to a set of Compass Resources
that are defined by specific Resource Mapping definitions. The mapping can be configured either at the database
ResultSet or Table level. ResultSetToResourceMapping maps a generic select SQL query (returning a ResultSet),
and TableToResourceMapping (which extends ResultSetToResourceMapping) simply maps database tables.
The following code sample shows how to configure a single ResultSet that combines both the PARENT and
CHILD tables into a single resource mapping with an alias called "result-set".
mapping.setAlias("result-set");
mapping.setSelectQuery("select "
+ "p.id as parent_id, p.first_name as parent_first_name, p.last_name as parent_last_name, "
+ "c.id as child_id, c.first_name as child_first_name, c.last_name child_last_name "
+ "from parent p left join child c on p.id = c.parent_id");
// maps from a parent_id column to a resource property named parent-id
mapping.addIdMapping(new IdColumnToPropertyMapping("parent_id", "parent-id"));
// maps from a child_id column to a resource property named child-id
mapping.addIdMapping(new IdColumnToPropertyMapping("child_id", "child-id"));
mapping.addDataMapping(new DataColumnToPropertyMapping("parent_first_name", "parent-first-name"));
mapping.addDataMapping(new DataColumnToPropertyMapping("parent_first_name", "first-name"));
mapping.addDataMapping(new DataColumnToPropertyMapping("child_first_name", "child-first-name"));
mapping.addDataMapping(new DataColumnToPropertyMapping("child_first_name", "first-name"));
Here, we defined a mapping from a ResultSet that combines both the PARENT table and the CHILD table
into a single set of Resources. Note also in the above example how "parent_first_name" is mapped to multiple
property names, allowing searches to be performed on either the specific attribute ("parent-first-name") or the
more general "first-name".
The required settings for the ResultSetToResourceMapping are the alias name of the Resource that will be
created, the select query that generates the ResultSet, and the id column mappings (at least one must be
defined) that map to the columns that uniquely identify the rows in the ResultSet.
In the above sample, the two columns that identify a row for the given select query are parent_id and
child_id. They are mapped to the parent-id and child-id property names respectively.
Mapping data columns using the DataColumnToPropertyMapping provides mapping from "data" columns into
searchable meta-data (Resource Property). As mentioned, you can control the property name and its
characteristics. Mapping data columns is optional, though mapping none makes little sense.
ResultSetToResourceMapping has the option to index all the unmapped columns of the ResultSet by setting
the indexUnMappedColumns property to true. The meta-data that will be created will have the property name
set to the column name.
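The table mapping code referenced in the next paragraph is not shown above; the following is a minimal sketch of what it looks like (the constructor argument order, table name then alias, is an assumption):
// map the PARENT and CHILD tables to the "parent" and "child" aliases
TableToResourceMapping parentMapping = new TableToResourceMapping("parent", "parent");
TableToResourceMapping childMapping = new TableToResourceMapping("child", "child");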
The above code defines the table mappings: one mapping from the PARENT table to the "parent" alias, and one
from the CHILD table to the "child" alias. The mapping definitions are much simpler than
ResultSetToResourceMapping, with only the table name and the alias required. Since the mapping works
against a database table, the id columns can be auto generated (based on the table primary keys, with the
property names the same as the column names), and so can the select query (based on the table name). Note that
the mapping will only auto generate settings that have not been set; if, for example, the select query is set, it will
not be generated.
The following code sample shows how to configure a mirroring enabled ResultSet mapping:
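The snippet itself is not included here; the following sketch reconstructs what such a mapping looks like (the version query, the COALESCE expressions, and the setVersionQuery method name are assumptions to be verified against the Compass javadocs):
ResultSetToResourceMapping mapping = new ResultSetToResourceMapping();
mapping.setAlias("result-set");
// note the explicit id column names; a dynamic where clause is appended for mirroring
mapping.setSelectQuery("select "
+ "p.id as parent_id, p.first_name as parent_first_name, "
+ "c.id as child_id, c.first_name as child_first_name "
+ "from parent p left join child c on p.id = c.parent_id");
// the version query returns the id and version columns of the result set
mapping.setVersionQuery("select p.id as parent_id, COALESCE(c.id, 0) as child_id, "
+ "p.version as parent_version, COALESCE(c.version, 0) as child_version "
+ "from parent p left join child c on p.id = c.parent_id");
mapping.addIdMapping(new IdColumnToPropertyMapping("parent_id", "parent-id", "p.id"));
mapping.addIdMapping(new IdColumnToPropertyMapping("child_id", "child-id", "COALESCE(c.id, 0)"));
mapping.addDataMapping(new DataColumnToPropertyMapping("parent_first_name", "parent-first-name"));
mapping.addDataMapping(new DataColumnToPropertyMapping("child_first_name", "child-first-name"));
// without version column mappings the mirroring feature is disabled
mapping.addVersionMapping(new VersionColumnMapping("parent_version"));
mapping.addVersionMapping(new VersionColumnMapping("child_version"));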
There are three additions to the previously configured result set mapping. The first is the version query that will
be executed in order to identify changes made to the result set (rows created, updated, or deleted); the version
query should return the ResultSet id and version columns. The second is the id column names in the
select query, since a dynamic where clause is added to the select query for mirroring purposes. The last is
the actual version column mapping (having no version column mapping automatically disables the mirroring feature).
The following code sample shows how to configure a mirroring enabled Table mapping:
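Again, the snippet is not included here; a minimal sketch (same constructor assumption as before):
TableToResourceMapping parentMapping = new TableToResourceMapping("parent", "parent");
parentMapping.addVersionMapping(new VersionColumnMapping("version"));
TableToResourceMapping childMapping = new TableToResourceMapping("child", "child");
childMapping.addVersionMapping(new VersionColumnMapping("version"));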
Again, the table mapping is much simpler than the result set mapping. The only thing that needs to be added is
the version column mapping. The version query is automatically generated.
The mirroring operation works with snapshots. Snapshots are taken when index() or
performMirroring() are called and represent the latest ResultSet state.
GPS devices are Inversion of Control / Dependency Injection enabled, meaning that they can be configured
within an IoC container. For an example of configuring the ResultSetJdbcGpsDevice, please see the Spring Jdbc
Gps Device section.
17.1. Introduction
Compass allows for embedded integration with Hibernate and Hibernate JPA. Using simple configuration,
Compass will automatically perform mirroring operations (mirroring changes done through Hibernate to the
search engine), as well as allow simply indexing the content of the database using Hibernate.
The integration involves a few simple steps. The first is enabling Embedded Compass within Hibernate. If
Hibernate Annotations or Hibernate EntityManager (JPA) are used, just dropping the Compass jar file into the
classpath will enable it (make sure you don't have Hibernate Search in the classpath, as it uses the same event
class name :) ). If Hibernate Core is used, the following event listeners need to be configured:
<hibernate-configuration>
<session-factory>
<event type="post-update">
<listener class="org.compass.gps.device.hibernate.embedded.CompassEventListener"/>
</event>
<event type="post-insert">
<listener class="org.compass.gps.device.hibernate.embedded.CompassEventListener"/>
</event>
<event type="post-delete">
<listener class="org.compass.gps.device.hibernate.embedded.CompassEventListener"/>
</event>
<event type="post-collection-recreate">
<listener class="org.compass.gps.device.hibernate.embedded.CompassEventListener"/>
</event>
<event type="post-collection-remove">
<listener class="org.compass.gps.device.hibernate.embedded.CompassEventListener"/>
</event>
<event type="post-collection-update">
<listener class="org.compass.gps.device.hibernate.embedded.CompassEventListener"/>
</event>
</session-factory>
</hibernate-configuration>
Now that Compass is enabled with Hibernate, there is one required Compass property to configure: the location
where the search engine index will be stored. This is configured as a Hibernate property
using the key compass.engine.connection (for example, having the value file://tmp/index).
When it is configured, Compass will automatically use the mapped Hibernate classes and check if one of them
is searchable. If there is at least one, the listener will be enabled. That is it! Now, every operation done
using Hibernate will be mirrored to the search engine.
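For example, a minimal sketch using Hibernate's programmatic configuration (the index location is illustrative):
org.hibernate.cfg.Configuration cfg = new org.hibernate.cfg.Configuration().configure();
// the single required Compass setting: where the search engine index is stored
cfg.setProperty("compass.engine.connection", "file://tmp/index");
SessionFactory sessionFactory = cfg.buildSessionFactory();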
For direct access to Compass (for example, to execute search operations), either the HibernateHelper (when
using pure Hibernate) or the HibernateJpaHelper (when using Hibernate JPA) can be used. For
example:
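A minimal sketch leading into the fragment below (assuming pure Hibernate, an existing sessionFactory reference, and an illustrative query):
Compass compass = HibernateHelper.getCompass(sessionFactory);
CompassSession session = compass.openSession();
CompassTransaction tr = session.beginTransaction();
CompassHits hits = session.find("someQuery");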
tr.commit();
session.close();
In order to completely reindex the content of the database based on both the Hibernate and Compass mappings,
the Compass Gps managing the embedded Compass instance can be accessed.
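For example (assuming the helper exposes a getCompassGps accessor, mirroring the OpenJPAHelper usage shown later):
HibernateHelper.getCompassGps(sessionFactory).index();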
17.2. Configuration
The basic configuration of embedded Hibernate is explained in the introduction section. Within the Hibernate
(or JPA persistence xml) configuration, the Compass instance used for mirroring and searching can be
configured using the usual Compass properties (using the compass. prefix). If configuring Compass using an external
configuration is needed, compass.hibernate.config can be used to point to a Compass configuration file.
An implementation of HibernateMirrorFilter can also be configured in order to allow for filtering out
specific objects from the index (for example, based on their specific content). The
compass.hibernate.mirrorFilter property should be configured having the fully qualified class name of the
mirroring filter implementation.
The Compass instance created automatically for the indexing operation can also be configured using specific
properties. These properties should have the prefix gps.index.. This is usually used to give the indexing
Compass specific parameters, for example, a different index storage location while
indexing.
18.1. Introduction
The Hibernate Gps Device provides support for database indexing through the use of Hibernate ORM
mappings. If your application uses Hibernate, it couldn't be easier to integrate Compass into your application
(sometimes with no code attached - see the petclinic sample).
Hibernate Gps Device utilizes Compass::Core's OSEM feature (Object to Search Engine Mappings) and
Hibernate's ORM feature (Object to Relational Mappings) to provide simple database indexing, as well as
Hibernate 3's new event based system to provide real time mirroring of data changes done through Hibernate.
The path data travels through the system is: Database -- Hibernate -- Objects -- Compass::Gps --
Compass::Core (Search Engine).
Hibernate Gps Device extends Compass Gps AbstractParallelGpsDevice and supports parallel index
operations. It is discussed in more detail here: Section 15.5, “Parallel Device”.
18.2. Configuration
When configuring the Hibernate device, one must instantiate HibernateGpsDevice. After instantiating the
device, it must be initialized with a Hibernate SessionFactory.
In order to register event listeners with the Hibernate SessionFactory, the actual instance of the session factory
needs to be obtained. The Hibernate device allows for a pluggable NativeHibernateExtractor implementation
responsible for extracting the actual instance. Compass comes with a default implementation for working
within a Spring environment: SpringNativeHibernateExtractor.
18.2.1.1. Configuration
When configuring the Hibernate device, one needs to instantiate HibernateGpsDevice. The device requires the
Hibernate SessionFactory and a logical "name".
HibernateGpsDevice hibernateDevice = new HibernateGpsDevice("hibernate", sessionFactory);
gps.addDevice(hibernateDevice);
.... // configure other devices
gps.start();
The indexing process is pluggable and Compass comes with two implementations. The first,
PaginationHibernateIndexEntitiesIndexer, uses setFirstResult and setMaxResults in order to perform
pagination. The second, ScrollableHibernateIndexEntitiesIndexer, uses Hibernate's scrollable result set
in order to index the data. The default indexer used is the scrollable indexer.
During the indexing process Compass will execute a default query which fetches all the relevant data from
the database using Hibernate. The query itself can be controlled both by setting a static sql query and by providing
a query provider. This setting applies per entity. Note that when using the scrollable indexer, it is preferable to use
a custom query provider that returns a specific Hibernate Criteria instead of a static sql query.
An important point when configuring the hibernate device is that both the application and the hibernate device
must use the same SessionFactory.
If using Hibernate and the Spring Framework, please see the SpringHibernate3GpsDevice section.
18.5. HibernateSyncTransaction
Compass integrates with the Hibernate transaction synchronization services. This means that whichever Hibernate
transaction management you are using (JTA, JDBC, ...), the HibernateSyncTransaction will
synchronize with the transaction upon transaction completion. The Hibernate transaction support uses
Hibernate's context session in order to obtain the current session and the current transaction. The application
using this feature must also use Hibernate's context session (which is the preferred Hibernate usage model
starting from Hibernate 3.2).
If you are using the HibernateSyncTransaction, a Hibernate based transaction must already be started in order
for HibernateSyncTransaction to join. If no transaction is started, Compass can start one (and will commit it
eventually). Note, if you are using other transaction management abstraction (such as Spring), it is preferable to
use it instead of this transaction factory.
In order to configure Compass to work with the HibernateSyncTransaction, you must set
compass.transaction.factory to
org.compass.gps.device.hibernate.transaction.HibernateSyncTransactionFactory. Additional
initialization should be performed by calling HibernateSyncTransactionFactory.setSessionFactory with the
Hibernate SessionFactory instance before Compass is created.
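A minimal configuration sketch (the index location is illustrative):
// must be called before the Compass instance is created
HibernateSyncTransactionFactory.setSessionFactory(sessionFactory);
CompassConfiguration conf = new CompassConfiguration()
    .setSetting(CompassEnvironment.CONNECTION, "file://tmp/index")
    .setSetting("compass.transaction.factory",
        "org.compass.gps.device.hibernate.transaction.HibernateSyncTransactionFactory");
Compass compass = conf.buildCompass();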
19.1. Introduction
The Jpa Gps Device provides support for database indexing through the use of the Java Persistence API (Jpa),
part of the EJB3 standard. If your application uses Jpa, it couldn't be easier to integrate Compass into your
application.
Jpa Gps Device utilizes Compass::Core's OSEM feature (Object to Search Engine Mappings) and the Jpa feature
(Object to Relational Mappings) to provide simple database indexing, as well as Jpa's support for life-cycle
events to provide real time mirroring of data changes done through Jpa (see the notes about real time
mirroring later on). The path data travels through the system is: Database -- Jpa (Entity Manager) -- Objects --
Compass::Gps -- Compass::Core (Search Engine).
JPA Gps Device extends Compass Gps AbstractParallelGpsDevice and supports parallel index operations. It
is discussed in more detail here: Section 15.5, “Parallel Device”.
19.2. Configuration
When configuring the Jpa device, one must instantiate JpaGpsDevice. After instantiating the device, it must be
initialized with an EntityManagerFactory. This is the only required parameter of the JpaGpsDevice. For tighter
integration with the actual implementation of Jpa (i.e. Hibernate), and with frameworks that wrap it (i.e. Spring), the
device allows for abstractions on top of it. Each one will be explained in the next sections, though in the spirit
of Compass, it already comes with implementations for popular Jpa implementations.
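A minimal configuration sketch (assuming existing compass and entityManagerFactory references; the device name is illustrative):
JpaGpsDevice jpaDevice = new JpaGpsDevice("jpa", entityManagerFactory);
CompassGps gps = new SingleCompassGps(compass);
gps.addDevice(jpaDevice);
gps.start();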
The device performs all its operations using its EntityManagerWrapper. The Jpa support comes with three
different implementations: JtaEntityManagerWrapper, which will only work within a JTA environment,
ResourceLocalEntityManagerWrapper for resource local transactions, and DefaultEntityManagerWrapper,
which works with both JTA and resource local environments. The DefaultEntityManagerWrapper is the
default implementation of the EntityManagerWrapper the device will use.
Several frameworks (like Spring) sometimes wrap (proxy) the actual EntityManagerFactory. Some features of
the Jpa device require the actual implementation of the EntityManagerFactory. These features are the ones that
integrate tightly with the implementation of the EntityManagerFactory, and they are described later in the
chapter. The device allows setting a NativeJpaExtractor, which is responsible for extracting the actual
implementation.
Compass will index objects (or their matching database tables in the Jpa mappings) specified
in both the Jpa mappings and the Compass::Core mappings (OSEM) files.
When performing the index operation, the Jpa device can be configured with a fetchCount. The fetchCount
parameter controls the pagination process of indexing a class (and its represented table) so that in the case of large
tables, memory consumption can be controlled.
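A one-line sketch (assuming the jpaDevice from the previous sketch; the setter name follows the fetchCount property named above):
jpaDevice.setFetchCount(200); // index each class in pages of 200 objects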
The device allows setting a JpaEntitiesLocator, which is responsible for extracting all the entities that are
mapped in both Compass and the Jpa EntityManager. The default implementation, DefaultJpaEntitiesLocator,
uses annotations to determine if a class is mapped to the database. Most of the time this will suffice, but for
applications that use both annotations and xml definitions, a tighter integration with the Jpa implementation is
required, with a specialized implementation of the locator. Compass comes with several specialized
implementations of the locator, and auto-detects the one to use (defaulting to the default implementation if none is
found). Note that this is one of the cases where the actual EntityManagerFactory is required, so if the
application is using a framework that wraps the EntityManagerFactory, a NativeJpaExtractor should be
provided (though Compass tries to automatically detect the most common frameworks and extract it automatically).
With several Jpa implementations, Compass can automatically register life-cycle event listeners based on the
actual implementation APIs (like Hibernate's event listener support). In order to enable it,
injectEntityLifecycleListener must be set to true (it defaults to false), and an implementation of
JpaEntityLifecycleInjector can be provided. Compass can auto-detect a proper injector based on the
currently provided internal injector implementations; the auto-detection will happen if no implementation for
the injector is provided and the inject flag is set to true. Note that this is one of the cases where the actual
EntityManagerFactory is required, so if the application is using a framework that wraps the
EntityManagerFactory, a NativeJpaExtractor should be provided.
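A one-line sketch (again assuming the jpaDevice from the earlier sketch; the setter name follows the flag named above and should be verified):
jpaDevice.setInjectEntityLifecycleListener(true); // auto-detect and use a lifecycle injector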
An important point when configuring the Jpa device is that both the application and the Jpa device must use the
same EntityManagerFactory.
20.1. Introduction
Compass has "native" integration with OpenJPA by working in an "embedded" mode within it. OpenJPA can
be used with Chapter 19, JPA (Java Persistence API) and Compass has specific indexer and lifecycle for it, but
Compass can also work from within OpenJPA and have OpenJPA control Compass creation and configuration.
Embedded Compass OpenJPA integration provides support for database indexing and mirroring through the
use of the OpenJPA, an implementation of the EJB3 standard.
The Compass OpenJPA integration utilizes Compass::Core's OSEM feature (Object to Search Engine Mappings) and
the Jpa feature (Object to Relational Mappings) to provide simple database indexing, as well as OpenJPA's support
for life-cycle events to provide real time mirroring of data changes done through Jpa (see the notes
about real time mirroring later on). The path data travels through the system is: Database -- Jpa (Entity
Manager) -- Objects -- Compass::Gps -- Compass::Core (Search Engine).
The Compass OpenJPA integration uses Chapter 19, JPA (Java Persistence API) under the covers, and all its
configuration options apply when using it. The JPA Gps Device extends Compass Gps AbstractParallelGpsDevice and
supports parallel index operations. It is discussed in more detail here: Section 15.5, “Parallel Device”.
20.2. Configuration
Configuration of the embedded Compass OpenJPA integration is done within the persistence xml file (or via
programmatic Map configuration) using Compass's support for properties based configuration. Here is the
simplest example of enabling Compass within OpenJPA (note, just having the Compass jars within the classpath
enables it!):
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://java.sun.com/xml/ns/persistence persistence_1_0.xsd" version="1.0">
    <persistence-unit name="...">
        ...
        <properties>
            <!-- This will enable Compass, this is also the single Compass configuration required -->
            <property name="compass.engine.connection" value="target/test-index" />
        </properties>
    </persistence-unit>
</persistence>
In order to completely reindex the content of the database, the Compass Gps managing the embedded Compass
instance can be accessed using OpenJPAHelper:
OpenJPAHelper.getCompassGps(entityManagerFactory).index();
Specific configuration for the Compass index instance can be done using gps.index.compass. prefix.
Internally the CompassGps implementation used is SingleCompassGps.
Several special properties can also be used. The first, compass.openjpa.reindexOnStartup (defaults to false),
will cause Compass to reindex the database when it starts up. Another important configuration option is
compass.openjpa.indexQuery.[entity name/class], which allows plugging in a custom query string for
indexing.
21.1. Introduction
Compass allows for embedded integration with TopLink Essentials. Using simple configuration, Compass will
automatically perform mirroring operations (mirroring changes done through TopLink to the search engine), as
well as allow simply indexing the content of the database using TopLink.
The integration involves a few simple steps. The first is enabling Embedded Compass within TopLink. Within
the persistence configuration (or when passing properties), a custom Compass TopLink session customizer
needs to be defined:
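A sketch of the programmatic equivalent (the customizer class name and the persistence unit name are assumptions to be verified):
Map<String, Object> props = new HashMap<String, Object>();
// TopLink Essentials session customizer hook pointing to the Compass customizer
props.put("toplink.session.customizer",
    "org.compass.gps.device.jpa.embedded.toplink.CompassSessionCustomizer");
props.put("compass.engine.connection", "file://tmp/index");
EntityManagerFactory emf = Persistence.createEntityManagerFactory("myUnit", props);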
Now that Compass is enabled with TopLink, there is one required Compass property to configure: the location
where the search engine index will be stored. This is configured as a Persistence Unit
property using the key compass.engine.connection (for example, having the value
file://tmp/index). When it is configured, Compass will automatically use the mapped TopLink classes and
check if one of them is searchable. If there is at least one, the integration will be enabled. That is it! Now, every
operation done using TopLink will be mirrored to the search engine.
Direct access to Compass (for example, to execute search operations) can be done using the TopLinkHelper. For
example:
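A minimal sketch leading into the fragment below (assuming an existing entityManagerFactory reference and an illustrative query):
Compass compass = TopLinkHelper.getCompass(entityManagerFactory);
CompassSession session = compass.openSession();
CompassTransaction tr = session.beginTransaction();
CompassHits hits = session.find("someQuery");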
tr.commit();
session.close();
In order to completely reindex the content of the database based on both the TopLink and Compass mappings, a
Compass Gps can be accessed. Here is an example of how to do it:
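A one-line sketch (the getCompassGps accessor name mirrors the OpenJPAHelper usage shown earlier):
TopLinkHelper.getCompassGps(entityManagerFactory).index();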
21.2. Configuration
The basic configuration of embedded TopLink Essentials is explained in the introduction section. Within the
persistence configuration, the Compass instance used for mirroring and searching can be configured using the
usual Compass properties (using the compass. prefix). If configuring Compass using an external configuration is
needed, compass.toplink.config can be used to point to a Compass configuration file.
The Compass instance created automatically for the indexing operation can also be configured using specific
properties. These properties should have the prefix gps.index.. This is usually used to give the indexing
Compass specific parameters, for example, a different index storage location while
indexing.
22.1. Introduction
Compass allows for embedded integration with EclipseLink. Using simple configuration, Compass will
automatically perform mirroring operations (mirroring changes done through EclipseLink to the search engine),
as well as allow simply indexing the content of the database using EclipseLink.
The integration involves a few simple steps. The first is enabling Embedded Compass within EclipseLink. Within
the persistence configuration (or when passing properties), a custom Compass EclipseLink session customizer
needs to be defined:
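A sketch of the programmatic equivalent (the customizer class name and the persistence unit name are assumptions to be verified):
Map<String, Object> props = new HashMap<String, Object>();
// EclipseLink session customizer hook pointing to the Compass customizer
props.put("eclipselink.session.customizer",
    "org.compass.gps.device.jpa.embedded.eclipselink.CompassSessionCustomizer");
props.put("compass.engine.connection", "file://tmp/index");
EntityManagerFactory emf = Persistence.createEntityManagerFactory("myUnit", props);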
Now that Compass is enabled with EclipseLink, there is one required Compass property to configure: the location
where the search engine index will be stored. This is configured as a Persistence Unit
property using the key compass.engine.connection (for example, having the value
file://tmp/index). When it is configured, Compass will automatically use the mapped EclipseLink classes
and check if one of them is searchable. If there is at least one, the integration will be enabled. That is it! Now, every
operation done using EclipseLink will be mirrored to the search engine.
Direct access to Compass (for example, to execute search operations) can be done using the EclipseLinkHelper.
For example:
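A minimal sketch leading into the fragment below (assuming an existing entityManagerFactory reference and an illustrative query):
Compass compass = EclipseLinkHelper.getCompass(entityManagerFactory);
CompassSession session = compass.openSession();
CompassTransaction tr = session.beginTransaction();
CompassHits hits = session.find("someQuery");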
tr.commit();
session.close();
In order to completely reindex the content of the database based on both the EclipseLink and Compass
mappings, a Compass Gps can be accessed. Here is an example of how to do it:
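A one-line sketch (the getCompassGps accessor name mirrors the OpenJPAHelper usage shown earlier):
EclipseLinkHelper.getCompassGps(entityManagerFactory).index();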
22.2. Configuration
The basic configuration of embedded EclipseLink is explained in the introduction section. Within the
persistence configuration, the Compass instance used for mirroring and searching can be configured using the
usual Compass properties (using the compass. prefix). If configuring Compass using an external configuration is
needed, compass.eclipselink.config can be used to point to a Compass configuration file.
The Compass instance created automatically for the indexing operation can also be configured using specific
properties. These properties should have the prefix gps.index.. This is usually used to give the indexing
Compass specific parameters, for example, a different index storage location while
indexing.
23.1. Introduction
The SqlMapClient (iBatis) Gps Device provides support for database indexing through the use of Apache iBatis
ORM mappings. The device can index the database data using a set of configured select statements. Mirroring
is not supported, but if Spring is used, Compass::Spring AOP can simply be used to add advices that mirror
data changes made using iBatis DAOs.
23.3. Configuration
Here is a code sample of how to configure the SqlMapClient (iBatis) device:
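The original sample is not included here; a hedged sketch (the setter names and the select statement id are assumptions to be verified):
SqlMapClientGpsDevice sqlMapClientDevice = new SqlMapClientGpsDevice();
sqlMapClientDevice.setName("sqlMapClient");
sqlMapClientDevice.setSqlMapClient(sqlMapClient); // the application's iBatis SqlMapClient
sqlMapClientDevice.setSelectStatementsIds(new String[] { "getAllUsers" });
gps.addDevice(sqlMapClientDevice);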
24.1. Overview
Compass::Spring aims to provide closer integration with the Spring Framework. The following list summarizes
the main integration points with Spring.
• Support for a Compass level factory bean, with Spring IOC modelled configuration options.
• Compass DAO level support (similar to the ORM dao support), with transaction integration and Compass
DAO support class.
• An extension on top of Spring's Hibernate 3 dao support which extends the Compass::Gps Hibernate 3 device.
Handles Spring's proxying of the Hibernate SessionFactory.
• An extension on top of Spring's OJB dao support which extends Compass::Gps OJB device. Mainly
provides non programmatic configuration with OJB.
• Extension to Spring MVC, providing Search controller (based on Compass::Core search capabilities) and an
Index controller (based on Compass::Gps index operation).
<beans>
...
<bean id="compass"
class="org.compass.spring.LocalCompassBean">
<property name="resourceLocations">
<list>
<value>classpath:org/compass/spring/test/A.cpm.xml</value>
</list>
</property>
<property name="compassSettings">
<props>
<prop key="compass.engine.connection">
target/testindex
</prop>
<!-- This is the default transaction handling
(just explicitly setting it) -->
<prop key="compass.transaction.factory">
org.compass.core.transaction.LocalTransactionFactory
</prop>
</props>
</property>
</bean>
...
</beans>
If using a Spring PlatformTransactionManager, you should also initialize the transactionManager property
of the LocalCompassBean.
Also, if storing the index within a database, be sure to set the dataSource property of the LocalCompassBean;
it will be automatically wrapped by Spring's TransactionAwareDataSourceProxy if not wrapped already.
When using Compass code within already managed code (within a transaction), it is enough to just use
Compass#openSession(), without worrying about Compass transaction management code or even closing the
session. Since even opening the session should not really be required, a LocalCompassSessionBean can be used
to directly inject a CompassSession to be used. It can be initialized with a Compass instance, but if there is only
one within the Spring application context, it will automatically identify and use it (this feature is similar to the
@CompassContext annotation explained later).
Compass also supports @CompassContext annotations to inject either a Compass instance or a CompassSession
instance. The annotation can be used on either a class field or a property setter. In order to process the
annotation, the bean org.compass.spring.support.CompassContextBeanPostProcessor needs to be added to
the bean configuration. If Spring 2's new schema based support is used, compass:context can be used instead.
The Compass Spring integration also supports Spring 2's new schema based configuration. Using Compass's own
schema definition, the configuration of a Compass instance can be embedded within a Spring beans schema
based configuration. Here is an example of using the new schema based configuration:
<beans ...>
    <!-- namespace declarations elided; a Compass instance defined using the Compass schema -->
    <compass:compass name="compass" connection="target/testindex" />
    <!-- A direct LocalCompassSessionBean, used with code within a transaction context -->
    <compass:session id="sess" />
</beans>
Compass::Spring provides a simple base class called CompassDaoSupport which can be initialized with a Compass
or CompassTemplate and provides access to a CompassTemplate from its subclasses.
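A minimal sketch of such a DAO (the LibraryCompassDao name matches the Spring configuration below; the query is illustrative):
public class LibraryCompassDao extends CompassDaoSupport {
    public int getNumberOfHits(final String query) {
        // run the search within a Compass callback and return the hit count
        return getCompassTemplate().execute(new CompassCallback<Integer>() {
            public Integer doInCompass(CompassSession session) {
                return session.find(query).getLength();
            }
        });
    }
}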
The following is an example of configuring the above Library DAO in the XML application context (assuming
that we configured a LocalCompassBean named "compass" previously):
<beans>
<bean id="libraryCompass" class="LibraryCompassDao">
<property name="compass">
<ref local="compass" />
</property>
</bean>
</beans>
26.1. Introduction
Compass::Spring integrates with Spring transaction management in several ways, either using Compass::Core's
own LocalTransaction or using the Spring transaction synchronization services. Currently there is no Compass
implementation of Spring's PlatformTransactionManager.
26.2. LocalTransaction
Compass::Core's default transaction handling is LocalTransaction. A LocalTransaction does not integrate with
Spring transaction management services, but can be used to write Compass Dao beans that do not require
integration with ongoing Spring or JTA transactions.
26.3. JTASyncTransaction
When using Spring's JtaTransactionManager, you have a choice to either use the SpringSyncTransaction
(described next) or the JTASyncTransaction provided by Compass::Core (the SpringSyncTransaction is
preferable).
26.4. SpringSyncTransaction
Compass::Spring integrates with the Spring transaction synchronization services. This means that whichever
Spring transaction manager you are using (JTA, Hibernate, ...), the SpringSyncTransaction will synchronize
with the transaction upon transaction completion.
If you are using the SpringSyncTransaction, a Spring based transaction must already be started in order for the
SpringSyncTransaction to join. If no transaction is started, Compass can start one (and will commit it
eventually) if the PlatformTransactionManager is provided to the LocalCompassBean. The transaction must
support the transaction synchronization feature (which, by default, all of them do).
Note: you can use Spring transaction management support to suspend and resume transactions, in which case a
Compass provided transaction will be suspended and resumed as well.
In order to configure Compass to work with the SpringSyncTransaction, you must set the
compass.transaction.factory to org.compass.spring.transaction.SpringSyncTransactionFactory.
26.5. CompassTransactionManager
Currently Compass::Spring does not provide a CompassTransactionManager. This means that for CompassDao
objects using LocalTransaction, programmatic (Spring transaction template) or declarative (Spring
Interceptor/AOP transaction support) Spring transaction definitions won't be applied to the Compass transaction.
27.2. Introduction
The device is built on top of Spring's ORM support for Hibernate 3 and Compass::Gps's support for the Hibernate 3
device. It provides support for Spring's generation of a Hibernate SessionFactory proxy.
27.3. SpringHibernate3GpsDevice
An extension of the Hibernate3GpsDevice that can handle Spring's proxying of the Hibernate SessionFactory in
order to register event listeners for real time data change mirroring.
28.1. Introduction
This section provides no additional implementation, only samples of using the Jdbc Gps Device within a Spring IoC
container.
The database structure is the same as the one in the Jdbc Gps Device section, and is shown here as well:
<beans>
<!-- the opening of the rsMapping bean below was reconstructed; the elided select
     and version queries follow the Jdbc Gps Device chapter -->
<bean id="rsMapping" class="org.compass.gps.device.jdbc.mapping.ResultSetToResourceMapping">
<property name="alias"><value>result-set</value></property>
<property name="selectQuery"><value>...</value></property>
<property name="versionQuery"><value>...</value></property>
<property name="idMappings">
<list>
<bean class="org.compass.gps.device.jdbc.mapping.IdColumnToPropertyMapping">
<property name="columnName"><value>parent_id</value></property>
<property name="propertyName"><value>parent_id</value></property>
<property name="columnNameForVersion"><value>p.id</value></property>
</bean>
<bean class="org.compass.gps.device.jdbc.mapping.IdColumnToPropertyMapping">
<property name="columnName"><value>child_id</value></property>
<property name="propertyName"><value>child_id</value></property>
<property name="columnNameForVersion"><value>COALESCE(c.id, 0)</value></property>
</bean>
</list>
</property>
<property name="dataMappings">
<list>
<bean class="org.compass.gps.device.jdbc.mapping.DataColumnToPropertyMapping">
<property name="columnName"><value>parent_first_name</value></property>
<property name="propertyName"><value>parent_first_name</value></property>
</bean>
<bean class="org.compass.gps.device.jdbc.mapping.DataColumnToPropertyMapping">
<property name="columnName"><value>child_first_name</value></property>
<property name="propertyName"><value>child_first_name</value></property>
<property name="propertyStoreString"><value>compress</value></property>
</bean>
</list>
</property>
<property name="versionMappings">
<list>
<bean class="org.compass.gps.device.jdbc.mapping.VersionColumnMapping">
<property name="columnName"><value>parent_version</value></property>
</bean>
<bean class="org.compass.gps.device.jdbc.mapping.VersionColumnMapping">
<property name="columnName"><value>child_version</value></property>
</bean>
</list>
</property>
</bean>
<!-- Compass-->
<bean id="compass" class="org.compass.spring.LocalCompassBean">
<property name="mappingResolvers">
<list>
<bean class="org.compass.gps.device.jdbc.ResultSetResourceMappingResolver">
<property name="mapping"><ref local="rsMapping" /></property>
<property name="dataSource"><ref bean="dataSource" /></property>
</bean>
</list>
</property>
<property name="compassSettings">
<props>
<prop key="compass.engine.connection">target/testindex</prop>
<!-- This is the default transaction handling (just explicitly setting it) -->
<prop key="compass.transaction.factory">
org.compass.core.transaction.LocalTransactionFactory
</prop>
</props>
</property>
</bean>
</beans>
<beans>
<!-- the parentMapping and childMapping bean definitions were reconstructed here;
     property names follow the Jdbc Gps Device chapter and should be verified -->
<bean id="parentMapping" class="org.compass.gps.device.jdbc.mapping.TableToResourceMapping">
<property name="alias"><value>parent</value></property>
<property name="tableName"><value>parent</value></property>
<property name="versionMappings">
<list>
<bean class="org.compass.gps.device.jdbc.mapping.VersionColumnMapping">
<property name="columnName"><value>version</value></property>
</bean>
</list>
</property>
</bean>
<bean id="childMapping" class="org.compass.gps.device.jdbc.mapping.TableToResourceMapping">
<property name="alias"><value>child</value></property>
<property name="tableName"><value>child</value></property>
<property name="versionMappings">
<list>
<bean class="org.compass.gps.device.jdbc.mapping.VersionColumnMapping">
<property name="columnName"><value>version</value></property>
</bean>
</list>
</property>
</bean>
<!-- Compass-->
<bean id="compass" class="org.compass.spring.LocalCompassBean">
<property name="mappingResolvers">
<list>
<bean class="org.compass.gps.device.jdbc.ResultSetResourceMappingResolver">
<property name="mapping"><ref local="parentMapping" /></property>
<property name="dataSource"><ref bean="dataSource" /></property>
</bean>
<bean class="org.compass.gps.device.jdbc.ResultSetResourceMappingResolver">
<property name="mapping"><ref local="childMapping" /></property>
<property name="dataSource"><ref bean="dataSource" /></property>
</bean>
</list>
</property>
<property name="compassSettings">
<props>
<prop key="compass.engine.connection">target/testindex</prop>
<!-- This is the default transaction handling (just explicitly setting it) -->
<prop key="compass.transaction.factory">
org.compass.core.transaction.LocalTransactionFactory
</prop>
</props>
</property>
</bean>
<property name="path"><value>target/testindex/snapshot</value></property>
</bean>
</property>
</bean>
</beans>
29.1. Introduction
Compass provides a set of Spring AOP advices which help mirror data changes done within a Spring
powered application. For applications that use a data source or tool for which no Gps device exists (or whose
device does not have mirroring capabilities - like iBatis), the mirror advices can make synchronizing changes
made to the data source with the Compass index simpler.
29.2. Advices
The AOP support comes with three advices: CompassCreateAdvice, CompassSaveAdvice, and
CompassDeleteAdvice. They create, save, or delete a data object, respectively. The advices are of type
AfterReturningAdvice, and will persist the change to the index after the advised method returns.
The data object to be created/saved/deleted can be one of the advised method's parameters
(using the parameterIndex property), or its return value (setting useReturnValue to true).
<beans>
...
<!-- a reconstructed advice definition; the advice property names are assumptions -->
<bean id="createAdvice" class="org.compass.spring.aop.CompassCreateAdvice">
<property name="compass"><ref bean="compass"/></property>
<property name="parameterIndex"><value>0</value></property>
</bean>
...
</beans>
<beans>
...
<bean id="..."
class="org.springframework.transaction.interceptor.TransactionProxyFactoryBean">
<property name="transactionManager"><ref bean="transactionManager"/></property>
<property name="transactionAttributes">
<props>
<prop key="create*">PROPAGATION_REQUIRED</prop>
<prop key="save*">PROPAGATION_REQUIRED</prop>
<prop key="delete*">PROPAGATION_REQUIRED</prop>
<prop key="*">PROPAGATION_REQUIRED,readOnly</prop>
</props>
</property>
</bean>
...
</beans>
30.1. Introduction
Compass::Spring provides helper and support classes that build and integrate with Spring web MVC support. It
has several base class controller helpers, as well as search and index controllers.
The controller has two views to be set. The indexView is the view that holds the screen which initiates the
index operation, and the indexResultsView, which shows the results of the index operation.
The results of the index operation will be saved under the indexResultsName, which defaults to "indexResults".
The results use the CompassIndexResults class.
The Controller performs the search operation on the Compass instance using the query supplied by the
CompassSearchCommand. The command holds the query that will be executed, as well as the page number
(using the pagination feature).
If you wish to enable the pagination feature, you must set the pageSize property on the controller, as well as
providing the page number property on the CompassSearchCommand.
The controller has two views to be set, the searchView, which is the view that holds the screen which the user
will initiate the search operation, and the searchResultsView, which will show the results of the search
operation (they can be the same page).
The results of the search operation will be saved under the searchResultsName, which defaults to
"searchResults". The results use the CompassSearchResults class.
Note that if using the SpringSyncTransactionFactory, the transactionManager property must be set, since when
using the Spring sync transaction setting, a Spring managed transaction must already be in progress. The
controller will start a transaction using the given transaction manager.
31.1. Overview
The Compass Needle GigaSpaces integration allows storing a Lucene index within GigaSpaces. It also allows
automatically indexing the data grid using Compass OSEM support and mirroring changes done to the data grid
into the search engine.
On top of storing the index on a GigaSpaces cluster, the integration provides two automatic indexing and search
integrations. The first integrates with the GigaSpaces mirror service, which allows replicating, in a reliable
asynchronous manner, changes that occur on the space to an external data source implemented by Compass
(which, in turn, can store the index on another Space). Searching in this scenario is simple, as another Compass
instance that connects to the remote Space where the index is stored can be created for search purposes.
The second integration allows performing collocated indexing, where each Space cluster member has an
embedded indexing service that indexes changes done on the Space using notifications and stores the index
only on its local embedded Space. Searching is done using OpenSpaces executor remoting, allowing to
broadcast the search query to all partitions.
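The snippet referenced below is not included here; a minimal sketch (the constructor signature is an assumption to be verified):
// obtain a space proxy and build a Lucene Directory named "test" on top of it
IJSpace space = (IJSpace) SpaceFinder.find("jini://*/*/mySpace");
GigaSpaceDirectory dir = new GigaSpaceDirectory(space, "test");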
In the above example we created a directory on top of GigaSpace's Space with an index named "test". The
directory can now be used to create Lucene IndexWriter and IndexSearcher.
The Lucene directory interface represents a virtual file system. Implementing it on top of the Space is done by
breaking files into a file header, called FileEntry, and one or more FileBucketEntry objects. The FileEntry holds the
meta data of the file, for example, its size and timestamp, while the FileBucketEntry holds a bucket size worth of the
actual file content. The bucket size can be controlled when constructing the GigaSpaceDirectory, but note that
it must not be changed if connecting to an existing index.
Note, it is preferable to configure the directory not to use the compound index format as it yields better
performance (note, by default, when using Compass, the non compound format will be used). Also, the merge
factor for the directory (also applies to Compass optimizers) should be set to a higher value (than the default
10) since it mainly applies to file based optimizations.
The GigaSpaces integration can also use GigaSpaces just as a distributed lock manager without the need to
actually store the index on GigaSpaces. The GigaSpaceLockFactory can be used for it.
<compass name="default">
<connection>
<space indexName="test" url="jini://*/*/mySpace"/>
</connection>
</compass>
compass.engine.connection=space://test:jini://*/*/mySpace
By default, when using GigaSpaces as the Compass store, the index will be in an uncompound file format. It
will also automatically be configured with an expiration time based index deletion policy so multiple clients
will work correctly.
Compass can also be configured to use GigaSpaces just as a distributed lock manager without actually storing
the index on GigaSpaces (note that when configuring GigaSpaces as the actual store, the
GigaSpaces lock factory will be used by default). Here is how it can be configured:
compass.engine.store.lockFactory.type=org.compass.needle.gigaspaces.store.GigaSpaceLockFactoryProvider
compass.engine.store.lockFactory.path=jini://*/*/mySpace?groups=kimchy
The GigaSpaces integration comes with a built-in external data source that can be used with the GigaSpaces Mirror
Service. Basically, a mirror allows replicating changes done to the Space (data grid) into the search engine in a
reliable asynchronous manner.
The following is an example of how it can be configured within a mirror processing unit (for more information
see here):
</props>
</property>
</bean>
The above configuration will mirror any changes done in the data grid into the search engine through the
Compass instance. Furthermore, it will connect and store the index content on a specific Space called blog.
The integration builds on top of GigaSpaces OpenSpaces built-in components. On the "server" side (the
processing unit that starts up the relevant space cluster members), the following configuration should be added:
<os-remoting:service-exporter id="serviceExporter">
<os-remoting:service ref="searchService"/>
</os-remoting:service-exporter>
<os-core:template>
<bean class="com.mycompany.model.Order"/>
</os-core:template>
<os-events:listener ref="indexService"/>
</os-events:notify-container>
</beans>
The above configuration starts up an embedded space (the clustering model is chosen at deployment time).
It creates a Compass instance that stores the index on the embedded local space cluster member. It uses the
notify container to listen for notifications of changes happening on the embedded cluster member and indexes
them using the Compass instance. It also exposes an executor remoting service that implements the
CompassSearchService interface. The service performs a search using Compass on top of the collocated space
and loads objects back from the collocated space. Note, all operations are done in memory in a collocated
manner.
On the client side, both programmatic and Spring based configuration can be used. Here is an example of the
spring based configuration:
The above configuration connects to a remote space. It creates the client side search service that also
implements CompassSearchService. The search operation is automatically broadcast to all relevant partitions,
performed in a collocated manner on them, and then reduced back on the client side.
Note, exposing other more advanced forms of queries, or using other Compass APIs (such as term frequencies),
can be done by exposing other remoting services in a similar manner to how the search API is implemented.
32.1. Overview
The Compass Needle Coherence integration allows storing a Lucene index within the Coherence Data Grid.
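The snippet referenced below is not included here; a minimal sketch (the constructor arguments, index name then cache name, are assumptions to be verified):
InvocableCoherenceDirectory dir = new InvocableCoherenceDirectory("test", "testcache");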
In the above example we created the invocable Coherence directory on top of Coherence's Data Grid with an
index named "test". The directory can now be used to create Lucene IndexWriter and IndexSearcher.
The Lucene directory interface represents a virtual file system. Implementing it on top of Coherence is done by
breaking files into a file header, called FileEntryKey/FileEntryValue and one or more
FileBucketKey/FileBucketValue. The file header holds the meta data of the file, for example, its size and
timestamp, while the file bucket holds a bucket size of the actual file content. The bucket size can be controlled
when constructing the coherence directory, but note that it must not be changed if connecting to an existing
index.
The DataGridCoherenceDirectory uses Coherence features that are supported by all Coherence editions. It
uses the Coherence lock API and plain Map remove APIs. The InvocableCoherenceDirectory uses Coherence's
invocation service support, allowing it to delete files (header and buckets) in parallel (without returning results),
and to use FileLockKey existence as an indicator for locking (conditional put), which results in better performance
(for remove operations) and a better lock API implementation.
Note, it is preferable to configure the directory not to use the compound index format as it yields better
performance (note, by default, when using Compass, the non compound format will be used). Also, the merge
factor for the directory (also applies to Compass optimizers) should be set to a higher value (than the default
10) since it mainly applies to file based optimizations.
The Coherence integration can also use Coherence just as a distributed lock manager without the need to
actually store the index on Coherence. Either the InvocableCoherenceLockFactory or
DefaultCoherenceLockFactory can be used for it.
<compass name="default">
<connection>
<coherence indexName="test" cacheName="testcache"/>
</connection>
</compass>
compass.engine.connection=coherence://test:testcache
By default, when using Coherence as the Compass store, the index will be in an uncompound file format. It will
also automatically be configured with an expiration time based index deletion policy so multiple clients will
work correctly.
Compass can also be configured to use Coherence just as a distributed lock manager without actually storing
the index on Coherence (note that when configuring Coherence as the actual store, the Coherence
lock factory will be used by default). Here is how it can be configured:
compass.engine.store.lockFactory.type=org.compass.needle.coherence.InvocableCoherenceLockFactoryProvider
compass.engine.store.lockFactory.path=cacheName
33.1. Overview
The Compass Needle Terracotta integration allows storing a Lucene index in a distributed manner using
Terracotta, as well as providing seamless integration with Compass.
Terracotta provides shared memory (referred to as "network attached memory"). The terracotta directory makes use
of that and stores the directory in memory, allowing terracotta to distribute changes to it to all relevant nodes
connected to the terracotta server. The actual content of a "file" in the directory is broken down into one or
more byte arrays, whose size can be controlled using the bufferSize parameter. Note, once an index is created with a
certain bufferSize, it should not be changed. By default, the buffer size is set to 4096 bytes.
Terracotta will automatically fetch required content from the server, and will evict content if memory
thresholds break for an application. When constructing large files, the directory allows to set a flush rate when
the file content will be flushed (and be allowed to be evicted) during its creation. The formula is that every
bufferSize * flushRate bytes, it will be released by Compass and allow terracotta to move it to the server
and reclaim the memory. The default flush rate is set to 10.
The internal concurrent hash map construction settings can also be controlled: initial capacity (defaults to 16 *
10), load factor (defaults to 0.75), and concurrency level (defaults to 16 * 10).
Note, it is preferable to configure the directory not to use the compound index format as it yields better
performance (note, by default, when using Compass, the non compound format will be used). Also, the merge
factor for the directory (also applies to Compass optimizers) should be set to a higher value (than the default
10) since it mainly applies to file based optimizations.
Another version of Lucene Terracotta Directory, called ManagedTerracottaDirectory is also provided. The
idea behind this directory implementation is to be able to wrap several operations in a single "transaction". The
ManagedTerracottaDirectory is initialized with a ReadWriteLock and any operations using Lucene should be
wrapped with a read lock and unlock operations. The more operations are wrapped, the better the performance
will be, since locking will be more coarse grained (as opposed to the more fine grained, concurrent hash map
based, locking done with the plain TerracottaDirectory).
Compass is packaged as a Terracotta Integration Module (TIM): the Compass jar can be placed under
TC_HOME/modules, and it already comes pre-configured with a terracotta configuration of both locks and roots
(the terracotta.xml file within the root of the compass jar file). Another option is to tell Terracotta where to look
for more TIMs within the application tc-config file and point it to where the compass jar is located.
Once the TIM is set up, Compass has a special Terracotta connection that allows it to use the
TerracottaDirectory, CSMTerracottaDirectory, or ManagedTerracottaDirectory, called
TerracottaDirectoryStore. The TerracottaDirectoryStore is where terracotta is configured to have its root
(note, this is all defined for you already since compass is a TIM).
The type of the terracotta directory used can be controlled using the compass.engine.store.tc.type
setting. The setting can have three values: managed (the default), chm, and csm. The managed terracotta directory
creates a logical transaction (using the managed read lock) that is bound to the Compass transaction. It allows
for much faster operations compared with the plain terracotta directory at the expense of lower concurrency. The
chm value maps to the plain TerracottaDirectory and the csm value maps to the CSMTerracottaDirectory.
compass.engine.connection=tc://myindex
# default values, just showing how it can be configured
compass.engine.store.tc.bufferSize=4096
compass.engine.store.tc.flushRate=10
<compass name="default">
<connection>
<tc indexName="myindex" bufferSize="4096" flushRate="10" />
</connection>
</compass>
The "client application" will need to run using Terracotta bootclasspath configuration, and have the following
in its tc-config.xml:
<clients>
<modules>
<module group-id="org.compass-project" name="compass" version="2.2.0" />
</modules>
</clients>
For more information on how to run it in different ways/environments, please refer to the terracotta
documentation.
In order to enable the Terracotta transaction processor, a setting with the key
compass.transaction.processor.tc.type should be set to
org.compass.needle.terracotta.transaction.processor.TerracottaTransactionProcessorFactory.
Now, the default transaction processor used can be set to tc (for example, by setting
compass.transaction.processor to tc). Of course, this setting can also be set at runtime on a per session basis.
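For example, the two settings together (both keys are quoted from this section):
compass.transaction.processor.tc.type=org.compass.needle.terracotta.transaction.processor.TerracottaTransactionProcessorFactory
compass.transaction.processor=tc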
With the above settings, transactions will be processed by the terracotta processor, which means that
nothing much will be done except for accumulating transactional changes and putting them on a shared
queue. By default, a thread per sub index will also be started to process transactional jobs for each sub index.
The threads will pick transaction jobs and index them in a fail-safe, ordered, transactional manner.
Total ordering of transactions is maintained by default. This basically means that a dirty operation on a specific
sub index will try and obtain a lock (using Lucene Directory abstraction) called "order.lock" (per sub index).
The lock will be obtained through the duration of the transaction and released when the transaction commits /
rolls back. Ordering of transactions on a sub index level can be disabled by setting
compass.transaction.processor.tc.maintainOrder setting to false. This means that transactions on the
same sub index will not block on each other, but ordering will not be guaranteed.
By default, an indexing node will try and work on all sub indexes (note, it is perfectly fine to have more than
one indexing node working on all sub indexes; they will maintain order and pick jobs as they can). In order to
have a node only process transactional jobs for certain sub indexes, the
compass.transaction.processor.tc.subIndexes setting should be set to a comma separated list of the sub indexes
to process. The compass.transaction.processor.tc.aliases setting can also be used to narrow down the sub
indexes of the respective aliases that will be processed. This setting is very handy in cases where the index is stored
on terracotta as well, the index is very large, and maximum collocation of sub index data and processing is
desired.
The processor thread itself (each per sub index), once it identifies that there is a transaction job to be processed,
will try and get more transactional jobs (in a non blocking manner) for better utilization of an already opened
IndexWriter. By default, it will try to process up to 5 more transactional jobs, and can be configured using
compass.transaction.processor.tc.nonBlockingBatchJobSize setting.
When a transaction commits by one of the client nodes, it will not be immediately visible for search operations.
It will be visible only after the actual node that will process the transaction has done so, and the cache
invalidation interval has kicked in to identify that the shared index has changed and the new index needs to be
reloaded (happens in the background when using Compass). The cache invalidation interval (how often
Compass will check if the index has changed) can be set using the following setting:
compass.engine.cacheIntervalInvalidation.
CompassSession and CompassIndexSession provide the flushCommit operation. When used
with the tc transaction processor, this operation means that all the changes accumulated up to that point are passed
on to be processed (similar to commit), except that the session remains open for additional changes. For long
running indexing sessions, this allows periodically flushing and committing the changes (otherwise memory
consumption will continue to grow) instead of committing and closing the current session and opening a new one.
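A sketch of such a long running indexing session (the entities list is hypothetical):
CompassSession session = compass.openSession();
CompassTransaction tr = session.beginTransaction();
int count = 0;
for (Object entity : entities) {
    session.create(entity);
    if (++count % 1000 == 0) {
        // hand the accumulated changes to the tc processor; the session stays open
        session.flushCommit();
    }
}
tr.commit();
session.close();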
34.1. Introduction
Compass::Samples [library] is a basic example that highlights the main features of Compass::Core. The
application contains a small library domain model, containing Author, Article and Book objects.
You can find most of Compass::Core's features used within the library sample, such as OSEM and Common
Metadata. It executes as a unit test, using JUnit, and can be used to search a predefined set of data. Modify the
LibraryTests class to add your own test data and experiment with how easy it is to work with Compass.
Enjoy.
Table 34.1.

Target    Description
usage     (also the default target) Lists all the available targets.
test      Runs the tests defined in LibraryTests; also compiles the sample (see the compile target).
compile   Compiles the tests and the source code into the build/classes directory. Also executes the
          common meta data task to generate the Library class out of the library.cmd.xml file into the
          source directory.
35.1. Introduction
Compass::Samples [petclinic] is the Spring petclinic sample powered by Compass. The main aim of the sample
is to show how simple it is to add Compass to an application, especially if one of the frameworks the application
uses is one Compass seamlessly integrates with.
Integrating Compass into the petclinic sample did not require any Java code to be written, although several
unit tests were added (good programming practice). Integration consisted of extending the Spring configuration
files and writing search and index JSP pages. The following sections explain how the integration was
achieved.
The Compass petclinic sample shows how to integrate Compass with Spring and other frameworks. An
important note, of course, is that Compass can be integrated with applications that do not use the Spring
framework, although Spring does make integration a bit simpler (and building applications much simpler).
Table 35.1.

Target    Description
usage     (also the default target) Lists all the available targets.
As explained in the Common Meta-data section, Common meta-data provides a global lookup mechanism
for meta-data and alias definitions. It integrates with OSEM definitions and Gps::Jdbc mappings, externalising
(and centralising) the actual semantic lookup keys that will be stored in the index. It also provides an Ant task
that generates constant Java class definitions for all the common meta-data definitions, which can be used by a
Java application to look up and store Compass Resources.
Defining a common meta-data definition is an optional step when integrating Compass, though taking the time
to create one can provide valuable information and centralisation of the system (or systems) semantic
definitions.
In the petclinic sample, the Common meta-data file is located in the org.compass.sample.petclinic package,
and is called petclinic.cmd.xml. A fragment of it is shown here:
<?xml version="1.0"?>
<!DOCTYPE compass-core-meta-data PUBLIC
    "-//Compass/Compass Core Meta Data DTD 2.2//EN"
    "http://www.compass-project.org/dtd/compass-core-meta-data-2.2.dtd">

<compass-core-meta-data>
    <meta-data-group id="petclinic" displayName="Petclinic">
        <alias id="vet" displayName="Vet">
            <name>vet</name>
        </alias>
        <meta-data id="birthdate" displayName="Birthdate">
            <name format="yyyy-MM-dd">birthdate</name>
        </meta-data>
        <!-- ... more alias and meta-data definitions ... -->
    </meta-data-group>
</compass-core-meta-data>
The above fragment of the common meta-data definitions declares an alias called vet and a meta-data called birthdate. The birthdate meta-data shows one of the benefits of using common meta-data: the format of the date field is defined once in the meta-data, instead of being defined in every mapping of the birthdate meta-data (in OSEM, for example).
One of the features of the search engine abstraction layer is the use of Resource and Property, as well as simple and minimal Resource Mapping definitions.
Although it is not directly used, the Jdbc implementation of the data access layer uses the Search Engine API, based on Resources and resource mappings (the Jdbc device of Compass::Gps can automatically generate them).
35.3.3. OSEM
One of the main features of Compass is OSEM (Object / Search Engine Mapping), and it is heavily used in the petclinic sample. OSEM maps Java objects (the domain model) to the underlying search engine using simple mapping definitions.
The petclinic sample uses most of the features provided by OSEM, among them: contract, with mappings defined for Entity, NamedEntity, and Person (all "abstract" domain definitions), cyclic references (for example, between pet and owner), and many more. The OSEM definitions can be found in the petclinic.cpm.xml file.
The main concern with the data access layer (and Compass) is to synchronise each data model change with the Compass search engine index. Compass provides integration support for indexing the data using any of the actual implementations of the data access layer.
35.4.1. Hibernate
Compass::Gps comes with the Hibernate device. The device can index the data mapped by Hibernate, and mirror any data changes made by Hibernate to the search engine index. Since we are using Hibernate with Spring, the device used is the Spring Hibernate device.
The integration uses the OSEM definitions, working with the Compass object level API to interact with the underlying search engine. The Spring application context bean definitions for the compass instance (required by the Hibernate Gps device) are defined with OSEM definitions and Spring based transaction support. The applicationContext-hibernate.xml in the test package, and the applicationContext-hibernate.xml in the WEB-INF directory, define all the required definitions to work with Hibernate and Compass. Note that only the mentioned configuration has to be created in order to integrate Compass with the data access layer.
35.4.2. JDBC
Compass::Gps comes with the JDBC device. The Jdbc device can connect to a database using Jdbc and, based on different mapping definitions, index its content and mirror any data changes. When using the Jdbc device, the mapping is made at the Resource level (OSEM cannot be used).
It is important to understand the different options for integrating Compass with a Jdbc (or a Jdbc helper framework like Spring or iBatis) data access implementation. If the system has no domain model, then the Resource level API and mapping must be used. The Jdbc device can automate most of the actions needed to index and mirror the database. If the system has a domain model (such as the petclinic sample), two options are available: working at the Resource level and again using the Jdbc device, or using OSEM definitions and plumbing Compass calls into the data access APIs (i.e. save the Vet in Compass when the Vet is saved to the database). In the petclinic sample, the Jdbc device option was taken in order to demonstrate the Jdbc device usage. An API level solution should be simple, especially if the system has a decent and centralized data access layer (which in our case it does).
The integration uses the Jdbc Gps Device mapping definitions and works with the Compass object level API to work with the search engine. The Spring application context bean definitions for the compass instance (required by the Jdbc Gps device) are defined with Jdbc mapping resolvers and local transactions. The applicationContext-jdbc.xml in the test package, and the applicationContext-jdbc.xml in the WEB-INF directory, define all the required definitions to work with Jdbc and Compass. Note that only the mentioned configuration has to be created in order to integrate Compass with the data access layer.
The petclinic sample uses the Jdbc Gps Device and defines several Jdbc mappings to the database. Some of the mappings use the more complex Result Set mappings (for mappings that require a join operation) and some use the simple Table mapping. The mapping definitions use the common meta-data to lookup the actual meta-data values.
Note that an important change made to the original petclinic sample was the addition of a Version column. The version column is needed for automatic data mirroring (some databases, like Oracle, provide a "version column" by default).
The Resource mapping definitions are automatically generated using mapping resolvers, and Compass uses them. Note that the Jdbc support currently only works with the Hsql database (since the sql queries used in the Result Set mappings use hsql functions).
The only thing required when using the Compass and Spring MVC integration is writing the view layer for the search / index operations. These are the index.jsp, search.jsp and searchResource.jsp Jstl view based files.
The index.jsp is responsible both for initiating the index operation of CompassGps (and its controlled devices), and for displaying the results of the index operation.
The search.jsp and the searchResource.jsp are responsible for initiating the search operation, as well as displaying the results. The difference between them is that search.jsp works with the OSEM enabled petclinic (when using Hibernate or Apache OJB), while searchResource.jsp works with the resource mapping and resource level petclinic (when using Jdbc).
Note that when using Jdbc, remember to change the views.properties file under the WEB-INF/classes directory and have both the searchView.url and the searchResultsView.url refer to the searchResource.jsp view. When using either Hibernate or OJB, change them to point to search.jsp.
Note that configuring Compass is simpler when using a schema based configuration file. But at its core, all of Compass configuration is driven by the following settings. You can use only settings to configure Compass (either programmatically or using the Compass configuration based on DTD).
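For example, a minimal sketch of a settings-only, programmatic configuration (the index path is an arbitrary placeholder):

CompassConfiguration conf = new CompassConfiguration()
        .setSetting("compass.engine.connection", "file://target/index")
        .setSetting("compass.engine.cacheIntervalInvalidation", "5000");
Compass compass = conf.buildCompass();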
A.1.1. compass.engine.connection
Table A.1.
file:// prefix or no prefix: The path to the file system based index, using default file handling. This is a JVM level setting for all the file based prefixes.
mmap:// prefix: Uses the Java 1.4 nio MMap class. Considered slower than the general file system one, but might have memory benefits (according to the Lucene documentation). This is a JVM level setting for all the file based prefixes.
ram:// prefix: Creates a memory based index that follows the Compass life-cycle: created when Compass is created, and disposed when Compass is closed.
jdbc:// prefix: Holds the Jdbc url or Jndi name (based on the DataSourceProvider configured). Allows storing the index within a database. This setting requires additional mandatory settings; please refer to the Search Engine Jdbc section. It is very IMPORTANT to read the Search Engine Jdbc section, especially in terms of performance considerations.
A.1.2. JNDI
Table A.2.
compass.name: The name that Compass will be registered under. Note that you can specify it in the XML configuration file with a name attribute on the compass element.
A.1.3. Property
Table A.3.
compass.property.alias: The name of the "alias" property that Compass will use (a property that holds the alias property value of a resource). Defaults to alias (set it only if one of your mapped meta data is called alias).
compass.property.extendedAlias: The name of the property where the extended aliases (if they exist) of a given Resource will be stored. This allows for poly alias queries, where one can query on a "base" alias and get all the aliases that are extending it. Defaults to extendedAlias (set it only if one of your mapped meta data is called extendedAlias).
compass.property.all: The name of the "all" property that Compass will use (a property that accumulates all the properties/meta-data). Defaults to all (set it only if one of your mapped meta data is called all). Note that it can be overridden in the mapping files.
compass.property.all.termVector: The default setting for the term vector of the all property. Can be one of no, yes, positions, offsets, or positions_offsets. Defaults to no.
Compass supports several transaction processors. More information about them can be found in the Search Engine chapter.
Table A.4.
read_committed: The same read committed isolation known from database systems. Fast for read only transactions.
Please read more about how Compass::Core implements its transaction management in the Search Engine section.
When using the Compass::Core transaction API, you must specify a factory class for the CompassTransaction instances. This is done by setting the property compass.transaction.factory. The CompassTransaction API hides the underlying transaction mechanism, allowing Compass::Core code to run in both managed and non-managed environments. The two standard strategies are:
Table A.5.
org.compass.core.transaction.LocalTransactionFactory: Manages a local transaction which does not interact with other transaction mechanisms.
org.compass.core.transaction.JTASyncTransactionFactory: Uses the JTA synchronization support to synchronize with the JTA transaction (not the same as two phase commit, meaning that if the transaction fails, the other resources that participate in the transaction will not roll back). If there is no existing JTA transaction, a new one will be started.
Although the J2EE specification does not provide a standard way to reference a JTA TransactionManager in order to register with a transaction synchronization service, Compass provides several lookups which can be set with the compass.transaction.managerLookup setting (thanks Hibernate!). The setting is not required, since Compass will try to auto-detect the JTA environment.
Table A.6.
org.compass.core.transaction.manager.JBoss: JBoss
org.compass.core.transaction.manager.Weblogic: Weblogic
org.compass.core.transaction.manager.WebSphere: WebSphere
org.compass.core.transaction.manager.Orion: Orion
org.compass.core.transaction.manager.JOTM: JOTM
org.compass.core.transaction.manager.JOnAS: JOnAS
org.compass.core.transaction.manager.JRun4: JRun4
org.compass.core.transaction.manager.BEST: Borland ES
The JTA transaction mechanism will use the JNDI configuration to lookup the JTA UserTransaction. The transaction manager lookup provides the JNDI name, but if you wish to set it yourself, you can use the compass.transaction.userTransactionName setting. Also, the UserTransaction is cached by default (fetched from JNDI on Compass startup); the caching can be controlled by compass.transaction.cacheUserTransaction.
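For example, a sketch enabling JTA synchronization transactions with an explicit WebLogic manager lookup (conf is a CompassConfiguration; both class names are taken from the tables above):

conf.setSetting("compass.transaction.factory",
        "org.compass.core.transaction.JTASyncTransactionFactory");
conf.setSetting("compass.transaction.managerLookup",
        "org.compass.core.transaction.manager.Weblogic");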
Property accessors are used for reading and writing Class properties. Compass comes with two implementations, field and property. field is used for directly accessing the Class property, and property is used for accessing the class property using the property getters/setters. Compass allows for registration of custom PropertyAccessor implementations under a lookup name, as well as changing the default property accessor used (which is property).
The configuration uses Compass support for group properties, with the compass.propertyAccessor group prefix. The name that the property accessor will be registered under is the group name. In order to set the default property accessor, the default group name should be used.
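For example, a sketch registering a hypothetical com.example.MyDirectFieldAccessor under the lookup name direct; the .type suffix is assumed here, following the same group settings convention described for converters below:

conf.setSetting("compass.propertyAccessor.direct.type",
        "com.example.MyDirectFieldAccessor");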
A.1.7. Converters
Compass uses converters to convert all the different OSEM mappings into Resources. Compass comes with a set of default converters that should be sufficient for most applications, but it does allow the extensibility to define custom converters for all aspects related to marshalling Objects and Mappings (Compass internal mapping definitions) into a search engine.
Compass uses a registry of Converters. All Converters are registered under a registry name (converter lookup name). Compass registers all its default Converters under lookup names (which allows for changing the default converters' settings), and allows for registration of custom Converters.
The following lists the default Converters that come with Compass. The lookup name is the name the Converter will be registered under, the Converter class is the Compass implementation of the Converter, and the Converter Type acts as a shorthand string for the Converter implementation (it can be used with compass.converter.[converter name].type instead of the fully qualified class name). The default mapping converters are responsible for converting the meta-data mapping definitions.
byte[]: converter org.compass.core.converter.extended.PrimitiveByteArrayConverter, lookup name and type primitivebytearray. Stores the content of the byte array without performing any other search related operations.
org.compass.core.mapping.osem.ClassIdPropertyMapping: converter org.compass.core.converter.mapping.ClassPropertyMappingConverter, lookup name classIdPropertyMapping.
org.compass.core.mapping.osem.ClassPropertyMapping: converter org.compass.core.converter.mapping.ClassPropertyMappingConverter, lookup name classPropertyMapping.
org.compass.core.mapping.osem.ComponentMapping: converter org.compass.core.converter.mapping.ComponentMappingConverter, lookup name componentMapping.
org.compass.core.mapping.osem.CollectionMapping: converter org.compass.core.converter.mapping.CollectionMappingConverter, lookup name collectionMapping.
Defining a new converter can be done using Compass support for group settings; compass.converter is the prefix for the group. In order to define a new converter that will be registered under the "converter name" lookup, the compass.converter.[converter name] setting prefix should be used. The following lists the settings that can apply to a converter definition.
compass.converter.[converter name].format.minPoolSize: Compass pools the formatters for greater performance. The value of the minimum pool size. Defaults to 4.
compass.converter.[converter name].format.maxPoolSize: Compass pools the formatters for greater performance. The value of the maximum pool size. Defaults to 20.
Note that any other setting can be defined after the compass.converter.[converter name] prefix. If the Converter implements org.compass.core.config.CompassConfigurable, it will be injected with the settings for the converter. The converter will get all the settings, with setting names stripped of the compass.converter.[converter name] prefix.
For example, defining a new Date converter with a specific format can be done by setting two settings: compass.converter.mydate.type=date (same as compass.converter.mydate.type=org.compass.core.converter.basic.DateConverter), and compass.converter.mydate.format=yyyy-HH-dd. The converter will be registered under the "mydate" converter lookup name. It can then be used as a lookup name in the OSEM definitions.
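Programmatically, the same "mydate" converter definition is a two line sketch (conf is a CompassConfiguration):

conf.setSetting("compass.converter.mydate.type", "date");
conf.setSetting("compass.converter.mydate.format", "yyyy-HH-dd");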
In order to change the default converters, simply define a setting with the [converter name] of the default converter that comes with Compass. For example, in order to override the format of all the dates in the system to "yyyy-HH-dd", simply set: compass.converter.date.format=yyyy-HH-dd.
compass.engine.all.analyzer: The name of the analyzer to use for the all property (see the next section about Search Engine Analyzers).
compass.transaction.lockDir: The directory where the search engine will maintain its locking file mechanism for inter and outer process transaction synchronization. Defaults to the java.io.tmpdir Java system property. This is a JVM level property.
compass.transaction.lockTimeout: The amount of time a transaction will wait in order to obtain its specific lock (in seconds). Defaults to 10 seconds.
compass.transaction.lockPollInterval: The interval at which the transaction will check if it can obtain the lock (in milliseconds). Defaults to 100 milliseconds. This is a JVM level property.
compass.engine.optimizer.type: The fully qualified class name of the search engine optimizer that will be used. Defaults to org.compass.core.lucene.engine.optimizer.AdaptiveOptimizer. Please see the following section for a list of optimizers.
compass.engine.optimizer.schedule.period: The period at which the optimizer will check if the index needs optimization, and if it does, optimize it (in seconds, can be a float number). Defaults to 10 seconds. The setting applies if the optimizer is scheduled.
compass.engine.optimizer.schedule.fixedRate: Determines if the schedule will run at a fixed rate or not. If set to false, each execution is scheduled relative to the actual execution of the previous execution. If set to true, each execution is scheduled relative to the execution time of the initial execution.
compass.engine.optimizer.adaptive.mergeFactor: For the adaptive optimizer, determines how often the optimizer will optimize the index. With smaller values searches will be faster, but the index will be optimized more often. Larger values will result in slower searches and fewer optimizations.
compass.engine.optimizer.aggressive.mergeFactor: For the aggressive optimizer, determines how often the optimizer will optimize the index. With smaller values searches will be faster, but the index will be optimized more often. Larger values will result in slower searches and fewer optimizations.
compass.engine.mergeFactor: With smaller values, less RAM is used, but indexing is slower. With larger values, more RAM is used, and indexing is faster. Defaults to 10.
compass.engine.maxBufferedDeletedTerms: Determines the minimal number of delete terms required before the buffered in-memory delete terms are applied and flushed. If there are documents buffered in memory at the time, they are merged and a new segment is created. Disabled by default (the writer flushes by RAM usage).
compass.engine.ramBufferSize: Determines the amount of RAM that may be used for buffering added documents before they are flushed as a new segment. Generally, for faster indexing performance it is best to flush by RAM usage instead of document count, and to use as large a RAM buffer as you can. When this is set, the writer will flush whenever buffered documents use this much RAM. Pass in -1 to prevent triggering a flush due to RAM usage. Note that if flushing by document count is also enabled, then the flush will be triggered by whichever comes first. The default value is 16 (MB).
compass.engine.termIndexInterval: Expert: sets the interval between indexed terms. Large values cause less memory to be used by an IndexReader, but slow random access to terms. Small values cause more memory to be used by an IndexReader, and speed up random access to terms. This parameter determines the amount of computation required per query term, regardless of the number of documents that contain that term; in particular, it is the maximum number of other terms that must be scanned before a term is located and its frequency and position information may be processed.
compass.engine.maxFieldLength: The number of terms that will be indexed for a single property in a resource. This limits the amount of memory required for indexing, so that collections with very large resources will not crash the indexing process by running out of memory. Note that this effectively truncates large resources, excluding from the index terms that occur further in the resource. Defaults to 10,000 terms.
compass.engine.useCompoundFile: Turns on (true) or off (false) the use of compound files. If used, it lowers the number of open files, but has a very small performance overhead. Defaults to true. Note that when Compass starts up, it will validate that the current index structure matches the configured setting, and if it does not, it will automatically try to convert it to the correct structure.
compass.engine.cacheIntervalInvalidation: Sets how often (in milliseconds) the index manager will check if the index cache needs to be invalidated. Defaults to 5000 milliseconds. Setting it to 0 means that the cache will check if it needs to be invalidated all the time. Setting it to -1 means that the cache will not check the index for invalidation; this is perfectly fine if a single instance is working with the index, since the cache is automatically invalidated upon a dirty operation.
compass.engine.indexManagerScheduleInterval: The index manager schedule interval (in seconds) at which different actions related to the index manager will happen (such as global cache interval invalidation checks - see SearchEngineIndexManager#notifyAllToClearCache and SearchEngineIndexManager#checkAndClearIfNotifiedAllToClearCache). Defaults to 60 seconds.
compass.engine.waitForCacheInvalidationOnIndexOperation: Defaults to false. If set to true, will cause the index manager operations (including replace index) to wait for all other Compass instances to invalidate their cache. The time to wait will be the indexManagerScheduleInterval configuration setting.
The following section lists the different optimizers that are available with Compass::Core. Note that all the optimizers can be scheduled or not.
Table A.12.
org.compass.core.lucene.engine.optimizer.AdaptiveOptimizer: Segments will be merged starting from the last segment, until a segment with a higher resource count is encountered.
Compass allows storing the index in a database using Jdbc. When using Jdbc storage, additional settings are mandatory beyond the connection setting. The value after the jdbc:// prefix in the compass.engine.connection setting can be the Jdbc url connection or the Jndi name of the DataSource, depending on the configured DataSourceProvider.
It is also important to read the Jdbc Directory Appendix. Two sections that should be read are the supported dialects, and the performance considerations (especially the compound structure).
compass.engine.store.jdbc.dialect: Optional. The fully qualified class name of the dialect (the database type) that the index will be stored in. Please refer to the Lucene Jdbc Directory appendix for a list of the currently supported dialects. If not set, Compass will try to auto-detect it based on the database meta-data.
compass.engine.store.jdbc.dataSourceProvider: Optional. The fully qualified class name of the DataSourceProvider used to obtain the Jdbc DataSource. Defaults to org.compass.core.lucene.engine.store.jdbc.DriverManagerDataSourceProvider (poor performance). Please refer to the next section for a list of the available providers.
compass.engine.store.jdbc.ddl.name.size: Optional (defaults to 50). The size (characters) of the name column.
compass.engine.store.jdbc.ddl.value.size: Optional (defaults to 500 * 1000 K). The size (in K) of the value column. Only applies to databases that require it.
compass.engine.store.jdbc.ddl.lastModified.name: Optional (defaults to lf_). The name of the last modified column.
DriverManagerDataSourceProvider: The default data source provider. Creates a simple DataSource that returns a new Connection for each request. Performs very poorly, and should not be used.
JndiDataSourceProvider: Gets a DataSource from JNDI. The JNDI name is the value after the jdbc:// prefix in the Compass connection setting.
ExternalDataSourceProvider: A data source provider that can use an externally configured data source. In order to set the external DataSource to be used, the ExternalDataSourceProvider#setDataSource(DataSource) static method needs to be called before the Compass instance is created.
Configuring the Jdbc store with Compass also allows defining FileEntryHandler settings for different file entries in the database. FileEntryHandlers are explained in the Lucene Jdbc Directory appendix (and require some Lucene knowledge). The Lucene Jdbc Directory implementation already comes with sensible defaults, but they can be changed using Compass configuration.
One of the things that comes free with Compass is automatic use of the more performant FetchPerTransactionJdbcIndexInput when possible (based on the dialect). Special care needs to be taken when using the mentioned index input, and it is done automatically by Compass.
Setting file entry handlers is done using the following setting prefix: compass.engine.store.jdbc.fe.[name]. The name can be either __default__, which is used for all unmapped files, the full name of the file stored, or the suffix of the file (the last 3 characters). Some of the currently supported settings are:
compass.engine.store.jdbc.fe.[name].type: The fully qualified class name of the file entry handler.
compass.engine.store.jdbc.fe.[name].indexInput.bufferSize: The RAM buffer size of the index input. Note that it applies only to some of the IndexInput implementations.
compass.engine.store.jdbc.fe.[name].indexOutput.bufferSize: The RAM buffer size of the index output. Note that it applies only to some of the IndexOutput implementations.
compass.engine.store.jdbc.fe.[name].indexOutput.threshold: The threshold value (in bytes) after which data will be temporarily written to a file (and then dumped into the database). Applies when using RAMAndFileJdbcIndexOutput (which is the default one). Defaults to 16 * 1024 bytes.
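For example, a sketch raising the index output threshold for all unmapped files (the 64K value is an arbitrary placeholder; conf is a CompassConfiguration):

conf.setSetting("compass.engine.store.jdbc.fe.__default__.indexOutput.threshold",
        String.valueOf(64 * 1024));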
With Compass, multiple Analyzers can be defined (each under a different analyzer name) and then referenced in the configuration and mapping definitions. Compass defines two internal analyzer names: default and search. The default analyzer is the one used when no other analyzer can be found; it defaults to the standard analyzer with English stop words. The search analyzer is used to analyze search queries, and if not set, defaults to the default analyzer (note that the search analyzer can also be set using the CompassQuery API). Changing the settings for the default analyzer can be done using the compass.engine.analyzer.default.* settings (as explained in the next table). Setting the search analyzer (so that it differs from the default analyzer) can be done using the compass.engine.analyzer.search.* settings. Also, you can set a list of filters to be applied to the given analyzers; please see the next section on how to configure analyzer filters, especially the synonym one.
compass.engine.analyzer.[analyzer name].type: The type of the search engine analyzer; please see the available analyzer types later in the section.
compass.engine.analyzer.[analyzer name].stopwords: A comma separated list of stop words to use with the chosen analyzer. If the string starts with +, the list of stop words will be added to the default set of stop words defined for the analyzer. Note that not all the analyzer types support this feature.
compass.engine.analyzer.[analyzer name].factory: If the analyzer type is not enough to configure your analyzer, use this to define the fully qualified class name of your analyzer factory, which implements the LuceneAnalyzerFactory class.
Compass comes with core analyzers (which are part of the lucene-core jar). They are: standard, simple, whitespace, and stop. See the Analyzers Section.
Compass also allows simple configuration of the snowball analyzer type (which comes with the lucene-snowball jar). An additional setting that must be set when using the snowball analyzer is the compass.engine.analyzer.[analyzer name].name setting. The setting can have the following values: Danish, Dutch, English, Finnish, French, German, German2, Italian, Kp, Lovins, Norwegian, Porter, Portuguese, Russian, Spanish, and Swedish.
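For example, a sketch defining an English snowball analyzer as the default analyzer (conf is a CompassConfiguration):

conf.setSetting("compass.engine.analyzer.default.type", "snowball");
conf.setSetting("compass.engine.analyzer.default.name", "English");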
Another set of analyzer types comes with the lucene-analyzers jar. They are: brazilian, cjk, chinese, czech, german, greek, french, dutch, and russian.
You can specify a set of analyzer filters that can then be applied to all the different analyzers configured. It uses the group settings, so an analyzer filter setting needs to be prefixed with compass.engine.analyzerfilter, followed by the analyzer filter name, and then the setting for the analyzer filter.
Filters are provided for simpler support for additional filtering (or enrichment) of analyzed streams, without the hassle of creating your own analyzer. Also, filters can be shared across different analyzers, potentially having different analyzer types.
Table A.18.
compass.engine.analyzerfilter.[analyzer filter name].type: The type of the search engine analyzer filter provider; must implement the org.compass.core.lucene.engine.analyzer.LuceneAnalyzerTokenFilterProvider interface. Can also be the value synonym, which will automatically map to the org.compass.core.lucene.engine.analyzer.synonym.SynonymAnalyzerTokenFilterProvider class.
compass.engine.analyzerfilter.[analyzer filter name].lookup: Only applies for synonym filters. The class that implements org.compass.core.lucene.engine.analyzer.synonym.SynonymLookupProvider, for providing synonyms for a given term.
With Compass, multiple Highlighters can be defined (each under a different highlighter name) and then referenced when using CompassHighlighter. Within Compass, an internal default highlighter is defined; it can be configured using default as the highlighter name.
Table A.19.
compass.engine.highlighter.[highlighter name].maxNumFragments: Optional (defaults to 3). Sets the maximum number of fragments that will be returned.
compass.engine.highlighter.[highlighter name].fragmenter.type: Optional (defaults to simple). The type of the fragmenter that will be used; can be simple, null, or the fully qualified class name of the fragmenter (implements org.apache.lucene.search.highlight.Fragmenter).
compass.engine.highlighter.[highlighter name].fragmenter.simple.size: Optional (defaults to 100). Sets the size (in bytes) of the fragments for the simple fragmenter.
compass.engine.highlighter.[highlighter name].encoder.type: Optional (defaults to default). The type of the encoder that will be used to encode fragmented text. Can be default (does nothing), html (escapes html tags), or the fully qualified class name of the encoder (implements org.apache.lucene.search.highlight.Encoder).
compass.engine.highlighter.[highlighter name].formatter.type: Optional (defaults to simple). The type of the formatter that will be used to highlight the text. Can be simple (simply wraps the highlighted text with pre and post strings), htmlSpanGradient (wraps the highlighted text with an html span tag with optional background and foreground gradient colors), or the fully qualified class name of the formatter (implements org.apache.lucene.search.highlight.Formatter).
compass.engine.highlighter.[highlighter name].formatter.simple.pre: Optional (defaults to <b>). In case the highlighter uses the simple formatter, controls the text that is appended before the highlighted text.
compass.engine.highlighter.[highlighter name].formatter.simple.post: Optional (defaults to </b>). In case the highlighter uses the simple formatter, controls the text that is appended after the highlighted text.
compass.engine.highlighter.[highlighter name].formatter.htmlSpanGradient.minForegroundColor: Optional (if not set, foreground will not be set on the span tag). In case the highlighter uses the htmlSpanGradient formatter, the hex color used for representing IDF scores of zero, e.g. #FFFFFF (white).
compass.engine.highlighter.[highlighter name].formatter.htmlSpanGradient.maxForegroundColor: Optional (if not set, foreground will not be set on the span tag). In case the highlighter uses the htmlSpanGradient formatter, the largest hex color used for representing IDF scores, e.g. #000000 (black).
compass.engine.highlighter.[highlighter name].formatter.htmlSpanGradient.minBackgroundColor: Optional (if not set, background will not be set on the span tag). In case the highlighter uses the htmlSpanGradient formatter, the hex color used for representing IDF scores of zero, e.g. #FFFFFF (white).
compass.engine.highlighter.[highlighter name].formatter.htmlSpanGradient.maxBackgroundColor: Optional (if not set, background will not be set on the span tag). In case the highlighter uses the htmlSpanGradient formatter, the largest hex color used for representing IDF scores, e.g. #000000 (black).
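For example, a sketch configuring the default highlighter to wrap hits with <span class="hit"> tags using the simple formatter (conf is a CompassConfiguration):

conf.setSetting("compass.engine.highlighter.default.formatter.simple.pre", "<span class=\"hit\">");
conf.setSetting("compass.engine.highlighter.default.formatter.simple.post", "</span>");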
Table A.20.
compass.osem.supportUnmarshall: Controls whether support for un-marshalling within class mappings defaults to true or false (unless it is explicitly set in the class mapping). Defaults to true. Un-marshalling is the process of converting a raw Resource back into the actual domain object; this setting controls whether a searchable class supports unmarshalling from the search engine, or whether working with the Resource is enough. If un-marshalling support is enabled, extra information is stored within the search engine and extra memory is consumed.
The JdbcDirectory is highly configurable, using the optional JdbcDirectorySettings. All the settings are described in the javadoc, and most of them will be made clear during the next sections.
DataSource, Dialect, tableName: Creates a new JdbcDirectory using the given data source and dialect. JdbcTable and JdbcDirectorySettings are created based on default values.
DataSource, Dialect, JdbcDirectorySettings, tableName: Creates a new JdbcDirectory using the given data source, dialect, and JdbcDirectorySettings. The JdbcTable is created internally.
DataSource, JdbcTable: Creates a new JdbcDirectory using the given data source and JdbcTable. Creating a new JdbcTable requires a Dialect and JdbcDirectorySettings.
The Jdbc directory works against a single table (where the table name must be provided when the directory is created). The table schema is described in the following table:
Name (VARCHAR, default column name name_): The file entry name. Similar to a file name within a file system directory. The column size is configurable and defaults to 50.
Value (BLOB, default column name value_): A binary column where the content of the file is stored. Based on the Jdbc Blob type. Can have a configurable size where appropriate for the database type.
Size (NUMBER, default column name size_): The size of the currently saved data in the Value column. Similar to the size of a file in a file system.
Last Modified (TIMESTAMP, default column name lf_): The time the file was last modified. Similar to the last modified time of a file within a file system.
Deleted (BIT, default column name deleted_): Whether the file is deleted or not. Only used by some of the file types based on the Jdbc directory. More is explained in later sections.
The Jdbc directory provides the following operations on top of the ones required by the Directory interface:
create: Creates the database table (with the above mentioned schema). The create operation drops the table first.
deleteContent: Deletes all the rows from the table in the database.
tableExists: Returns whether the table exists or not. Only supported on some of the databases.
deleteMarkDeleted: Deletes all the file entries that are marked as deleted and were marked "delta" time ago (based on database time, if possible by dialect). The delta is taken from the JdbcDirectorySettings, or provided as a parameter to the deleteMarkDeleted operation.
The Jdbc directory requires a Dialect implementation that is specific to the database used with it. The following table lists the current dialects supported with the Jdbc directory, together with the Blob locator support* of their Jdbc drivers:
Oracle: org.apache.lucene.store.jdbc.dialect.OracleDialect. Oracle Jdbc Driver - Yes.
Microsoft SQL Server: org.apache.lucene.store.jdbc.dialect.SQLServerDialect. jTds 1.2 - No. Microsoft Jdbc Driver - Unknown.
MySQL: org.apache.lucene.store.jdbc.dialect.MySQLDialect. MySQL Connector J 3.1/5 - Yes, with emulateLocators=true in the connection string.
MySQL with InnoDB: org.apache.lucene.store.jdbc.dialect.MySQLInnoDBDialect. See MySQL.
MySQL with MyISAM: org.apache.lucene.store.jdbc.dialect.MySQLMyISAMDialect. See MySQL.
PostgreSQL: org.apache.lucene.store.jdbc.dialect.PostgreSQLDialect. Postgres Jdbc Driver - Yes.
Sybase / Sybase Anywhere: org.apache.lucene.store.jdbc.dialect.SybaseDialect. Unknown.
Interbase: org.apache.lucene.store.jdbc.dialect.InterbaseDialect. Unknown.
Firebird: org.apache.lucene.store.jdbc.dialect.FirebirdDialect. Unknown.
DB2 / DB2 AS400 / DB2 OS390: org.apache.lucene.store.jdbc.dialect.DB2Dialect. Unknown.
Derby: org.apache.lucene.store.jdbc.dialect.DerbyDialect. Derby Jdbc Driver - Unknown.
HypersonicSQL: org.apache.lucene.store.jdbc.dialect.HSQLDialect. HSQL Jdbc Driver - No.
* A Blob locator is a pointer to the actual data, which allows fetching only portions of the Blob at a time.
Databases (or Jdbc drivers) that do not use locators usually fetch all the Blob data for each query (which makes
using them impractical for large indexes). Note, the support documented here does not cover all the possible
Jdbc drivers, please refer to your Jdbc driver documentation for more information.
It is best to use a pooled data source (like Jakarta Commons DBCP), so Connections won't get created for every request, but will be pooled.
Most of the time, when working with the Jdbc directory, it is best to work with a non compound index format. Since with databases there is no problem of too many open files, it won't be an issue. The package comes with a set of utilities to compound or uncompound an index, located in the org.apache.lucene.index.LuceneUtils class, just in case you already have an index and it is in the wrong structure.
When indexing data, a possible performance improvement is to index the data into the file system or memory, and then copy the contents of the index over to the database. org.apache.lucene.index.LuceneUtils comes with a utility to copy one directory to another, changing the compound state of the index while copying.
As you can see, no commit or rollback is called on the connection, allowing for any type of transaction management to be done outside of the actual JdbcDirectory related operations. Also, the fact that we are using the Jdbc DataSource allows for pluggable transaction management support (usually based on a DataSource delegate and Connection proxy). DataSourceUtils is a utility class that comes with the Jdbc directory, and its usage will be made clear in the following sections.
There are several options when it comes to transaction management:
When configuring the DataSource or the Connection to use autoCommit (set it to true), no transaction management is required. An additional benefit is that any existing Lucene code will work as-is with the JdbcDirectory (assuming that the Directory class was used instead of the actual implementation type).
The main problems with using the Jdbc directory in autoCommit mode are: performance suffers because of it, and not all databases allow using Blobs with autoCommit. As you will see later on, other transaction management options are simple to use, and the Jdbc directory comes with a set of helper classes that make the transition into "Jdbc directory enabled code" simple.
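For example, a minimal sketch using a Jakarta Commons DBCP pooled data source in autoCommit mode against HSQL (the driver, url and table name are placeholders; the constructor form is the DataSource, Dialect, tableName one listed earlier):

BasicDataSource ds = new BasicDataSource(); // org.apache.commons.dbcp
ds.setDriverClassName("org.hsqldb.jdbcDriver");
ds.setUrl("jdbc:hsqldb:mem:test");
ds.setDefaultAutoCommit(true);
Directory dir = new JdbcDirectory(ds, new HSQLDialect(), "search_index");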
The TransactionAwareDataSourceProxy can wrap a DataSource, returning a Jdbc Connection only if there is no existing Connection that was opened before (within the same thread) and not closed yet. Any call to the close method on this type of Connection (which we call a "not controlled" connection) will result in a no-op. The DataSourceUtils#releaseConnection method will also take care not to close the Connection if it is not controlled.
So, how do we rollback or commit the Connection? DataSourceUtils has two methods, commitConnectionIfPossible and rollbackConnectionIfPossible, which will only commit/rollback the Connection if it was proxied by the TransactionAwareDataSourceProxy, and it is a controlled Connection.
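A sketch of the resulting pattern (assuming the proxy is constructed directly around the target DataSource, and that DataSourceUtils#getConnection is used to obtain the controlled Connection):

DataSource ds = new TransactionAwareDataSourceProxy(dataSource);
Directory dir = new JdbcDirectory(ds, dialect, "search_index");
Connection conn = DataSourceUtils.getConnection(ds);
try {
    // ... IndexReader / IndexSearcher / IndexWriter operations against dir ...
    DataSourceUtils.commitConnectionIfPossible(conn);
} catch (Exception e) {
    DataSourceUtils.rollbackConnectionIfPossible(conn);
} finally {
    DataSourceUtils.releaseConnection(conn);
}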
Note that the above code will also work when you do have a transaction manager (as described in the next section), and it forms the basis for the DirectoryTemplate (described later) that comes with the Jdbc directory.
For environments that use external transaction managers (like JTA or Spring PlatformTransactionManager), the transaction management should be performed outside of the code that uses the Jdbc directory. Do not use the Jdbc directory TransactionAwareDataSourceProxy.
With JTA, for example, if Container Managed transactions are used, the executing code should reside within one. If not, a JTA transaction should be executed programmatically.
When using Spring, the executing code should reside within a transactional context, using either a transaction proxy (AOP), or the PlatformTransactionManager and the TransactionTemplate programmatically. IMPORTANT: When using Spring, you should wrap the DataSource with Spring's own TransactionAwareDataSourceProxy.
B.3.4. DirectoryTemplate
Since transaction management might require specific code to be written, the Jdbc directory comes with a DirectoryTemplate class, which allows writing code that is agnostic to the actual Directory implementation and to transaction management. The directory template performs the transaction management support code only if the Directory is of type JdbcDirectory and the transaction management is a local one (DataSource transaction management).
Each directory based operation (done by Lucene IndexReader, IndexSearcher and IndexWriter) should be wrapped by the DirectoryTemplate. An example of using it:
DirectoryTemplate template = new DirectoryTemplate(jdbcDir); // wraps the JdbcDirectory
template.execute(new DirectoryTemplate.DirectoryCallbackWithoutResult() {
    protected void doInDirectoryWithoutResult(Directory dir) throws IOException {
        // indexSearcher operations
    }
});
When the JdbcDirectory is created, all the different file entry handlers that are registered with the directory settings are created and configured. They will then be used to handle files based on the file names.
When registering a new file entry handler, it must be registered with JdbcFileEntrySettings. The JdbcFileEntrySettings is a fancy wrapper around java Properties, in order to provide an open way of configuring file entry handlers. When creating a new JdbcFileEntrySettings it already has sensible defaults (refer to the javadoc for them), but of course they can be changed. One important configuration setting is the type of the FileEntryHandler, which should be set under the constant setting name JdbcFileEntrySettings#FILE_ENTRY_HANDLER_TYPE, and should be the fully qualified class name of the file entry handler.
The Jdbc directory package comes with three different FileEntryHandlers:
org.apache.lucene.store.jdbc.handler.NoOpFileEntryHandler: Performs no operations.
org.apache.lucene.store.jdbc.handler.ActualDeleteFileEntryHandler: Performs an actual delete from the database when the different delete operations are called. Also supports configurable IndexInput and IndexOutput (described later).
org.apache.lucene.store.jdbc.handler.MarkDeleteFileEntryHandler: Marks entries in the database as deleted (using the deleted column) when the different delete operations are called. Also supports configurable IndexInput and IndexOutput (described later).
Most of the files use the MarkDeleteFileEntryHandler, since there might be other currently open IndexReaders or IndexSearchers that use the files. The JdbcDirectory provides the deleteMarkDeleted() and deleteMarkDeleted(delta) operations to actually purge old entries that are marked as deleted. They should be scheduled and executed once in a while in order to keep the database table compact.
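For example, a sketch purging entries that were marked as deleted more than an hour ago (assuming the delta is expressed in milliseconds):

jdbcDirectory.deleteMarkDeleted(60 * 60 * 1000);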
When creating new JdbcDirectorySettings, different file entry handlers are already registered for specific files automatically. For example, the deleted file is registered against a NoOpFileEntryHandler, since we will always be able to delete entries from the database (the deleted file is used to store files that could not be deleted from the file system). This results in better performance, since no operations are executed against the deleted (or deleted related) files. Another example is registering the ActualDeleteFileEntryHandler against the segments file, since we do want to delete it and replace it with a new one when it is written.
Each file entry handler can be associated with an implementation of IndexInput. The IndexInput should be set under the constant JdbcFileEntrySettings#INDEX_INPUT_TYPE_SETTING, and be the fully qualified class name of the IndexInput implementation.
org.apache.lucene.store.jdbc.index.FetchOnOpenJdbcIndexInput: Fetches and caches all the binary data from the database when the IndexInput is opened. Perfect for small file entries (like the segments file).
The JdbcDirectorySettings automatically registers sensible defaults for the default file entry handler, and specific ones for specific files. Please refer to the javadocs for the defaults.
Each file entry handler can be associated with an implementation of IndexOutput. The IndexOutput should be set under the constant JdbcFileEntrySettings#INDEX_OUTPUT_TYPE_SETTING, and be the fully qualified class name of the IndexOutput implementation.
org.apache.lucene.store.jdbc.index.RAMAndFileJdbcIndexOutput: A special index output that starts with a RAM based index output and, if a configurable threshold is met, switches to a file based index output. The threshold setting can be configured using RAMAndFileJdbcIndexOutput#INDEX_OUTPUT_THRESHOLD_SETTING.
The JdbcDirectorySettings automatically registers sensible defaults for the default file entry handler, and specific ones for specific files. Please refer to the javadocs for the defaults.