
EMC Documentum

xPlore

Version 1.3

Administration and Development Guide

EMC Corporation
Corporate Headquarters:
Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com

Copyright 2010-2012 EMC Corporation. All rights reserved.


EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without
notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION MAKES NO
REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION,
AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. Adobe and Adobe PDF
Library are trademarks or registered trademarks of Adobe Systems Inc. in the U.S. and other countries. All other trademarks used
herein are the property of their respective owners.
Documentation Feedback
Your opinion matters. We want to hear from you regarding our product documentation. If you have feedback about how we can
make our documentation better or easier to use, please send us your feedback directly at IIGDocumentationFeedback@emc.com.

Table of Contents

Chapter 1    Introduction to xPlore .....................................................................................13
Features ..........................................................................................................13
Limitations .......................................................................................................14
xPlore compared to FAST .................................................................................17
Architectural overview.......................................................................................19
Physical architecture ........................................................................................19
xPlore disk areas ..........................................................................................20
xPlore instances ...........................................................................................20
xDB libraries and Lucene index......................................................................21
Indexes ........................................................................................................22
Logical architecture ..........................................................................................23
Documentum domains and categories............................................................25
Mapping of domains to xDB ...........................................................................26
How Content Server documents are indexed......................................................27
How Content Server documents are queried ......................................................29

Chapter 2    Managing the System .....................................................................................31


Opening xPlore administrator ............................................................................31
Starting and stopping the system.......................................................................32
Viewing and configuring global operations (all instances) ....................................32
Managing instances..........................................................................................33
Changing the host name and URL .................................................................33
Replacing a failed instance with a spare .........................................................34
Replacing a failed primary instance ................................................................35
Changing a failed instance into a spare ..........................................................37
Managing the watchdog service ........................................................................37
Starting and stopping the watchdog service ....................................................37
Configuring system metrics ...............................................................................39
Managing the status database...........................................................................39
Configuring the audit record ..............................................................................39
Troubleshooting system problems .....................................................................40
Modifying indexserverconfig.xml........................................................................43
Tasks performed outside xPlore administrator ....................................................45
Administration APIs ..........................................................................................47
Open an admin connection ............................................................................47
Call an admin API .........................................................................................47
Configuration APIs ........................................................................................48


Chapter 3    Managing Security ..........................................................................................51
About security ..................................................................................................51
Changing search results security.......................................................................51
Manually updating security................................................................................52
Changing the administrator password ................................................................53
Configuring the security cache ..........................................................................54
Troubleshooting security ...................................................................................55

Chapter 4    Managing the Documentum Index Agent .......................................................59


About the Documentum index agent ..................................................................59
Documentum attributes that control indexing...................................................60
Starting the index agent ....................................................................................62
Silent index agent startup..................................................................................63
Setting up index agents for ACLs and groups .....................................................65
Configuring the index agent after installation ......................................................66
Installing index agent filters (Content Server 6.5 SPx or 6.6) ............................67
Configuring index agent filters........................................................................69
Sharing content storage ................................................................................70
Mapping Server storage areas to collections...................................................71
Migrating documents ........................................................................................72
Migrating content (reindexing)........................................................................72
Migrating documents by object type ...............................................................72
Migrating a limited set of documents ..............................................................73
Using ftintegrity ................................................................................................73
Indexing documents in normal mode .................................................................79
Resubmitting documents for indexing ................................................................79
Removing entries from the index .......................................................................79
Indexing metadata only.....................................................................................79
Making types non-indexable..............................................................................80
Making metadata non-searchable......................................................................80
Injecting data and supporting joins.....................................................................80
Custom content filters .......................................................................................83
Reindexing after removing a Documentum attribute............................................84
Troubleshooting the index agent........................................................................85
Cleaning up the index queue .........................................................................89
Index agent startup issues .............................................................................90
Content query returned no data......................................................................91

Chapter 5    Document Processing (CPS) ..........................................................................93


About CPS.......................................................................................................93
Adding a remote CPS instance..........................................................................94
Removing a CPS instance ................................................................................95
Configuring CPS dedicated to indexing or search ...............................................96
Administering CPS ...........................................................................................97
Modifying CPS configuration file ........................................................................97


Maximum document and text size......................................................................98


Configuring languages and encoding ............................................................... 100
Indexable formats........................................................................................... 103
Lemmatization ............................................................................................... 103
About lemmatization.................................................................................... 103
Configuring indexing lemmatization .............................................................. 105
Lemmatizing specific types or attributes ....................................................... 105
Troubleshooting lemmatization..................................................................... 106
Saving lemmatization tokens ....................................................................... 107
Handling special characters ............................................................................ 108
Configuring stop words ................................................................................... 110
Troubleshooting content processing ................................................................ 111
CPS troubleshooting methods...................................................................... 112
CPS startup and connection errors............................................................... 113
Adding CPS daemons for ingestion or query processing................................ 114
Running out of space .................................................................................. 115
CPS file processing errors ........................................................................... 116
Troubleshooting slow ingestion and timeouts ................................................ 117
Adding dictionaries to CPS.............................................................................. 120
Custom content processing............................................................................. 122
About custom content processing................................................................. 122
Text extraction ............................................................................................ 124
Troubleshooting custom text extraction......................................................... 126
Annotation .................................................................................................. 126
UIMA example ............................................................................................ 130
Custom content processing errors................................................................ 135
Chapter 6    Indexing ....................................................................................................... 137


About indexing ............................................................................................... 137
Configuring text extraction .............................................................................. 137
Configuring an index....................................................................................... 139
Subpaths.................................................................................................... 141
Sort support................................................................................................ 143
Creating custom indexes ................................................................................ 145
Managing indexing in xPlore administrator ....................................................... 145
Troubleshooting indexing ................................................................................ 146
Running the standalone consistency checker ................................................... 149
Indexing APIs................................................................................................. 150
Route a document to a collection ................................................................. 150
Creating a custom routing class................................................................ 151
SimpleCollectionRouting example............................................................. 151

Chapter 7    Index Data: Domains, Categories, and Collections ...................................... 155


Domain and collection menu actions................................................................ 155
Managing domains ......................................................................................... 156
Delete a corrupted domain .............................................................................. 157
Configuring categories.................................................................................... 158


Managing collections ...................................................................................... 159


About collections......................................................................................... 159
Planning collections for scalability ................................................................ 160
Uses of subcollections................................................................................. 160
Adding or deleting a collection ..................................................................... 161
Changing collection properties ..................................................................... 161
Routing documents to a specific collection.................................................... 162
Attaching and detaching a collection ............................................................ 162
Moving a collection ..................................................................................... 162
Moving a collection to a new drive................................................................ 163
Creating a collection storage location ........................................................... 164
Rebuilding the index.................................................................................... 164
Monitoring merges of index data .................................................................. 165
Deleting a collection and recreating indexes ................................................. 165
Querying a collection................................................................................... 165
Checking xDB statistics .................................................................................. 165
Troubleshooting data management.................................................................. 167
xDB repair commands .................................................................................... 168
Chapter 8    Backup and Restore ..................................................................................... 171


About backup................................................................................................. 171
About restore ................................................................................................. 173
Handling data corruption ................................................................................. 174
Detecting data corruption............................................................................. 175
Handling a corrupt domain........................................................................... 175
Repairing a corrupted index ......................................................................... 175
Snapshot too old......................................................................................... 176
Cleaning and rebuilding the index ................................................................ 176
Dead objects .............................................................................................. 177
Recovering from a system crash .................................................................. 178
Backup in xPlore administrator ........................................................................ 179
File- or volume-based (snapshot) backup and restore....................................... 179
Offline restore ................................................................................................ 180
Troubleshooting backup and restore ................................................................ 181

Chapter 9    Automated Utilities (CLI) .............................................................................. 183


CLI properties and environment ...................................................................... 183
Using the CLI ................................................................................................. 184
CLI batch file.................................................................................................. 185
Scripted federation restore .............................................................................. 186
Scripted domain restore.................................................................................. 187
Scripted collection restore............................................................................... 188
Force detach and attach CLIs ......................................................................... 189
Orphaned segments CLIs ............................................................................... 189
Removing orphaned indexes........................................................................... 190
Domain mode CLIs......................................................................................... 190
Collection and domain state CLIs .................................................................... 191
Activate spare instance CLI............................................................................. 191


Detecting the version of an instance ................................................................ 191


Cleaning up after failed index rebuild ............................................................... 192
Final merge CLIs............................................................................................ 192
Chapter 10   Search .......................................................................................................... 195


About searching ............................................................................................. 195
Query operators.......................................................................................... 197
Administering search ...................................................................................... 197
Configuring query warmup........................................................................... 198
Configuring scoring and freshness ............................................................... 201
Supporting search in XML documents .......................................................... 203
Adding a thesaurus ..................................................................................... 207
Configuring query lemmatization .................................................................. 212
Configuring search on compound words ....................................................... 212
Configuring query summaries ...................................................................... 212
Configuring fuzzy search ............................................................................. 215
Configuring index type checking................................................................... 216
Configuring wildcards and fragment search...................................................... 218
Configuring content wildcard support............................................................ 219
Configuring metadata wildcard search .......................................................... 220
Limiting wildcards and common terms in search results ................................. 221
Supporting wildcards in DQL........................................................................ 221
Configuring Documentum search .................................................................... 222
Query plugin configuration (dm_ftengine_config)........................................... 222
Making types and attributes searchable ........................................................ 223
Folder descend queries ............................................................................... 223
DQL, DFC, and DFS queries ....................................................................... 224
Content Server and DFC client search differences ........................................ 225
DQL Processing.......................................................................................... 226
Tracing Documentum queries ...................................................................... 227
Supporting subscriptions to queries ................................................................. 228
About query subscriptions ........................................................................... 228
Installing the query subscription DAR ........................................................... 231
Testing query subscriptions.......................................................................... 232
Subscription reports .................................................................................... 233
Subscription logging.................................................................................... 233
Troubleshooting search .................................................................................. 242
Auditing queries .......................................................................................... 244
Search is not available ................................................................................ 245
Troubleshooting slow queries....................................................................... 246
Unexpected search results .......................................................................... 250
Debugging queries ...................................................................................... 253
Search APIs and customization ....................................................................... 257
Debugging queries ...................................................................................... 259

Chapter 11   Facets ........................................................................................................... 273
About Facets.................................................................................................. 273
Configuring facets in xPlore ............................................................................ 274
Creating a DFC facet definition........................................................................ 276
Facet datatypes ............................................................................................. 276
Creating a DFS facet definition........................................................................ 278
Defining a facet handler .................................................................................. 281


Sample DFC facet definition and retrieval ........................................................ 282


Tuning facets ................................................................................................. 284
Logging facets ............................................................................................... 285
Troubleshooting facets.................................................................................... 286
Chapter 12   Using reports ............................................................................................... 287


About reports ................................................................................................. 287
Types of reports ............................................................................................. 287
Document processing (CPS) reports................................................................ 290
Indexing reports ............................................................................................. 290
Search reports ............................................................................................... 290
Editing a report............................................................................................... 291
Report syntax................................................................................................. 292
Sample edited report ...................................................................................... 294
Troubleshooting reports .................................................................................. 296

Chapter 13   Logging ........................................................................................................ 297


Configuring logging ........................................................................................ 297
CPS logging................................................................................................... 299

Chapter 14   Setting up a Customization Environment ..................................................... 301


Setting up the xPlore SDK .............................................................................. 301
Customization points ...................................................................................... 301
Adding custom classes ................................................................................... 304
Tracing .......................................................................................................... 304
Trace log format.......................................................................................... 306
Reading trace output ................................................................................... 307
Enabling logging in a client application............................................................. 308
Handling a NoClassDef exception ................................................................... 309

Chapter 15   Performance and Disk Space ....................................................................... 311


Planning for performance................................................................................ 311
Disk space and storage .................................................................................. 313
System sizing for performance ........................................................................ 316
Memory consumption ..................................................................................... 317
Measuring performance .................................................................................. 317
Tuning the system .......................................................................................... 318
Tuning index merges ...................................................................................... 319
Types of merges ......................................................................................... 319
Setting final merge blackout periods ............................................................. 321
Starting and stopping a final merge .............................................................. 321
Scheduling Final Merges ............................................................................. 322
Indexing......................................................................................................... 323


Search performance ....................................................................................... 326


About search performance........................................................................... 327
Tuning CPS and xDB for search................................................................... 328
Creating a CPS daemon dedicated to search................................................ 330
Improving search performance with time-based collections ............................ 330
Appendix A   Index Agent, CPS, Indexing, and Search Parameters ................................... 333

Appendix B   Documentum DTDs ...................................................................................... 347

Appendix C   XQuery and VQL Reference.......................................................................... 353


Preface

This guide describes administration, configuration, and customization of Documentum xPlore.


These tasks include system monitoring, index configuration and management, query configuration
and management, auditing and security, and Documentum integration.

Intended Audience
This guide contains information for xPlore administrators who configure xPlore and Java developers
who customize xPlore:
Configuration is defined for support purposes as changing an XML file or an administration
setting in the UI.
Customization is defined for support purposes as using xPlore APIs to customize indexing and
search. The xPlore SDK is a separate download that supports customization.
You must be familiar with the installation guide, which describes the initial configuration of the
xPlore environment. When Documentum functionality is discussed, this guide assumes familiarity
with EMC Documentum Content Server administration.

Revision history
The following changes have been made to this document.
Revision Date          Description
November 2012          Initial publication

Additional documentation
This guide provides overview, administration, and development information. For information on
installation, supported environments, and known issues, see:
EMC Documentum xPlore Release Notes
EMC Documentum xPlore Installation Guide
EMC Documentum Environment and System Requirements Guide
For additional information on Content Server installation and Documentum search client
applications, see:
EMC Documentum Content Server Installation Guide
EMC Documentum Search Development Guide


Chapter 1
Introduction to xPlore
This chapter contains the following topics:

Features

Limitations

xPlore compared to FAST

Architectural overview

Physical architecture

Logical architecture

How Content Server documents are indexed

How Content Server documents are queried

Features
Documentum xPlore is a multi-instance, scalable, high-performance, full-text index server that can be
configured for high availability and disaster recovery.
The xPlore architecture is designed with the following principles:
Uses standards as much as possible, like XQuery
Uses open source tools and libraries, like Lucene
Supports enterprise readiness: High availability, backup and restore, analytics, performance tuning,
reports, diagnostics and troubleshooting, administration GUI, and configuration and customization
points.
Supports virtualization, with accompanying lower total cost of ownership.

Indexing features
Collection topography: xPlore supports creating collections online, and collections can span multiple
file systems.
Transactional updates and purges: xPlore supports transactional updates and purges of indexes as well
as transactional commit notification to the caller.
Multithreaded insertion into indexes: xPlore ingestion through multiple threads supports vertical
scaling on the same host.


Dynamic allocation and deallocation of capacity: For periods of high ingestion, you can add a CPS
instance and new collection. Add content to this collection, then move the collection to another
instance for better search performance. You can then decommission the CPS instance.
Temporary high query load: For high query load, like a legal investigation, add an xPlore instance for
the search service and bind collections to it in read-only mode.
Growing ingestion or query load: If your ingestion or query load increases due to growing business,
you can add instances as needed.
Extensible indexing pipeline using the open-source UIMA framework.
Configurable stop words and special characters.

Search features
Case sensitivity: xPlore queries are lower-cased (rendered case-insensitive).
Full-text queries: To query metadata, set up a specific index on the metadata.
Faceted search: Facets in xPlore are computed over the entire result set or over a configurable number
of results.
Security evaluation: When a user performs a search, permissions are evaluated for each result. Security
can be evaluated in the xPlore full-text engine before results are returned to Content Server, resulting
in faster query results. This feature is turned on by default and can be configured or turned off.
Native XQuery syntax: The xPlore full-text engine supports XQuery syntax.
Thesaurus search to expand query terms.
Fuzzy search finds misspelled words or letter reversals.
Boost specific metadata in search results.
Extensive testing and validation of search on supported languages.

Administration features
Multiple instance configuration and management.
Reports on ingestion metrics and errors, search performance and errors, and user activity.
Collections management: Creating, configuring, deleting, binding, routing, rebuilding, querying.
Command-line interface for automating backup and restore.

Limitations
ACLs and aspects are not searchable by default
ACLs and aspects are not searchable by default, to protect security. You can reverse the default by
editing indexserverconfig.xml. Set full-text-search to true in the subpath definition for acl_name and
r_aspect_name and then reindex your content.
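For example, the relevant sub-path definitions in indexserverconfig.xml might look like the following
sketch. Only the full-text-search attribute and the element names acl_name and r_aspect_name come from
the procedure above; the other attributes and the exact path values are illustrative, so copy them from
the definitions already present in your own file:

<!-- illustrative sketch; keep the remaining attributes from your existing sub-path definitions -->
<sub-path path="dmftsecurity/acl_name" type="string"
    value-comparison="true" full-text-search="true"/>
<sub-path path="dmftmetadata//r_aspect_name" type="string"
    value-comparison="true" full-text-search="true"/>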


Ingestion of many large files can cause failures


When CPS processes two or more large files at the same time, the CPS log file reports one of the
following errors (cps_daemon.log):
ERROR [Daemon-Core-(3400)] Exception happened, ACCESS_VIOLATION,
Attempt to read data at address 1 at (connection-handler-2)
...
FATAL [DAEMON-LP_RLP-(3440)] Not enough memory to process linguistic requests.
Error message: bad allocation

Workaround: In xPlore administrator, select an instance and click Configuration. Change the
following settings to smaller values:
Batch size: Decrease batch size to decrease the number of documents in a failed batch. (All
documents in a batch fail to be indexed if one document fails.)
Max text threshold
Thread pool size

Only metadata is indexed if a format is not supported


CPS cannot process certain formats, whether as standalone documents or as email attachments, and it
cannot process PDF input fields. Only the metadata of unsupported content is indexed during ingestion.
For a full list of supported formats, see Oracle Outside In 8.3.7 documentation. To see format
processing errors, use the xPlore administrator report Document Processing Error Detail and choose
File format unsupported.
To test whether a format is supported, try uploading it in xPlore administrator. If no error code is
reported in the Document processing error report, the format has been successfully indexed.

CPS daemon must restart after fatal ingestion error


CPS daemon restarts when ingestion generates a fatal error. The CPS log indicates that the connection
has been reset:
2009-02-10 12:19:55,425 INFO [DAEMON-CORE-(-1343566944)]
Daemon is shutdown forcefully.
2009-02-10 12:19:55,512 ERROR [MANAGER-CPSTransporter-(CPSWorkerThread-6)]
Failed to receive the response XML.
java.net.SocketException: Connection reset

Batch failure
Indexing requests are processed in batches. When one request in a batch fails while the index is
written to xDB, the entire batch fails.

Collection names cannot be duplicated


The name of an adopted collection cannot be reused, because the name still exists in the domain.


An adopted collection is a collection that has been moved to a parent collection. The adopted collection
becomes a subcollection. This kind of collection is created to boost ingestion rate, and later adopted
for better search performance.

Lemmatization
xPlore supports lemmatization, but you cannot configure the parts of speech that are lemmatized.
The part of speech for a word can be misidentified when there is not enough context. Workaround:
Enable alternative lemmatization if you have disabled it (see Configuring indexing lemmatization,
page 105).
Punctuation at the end of a sentence is included in the lemmatization of the last word. For example,
the phrase "Mary likes swimming and dancing" is lemmatized differently depending on whether there is a
period at the end. Without the period, "dancing" is identified as a verb with the lemma "dance". With the
period, it is identified as a noun with the lemma "dancing". A search for the verb "dance" does not find
the document when the word is at the end of the sentence. The likelihood of errors in part-of-speech
(POS) tagging increases with sentence length. Workaround: Enable alternate lemmatization.

Only one language is analyzed for indexing


Documents with multiple languages are indexed, but only one language is selected for indexing
tokens. Words in other languages are sometimes not indexed properly.

dftxml attributes are not indexed


xPlore does not index attribute values on XML elements in the dftxml representation of the input
document. For example, you cannot find all documents for which the value of the dmfttype attribute
of the element acl_name is dmstring.
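As a sketch, such a metadata element in the dftxml representation looks like the following (the element
value shown is illustrative). The element text is indexed and searchable, but the value of the dmfttype
attribute is not:

<acl_name dmfttype="dmstring">my_custom_acl</acl_name>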

Phrase searches
The content of a phrase search is not lemmatized.
Search fails for parts of common phrases. A common phrase like "because of", "a good many", or
"status quo" is tokenized as a phrase and not as individual words. A search for a word in the phrase,
such as "because", fails.

Stop words are case sensitive


Stop word lists are case sensitive. Add all case forms of each word that should not be indexed, for
example: "AND" and "And".

Some lightweight sysobjects (LWSOs) are not fully indexed and searchable
In a Documentum repository, lightweight sysobjects such as emails inherit many attributes from the
parent object. If the LWSOs are not materialized, a query on the inherited attributes fails. Some


Documentum client applications, such as Webtop and DCO, materialize emails so they are fully
searchable. SourceOne does not, so emails in SourceOne are sometimes not fully searchable.
If a matching object is an unmaterialized lightweight sysobject (LWSO), the query does not find the
object and no result is returned. DQL filters unmaterialized LWSOs; XQuery does not.

Skipped collections are not reported in query results


When a collection is unavailable or corrupted, it is skipped in a query. xPlore does not notify the
user or administrator that the collection was skipped. Results can be different when the collection is
restored or brought back online.

Special characters limitations


Special characters are treated as white space; for example, an underscore in a document name is
treated as white space. Special character lists are limited to ANSI characters.
Characters can be removed from the special characters list. See Handling special characters, page 108.

Chinese
Space in query causes incorrect tokenization
A space within a Chinese term is treated in DQL as white space: the two parts of the term are treated
as separate terms joined by AND, and a search for the original string fails.
Dictionary must be customized for Chinese name and place search
For Chinese documents, the names of persons and places cannot be searched. To be found, they must
be added to the Chinese dictionary in xPlore. See Adding dictionaries to CPS, page 120.

xPlore compared to FAST


If you are migrating from FAST to xPlore, the following information describes differences between the
two indexing servers.

Administration differences
xPlore has an administration console. FAST does not. Many features in xPlore are configurable
through xPlore administrator. These features were not configurable for FAST. Additionally,
administrative tasks are exposed through Java APIs.
Ports required: During xPlore instance configuration, the installer prompts for the HTTP port for the
JBoss instance (base port). The installer validates that the next 100 consecutive ports are available.
During index agent configuration, the installer prompts for the HTTP port for the index agent JBoss
instance and validates that the next 20 consecutive ports are available. FAST used 4000 ports.
High availability: xPlore supports N+1, active/passive with clusters, and active/active shared data
configurations. FAST supports only active/active. xPlore supports spare indexing instances that are


activated when another instance fails. The EMC Documentum xPlore Installation Guide describes
high availability options for xPlore.
Disaster recovery: xPlore supports online backup, including full and incremental. FAST supports
only offline (cold) backup.
Storage technology: xPlore supports SAN and NAS. FAST supports SAN only.
Virtualization: xPlore runs in VMware environments. FAST does not.
64-bit address space: 64-bit systems are supported in xPlore but not in FAST.
xPlore requires less temporary disk space than FAST. xPlore requires twice the index space used by
all collections, in addition to the index. This space is used for merges and optimizations. FAST
requires 3.5 times the space.

Indexing differences
Back up and restore: xPlore supports warm backups.
High availability: xPlore automatically restarts content processing after a CPS crash. After a VM
crash, the xPlore watchdog sends an email notification.
Transactional updates and purges: xPlore supports transactional updates and purges as well as
transactional commit notification to the caller. FAST does not.
Collection topography: xPlore supports creating collections online, and collections can span multiple
file systems. FAST does not support these features.
Lemmatization: FAST supports configuration for which parts of speech are lemmatized. In xPlore,
lemmatization is enabled or disabled. You can configure lemmatization for specific Documentum
attribute values.

Search differences
One-box search: Searches from the Webtop client default to ANDed query terms in xPlore.
Query a specific collection: Targeted queries are supported in xPlore but not FAST.
Folder descend: Queries are optimized in xPlore but not in FAST.
Results ranking: FAST and xPlore use different ranking algorithms.
Excluding from index: xPlore allows you to configure non-indexed metadata to save disk space
and improve ingestion and search performance. With this configuration, the number of hits differs
between FAST and xPlore queries on the non-indexed content. For example, if xPlore does not index
docbase_id, a full-text search on "256" returns no hits in xPlore, whereas FAST returns all indexed
documents for the repository whose ID is 256.
Security evaluation: Security is evaluated by default in the xPlore full-text engine before results are
returned to Content Server, resulting in faster query results. FAST returns results to the Content Server,
resulting in many hits that the user is not able to view.
Underprivileged user queries: Optimized in xPlore but not in FAST.
Native XQuery syntax: Supported by xPlore.


Facets: Facets are limited to 350 hits in FAST, but xPlore supports many more hits.
XML attributes: Attribute values on XML elements are part of the FAST binary index. xPlore does
not index XML attribute values.
Special characters: Special character lists are configurable. The default in xPlore differs from FAST
when terms such as email addresses or contractions are tokenized. For example, in FAST, an email
address is split up into separate tokens with the period and @ as boundaries. However, in xPlore,
only the @ serves as the boundary, since the period is considered a context character for part of
speech identification.

Architectural overview
xPlore provides query and indexing services that can be integrated into external content sources such
as the Documentum content management system. External content source clients like Webtop or
CenterStage, or custom Documentum DFC clients, can send indexing requests to xPlore.
Each document source is configured as a domain in xPlore. You can set up domains using xPlore
administrator. For Documentum environments, the Documentum index agent creates a domain for
each repository and a default collection within that domain.
Documents are provided in an XML representation to xPlore for indexing through the indexing APIs.
In a Documentum environment, the Documentum index agent prepares an XML representation of each
document. The document is assigned to a category, and each category corresponds to one or more
collections as defined in xPlore. To support faceted search in Documentum repositories, you can define
a special type of index called an implicit composite index.
xPlore instances are web application instances that reside on application servers. When an xPlore
instance receives an indexing request, it uses the document category to determine what is tokenized
and saved to the index. A local or remote instance of the content processing service (CPS) fetches the
content. CPS detects the primary language and format of a document. CPS then extracts indexable
content from the request stream and parses it into tokens. The tokens are used for building a full-text
index.
xPlore manages the full-text index. An external Apache Lucene full-text engine is embedded into
the EMC XML database (xDB). xDB tracks indexing and update requests, recording the status of
requests and the location of indexed content. xDB provides transactional updates to the Lucene
index. Indexes are still searchable during updates.
When an instance receives a query request, the request is processed on all included collections, then
the assembled query results are returned.
xPlore provides a web-based administration console.

Physical architecture
The xPlore index service and search service are deployed as a WAR file to a JBoss application server
that is included in the xPlore installer. xPlore administrator and online help are installed as war files
in the same JBoss application server. The index is stored in the storage location that was selected
during configuration of xPlore.


xPlore disk areas


xPlore instances
xDB libraries and Lucene index
Indexes

xPlore disk areas


xPlore creates disk areas for xDB data and redo log, the Lucene index within xDB, a temp area, and
xPlore configuration and utilities. When you run the xPlore installer to configure an index agent, a
disk area is created for content staging. The following table describes how these areas are used during
indexing and search.
Table 2.  Disk areas for xPlore

Area: xplore_home/data
Description: Stores dftxml, metrics, audit, ACLs and groups. Performs query lookup and retrieval and facet and security information.
Use in indexing: Next free space is consumed by disk block for batch XML files. Index updated through inserts and merges.
Use in search: Random access retrieval for specific elements and summary. Inverted index lookup and facet and security retrieval.

Area: xplore_home/config/log
Description: Stores transaction information.
Use in indexing: Updates to xDB data are logged. Non-committed data is stored to the log.
Use in search: Provides snapshot information during some retrievals.

Area: xplore_home/dsearch/admin/lib
Description: Used for restore when xPlore is down.
Use in indexing: None.
Use in search: None.

Area: temp
Description: Holds content.
Use in indexing: 1. (CPS) Intermediate processing. 2. (CPS) Exports to the index service. 3. Index: Updates to the Lucene index (non-transactional).

Area: Index agent content staging area
Description: Temporarily holds content during the indexing process.

xPlore instances
An xPlore instance is a web application instance (WAR file) that resides on an application server. You
can have multiple instances on the same host (vertical scaling), although it is more common to have
one xPlore instance per host (horizontal scaling). You create an instance by running the xPlore installer.
The first instance that you install is the primary instance. You can add secondary instances after
you have installed the primary instance. The primary instance must be running when you install
a secondary instance.
Adding or deleting an instance


To add an instance to the xPlore system, run the xPlore configurator script. If an xPlore instance exists
on the same host, select a different port for the new instance, because the default port is already in use.
To delete an instance from the xPlore system, use the xPlore configurator script. Shut down the
instance before you delete it.
You manage instances in xPlore administrator. Click Instances in the left panel to see a list of
instances in the right content pane. You see the following instance information:
OS information: Host name, status, OS, and architecture.
JVM information: JVM version, active thread count, and number of classes loaded.
xPlore information: xDB version, instance version, instance type, and state.
An instance can have one or more of the following features enabled:
Content processing service (CPS)
Indexing service
Search service
xPlore Administrator (includes analytics, instance, and data management services)
Spare: A spare instance can be manually activated to take over for a disabled or stopped instance.
See Replacing a failed instance with a spare, page 34.
You manage an instance by selecting the instance in the left panel. Collections that are bound to the
instance are listed on the right. Click a collection to go to the Data Management view of the collection.
The application server instance name for each xPlore instance is recorded in indexserverconfig.xml. If
you change the name of the JBoss instance, change the value of the attribute appserver-instance-name
on the node element for that instance. This attribute is used for registering and unregistering instances.
Back up the xPlore federation after you change this file.
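A minimal sketch of such a node element follows. Only the appserver-instance-name attribute is the one
discussed here; the instance name and the other attributes are illustrative, so keep whatever your
generated indexserverconfig.xml already contains:

<node name="node1" appserver-instance-name="PrimaryDsearch">
   <!-- other attributes and child elements as generated by the installer -->
</node>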

xDB libraries and Lucene index


xDB is a Java-based XML database that enables high-speed storage and manipulation of many XML
documents. xDB supports the XQuery language and XQFT query specifications. An xDB library has
a hierarchical structure like an OS directory. The library is a logical container for other libraries
or XML documents.
A library corresponds to a collection in xPlore with additional metadata such as category, usage, and
properties. An xDB library stores an xPlore collection as one or more Lucene indexes that can include
the XML content that is indexed. xPlore manages the indexes on the collection.
xDB manages the following libraries for xPlore:
The root library contains a SystemData library with metrics and audit databases. These databases record
metrics and audit queries by xPlore instance.
Each domain contains an xDB tracking library (database) that records the content that has been indexed.
Each domain contains a status library (database) that reports indexing status for the domain.
Each domain contains one or more data libraries. The default library is the first that is created for
a domain.


When xPlore processes an XML representation of an input document and supplies tokens to xDB,
xDB stores them into a Lucene index. Optionally, xPlore can be configured to store the content
along with the tokens. A tracking database in xDB manages deletes and updates to the index. When
documents are updated or deleted, changes to the index are propagated. When xPlore supplies XQuery
expressions to xDB, xDB passes them to the Lucene index. To query the correct index, xDB tracks
the location of documents.
xDB manages parallel dispatching of queries to more than one Lucene index when parallel queries
are enabled. For example, if you have set up multiple collections on different storage locations, you
can query each collection in parallel.
Figure 1.  xDB and Lucene

An xDB library is stored on a data store. If you install more than one instance of xPlore, the storage
locations must be accessible by all instances. The xDB data stores and indexes can reside on a separate
data store, SAN or NAS. The locations are configurable in xPlore administrator. If you do not have
heavy performance requirements, xDB and the indexes can reside on the same data store.

Indexes
xDB has several possible index structures that are queried using XQuery. The Lucene index is
modeled as a multi-path index (a type of composite index) in xDB. The Lucene index services both
value-based and full-text probes of the index.
Covering indexes are also supported. When the query needs values, they are pulled from the index and
not from the data pages. Covering indexes are used for security evaluation and facet computation.
You can configure none, one, or multiple indexes on a collection. An explicit index is based on values
of XML elements, paths within the XML document, path-value combinations, or full-text content.
For example, the following is a value-indexed field:
/dmftdoc[dmftmetadata//object_name="foo"]



The following is a tokenized, full-text field:

/dmftdoc[dmftmetadata//object_name ftcontains "foo"]
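As a sketch of how such fields are probed, the following XQuery runs a full-text test over the dftxml
elements used above; the library path in the collection() call and the search term are illustrative:

(: illustrative library path and search term :)
for $d in collection('/MyDocbase/dsearch/Data')/dmftdoc
where $d/dmftmetadata//object_name ftcontains "foo"   (: full-text probe; use = "foo" for the value-based probe :)
return $d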

Indexes are defined and configured in indexserverconfig.xml. For information on viewing and updating
this file, see Modifying indexserverconfig.xml, page 43.

Logical architecture
A domain contains indexes for one or more categories of documents. A category is logically
represented as one or more collections. Each collection contains indexes on the content and metadata.
When a document is indexed, it is assigned to a category or class of documents and indexed into one
of the category collections.
Documentum domains and categories, page 25
Mapping of domains to xDB, page 26

Domains
A domain is a separate, independent, logical grouping of collections within an xPlore deployment. For
example, a domain could contain the indexed contents of a single Documentum content repository.
Domains are defined in xPlore administrator in the data management screen. A domain can have
multiple collections in addition to the default collection.
The Documentum index agent creates a domain for the repository to which it connects. This domain
receives indexing requests from the Documentum index agent.

Categories
A category defines how a class of documents is indexed. All documents submitted for ingestion
must be in XML format. (For example, the Documentum index agent prepares an XML version for
Documentum repository indexing.) The category is defined in indexserverconfig.xml and managed
by xPlore. A category definition specifies the processing and semantics that is applied to an ingested
XML document. You can specify the XML elements that are used for language identification. You
can specify the elements that have compression, text extraction, tokenization, and storage of tokens.
You also specify the indexes that are defined on the category and the XML elements that are not
indexed. A collection belongs to one category.

Collections
A collection is a logical group of XML documents that is physically stored in an xDB detachable
library. A collection represents the most granular data management unit within xPlore. All documents
submitted for indexing are assigned to a collection. A collection generally contains one category of
documents. In a basic deployment, all documents in a domain are assigned to a single default collection.
You can create subcollections under each collection and route documents to user-defined collections.
A collection is bound to a specific instance in read-write state (index and search, index only, or update
and search). A collection can be bound to multiple instances in read-only state (search-only). Three
collections (two hot and one cold) with their corresponding instances are shown in the following figure.
Figure 2 Read-write (index and search) and read-only (search-only) collections on two instances

Use xPlore Administrator to do the following:


Define a collection and its category
Back up the collection
Change the collection state to read-only or read-write
Change the collection binding to a different instance
The metrics and audit systems store information in collections in a domain named SystemData. You
can view this domain and collections in xPlore administrator. One metrics database and one audit
database are defined. Each database has a subcollection for each xPlore instance.
The following diagram shows the services of a simple xPlore system: Two installed instances, each
with its own indexing, search, and CPS services.

Figure 3 Services on two instances

Example
A document is submitted for indexing. The client indexing application, for example, Documentum
index agent, has not specified the target collection for the document. If the document exists, the index
service updates the document. If it is a new document, the document is assigned to an instance in
round-robin order. If that instance has more than one collection, collection routing is applied. If
collection routing is not supplied by a client routing class or the Documentum index agent, the
document is assigned to a collection in round-robin order.

Documentum domains and categories


Repository domains
An xPlore domain generally maps to a single Documentum repository. Within that domain, you can
direct documents to one or more collections. In the following configuration in indexserverconfig.xml,
a repository is mapped to a domain. Three collections are defined: one for metadata and content
(default), one for ACLs, and one for groups. These latter two collections are used to filter results for
permissions before returning them to the client application. The collections in the domain can be
distributed across multiple xPlore instances. (Each collection is bound to an instance.)
<domain storage-location-name="default" default-document-category="dftxml"
name="TechPubsGlobal">
<collection document-category="dftxml" usage="Data" name="default"/>
...
</domain>

Documentum categories
A document category defines the characteristics of XML documents that belong to that category and
their processing. All documents are sent to a specific index based on the document category. For
example, xPlore pre-defines a category called dftxml that defines the indexes. All Documentum
indexable content and metadata are sent to this category.
The following Documentum categories are defined within the domain element in indexserverconfig.xml.
For information on viewing and updating this file, see Modifying indexserverconfig.xml, page 43.
dftxml: XML representation of object metadata and content for full text indexing. To view the
dftxml representation using xPlore administrator, click the document in the collection view.
acl: ACLs defined in the repository are indexed so that security can be evaluated in the full-text
engine. See About security, page 51 for more information.
group: Groups defined in the repository are indexed to evaluate security in the full-text engine.

Mapping of domains to xDB


Figure 4 Database structure for two instances

The entire xPlore federation library is stored in xDB root-library.


One content source (Documentum repository A) is mapped to a domain library. The library is
stored in a defined storage area on either instance.
A second repository, Repository B, has its own domain.
All xPlore domains share the system metrics and audit databases (SystemData library in xDB
with libraries MetricsDB and AuditDB). The metrics and audit databases have a subcollection for
each xPlore instance.
The ApplicationInfo library contains Documentum ACL and group collections for a specific
domain (repository).
The SystemInfo library has two subcollections: TrackingDB and StatusDB. Each collection in
TrackingDB matches a collection in Data and is bound to the same instance as that data collection.
There is a subcollection in StatusDB for each xPlore instance. The instance-specific subcollection
has a file status.xml that contains processing information for objects processed by the instance.
The Data collection has a default subcollection.

How Content Server documents are indexed


Figure 5 xPlore indexing path

1. In a client application, a Save, Checkin, Destroy, Readonlysave, or MoveContent operation is
performed on a SysObject in the repository.
2. This operation event generates a queue item (dmi_queue_item) in the repository that is sent to the
full-text user work queue. (The full-text user, dm_fulltext_index_user, is a Superuser created when
a repository is created or when an existing repository is upgraded.) The index agent retrieves the
queue item and applies index agent filters. After an index request is submitted to xPlore, the client
application can move on to the next task. (Indexing is asynchronous.)
3. The index agent retrieves the object associated with the queue item from the repository. The content
is retrieved or staged to a temporary area. The agent then creates a dftxml (XML) representation
of the object that can be used for full-text and metadata indexing. (A sketch of this representation
appears after these steps.)
4. The Index Agent sends the dftxml representation of the content and metadata to the xPlore Server.
5. The xPlore indexing service calls CPS to perform text extraction, language identification, and
transformation of metadata and content into indexable tokens.
6. The xPlore indexing service performs the following steps:
Routes documents to their target collections.
Merges the extracted content into the dftxml representation of the document.
Calls xDB to store the dftxml in xDB.
Returns the tokens from CPS to xDB.
Stores the document location (collection) and document ID in the TrackingDB.
Saves indexing metrics in the MetricsDB.
Tracks document indexing status in the StatusDB.
7. The indexing service notifies the index agent of the indexing status. If indexing succeeded, the
index agent removes the queue item from the Content Server; otherwise, the queue item is left
behind with the error status and error message.
The object is now searchable. (The index service does not provide any indication that an object is
searchable.) For information on how to troubleshoot latency between index agent submission and
searchability, see Troubleshooting indexing, page 146.
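For reference, the following is a minimal sketch of what a dftxml representation might look like. Only
the dmftdoc and dmftmetadata elements appear elsewhere in this guide; the type wrapper element, the
dmftcontents/dmftcontent elements, and all values are illustrative assumptions.
<dmftdoc>
<dmftmetadata>
<!-- the wrapper element for the object type is an assumption -->
<dm_document>
<r_object_id>090a0d6880000d0d</r_object_id>
<object_name>foo</object_name>
</dm_document>
</dmftmetadata>
<!-- element names below are assumptions for where CPS-extracted text is merged -->
<dmftcontents>
<dmftcontent>extracted document text ...</dmftcontent>
</dmftcontents>
</dmftdoc>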

Enabling indexing for an object type


Events in dmi_registry for the user dm_fulltext_index_user generate queue items for indexing. The
following events are registered for dm_fulltext_index_user to generate indexing events by default:
dm_sysobject: dm_save, dm_checkin, dm_destroy, dm_saveasnew, dm_move_content
dm_acl: dm_save, dm_destroy, dm_saveasnew
dm_group: dm_save, dm_destroy

Registering a type for full-text indexing


Use Documentum Administrator to change the full-text registration for an object type. Select the type,
view the properties, and for the property Enable indexing check Register for indexing. To change
specific events that are registered for full-text, use the DFC API registerEvent().
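The following Java sketch shows one possible way to register events programmatically. It is a hedged
example only: it assumes that type-level full-text registration is performed by registering events on the
type's dm_type object while connected as dm_fulltext_index_user, and the type name my_custom_type
is hypothetical. Verify the registerEvent() usage in the DFC javadocs before relying on it.
import com.documentum.fc.client.IDfPersistentObject;
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.client.IDfSessionManager;
import com.documentum.fc.common.DfException;

// Hedged sketch: registers indexing events on a type while connected as the full-text user.
public static void registerTypeForIndexing(IDfSessionManager sessionManager, String repository)
        throws DfException {
    IDfSession session = sessionManager.getSession(repository); // session for dm_fulltext_index_user (assumption)
    try {
        IDfPersistentObject typeObj =
            session.getObjectByQualification("dm_type where name='my_custom_type'"); // hypothetical type
        typeObj.registerEvent("", "dm_save", 0, false);    // message, event, priority, sendMail
        typeObj.registerEvent("", "dm_checkin", 0, false);
        typeObj.registerEvent("", "dm_destroy", 0, false);
    } finally {
        sessionManager.release(session);
    }
}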

Reindexing
The index agent does not recreate all the queue items for reindexing. Instead, it creates a watermark
queue item (type dm_ftwatermark) to indicate the progress of reindexing. It picks up all the objects for
indexing in batches by running a query. The index agent updates the watermark as it completes each
batch. When the reindexing is completed, the watermark queue item is updated to done status.
You can submit for reindexing one or all documents that failed indexing. In Documentum
Administrator, open Indexing Management > Index Queue. Choose Tools > Resubmit all failed
queue items, or select a queue item and choose Tools > Resubmit queue item.

How Content Server documents are queried


Several software components control full-text search using the xPlore server:
The Content Server queries the full-text indexes and returns query results to client applications.
The xPlore server responds to full-text queries from Content Server.

Path of a query from a Documentum client to xPlore


1. The client application submits a DQL query or XQuery to the Documentum Content Server. (If
the client application uses DFC 6.6 or higher to create the query, DFC translates the query into
XQuery syntax.)
2. The Server transmits the query to xPlore (Content Server 6.6 or higher). The query plugin translates
the query into XQuery syntax.
3. The query plugin transmits batches of HTTP messages containing XQuery statements to the xPlore
search service.
4. CPS tokenizes the query based on the locale declared in the query. xDB breaks the query into
XQuery clauses for full-text (using ftcontains) and metadata (using value constraints); a sketch
follows these steps. The query is executed in the Lucene index against all collections unless a
collection is specified in the query.
5. xDB applies the xPlore security filter to evaluate the security of the search results. If Documentum
security evaluation is enabled, then security evaluation is done by the Content Server.
6. The results are returned in batches, with summary and facets.
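Conceptually, the XQuery that reaches xDB combines a full-text predicate (ftcontains) with value
constraints on the dftxml paths shown earlier in this chapter. The following sketch is illustrative only:
the collection path, the attribute names, and the literal values are assumptions, and the generated query
also contains clauses (such as security filtering) that are omitted here.
(: illustrative collection path; the actual library path is created by xPlore :)
for $doc in collection('TechPubsGlobal/dsearch/Data')/dmftdoc
  [dmftmetadata//object_name ftcontains "foo"]
  [dmftmetadata//r_creator_name = "jsmith"]
return $doc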

Chapter 2
Managing the System
This chapter contains the following topics:

Opening xPlore administrator

Starting and stopping the system

Viewing and configuring global operations (all instances)

Managing instances

Configuring an instance

Managing the watchdog service

Configuring the watchdog service

Configuring disk space monitoring

Configuring system metrics

Managing the status database

Configuring the audit record

Troubleshooting system problems

Modifying indexserverconfig.xml

Customizations in indexserverconfig.xml

Tasks performed outside xPlore administrator

Administration APIs

Opening xPlore administrator


Most system administration tasks are available in xPlore administrator. When you open xPlore
administrator, you see the navigation tree and the system overview page. From the tree, you can open
administration pages for system-wide services, instance-specific services, data management, and
diagnostics and troubleshooting. From the system overview, click the grid symbol to see the following
information for each xPlore instance: component status, instance description, instance status, JVM
information, and runtime information.
When you open a service page, such as indexing service, the actions apply to all indexing services in
the xPlore installation. To change the indexing service configuration for a specific instance, open the
instance in the navigation tree and then choose the service.
1. Open your web browser and enter one of the following:
http://host:port/dsearchadmin
https://host:port/dsearchadmin

host: DNS name of the computer on which the xPlore primary instance is installed.
port: xPlore primary instance port (default: 9300).
Log in as the Administrator with the password that you entered when you installed the primary
instance. The xPlore administrator home page displays a navigation tree in the left pane and links
to the four management areas in the content pane.
2. Click System Overview in the left tree to get the status of each xPlore instance. Click Global
Configuration to configure system-wide settings.
If you are unable to open the xPlore administrator URL, see the error message Not configured, and
the index agent reports the error Not responding, a firewall is likely preventing access. For
information on changing the administrator password, see Changing the administrator password, page
53.

Starting and stopping the system


If you run a stop script, run as the same administrator user who started the instance. If you are stopping
the primary instance, stop all other instances first.
1. Start or stop secondary instances in xPlore administrator. Navigate to the instance in the tree
and choose Stop instance or Start instance.
2. Start or stop the primary instance using the start or stop script in xplore_home/jboss5.1.0/server.
When the JBoss java process has terminated, you see the following in dsearch.log:
<event timestamp="2012-06-11 23:34:39,066" thread="JBoss Shutdown Hook">
The XML database server is shutdown successfully.</event>
<event timestamp="2012-06-11 23:34:42,300" thread="JBoss Shutdown Hook">
The DSS instance PrimaryDsearch is shut down.</event>

If you did not stop secondary instances, they report a failed connection to the primary instance when
you restart it.

Viewing and configuring global operations (all instances)
A single xPlore federation is a set of instances with a single primary instance and optional secondary
instances.
1. In xPlore administrator, select System Overview in the left panel.
2. Click Home > System overview.
3. Select a service to view the status of each instance of the service.
4. Choose Global Configuration to configure the following system-wide settings:
Storage Location. See Creating a collection storage location, page 164.
Index Service. See Document processing and indexing service configuration parameters,
page 339.
Search Service. See Search service configuration parameters, page 343.
Logging. See Configuring logging, page 297.
Engine. Configure incremental backups. See Backup in xPlore administrator, page 179.
Auditing. See Auditing queries, page 244, Troubleshooting data management, page 167, and
Configuring the audit record, page 39.

Managing instances
Configuring an instance
You can configure the indexing service, search service, or content processing service for a secondary
instance. Select the instance in xPlore administrator and then click Stop Instance.

Requirements
All instances in an xPlore deployment must have their host clocks synchronized to the primary
xPlore instance host.

Configuring the primary instance


You can set the following attributes on the primary instance element (node) in indexserverconfig.xml.
For information on viewing and updating this file, see Modifying indexserverconfig.xml, page 43.
xdb-listener-port: By default, the xDB listener port is set during xPlore installation.
<node name="primary" hostname="myserver" xdb-listener-port="9330">
...
</node>

primaryNode attribute: Set to true.


admin-rmi-port: Specify the port at which other instances connect to xPlore administrator. By
default, this value is set to the port number of the JBoss connector + 31. Default: 9331
url: Specify the URL of the primary instance, used to set connections from additional instances.
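Putting the attributes above together, a primary node element might look like the following sketch.
The url value (including its port and path) is an illustrative assumption; keep the value written by
the installer.
<node name="primary" hostname="myserver" primaryNode="true"
xdb-listener-port="9330" admin-rmi-port="9331"
url="http://myserver:9300/dsearch">
...
</node>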

Changing the host name and URL


Stop all xPlore instances.
1. Edit indexserverconfig.xml in xplore_home/config. On the node element, change the values of the
hostname and url attributes to the new host name and URL.
2. Primary instance only: Use iAPI to change the parameters for the host name and port in the
dm_ftengine_config object. This change takes effect when you restart the repository.
a. Do the following iAPI command:
retrieve,c,dm_ftengine_config

b. Use the object ID returned by the previous step to get the parameters and values and their
index positions. For example:
?,c,select param_name, param_value from dm_ftengine_config where
r_object_id=080a0d6880000d0d

c. Enter your new port at the SET command line. If the port was returned as the second parameter,
set the index to 2 as shown in the following example:
set,c,l,param_value[2]
SET>new_port
save,c,l

d. Enter your new host name at the SET command line. For example:
retrieve,c,dm_ftengine_config
set,c,l,param_value[3]
SET>new_hostname
save,c,l

3. Restart the Content Server and xPlore instances.

Replacing a failed instance with a spare


You can install a spare instance using the xPlore installer. When you install a spare instance, the data,
index, and log directories must all be accessible to the primary instance. Use shared storage for the
spare. When you activate the spare to take over a failed instance, xPlore recovers failed data using the
transaction log.
You cannot change an active instance into a spare instance.
Use xPlore administrator to activate a spare to replace a failed or stopped secondary instance. If you
are replacing a primary instance, see Replacing a failed primary instance, page 35.
1. Stop the failed instance.
2. Open xPlore administrator and verify that the spare instance is running.
3. Select the spare instance. Click Activate Spare Instance.
4. Choose the instance to replace.
When xPlore administrator reports success, the spare instance is renamed in the UI with the replaced
instance name. When you activate a spare to replace another instance, the spare takes on the
identity of the old instance. For example, if you activated DSearchSpare to replace DSearchNode3,
the spare instance becomes DSearchNode3. The old instance can no longer be used for ingestion or
queries. The failed instance is renamed with Failed appended, for example, Instance2Failed.
The activated instance is not registered with the watchdog service. To register it for watchdog
notifications, edit the configuration file dsearch-watchdog-config.xml. This file is located in
xplore_home/watchdog/config.
1. Copy and paste an existing watchdog-config element for an active instance.
2. Edit the following properties for the activated spare in your copied watchdog-config element (a
sketch follows these steps):
watchdog-config[@host-name]
watchdog-config/application-config[@instance-name]
watchdog-config/application-config/properties/property[
@name="application_url" value]
watchdog-config/application-config/tasks/task[@category="process-control" id]

3. Restart the watchdog service or run the Linux watchdog script.
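The following is a hedged sketch of such a copied watchdog-config element. The element nesting and
attribute names are inferred from the property paths listed above; copy an existing entry from your own
dsearch-watchdog-config.xml as the authoritative template, and treat the host name, URL, and task id
values here as placeholders.
<!-- placeholder values; element layout inferred from the property paths above -->
<watchdog-config host-name="newhost.example.com">
<application-config instance-name="DSearchNode3">
<properties>
<property name="application_url" value="http://newhost.example.com:9300/dsearch"/>
</properties>
<tasks>
<task category="process-control" id="DSearchNode3_ProcessControl"/>
</tasks>
</application-config>
</watchdog-config>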

For information on changing a failed instance to spare, see Changing a failed instance into a spare,
page 37.

Replacing a failed primary instance


1. Shut down all xPlore instances. The shutdown scripts are located in xplore_home/jboss5.1.0/server.
(On Windows, each instance is installed as an automatic service.) If you run a stop script, run as the
same administrator user who started the instance.
2. Edit indexserverconfig.xml, which is located in xplore_home/config.
Note: Do not change the value of appserver-instance-name.
3. Locate the node element for the old primary instance. Note the name of the node for the next step.
Delete this node element.
4. Locate the spare node element in indexserverconfig.xml. (The status attribute is set to spare.)
a. Set the status to normal.
b. Change the value of the primaryNode attribute to true.
c. Change the value of the name attribute to the name of your previous primary instance, for
example, PrimaryDsearch. Use the previous primary instance name.
5. Validate your changes using the validation tool described in Modifying indexserverconfig.xml,
page 43.
6. Edit indexserver-bootstrap.properties in the web application for the new primary instance, for example,
xplore_home/jboss5.1.0/server/DctmServer_Spare/deploy/dsearch.war/WEB-INF/classes.
a. Change the value of the node-name property to PrimaryDsearch.
b. Change the value of the isPrimary property to true.
c. Change the value of xhive-connection-string to match the host of your new primary instance,
for example:
xhive-connection-string=xhive\://10.32.168.105\:9432

d. Edit indexserver-bootstrap.properties in all other xPlore instances to reference the new primary
instance.
7. Edit xdb.properties in the directory WEB-INF/classes of the new primary instance.
a. Find the XHIVE_BOOTSTRAP entry and edit the URL to reflect the new primary instance
host name and port. (This bootstrap file is not the same as the indexserver bootstrap file.)
b. Change the host name to match your new primary instance host.
c. Change the port to match the port for the value of the attribute xdb-listener-port on the new
instance.
For example:
XHIVE_BOOTSTRAP=xhive://NewHost:9330

d. Edit xDB.properties in all other xPlore instances to reference the new primary instance.
8. Update xdb.bat in xplore_home/dsearch/xhive/admin. Your new values must match the values in
indexserverconfig.xml for the new primary instance.
Change the path for XHIVE_HOME to the path to the new primary instance web application.
Change ESS_HOST to the new host name.
Change ESS_PORT to match the value of the port in the url attribute of the new primary
instance (in indexserverconfig.xml).
9. Start the xPlore primary instance, then start the secondary instances.
10. Update the index agent.
a. Shut down the index agent instance and modify indexagent.xml in
xplore_home/jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.
b. Change parameter values for parameters that are defined in the element
indexer_plugin_config/generic_indexer/parameter_list/parameter.
Change the parameter_value of the parameter dsearch_qrserver_host to the new host name.
Change the parameter_value of the parameter dsearch_qrserver_port to the new port.
11. Update dm_ftengine_config on the Content Server. Use iAPI to change the parameters for the
host name and port in the dm_ftengine_config object. This change takes effect when you restart
the repository.
a. To find the port and host parameter index values for the next step, do the following iAPI
command:
retrieve,c,dm_ftengine_config

b. Use the object ID to get the parameters and values and their index positions. For example:
?,c,select param_name, param_value from dm_ftengine_config where
r_object_id=080a0d6880000d0d

c. To set the port, enter your new port at the SET command line. If the port was returned as the
third parameter in step 3, substitute 3 for the parameter index. For example:
retrieve,c,dm_ftengine_config
set,c,l,param_value[3]
SET>new_port
save,c,l

d. To set the host name, enter your new host name at the SET command line:
retrieve,c,dm_ftengine_config
set,c,l,param_value[4]
SET>new_hostname
save,c,l

12. Back up the federation.


Troubleshooting primary failover
If you did not configure xPlore administrator when you set up the spare, extract dsearchadmin.war
from xplore_home/setup/dsearch into your spare instance jboss5.1.0/deploy directory. Update the
path to dsearchadminweb.log in logback.xml of the spare instance. Specify a path to the logs
directory in the primary instance JBoss application on the spare instance.
A startup error FEDERATION_ALREADY_OPEN can be encountered when the old primary
instance has not fully terminated before you replace it. The best way to confirm instance shutdown
on Windows is to set the Windows xPlore service to manual. Use the JBoss script to stop the
instance. If you must use the Windows service, check dsearch.log on the old primary instance to find
The DSS instance PrimaryDsearch is shut down. Then configure the new primary instance.

Changing a failed instance into a spare


Because the identity of a failed instance is assigned to another instance, the identity of the failed
instance must be changed.
1. Open indexserverconfig.xml in xplore_home/config.
a. Change the node element name attribute of the failed instance to a new unique name.
b. Change the node element status attribute to spare.
2. Modify indexserver-bootstrap.properties in the WEB-INF/classes directory of the application server
instance, for example:
xplore_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/classes
Change the node-name key value to the one you set in indexserverconfig.xml.
3. Restart the primary and then the secondary instances.
4. Back up the xPlore federation.

Managing the watchdog service


Starting and stopping the watchdog service
The xPlore watchdog service is a Windows service or daemon process that monitors and checks the
status of various processes in xPlore. One watchdog service is installed on each xPlore host. If a host
has multiple xPlore instances, the watchdog service can monitor all instances. If a process such
as the indexing or search service fails, the watchdog service detects the failure and sends an email
notification to the administrator.
When an xPlore instance or index agent instance is deleted, the watchdog configuration file is updated
to remove that instance. If you activate a spare xPlore instance, you must manually configure the
watchdog service as described in Replacing a failed instance with a spare, page 34.
On Windows hosts, the watchdog process starts at xPlore installation and when the host is booted up.
On Linux hosts, you must start the watchdog process manually. It runs as a standalone Java process.
1. To turn off the watchdog service:
On Windows hosts, stop the watchdog service: Documentum Search Services Watchdog.
On Linux hosts, run the script stopWatchdog.sh in xplore_home/watchdog. If you run a stop
script, run as the same administrator user who started the instance.
2. To restart the watchdog service:
On Windows hosts, start the watchdog service: Documentum Search Services Watchdog.
On Linux hosts, run the script startWatchdog.sh in xplore_home/watchdog.

Configuring the watchdog service


To configure watchdog timing, edit the configuration file dsearch-watchdog-config.xml. This file
is located in xplore_home/watchdog/config. The following timing properties within the timing-info
element can be configured (a sketch follows the list):
recurrence timeunit and frequency: Specifies how often the task is executed. For example, the
disk space task with a frequency of 2 and time unit of hours checks disk space every two hours.
Default: Every minute.
start-date: date and time the task should be invoked, in UTC format. If the date is in the past,
the task will be executed as soon as possible.
expiry-date: Specifies the date and time a task stops executing, in UTC format.
max-response-timeout: Specifies how long to wait between detection of a hung task and execution of
the notification (or other task). For example, a wait-time value of 6 and a time unit of hours indicates
a wait of 6 hours before notification about a non-responding instance.
max-retry-threshold: Specifies the maximum number of times the task can be retried. For example,
if the task is notification, a value of 10 indicates the notification task is retried 10 times. Recurring
tasks are retried at the next scheduled invocation time.
max-iterations: Maximum number of times to attempt to ping an instance that has no response.
Default: -1 (no limit)
You can also configure the timing properties for the index agent. If you change
the installation owner password, modify the property docbase_password in
dsearch-watchdog-config.xml with the new encrypted password. To encrypt the password, run
xplore_home/watchdog/tools/encrypt-password.bat|sh.
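As a rough illustration of the timing properties above (the element and attribute layout is an
assumption; copy the structure from an existing task entry in your dsearch-watchdog-config.xml
rather than from this sketch):
<timing-info>
<!-- attribute names are assumptions inferred from the descriptions above -->
<recurrence timeunit="minutes" frequency="1"/>
<start-date>2012-06-01T00:00:00Z</start-date>
<max-response-timeout timeunit="hours" wait-time="6"/>
<max-retry-threshold>10</max-retry-threshold>
<max-iterations>-1</max-iterations>
</timing-info>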

Configuring disk space monitoring


The watchdog can detect out of space issues in xPlore instances. The watchdog monitors storage
locations and the config and log directories. When the limit of available space is reached, the watchdog
sends a notification to the administrator.
To configure watchdog disk space monitoring, edit the configuration file dsearch-watchdog-config.xml.
This file is located in xplore_home/watchdog/config. The task id is similar to
PrimaryDsearch_DiskFreeSpaceMonitor. The ID changes depending on the ID of the xPlore instance.
The following properties can be configured. All but the first property are configured within the
timing-info element.
percent_available_space_to_take_action: Value attribute: Percentage of free disk space. Default: 30.
recurrence timeunit and frequency: Specifies how often disk space is checked. For example, a
frequency of 2 and time unit of hours checks every two hours. Default: 120 minutes (2 hours).
start-date: date and time the task should be invoked, in UTC format. If the date is in the past,
the task will be executed as soon as possible.
expiry-date: Specifies the date and time a task stops executing, in UTC format.
max-response-timeout: Specifies how long to wait between detection of a hung task and execution of
the notification (or other task). For example, a wait-time value of 6 and a time unit of hours indicates
a wait of 6 hours before notification about a non-responding instance.
max-retry-threshold: Specifies the maximum number of times the task can be retried before the next
scheduled invocation time.
max-iterations: Not applicable to this task. Default: -1 (no limit)
When the data in a remote shared environment is referenced by a local symbolic link in Windows, the
watchdog cannot monitor the disk space of the remote environment.

Configuring system metrics


Configure system metrics in indexserverconfig.xml. For information on viewing and updating this file,
see Modifying indexserverconfig.xml, page 43. For information on the settings for indexing and CPS
metrics, see Document processing and indexing service configuration parameters, page 339.
By default, system metrics are saved in batches of 100 every 5 seconds. To change these values,
add the following line to the system-metrics-service element. The wait-timeout unit is seconds. For
example, if wait-timeout is set to 10, the latest metrics are available about 10 seconds later (average 5
seconds). The batch size determines how many metrics are accumulated before they are saved to the
system metrics database in xDB. If batch size is reached before timeout, the batch is recorded.
<persistence-service batch-size="100" wait-timeout="10"/>

Managing the status database


The status database records the indexing status of ingested documents (success or failure).
1. Control how much data is cached before the status DB is updated. Open indexserverconfig.xml and
set the statusdb-cache-size property for each instance.
In the following example, the cache size is set to 1000 instead of the default 10000 bytes:
<node ...>
<properties>
<property value="1000" name="statusdb-cache-size"/>
</properties>

For information on viewing and updating this file, see Modifying indexserverconfig.xml, page 43.
2. Conserve disk space on the primary host: Purge the status database when the xPlore
primary instance starts up. Set the value of the purge-statusdb-on-startup attribute on the
index-server-configuration element to true.
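For example, a hedged fragment (any existing attributes on the index-server-configuration element
must be kept; only the purge attribute is shown here):
<index-server-configuration purge-statusdb-on-startup="true">
...
</index-server-configuration>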

Configuring the audit record


In most cases, you do not need to modify the default configuration in indexserverconfig.xml. For
information on viewing and updating this file, see Modifying indexserverconfig.xml, page 43.
Records over 30 days old are automatically purged. You can configure the purge schedule and the
following elements (a sketch follows the list):
auditing/location element: Specifies a storage path for the audit record. Attributes: name, path,
size-limit. Size limit units: K | M | G | T (KB, MB, GB, TB). Default: 2G.
audit-config element: Configures auditing. Attributes: component, status, format, location.
Note: Changes to format and location are not supported in this release.
properties/property element:
audit-save-batch-size: Specifies how many records are batched before a save. Default: 100.
lifespan-in-days: Specifies time period before audit record is purged. Default: 30
preferred-purge-time: Specifies the time of day at which the audit record is purged. Format:
hours:minutes:seconds in 24-hour time. Default: midnight (00:00:00)
audit-file-size-limit: Size limit units: K | M | G | T (KB, MB, GB, TB).
audit-file-rotate-frequency: Period in hours that a file serves as the active storage for the audit
records.
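The following fragment is a hedged illustration assembled from the element and property names above.
The attribute values (such as the component and status values) and the exact nesting are assumptions;
use the default auditing block in indexserverconfig.xml as the authoritative template.
<auditing>
<!-- attribute values are illustrative assumptions -->
<location name="default" path="C:/xPlore/audit" size-limit="2G"/>
<audit-config component="search" status="on" format="xml" location="default"/>
<properties>
<property name="audit-save-batch-size" value="100"/>
<property name="lifespan-in-days" value="30"/>
<property name="preferred-purge-time" value="00:00:00"/>
</properties>
</auditing>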

Viewing the audit record


The entire audit record can be large, and viewing can cause an out of memory error. Use reports
to query audit records. Reports allow you to specify a time period and other report criteria. You
can customize reports to query the audit record.
To view the entire audit record, drill down to the AuditDB collection in Data Management >
SystemData. Click AuditDB and then click auditRecords.xml. An audit record has the following
format in XML:
<event name="event_name" component="component_name" timestamp="time">
<element-name>value</element-name>
...
</event>

Troubleshooting system problems


For troubleshooting installation, see EMC Documentum xPlore Installation Guide.

Checking the installation versions


All xPlore instances and index agents should have the same installation version. You can check the
version in the version.properties file in xplore_home/installinfo.

Connection refused
Indexing fails when one of the xPlore instances is down. The error in dsearch.log is like the following:
CONNECTION_FAILED: Connect to server at 10.32.112.235:9330 failed,
Original message:
Connection refused

Check the following causes for this issue:


Instance is stopped: A query that hits a document in a collection bound to the stopped instance
fails. For example, you create collection 2 and upload a file to it. You set the mode for collection 2
to search_only and bind the collection to instance 2 and instance 3. Stop instance 2 and query for
a term in the uploaded file.
The xPlore host name has changed: If you have to change the xPlore host name, do the following:
1. Update indexserverconfig.xml with the new value of the URL attribute on the node element. For
information on viewing and updating this file, see Modifying indexserverconfig.xml, page 43.
2. Change the JBoss startup (script or service) so that it starts correctly. If you run a stop script,
run as the same administrator user who started the instance.

Unable to start JBoss


If the file system read/write permissions have changed, you see an error like the following:
ERROR setFile(null,false) call failed.
java.io.FileNotFoundException: ...boot.log (Read-only file system)

Timing problems: Login ticket expired


All instances in an xPlore deployment must have their host clocks synchronized to the primary xPlore
instance host. Shut down all xPlore instances, synchronize clocks, and restart.

Which indexing engine are you using?


xPlore is represented in the repository by the ft engine config object (dm_ftengine_config).
There is one instance of the ft engine config object for each full-text index object. If there is no
dm_ftengine_config object, then full-text is not enabled in the Content Server. When you configure
an index agent for a repository, full-text is automatically enabled.
Verify full-text enabled
Use the following iAPI command to get the object ID of the indexing engine:
retrieve,c,dm_ftengine_config

The dm_fulltext_index object attribute is_standby must be set to false (0). Substitute your object ID:
retrieve,c,dm_fulltext_index
3b0012a780000100
?,c,select is_standby from dm_fulltext_index where r_object_id=3b0012a780000100
0

Verify which engine is used


To verify that the xPlore engine in Content Server is being used, get the object_name attribute of the
dm_ftengine_config object after you have retrieved the object. Substitute the object ID in the following
command. DSearch Fulltext Engine Configuration is returned for xPlore. The FAST configuration
object name contains FAST:
retrieve,c,dm_ftengine_config
080012a780002900
?,c,select object_name from dm_ftengine_config where r_object_id=080012a780002900
DSearch Fulltext Engine Configuration

I/O errors, or no such file or directory


The following causes can result in this error:
Multiple instances of xPlore: Storage areas must be accessible from all other instances. If not, you
see an I/O error when you try to create a collection. Use the following cleanup procedure:
1. Shut down all xPlore instances.
2. Edit xhivedatabase.bootstrap in xplore_home/config. Change the binding node value to primary
for segments that have this problem.
3. Edit indexserverconfig.xml to remove binding elements from the collection that has the issue.
For information on viewing and updating this file, see Modifying indexserverconfig.xml, page 43.
4. Restart xPlore instances.

I/O error indexing a large collection: Switch to the 64-bit version of xPlore and use 4+ GB of
memory when a single collection has more than 5 million documents.
I/O error during index merge: Documents are added to small Lucene indexes within a single
collection. These indexes are merged into a larger final index to help query response time. The final
merge stage can require large amounts of memory. If memory is insufficient, the merge process
fails and corrupts the index. Switch to the 64-bit version of xPlore and allocate 4 GB of memory
or more to the JVM.
com.xhive.error.XhiveException: IO_ERROR:
Failure while merging external indexes, Original message:
Insufficient system resources exist to complete the requested service

To fix a corrupted index, see Repairing a corrupted index, page 175. To delete a corrupted domain, see
Delete a corrupted domain, page 157.

High-volume/low memory errors


Application server out of memory
java.lang.OutOfMemoryError: GC overhead limit exceeded. Increase the default and maximum
JVM heap size.
1. Stop the application server.
2. Edit the script that launches an xPlore server or index agent, located in
xplore_home/jboss5.1.0/server.
3. Increase the values of -Xms and -Xmx, save, and restart the application server (a sketch follows
this section).
4. Windows: Each instance is installed as an automatic service. Stop the service, edit the launch
script, and restart the service.
Virtual environment: Virtual environments are sometimes under powered or other applications in
the environment are overusing the resources. You can move to a less-used virtual host or add more
memory or cores to the virtual host.
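For example, on a Windows host the heap settings in the launch script might be raised as in the
following hedged fragment; the variable name and existing options vary by instance, so adjust the
line that is already in your script rather than adding a duplicate.
rem Hedged example: raise minimum and maximum heap to 4 GB (64-bit JVM assumed)
set "JAVA_OPTS=%JAVA_OPTS% -Xms4096m -Xmx4096m"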

Error on startup
Non-ASCII characters in indexserverconfig.xml can cause startup to fail.
If you edit indexserverconfig.xml using a simple text editor like Notepad, non-ASCII characters are
saved in native (OS) encoding. For example, Windows uses ISO-8859-1. xPlore uses UTF-8
encoding, which results in unexpected text errors.
Use an XML editor to edit the file, and validate your changes using the xplore.bat (Windows) or
xplore.sh (Linux) script in xplore_home/dsearch/xhive/admin. Restart the xPlore instances.

Cannot change binding on a stopped instance


If an xPlore instance is stopped or crashed, you cannot change the binding of collections on that
instance to another instance. Do one of the following:
Restart before changing the binding.
If the instance has crashed, reboot the server before changing binding. If you restart the application
server without reboot, updated pages are not written to the disk.

Timeout error of an indexing request


When an index request times out, xPlore cancels it. The error in dsearch.log is like the following:
operation-id of r_object_id failed to be processed in 3600000 ms,
and will be cancelled.

If the indexing is very slow, you can modify the timeout for a request.
The index-request-time-out property specifies in milliseconds how much time is allowed before an
index request times out. The default value is 3600000 (1 hour).
Normally, you do not need to add this parameter to indexserverconfig.xml; do this only in one of the
following situations:
You want to import a very large thesaurus whose estimated processing time may exceed one hour.
Set its value to a time sufficient for the thesaurus import to complete successfully.
You set the value of Index Agent timeout setting to longer than one hour. Set index-request-time-out
to the same value specified for the Index Agent timeout setting for the latter to take effect; otherwise,
index-request-time-out overrides the Index Agent setting.
To set the index-request-time-out parameter, add it to indexserverconfig.xml under
index-config/properties element and specify its value.
Remove this parameter from indexserverconfig.xml to use its default value. Here is an example
for a two-hours setting:
<property name="index-request-time-out" value="7200000"/>

Modifying indexserverconfig.xml
Some tasks are not available in xPlore administrator. These rarely needed tasks require manual editing
of indexserverconfig.xml. This file is located in xplore_home/config on the primary instance. It is
loaded into xPlore memory during the bootstrap process, and it is maintained in parallel as a versioned
file in xDB. All changes to the file are saved into the xDB file at xPlore startup.
On Windows 2008, you cannot save the file with the same name, and the extension is not shown. By
default, when you save the file, it is given a .txt extension. Be sure to replace indexserverconfig.xml
with a file of the same name and extension.
Note: Do not edit this file in xDB, because the changes are not synchronized with xPlore.
1. Stop all instances in the xPlore federation.


2. Make your changes to indexserverconfig.xml on the primary instance using an XML editor.
Changes must be encoded in UTF-8. Do not use a simple text editor such as Notepad, which can
insert characters using the native OS encoding and cause validation to fail.
3. Set the configuration change check interval as the value of config-check-interval in milliseconds
on the root index-server-configuration element. The system will check for configuration changes
after this interval.
4. Validate your changes using the CLI validateConfigFile. From the command line, type the
following. Substitute your path to indexserverconfig.xml using a forward slash. Syntax:
xplore validateConfigFile path_to_config_file

For example:
xplore validateConfigFile "C:/xPlore/config/indexserverconfig.xml"

5. Back up the xPlore federation after you change this file.


6. Restart the xPlore system to see changes. The configuration file on the file system is compared on
startup to the one in database even if the revision number is the same.
When indexserverconfig.xml is malformed, xPlore does not start.
Troubleshooting
You can view the content of each version using the xhadmin tool (see Debugging queries, page 259).
Drill down to the dsearchConfig library, click a version, and then click Text:
Figure 6 XML content in xDB admin

Customizations in indexserverconfig.xml
Define and configure indexes for facets.
Add and configure categories: Specify the XML elements that have text extraction, tokenization,
and storage of tokens. Specify the indexes that are defined on the category and the XML elements
that are not indexed. Change the collection for a category.
Configure system, indexing, and search metrics.
Conserve disk space by purging the status database on startup.
Specify a custom routing-class for user-defined domains.
Change the xDB listener port and admin RMI port.
Turn off lemmatization.
Lemmatize specific categories or element content.
Configure indexing depth (leaf node).
Change the xPlore host name and URL.
Boost metadata and freshness in results scores.
Add or change special characters for CPS processing.
Trace specific classes. See Tracing, page 304.
Set the security filter batch size and the user and group cache size.

Tasks performed outside xPlore administrator


Some xPlore administration tasks can be performed in xPlore administrator as well as in
indexserverconfig.xml or xDB. Use xPlore administrator for those tasks. (The following tasks are not
common.)
Table 3 Tasks outside xPlore administrator
Define/change a category of documents (see Configuring categories, page 158)
Define a subpath for facets (see Configuring facets in xPlore, page 274)
Disable system, index, search metrics (see Configuring system metrics, page 39)
Purge the status DB: Set the value of the purge-statusdb-on-startup attribute on
index-server-configuration to true.
Register custom routing class (see EMC Documentum Search Development Guide)
Change primary instance or xPlore host (see Replacing a failed primary instance, page 35)
Configure lemmatization (see Configuring query lemmatization, page 212)
Configure indexing depth (see Configuring text extraction, page 137)
Boost metadata and freshness (see Configuring scoring and freshness, page 201)
Change special characters list (see Handling special characters, page 108)
Add a custom dictionary (see Adding dictionaries to CPS, page 120)
Configure Documentum security properties (see Changing search results security, page 51)
Trace specific classes (see Tracing, page 304)

Index agent tasks in the Documentum environment


The following index agent tasks are performed outside the index agent UI:
Limit content size for indexing (Maximum document and text size, page 98).
Exclude ACL and group attributes from indexing (Configuring the index agent after installation,
page 66).
Map file stores in shared directories (Sharing content storage, page 70).
Install an additional index agent for ACLs and groups (Setting up index agents for ACLs and
groups, page 65).
Map partitions to specific collections (Mapping Server storage areas to collections, page 71).
Verify index agent migration (Using ftintegrity, page 73).
Customize indexing and query routing, filter object types, and inject metadata (see Route a
document to a collection, page 150, Configuring index agent filters, page 69, and Injecting data
and supporting joins, page 80).

Search configuration tasks in the Documentum environment
The following search configuration tasks are performed outside xPlore administrator.
Turn off xPlore native security (see Changing search results security, page 51).
Make types and attributes searchable (Making types and attributes searchable, page 223).
Turn off XQuery generation to support certain DQL operations (DQL, DFC, and DFS queries,
page 224).
Configure search for word fragments and wildcards (Configuring wildcards and fragment search,
page 218).
Route a query to a specific collection (Routing a query to a specific collection, page 257).
Turn on tracing for the Documentum query plugin (see Tracing Documentum queries, page 227).
Customize facets and queries (see About Facets, page 273).

Administration APIs
The xPlore Admin API supports all xPlore administrative functions. The Admin API provides you
with full control of xPlore and its components.
Note: Administration APIs are not supported in this release. The information is provided for planning
purposes.
Each API is described in the javadocs. Index service APIs are available in the interface IFtAdminIndex
in the package com.emc.documentum.core.fulltext.client.admin.api.interfaces. This package is in
the SDK jar file dsearchadmin-api.jar.
System administration APIs are available in the interface IFtAdminSystem in the package
com.emc.documentum.core.fulltext.client.admin.api.interfaces in the SDK jar file dsearchadmin-api.jar.
Administration APIs are wrapped in a command-line interface tool (CLI). The syntax and CLIs are
described in the chapter Automated Utilities (CLI).

Open an admin connection


Create a façade implementation via FtAdminFactory. The following example uses the web service
transport protocol to open a connection on the local xPlore instance. Parameters are String hostname
or IP address, int port, and String password. For example:
IFtAdminService srv = FTAdminFactory.getAdminService("127.0.0.1", 9300, "mypassword");

The password parameter is the xPlore administrator password.

Call an admin API


Invoke the admin API FTAdminFactory via the façade implementation in the package
com.emc.documentum.core.fulltext.client.admin.api. The following example starts the primary
instance:
IFtAdminService srv = FTAdminFactory.getAdminService("127.0.0.1", 9300, "password");
srv.startNode("primary");

Configuration APIs
Configuration APIs are available in the interface IFtAdminConfig in the package
com.emc.documentum.core.fulltext.client.admin.api.interfaces. This package is in the SDK jar file
dsearchadmin-api.jar.

Engine configuration keys for setEngineConfig


xhive-database-name: Managed by xDB.
xhive-cache-pages: Number of pages to hold temporary data for all simultaneous xDB sessions. By
default, xPlore sets the value to 25% of the maximum memory configured for the JVM.
xhive-pagesize: Size in bytes of database pages, a power of 2. Usually matches filesystem pagesize.

Indexing configuration keys for setIndexConfig


index-requests-max-size: Maximum size of internal index queue
index-requests-batch-size: Maximum number of index requests in a batch
index-thread-wait-time: Maximum wait time in milliseconds to accumulate requests in a batch
index-threadpool-core-size: Minimum number of threads in the thread pool to service requests.
Valid values: 1 - 100.
index-threadpool-max-size: Maximum number of threads allowed in the thread pool to service
requests. Valid values: 1 - 100.
index-executor-queue-size: Maximum size of index executor queue before spawning a new worker
thread
index-executor-retry-wait-time: Wait time in milliseconds after executor queue and worker thread
maximums have been reached

Status update keys


status-requests-max-size: Maximum size of internal status queue
status-requests-batch-size: Maximum number of status update requests in a batch
status-thread-wait-time: Maximum wait time in milliseconds to accumulate requests in a batch
status-threadpool-core-size: Minimum number of threads used to process a single incoming request.
Valid values: 1 - 100.
status-threadpool-max-size: Number of threads used to process a single incoming request. Valid
values: 1 - 100.
status-executor-queue-size: Maximum size of index executor queue before spawning a new worker
thread
status-executor-retry-wait-time: Wait time in milliseconds after executor queue and worker thread
maximums have been reached
Search configuration keys for setGlobalSearchConfig


query-default-locale: Default locale for queries.
query-default-result-batch-size: Default size of result batches. Default: 200.
query-result-cache-size: Default size of results buffer. When this limit is reached, no more results
are fetched from xDB until the client asks for more results. Default: 400.
query-result-spool-location: Location to spool results. Default: $Documentum/dss/spool
query-default-timeout: Interval in milliseconds for a query to time out. Default: 60000. This
setting is overridden by a setting in a client application: IDfXQuery TIMEOUT parameter, or the
query_timeout parameter in the dm_ftengine_config.
query-threadpool-core-size: Minimum number of threads used to process incoming requests.
Threads are allocated at startup, and idle threads are removed down to this minimum number.
Valid values: 1 - 100. Default: 10.
query-threadpool-max-size: Maximum number of threads used to process incoming requests. After
this limit is reached, service is denied to additional requests. Valid values: 1 - 100. Default: 100.
query-threadpool-queue-size: Maximum size of thread pool queue before spawning a new worker
thread. Default: 0.
query-threadpool-keep-alive-time: Interval after which idle threads are terminated. Default: 60000
query-threadpool-keep-alive-time-unit: Unit of time for query-thread-pool-keep-alive-time.
Default: milliseconds
query-executor-retry-interval: Wait time in milliseconds after executor queue and worker thread
maximums have been reached. Default: 100.
query-executor-retry-limit: Number of times to retry query execution.
query-thread-sync-interval: Interval after which results fetching is suspended when the result cache
is full. For a value of 0, the thread waits indefinitely until space is available in the cache (freed up
when the client application retrieves results). Default: 100 milliseconds.
query-thread-max-idle-interval: Query thread is freed up for reuse after this interval. If the client
application has not retrieved the result during this interval, the query times out. (Threads are freed
immediately after a result is retrieved.) Default: 3600000 milliseconds.
query-summary-default-highlighter: Class that determines summary. Default:
com.emc.documentum.core.fulltext.indexserver.services.summary.DefaultSummary
query-summary-analysis-window: Size in bytes of initial window in which to search for query
terms for summary. Default: 65536.
query-summary-display-length: Size in bytes of summary to display.
query-summary-highlight-begin-tag: HTML tag to insert at beginning of summary.
query-summary-highlight-end-tag: HTML tag to insert at end of summary.
query-enable-dynamic-summary: If context is not important, set to false. This returns as a summary
the first n chars defined by the query-summary-display-length configuration parameter. No summary
calculation is performed. For summaries evaluated in context, set to true (default).
query-index-covering-values: The values specified in the path attribute are used for aggregate
queries. These values are pulled from the index and not from the data pages.
query-facet-max-result-size: Documentum only. Sets the maximum number of results used to
compute facet values. For example, if query-facet-max-result-size=12, only 12 results are used to
compute facets. If a query has many facets, the number of results per facet is reduced accordingly.

Chapter 3
Managing Security
This chapter contains the following topics:

About security

Changing search results security

Manually updating security

Changing the administrator password

Configuring the security cache

Troubleshooting security

About security
xPlore does not have a security subsystem. Anyone with access to the xPlore host port can connect
to it. You must secure the xPlore environment using network security components such as a firewall
and restriction of network access. Secure the xPlore administrator port and open it only to specific
client hosts.
Passwords are encrypted with a FIPS 140-2 validated encryption module using SHA1. Existing
passwords encrypted with MD5 are decrypted and re-encrypted using SHA1.
Documentum repository security is managed through individual and group permissions (ACLs). By
default, security is applied to results before they are returned to the Content Server (native xPlore
security), providing faster search results. xPlore security minimizes the result set that is returned
to the Content Server.
Content Server queues changes to ACLs and groups. The queue sometimes causes a delay between
changes in the Content Server and propagation of security to the search server. If the index agent has
not yet processed a document for indexing or updated changes to a permission set, users cannot
find the document.
You can set up a separate index agent to handle changes to ACLs and groups. See Setting up index
agents for ACLs and groups, page 65.

Changing search results security


For DQL queries, you can turn on Content Server security filtering to eliminate latency and support
complete transactional consistency.
Note: When XQuery generation is turned off, search performance is worse. The following search
enhancements do not work without XQuery: Facets, paging, and parallel queries.
To turn on security filtering in the Content Server for DQL queries:


1. Turn off XQuery generation. Add the following setting to dfc.properties on the DFC client
application:
dfc.search.xquery.generation.enable=false

2. Open the iAPI tool from the Documentum Server Manager on the Content Server host or in
Documentum Administrator.
3. To check your existing security mode, enter the following command:
retrieve,c,dm_ftengine_config
get,c,l,ftsearch_security_mode

4. Enter the following command to turn off xPlore native security. Note lowercase L in the set and
save commands:
retrieve,c,dm_ftengine_config
set,c,l,ftsearch_security_mode
0
save,c,l
reinit,c

5. Restart all xPlore instances.


Using both security filters
You can configure xPlore and the Content Server to use both security filters. This option does not
apply to the DFC search service; it applies to DQL queries only. With this option, results are not
returned to the Content Server unless the user has permissions. After xPlore security is applied,
the results are filtered again in the Content Server for changes to permissions that took place after
the security update in xPlore.
Make sure that xPlore security is configured (default). Then use the following iAPI command:
retrieve,c,dm_ftengine_config
append,c,l,param_name
acl_check_db
append,c,l,param_value
T
save,c,l

Manually updating security


When you set the index agent UI to migration mode, ACLs and groups are indexed at
the end of migration. ACL (dm_acl) and group (dm_group) objects are stored in XML
format in the xPlore xDB. The XML for ACLs and groups is stored as a collection in xDB:
domain_name/dsearch/ApplicationInfo/acl or /ApplicationInfo/group. The XML format is ACLXML
for ACLs and GroupXML for groups. They are updated when a Save, Save as new, or Destroy event
on an ACL or group takes place in the repository.
The security filter is applied in xDB to filter search results per batch. The security filter receives the
following information: User credentials, minimum permit level, whether MACL is enabled, privileges,
and dynamic state of the user (group membership).

You can manually populate or update the ACL and group information in xPlore. A similar job in
Content Server 6.7 and higher allows you to selectively replicate ACLs and groups. The script
replicates all ACLs and groups. Use the job or script for the following use cases:
You are testing Documentum indexing before migration.
You use xPlore to index a repository that has no full-text system (no migration).
Security in the index is out of sync with the repository, based on ftintegrity counts.
Note: To speed up security updates in the index, you can create a separate index agent for ACLs and
groups. See Setting up index agents for ACLs and groups, page 65.
1. Locate the script aclreplication_for_repositoryname.bat or .sh in
xplore_home/setup/indexagent/tools.
2. Edit the script before you run it. Locate the line beginning with "%JAVA_HOME%\bin\java". Set
the repository name, repository user, password, xPlore primary instance host, xPlore port, and
xPlore domain (optional).
Check the Java Method Server log and job report for any errors or exceptions thrown. When you run
the script, it prints the status of each object it tried to replicate.
Alternatively, you can run the ACL replication job dm_FTACLReplication in Documentum
Administrator. (Do not confuse this job with the dm_ACLReplication job.) By default, the job reports
only the number of objects replicated. Setting the job argument verbose to true writes the status of
each object in the job report. You can selectively replicate only dm_acl objects, only dm_group objects, or both.

Table 4    ACL replication job arguments

-acl_where_clause: DQL where clause to retrieve dm_acl objects.
-group_where_clause: DQL where clause to retrieve dm_group objects.
-max_object_count: Number of dm_acl and dm_group objects to be replicated. If not set, all objects are replicated.
-replication_option: Valid values: dm_acl, dm_group, or both (default).
-verbose: Set to true to record replication status for each object in the job report. Default: false.

Note: The arguments -dsearch_host, -dsearch_port, -dsearch_domain, and -ftengine_standby are not
supported in xPlore 1.3. The argument -ftengine_standby was used for dual mode (FAST and xPlore,
two Content Servers) which is not supported in xPlore 1.3.

Changing the administrator password


Perform the following steps to reset the xPlore administrator password. All xPlore and index agent
instances must share the same instance owner and password.
Note: If you change the Documentum repository owner password, you must also change it in the
watchdog configuration file.
1. Reset the password from the xDB admin tool.
a. Navigate to xplore_home/dsearch/xhive/admin.
b. Run the script XHAdmin.bat or XHAdmin.
c. Click the connection icon to log in. The password is the same as your xPlore administrator password.
d. In the Federation menu, choose Change superuser password. Enter the old and new passwords.
e. In the Database menu, choose Reset admin password. Use the new superuser password that you created, and set the admin password. They can be the same.
2. Stop all xPlore instances.
3. Edit indexserver-bootstrap.properties of each instance. The file is located in
xplore_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/classes.
Enter the adminuser-password and superuser-password that you created in the xDB admin tool.
4. Restart all xPlore instances. The passwords are now encrypted (FIPS-compliant encryption).
5. Change the JBoss password. Copy the new, encrypted password from
indexserver-bootstrap.properties to the following locations:
The file web-console-roles.properties in
xplore_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/management.bak/console-mgr.sar/web-console.war/WEB-INF/classes:
admin=new_password

The file PrimaryDsearch.properties in xplore_home/installinfo/instances/dsearch:
ess.instance.password.encrypted=new_password

6. Repeat step 5 for all other xPlore instances.


7. If you use CLI for backup and restore, edit the password in xplore.properties. This file is located in
xplore_home/dsearch/admin. Copy the encrypted password from indexserver-bootstrap.properties.
If you change the index agent installation owner password, on Windows you only need to update the
Windows services entry; on Linux, no change is required.

Configuring the security cache


Increase the cache size for large numbers of users or large numbers of groups that users can belong
to. Entries in the caches are replaced on a first-in, first-out (FIFO) basis. You set the following
properties for security:
groups-in-cache-size: Number of groups a user belongs to. Global LRU cache that is shared
between search sessions. Default: 1000.
not-in-groups-cache-size: Number of groups that a user does not belong to. Global LRU cache.
Default: 1000.
acl-cache-size: Number of users in the cache. Per-query LRU cache that contains ACLs and granted
permissions for users. Default: 400.
batch-size: Size of batches sent for search results security filtering. Default: 800.
max-tail-recursion-depth: Sets the maximum number of subgroup members of a group, to prevent
runaway security recursion. If you encounter the error "XQUERY_ERROR_VALUE: Tail recursive
function" you can edit the property to a value greater than 10000. Default: 10000.
1. Stop all xPlore instances
2. Edit indexserverconfig.xml. For information on viewing and updating this file, see Modifying
indexserverconfig.xml, page 43.
3. Change the size of a cache in the security-filter-class element:
<security-filter-class name="documentum" default="true"
    class-name="com.emc.documentum.core.fulltext.indexserver.services.security.SecurityJoin">
  <properties>
    <property name="groups-in-cache-size" value="1000"/>
    <property name="not-in-groups-cache-size" value="1000"/>
    <property name="acl-cache-size" value="400"/>
    <property name="batch-size" value="800"/>
    <property name="max-tail-recursion-depth" value="10000"/>
  </properties>
</security-filter-class>

4. If necessary, change the Groups-in cache cleanup interval by adding a property to the
security-filter-class properties. The default is 7200 sec (2 hours).
<property name="groupcache-clean-interval" value="7200">

5. Validate your changes using the validation tool described in Modifying indexserverconfig.xml,
page 43.

Troubleshooting security
Viewing security in the log
Check dsearch.log using xPlore administrator. Choose an instance and click Logging. Click dsearch
log to view the following information:
The XQuery expression. For example, search for the term default:
QueryID=PrimaryDsearch$f3087f7a-fb55-496a-bf0a-50fb1e688fa1,
query-locale=en,query-string=
declare option xhive:fts-analyzer-class
com.emc.documentum.core.fulltext.indexserver.core.index.xhive.IndexServerAnalyzer;
declare option xhive:ignore-empty-fulltext-clauses true;
declare option xhive:index-paths-values "dmftmetadata//owner_name,dmftsecurity/acl_name,dmftsecurity/acl_domain";
let $libs := collection(/TechPubsGlobal/dsearch/Data)
let $results := for $dm_doc score $s
in $libs/dmftdoc[(dmftmetadata//a_is_hidden = "false") and
(dmftversions/iscurrent = "true") and
(. ftcontains "test" with stemming using stop words default)]
order by $s descending
return $dm_doc return (for $dm_doc in subsequence($results,1,351)
return <r>
{for $attr in $dm_doc/dmftmetadata//*[local-name()=(
object_name,r_modify_date,r_object_id,r_object_type,
r_lock_owner,owner_name,r_link_cnt,r_is_virtual_doc,
r_content_size,a_content_type,i_is_reference,r_assembled_from_id,
r_has_frzn_assembly,a_compound_architecture,i_is_replica,r_policy_id,
subject,title)] return <attr name={local-name($attr)} type=

{$attr/@dmfttype}>{string($attr)}</attr>}
{xhive:highlight(($dm_doc/dmftcontents/dmftcontent/
dmftcontentref,$dm_doc/dmftcustom))}
<attr name=score type=dmdouble>{string(dsearch:get-score($dm_doc))}
</attr></r>) is running

Security filter applied and security statistics. For example:


2012-03-30 12:57:36,406 INFO [pool-14-thread-10]
c.e.d.c.f.i.services.security.SecurityJoin Security Filter invoked {
QueryID=PrimaryDsearch$89289023-617e-431e-99e0-d9b10a264262}
2012-03-30 12:57:36,421 INFO [pool-14-thread-10]
c.e.d.c.f.i.services.security.SecurityJoin {Minimum-permit-level=2, Total-group-probes=0, Filter-output=8,
Total-values-from-index-keys=0,
QueryID=PrimaryDsearch$89289023-617e-431e-99e0-d9b10a264262,
Total-values-from-data-page=0,
Filter-input=8, Total-not-in-groups-cache-hits=0,
Total-matching-group-probes=0,
Total-ACL-index-probes=0, Total-groups-in-cache-hits=0,
Total-ACL-cache-hits=0,
Total-res-with-no-dmftdoc=0}

When DEBUG is enabled for the security package, the following information is saved in dsearch.log:
Minimum-permit-level. Returns the minimum permit level for results for the user. Levels: 0 = null |
1 = none | 2 = browse | 3 = read | 4 = relate | 5 = version | 6 = write | 7 = delete
Total-group-probes: Total number of groups checked for user
Filter-output: Total number of hits after security has filtered the results.
Total-values-from-index-keys: Number of index hits on owner_name, acl_name and acl_domain
for the document.
QueryID: Generated by xPlore to uniquely identify the query.
Total-values-from-data-page: Number of hits on owner_name, acl_name and acl_domain for the
document retrieved from the data page.
Filter-input: Number of results returned before security filtering.
Total-not-in-groups-cache-hits: Number of times the groups-out cache contained a hit (groups
the user does not belong to)
Total-matching-group-probes: How many times the query added a group to the group-in cache.
Total-ACL-index-probes: How many times the query added an ACL to the cache. If this value is
high, you can speed up queries by increasing the ACL cache size.
Total-groups-in-cache-hits: Number of times the group-in cache contained a hit.
Total-ACL-cache-hits: Number of times the ACL cache contained a hit.
Total-res-with-no-dmftdoc: Total number of hits in documents with no rendered dftxml. Should be 0.
In the following example from the log, the query returned 2200 hits to filter. Of these hits, 2000 were
filtered out, returning 200 results to the client application. The not-in-groups cache was probed 30
times for this query. The GroupOut cache was filled 3 times, for groups that the user did not belong to:
<USER_NAME>tuser4</USER_NAME>
<TOTAL_INPUT_HITS_TO_FILTER>2200</TOTAL_INPUT_HITS_TO_FILTER>
<HITS_FILTERED_OUT>2000</HITS_FILTERED_OUT>
<GROUP_IN_CACHE_HIT>0</GROUP_IN_CACHE_HIT>
<GROUP_OUT_CACHE_HIT>30</GROUP_OUT_CACHE_HIT>
<GROUP_IN_CACHE_FILL>0</GROUP_IN_CACHE_FILL>
<GROUP_OUT_CACHE_FILL>3</GROUP_OUT_CACHE_FILL>

Verifying security settings in the Content Server


Use iAPI to verify that dm_fulltext_index_user is registered to receive events for security updates
(changes to dm_acl and dm_group) with the following commands. They return the object IDs of the
dm_acl and dm_group types:
?,c,select r_object_id from dm_type where name='dm_acl'
?,c,select r_object_id from dm_type where name='dm_group'

Verify that the dm_acl type ID is registered for the events dm_save, dm_destroy, and dm_saveasnew. Verify
that the dm_group type ID is registered for the events dm_save and dm_destroy, for example:
?,c,select registered_id,event from dmi_registry where user_name='dm_fulltext_index_user'

Chapter 4
Managing the Documentum Index Agent
This chapter contains the following topics:

About the Documentum index agent

Starting the index agent

Silent index agent startup

Setting up index agents for ACLs and groups

Configuring the index agent after installation

Migrating documents

Using ftintegrity

ftintegrity output

ftintegrity result files

Running the state of index job

state of index and ftintegrity arguments

Indexing documents in normal mode

Resubmitting documents for indexing

Removing entries from the index

Indexing metadata only

Making types non-indexable

Making metadata non-searchable

Injecting data and supporting joins

Custom content filters

Reindexing after removing a Documentum attribute

Troubleshooting the index agent

About the Documentum index agent


The xPlore index agent is a multithreaded Java application running in the Content Server application
server. The index agent processes index queue items generated by Content Server, applies filters,
and prepares SysObjects for indexing.
Note: Documentum Administrator reports all queue items, including those that are subsequently
filtered out by the index agent.
The xPlore installer includes the index agent and its configurator. Install the index agent on a Content
Server host or a separate host.
A dm_ftindex_agent_config object represents the index agent in normal mode. This object is
configured by the index agent configurator. For more information about the index agent config object,
refer to the EMC Documentum Object Reference Manual.

Content Server objects created by index agent


When you configure an index agent, it creates the following objects in the Content Server:
dm_ftengine_config
  For a full list of attributes, see dm_ftengine_config, page 334.

dm_acl
  object_name: dm_fulltext_admin_ac
  owner_name: Name of user specified at installation
  acl_class: 3
  accessor_name: dm_owner, dm_fulltext_admin, dm_world
  accessor_permit: 7, 7, 3

dm_fulltext_index
  object_name
  is_standby: Indicates whether the index agent is in use or in standby mode.
  install_loc: Type of agent (dsearch or fast)
  ft_engine_id: Specifies the associated dm_ftengine_config object.

dm_ftindex_agent_config
  index_name
  queue_user: Identifies the full-text user who is registered in dmi_registry for full-text events.

Index agent loading


In both migration and normal mode, index agent configuration is loaded from indexagent.xml
in the index agent deploy directory. In normal mode, the configuration is also read from the
dm_ftindex_agent_config object. If there is a conflict, the settings in the config object override the
settings in indexagent.xml.

Documentum attributes that control indexing


The a_full_text attribute is defined for the dm_sysobject type. The a_full_text attribute is set to true
whenever a sysobject is created. All sysobject subtypes inherit this Boolean attribute that controls
whether the content files of an object are indexed. Indexing is performed whenever a Save, Saveasnew,
Checkin, Destroy, Branch, or Prune operation is performed on the object. Users with Sysadmin or
Superuser privileges can change the a_full_text setting.
Properties of the format object determine which formats are indexable. If the value of the can_index
property is set to true, the content file is indexable. By default, the first content file in a format whose
can_index property is set to true is indexed. Other renditions of the object are not indexed. If the
primary content of an object is not in an indexable format, you can ensure indexing by creating a
rendition in an indexable format. Use Documentum Content Transformation Services or third-party
client applications to create the rendition. For a full list of supported formats, see Oracle Outside In
8.3.7 documentation.
Some formats are not represented in the repository by a format object. Only the properties of objects in
that format are indexed. The formats.csv file, which is located in DM_HOME/install/tools, contains a
complete list of supported mime_types and the formats with which they are associated. If a supported
mime_type has no format object, create a format object in the repository and map the supported
mime_type to the format in formats.csv.
Documents are selected for indexing in the Content Server based on the following criteria:
If a_full_text attribute is false, the content is not indexed. Metadata is indexed.
If a_full_text attribute is true, content is indexed based on the can_index and format_class attributes
on the dm_format associated with the document:
1. If an object has multiple renditions and none of the renditions have a format_class value of
ft_always or ft_preferred, each rendition is examined starting with the primary rendition. The
first rendition for which can_index is true is indexed, and no other renditions are indexed.
2. If an object has a rendition whose format_class value is ft_preferred, each ft_preferred rendition
is examined in turn starting with the primary rendition. The first ft_preferred rendition that is
found is indexed, and no other renditions are indexed.
3. If an object has renditions with a format_class value of ft_always, those renditions are always
indexed.
Note: Index agent filters can override the settings of a_full_text and can_index. See Configuring
index agent filters, page 69.
Sample DQL to determine these attribute values for the format bmp:
select can_index, format_class from dm_format where name = 'bmp'

To find all formats that are indexed, use the following command from iAPI:
?,c,select name,can_index from dm_format

The dm_ftengine_config object has a repeating attribute ft_collection_id that references a collection
object of the type dm_fulltext_collection. Each ID points to a dm_fulltext_collection object. It is
reserved for use by Content Server client applications.

Indexing aspect attributes


Properties associated with aspects are not indexed by default. If you wish to index them, issue an
ALTER ASPECT statement to identify the aspects you want indexed.
Table 6    Syntax for full-text indexing of aspects

ADD_FTINDEX ALL: Defines all properties of the aspect for indexing.
ADD_FTINDEX property_list: Defines for indexing only those aspect properties listed in property_list.
DROP_FTINDEX ALL: Stops indexing of all properties of the aspect.
DROP_FTINDEX property_list: Stops indexing of those aspect properties listed in property_list.

When you add or drop indexing for aspect properties, clean the DFC BOF cache for the changes to
take effect.
1. Stop the index agent.
2. On the index agent host, delete the directory for the DFC bof cache. The directory is set by
dfc.data.dir in dfc.properties. For example:
xplore_home\jboss5.1.0\server\DctmServer_Indexagent\data\Indexagent\cache\
content_server_version\bof\repository_name

3. Start the index agent.


Only new objects are affected. The index is not updated to add or drop aspect property values for
aspects attached to existing objects.

Indexing lightweight sysobjects


Lightweight objects inherit the attributes of the parent as shared, not private, attributes. Inheritance
allows many lightweight objects to share the same attributes and reduce the storage requirement for
some of the private content-related attributes like ID and size. These attributes are computed on
demand. DQL queries cannot access the shared parent attribute values.
The fulltext_support attribute on the object type is used for lightweight sysobjects (LWSOs) to
determine how they are indexed. Valid values: No support, Light support, Full support.

Starting the index agent


To start or stop the index agent service, run the following commands. The script names contain the
index agent server name that you specified when you configured the index agent (default: Indexagent).
1. Start the instance.
Windows
Use the Documentum Index_agent Windows service, or
indexagent_home\jboss5.1.0\server\startIndexagent.cmd or stopIndexagent.cmd.
Linux
indexagent_home/jboss5.1.0/server/startIndexagent.sh or stopIndexagent.sh.
2. Start the index agent UI. Use your browser to start the index agent servlet.
http://host:port/IndexAgent/login_dss.jsp

Every index agent URL has the same URL ending: IndexAgent/login_dss.jsp. Only the port
and host differ.
host is the DNS name of the machine on which you installed the index agent.
port is the index agent port number that you specified during configuration (default: 9200).
3. In the login page, enter the user name and password for a valid repository user and optional
xPlore domain name.
4. Choose one of the following:
Start Index Agent in Normal Mode: The index agent will index content that is added or
modified after you start.
Start new reindexing operation: All content in the repository is indexed (migration mode) or
reindexed. Filters and custom routing are applied. Proceed to the next step in this task.
Continue: If you had started to index this repository but had stopped, start indexing. The date
and time you stopped is displayed.
For information on adding CPS processing daemons, see .
Viewing index agent details
Start the index agent and click Details. You see accumulated statistics since last index agent restart and
objects in the indexing queue. To refresh statistics, return to the previous screen and click Refresh,
then view Details again.

Silent index agent startup


You can start or shut down the index agent through the index agent web application. You can also
script the start in normal mode or shutdown using Content Server, iAPI or DFC. Starting in migration
mode cannot be scripted.

Setting startup in Content Server


Set the start_index_agents parameter in server.ini to TRUE.
At Content Server startup, the Server checks whether the index agent associated with the repository
is started. If not, and start_index_agents is TRUE, the Server starts the index agent using the
dm_FTIndexAgentBoot job.

Silent startup and shutdown with iAPI


Use the retrieve and dump commands to get the index_name attribute of the dm_fulltext_index object.
You use this attribute value in the start or stop script. For example:
API> retrieve,c,dm_fulltext_index
...
3b0004d280000100
API> dump,c,l
...
USER ATTRIBUTES

index_name : Repo_ftindex_01
...

Now use the retrieve and dump commands to get the object_name attribute of the
dm_ftindex_agent_config object. You use this attribute value in the start or stop script. For example:
retrieve,c,dm_ftindex_agent_config
...
0800277e80000e42
API> dump,c,l
...
USER ATTRIBUTES
object_name : Config13668VM0_9200_IndexAgent

Use the apply command to start or stop the index agent. Syntax:
apply,c,,FTINDEX_AGENT_ADMIN,NAME,S,<index_name of dm_fulltext_index>,
AGENT_INSTANCE_NAME,S,<object_name of dm_ftindex_agent_config>,ACTION,
S,start|stop|status

To start or stop all index agents, replace the index agent name with all. For example:
apply,c,NULL,FTINDEX_AGENT_ADMIN,NAME,S,LH1_ftindex_01,
AGENT_INSTANCE_NAME,S,all,ACTION,S,start

The following example starts one index agent:


apply,c,NULL,FTINDEX_AGENT_ADMIN,NAME,S,LH1_ftindex_01,AGENT_INSTANCE_NAME,
S,Config13668VM0_9200_IndexAgent,ACTION,S,start

Follow with these commands to get the results:


API> next,c,q1
...
OK
API> dump,c,q1

Status results:
0: The index agent is running.
100: The index agent has shut down.
200: The index agent has a problem.

Setting startup with a list of file IDs


You can script startup to index a list of documents like the list generated by ftintegrity.
1. Create a text file with the object IDs, one per line. Save as ids.txt in the WEB-INF/classes directory of
xplore_home/jboss5.1.0/server/DctmServer_IndexAgent/deploy/IndexAgent.war/. (Substitute the
actual path to your index agent web application.)
2. Start the index agent servlet in normal mode. The objects in ids.txt are automatically submitted for
indexing.
A file ids.txt.done is created in the same directory as ids.txt. This file lists the IDs of objects that
were successfully indexed.
Startup from the Java command line


Use the following command:
java com.documentum.server.impl.utils.IndexAgentCtrl -docbase_name
repositoryName -user_name userName -action actionName

where -action argument value is one of the following: start | shutdown | status | reset.

Silent startup and shutdown using DFC


The following method gets the dm_fulltext_index object, the index_name attribute, and sets the
DQL query:
public void shutdownIA(IDfSession sess) throws DfException
{
IDfPersistentObject FTIndexObj = (IDfPersistentObject)
    sess.getObjectByQualification("dm_fulltext_index where is_standby = false");
String indexName = FTIndexObj.getString("index_name");
//Query definition
String query = "NULL,FTINDEX_AGENT_ADMIN,NAME,S," +
indexName + ",AGENT_INSTANCE_NAME,S,all,ACTION,S,shutdown";
DfClientX clientX = new DfClientX();
IDfQuery q = clientX.getQuery();
q.setDQL(query);
try
{
IDfCollection col = q.execute(sess, IDfQuery.DF_APPLY);
}
catch (DfException e)
{
e.printStackTrace();
}
}

For startup, replace shutdown with start in the query definition.

Setting up index agents for ACLs and groups


By default, you configure an index agent for each Documentum repository that is indexed. You can
also set up multiple index agents to index dm_acl and dm_group separately from sysobjects.
1. Create a second index agent. Run the index agent configurator and give the agent instance a name
and port that are different from the first agent. (The configurator is the file configIndexagent.bat
or configIndexagent.sh in indexagent_home/setup/indexagent.)
2. Edit indexagent.xml for the second index agent. (This file is located in
indexagent_home/jboss5.1.0/server/DctmServer_Indexagent2/deploy/IndexAgent.war/WEB-INF/classes.)
3. Add one parameter set to your new indexagent.xml file. Set the value of parameter_name to
index_type_mode, and set the value of parameter_value to aclgroup as follows:
<indexer_plugin_config>
<generic_indexer>
<class_name> </class_name>
<parameter_list>
...
<parameter>
<parameter_name>index_type_mode</parameter_name>
<parameter_value>aclgroup</parameter_value>
</parameter>
</parameter_list>
</generic_indexer>
</indexer_plugin_config>

4. In the indexagent.xml for sysobjects (the original index agent), add a similar parameter set. Set the
value of parameter_name to index_type_mode, and set the value of parameter_value to sysobject.
5. Restart both index agents. (Use the scripts in indexagent_home/jboss5.1.0/server or the Windows
services.)
Supporting millions of ACLs
If you have many ACLs (users or groups), turn off facet compression. In indexserverconfig.xml, find
the sub-path element whose path attribute value is dmftsecurity/acl_name. Change the value of
the compress attribute to false. For information on viewing and updating this file, see Modifying
indexserverconfig.xml, page 43.
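For illustration, a minimal sketch of the edited element in indexserverconfig.xml. Only the path and
compress attributes are shown here; any other attributes already present on your sub-path element
should be left unchanged:
<!-- sketch only: keep the other attributes of the existing sub-path element as they are -->
<sub-path path="dmftsecurity/acl_name" compress="false"/>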

Configuring the index agent after installation


The Documentum index agent is configured for the first time through the index agent configurator,
which can be run after installing xPlore. For information on the configurator, see EMC Documentum
xPlore Installation Guide.
Parameter default values have been optimized for most environments. They can
be changed later using iAPI or by editing indexagent.xml, which is located in
xplore_home//jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.
For descriptions of the settings, see Index agent configuration parameters, page 336.

Limit content size for indexing


You can set a maximum size for content that is indexed. You set the actual document size, not the size
of the text within the content. To set the maximum document size, edit the contentSizeLimit parameter
within the parent element exporter. The value is in bytes. Default: 20 MB.
Note: You can also limit the size of text within a file by configuring the CPS instance. Set the Max text
threshold in bytes (default: 10 MB).
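For example, a sketch of the relevant fragment of indexagent.xml that raises the limit to 30 MB. It
assumes contentSizeLimit is a direct child element of exporter, as described above, and omits the
other exporter settings, which should be left unchanged:
<exporter>
  <!-- other exporter settings unchanged -->
  <!-- 30 MB, expressed in bytes (30 x 1024 x 1024) -->
  <contentSizeLimit>31457280</contentSizeLimit>
</exporter>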

Exclude ACL and group attributes from indexing


By default, all attributes of ACLs and groups are indexed. You can specify that certain attributes of
ACLs and groups are not indexed. Add an acl_exclusion_list and group_exclusion_list element to the
parent element indexer_plugin_config/generic_indexer/parameter_list.
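A hedged sketch of what this addition might look like in indexagent.xml. The element names come
from the description above, but the content format (a comma-separated list of attribute names) and
the attribute names shown (description, r_modify_date) are assumptions for illustration only; verify
the exact format against your indexagent.xml before relying on it:
<parameter_list>
  <!-- existing parameter elements unchanged -->
  <acl_exclusion_list>description,r_modify_date</acl_exclusion_list>
  <group_exclusion_list>description</group_exclusion_list>
</parameter_list>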

Change the local content storage location


When you configure the index agent, you select a local content temporary staging location. You can
change this location by editing the local_content_area element in indexagent.xml. This file is located
in xplore_home//jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.
Restart the index agent web application after editing this file.
Note: For multi-instance xPlore, the temporary staging area for the index agent must be accessible
from all xPlore instances.
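For example, a minimal sketch of the edited element; the path shown is hypothetical, and for
multi-instance xPlore it must be a location that all xPlore instances can reach:
<local_content_area>/opt/shared/indexagent_staging</local_content_area>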

Specify the DFC cache directory


By default, the DFC cache data is stored in the following directory:
xplore_home//jboss5.1.0/server/DctmServer_Indexagent/data/Indexagent
You can change this directory by specifying the value of the dfc.data.dir property in the dfc.properties
file located in
xplore_home/jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.
Note: On Linux, path names are case-sensitive. Indexagent and IndexAgent are two different directories.

Installing index agent filters (Content Server 6.5 SPx or 6.6)
Documents indexed before the filters are installed are not filtered.
You can install index agent filters that exclude cabinets, folders, or object types from indexing. If you
are connecting to Content Server version 6.7, the filters are already installed.
Note: Do not exclude folders if users make folder descend queries.
1. Copy IndexAgentDefaultFilters.dar, DarInstall.bat or DarInstall.sh, and DarInstall.xml from
indexagent_home/setup/indexagent/filters to a temporary install directory. (The xPlore
installer installs the index agent. The home directory on the index agent host is referred to as
indexagent_home.)
2. Edit DarInstall.xml:
a. Specify the full path to IndexAgentDefaultFilters.dar including the file name, as the value
of the dar attribute.
b. Specify your repository name as the value of the docbase attribute.
c. Specify the repository superuser name as the value of the username attribute.
d. Specify the repository superuser password as the value of the password attribute.
For example:
<emc.install dar="C:\Downloads\tempIndexAgentDefaultFilters.dar"
docbase="DSS_LH1" username="Administrator" password="password" />

3. Edit DarInstall.bat (Windows) or DarInstall.sh (Linux)


a. Specify the path to the composerheadless package as the value of ECLIPSE.
b. Specify the path to the file DarInstall.xml in the temporary working directory (excluding the file
name) as the value of BUILDFILE.
c. Specify a workspace directory for the generated Composer files, for example:
set ECLIPSE="C:\Documentum\product\6.5\install\composer\ComposerHeadless"
set BUILDFILE="C:\DarInstall\temp"
set WORKSPACE="C:\DarInstall\work"

4. Launch DarInstall.bat (Windows) or DarInstall.sh (Linux) to install the filters.


5. Test whether the filters are installed.
Use the following DQL statement. If the filters are installed, a list of object IDs and names of
the filters is returned:
select r_object_id,object_name from dmc_module where any a_interfaces=
'com.documentum.fc.indexagent.IDfCustomIndexFilter'

Verify filter loading in the index agent log, which is located in the logs subdirectory of the
index agent JBoss deployment directory. In the following example, the FoldersToExclude
filter was loaded:
2010-06-09 10:49:14,693 INFO FileConfigReader [http-0.0.0.0-9820-1]
Filter FoldersToExclude Value:/Temp/Jobs,
/System/Sysadmin/Reports, /System/Sysadmin/Jobs,

6. Configure the filters in the index agent UI. See Configuring index agent filters, page 69.
Troubleshooting the index agent filters
To verify that the filters are installed, use the following iAPI command:
?,c,select primary_class from dmc_module where any a_interfaces =
'com.documentum.fc.indexagent.IDfCustomIndexFilter'

You should see the following:


com.documentum.server.impl.fulltext.indexagent.filter.defaultCabinetFilterAction
com.documentum.server.impl.fulltext.indexagent.filter.defaultFolderFilterAction
com.documentum.server.impl.fulltext.indexagent.filter.defaultTypeFilterAction

Open dfc.properties in the composerheadless package. This package is installed with Content Server
at $DOCUMENTUM/product/version/install/composer/ComposerHeadless. The file dfc.properties
is located in the subdirectory plugins/com.emc.ide.external.dfc_1.0.0/documentum.config. Find the
following lines and verify that the IP address and port of the connection broker for the target repository
are accurate.
dfc.docbroker.host[N]=connection_broker_ip_address

dfc.docbroker.port[N]=connection_broker_port

Invoking the filters in ftintegrity and state of index


See Using ftintegrity, page 73. Both scripts generate a file ObjectId-filtered-out.txt that records all IDs
of filtered-out objects. To remove document content from the index, see Removing entries from the
index, page 79.

Configuring index agent filters


All index agents in a repository share the filter configuration. You can view current filters or edit
the filter list in the index agent UI.
1. Stop the index agent in the IA UI before you configure the filters. You do not have to stop the
index agent JVM.
2. Click Check or update filter settings.
3. In the Current Filters section, you see current filters.
Types to exclude: Object types that are filtered out before indexing. Subtypes are not filtered out.
Folders to exclude: Objects in the folder path that are filtered out before indexing.
Note: Do not exclude folders or cabinets if users make folder descend queries.
Cabinets to exclude: Cabinets that are filtered out before indexing.
4. In the Update filters section, you can modify the existing filters:
Remove type: List one or more object types to remove from the filter, with a comma separator.
These types are indexed. For example: type1,type2,type3
Add type: List one or more object types to filter out before indexing, with a comma separator.
Does not include subtypes.
Remove folder: List one or more folder paths to remove from the filter, with a comma separator.
For example: /Temp1/subfolder,/Temp2
Add folder(s): List one or more folder paths to filter out before indexing, with a comma separator.
Remove cabinet(s): List one or more cabinets to remove from the filter, with a comma separator.
For example: cabinet1,cabinet2
Add cabinet: List one or more cabinets to filter out before indexing, with a comma separator.
You can create a custom index agent BOF filter that implements IDfCustomIndexFilter. Base the
filter on a date attribute. For information on creating a BOF filter, see Injecting data and supporting
joins, page 80.
When the index agent starts, filters are recorded in the index agent log, located in
xplore_home/jboss5.1.0/server/DctmServer_Indexagent/logs.
By default, some object types, cabinets, and folders are excluded to improve performance. For
example:
Excluded cabinets: Temp, Templates, System, Resources.
Excluded folders: /Temp/Jobs, /System/Sysadmin/Reports, /System/Sysadmin/Jobs.
Excluded types:
dmi_expr_code, dm_process, dm_docbase_config, dmc_jar, dmc_tcf_activity_template,
dm_esign_template, dm_method, dm_ftwatermark, dm_format_preferences, dm_activity,
dmc_wfsd_type_info, dm_ftengine_config, dmc_module, dm_menu_system, dm_ftfilter_config,
dmc_aspect_type, dm_plugin, dm_ftindex_agent_config, dm_registered, dm_script, dm_jms_config,
dm_validation_descriptor, dmc_preset_package, dm_job, dm_location, dm_acs_config,
dm_mount_point, dmc_java_library, dm_business_pro, dm_outputdevice, dm_public_key_certificate,
dm_client_rights, dm_server_config, dm_client_registration, dm_cont_transfer_config,
dm_xml_application, dm_procedure, dm_cryptographic_key, dm_xml_config, dmc_dar.

Sharing content storage


By default, the index agent retrieves content from the content storage area to the index agent temporary
content location (a getfile operation). This temporary content is deleted after it has been indexed. For
performance reasons, you can choose to share the content storage. With shared content storage, CPS
has direct read access to the content. No content is streamed. You map the path to the file store in
index agent web application. This performs a getpath operation.
Note: The content storage area must be unencrypted and mountable as read-only by the Index Agent
and xPlore hosts.
1. On the index agent host, open indexagent.xml, which is located in
indexagent_home/jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.
If you installed multiple index agents on this host, an integer is appended to the IndexAgent
WAR file name, for example, IndexAgent1.war.
2. Set the path in the exporter element:
If the file system paths to content are the same on the Content Server host and xPlore host,
change the value of the child element all_filestores_local to true.
If the file system paths are different, add a file store map within the exporter element.
Specify the store name and local mapping for each file store. In the following example,
Content Server is on the host Dandelion and filestore_01 is on the same host at the
directory /Dandelion/Documentum/data/repo1/content_storage_01. The index agent
and xPlore server are on a separate host with a map to the Content Server host:
/mappingtoDandelion/repo1/content_storage_01. The following map is added to the exporter
element:
<local_filestore_map>
<local_filestore>
<store_name>filestore_01</store_name>
<local_mount>/mappingtoDandelion/repo1/content_storage_01
</local_mount>
</local_filestore>

<!-- similar entry for each file store -->


</local_filestore_map>

Example with UNC path:


<local_filestore_map>
<local_filestore>
<store_name>filestore_01</store_name>
<local_mount>\\CS\e$\Documentum\data\dss\content_storage_01
</local_mount>
</local_filestore>
<!-- similar entry for each file store -->
</local_filestore_map>

Note: Update the file_system_path attribute of the dm_location object in the repository to match
this local_mount value, and then restart the Content Server.
3. Save indexagent.xml and restart the index agent instance.
For better performance, you can mount the content storage to the xPlore index server host and set
all_filestores_local to true. Create a local file store map as shown in the following example:
<all_filestores_local>true</all_filestores_local>
<local_filestore_map>
<local_filestore>
<store_name>filestore_01</store_name>
<local_mount>\\192.168.195.129\DCTM\data\ftwinora\content_storage_01
</local_mount>
</local_filestore>
<!-- similar entry for each file store -->
</local_filestore_map>

Mapping Server storage areas to collections


A Content Server file store can map to an xPlore collection. File store mapping to a collection allows
you to keep collection indexes separate for faster ingestion and retrieval.
Figure 7    Filestore mapping to an xPlore collection

You can create multiple full-text collections for a repository for the following purposes:
Partition data
Scale indexes for performance


Support storage-based routing
1. Open indexagent.xml, located in the indexing agent WAR file in the directory
($DOCUMENTUM//jboss5.1.0/server/DctmServer_IndexAgent/deploy/IndexAgent.war/WEB-INF/classes).
2. Add partition_config and its child elements to the element
index-agent/indexer_plugin_config/indexer to map file stores to collections.
In the following example, filestore_01 maps to collection coll01, and filestore_02 to coll02. The rest of the
repository is mapped to the default collection. Each repository has one default collection named default.
<partition_config>
<default_partition>
<collection_name>default</collection_name>
</default_partition>
<partition>
<storage_name>filestore_01</storage_name>
<collection_name>coll01</collection_name>
</partition>
<partition>
<storage_name>filestore_02</storage_name>
<collection_name>coll02</collection_name>
</partition>
</partition_config>

Migrating documents
Migrating content (reindexing), page 72
Migrating documents by object type, page 72
Migrating a limited set of documents, page 73

Migrating content (reindexing)


Note: You cannot filter content after it has been indexed. For information on removing documents
from the index, see Removing entries from the index, page 79.
1. Start the index agent and log in to the UI.
2. Choose Start new reindexing operation.
This indexes all indexable documents in the repository except for the documents excluded by the
index agent filter. Updates are performed for existing documents. Custom routing is applied. To
skip custom routing for documents that are already indexed, edit indexserverconfig.xml. Set
allow-move-existing-documents to false in index-config.
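For illustration, a sketch of the corresponding fragment of indexserverconfig.xml. It assumes
allow-move-existing-documents is exposed as an attribute of the index-config element, which this
guide does not confirm; check your own indexserverconfig.xml for the exact form and keep all other
attributes and child elements of index-config unchanged:
<!-- sketch only: retain the other attributes and children of the existing index-config element -->
<index-config allow-move-existing-documents="false">
  ...
</index-config>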

Migrating documents by object type


To migrate or reindex many documents of a specific object type, do not use the DQL or object ID file
in the index agent UI. Instead, perform the following steps in migration mode, which allows a restart
of the indexing process. Index agent filters are applied in this migration.
1. Edit indexagent.xml, located in
indexagent_home/jboss5.1.0/server/DctmServer_indexagent/deploy/IndexAgent.war/WEB-INF/classes.
2. Add a parameter_list element with the following content to the indexagent_instance element.
Substitute the object type name for the value of the parameter_value element.
<parameter_list>
<parameter>
<parameter_name>type_based_migration</parameter_name>
<parameter_value>TYPE_NAME_HERE</parameter_value>
</parameter>
</parameter_list>

Note: The parameter_list element can contain only one parameter element.
3. Stop and restart the index agent using the scripts in indexagent_home/jboss5.1.0/server or using
the Windows services panel.
4. Log in to the index agent UI and choose Start new reindexing operation.
5. When indexing has completed (on the Details page, no more documents in the queue), click
Stop IA.
6. Run the aclreplication script to update permissions for users and groups in xPlore. See Manually
updating security, page 52.
7. Update the indexagent.xml file to index another type, or change the parameter_value to
dm_document.

Migrating a limited set of documents


If you wish to migrate only a few object types, you can use the index agent UI. To migrate a large set of
objects by type, see Migrating documents by object type, page 72.
1. Replicate ACLs and groups to xPlore by running the aclreplication script. See Manually updating
security, page 52.
2. Start the index agent in normal mode.
3. Check Index selected list of objects, then check DQL.
4. Select the type in the From dropdown list.
5. Repeat for each type that you want indexed.
To remove from the index documents that have already been indexed, see Removing entries from the
index, page 79.

Using ftintegrity
ftintegrity output, page 75
ftintegrity result files, page 76
Running the state of index job, page 76
state of index and ftintegrity arguments, page 77
ftintegrity and the state of index job (in Content Server 6.7 or higher) are used to verify indexing after
migration or normal indexing. The utility verifies all types that are registered in the dmi_registry_table
with the user dm_fulltext_index_user. The utility compares the object ID and i_vstamp between the
repository and xPlore. You can compare metadata values, which compares object IDs and the specified
attributes.
Run ftintegrity as the same administrator user who started the instance.
Note: ftintegrity can be very slow, because it performs a full scan of the index and content. Do not run
ftintegrity when an index agent is migrating documents.
Run the ftintegrity index verification tool after migration or restoring a federation, domain,
or collection. The tool is a standalone Java program that checks index integrity against
repository documents. It verifies all types that are registered to dmi_registry_table with the user
dm_fulltext_index_user, comparing the object ID and i_vstamp between the repository and xPlore.
Use the option -checkType to check a specific object type. Use the option -checkMetadata to check
specific single-value attributes (requires -checkType).
1. Navigate to xplore_home/setup/indexagent/tools.
2. Open the script ftintegrity_for_repositoryname.bat (Windows) or ftintegrity_for_repositoryname.sh
(Linux) and edit the script. Substitute the repository instance owner password in the script (replace
<password> with your password). The tool automatically resolves all parameters except for the
password.
3. Optional: Add the option -checkfile to the script. The value of this parameter is the full path to
a file that contains sysobject IDs, one on each line. This option compares the i_vstamp on the
ACL and any groups in the ACL that is attached to each object in a specified list. If this option
is used with the option -checkUnmaterializeLWSO, -CheckType, -StartDate, or -EndDate, these
latter options are not executed.
For example:
....FTStateOfIndex DSS_LH1 Administrator mypassword
Config8518VM0 9300 -checkfile ...

4. Optional: Add the option -checkType to compare a specific type in the Content Server and index.
You can run the script for one type at a time. The tool checks sysobject types or subtypes. It does
not check dm_acl and dm_group objects or custom types that are not subtypes of dm_sysobject.
For example:
$JAVA_HOME/bin/java ... -checkType dm_document

5. Optional: Add the option -checkMetadata at the end of the script. This argument requires a path
to a metadata.txt file that contains a list of required single-valued (not repeating) metadata fields
to check, one attribute name per line. (Create this file if it does not exist.) This option applies
only to a specific type.
For example, add the following to the ftintegrity script in xplore_home/setup/indexagent/tools:
$JAVA_HOME/bin/java ... -checkType dm_document
-checkMetadata C:/xplore/setup/indexagent/tools/metadata.txt
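The metadata.txt file itself is only a list of attribute names, one per line. For example (object_name,
title, and subject are standard single-valued dm_sysobject attributes; substitute the attributes you
want to verify):
object_name
title
subject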

Metadata mismatches are recorded in several files in xPlore_home/setup/indexagent/tools/:


Object-Metadata-mismatch.txt: contains all the objects with metadata that has inconsistencies.
Object-Metadata-match.txt: contains all the objects with metadata that has valid consistencies.
Object-fetched-from-cs.txt: contains all the object IDs that are fetched from the Content Server.
Object-fetched-from-xPlore.txt: contains all the object IDs that are fetched from xPlore.
ObjectId-indexOnly.txt: contains all the object IDs that exist only in xPlore.
The ftintegrity tool generates the following files:
ObjectId-common-version-mismatch.txt: contains ACLs that are out of sync as the content of the
elements acl_name|domain/acl i_vstamp in docbase/acl i_vstamp in xDB.
ObjectId-common-version-match.txt: contains all the object IDs with consistent versions.
ObjectId-dctmOnly.txt: contains groups that are out of sync as the content of the elements
Mismatching i_vstamp group:/Sysobject ID: id/Group ids in dctm only:/group id.
ObjectId-indexOnly.txt: contains all the object IDs that exist only in xPlore.
Note: All optional arguments must be appended to the end of the java command line.

ftintegrity output
Output from the script is like the following:
Executing stateofindex
Connected to the docbase D65SP2M6DSS
2011/03/14 15:41:58:069 Default network framework: http
2011/03/14 15:41:58:163 Session Locale:en
2011/03/14 15:41:59:913 fetched 1270 object from docbase for type dm_acl
2011/03/14 15:41:59:913 fetched 1270 objects from xPlore for type dm_acl
2011/03/14 15:42:08:428 fetched 30945 object from docbase for type dm_sysobject
2011/03/14 15:42:08:428 fetched 30798 objects from xPlore for type dm_sysobject
2011/03/14 15:42:08:756 fetched 347 object from docbase for type dm_group
2011/03/14 15:42:08:756 fetched 347 objects from xPlore for type dm_group
2011/03/14 15:42:09:194 **** Total objects from docbase : 32215 ****
2011/03/14 15:42:09:194 **** Total objects from xPlore : 32068 ****
2011/03/14 15:42:09:194 3251 objects with match ivstamp in both DCTM and
Index Server
2011/03/14 15:42:09:194 17 objects with different ivstamp in DCTM and Index Server
2011/03/14 15:42:09:194 147 objects in DCTM only
2011/03/14 15:42:09:194 0 objects in Index Server only
ftintegrity is completed.

Interpreting the output:


objects from dm_acl and dm_group: Numbers fetched from repository (docbase) and xPlore.
match ivstamp: Objects that have been synchronized between Content Server and xPlore.
different ivstamp: Objects that have been updated in Content Server but not yet updated in the index.
objects in DCTM only: These objects are in the repository but not xPlore for one or more of
the following reasons:
Objects failed indexing.
New objects not yet indexed.
Objects filtered out by index agent filters.
objects in Index Server only: Any objects here indicate objects that were deleted from the repository
but the updates have not yet propagated to the index.
In the example, the ACLs and groups totals were identical in the repository and xPlore, so security is
updated. There are 147 objects in the repository that are not in the xPlore index. They were filtered out
by index agent filters, or they are objects in the index agent queue that have not yet been indexed.
To eliminate filtered objects from the repository count, add the usefilter argument to ftintegrity
(slows performance).

ftintegrity result files


The script generates four results files in the tools directory:
ObjectId-common-version-match.txt: This file contains the object IDs and i_vstamp values of all
objects in the index and the repository and having identical i_vstamp values in both places.
ObjectId-common-version-mismatch.txt: This file records all objects in the index and the repository
with identical object IDs but nonmatching i_vstamp values. For each object, it records the object
ID, i_vstamp value in the repository, and i_vstamp value in the index. The mismatch is on objects
that were modified during or after migration. You can resubmit this list after you start the index
agent in normal mode. Click Object File and browse to the file.
ObjectId-dctmOnly.txt: This report contains the object IDs and i_vstamp values of objects in
the repository but not in the index. This file does not report on inconsistent metadata checks
using the -checkMetadata option. The objects in this report could be documents that failed
indexing, documents that were filtered out, or new objects generated in the repository during or
after migration. You can resubmit this list after you start the index agent in normal mode. Click
Object File and browse to the file. To check whether filters were applied during migration, run the
following DQL query. If one or more rows are returned, a filter was applied.
select r_object_id,object_name,primary_class from dmc_module where any
a_interfaces='com.documentum.fc.indexagent.IDfCustomIndexFilter'

ObjectId-indexOnly.txt
This report contains the object IDs and i_vstamp values of objects in the index but not in the
repository.
These objects were removed from the repository during or after migration, before the event has
updated the index.

You can input the ObjectId-common-version-mismatch.txt file into the index agent UI to see errors for
those files. After you have started the index agent, check Index selected list of objects and then check
Object file. Navigate to the file and then choose Submit. Open xPlore Administrator > Reports and
choose Document processing error summary. The error codes and reasons are displayed.

Running the state of index job


Repository configuration for Content Server 6.7 SPx or higher installs a job called state of index.
Patches for previous versions of Content Server also install the job. This job is implemented as a Java
method and runs in the Content Server Java method server.

You can also use the ftintegrity tool to check the consistency between the repository and the xPlore index.
The ftintegrity script calls the dm_FTStateOfIndex job.
Note: ftintegrity and the dm_FTStateOfIndex job can be very slow, because they perform a full scan of
the index and content. Do not run ftintegrity or the dm_FTStateOfIndex job when an index agent is
migrating documents.
The state of index job compares the index content with the repository content. Execute the state of
index job from Documentum Administrator (DA). The job generates reports that provide the following
information:
Index completeness and comparison of document version stamps.
Status of the index server: Disk space usage, instance statistics, and process status.
Total number of objects: Content correctly indexed, content that had some failure during indexing,
and objects with no content
To disable the job, view the job properties in Documentum Administrator and change the state
to inactive.

state of index and ftintegrity arguments


The state of index and ftintegrity arguments are similar, with some slight differences. You can set the
state of index job argument values in DA using the ftintegrity form of the argument. Job arguments
are case sensitive. ftintegrity arguments are not.
Table 9. State of index and ftintegrity arguments
Each entry lists the job argument, the equivalent ftintegrity option in parentheses, and a description.

-batchsize (ftintegrity: batchsize, passed as an argument, not an option)
Number of objects to be retrieved from the index in each batch. The default value is 10000.

-check_file (ftintegrity: -CheckFile)
The value of this parameter is the full path to a file that contains sysobject IDs, one on each line.
This option compares the i_vstamp on the ACL and any groups in the ACL that is attached to each
object in a specified list. If this option is used with the option -checkUnmaterializeLWSO, -CheckType,
-StartDate, or -EndDate, these latter options will not be executed. ACLs that are out of sync will be
listed in ObjectId-common-version-mismatch.txt as the content of the elements
acl_name|domain/acl i_vstamp in docbase/acl i_vstamp in xDB. Groups that are out of sync will be
listed in ObjectId-dctmOnly.txt as the content of the elements Mismatching i_vstamp group:/Sysobject ID:
id/Group ids in dctm only:/group id...

-check_type (ftintegrity: -checkType)
Specifies a specific object type to check (includes subtypes). Must be a subtype of dm_sysobject
(not dm_acl or dm_group). Other types will not be checked.

-check_metadata (ftintegrity: -checkMetadata)
Specifies the path to a file metadata.txt in xplore_home/setup/indexagent/tools. This file contains
one single-valued attribute name per line. Requires a type specified in the -check_type argument.

-check_unmaterialized_lwso (ftintegrity: -checkLWSO)
Sets whether to check unmaterialized lightweight sysobjects during comparison. Default: false.

-collection_name (ftintegrity: not available)
Compares the index for the specified collection to data in the repository. Default: All collections.
Cannot use this argument with the ftintegrity script.

-end_date (ftintegrity: -EndDate)
Local end date of sysobject r_modify_date, for range comparison. Format: MM/dd/yyyy HH:mm:ss

-ftengine_standby (ftintegrity: -ftEngineStandby)
Dual mode (FAST and xPlore on two Content Servers) only: set the parameter -ftEngineStandby to true:
-ftEngineStandby T. Dual mode is not supported in xPlore 1.3.

-fulltext_user (ftintegrity: -fulltextUser)
Name of the user who owns the xPlore instance. For dual mode (FAST and xPlore), the user is
dm_fulltext_index_user_01. Dual mode is not supported in xPlore 1.3.

-get_id_in_indexing (ftintegrity: not available)
If specified, IDs that have not yet been indexed will be dumped to a file, ObjectId-in-indexing.txt.
Default: False. Cannot use this argument with the ftintegrity script.

-sort_order (ftintegrity: -sortOrder)
Sets the sort order of object IDs. Valid values: NLS_SORT values in Oracle 11g. For hex values,
use BINARY.

-start_date (ftintegrity: -StartDate)
Local start date of sysobject r_modify_date, for range comparison. Can be used to evaluate the time
period since the last backup. Format: MM/dd/yyyy HH:mm:ss

-timeout_in_minute (ftintegrity: -timeout)
Number of minutes to time out the session. Default: 1.

-usefilter value (ftintegrity: -usefilter)
Evaluates results using the configured index agent filters. Default: false. The job runs more slowly
with -usefilter. For information on filters, see Configuring index agent filters, page 69.

In addition, the job is installed with the -queueperson and -windowinterval arguments set. The
-queueperson and -windowinterval arguments are standard arguments for administration jobs and are
explained in the EMC Documentum Content Server Administration and Configuration Guide.
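As a purely hypothetical illustration of how the options in Table 9 combine, a date-range check
restricted to one object type might append options like the following to the normal ftintegrity
invocation (the leading repository and login arguments of the script are not shown here, and the
quoting of the date values is an assumption):

-checkType dm_document -StartDate "01/01/2012 00:00:00" -EndDate "03/31/2012 23:59:59"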


Indexing documents in normal mode


In normal mode, indexable documents are queued up for indexing. You can also index a small set of
documents in the normal mode page of the index agent.
1. Start the index agent and click Start index agent in normal mode.
2. You can select objects for indexing with a DQL statement or a list of objects. Use ftintegrity or the
state of index job to generate the list. Edit the output file ObjectId-common-version-mismatch.txt
to remove all data from the file except the object IDs.
Note: Do not use this option for many documents. Instead, use migration mode (reindexing),
which allows a restart of the indexing process. Migration also reindexes all ACLs and groups,
keeping security up to date.

Resubmitting documents for indexing


You submit a list of objects for indexing in the index agent UI.
1. Start the index agent in normal mode. You get a page that allows you to input a selected list of
objects for indexing. Submit either a file of object IDs or DQL.
2. Run ftintegrity or the state of index Content Server job to get a list of objects that failed in indexing.
See Using ftintegrity, page 73.
3. Remove all data except the object IDs from the file ObjectId-common-version-mismatch.txt (see
the example after these steps).
4. In the index agent UI, check Index selected list of objects and then check Object file. Navigate to
the file ObjectId-common-version-mismatch.txt and then click Submit.
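For example, after editing, the object file contains nothing but object IDs, one per line (the IDs
shown here are placeholders):

090023a380000201
090023a380000202
090023a380000203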

Removing entries from the index


You can remove certain object types, or objects that meet other criteria such as dates, from the index.
You can execute a DQL query to get object IDs of the documents that you wish to delete from the
index. Save the list of object IDs in a text file.
1. Navigate to xplore_home/dsearch/xhive/admin.
2. Open deletedocs.properties in a text editor.
3. Make sure that the host and port values correspond to your environment.
4. Set the value of dss_domain to the xPlore domain from which you wish to delete indexed
documents.
5. Change the value of the key file_contains_id_to_delete to the path to your object IDs. Alternatively,
you can list the object IDs, separated by commas, as the value of the key ids_to_delete.
6. On Windows, run deleteDocs.bat; on Linux, run deleteDocs.sh.
The deletedocs utility records activity in a log file: xplore_home/dsearch/xhive/admin/logs/deleteDocs.log.
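For illustration, a hypothetical deletedocs.properties might contain entries like the following. Only
the keys named in the steps above (dss_domain, file_contains_id_to_delete, ids_to_delete) are shown;
the values and file path are placeholders, and the existing host and port entries should simply be
verified for your environment.

dss_domain=MyDomain
file_contains_id_to_delete=/tmp/ids_to_delete.txt
# Alternatively, list the IDs directly, separated by commas:
# ids_to_delete=090023a380000201,090023a380000202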

Indexing metadata only


Use iAPI to set the can_index attribute of a dm_format object to F(alse). As a result, contents of that
format are not full-text indexed. For example:

retrieve,c,dm_format where name='tiff'
set,c,l,can_index
F
save,c,l

Making types non-indexable


1. In Documentum Administrator, select the object type (Administration > Types in the left pane).
2. Right-click for the list of attributes.
3. Uncheck Enable for indexing to exclude this type from indexing.
This setting turns off indexing events. If the option is checked but you have enabled an index agent
filter, the filter overrides this setting.

Making metadata non-searchable


You can make specific metadata non-searchable by adding a subpath definition. For example, to make
the Documentum attribute r_full_content_size non-searchable, create a subpath like the following. Set
the full-text-search attribute to false.
<sub-path leading-wildcard="false" compress="false"
boost-value="1.0" include-descendants="false"
returning-contents="false" value-comparison="false"
full-text-search="false" enumerate-repeating-elements="false"
type="double" path="dmftmetadata//r_full_content_size"/>

If you make this change after indexing, reindex objects to make the metadata non-searchable.
Documentum object types can be marked as non-indexed in Documentum Administrator. See Making
types non-indexable, page 80.

Injecting data and supporting joins


In xPlore indexing, the metadata is added to the dmftmetadata node of dftxml. Content is added to
the dmftcontent node. You can enhance document indexing with metadata or content from outside a
Documentum repository. To inject metadata or content, create a type-based object (TBO) in DFC and
deploy it to the repository. When you create a TBO for your type and deploy the TBO, DFC invokes
your customization for objects of the custom type. For information on creating TBOs, refer to EMC
Documentum Foundation Classes Development Guide.
To support queries on multiple related objects (joins), create a type-based object (TBO) in DFC that
denormalizes multiple objects into a single XML structure. The denormalized data is placed under the
dmftcustom node of dftxml. Related content such as an MS Word document or a PDF is referenced
under dmftcontent. Define specific indexes for data that is added to these nodes.
The following pseudocode example adds data from an attached object into the dftxml for the main
object. The example triggers reindexing of the main object when the attached object is updated. It also
prevents the attached object from being indexed as a standalone object. The following figure shows the
object model. The customization adds the data for the dm_note object to the dm_document object.

Figure 8

Custom indexing object model

Sample TBO class


Extend DfPersistentObject and override the method customExportForIndexing as shown in the
following example.
public class AnnotationAspect extends DfSysObject
{
    protected void customExportForIndexing(IDfFtExportContext context)
        throws DfException
    {
        super.customExportForIndexing(context);
        Document document = context.getDocument(); // gets the dftxml DOM of the main document
        NodeList nodes = document.getElementsByTagName(DfFtXmlElementNames.ROOT);
        Element rootElem = nodes.getLength() > 0 ? (Element) nodes.item(0) : null;
        if (rootElem == null)
        {
            rootElem = document.createElement(DfFtXmlElementNames.ROOT);
            document.appendChild(rootElem);
        }
        nodes = rootElem.getElementsByTagName(DfFtXmlElementNames.CUSTOM);
        Element dmftcustom = nodes.getLength() > 0 ? (Element) nodes.item(0) : null;
        if (dmftcustom == null)
        {
            dmftcustom = document.createElement(DfFtXmlElementNames.CUSTOM);
            rootElem.appendChild(dmftcustom);
        }
        Element mediaAnnotations = document.createElement("mediaAnnotations");
        dmftcustom.appendChild(mediaAnnotations);
        DocumentBuilder domBuilder = null;
        try
        {
            domBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        }
        catch (ParserConfigurationException e)
        {
            throw new DfException(e);
        }
        IDfCollection childRelations = getChildRelatives("dm_annotation");
        while (childRelations.next())
        {
            Element annotationNode = document.createElement("annotation");
            mediaAnnotations.appendChild(annotationNode);
            try
            {
                IDfId id = childRelations.getTypedObject().getId("child_id");
                // This will get the dm_note object
                IDfDocument note = (IDfDocument) getSession().getObject(id);
                ByteArrayInputStream xmlContent = note.getContent();
                Document doc = domBuilder.parse(xmlContent);
                // Add the note content; import the parsed root element into the main document
                annotationNode.appendChild(document.importNode(doc.getDocumentElement(), true));
                // Add a node for the author of the note
                Element authorElement = document.createElement("author");
                authorElement.setTextContent(note.getString("r_modifier"));
                annotationNode.appendChild(authorElement);
            }
            catch (SAXException e)
            {
                // Log the error
            }
            catch (IOException e)
            {
                // Log the error
            }
        }
        childRelations.close();
    }
}

Generated dftxml
<dmftdoc>
...
  <dmftcustom>
    <mediaAnnotations>
      <annotation>
        <content>
          This is my first note
        </content>
        <author>Marc</author>
      </annotation>
      <annotation>
        <content>
          This is my second note
        </content>
        <author>Marc</author>
      </annotation>
    </mediaAnnotations>
  </dmftcustom>
</dmftdoc>

Triggering document reindexing


In the previous example, when the related object is modified, the main object is not reindexed. The
following TBO for the related type (dm_note) triggers reindexing of the dm_document object. The
TBO applies to dm_note objects. When the object is saved after modification, the object is queued for
indexing by calling IDfDocument.queue. Substitute the index agent name in the queue method. If you
have more than one agent, call the queue method for each one.
public class NoteTbo extends DfDocument implements IDfBusinessObject
{
    protected synchronized void doSaveEx(boolean keepLock, String versionLabels,
        Object[] extendedArgs) throws DfException
    {
        super.doSaveEx(keepLock, versionLabels, extendedArgs);
        IDfCollection parentRelations = getParentRelatives("dm_annotation");
        while (parentRelations.next())
        {
            IDfId id = parentRelations.getTypedObject().getId("parent_id");
            IDfDocument annotatedObject = (IDfDocument) getSession().getObject(id);
            annotatedObject.queue("dm_fulltext_index_user",
                "dm_force_ftindex", 1, false, new DfTime(), "");
        }
        parentRelations.close();
    }
}

Preventing indexing of attached objects


If the attached object does not need to be searchable, you can prevent indexing. Use a custom
preindexing filter as described in Custom content filters, page 83.

Indexing the injected metadata


In addition to your TBO, set up an index for your injected metadata. See Creating custom indexes,
page 145.
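For example, a hypothetical sub-path definition for the author element injected under dmftcustom in
the earlier TBO example might look like the following. It reuses the attributes shown in Making
metadata non-searchable; the attribute values and the path are illustrative only, and Creating custom
indexes, page 145 describes the supported settings.

<sub-path leading-wildcard="false" compress="false"
boost-value="1.0" include-descendants="false"
returning-contents="true" value-comparison="true"
full-text-search="true" enumerate-repeating-elements="false"
type="string" path="dmftcustom//author"/>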

Custom content filters


You can configure index agent filters that exclude cabinets, folders, or object types from indexing.
Configure these filters using the index agent UI. For more information, see Configuring index agent
filters, page 69.
You can implement other kinds of filters with the DFC interface
com.documentum.fc.indexagent.IDfCustomIndexFilter. You can filter content by creation date or some
custom attribute value. Your class should return one of the actions of DfCustomIndexFilterAction for
the content that you are filtering:

INDEX: Index the content and metadata.


SKIP: Skip indexing of the content and metadata.
METADATA: Index only the metadata of the content.
Deploy your customization as a BOF module.
The following code skips objects of type dm_note during indexing.

TBO filtering of indexable objects


public class NoteFTFilter extends DfSingleDocbaseModule
implements IDfCustomIndexFilter
{
public DfCustomIndexFilterAction getCustomIndexAction (
IDfPersistentObject object)
throws DfException
{
// We don't filter any non-sysobjects such as Groups and ACLs
//
if (!(object instanceof IDfSysObject))
return DfCustomIndexFilterAction.INDEX;
String objectTypeName = object.getString("r_object_type");
if ("dm_note".equals(objectTypeName))
{
return DfCustomIndexFilterAction.SKIP;
}
return DfCustomIndexFilterAction.INDEX;
}
}

Reindexing after removing a Documentum attribute
When you delete a custom Documentum attribute, the values for this attribute are still available in
the indexes and users can find them with a full-text search. To avoid inconsistent results, reindex the
object type that you modified.
Rebuilding the indexes does not solve the issue since the dftxml representations for the objects still
contain the attribute values.
If the index agent is running in normal mode, the removal of the attribute does not trigger a new
indexing.
By forcing the reindexing of the objects, the index agent retrieves the objects from the repository and
creates new dftxml representations.
To reindex objects of a specific type, follow the procedure described in Migrating documents by
object type, page 72.


Troubleshooting the index agent


The index agent log uses DFC logger which uses log4j, and calls xPlore client API which uses slf4j.
The configuration files for the two logging mechanisms are located in the WEB-INF/classes directory
of the index agent WAR file. You can change the amount of information by setting the following
properties:
In logback.xml: <logger name="com.emc.documentum.core.fulltext"
additivity="false" level="DEBUG">

This package logs xPlore client information in dsearchclient.log.


In log4j.properties: log4j.category.com.documentum.server.impl=DEBUG
This package logs DFC information in Indexagent.log.
The log files are located in xplore_home/jboss5.1.0/server/DctmServer_Indexagent/logs.
Indexing errors are reported in Indexagent.log. For example,
2012-03-28 11:16:47,673 WARN PrepWorkItem [PollStatusThread]
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN]
Received warn callback: id: 090023a380000202 message:
DOCUMENT_WARNING CPS Warning [Unknown error during text extraction(
native code: 961, native msg: access violation)].

Checking indexing status


The index agent UI displays indexing status. On login, you can view information about the last
indexing operation: Date and time, total count, completed count, success count, warning count, and
failure count.
When you view Details during or after an indexing process, you see the following statistics:
Active items: Error count, indexed content size, indexed count, last update timestamp, size, and
warnings count.
Indexer plugin: Maximum call time
Migration progress (if applicable): Processed docs and total docs.
Averages: Pause time, KB/sec indexed, number of indexed docs/sec, plugin blocking max time.
List of current internal index agent threads
When you start an indexing operation, a status summary is displayed until indexing has completed.
Click Refresh to update this summary. The summary disappears when indexing has completed.
The following table compares the processing counts reported by the index agent and xPlore
administrator.


Table 10. Comparing index agent and xPlore administrator indexing metrics

Failed
Index agent: Documents not submitted to xPlore.
xPlore administrator: Errors in CPS processing; content and metadata not indexed. The count does not
include failures of the index agent.

Warning
Index agent: Metadata indexed but not content.
xPlore administrator: Metadata indexed but not content.

Success
Index agent: Documents indexed by xPlore.
xPlore administrator: Documents indexed by xPlore.

Index agent timeout


DM_INDEX_AGENT_ITEM_TIMEOUT errors in the index agent log indicate that the indexing
requests have not finished in the specified time (runaway_item_timeout). It does not mean that the
documents were not indexed. These indexing requests are stored in the xPlore processing queue.
You can also have index agent timeouts for large files. To increase text (content) and file size
maximums, change the values of configuration for max_text_threshold, request_time_out, and
max_data_per_process.
Run the xPlore administrator report Document Processing Error Summary to see timeouts.
For these timeouts, change the following values in indexagent.xml (a hypothetical fragment appears
after the note below). This file is located in
xplore_home/jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.
exporter/content_clean_interval: Increase.
indexagent_instance/runaway_item_timeout: In migration mode, set this parameter to the same
value as content_clean_interval.
In normal mode, also set runaway_item_timeout in the dm_ftindex_agent_config object. For
example, using iAPI:
retrieve,c,dm_ftindex_agent_config
set,c,l,runaway_item_timeout
<value>
save,c,l

Where <value> is the timeout in seconds.


request_timeout_sec: Increase from 20 min. to 30 min (1800 sec). Add this parameter to the file: it
is not exposed because of the potential out of memory error. The default value is 1200 sec.
Note: Increasing the timeout values can cause an out-of-memory error. EMC recommends that you
test the system under load to make sure that your changes are safe.
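A hypothetical indexagent.xml fragment for the content_clean_interval and runaway_item_timeout
settings described above. The surrounding elements are abbreviated and the values are illustrative
only; request_timeout_sec is not shown because its location in the file is not documented here.

<exporter>
...
<content_clean_interval>3600</content_clean_interval>
</exporter>
<indexagent_instance>
...
<runaway_item_timeout>3600</runaway_item_timeout>
</indexagent_instance>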

Submitting objects for reindexing


You can submit for reindexing the list of objects that ftintegrity generates (Using ftintegrity, page 73.)

To check on the status of queue items that have been submitted for reindexing, use the following DQL.
For username, specify the user who logged in to the index agent UI and started reindexing.
select task_name,item_id,task_state,message from dmi_queue_item where
name='username' and event='FT re-index'

If task_state is done, the message is "Successful batch..." If the task_state is failed, the message is
"Incomplete batch..."
To resubmit one document for reindexing
Put the object ID into a temporary text file. Use the index agent UI to submit the upload: Choose
Index selected list of objects > Object File.
To remove queue items from reindexing
Use the following DQL. For username, specify the user who logged in to the index agent UI and started
reindexing.
delete dmi_queue_item object where name='username' and
event='FT re-index'

Problem: ACL and group objects are not replicated in xPlore
If the administrator has disabled full-text events for the full-text user on these objects, they are not
replicated to xPlore. To verify that the full-text user is registered to generate events, run the following
query:
?,c,select registered_id, event from dmi_registry where user_name =
'dm_fulltext_index_user'

You should see a result like the following:


registered_id      event
----------------   ----------------
030000f280000105   dm_move_content
030000f280000105   dm_checkin
030000f280000105   dm_readonlysave
030000f280000105   dm_destroy
030000f280000104   dm_save
030000f280000104   dm_destroy
030000f280000105   dm_save
030000f280000101   dm_save
030000f280000101   dm_destroy
030000f280000101   dm_saveasnew

Problem: Checkout events are not registered for indexing


By default dm_checkout events do not generate indexing. Checkout-related changes like r_lock_owner
and r_lock_machine are not updated in xPlore. For example, search results do not display content
as checked out.
You can register dm_checkout for indexing, but it has a performance impact.

Problem: Content Server cannot find the index agent on a different domain
The index agent installer does not ask for the host name of the index agent host. Instead, it uses
InetAddress.getHostName and registers the result in the dm_server_config app_server_uri attribute.
This is not a fully qualified domain name (FQDN). To work around this issue, manually update the
value in dm_server_config with an FQDN value.

Problem: Indexing is slow


The bottleneck can be in the RDBMS, index agent, or xPlore.
If a simple SQL query like count(*) from the sysobject table takes a long time (see the example after
this list), the RDBMS is slow. Try a restart or run update statistics.
If the index agent log shows CONNECTOR_PAUSED or EXPORTER_PAUSED, the problem
is in the index agent.
If the index agent log shows INDEXER_PAUSED, the problem is in xPlore.
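For example, a quick timing check against the sysobject table might look like the following (the
table name dm_sysobject_s assumes a standard Content Server database schema):

select count(*) from dm_sysobject_s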

Automatically stop indexing after errors


When the index agent receives a response from xPlore, a counter is updated for each
error message. When the counter exceeds a configurable error_threshold, the index agent
performs the configured action, for example, stops indexing. To edit the error thresholds,
stop the index agent instance and edit the file indexagent.xml. (This file is located in
xplore_home/jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.)
Locate the element error_configs, just after the closing tag of indexer_plugin_config.
For example, to automatically stop the index agent when the number of CONNECTION FAILURE
errors reach 10 in 300 seconds, add the following to indexagent.xml:
<error_config>
<error_code>CONNECTION FAILURE</error_code>
<error_threshold>10</error_threshold>
<time_threshold>300</time_threshold>
<action>stop</action>
</error_config>

Each error_config element contains the following elements:


Table 11. Index agent error configuration

error_config
Contains the error_code, error_threshold, time_threshold, and action elements.

error_code
See Error codes, page 89.

error_threshold
Number of errors at which the action is executed.

time_threshold
Time in seconds at which to check the counter. If error_threshold is exceeded, the action is executed.

action
Valid value: stop


Table 12. Error codes

UNSUPPORTED_DOCUMENT: Unsupported format
XML_ERROR: XML parsing error for document content
DATA_NOT_AVAILABLE: No information available
PASSWORD_PROTECTED: Password protected or document encrypted
MISSING_DOCUMENT: RTS routing error
INDEX_ENGINE_NOT_RUNNING: xPlore indexing service not running
CONNECTION FAILURE: xPlore server is down

By default, if xPlore server is down (CONNECTION FAILURE error), the indexing and the data
ingestion stop after the specified number of errors happens in the specified time period. In this case,
the index agent status displayed is "finished". When the problem is solved and xPlore is up and
running, use the index agent UI to stop and restart the index agent.

Problem: DM_SYSOBJECT_E_CANT_SAVE_NO_LINK ERROR
The error in the index agent log is Cannot save xxx sysobject without any link. Possible causes:
The index agent configurator failed to retrieve full-text repository objects.
The index agent installation user does not have a default folder defined in the repository, or the
folder no longer exists.
To verify, dump the user with the following iAPI commands. Substitute the installation owner name.
retrieve,c,dm_user where user_name='installation_owner'
get,c,l,default_folder

Cleaning up the index queue


In Documentum Administrator, you can check the index agent queue. Navigate to Administration >
Indexing Management > Index Queue. The drop-down list displays Indexing failed, Indexing in
progress, Awaiting indexing, Warning, and All. From the Indexing failed display, you can find the
object ID and type, and the type of failure. Some types of errors are the following:
[DM_FULLTEXT_E_SEARCH_GET_ROW_FAIL...] Caused by incorrect query plugin
[DM_FULLTEXT_E_QUERY_IS_NOT_FTDQL...] Caused by incorrect query plugin
[DM_FULLTEXT_E_EXEC_XQUERY_FAIL...] There is nothing in the index.
To sort by queue state when there is a large queue, use the following DQL command in Documentum
Administrator:
select count(*), task_state from dmi_queue_item where name like '%fulltext%'
group by task_state

To check the indexing status of a single object, get the queue item ID for the document in the details
screen of the index agent UI. Use the following DQL to check the status of the queue item:
select task_name,item_id,task_state,message from dmi_queue_item where
name='username' and event='FT re-index'

Cleaning up the index queue


You can clean up the index queue before restarting the index agent. Using iAPI in Documentum
Administrator, remove all dmi_queue_items with the following command, inserting the full-text
user for the value of name:
?,c,delete dmi_queue_item object where name = 'dm_fulltext_index_user'

To check registered types and the full-text user name, use the following iAPI command.
?,c,select distinct t.name, t.r_object_id, i.user_name from dm_type t,
dmi_registry i where t.r_object_id = i.registered_id and i.user_name like
'%fulltext%'

You see results like the following:


name          r_object_id       user_name
------------  ----------------  ----------------------
dm_group      0305401580000104  dm_fulltext_index_user
dm_acl        0305401580000101  dm_fulltext_index_user
dm_sysobject  0305401580000105  dm_fulltext_index_user

Index agent startup issues


Firewall issues
The following error is logged in indexagent.log:
com.xhive.error.XhiveException [TRANSACTION_STILL_ACTIVE]
The operation can only be done if the transaction is closed

Enable connections between the index agent host, the Content Server, and xPlore through the firewall.

Startup problems
Make sure that the index agent web application is running. On Windows, verify that the Documentum
Indexagent service is running. On Linux, verify that you have instantiated the index agent using
the start script in xplore_home/jboss5.1.0/server.
Make sure that the user who starts the index agent has permission in the repository to read all content
that is indexed.
If the repository name is reported as null, restart the repository and the connection broker and try again.
If you see a status 500 on the index agent UI, examine the stack trace for the index agent instance. If a
custom routing class cannot be resolved, this error appears in the browser:
org.apache.jasper.JasperException: An exception occurred processing JSP page

/action_dss.jsp at line 39
...
root cause
com.emc.documentum.core.fulltext.common.IndexServerRuntimeException:
com.emc.documentum.core.fulltext.client.index.FtFeederException:
Error while instantiating collection routing custom class...

If the index agent web application starts with port conflicts, stop the index agent with the script. If you
run a stop script, run as the same administrator user who started the instance. The index agent locks
several ports, and they are not released by closing the command window.

Restarting the index agent


If you stop and restart the index agent before it has finished indexing a batch of documents that you
manually submitted through the index agent UI, resubmit the indexing requests that were not finished.

Cannot stop the index agent


If you have configured two index agents on the same host and port, you see the following error
message when you attempt to stop the agent:
Exception in thread "main" java.lang.SecurityException:
Failed to authenticate principal=admin, securityDomain=jmx-console

You can kill the JVM process and run the index agent configurator to give the agents different ports.

Content query returned no data


If you see the warning Content query returned no data in the index agent log, the index agent was
not able to get the content for the file with the specified ID. The object is not indexed and remains
in the indexing queue. For example:
Content query returned no data for object
09026d9180049404 DfFtExportContext.java:363

Possible causes: I/O error, file permissions, or network packet failure.


Chapter 5
Document Processing (CPS)
This chapter contains the following topics:

About CPS

Adding a remote CPS instance

Removing a CPS instance

Configuring CPS dedicated to indexing or search

Administering CPS

Modifying CPS configuration file

Maximum document and text size

Configuring languages and encoding

Indexable formats

Lemmatization

Handling special characters

Configuring stop words

Troubleshooting content processing

Adding dictionaries to CPS

Custom content processing

About CPS
The content processing service (CPS) performs the following functions:
Retrieves indexable content from content sources
Determines the document format and primary language
Parses the content into index tokens that xPlore can process into full-text indexes
If you test Documentum indexing before performing migration, first replicate security. See Manually
updating security, page 52.
For information on customizations to the CPS pipeline, see Custom content processing, page 122.

Language identification
Some languages have been tested in xPlore. Many other languages can be indexed. Some languages
are identified fully including parts of speech, and others require an exact match. For a list of languages

that CPS detects, see Basistech documentation. If a language is not listed as one of the tested languages
in the xPlore release notes, search must be for an exact match. For tested languages, linguistic features
and variations that are specific to these languages are identified, improving the quality of search
experience.

White space
White space such as a space separator or line feed identifies word separation. Then, special characters
are substituted with white space. See Handling special characters, page 108.
For Asian languages, white space is not used. Entity recognition and logical fragments guide the
tokenization of content.

Case sensitivity
All characters are stored as lowercase in the index. For example, the phrase "I'm runNiNg iN THE
Rain" is lemmatized and tokenized as "I be run in the rain".
There is a limited effect of case on lemmatization. In some languages, a word can have different
meanings and thus different lemmas depending on the case.
Case sensitivity is not configurable.

Adding a remote CPS instance


By default, every xPlore instance has a local CPS service. Each CPS service receives processing
requests on a round-robin basis. For a high-volume environment with multiple xPlore instances, you
can configure a dedicated CPS for each instance. You must have low network latency. This dedicated
CPS reduces network overhead. See Configuring CPS dedicated to indexing or search, page 96.
To improve indexing or search performance, you can install CPS on a separate host. The installer adds
a JBoss instance, CPS ear file, and CPS native daemon on the remote host.
Note: The remote instance must be on the same operating system as other xPlore instances.
1. Install the remote CPS instance using the xPlore installer.
2. Configure the instance for CPS only.
3. Register the remote CPS instance in xPlore administrator. Open Services > Content Processing
Service in the tree.
4. In the Content Processing Service page, click Add.
5. In the Add Service window, select Remote and provide information of the remote CPS instance
you are adding.
a. Enter the URL to the remote instance using the following syntax:
http://hostname:port/services

For example:
http://DR:8080/services


b. From the Instance list, select an instance you want to add the CPS to.
c. From the Usage list, specify whether the CPS instance processes indexing requests (the index
option), search requests (the search option), or both (the all option).
d. Click OK.
Note: Once added, the remote CPS appears as UNREACHABLE. Restart all xPlore instances
for it to take effect.
6. Specify whether the CPS instance performs linguistic processing (lp) or text extraction (te). If a
value is not specified, TE and LP are sent to CPS as a single request.
a. In indexserverconfig.xml, locate the content-processing-services element. This element
identifies each CPS instance, The element is added when you install and configure a new
CPS instance.
b. Add or change the capacity attribute on this element. The capacity attribute determines whether
the CPS instance performs text extraction, linguistic processing, or all. In the following
example, a local CPS instance analyzes linguistics, and the remote CPS instance processes
text extraction.
<content-processing-services analyzer="rlp" context-characters="
!,.;?&quot;" special-characters="@#$%^_~*&amp;:()-+=&lt;&gt;/\[]{}">
<content-processing-service capacity="lp" usage="all" url="local"/>
<content-processing-service capacity="te" usage="index" url="
http://myhost:9700/services"/>
</content-processing-services>

7. Restart the CPS instance using the start script startCPS.bat or startCPS.sh in
xplore_home/jboss5.1.0/server. (On Windows, the standalone instance is installed as an automatic
service.)
8. Test the remote CPS service using the WSDL testing page, with the following syntax:
http://hostname:port/services/cps/ContentProcessingService?wsdl

After you install and register the remote instance, you see it in the Content Processing Service UI of
xPlore administrator. You can check the status and see version information and statistics.
Check the CPS daemon log file cps_daemon.log for processing event messages. For a local process,
the log is in xplore_home/jboss5.1.0/server/DctmServer_ PrimaryDsearch/logs. For a remote CPS
instance, cps_daemon.log is located in cps_home/jboss5.1.0/server/cps_instance_name/logs. If a CPS
instance is configured to process text only, TE is logged in the message. For linguistic processing, LP
is logged. Both kinds of processing log CF messages.

Removing a CPS instance


By default, every xPlore instance has a local CPS service. After you add a remote CPS to the local
xPlore instance, you can disable or remove the local CPS so that the local xPlore instance can leverage
extra CPU and memory capacity on the remote host.
To remove an existing CPS instance, in the Content Processing Service page, click the Remove action
button (red cross) next to the CPS you want to remove. Confirm the CPS removal and restart all xPlore
instances for the new configuration to take effect.


Note: An xPlore instance must have at least one CPS configured for it. If an xPlore instance has only
one CPS, either local or remote, you cannot disable or remove it.

Configuring CPS dedicated to indexing or search
By default, every xPlore instance has a local CPS service. All CPS services receive processing
requests on a round-robin basis. For high-volume environments with multiple xPlore instances,
you can configure one or more CPS services to handle all processing requests for a specific xPlore
instance. This reduces network overhead.
You can configure additional CPS instances that are dedicated to indexing, for high ingestion
requirements, or dedicated to search, for heavy search usage.
1. Stop all xPlore instances.
2. Edit indexserverconfig.xml in an XML editor. In the following example, two instances are shown
in node elements. The content-processing-services element specifies the two CPS instances that are
available to all xPlore instances. Set the value of the usage attribute to all, index, or search.
<node appserver-instance-name="PrimaryDsearch" ...primaryNode="true"...
url="http://host1:9300/dsearch/"... hostname="host1" name="PrimaryDsearch">
...
<node appserver-instance-name="DsearchNode2" xdb-listener-port="9430"
primaryNode="false" ... url="http://host2:9300/dsearch/"... hostname="host2"
name="DsearchNode2">
...
<content-processing-services...>
<content-processing-service usage="all" url="http://host1:20000/services"/>
<content-processing-service usage="search" url="http://host1:21000/services"/>
</content-processing-services>

3. Create a content-processing-services element under each node element, after the node/properties
element, like the following:
<node appserver-instance-name="PrimaryDsearch" ...primaryNode="true"...
url="http://host1:9300/dsearch/"... hostname="host1" name="PrimaryDsearch">
...
<properties>
<properties>
<property value="10000" name="statusdb-cache-size"/>
</properties>
<content-processing-services analyzer="rlp" context-characters="
!,.;?&quot;"
special-characters="@#$%^_~*&amp;:()-+=&lt;&gt;/\[]{}">
</content-processing-services>
<logging>...

4. Move the shared CPS instances to each node, where they become dedicated CPS instances for that
node. Place their definitions within the content-processing-services element that you created. For
example:
<node appserver-instance-name="PrimaryDsearch" ...primaryNode="true"...
url="http://host1:9300/dsearch/"... hostname="host1" name="PrimaryDsearch">
...


<properties>
<properties>
<property value="10000" name="statusdb-cache-size"/>
</properties>
<content-processing-services analyzer="rlp" context-characters="
!,.;?&quot;"
special-characters="@#$%^_~*&amp;:()-+=&lt;&gt;/\[]{}">
<content-processing-service usage="all" url="http://host1:20000/services"/>
</content-processing-services>
<logging>...

Make sure that you have a remaining global CPS instance after the last node element. (You have
moved one or more of the content-processing-service elements under a node.) For example:
<content-processing-services analyzer="rlp" context-characters="!,.;?&quot;"
special-characters="@#$%^_~*&amp;:()-+=&lt;&gt;/\[]{}">
<content-processing-service usage="all" url="local"/>
</content-processing-services>

Administering CPS
Starting and stopping CPS
You can configure CPS tasks in xPlore administrator. In the left pane, expand the instance and click
Content Processing Service. Click Configuration.
1. Stop CPS: Select an instance in the xPlore administrator tree, expand it, and choose Content
Processing Service. Click Stop CPS and then click Suspend.
2. Start CPS: Select an instance in the xPlore administrator tree, expand it, and choose Content
Processing Service. Click Start CPS and then click Resume.
If CPS crashes or malfunctions, the CPS manager tries to restart it to continue processing.

CPS status and statistics


To view all CPS instances in the xPlore federation, expand Services > Content Processing Service. For
more information on remote instances, see Adding a remote CPS instance, page 94.
To view version information and statistics about a specific CPS instance, expand the instance and
click Content Processing Service.

Modifying CPS configuration file


You can modify some linguistic or text extraction parameters in CPS configuration file.
The following procedure provides high-level steps that you must perform for any change in CPS
configuration file.
1. Stop all CPS instances.
2. Make a copy of the CPS configuration file PrimaryDsearch_local_configuration.xml located in
xplore_home/dsearch/cps/cps_daemon.

3. Edit the CPS configuration file.


4. Make the required changes using an XML editor. Changes must be encoded in UTF-8. Do not use
a simple text editor such as Notepad, which can insert characters using the native OS encoding
and cause validation to fail.
5. Make the same changes in the configuration files of all remote CPS instances. The
configuration file of a remote CPS instance is named CPS_configuration.xml and located at
xplore_home/dsearch/cps/cps_daemon.
Remember to make a copy of the configuration files before modifying them.
6. Start all CPS instances.

Maximum document and text size


This section describes the configuration properties related to the size of documents that are indexed.
Depending on your documents, modifying these settings can improve or impact the ingestion
performance. To index large documents concurrently, implement one of the following solutions:
Add more memory and CPU capacity and stop unnecessary processes that compete for resources.
Add CPS daemons as described in Adding CPS daemons for ingestion or query processing, page
114. Only add CPS daemons if xPlore host has enough memory and CPU capacity.
Add remote CPS instances as described in Adding a remote CPS instance, page 94.
If xPlore host has enough memory and CPU capacity, you can add a CPS instance to the same host,
as described in Adding CPS instances in EMC Documentum xPlore Installation Guide.
Enable retry in a separate daemon for documents that failed during CPS processing.
1. Edit the CPS configuration file PrimaryDsearch_local_configuration.xml located in
xplore_home/dsearch/cps/cps_daemon.
2. Edit the retry_failure_in_separate_daemon property and set it to true. Default: False.
This can slightly impact performance.
To avoid query timeouts when the indexing load is high, enable search on a dedicated CPS daemon:
1. Edit the CPS configuration file PrimaryDsearch_local_configuration.xml located in
xplore_home/dsearch/cps/cps_daemon.
2. Edit the query_dedicated_daemon_count property and set the number of daemons dedicated to
searches. Default: 0.

Maximum document size


Indexing agent (Documentum only) limits the size of the documents submitted for indexing. Stop the
index agent instance to change the size limit.
Set the maximum document size in the index agent
configuration file indexagent.xml, which is located in
indexagent_home/jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.
Edit the contentSizeLimit parameter within the parent element exporter. The value is
in bytes. Default: 20 MB.
Larger documents are skipped. Only their metadata is indexed.
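A hypothetical indexagent.xml fragment for this setting (surrounding elements are abbreviated;
20971520 bytes corresponds to the 20 MB default):

<exporter>
...
<contentSizeLimit>20971520</contentSizeLimit>
</exporter>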

Maximum text size per document


CPS limits the size of text within a document that is indexed. A document can have a much greater
size (contentSizeLimit) compared to the indexable text within the document.
Set the maximum size of text within a document and the text in CPS batch in CPS configuration.
Choose an instance in xPlore administrator and click Configuration. Edit the max text threshold
parameter to set the size limit, in bytes. Default: 10485760 (10 MB). Maximum setting: 2 GB. Larger
values can slow ingestion rate and cause more instability.
The size includes expanded attachments. For example, if an email has a zip attachment, the zip file is expanded
to evaluate document size. If you increase this threshold, ingestion performance can degrade under
heavy load.
If a document content text size is larger than this value and if the cut off text parameter is set to false,
CPS indexes only the metadata, not the content. Other documents in the same batch are not affected.
Increasing the maximum text size can negatively affect CPS memory consumption under heavy load.
In this case, the entire batch of submitted documents fails.

Partially index large documents (cut off text)


You can configure CPS to cut off text in documents that exceed max text threshold. When
configured, CPS indexes part of the content, up to the threshold instead of only the metadata. Set
cut_off_text to true in PrimaryDsearch_local_configuration.xml (default: false). This file is located in
xplore_home/dsearch/cps/cps_daemon.
Documents that are partially indexed are recorded in cps_daemon.log: docxxxx is partially processed.
The dftxml is annotated with the attribute partialIndexed on the dmftcontentref element.

Maximum text size in CPS batch


CPS processes documents by batches. You can configure the upper limit for a batch of documents
in CPS processing. Incoming requests are put on hold if the total content size for all documents in a
CPS batch exceeds the limit.
Edit the CPS configuration file PrimaryDsearch_local_configuration.xml located in
xplore_home/dsearch/cps/cps_daemon. Edit max_data_per_process and set its value in bytes. Default:
30 MB. Maximum setting: 2 GB.
For a hypothetical example, with the default of 30 MB, 3 documents with 10-MB text, 30 documents
of 1-MB text, or 300 documents of 100 KB of text would fill up the batch. If the batch text size is
exceeded, all documents in the batch fail. You can refeed them through the index agent.

Embedded content used for index rebuild


Set the maximum in xPlore administrator. Choose Indexing Service in the tree and click
Configuration. Edit the value of rebuild-index-embed-content-limit. Below this value, content is
embedded in requests sent to CPS. It is used for rebuilding the index. Larger content is passed in a file,
not embedded. Default: 2048 bytes.

Embedded XML content maximum size


Set the maximum size of XML content in bytes that is embedded within the xDB index. Edit
indexserverconfig.xml and set the value of the file-limit attribute of the xml-content element. Default:
512 KB.
If your repository has many XML documents and a summary is requested with search results, these
files may not display the summary properly. In this case, set xml-content embed=none.
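For example, a hypothetical xml-content element in indexserverconfig.xml that disables embedding
and keeps the default file limit (524288 bytes is 512 KB; any other attributes of the element are
omitted here):

<xml-content embed="none" file-limit="524288"/>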

Very large documents


In order to process a very large file (more than 100 MB), you can temporarily enlarge several thresholds
in indexagent.xml and CPS PrimaryDsearch_local_configuration.xml. In CPS configuration:
max_text_threshold, request_time_out, max_data_per_process. In Index Agent configuration:
runaway_item_timeout, content_clean_interval, request_timeout_sec. These changes will degrade
xPlore performance.

Configuring languages and encoding


Some languages have been tested in xPlore. (See the release notes for this version.) Many other
languages can be indexed. Some languages are identified fully including parts of speech, and others
are restricted to an exact match. For the list of identified languages and encodings for each language,
see the Basistech documentation.

Configuring language identification


The language of the content determines how the document is tokenized. During indexing, CPS
identifies the language of the document. If your repository has a single language, you can override
this behavior by setting a default locale for indexing. You can set a default locale for indexed content
and one for metadata. You can also set a default locale for queries.
To force a default indexing locale, add the following property to every category definition in
indexserverconfig.xml. This setting turns off language identification. Do it only if your repository has
a majority of documents in a single language. Use the 2-letter language code for the value attribute:
<category name="dftxml">
<properties>
...<property value="en" name="index-default-locale"/>
</properties>

To force a different default locale for metadata, add a property to every category definition like the
following:
<category name="dftxml">
<properties>
...<property value="fr" name="index-metadata-default-locale"/>
</properties>

For a query, the session locale from one of the following is used as the language for linguistic analysis:
Webtop login locale

DFC API locale or dfc.properties


iAPI and iDQL Content Server locale
The language of a query is identified by content or metadata:
Content
CPS uses the first 65536 characters in the document to identify the language of the document.
Metadata
By default, CPS uses the following Documentum object attributes to identify the language of the
metadata: object_name, title, subject, and keywords.
To set query locale:
1. Configure a default language if one is not identified. The default language must
be one of the supported languages listed in the linguistic_processor element
of the file PrimaryDsearch_local_configuration.xml. This file is located in
xplore_home/dsearch/cps/cps_daemon. Open indexserverconfig.xml in xplore_home/config.
2. Add a property named query-default-locale with the desired language under the search-config
element. For example:
<search-config><properties>
<property value="en" name="query-default-locale"/>...

The query locale can be overridden by setting the property dsearch_override_locale in
dm_ftengine_config.
3. Change the metadata that are used for language identification. Set an attribute as the value of the
name on the element element-for-language-identification. For example:
<linguistic-process>
<element-for-language-identification name="object_name"/>
<element-for-language-identification name="title"/>
<element-for-language-identification name="subject"/>
<element-for-language-identification name="keywords"/>
</linguistic-process>

4. Validate your changes to indexserverconfig.xml. See Modifying indexserverconfig.xml, page 43.


5. (Optional) Check the identified language for a document: Use xPlore administrator to view the
dftxml of a document. Click the document in the collection view, under Data Management. The
language is specified in the lang attribute on the dmftcontentref element. For example:
<dmftcontentref content-type="" lang="en" encoding="utf-16le" ...>

6. (Optional) Check the session locale for a query. Look at the xPlore log event that prints the query
string (in dsearch.log of the primary xPlore instance). The event includes the query-locale setting
used for the query. For example:
query-locale=en

7. (Optional) Change the session locale of a query. The session_locale attribute on a Documentum
object is automatically set based on the OS environment. To search for documents in a different
language, change the local per session in DFC or iAPI. The iAPI command to change the
session_locale:
set,c,sessionconfig,session_locale


The DFC command to set session locale on the session config object (IDfSession.getSessionConfig):
IDfTypedObject.setString("session_locale", locale)

Overriding language identification for metadata


In some situations, the language used by metadata and content should be consistent, but the language
identified for metadata does not always represent the language used by the corresponding content.
To avoid this, suppress language identification for metadata so that CPS always uses the language
identified for the content.
In indexserverconfig.xml, set the following property value to True under the index-config element.
Add the property if not present in the configuration file.
<property value="True" name "use-content-lang-for-metadata"/>

For multi-language documents, set this option to False.


The index-metadata-default-locale property overrides the use-content-lang-for-metadata
property in indexserverconfig.xml. For example, if you set index-default-locale to "en",
index-metadata-default-locale to "fr", and use-content-lang-for-metadata to True, CPS uses French
as the language for metadata rather than English.
Note: If the extraction configuration option xml-content index-as-sub-path is set to True, the
use-content-lang-for-metadata does not take effect.

Configuring query languages


Add the following properties to the search-config element in indexserverconfig.xml to support
language detection for queries. Query language is detected or bypassed depending on these properties
(a hypothetical example follows the list):
query-enable-language-detection: Set to true to detect query language. If false, the query application
server language is used.
query-no-detection-input-language: If the session locale matches one of the listed languages then
the detection is not attempted.
For example, the user session locale is in English, but the user queries in multiple languages.
English (en) should not be listed in query-no-detection-input-language. It allows auto language
detection. If a user always queries in the same language as the session locale, then add the language
to the list. It prevents the document language from being identified incorrectly.
query-detection-language-override: List the languages that override the input language when they
are detected. If the detected language matches one of the languages listed, the query application
server language is overridden with the detected language.
After you change these properties, restart xPlore instances.
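A sketch of these properties as they would appear under search-config in indexserverconfig.xml; the
property names are the ones listed above, while the language values are examples only (consult your
file for how to list multiple languages):
<search-config>
<properties>
<property value="true" name="query-enable-language-detection"/>
<property value="en" name="query-no-detection-input-language"/>
<property value="fr" name="query-detection-language-override"/>
</properties>
</search-config>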

Handling apostrophes
Some languages use apostrophes as part of names or other parts of speech. The default list
of special context characters includes the apostrophe. Apostrophes within words are treated as white
space. You can remove the apostrophe from the list if words are not correctly found on search. See
Handling special characters, page 108.

Supporting bidirectional languages


Bidirectional languages are languages like Arabic or Hebrew. They are written mainly from
right to left, but some portions of the text are written from left to right. You must configure the
following property to support bidirectional languages for files in PDF format. Do not set this if your
environment does not index bidirectional documents, because this change impacts performance. Edit
PrimaryDsearch_local_configuration.xml in xplore_home/dsearch/cps/cps_daemon. Add or edit the
reorder_bidi_pdf property to the stellent text_extractor element:
<text_extractor>
<name>stellent</name>
...
<properties>
<property name="extract_xmp_metadata">true</property>
<property name="identify_file_normally">true</property>
<property name="read_buffer_size">2</property>
<property name="allow_sub_doc_failure">true</property>
<property name="reorder_bidi_pdf">true</property>
</properties>
</text_extractor>

If you have already indexed objects in PDF format that would be affected by this change, you must
reindex them.

Indexable formats
Some formats are fully indexed. For some formats, only the metadata is indexed. For a full list of
supported formats, see Oracle Outside In 8.3.7 documentation.
If a format cannot be identified, it is listed in the xPlore administrator report Document Processing
Error Detail. Choose File format unsupported to see the list.

Lemmatization
About lemmatization
Configuring indexing lemmatization
Lemmatizing specific types or attributes
Troubleshooting lemmatization
Saving lemmatization tokens

About lemmatization
Lemmatization is a normalization process that reduces a word to its canonical form. For example, a
word like books is normalized into book by removing the plural marker. Am, are, and is are normalized
to be. This behavior contrasts with stemming, a different normalization process in which stemmed
words are reduced to a string that sometimes is not a valid word. For example, ponies becomes poni.
xPlore uses an indexing analyzer that performs lemmatization. Studies have found that some form
of stemming or lemmatization is almost always helpful in search.
Lemmatization is applied to indexed documents and to queries. Lemmatization analyzes a word for
its context (part of speech), and the canonical form of a word (lemma) is indexed. The extracted
lemmas are actual words.

Alternate lemmas
Alternative forms of a lemma are also saved. For example, swim is identified as a verb. The noun
lemma swimming is also saved. A document that contains swimming is found on a search for swim.
If you turn off alternate lemmas, you see variable results depending on the context of a word. For
example, saw is lemmatized to see or to saw depending on the context. See Configuring indexing
lemmatization, page 105.

Query lemmatization
Lemmatization of queries is more prone to error because less context is available in comparison to
indexing.
The following queries are lemmatized:
IDfXQuery: The query includes the with stemming option.
The query from the client application contains a wildcard.
The query is built with the DFC search service.
The DQL query has a search document contains (SDC) clause (except phrases). For example,
the query select r_object_id from dm_document search document contains companies winning
produces the following tokens: companies, company, winning, and win.

Phrase search and lemmatization


Even though phrases are not lemmatized, they may return a document that has been lemmatized
during indexing. For example, a document contains the word felt. It is lemmatized to felt and feel. A
phrase search for feels happy does not find the document. However, a search for feel happy finds it,
even though the original phrase is felt happy.
You can configure xPlore to match phrases exactly in queries. This requires more space for the index,
because position information is stored. You must reindex documents to get this exact match for
phrases. Also, if you enable this feature, fuzzy search will not work. For information about fuzzy
search, see Configuring fuzzy search, page 215.
To configure exact phrase match in queries:
1. Edit indexserverconfig.xml in xplore_home/config.
2. Add a query-exact-phrase-match property to the search-config element:
<search-config>
<properties>
<property value="en" name="query-default-locale"/>

...
<property value="true" name="query-exact-phrase-match"/>
</properties>
</search-config>

3. Restart the primary and secondary xPlore instances.


4. Rebuild the index using xPlore administrator.

Lemmatization and index size


Lemmatization saves both the indexed term and its canonical form in the index, effectively doubling
the size of the index. Multiple alternates for the lemma also increase the size of the index. Evaluate
your environment for index storage needs and lemmatization expectations for your users.

Configuring indexing lemmatization


For information on configuring query lemmatization, see Configuring query lemmatization, page 212.
1. To turn off lemmatization for indexing, add an enable-lemmatization attribute to the
index-server-configuration element in indexserverconfig.xml. Set the value to false. See Modifying
indexserverconfig.xml, page 43.
Note: If you wish to apply your lemmatization changes to the existing index, reindex your
documents.
2. Alternate lemmas are generated by default. To turn off alternate lemmas, modify the file
cps_context.xml located in xplore_home/dsearch/cps/cps_daemon. Set the value attribute on the
following property to false (default = true):
<property name="com.basistech.bl.alternatives" value="false"/>

Lemmatizing specific types or attributes


By default, all input is lemmatized unless you configure lemmatization for specific types or attributes.
1. Open indexserverconfig.xml in xplore_home/config.
2. Locate the category element for your documents (not ACLS and groups). Add or edit a
linguistic-process element. This element can specify elements or their attributes that are
lemmatized when indexed, as shown in the following table of child elements.
Table 13. linguistic-process element

element-with-name: The name attribute on this element specifies the name of an element that contains
lemmatizable content.

save-tokens-for-summary-processing: Child of element-with-name. If this element exists, the parent
element's tokens are saved. They are used in determining a summary or highlighting. Specify the
maximum size of documents in bytes as the value of the extract-text-size-less-than attribute. Tokens
are not saved for larger content. Set the maximum size of tokens for the element as the value of the
token-size attribute.

element-with-attribute: The name attribute on this element specifies the name of an attribute on an
element. The value attribute contains a value of the attribute. When the value is matched, the element
content is lemmatized.

element-for-language-identification: Specifies an input element that is used by CPS to identify the
language of the document.

In the following example, the content of an element with the attribute dmfttype with a value of
dmstring is lemmatized. These elements are in a dftxml file that the index agent generates. For the
dftxml extensible DTD, see Extensible Documentum DTD, page 348.
If the extracted text does not exceed 262144 bytes (the extract-text-size-less-than value), the specified
element is processed. In the following example, an element with the name dmftcustom is processed.
Several elements are specified for language identification.
<linguistic-process>
<element-with-attribute name="dmfttype" value="dmstring"/>
<element-with-name name="dmftcustom">
<save-tokens-for-summary-processing extract-text-size-less-than="262144" token-size="65536"/>
</element-with-name>
<element-for-language-identification name="object_name"/> ...
</linguistic-process>

Note: If you wish to apply your lemmatization changes to the existing index, reindex your documents.

Troubleshooting lemmatization
If a query does not return expected results, examine the following:
Test the query phrase or terms for lemmatization and compare to the lemmatization in the context of
the document. (You can test each sample using xPlore administrator Test Tokenization.)
View the query tokens by setting the dsearch logger level to DEBUG using xPlore administrator.
Expand Services > Logging and click Configuration. Set the log level for dsearchsearch. Tokens
are saved in dsearch.log.
Check whether some parts of the input were not tokenized because they were excluded from
lemmatization: Text size exceeds the configured value of the extract-text-size-less-than attribute.
Check whether a subpath excludes the dftxml element from search. (The sub-path attribute
full-text-search is set to false.)
If you have configured a collection to save tokens, you can view them in the xDB admin tool. (See
Debugging queries, page 259.) Token files are generated under the Tokens library, located at the
same level as the Data library. If dynamic summary processing is enabled, you can also view tokens
in the stored dftxml using xPlore administrator. The number of tokens stored in the dftxml depends
on the configured amount of tokens to save. To see the dftxml, click a document in a collection.
Figure 9. Tokens in dftxml

Saving lemmatization tokens


You can save the tokens of metadata and content. Tokens are used to rebuild the index.
1. Open indexserverconfig.xml in xplore_home/config.
2. Set the property save-tokens to true for the collection. The default is false.
For example:
<collection document-category="dftxml" usage="Data" name="default">
<properties>
<property value="true" name="save-tokens" />
</properties>
</collection>

3. You can view the saved tokens in the xDB tokens database. Open the xDB admin tool in
xplore_home/dsearch/xhive/admin.
Figure 10. Tokens in xDB

The tokens database stores the following tokens:


Original and root forms (lemmas) of the text
Alternate lemmas
The components of compound words
The starting and ending offset relative to the field the text is contained in
Whether the word was identified as a stop word

Handling special characters


Special characters include accents in some languages. Accents are removed to allow searches for words
without supplying the accent in the term. You can disable diacritics removal. Other special characters
are used to break text into meaningful tokens: characters that are treated as white space, and punctuation.
If the following Unicode characters are contained within a word, they can be indexed. Words that
contain these characters are searchable:
Alphabetic characters
Numeric characters
Extender characters. Extender characters extend the value or shape of a preceding alphabetic
character. These are typically length and iteration marks.
Custom characters enclosing Chinese, Japanese, and Korean letters and months. These characters
are derived from a number of custom character ranges that have bidirectional properties, falling in
the 3200-32FF range. The specific character ranges are:
3200-3243
3260-327B
327F-32B0
32C0-32CB
32D0-32FE

Handling integer tokenization


By default, integers with spaces or dashes generate individual tokens in some languages, like English,
but not in others, like German. For example, 500 599 generates two tokens in English (500 and 599)
but only one in German (500 599). A search for 500 from a German locale does not find the document
with 500 599 in the object name.
To generate individual tokens, add the following to the CPS configuration file
PrimaryDsearch_local_configuration.xml located in xplore_home/dsearch/cps/cps_daemon. Note that
it can have an impact on ingestion and search performance.
<property name="force_tokenize_on_whitespace">true</property>

Handling words with accents (diacritics)


Words with accents, such as those in English, French, German, and Italian, are indexed as normalized
tokens (accents removed) to allow search for the same word without the accent. However diacritics are
not removed from a query regardless of the normalize_form setting when the term contains a wildcard.
You can prevent diacritics from being normalized; in this case, only exact search terms (with accents)
return results. For example, the search term chateau would not return objects containing château.
You can also index both forms, the original form and the normalized form, to allow wildcard
searches. In this case, normalize_form must be set to true.
To prevent diacritics from being normalized:
1. Change diacritics removal in the CPS configuration file PrimaryDsearch_local_configuration.xml
located in xplore_home/dsearch/cps/cps_daemon.
2. Locate the element linguistic_processing/properties/property and set the value of normalize_form
to false:
<property name="normalize_form">false</property>

To store original and normalized forms for words with accents:


Prerequisite: The parameter normalize_form must be set to true.
1. Edit the CPS configuration file PrimaryDsearch_local_configuration.xml located in
xplore_home/dsearch/cps/cps_daemon.
2. Locate the element linguistic_processing/properties/property and set the value of
keep_accents_in_original_form to true:
<property name="keep_accents_in_original_form">true</property>

If objects were already indexed, reindex them.

Characters that are treated as white space


The default special characters are defined in indexserverconfig.xml as the value of the
special-characters attribute on the content-processing-services element:
@#$%^_~&:.()-+=<>/\[]{}

Note: The special characters list must contain only ASCII characters.
For example, a phrase extract-text is tokenized as extract and text, and a search for either term finds the
document.

Characters that are required for context (punctuation)


The default context characters are defined in indexserverconfig.xml as the value of the
context-characters attribute of the content-processing-services element:
!,;?"

Note: The context characters list must contain only ASCII characters.
White space is substituted after the parts of speech have been identified. For example, in the phrase
"Is John Smith working for EMC?", the question mark is filtered out because it functions as a context
special character (punctuation).
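The following sketch shows where the two attribute values quoted above sit on the
content-processing-services element in indexserverconfig.xml; other attributes and child content are
omitted, and characters that XML attribute syntax cannot hold literally are shown escaped:
<content-processing-services special-characters="@#$%^_~&amp;:.()-+=&lt;>/\[]{}"
context-characters="!,;?&quot;" ...>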

Special characters in queries


When a string containing a special character is indexed, the tokens are stored next to each other in the
index. A search for the string is treated as a phrase search. For example, an index of home_base stores
home and base next to each other. A search for home_base finds the containing document but does
not find other documents containing home or base but not both.
If a query fails, check to see whether it contains a special character.
Note: Reindex your documents after you change the special characters list.

Configuring stop words


To prevent searches on common words, you can configure stop words that are filtered out of queries.
Stop words are not removed during indexing, to support phrase searches. Exceptions:
Stop words in phrase searches are not filtered.
Stop words that appear within the context of special characters are not filtered. For example, if the
query term is stop_and_go, and is a stop word. The underscore is defined in indexserverconfig.xml
as a special character. CPS does not filter the underscore.
A sample stop words list in English is provided in en-stopwords.txt in the directory


xplore_home\dsearch\cps\cps_daemon\shared_libraries\rlp\etc. You can edit this file.
The following stop word lists are provided for Chinese, Korean, and Japanese in subdirectories of
xplore_home\dsearch\cps\cps_daemon\shared_libraries\rlp\. Edit them in UTF-8 encoding. Each line
represents one lexeme (stop word). Blank lines or comments beginning with # are ignored.
Chinese: cma/dicts/zh_stop.utf8
Korean: kma/dicts/kr_stop.utf8
Japanese: jam/dicts/JP_stop.utf8
Stop word filtering is not configurable from Documentum interfaces. Stop words are always filtered
for single/multi-term searches, and stop words are not filtered in phrase searches.
Stop word filtering is configurable in an XQuery expression. Add the XQFT option "using stop
words default" to the query constraint.

Adding stop word lists to xPlore


To add stop words lists for other languages, register your lists in the file stop-options.xml in
xplore_home\dsearch\cps\cps_daemon\shared_libraries\rlp\etc. The stop words file must contain
one word per line, in UTF-8 format.
The following example adds a stop words list in Spanish after the English list:
<dictionarypath language="eng">
<env name="root"/>/etc/en-stopwords.txt</dictionarypath>
<dictionarypath language="es">
<env name="root"/>/etc/es-stopwords.txt</dictionarypath>

A sample Spanish stopwords file:


a
adonde
al
como
con
conmigo...

Troubleshooting content processing


CPS troubleshooting methods, page 112
CPS startup and connection errors, page 113
Adding CPS daemons for ingestion or query processing, page 114
Running out of space, page 115
CPS file processing errors, page 116
Troubleshooting slow ingestion and timeouts, page 117

CPS troubleshooting methods


CPS log files
If CPS is installed as an in-process service on an xPlore instance, it shares the logging for the
primary instance web application. The log files cps.log and cps_daemon.log are located in
xplore_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/logs.
Logging in a standalone instance
The logging configuration file is located in the CPS war file, in WEB-INF/classes. The log files cps.log
and cps_daemon.log are located in cps_home/jboss5.1.0/server/cps_instance_name/logs.
Logging separate instances
Make sure that each CPS instance specifies a different log file path. The log output file is specified
in the logback.xml file of the instance. If CPS is installed as a standalone service, the logback file is
located in the CPS war file, in WEB-INF/classes. If CPS is installed as an in-process service on an
xPlore instance, it shares the logback file of the indexserver web application.
Log levels
In order of decreasing amount of information logged: trace, debug, info, warn, and error. Set the log
level to INFO to troubleshoot CPS.
Log output
Each CPS request is logged with the prefix DAEMON#. You see prefixes for the following libraries
in CPS logging:
CPS daemon: CORE
Text extraction: TE STELLENT
HTTP content retrieval: CF_HTTP
Language processing: LP
Language identification: LI_RLI
Following is an example from cps_daemon.log. (The remote CPS log is named cps_manager.log.)
2012-03-28 10:28:41,828 INFO [Daemon0(4496)-TE-Stellent-(136)]
identify_file_normally configured: true, actual: true
...
2012-03-28 11:16:37,204 WARN [Daemon0(4496)-Core-(3448)] LP of lpReq 0
of sub-request 4 of req 32 of doc 080023a38000151f based on fallback lang en,
encoding utf-16le
2012-03-28 11:16:36,829 WARN [Daemon0(4496)-LI-RLI-(5464)] No language matched.

Example: CPS performance by format


Use the timestamp difference between PERFCPSTS9 and PERFCPSTS10 to find the processing time
for a particular document. PERFCPSTS9 indicates that Content fetching of the single request is
finished. PERFCPSTS10 indicates that text extraction of the single request is finished.

Testing tokenization
Test the tokenization of a word or phrase to see what is indexed. Expand Diagnostic and Utilities in
the xPlore administrator tree and then choose Test tokenization. Input the text and select the language.
Different tokenization rules are applied for each language. (Only languages that have been tested are
listed. See the release notes for supported languages. Other languages are not tokenized.)
Uppercase characters are rendered as lowercase. White space replaces special characters.
The results table displays the original input words. The root form is the token used for the index. The
Start and End offsets display the position in raw input. Components are displayed for languages that
support component decomposition, such as German.
Results can differ from tokenization of a full document for the following reasons:
The document language that is identified during indexing does not match the language that is
identified from the test.
The context of the indexed document does not match the context of the text.
Use the executable CASample in xplore_home/dsearch/cps/cps_daemon/bin to test the processing
of a file. Syntax:
casample path_to_input_file

CPS startup and connection errors


CPS cannot connect on 64-bit host
CPS connection errors are recorded in the index agent log like the following:
ERROR IndexingStatus [PollStatusThread-PrimaryDsearch]
[DM_INDEX_AGENT_PLUGIN] Document represented by key 0800000c80002116 failed
to index into Repo, error:Unable to connect to CPS [CPS_ERR_CONNECT].

Workaround: Install the 64-bit version of xPlore.

CPS does not start


The most common cause of CPS non-start is that the port is occupied. The error message is Failed
to create server socket; the port may be occuped. The port is not tested when you
configure CPS. The CPS process (CPSDaemon) runs on port 9322 by default or xplore_base_port+22.
Make sure that the process has started. If it has not started, check the cps_daemon log in
xplore_home/jboss5.1.0/server/primary_instance/logs.
Go to xplore_home/dsearch/cps/cps_daemon/bin and try to run the daemon directly to see whether
it can be started.
Another possible cause is an unsupported OS. Verify the supported OS for your version of xPlore.
If CPS fails to start, the CPS configuration can be invalid. Check to see whether you have changed the
file InstanceName_local_configuration.xml in xplore_home/dsearch/cps/cps_daemon.
CPS restarts often


The CPS load can be too high, causing out of memory errors. Check the use of memory on the CPS
instance.
Suggested workarounds:
Change the CPS configuration: Decrease the number of worker threads.
Resubmit any failed files using the Documentum index agent.
Install a standalone CPS instance, temporarily for high ingestion periods, or permanently.
Throttle indexing for document size or count. See Throttling indexing, page 326.

Remote CPS issues


If you try to rebuild an index without configuring the export path, you see an error in the remote CPS
daemon log like the following:
2011-02-16 22:53:32,646 ERROR [Daemon-Core-(3520)] ...
err-code 513, err-msg C:/xPlore/dsearch/cps/cps_daemon/export/080004578000
0e5bdmftcontentref1671968925161715182.txt cannot be opened.
The system cannot find the file specified.

Set the export_path location in the remote CPS PrimaryDsearch_local_configuration.xml to
a path accessible from all xPlore instances. This file is located in the CPS host directory
xplore_home/dsearch/cps/cps_daemon. If you change this file, restart all instances including the
remote CPS.
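A minimal sketch of the setting, assuming export_path is expressed as an element in that file; the
shared path shown is hypothetical:
<export_path>//shared-storage/xplore/export</export_path>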

Communication error occurred while processing request xxx

Resubmit the indicated document using the index agent UI. If you have more than one document with
this error, resubmit them in a batch.

Request marshal/unmarshal error


This error is caused by incompatible CPS installations. Make sure that you have upgraded all instances
after you upgrade the primary xPlore instance. See EMC Documentum xPlore Installation Guide
for details.

Adding CPS daemons for ingestion or query processing
You can configure multiple CPS daemons for reindexing or heavy ingestion loads or add CPS daemons
to increase query processing. Multiple daemons do not consume as much memory as multiple
CPS instances. They have almost the same throughput as multiple CPS instances on the same
memory-provisioned host.
1. Stop the CPS instance in xPlore administrator. Choose Instances > Instance_name > Content
Processing Service and click Stop CPS.
2. Edit the CPS configuration file in the CPS host directory xplore_home/dsearch/cps/cps_daemon.
3. Change the value of element daemon_count to 3 or more (default: 1, maximum 7).
4. Change the value of connection_pool_size to 2.
5. Restart all xPlore instances.
6. (Optional for temporary ingestion loads) Reset the CPS daemon_count to 1 and
connection_pool_size to 4 after reindexing is complete.
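A sketch of the two settings from steps 3 and 4 as elements in the CPS configuration file (surrounding
elements omitted; exact placement follows your existing file):
<daemon_count>3</daemon_count>
<connection_pool_size>2</connection_pool_size>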

Running out of space


Indexing fails when there is insufficient temp space. Both the Lucene index and the CPS daemon use
temp space. Other applications (not xPlore) can also be using the temp space. A message in the
index agent log indicates the problem:
IO_ERROR: error during add entry... Original message: No space left on device...

A similar error is Failed to create temporary file or error code 47 (file write error).
Ensure that the directory for the CPS temporary file path is large enough. Set the value of
temp_file_folder in the file PrimaryDsearch_local_configuration.xml. This file is located in
xplore_home/dsearch/cps/cps_daemon. Its size should not be less than 20 GB for file processing.

Adding temp space for Lucene


Configure Lucene to use a separate file system for temporary index items. Perform the following steps:
1. Stop all xPlore instances and restart. A restart finalizes the temporary index items in the /tmp
directory.
2. Stop all xPlore instances again.
3. Add the following Java option to the primary instance start script, for example,
startPrimaryDsearch.sh in jboss5.1.0/server. Substitute your temp directory for MYTEMP_DIR:
-Djava.io.tmpdir=MYTEMP_DIR
4. Restart all xPlore instances.

Adding temp space for CPS


Increase the space or change the location in the CPS configuration file
PrimaryDsearch_local_configuration.xml, which is located in the CPS instance
directory xplore_home/dsearch/cps/cps_daemon. After you change the temp space location, reindex
or resubmit the documents that failed indexing.

CPS file processing errors


Unsupported format or file encoding
If the CPS analyzer cannot identify the file type, it displays the following error. The XML element
that contains the error is displayed:
*** Error: no filter available for this file type in element_name.

Check the file using the casample utility to see if it is recognized. See CPS troubleshooting methods,
page 112. If the file is XML, check to see that it is well-formed. For a list of supported formats, see
Oracle Outside In 8.3.7 documentation.
If the document uses an unsupported encoding, a 1027 error code is displayed. For supported
encodings, see Basistech documentation.
For the error message Unknown language provided, check to see whether you have configured an
invalid locale. See Configuring languages and encoding, page 100.

Empty or very small file


If the file is empty, the following error is displayed. The XML element that contains the error is
displayed:
*** Error: file is empty in element_name.

If the error message is Not enough data to process, the file has very little text content and the
language was not detected.

File could not be opened


Check the CPS instance configuration for export file path.

File corrupted
If there are processing errors for the file, they will be displayed after the processing statistics. A corrupt
file returns the following error. The XML element that contains the error is displayed:
*** Error: file is corrupt in element_name.

A file with bad content can also return the error message Served data invalid.
Check the file using the casample utility to see if it is corrupted. See CPS troubleshooting methods,
page 112. Also check whether the format appears in the list of supported formats.

Some documents are not indexed


Is the collection set to read-only? Documents submitted for updating will not be indexed.
Is the format supported by CPS? Unsupported format is the most common error. Check the list of
supported formats in Oracle Outside In 8.3.7 documentation. Check an individual file by submitting it
to a test. Run casample.exe or casample.sh in xplore_home/dsearch/cps/cps_daemon/bin. If the files
pass the casample test, try to resubmit them using the index agent UI.
Is indexing enabled for the object type? Documents are not indexed if the document type is not
registered or is not a subtype of a registered type. Check whether indexing is enabled (the type
is a subtype of a registered type). You can check whether indexing is enabled in Documentum
Administrator by viewing the type properties. You can get a listing of all registered types using
the following iAPI command:
?,c,select distinct t.name, t.r_object_id from dm_type t, dmi_registry i
where t.r_object_id = i.registered_id

You see results like the following:
name                        r_object_id
--------------------------- ----------------
dm_group                    0305401580000104
dm_acl                      0305401580000101
dm_sysobject                0305401580000105

You can register or unregister a type through Documentum Administrator. The type must be
dm_sysobject or a subtype of it. If a supertype is registered for indexing, the system displays the
Enable Indexing checkbox selected but disabled. You cannot clear the checkbox.
Is the format indexable? Check the class attribute of the document format. See Documentum attributes
that control indexing, page 60 for more information.
Is the document too large? See Maximum document and text size, page 98.

Troubleshooting slow ingestion and timeouts


Slow ingestion is most often seen during migration. If migration is spread over days, for example,
tens of millions of documents ingested over two weeks, slow ingestion is usually not an issue. Most
ingestion issues can be resolved with planning, pre-production sizing, and benchmarking. See the
xPlore sizing tool on the EMC Developer Network.

Request processing timeout


Check the timeout threshold in CPS. Try increasing the request_time_out parameter and resubmitting
the document using the index agent UI.
Check the document size and make sure that the file size and text content do not exceed the limits in
index agent and CPS configuration. See Large documents below.
For the error messages File path or URI is not valid or No permission to read the
file, check the export file path in xPlore administrator CPS instance configuration. If CPS cannot
access it, it cannot retrieve the document content.

Insufficient CPU
Content extraction and text analysis are CPU-intensive. CPU is consumed for each document creation,
update, or change in metadata. Check CPU consumption during ingestion.
Suggested workarounds: For migration, add temporary CPU capacity. For day-forward (ongoing)
ingestion, add permanent CPU or new CPS instances. CPS instances are used in a round-robin order.

Insufficient memory
When xPlore indexes large documents, it loads a large amount of content, which can consume too much
memory. In this case, you get an out-of-memory error in dsearch.log such as:
Internal server error. [Java heap space] java.lang.OutOfMemoryError

Suggested workarounds:
Add more memory to xPlore.
Limit the document and text size as described in Maximum document and text size, page 98.
Enable the throttle mechanism as described in Throttling indexing, page 326.

Large documents
Large documents can tie up a slow network. These documents also contain more text to process. Use
xPlore administrator reports to see the average size of documents and indexing latency and throughput.
The average processing latency is the average number of seconds between the time the request is
created in the indexing client and the time xPlore receives the same request. The State of repository
report in Content Server also reports document size. For example, the Documents ingested per hour
report shows the number of documents and text bytes ingested. Divide bytes ingested by document count
to get average number of bytes per document processed.
Several configuration properties affect the size of documents that are indexed and consequently the
ingestion performance. Maximum document and text size, page 98, describes these settings.

Disk I/O issues


You can detect disk I/O issues by looking at CPU utilization. Low CPU utilization and high I/O
response time for ingestion or query indicate an I/O problem. Test the network by transferring large
files or using Linux dd (disk dump). You can also measure I/O performance on Linux using the
bonnie benchmark.
If a query is very slow the first time and much faster the second time, you could have an I/O problem.
If it is still slow the second time, the slowness is probably not due to insufficient I/O capacity.
Suggested workarounds:
NAS: Verify that the network has not been set to half duplex. Increase network bandwidth and/or
improve the network I/O controllers on the xPlore host.
SAN (check in the following order):
1. Verify that the SAN has sufficient memory to handle the I/O rate.
2. Increase the number of drives available for the xPlore instance.
3. If the SAN is multiplexing a set of drives over multiple applications, move the "disk space"
to a less contentious set of drives.
4. If other measures have not resolved the problem, change the underlying drives to solid state.
5. Use striped mapping instead of concatenated mapping so that all drives can be used to service
I/O.

Slow network
A slow network between the Documentum Content Server and xPlore results in low CPU consumption
on the xPlore host. Consumption is low even when the disk subsystem has a high capacity. File
transfers via FTP or network share are also slow, independent of xPlore operations.
Suggested workarounds: Verify that the network is not set to half duplex. Check for faulty hubs or
switches. Increase network capacity.

Large number of Excel documents


Microsoft Excel documents require the most processing of all text formats, due to the complexity of
extracting text from the spreadsheet structure. You can detect the number of Excel documents using
the State of repository report in Content Server.
Suggested workaround: Add temporary CPU for migration or permanent CPU for ongoing load.

Virus checking software


Virus checking software can lead to high disk I/O because it continually checks the changes in xPlore
file structures during indexing.
Workarounds: Exclude temp and xPlore working and data directories, or switch to Linux platform.

Interference by another guest OS


In a VM environment, the physical host can have several guest operating systems. This contention
could cause intermittent slowness in indexing unrelated to format, document size, I/O capacity, or
CPU capacity.
Workaround: Consult with your infrastructure team to load balance the VMs appropriately.

Slow content storage area


Ingestion is dependent on the speed of the content source. Content storage issues are especially
noticeable during migration. For example, you find that migration or ingestion takes much longer in
production than in development. Development is on a small volume of content on NAS but production
content is on a higher-latency device like Centera. You can determine the location of the original
content by using the State of the repository report in Content Server.
Workaround: Extend the migration time window.
Concurrent large file ingestion


When CPS processes two or more large files at the same time, the CPS log file reports one of the
following errors (cps_daemon.log):
2012-03-28 10:28:41,828
ERROR [Daemon-Core-(3400)] Exception happened, ACCESS_VIOLATION,
Attempt to read data at address 1 at (connection-handler-2)
...
FATAL [DAEMON-LP_RLP-(3440)] Not enough memory to process linguistic requests.
Error message: bad allocation

Use xPlore administrator to select the instance, and then choose Configuration. Change the following
to smaller values:
Max text threshold
Thread pool size
You can add a separate CPS instance that is dedicated to processing. This processor does not interfere
with query processing. You can also throttle ingestion to limit document content size or count. See
Throttling indexing, page 326.

Adding dictionaries to CPS


You can create user dictionaries for words specific to an industry or application, including personal
names and foreign words. In your custom dictionary, you specify the part of speech for ambiguous
nouns or verbs, which assists CPS in determining the context for a sentence. You can also prevent
words from being decompounded, making queries more precise. The following procedure creates a
Chinese user dictionary. Use these same steps for other supported languages (Japanese and Korean).
1. Create a UTF-8 encoded file. Each entry in the file is on a single line with the following syntax
(TAB-delimited):
word part_of_speech decomposition_pattern

part_of_speech: NOUN, PROPER_NOUN, PLACE, PERSON, ORGANIZATION,


GIVEN_NAME, or FOREIGN_PERSON.
decomposition_pattern: A comma-delimited list of numbers that specify the number of characters
from word to include in each part of the compound. A value of 0 indicates no decomposition.
For example, the following entry indicates that the word is decomposed into three two-character
sequences. The sum of the digits in the pattern must match the number of characters in the entry.

The following example is decomposed into two four-character sequences:

2. Compile the dictionary.


On both Linux and Windows, there is a compilation script for each supported language. Use the
one that corresponds to the language of your dictionary:
Chinese:
xplore_home\dsearch\cps\cps_daemon\shared_libraries\rlp\cma\source\samples\build_user_dict.sh
Japanese:
xplore_home\dsearch\cps\cps_daemon\shared_libraries\rlp\jma\source\samples\build_user_dict.sh
Korean:
xplore_home\dsearch\cps\cps_daemon\shared_libraries\rlp\kma\source\samples\build_user_dict.sh
On Linux:
1. Export the following variables:
export
BT_ROOT=xplore_home/dsearch/cps/cps_daemon/shared_libraries
export BT_BUILD=variableDir
Where variableDir is a subdirectory under
xplore_home/dsearch/cps/cps_daemon/shared_libraries/rlp/bin/ and its name differs from
computer to computer. For example:
export BT_BUILD=amd64-glibc25-gcc42
2. Use the chmod command to change the file permissions on the compilation script and some
other files:
chmod a+x build_user_dict.sh
chmod a+x build_cla_user_dictionary
chmod a+x cla_user_dictionary_util
chmod a+x t5build
chmod a+x t5sort
3. Run the compilation script build_user_dict.sh. Use the following as an example:
./build_user_dict.sh mydict.txt mydict.bin

On Windows:
1. Download and install Cygwin from http://www.cygwin.com/.
2. Launch the Cygwin terminal.
3. Export the following variables:
export
BT_ROOT=xplore_home/dsearch/cps/cps_daemon/shared_libraries
export BT_BUILD=variableDir
Where variableDir is a subdirectory under
xplore_home/dsearch/cps/cps_daemon/shared_libraries/rlp/bin/ and its name differs from
computer to computer. For example:
export BT_BUILD=amd64-glibc25-gcc42
4. Edit the build_user_dict.sh file to make sure that line endings are denoted by \n instead
of \r\n in the file.
5. In the Cygwin terminal, run the compilation script build_user_dict.sh specific to the
dictionary language; for example:
./build_user_dict.sh mydict.txt mydict.bin

3. Put the compiled dictionary into the directory specific to the dictionary language:
Chinese: xplore_home/dsearch/cps/cps_daemon/shared_libraries/rlp/cma/dicts
Japanese: xplore_home/dsearch/cps/cps_daemon/shared_libraries/rlp/jma/dicts
Korean: xplore_home/dsearch/cps/cps_daemon/shared_libraries/rlp/kma/dicts
4. Edit the CLA configuration file to include the user dictionary. You add a dictionarypath element to
cla-options.xml in xplore_home/dsearch/cps/cps_daemon/shared_libraries/rlp/etc.
The following example adds a Chinese user dictionary named mydict.bin:
<claconfig>
...
...
<dictionarypath><env name="root"/>/cma/dicts/mydict.bin
</dictionarypath>
</claconfig>

5. To prevent a word that is also listed in a system dictionary from being decomposed,
modify cps_context.xml in xplore_home/dsearch/cps/cps_daemon. Add the property
com.basistech.cla.favor_user_dictionary if it does not exist, and set it to true.
For example:
<contextconfig><properties>
<property name="com.basistech.cla.favor_user_dictionary"
value="true"/>...

6. Restart CPS for the changes to take effect.

Custom content processing


About custom content processing
Text extraction
Annotation
Custom content processing errors

About custom content processing


CPS uses embedded content processors for text extraction, language identification, and linguistic
analysis. These processes are thoroughly tested and supported, and they are adequate for most content
processing needs. You can add custom content processors to address the following use cases:
Custom text extraction that processes certain mime-types.
Normalization: Normalizing varied inputs of data like phone numbers or customer IDs. For
example, customer IDs could be stored as 123abc-456-789 but only the 456-789 digits are
significant. Normalization would extract this from the text so that users would find the document
when they search for 456789 or 456-789.
Entity extraction and XML annotation: For example, extracting the locations of people or places
from the full-text content.
You can customize the CPS document processing pipeline at the following stages.
1. Text extractor for certain content formats, both Java and C/C++ based. Supports custom file
decryption before extraction. Specified in CPS PrimaryDsearch_local_configuration.xml.
Custom text extractors are usually third-party modules that you configure as text extractors for
certain formats (mime types). You must create adaptor code based on proprietary, public xPlore
adaptor interfaces.
2. Annotators: Classify elements in the text, annotate metadata, and perform name indexing for
faster retrieval.
Custom annotators require software development of modules in the Apache UIMA framework.
Figure 11. Custom content processing

Support for plugins


EMC supports the quality of service of the xPlore default text extraction, language identification, and
linguistic analysis. A few custom plugins have been tested to validate the plugin environment, but
these plugins themselves are not supported. The plugin runtime is sandboxed, and plugin issues are
reported separately. EMC is not responsible for the performance, memory consumption, and stability
of your plugins and annotators.
Sample Java and C++ text extraction plugins and readmes are provided in
xplore_home/dsearch/cps/cps_daemon/sdk and cps/cps_manager.
For best troubleshooting of your plugins, configure your custom component on a separate CPS
instance. Documents that meet criteria in your plugin or UIMA module are routed to this CPS instance.
Make sure that you have deployed on the host any libraries, such as MSVC 2008 distribution, that your
plug-ins are dependent on.

Customization steps
1. Write plugin (Java or C/C++) or UIMA annotator.
2. Place DLLs or jar files in CPS classpath.
3. Repeat for each CPS instance.
4. Test content processing.
5. Perform a backup of your customization DLLs or jars when you back up the xPlore federation.
Text extraction
The text extraction phase of CPS can be customized at the following points:
Pre-processing plugin
Plugins for text extraction based on mime type, for example, the xPlore default extractor Oracle
Outside In (formerly Stellent) or Apache Tika.
Post-processing plugin
For best reliability, deploy a custom text extractor on a separate CPS instance. For instructions on
configuring a remote CPS instance for text extraction, see Adding a remote CPS instance, page 94.
The following diagram shows three different mime types processed by different plugins.
Figure 12. Custom text extraction based on mime type

Configuring a text extraction plugin


Add your plugins to PrimaryDsearch_local_configuration.xml in
xplore_home/dsearch/cps/cps_daemon. Each extractor requires the following information:
Table 14. Child elements of text_extraction

text_extractor_preprocessor: Preprocessing specification; contains name, type, lib_path, formats,
properties.

text_extractor: Contains name, type, lib_path, formats, properties.

text_extractor_postprocessor: Postprocessing specification; contains name, type, lib_path, formats,
properties.

name: Used for logging.

type: Valid values: Java or native (C/C++).

lib_path: Fully qualified class name in the CPS host classpath.

formats: Contains one or more format elements. Each format element corresponds to a mime type.

properties: Contains user-defined property elements.

property: The value of each named property element is read by your plugin.

Creating a text extractor adaptor class


xPlore provides public, proprietary interfaces that you can implement for your plugin. Implement the
abstract class CPSTextExtractor in the package com.emc.cma.cps.processor.textextractor.

Sample Tika text extractor


The Apache Tika toolkit extracts metadata and structured text from documents. Sample jar files and
Tika configuration file are installed in cps_home/dsearch/cps/add-ons/tika. Other samples are in
cps_home/dsearch/cps/cps_daemon/sdk.
1. Download and place the Tika jar files in xplore_home/dsearch/cps/add-ons/tika.
2. Back up PrimaryDsearch_local_configuration.xml in xplore_home/dsearch/cps/cps_daemon.
3. Copy configuration_tika.xml, located in xplore_home/dsearch/cps/add-ons/tika, to
xplore_home/dsearch/cps/cps_daemon.
4. Rename the copied file to PrimaryDsearch_local_configuration.xml.
The following example contains a preprocessor and an extractor plugin. The properties are passed
to the plugin for processing. In this example, return_attribute_name is a property required by the
Tika text extractor.
<cps_pipeline>
...
<text_extraction>
<!--preprocessor chain invoked in the configured sequence-->
<text_extractor_preprocessor>
<name>password_cracker</name>
<type>java</type>
<lib_path>
com.emc.cma.cps.processor.textextractor.CPSPasswordCracker
</lib_path>
<properties>
<property name="return_attribute_name">false</property>
</properties>
<formats>
<format>application/pdf</format>

</formats>
</text_extractor_preprocessor>
<text_extractor>
<name>tika</name>
<type>java</type>
<lib_path>
com.emc.cma.cps.processor.textextractor.CPSTikaTextExtractor
</lib_path>
<properties>
<property name="return_attribute_name">false</property>
</properties>
<formats>
<format>application/pdf</format>
</formats>
</text_extractor> ...
</text_extraction>
...
</cps_pipeline>

Troubleshooting custom text extraction


Set up custom text extraction (TE) on a remote CPS instance. Configure the instance to do text
extraction only.
Custom text extraction plugins raise the following errors:
Missing or misplaced custom library. The CPS daemon does not start, and cps_daemon.log reports
an initialization error.
A file with the incorrect mime type is routed to the TE plugin. Change the default log level from
WARN to DEBUG and ingest the same document. If two text extractors are available for a given
format, the first one in the configuration receives the request.
If the text extractor hangs during daemon processing, the CPS daemon log records a timeout error
and the daemon restarts. If the text extractor hangs on the manager side, processing is blocked
and requires a CPS restart.
If the text extractor crashes during daemon processing, the CPS daemon restarts. If the text
extractor crashes on the manager side, processing is blocked and requires a CPS restart.

Annotation
Documents from a Content Server are submitted to CPS as dftxml files. The CPS annotation
framework analyzes the dftxml content. Text can be extracted to customer-defined categories, and
the metadata can be annotated with information.
Annotation employs the Apache UIMA framework. A UIMA annotator module extracts data from the
content, optionally adds information, and puts it into a consumable XML output.
A UIMA annotator module has the following content:
An XML annotation descriptor file.


A common analysis structure (CAS) that holds the input and output for each stage in the sequence.
The type system descriptor file, also known as a consumer descriptor, defines the type structure for
the CAS. The CAS tool generates supporting code from the type descriptor file.
Annotation implementation code that extends JCasAnnotator_ImplBase in the UIMA framework.
An analysis engine consisting of one or more annotators in a sequence.
Sample annotation module files are provided in the xPlore SDK:
Sample annotator module (dsearch-uima.jar) in /samples/dist.
Annotator source files in /samples/src/com/emc/documentum/core/fulltext/indexserver/uima/ae.
Sample indexserverconfig.xml for the example in /samples/config
Sample type descriptor and annotation definition files in
/samples/src/com/emc/documentum/core/fulltext/indexserver/uima/descriptors/

Steps in creating a UIMA annotator for xPlore


Following is a high-level description of the steps. See the example, page 130 for implementation
details.
1. Download the UIMA Java framework and SDK binaries from Apache and add it to your project
build path.
2. Create a CAS results type definition file that specifies a feature element for each data type that you
are annotating.
3. Use the Apache UIMA tool jcasgen.bat (Windows) or jcasgen.sh (Linux) to generate type
implementation classes from the type definition file.
4. Create the annotator descriptor XML file that describes the annotator detail. The descriptor
describes the name of the annotator, the output type, and the runtime configuration parameters.
5. Create annotator code that extends JCasAnnotator_ImplBase in the UIMA framework.
6. Compile the Java classes and package with the XML descriptor files in a jar file. Copy the jar file to
xplore_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/lib.
7. Deploy to xPlore:
a. Stop all xPlore instances.
b. Specify UIMA modules and configure their usage in indexserverconfig.xml.
c. Change the indexserverconfig.xml pipeline element to specify the descriptor, process-class, and
any properties for the annotator. Add pipeline-usage for the associated category of documents.
d. Increase the value of the revision attribute on the index-server-configuration element to
ensure that xDB loads it.
e. Restart the xPlore instances.

Note: To deploy the UIMA module as a remote service, you can use the Vinci service that is included
in the UIMA SDK.

Specifying UIMA modules in indexserverconfig.xml


Stop all xPlore instances and add your UIMA references to indexserverconfig.xml, located in
xplore_home/config. Add a pipeline-config element as a child of index-server-configuration between
content-processing-services and category-definitions. The pipeline-config element has the following
syntax:
<pipeline-config>
<pipeline descriptor="UIMA-pipeline-descriptor.xml"
process-class="xPlore_UIMAProcessFactory_class"
name="name-of-pipeline"/>
</pipeline-config>
Table 15. Pipeline configuration attributes and elements

name: Unique identifier of the UIMA module.

descriptor: UIMA annotator descriptor file path. The path is resolved using the classpath.

process-class: The xPlore factory class. This class must implement IFtProcessFactory. The default
class, com.emc.documentum.core.fulltext.indexserver.uima.UIMAProcessFactory, provides for most
processing needs.

properties: Contains property elements that define runtime parameters.

The following example configures a UIMA module. You specify a descriptor XML file, a processing
class, and an optional name. The descriptor file and process class are hypothetical. The path to the
descriptor file is resolved from the base of the UIMA module jar file.
<pipeline-config>
<pipeline descriptor="descriptors/PhoneNumberAnnotator.xml"
process-class="com.emc.documentum.core.fulltext.indexserver.uima.UIMAProcessFactory"
name="phonenumber_pipeline"/>
</pipeline-config>

The xPlore UIMAProcessFactory class is provided with xPlore. It executes the pipeline based on the
definitions provided.

Configuring UIMA module usage in indexserverconfig.xml


Configure the usage of the UIMA module within a category element in indexserverconfig.xml. Add
one pipeline-usage element after the indexes element. Most applications annotate the dftxml category.
Properties control the level of concurrency and the data elements from the dftxml document that are
sent to the UIMA annotator.
Table 16. Pipeline usage attributes and elements

name: Attribute of pipeline-usage that references a pipeline module defined in pipeline-config.

root-result-element: Attribute of pipeline-usage that specifies an element in the dftxml document that
will be the root of the annotated results.

mapper-class: Optional attribute of pipeline-usage that maps the annotation result to an XML sub-tree,
which can then be added to dftxml.

input-element: One or more optional child elements of pipeline-usage that specify the dftxml elements
that are passed to the UIMA analysis engine. Use this to annotate based on a document attribute. The
name attribute must be unique within the same pipeline. The element-path attribute has an xpath value
that is used to locate the XML element in the dftxml document.

type-mapping: One or more elements that specify how to store the analytic processing results in the
original dftxml document. The name attribute is a unique identifier for logging. The element-name
attribute specifies the XML element that holds the result value from the annotator. This element
becomes a new child element within dftxml. The type-name attribute specifies the fully qualified class
name for the class that renders the result. This value should match the typeDescription name in the
descriptor file.

feature-mapping: One or more optional child elements of type-mapping. The required feature attribute
corresponds to a data member of the type. The required element-name attribute corresponds to the
XML element that the feature is mapped to.

properties: Child element of pipeline-usage. Specifies concurrency and the elements that are passed to
the annotator.

property name="thread-count": Sets the concurrency level in the underlying pipeline engine. By
default, it has the same value (100) as CPS-threadpool-max-size in xPlore administrator indexing
configuration.

property name="send-content-text": Controls whether to pass the content text to the annotator.
Default: true. If the annotator does not require content text, set to false.

property name="send-content-tokens": Controls whether to pass the content tokens to the annotator.
Default: false. If the annotator operates on tokens, set to true.

In the following configuration for the Apache UIMA room number example, the annotated content is
placed under dmftcustom in the dftxml file (root-result-element). The content of the elements
r_object_type and object_name (input-element element-path) is passed to the UIMA analyzer. (If you
are annotating content and not metadata, you do not need input-element.) For the object_name value, a
room element is generated by the RoomNumber class. Next, the building and room-number elements
are generated by a lookup of those features (data members) in the RoomNumber class.

<pipeline-usage root-result-element="dmftcustom" name="test_pipeline">
  <input-element element-path="/dmftdoc/dmftmetadata//r_object_type" name="object_type"/>
  <input-element element-path="/dmftdoc/dmftmetadata//object_name" name="object_name"/>
  <type-mapping element-name="room"
      type-name="com.emc.documentum.core.fulltext.indexserver.uima.ae.RoomNumber">
    <feature-mapping element-name="building" feature="building"/>
    <feature-mapping element-name="room-number" feature="roomnumber"/>
  </type-mapping>
  <type-mapping type-name="com.emc.documentum.core.fulltext.indexserver.uima.ae.DateTimeAnnot"
      element-name="datetime"/>
</pipeline-usage>

See the Apache UIMA documentation for more information on creating the annotator class and
descriptor files. For a simple annotator example for xPlore, see UIMA example, page 130.

UIMA in the log


If there are errors in the configuration or in the UIMA module, you see no UIMA logging.
You can configure logback.xml with the package names of the annotator to get logs on the UIMA
module. Logback.xml configuration is described in Configuring logging, page 297.
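For example, a minimal logback.xml fragment that raises the logging level for the annotator classes.
The package name com.example.uima.annotators is a placeholder; substitute the package that contains
your own annotator classes:
<configuration>
  <!-- Existing appenders and root logger remain unchanged. -->
  <!-- Placeholder package name; use the package of your annotator classes. -->
  <logger name="com.example.uima.annotators" level="DEBUG"/>
</configuration>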
When an annotator is initialized, you see the category in the log like the following:
Create Annotator Phone Number Annotator from descriptor
descriptors/PhoneNumberAnnotator.xml for category dftxml

When an annotator is applied (tracing for dsearchindex is INFO), you see the domain name and
category in the log like the following:
Domain test category dftxml,
Apply Annotator Phone Number Annotator to document
testphonenumbers5_txt1318881952107

UIMA example
The following example is from the UIMA software development kit. It is used to create a UIMA
module that normalizes phone numbers for fast identification of results in xPlore.

Prepare the environment


Download the UIMA Java framework and SDK binaries from Apache and add them to your project
build path.
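If the project is built with Maven, the core UIMA framework can instead be declared as a dependency.
This is a minimal sketch; the version shown is illustrative, so match it to the UIMA SDK release that
you downloaded:
<dependency>
  <groupId>org.apache.uima</groupId>
  <artifactId>uimaj-core</artifactId>
  <!-- Illustrative version; use the version of your UIMA SDK. -->
  <version>2.3.1</version>
</dependency>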

Define a CAS feature structure type


Define a CAS feature structure type in an XML file called a type system descriptor. This file
specifies a feature element for each data type that you are annotating. The following example
PhoneAnnotationTypeDef.xml creates the following string-type features. The class that handles the
types is specified as the value of types/typeDescription/name (FtPhoneNumber):
phoneNumber: the phone number as it appears in a document.
normalizedForm: the normalized phone number.
Note that the type descriptor must import the xPlore type definition descriptors, which handle file
access.
<?xml version="1.0" encoding="UTF-8" ?>
<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
<name>TutorialTypeSystem</name>
<description>Phone number Type System Definition</description>
<vendor>The Apache Software Foundation</vendor>
<version>1.0</version>
<imports>
<import location="AnnotationTypeDef.xml" />
<import location="FtDocumentAnnotationTypeDef.xml"/>
</imports>
<types>
<typeDescription>
<name>FtPhoneNumber</name>
<description></description>
<supertypeName>uima.tcas.Annotation</supertypeName>
<features>
<featureDescription>
<name>phoneNumber</name>
<description />
<rangeTypeName>uima.cas.String</rangeTypeName>
</featureDescription>
<featureDescription>
<name>normalizedForm</name>
<description />
<rangeTypeName>uima.cas.String</rangeTypeName>
</featureDescription>
</features>
</typeDescription>
</types>
</typeSystemDescription>

Generate the type implementation class


Use the Apache UIMA tool jcasgen.bat (Windows) or jcasgen.sh (Linux) to generate type
implementation classes from the type definition file. Eclipse and other programming environments
have UIMA plugins that generate the classes. Two classes are generated:
FtPhoneNumber
FtPhoneNumber_Type
You will later reference the first class in the type-mapping element of indexserverconfig.xml.

Create an annotator descriptor


Create the annotator descriptor XML file that describes the annotator. The descriptor describes the
name of the annotator, inputs and outputs as defined in the type system descriptor, and the runtime
configuration parameters that the annotator accepts.
In the following example PhoneNumberAnnotator.xml, the annotator name is Phone Number
Annotator. This name is used in dsearch.log to specify annotator instantiation and, when dsearchindex
logging is set to INFO, annotator application to a document.
The configurationParameters element defines a parameter named Patterns. The element
configurationParameterSettings defines the patterns as two regular expression patterns, each in a
value/array/string element.
The typeSystemDescription element references the type system descriptor
PhoneAnnotationTypeDef.xml.
The outputs are the features referenced in the same type definition. In this descriptor, the class that
generates the output is referenced along with the feature. For example, the class FtPhoneNumber
generates the output for both the phoneNumber and normalizedForm features.
You specify this descriptor file when you configure UIMA in xPlore. Provide the filename as the value
of the descriptor attribute of the pipeline element.
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier"
xmlns:xi="http://www.w3.org/2001/XInclude">
<frameworkImplementation>org.apache.uima.java</frameworkImplementation>
<primitive>true</primitive>
<annotatorImplementationName>PhoneNumberAnnotator</annotatorImplementationName>
<analysisEngineMetaData>
<name>Phone Number Annotator</name>
<description>Searches for phone numbers in document content.</description>
<version>1.0</version>
<vendor>The Apache Software Foundation</vendor>
<configurationParameters>
<configurationParameter>
<name>Patterns</name>
<description>Phone number regular expression patterns.</description>
<type>String</type>
<multiValued>true</multiValued>
<mandatory>true</mandatory>
</configurationParameter>
</configurationParameters>
<configurationParameterSettings>
<nameValuePair>
<name>Patterns</name>
<value>
<array>
<string>\b+\d{3}-\d{3}-\d{4}</string>
<string>\(\d{3}\)\s*\d{3}-\d{4}</string>
</array>
</value>
</nameValuePair>
</configurationParameterSettings>
<typeSystemDescription>

<imports>
<import location="PhoneAnnotationTypeDef.xml"/>
</imports>
</typeSystemDescription>
<capabilities>
<capability>
<inputs></inputs>
<outputs>
<type>FtPhoneNumber</type>
<feature>FtPhoneNumber:phoneNumber</feature>
<feature>FtPhoneNumber:normalizedForm</feature>
</outputs>
<languagesSupported></languagesSupported>
</capability>
</capabilities>
<operationalProperties>
<modifiesCas>true</modifiesCas>
<multipleDeploymentAllowed>true</multipleDeploymentAllowed>
<outputsNewCASes>false</outputsNewCASes>
</operationalProperties>
</analysisEngineMetaData>
</analysisEngineDescription>

Create annotator code to generate output


The annotator is a subclass of JCasAnnotator_ImplBase in the Apache UIMA framework. The main
method to implement is the following:
public void process(JCas aJCas) throws AnalysisEngineProcessException

Imports:
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.jcas.JCas;
import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.UimaContext;
import org.apache.uima.resource.ResourceInitializationException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Class:
public class PhoneNumberAnnotator extends JCasAnnotator_ImplBase {
private Pattern[] mPatterns;
public void initialize(UimaContext aContext)
throws ResourceInitializationException {
super.initialize(aContext);
// Get config. parameter values from PhoneNumberAnnotator.xml
String[] patternStrings = (String[]) aContext.getConfigParameterValue("Patterns");
// compile regular expressions
mPatterns = new Pattern[patternStrings.length];
for (int i = 0; i < patternStrings.length; i++) {

mPatterns[i] = Pattern.compile(patternStrings[i]);
}
}
private String normalizePhoneNumber(String input)
{
StringBuffer buffer = new StringBuffer();
for (int index = 0; index < input.length(); ++index)
{
char c = input.charAt(index);
if (c >= '0' && c <= '9')
buffer.append(c);
}
return buffer.toString();
}
public void process(JCas aJCas) throws AnalysisEngineProcessException {
// get document text
String docText = aJCas.getDocumentText();
// loop over patterns
for (int i = 0; i < mPatterns.length; i++) {
Matcher matcher = mPatterns[i].matcher(docText);
while (matcher.find()) {
// found one - create annotation
FtPhoneNumber annotation = new FtPhoneNumber(aJCas);
annotation.setBegin(matcher.start());
annotation.setEnd(matcher.end());
annotation.addToIndexes();
String text = annotation.getCoveredText();
annotation.setPhoneNumber(text);
annotation.setNormalizedForm(normalizePhoneNumber(text));
}
}
}
}

Configure UIMA modules in indexserverconfig.xml


Stop all xPlore instances and add your UIMA references to indexserverconfig.xml, located in
xplore_home/config. Add a pipeline-config element as a child of index-server-configuration between
content-processing-services and category-definitions. The xPlore UIMAProcessFactory class is
provided with xPlore. It executes the pipeline based on the definitions provided.
The following example specifies the annotator descriptor file:
</content-processing-services>
<pipeline-config>
  <pipeline descriptor="descriptors/PhoneNumberAnnotator.xml"
      process-class="com.emc.documentum.core.fulltext.indexserver.uima.UIMAProcessFactory"
      name="phonenumber_pipeline"/>
</pipeline-config>

Configure the usage of the UIMA module within a category element in indexserverconfig.xml.
Add one pipeline-usage element after the indexes element. Most applications annotate the dftxml
category, so you place pipeline-usage in category name="dftxml". For input-element, specify a
name for logging and a path to the input element in dftxml. (For a sample dftxml, see Extensible
Documentum DTD, page 348.) For type-mapping, specify an element name for logging (usually the
same as type-name). For type-name, specify the type definition class. For feature-mapping, specify
the output XML element for element-name and the feature name that is registered in the annotator
descriptor PhoneNumberAnnotator.xml. For example:
</indexes>
<pipeline-usage root-result-element="ae-result" name="phonenumber_pipeline">
<input-element name="content" element-path="/dmftdoc/dmftcontents"/>
<type-mapping element-name="FtPhoneNumber" type-name="FtPhoneNumber">
<feature-mapping element-name="phoneNumber" feature="phoneNumber"/>
<feature-mapping element-name="phoneNormlizd" feature="normalizedForm"/>
</type-mapping>
</pipeline-usage>

Custom content processing errors


The report Document processing error summary retrieves errors in custom content processing. The
error type is Unknown error, Corrupt file, or Password-protected or encrypted file. The name of the
custom content processing pipeline is retrieved from CPS PrimaryDsearch_local_configuration.xml as
the value of text_extraction/name. Error Text examples:
Password-protected or encrypted file (native error:TIKA-198...)
Unknown error during text extraction (native error:TIKA-198...)

You can identify the origin of the CPS processing error in the files cps.log and dsearch.log:
CPS log examples: Failed to extract text from password-protected files:
2011-08-02 23:27:23,288 ERROR
[MANAGER-CPSLinguisticProcessingRequest-(CPSWorkerThread-1)]
Failed to extract text for req 0 of doc VPNwithPassword_zip1312352841145,
err-code 770, err-msg: Corrupt file (native error:TIKA-198:
Illegal IOException from org.apache.tika.parser.pkg.PackageParser@161022a6)
2011-08-02 23:36:27,188 ERROR
[MANAGER-CPSLinguisticProcessingRequest-(CPSWorkerThread-2)]
Failed to extract text for req 0 of doc tf_protected_doc1312353385777,
err-code 770, err-msg: Corrupt file (native error:
Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@3b11d63f)

dsearch log examples: "Corrupt file":
<event timestamp="2011-08-02 23:27:24,994" level="WARN" thread="IndexWorkerThread-1"
logger="com.emc.documentum.core.fulltext.index" timeInMilliSecs="1312352844994">
<message >
<![CDATA[Document id: VPNwithPassword_zip1312352841145, message:
CPS Warning [Corrupt file (native error:TIKA-198:
Illegal IOException from org.apache.tika.parser.pkg.PackageParser@161022a6)].]]></message>
</event>
<event timestamp="2011-08-02 23:36:28,306" level="WARN" thread="
IndexWorkerThread-2" logger="
com.emc.documentum.core.fulltext.index" timeInMilliSecs="1312353388306">
<message >
<![CDATA[Document id: tf_protected_doc1312353385777, message:
CPS Warning [Corrupt file (native error:
Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@3b11d63f)].]]></message>
</event>

Chapter 6
Indexing
This chapter contains the following topics:

About indexing

Configuring text extraction

Configuring an index

Creating custom indexes

Managing indexing in xPlore administrator

Troubleshooting indexing

Running the standalone consistency checker

Indexing APIs

About indexing
The indexing service receives batches of requests to index from a custom indexing client like
the Documentum index agent. The index requests are passed to the content processing service,
which extracts tokens for indexing and returns them to the indexing service. You can configure all
indexing parameters by choosing Global Configuration from the System Overview panel in xPlore
administrator. You can configure the same indexing parameters on a per-instance basis by choosing
Indexing Service on an instance and then choosing Configuration.
For information on creating Documentum indexes, see Creating custom indexes, page 145.
Modify indexes by editing indexserverconfig.xml. For information on viewing and updating this file,
see Modifying indexserverconfig.xml, page 43. By default, Documentum content and metadata are
indexed. You can tune the indexing configuration for specific needs. A full-text index can be created as
a path-value index with the FULL_TEXT option.
For information on scalability planning, see EMC Documentum xPlore Installation Guide.

Configuring text extraction


CPS performs text extraction by getting text and metadata from binary or XML documents. CPS then
transforms the results into UTF-16 half-width encoding.
Configure indexing in xplore_home/config/indexserverconfig.xml. You can configure compression
and how XML content is handled. Excluding content from extraction shrinks the index footprint and
speeds up ingestion.
Indexing depth: Only the leaf (last node) text values from subelements of an XML node with implicit
composite indexes are returned. You can configure indexing to return all node values instead of the leaf
node value. (This change negatively affects performance.) Set the value of index-value-leaf-node-only
in the index-plugin element to false. Reindex your documents to see the other nodes in the index.
The paths in the configuration file are in XPath syntax and refer to the path within the dftxml
representation of the object. (For information on dftxml, see Extensible Documentum DTD, page 348.)
Specify an XPath value to the element whose content requires text extraction for indexing.
Table 17. Extraction configuration options

do-text-extraction: Contains one or more for-element-with-name elements.
for-element-with-name: Specifies elements that define content or metadata that should be
extracted for indexing.
for-element-with-name/xml-content: When a document to be indexed contains embedded XML
content, you must specify how that content should be handled. It can be tokenized or not
(tokenize="true | false"). It can be stored within the input document or separately
(store="embed | separate | none"). Separate storage is not supported for this release. If your
repository has many XML documents and a summary is requested with search results, these files
may not display the summary properly. In this case, set the xml-content store attribute to none.
for-element-with-name/save-tokens-for-summary-processing: Sets tokenization of content in
specific elements for summaries, for example, dmftcontentref (content of a Documentum
document). Specify the maximum size of documents in bytes as the value of the attribute
extract-text-size-less-than. Set the maximum size of index tokens for the element as the value of
the attribute token-size.
xml-content on-embed-error: Specifies how to handle parsing errors, such as syntax errors or
external entity access, when XML content is embedded. Valid values: embed_as_cdata | ignore |
fail. The option embed_as_cdata stores the entire XML content as a CDATA sub-node of the
specified node. The ignore option does not store the XML content. For the fail option, content is
not searchable.
xml-content index-as-sub-path: Boolean parameter that specifies whether the path is stored with
XML content when the xml-content embed attribute is set to true.
xml-content file-limit: Sets the maximum size of embedded XML content.
compress: Compresses the text value of specified elements to save storage space. Compressed
content is about 30% of submitted XML content. Compression may slow the ingestion rate by
10-20%.
compress/for-element: Using XPath notation, specifies the XML node of the input document that
contains text values to be compressed.

Configuring an index
Indexes are configured within an indexes element in the file indexserverconfig.xml. For information on
modifying this file, see Modifying indexserverconfig.xml, page 43. The path to the indexes element
is category-definitions/category/indexes. Four types of indexes can be configured: fulltext-index,
value-index, path-value index, and multi-path.
By default, multi-path indexes do not have all content indexed. If an element does not match a
configuration option, it is not indexed. To index all element content in a multi-path index, add a
subpath element on //*. For example, to index all metadata content, use the path dmftmetadata//*.
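A hedged illustration of such a catch-all subpath, modeled on the sub-path examples shown later in
this chapter (the attribute values here are representative defaults, not a prescription; adjust them to
your needs):
<sub-path leading-wildcard="false" compress="false" boost-value="1.0"
    include-descendants="false" returning-contents="false" value-comparison="true"
    full-text-search="true" enumerate-repeating-elements="false" type="string"
    path="dmftmetadata//*"/>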
The following child elements of node/indexes/index define an index.
Table 18. Index definition options

path-value-index: The options attribute of this element specifies a comma-delimited string of xDB
options: GET_ALL_TEXT (indexed by its string value including descendant nodes) |
SUPPORT_PHRASES (optimizes for phrase search and increases index size) | NO_LOGGING
(turns off xDB transaction logging) | INCLUDE_START_END_TOKEN_FLAGS (stores position
information) | CONCURRENT (index is not locked). The path attribute specifies the path to an
attribute that should be indexed. The path attribute contains an XPath notation to a path within the
input document and options for the IndexServerAnalyzer. The symbols < and > must be escaped.
path-value-index/sub-path: See Subpaths, page 141. Specifies the path to an element for which the
path information should be saved with the indexed value. Applies only to path-value indexes that
contain the IndexServerAnalyzer option INDEX_PATHS. Increases index size while enhancing
performance. Subpath indexes must be configured to support Documentum facets. For information
on facets, see About Facets, page 273.

The sub-path element supports the following attributes:

boost-value: Increases the score for hits on the subpath metadata by a factor. Default: 1 (no boost).
compress: Boolean, specifies whether the content should be compressed.
enumerate-repeating-elements: Boolean, specifies whether the position of the element in the path
should be indexed. Used for correlated repeating attributes, for example, media objects with
prop_name=dimension and prop_value=800x600,blue.
full-text-search: Specifies whether the subpath content should be tokenized. Set to true if the
tokens will be queried. If false, you do not need to duplicate this information in the
no-tokenization element. This exclusion reduces the binary index size. Also, when excluded
elements in a document are modified, the full-text index does not need to be updated.
include-descendants: Boolean. Default: false. If true, the token for this instance has a copy of all
descendant tokens, speeding up queries. Cost of this option: lowers the indexing rate and increases
disk space. Use for nodes with many small descendant nodes, such as Documentum dmftmetadata.
leading-wildcard: Boolean, specifies whether the subpath supports leading wildcard searches, for
example, owner_name ends with *utility. Default: false.
path: Path to the element relative to the index path. If the path value contains a namespace, specify
the xpath to it as the value of the xpath attribute. The following example has a namespace:
path="/{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF/
{http://www.w3.org/2004/02/skos/core#}Concept&lt;FULL_TEXT:
com.emc.documentum.core.fulltext.indexserver.core.index.xhive.IndexServerAnalyzer:
GET_ALL_TEXT,SUPPORT_PHRASES,INCLUDE_START_END_TOKEN_FLAGS,
INDEX_PATHS&gt;" name="Concept"
sortable: Set to true for the results to sort on the attribute. Default: false. The attribute
value-comparison must also be true. Requires reindexing.
xpath: Value of the path attribute using xpath notation without the namespace. The xpath for the
example in the path attribute is the following: xpath="/RDF/Concept"
returning-contents: Boolean, specifies whether the indexed value will be returned. For example,
the user may search for documents with specific characteristics in a certain repository folder path,
but the folder path does not need to be returned. Use for facets only. For information on facets, see
About Facets, page 273.
type: Type of content in the sub-path. Valid values: string | integer | double | date | datetime. The
boolean type is not supported by xDB. Supports XQuery typed expressions such as date range.
value-comparison: Boolean, specifies that the value in this path should be indexed. Use for
comparisons such as =, >, <, starts-with. Value indexing requires additional storage, so you should
not index fields that will not be searched for as comparisons or starts-with.

Subpaths
A subpath definition specifies the path to an element. The path information is saved with the indexed
value. A subpath increases index size while enhancing performance. For most Documentum
applications, you do not need to modify the definitions of the subpath indexes, except for the following
use cases:
Store facet values in the index. For example:
<sub-path ... returning-contents="true" compress="true" value-comparison="true"
    full-text-search="true" enumerate-repeating-elements="false" type="string"
    path="dmftcustom/entities/person"/>

Add a subpath for non-string metadata. By default, all metadata is indexed as string type. To
speed up searches for non-string attributes, add a subpath like the following. Valid types: string |
integer | double | date | datetime.
<sub-path leading-wildcard="false" compress="false" boost-value="1.0"
include-descendants="false" returning-contents="false" value-comparison="true"
full-text-search="true" enumerate-repeating-elements="false" type="datetime"
path="dmftmetadata//r_creation_date"/>

Note: Starting from xPlore 1.3, boolean is no longer a supported subpath type. Any subpath with its
type set to boolean in indexserverconfig.xml will be automatically converted to string type.
Add paths for dmftcustom area elements (metadata or content that a TBO injects).
Add paths to support XQuery of XML content.
Modify the capabilities of existing subpaths, such as supporting leading wildcard searches for
certain paths. For example:
<sub-path description="leading wildcard queries" returning-contents="false"
    value-comparison="true" full-text-search="true"
    enumerate-repeating-elements="false" leading-wildcard="true" type="string"
    path="dmftmetadata//object_name"/>

Add subpaths for metadata that should not be indexed, for example, Documentum attributes. Set
full-text-search to false.
Add a subpath for sorting. Requires an attribute value of value-comparison=true. See Sort
support, page 143.
Note: For all subpath changes that affect existing documents, you must rebuild the index of every
affected collection. To verify that your subpath was used during indexing, open the xDB admin tool
and drill down to the multi-path index. Double-click Multi-Path index in the Index type column. You
see a Rebuild Index page that lists all paths and subpaths.

Table 19. Path-value index with and without subpaths

Key set combinations: limited without subpaths; flexible with subpaths.
Single key query latency: low without subpaths; high with subpaths (subpaths perform better with
complex predicates).
ftcontains (full-text): a single ftcontains per probe without subpaths; multiple ftcontains in a probe
are supported with subpaths.
Updates: low overhead without subpaths; high overhead with subpaths.
Returnable, indexed (covering) values: yes without subpaths; no with subpaths.

Sort support
You can configure sorting by Documentum attributes. Add a subpath in indexserverconfig.xml for
each attribute that is used for sorting. Requires an attribute value of value-comparison=true.
Note: Reindexing is required to return documents sorted by attributes.
Follow these rules for attribute sort subpath definitions:
Configure the most specific subpath for sorting. This ensures that the index is used for lookup.
For example, if the document has elements root/pathA/my_attribute and root/pathB/my_attribute
(same attribute), do not set root//my_attribute as sortable. The following example configures an
attribute for sorting using the full xpath:
<sub-path sortable="true" ... value-comparison="true" full-text-search="true"
    ...path="dmftmetadata/dm_document/r_modify_date"/>

If there is more than one path to the same attribute, you must create a subpath definition for each
path. Ambiguous paths may result in a slow search outside the index. In the example above, you
need two subpath definitions:
<sub-path sortable="true" ... value-comparison="true" full-text-search="
true" ...path="dmftmetadata/root/pathA/my_attribute"/>
<sub-path sortable="true" ... value-comparison="true" full-text-search="
true" ...path="dmftmetadata/root/pathB/my_attribute"/>

If there is only one instance of an attribute in the rendered dftxml, a shortened path notation is
treated as the most specific subpath. For example, there is only one object_name element in dftxml,
so dmftmetadata//object_name is the most specific path.

Specifying sort in DFC


The DFC client application can specify a set of Documentum attributes for sorting results using the
IDfQueryBuilder API. If the query contains an order by attribute, results are returned based
on the attribute and not on the computed score. Results are sorted by the first attribute, then sorted
again by the second attribute, and so on. Two API signatures are provided:
void addASCIIOrderByAttribute(String attrName, boolean isAsc). If isAsc is true, the results are
sorted based on the specified attribute in ascending order. If false, results are sorted in descending order.


void addOrderByAttribute(String attrName, boolean isAsc, Locale locale). If the attribute is a
string type, and the locale is not specified, the locale of the query is used. The locale parameter
is ignored in DQL queries.
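A minimal sketch of the order-by calls, assuming queryBuilder is an IDfQueryBuilder that the
application has already obtained from the DFC search service (construction of the builder and
execution of the query are omitted):
// Sort ascending by object_name, ignoring locale-specific collation.
queryBuilder.addASCIIOrderByAttribute("object_name", true);
// Then sort descending by r_modify_date. The locale argument applies only to
// string attributes and is ignored for DQL queries.
queryBuilder.addOrderByAttribute("r_modify_date", false, java.util.Locale.ENGLISH);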

Sort debugging
You can use the Query debug and Optimizer tools in xPlore administrator to troubleshoot sorting. When
an index is properly configured and used for the query, you see the following in the Query debug pane:
Found an index to support all order specs. No sort required.

If you do not see this confirmation, check the Optimizer pane. Find the following section:
<endingimplicitmatch indexname="dmftdoc">

In the following example, there are two order-by clauses in the query. The first fails because there
is no sub-path definition for sorting.
<endingimplicitmatch indexname="dmftdoc">
<LucenePlugin>
<ImplicitIndexOptimizer numconditionsall="1"
numorderbyclausesall="2">
<condition accepted="true" expr="child::dmftmetadata/
descendant-or-self::node()/child::object_name
[. contains text rule]"/>
<orderbyclause accepted="true" expr="child::dmftmetadata/
descendant-or-self::node()/child::object_name"/>
<orderbyclause accepted="false" reason="
No exact matching subpath configuration found that
matches order-by clause" expr="child::dmftmetadata/
descendant-or-self::node()/child::r_modify_data"/>
</ImplicitIndexOptimizer>
<numacceptedandrejected numconditionsaccepted="1"
numconditionsskipped="0" numorderbyclausessaccepted="1"
numorderbyclausesskipped="1"/>
</LucenePlugin>
<conditions numaccepted="1">
<pnodecondition>
child::dmftmetadata/descendant-or-self::
node()/child::object_name[. contains text rule]
</pnodecondition>
<externallyoptimizedcondition accepted="true">
child::dmftmetadata/descendant-or-self::node()/
child::object_name[. contains text rule]
</externallyoptimizedcondition>
</conditions>
<orderbyclauses numaccepted="1">
<orderbyclause accepted="true">child::dmftmetadata/
descendant-or-self::node()/child::object_name
</orderbyclause>
<orderbyclause accepted="false">child::dmftmetadata/
descendant-or-self::node()/child::r_modify_data
</orderbyclause>
</orderbyclauses>
</endingimplicitmatch>


The output shows that dmftmetadata//r_modify_data is not accepted. The reason is a typo in the
sub-path definition in indexserverconfig.xml: the correct sub-path is dmftmetadata//r_modify_date.

Creating custom indexes


xPlore indexes full-text content and metadata for all objects in the repository, so you do not need to set
up indexes. Create a custom index for the following:
A subpath for each facet used by the client application
Custom, non-repository metadata or content added to your documents during indexing. In
Documentum environments, use a TBO.
Set up indexes before content is ingested. If you add an index on existing content, the content must be
reingested.
The Documentum index agent generates metadata in a data input format called dftxml. Each indexable
object type is represented in dftxml, along with its indexable attributes, in a predictable path within the
dftxml representation. Refer to Extensible Documentum DTD, page 348 for additional information
on dftxml.
Indexes are defined within indexserverconfig.xml, in xplore_home/config, within a category definition.
Shut down all xPlore instances before changing this file. To enhance performance and reduce storage,
you can specify categories and attributes that are not indexed.

Indexing new attributes


You can add path-value indexes to indexserverconfig.xml for new data types so that documents of the
new type can be ingested without rebuilding the index. Shut down all xPlore instances before changing
this file. If you add a new attribute for documents that have already been indexed, rebuild the index.

Managing indexing in xPlore administrator


You can perform the following administrative tasks in xPlore administrator.
View indexing statistics: Expand an instance in the tree and choose Indexing Service. Statistics
are displayed in the right panel: tasks completed, with a breakdown by document properties, and
performance.
Configure indexing across all instances: Expand Services > Indexing Service in the tree. Click
Configuration. You can configure the various options described in Document processing and
indexing service configuration parameters, page 339. The default values have been optimized for
most environments.
Start or stop indexing: Select an instance in the tree and choose Indexing Service. Click Enable
or Disable.
View the indexing queue: Expand an instance in the tree and choose Indexing Service. The queue
is displayed. You can cancel any indexing batch requests in the queue.
Note: This queue is not the same as the index agent queue. You can view the index agent queue in
the index agent UI or in Documentum administrator.

Troubleshooting indexing
You can use reports to troubleshoot indexing and content processing issues. See Document processing
(CPS) reports, page 290 and Indexing reports, page 290 for more information on these reports.

Testing upload and indexing


If xPlore administrator is running on the same instance as an index service, you can test upload a
document for indexing.
Note: If you are testing Documentum indexing before migration, run the ACL replication script to
update the security index. See Manually updating security, page 52.
To do test upload in xPlore administrator, expand the Diagnostic and Utilities tree and choose Upload
testing document. You can test upload in the following ways:
Local File: Navigate to a local file and upload it.
Remote File: Enter the path to a remote file that is accessible from the xPlore administrator host.
Specify raw XML:
Click Specify raw XML, and type raw XML such as dftxml for testing.
For the Content Type option (local or remote file), specify a MIME type such as application/msword,
application/pdf, text/plain, or text/xml.
When you click the object name, you see the dftxml. This dftxml rendition is template-based, not
generated by the Documentum index agent. There can be slight differences in the dftxml when you
submit the document through the Documentum index agent. To verify remote CPS, you start with a
sample dftxml file, then edit the dmcontentref element. Set it as a path to the shared storage, then paste
the dftxml in the text box.
Results are displayed. If the document has been successfully uploaded, it is available immediately
for search unless the indexing load is high. The object name in xPlore is created by concatenating
the file name and timestamp.

Checking network bandwidth and latency


Bandwidth or latency bottlenecks can degrade performance during the transfer of content from a
source file system to xPlore. CPS reports measure indexing latency. Validate that file transfers take
place with the expected speed. This issue is seen more frequently in virtual environments.

Checking the indexing log


The xPlore log dsearch.log is located in the logs subdirectory of the JBoss deployment directory. In the
following example, logging was set to INFO (not the default) in xPlore administrator.
The following text from dsearch.log shows that a document was indexed and inserted into the tracking
DB:
<event timestamp="2009-04-03 08:40:54,536" level="INFO" thread="


http-0.0.0.0-9200-2"
logger="com.emc.documentum.core.fulltext.indexserver.core.index.FtIndexObject"
elapsedTime="1238773254536">
<message ><![CDATA[[INSERT_STATUSDB] insert into the StatusDB with docId
0965dd8980001dce operationId
primary$3cb3b293-8790-452e-af02-84c9502a45e4 status NEW (message) ]]>
</message>
</event>

Checking the status of a document


Using xPlore administrator, you can issue the following XQuery to find the indexing status of a
document. You must know the document ID.
for $i in
collection('/<domain-name>/dsearch/SystemInfo/StatusDB')/trackinginfo/operation
where $i[@doc-id = "<document-id>"]
return <r> <status>{$i/@status/data(.)}</status>
<message> {$i/data(.)} </message></r>

The status returned is one of the following:
NEW: The indexing service has begun to process the request.
ERROR: xPlore failed to process the request. Metadata and content cannot be indexed.
DONE: The document has been processed successfully.
WARN: Only the metadata was indexed.

Troubleshooting high save-to-search latency


The following issues can cause high latency between the time a document is created or updated and
the time the document is available for search:
Index agent is down
Detect this problem by monitoring the size of the index agent queue. Use xPlore administrator to
determine whether documents were sent for ingestion. For example, the Documents ingested per
hour report shows 0 for DocCount when the index agent is down.
Workaround: Configure multiple index agents for redundancy. Monitor the index agents and restart
when they fail.
CPS restarts frequently
Under certain conditions, CPS fails while processing a document. xPlore restarts the CPS process,
but the restart causes a delay. Restart is logged in cps.log and cps_daemon.log. For information on
these logs, see CPS logging, page 299.
Large documents tie up ingestion
A large document in the ingestion pipeline can delay smaller documents that are further back in
the queue. Detect this issue using the Documents ingested per hour report in xPlore administrator.
(Only document size averages are reported.)
If a document is larger than the configured maximum limit for document size or text size, the
document is not indexed. The document metadata are indexed but the content is not. These
documents are reported in the xPlore administrator report Content too large to index.
Workarounds: Attempt to refeed a document that was too large. Increase the maximum size for
document processing or set cut_off_text to true in PrimaryDsearch_local_configuration.xml. This file
is located in xplore_home/dsearch/cps/cps_daemon. See Maximum document and text size, page 98.
Ingestion batches are large
During periods of high ingestion load, documents can take a long time to process. Review the
ingestion reports in xPlore administrator to find bytes processed and latency. Use dsearch.log to
determine when a specific document was ingested.
Workaround: Set up a dedicated index agent for the batch workload.
Insufficient hardware resources
If CPU, disk I/O, or memory are highly utilized, increase the capacity. Performance on a virtual
server is slower than on a dedicated host. For a comparison of performance on various storage
types, see Disk space and storage, page 313.

Changes to index configuration do not take effect


If you have edited indexserverconfig.xml, your changes are not applied unless the system is stopped.
Some changes, such as adding or modifying a subpath, do not take effect on a restart but only when you
rebuild the indexes of the collection.
Modifying indexserverconfig.xml, page 43 describes the procedure to view and update the index
configuration.
If you have changed configuration using xPlore administrator, and your system crashed before the
changes were flushed to disk, repeat your configuration changes and restart xPlore.
See also: Running the standalone consistency checker, page 149.

Resolving attribute datatype conflict


To speed up searches for non-string attributes, you can define a subpath specifying the attribute type:
integer, double, date, or datetime. However, a conflict occurs in the following conditions:
You changed the attribute type in the subpath definition to an incompatible type. For example, you
can change an attribute from string type to any other type, but you cannot change an attribute from
double to date type.
And
Some objects have already been indexed.


When new objects are indexed, the system throws the exception: XhiveException:
INDEX_VALUE_FORMAT_ERROR. The exception appears in dsearch.log, in indexagent.log, and in
Documentum Administrator, in the Index queue page.
For example, in dsearch.log:
2012-09-25 03:42:16,516 ERROR [IndexWorkerThread-2]
c.e.d.c.f.i.core.index.plugin.XhivePlugin
- Failed to ingest documents: 0900107f8000425c; Exception:
com.xhive.error.XhiveException: INDEX_VALUE_FORMAT_ERROR:
Illegal value "2013-01-01T09:00:00" for type DOUBLE in index dmftdoc

As a consequence, all new objects with this attribute are not indexed.
To resolve the conflict, delete the collection and reindex the objects as described in Deleting a
collection and recreating indexes, page 165.
Rebuilding the indexes or reindexing without deleting the collection does not resolve the conflict
because the conflict still exists in xDB.

Running the standalone consistency checker


The standalone data consistency checker has two functions:
Checks data consistency for all collections in a specific domain.
Detects tracking database corruption and rebuilds it, then reconciles it with the corresponding
collection.
For example, for the index agent log error Duplicate document ID, run the consistency checker to see
whether the tracking DB is out of synch with the index. Make sure there is no indexing activity
when running this tool.
For information on the database consistency checker in xPlore administrator, see Domain and
collection menu actions, page 155.
Note: Do not check consistency during migration or index rebuild. Run only one instance of the tool
at a time.
The consistency checker is invoked from the command line. The tool checks consistency with the
following steps. The batch size for a list is configurable (default: 1000). Batches are processed until all
documents have been checked.
1. For each collection in the domain, the tool queries the tracking database to find the number of
document IDs.
2. Document IDs are returned as a list from the tracking DB.
3. Document IDs are returned as a list from the corresponding collection.
4. The list sizes are compared. Differences are reported.
5. If fix-trackDB is set to true, the tool inserts the missing entries in the tracking database. It deletes the
entries in the database that do not exist in the collection. It also validates the library-path in the database entry.
1. Back up your domain or collection.

2. Stop all indexing activity on the instance. Set the target collection, domain, or federation to
read-only or maintenance mode.
3. Invoke the checker using CLI. (See Using the CLI, page 184.) The default batch size is 1000. Edit
the script to change the batch size. Syntax (on one line):
xplore checkDataConsistency unit, domain, collection,
is_fix_trackDB, batch-size, report-directory

Valid values:
unit: collection or domain
domain: domain name.
collection: Collection name (null for domain consistency check)
is_fix_trackDB: true or false. Set to false first and check the report. If indexing has not been
turned off, inconsistencies are reported.
batch-size: Numeric value greater than or equal to 1000. Non-numeric, negative, or null values
default to 1000.
report-directory: Path for consistency report. Report is created in a subdirectory
report-directory/time-stamp/domain_name|collection_name. Default base directory is the
current working directory.
Windows example: Checks consistency of a default collection in defaultDomain and fixes the
trackingDB:
xplore "checkDataConsistency collection, defaultDomain, default1, true, 2000, C:\\tmp"

Checks consistency of all collections in defaultDomain and fixes the trackingDB:
xplore "checkDataConsistency domain, defaultDomain, null, true, 2000, C:\\tmp"

Linux example: Checks consistency of defaultDomain:
./xplore.sh checkDataConsistency "domain, defaultDomain, null, true, 1000, export/space1/temp"

4. View the report in the current working directory.

Indexing APIs
Access to indexing APIs is through the interface
com.emc.documentum.core.fulltext.client.IDSearchClient. Each API is described in the javadocs. The
following topics describe the use of indexing APIs.

Route a document to a collection


You can route a document to a collection in the following ways:
A custom routing class. See Creating a custom routing class, page 151.
Index agent configuration. See Mapping Server storage areas to collections, page 71 and Sharing
content storage, page 70.

For a detailed example of routing to specific collections and targeting queries to that collection, see
"Improving Webtop Search Performance Using xPlore Partitioning" on the EMC Community Network
(ECN).

Creating a custom routing class


xPlore defines an interface for routing a document to a collection: IFtIndexCollectionRouting
in the package com.emc.documentum.core.fulltext.client.index.custom. You can provide a
class that implements the interface and specify this class name in the xPlore configuration file
indexserverconfig.xml. Then xPlore invokes the custom class for routing.
You can use custom routing for all domains or for specific domains. The collection determined by the
routing class takes precedence over a collection that is specified in the indexing API.
1. Stop the xPlore instances.
2. Register your custom routing class in indexserverconfig.xml, which is located in the xPlore config
directory. Add the following element between the system-metrics-service and admin-config
elements. (The xPlore server must be stopped before you add this element.)
<customization-config>
<collection-routing class-name="SimpleCollectionRouting">
</collection-routing>
</customization-config>

3. Create your custom class. (See example.) Import IFtIndexRequest in the package
com.emc.documentum.core.fulltext.common.index. This class encapsulates all aspects of an
indexing request:
public interface IFtIndexRequest
{
String getDocId ();
long getRequestId ();
FtOperation getOperation (); //returns add, update or delete
String getDomain ();
String getCategory ();
String getCollection ();
IFtDocument getDocument (); //returns doc to be indexed
public String getClientId();
void setCollection(String value);
public void setClientId(String id);
}

SimpleCollectionRouting example
This example routes a document to a specific collection based on Documentum version.
The sample Java class file in the SDK/samples directory assumes that the Documentum
index agent establishes a connection to the xPlore server. Place the compiled class
SimpleCollectionRouting in the Documentum index agent classpath, for example,
xplore_home/jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.

This class parses the input dftxml representation from the index agent. The class gets a metadata
value and tests it, then routes the document to a custom collection if it meets the criterion (new
document).

Imports
import com.emc.documentum.core.fulltext.client.index.custom.IFtIndexCollectionRouting;
import com.emc.documentum.core.fulltext.client.index.FtFeederException;
import com.emc.documentum.core.fulltext.client.common.IDSearchServerInfo;
import com.emc.documentum.core.fulltext.common.index.IFtIndexRequest;
import com.emc.documentum.core.fulltext.common.index.IFtDocument;
import java.util.List;
import javax.xml.xpath.*;
import org.w3c.dom.*;

Required method setEssServerInfo


The index agent environment sets the xPlore server info. You can implement this method simply
without creating the connection:
public void setEssServerInfo(IDSearchServerInfo info)
{
m_serverInfo = info;
}
IDSearchServerInfo m_serverInfo;

Add variables for routing


private boolean m_updated = false;
private static final String s_collection = "superhot";
private String m_version = null;

Parse the metadata from dftxml


This method parses the dftxml representation of the document and metadata, which is passed in from
the Documentum index agent. The Documentum version is returned to setCustomCollection().
private String parseDocumentVersion(Document inputdoc)
throws XPathExpressionException
{
String version = null;
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("//r_version_label/text()");
Object result = expr.evaluate(inputdoc, XPathConstants.NODE);
Node versionnode = (Node) result;
version = versionnode.getNodeValue();
System.out.println("version: " + version);
m_version = version;

return m_version;
}

Add routing logic


This method calls parseDocumentVersion() to get the version for routing. It sets the custom collection
if the metadata meets the criterion (new collection).
private boolean setCustomCollection(IFtIndexRequest request) throws FtFeederException
{
assert request != null;
IFtDocument doc = request.getDocument();
Document xmldoc = doc.getContentToIndex();
try
{
String version = parseDocumentVersion(xmldoc);
if (version.equals("1.0"))
{
request.setCollection(s_collection);
m_updated = true;
}
} catch (XPathExpressionException e)
{
throw new FtFeederException(e);
}
return m_updated;
}

Set the custom collection


This method determines whether the document is being added (not updated or deleted). The method
then calls setCustomCollection to get the version and route the document to the appropriate collection.
// Return true after the collection name has been altered for any request
// Otherwise returns false.
public boolean updateCollection(List<IFtIndexRequest> requests) throws
FtFeederException
{
assert m_serverInfo != null;
assert requests != null;
for ( IFtIndexRequest request : requests )
{
if (request.getOperation().toString().equals("add"))
{
setCustomCollection(request);
}
}
return m_updated;
}

Chapter 7
Index Data: Domains, Categories, and
Collections
This chapter contains the following topics:

Domain and collection menu actions

Managing domains

Delete a corrupted domain

Configuring categories

Managing collections

Checking xDB statistics

Troubleshooting data management

xDB repair commands

Domain and collection menu actions


Execute XQuery
You can query a domain or collection with Execute XQuery in xPlore administrator. Enter your
XQuery expression in the input area. The options get query plan and get optimizer debug provide
information for EMC technical support.
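For example, the following query counts the tracking entries for a domain. It uses the StatusDB library
path shown in Checking the status of a document; replace the placeholder with your domain name and
adjust the path if your deployment differs:
count(collection('/<domain-name>/dsearch/SystemInfo/StatusDB')/trackinginfo/operation)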

Check DB consistency
Perform before backup and after restore. This check determines whether there are any corrupted or
missing files such as configuration files or Lucene indexes. Lucene indexes are checked to see whether
they are consistent with the xDB records: tree segments, xDB page owners, and xDB DOM nodes.
Note: You must set the domain to maintenance mode before running this check.
Select the following options to check. Some options require extensive processing time:
Segments and admin structures: Efficient check, does not depend on index size.
Free and used pages: Touches all pages in database.
Note: When the result from this check is inconsistent, run it two more times.
Pages owner: Touches all pages in database.
Indexes: Traverses all the indexes and checks DOM nodes referenced from the index.
Basic checks of indexes: Efficient check of the basic structure. The check verifies that the necessary
Lucene index files exist and that the internal xDB configuration is consistent with files on the
file system.
DOM nodes: Expensive operation, accesses all the nodes in the database.
The basic check of indexes inspects only the external Lucene indexes. The check verifies that the
necessary files exist and that the internal xDB configuration is consistent with files on the file system.
This check runs much faster than the full consistency check.
Use the standalone consistency checker to check data consistency for specific collections in a domain
or specific domains in a federation. If inconsistencies are detected, the tool can rebuild the tracking
database. If tracking entries are missing, they are re-created. If tracking entries point to nothing, they
are deleted. See Running the standalone consistency checker, page 149.

View DB statistics
Displays performance statistics from xDB operations.

Backup
See Backup in xPlore administrator, page 179.

Managing domains
A domain is a separate, independent, logical, or structural grouping of collections. Domains are
managed through the Data Management screen in xPlore administrator. The Documentum index agent
creates a domain for the repository to which it connects. This domain receives indexing requests
from the repository.

New domain
Select Data Management in the left panel and then click New Domain in the right panel. Choose a
default document category. (Categories are specified in indexserverconfig.xml.) Choose a storage
location from the dropdown list. To create a storage location, see Creating a collection storage
location, page 164.
A Documentum index agent creates a domain for a repository and creates ACL and group collections
in that domain. Note: When you create a domain in xPlore administrator, ACL and group collections
are not created automatically.

New Collection
Create a collection and configure it. See Changing collection properties, page 161.

Configuration
The document category and storage location are displayed (read-only). You can set the runtime mode
as normal (default) or maintenance (for a corrupt domain). The mode does not persist across xPlore
sessions; mode reverts to runtime on xPlore restart.

Back up, detach or attach a domain


For information on backing up a domain, see Backup in xPlore administrator, page 179.
Note: Do not use detach to move a domain index. Instead, change the binding for each collection. To
avoid corruption, do not detach or attach a domain during index rebuild.

Delete domain
If you are unable to start xPlore and the log shows that the domain is
corrupted, force recovery. Add a property force-restart-xdb=true in
xplore_home/jboss5.1.0/server/%INSTANCE_NAME%/deploy/dsearch.war/
WEB-INF/classes/indexserver-bootstrap.properties.
Remove the domain with the following steps.
1. Bind all collections in the domain to the primary instance using xPlore administrator.
2. All collections must meet the following conditions before you detach the domain:
No collections are detached.
No collections are off-line.
No collections are in search-only mode.
If the delete domain transaction encounters a detached collection, it displays an error message.
3. (Optional) Back up the xplore_home/config and xplore_home/data/domain_name folders.
4. In xPlore administrator, choose Data Management and click the red X next to a domain to delete
it. This option is not enabled if the domain is detached. A corrupted domain cannot be deleted
using xPlore administrator. For steps to manually delete a corrupted domain, see Delete a corrupted
domain, page 157.
If there is an error in deleting any collection in the domain, the entire delete domain transaction
is rolled back.

Delete a corrupted domain


1. Force recovery if the domain is corrupted. Add a property force-restart-xdb=true in
xplore_home/jboss5.1.0/server/%INSTANCE_NAME%/deploy/dsearch.war/WEB-INF/classes/
indexserver-bootstrap.properties
2. Log in to xPlore administrator and open the domain in Data Management. Click Detach to
force detach the domain.
3. Stop all xPlore instances.
4. Back up the xplore_home/config and xplore_home/data/domain_name folders.
5. Remove the domain element from indexserverconfig.xml, like the following:


<domain ... name="my_domain">
...
</domain>

6. Delete the segment elements for the domain in XhiveDatabase.bootstrap in xplore_home/config.


Search on the domain name. For example, to delete the GlobalOps domain, remove every segment
whose library-path or id attribute contains the domain name:
<segment library-id="0" library-path="/GlobalOps/dsearch/ApplicationInfo/acl" ...
id="GlobalOps#dsearch#ApplicationInfo#acl">
<file id="10" path="C:\xPlore\data\GlobalOps\acl\xhivedb-GlobalOps#dsearch#ApplicationInfo#acl-0.XhiveDatabase.DB"/>
<binding-server name="primary"/>
</segment>

7. Delete the domain folder under xplore_home/data, for example, xplore_home/data/GlobalOps.


8. Restart the xPlore primary and then secondary instances.

Configuring categories
A category defines a class of documents and their XML structure. The category definition specifies the
processing and semantics that are applied to the ingested XML document. You can specify the XML
elements that have text extraction, tokenization, and storage of tokens. You also specify the indexes
that are defined on the category and the XML elements that are not indexed. More than one collection
can map to a category. xPlore manages categories.
The default categories include dftxml (Documentum content), security (ACLs and groups), and the
tracking, metrics, audit, and thesaurus databases.
When you create a collection, choose a category from the categories defined in indexserverconfig.xml.
When you view the configuration of a collection, you see the assigned category. It cannot be changed
in xPlore administrator. To change the category, edit indexserverconfig.xml.
You can configure the indexes, text extraction settings, and compression setting for each category. The
paths in the configuration file are in XPath syntax and refer to the path within the XML representation
of the document. (All documents are submitted for ingestion in an XML representation.) Specify an
XPath value to the element whose content requires text extraction for indexing.
Table 20. Category configuration options

category-definitions: Contains one or more category elements.
category: Contains elements that govern category indexing.
properties/property (track-location): Specifies whether to track the location (index name) of the
content in this category. For Documentum dftxml representations of documents, the location is
tracked in the tracking DB. Documentum ACLs and groups are not tracked because their index
location is known.
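The following fragment shows how these options fit together. It is a hypothetical sketch based only on
the option names in Table 20; the exact elements and attributes in your indexserverconfig.xml can
differ, so use an existing category definition in that file as your reference:
<category-definitions>
  <category name="dftxml">
    <properties>
      <!-- hypothetical property: track index locations for this category in the tracking DB -->
      <property name="track-location" value="true"/>
    </properties>
    <!-- index definitions and text-extraction settings for the category follow here -->
  </category>
</category-definitions>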


Managing collections
About collections, page 159
Planning collections for scalability, page 160
Uses of subcollections, page 160
Adding or deleting a collection, page 161
Changing collection properties, page 161
Routing documents to a specific collection, page 162
Attaching and detaching a collection, page 162
Moving a collection, page 162
Creating a collection storage location, page 164
Rebuilding the index, page 164
Deleting a collection and recreating indexes, page 165
Querying a collection, page 165

About collections
A collection is a logical group of XML documents that is physically stored in an xDB detachable
library. A collection represents the most granular data management unit within xPlore. All documents
submitted for indexing are assigned to a collection. A collection generally contains one category of
documents. In a basic deployment, all documents in a domain are assigned to a single default collection.
You can create subcollections under each collection and route documents to user-defined collections.
A collection is bound to a specific instance in read-write state (index and search, index only, or update
and search). A collection can be bound to multiple instances in read-only state (search-only).

Viewing collections
To view the collections for a domain, choose Data Management and then choose the domain in the left
pane. In the right pane, you see each collection name, category, usage, state, and instances that the
collection is bound to.
There is a red X next to the collection to delete it. For xPlore system collections, the X is grayed out,
and the collection cannot be deleted.

Viewing collection properties


Choose Data Management and drill down to the collection in the left pane. In the content pane, you
see the following information about the collection:
Library path in xDB.
Document category. For more information, see Documentum domains and categories, page 25.
xPlore instances the collection is bound to.
State: Valid states: index and search, index only, update_and_search, search only, and off_line.
Note: You cannot back up a collection in the offline state. You must detach it or bring it online.
Usage: Type of xDB library. Valid types: data (index), ApplicationInfo and SystemInfo.
Current size on disk, in KB

Viewing collection contents


The total number displayed in Collection Contents pane is the number of documents in the collection
and its subcollections. For subcollections, the number of documents in the subcollection is displayed.

Viewing a document in a collection


Click Name for an individual document to view the XML representation (dftxml) of the document.

Planning collections for scalability


A collection is a logical grouping of tokenized content and associated full-text indexes within a
domain. For example, you have a collection that indexes all email documents. For Documentum
environments, the Documentum index agent creates a domain for each source repository, and
documents are indexed to collections within that domain.
A collection contains documents of a single category. There is generally a one-to-one mapping
between a category and a collection. A document category definition in indexserverconfig.xml
describes what is indexed within the documents that are submitted for indexing.
Specify the target collection for documents using one of the following methods. They have the
following order of precedence in xPlore, with highest first:
Custom routing class. See Creating a custom routing class, page 151.
API indexing option or partition mapping in the Documentum index agent. See Sharing content
storage, page 70.
Default collection that is created for each domain.
When the target collection is not specified, documents are indexed to the collections in round-robin
order. Each document is passed to the instance that hosts the target collection. Only collections with
the state index only or index and search can receive documents.
To plan for high ingestion rates, you can route a collection to high-speed storage for ingestion. As the
data becomes less in demand, you can bind the collection to low-cost storage.
You can create a separate collection for ingestion and then merge that collection with another. A single
collection hierarchy performs better for search.

Uses of subcollections
Create subcollections for the following uses:
Create multiple top-level collections for migration to boost the ingestion rate. After ingestion
completes, move the temporary collection to a parent collection. The temporary collection is
now a subcollection. The parent and subcollections are searched faster than a search of multiple
top-level collections.
Create subcollections for data management. For example, you create a collection for 2011 data
with a subcollection to store November data.
The following restrictions are enforced for subcollections, including a collection that is moved
to become a subcollection:
Subcollections cannot be detached or reattached when the parent collection is indexed. For
example, a path-value index is defined with no subpaths, such as the folder-list-index.
Subcollections cannot be backed up or restored separately from the parent.
Subcollections must be bound to the same instance as the parent.
Subcollection state cannot contradict the state of the parent. For example, if the parent is
search_only, the subcollection cannot be index_only or index_and_search. If the parent is
searchable, the adopted collection cannot be search-only.
Exception: When the parent collection is in update_and_search or index_and_search state,
subcollections can be any state.

Adding or deleting a collection


1. Choose a domain and then choose New collection.
2. Set properties for the new collection:
Collection name
Parent domain
Usage: Type of xDB library. Valid types: data (index) or applicationinfo.
Document category: Categories are defined in indexserverconfig.xml.
Binding instance: Existing instances are listed.
To change the binding of a collection, see Changing collection properties, page 161.
Storage location: Choose a storage location from the dropdown list. To define a storage location,
see Creating a collection storage location, page 164.
3. To create a subcollection, click the parent collection in the navigation pane and then click New
Subcollection. The storage location is the same as the location for the parent.
4. To delete a collection, choose a domain and then click X next to the collection you wish to delete.
A collection must have the state index_and_search or index_only to be deleted. Collections
with the state search_only or off_line cannot be deleted in xPlore administrator. To delete these
collections, use the xDB admin tool.

Changing collection properties


1. Select a collection and then choose Configuration.
The Edit collection screen displays the collection name, parent domain, usage, document category,
state, binding instance, and storage location.
2. Configure state: index and search, index only, update and search, or search only.
index and search is the default state when a new collection is created. A collection can have
only one binding that is index_and_search.
Use index only to repair the index. You cannot query a collection that is set to index only.
Use update and search for read-only collections that have updates to existing content or
metadata. You cannot add new content to the collection.
Use search only (read-only) on multiple instances for query load balancing and scalability.
3. Change binding to another xPlore instance:
a. Set the state of the collection. If the collection state is index_and_search, update_and_search,
or index_only, you can bind to only one instance. If the collection state is search_only, you can
bind the collection to multiple instances for better resource allocation.
b. Choose a Binding instance.
Limitations:
If a binding instance is unreachable, you cannot edit the binding.
You cannot change the binding of a subcollection to a different instance from the parent
collection.
To change the binding on a failed instance, restore the collection to the same instance or to a
spare instance.
4. Change storage location. To set up storage locations, see Creating a collection storage location,
page 164.

Routing documents to a specific collection


There are three ways to route documents to a specific collection. Routing is applied in the following
order of precedence, with custom routing having the highest priority:
Custom routing class. Use for complicated routing logic before migration. See Creating a custom
routing class, page 151
Documentum index agent: Map a file store to a collection before migration. See Sharing content
storage, page 70.
Default collection of the domain, to all instances in round robin order.

Attaching and detaching a collection


Note: To avoid corruption, do not detach or attach a collection during index rebuild.
Attach and detach are required for a restore operation. See Offline restore, page 180.

Moving a collection
If you are moving a collection to another xPlore instance, choose the collection and click
Configuration. Change Binding instance.
You can create a collection for faster ingestion, and then move it to become a subcollection of another
collection after ingestion has completed. When you move it as a subcollection, search performance is
improved.

If a collection meets the following requirements, you can move it to become a subcollection:
Facet compression was disabled for all facets before the documents were indexed.
The collection is not itself a subcollection (does not have a parent collection).
The collection does not have subcollections.
The collection has the same category and index definitions (in indexserverconfig.xml) as the new
parent.
xPlore enforces additional restrictions after the collection has been moved. For information on
subcollection restrictions, see Uses of subcollections, page 160.
1. Choose the collection and click Move.
2. In the Move to dialog, select a target collection. This collection can be a subcollection or a
top-level collection.

Moving a collection to a new drive


You can move one or more collections when you are changing disks or a drive is full. The following
procedure assumes that you do not move the /config directory.
1. Stop the index agent and xPlore instances.
2. Back up the xplore_home/data and xplore_home/config directories.
3. Copy the xplore_home/data folder to the new path. This location must be accessible and writeable
by all xPlore instances. Repeat this step if you created one or more storage locations using a
different path.
4. Update the old location to the new location in the following files on the primary instance host:
indexserverconfig.xml. For example:
<storage-location status="not_full" quota_in_MB="10"
path="C:/xPlore_1/data-new" name="default"/>

XhiveDatabase.bootstrap. Update the path of each segment to the new path. For example:
<segment reserved="false" library-id="0" library-path=
"/Repository1/dsearch/Data/default" usable="true"
usage="detachable_root" state="read_write" version="1"
temp="false" id="Repository1#dsearch#Data#default">
<file id="12" path="C:\xPlore_1/data-new\Repository1\default\
xhivedb-Repository1#dsearch#Data#default-0.XhiveDatabase.DB"/>
<binding-server name="primary"/>
</segment>

5. On all instances, change the path in indexserver-bootstrap.properties to match the new bootstrap
location. For example:
xhive-data-directory=C\:/xPlore_1/data-new

6. Delete the JBoss cache (/work folders) from the index agent and primary instance web applications.
Also delete JBoss cache folders for the secondary instances. The path of the /work folder is
xPlore_home/jboss5.1.0/server/DctmServer_InstanceName/work.
7. Start the xPlore instances.
8. Start the index agent.


9. Import a test document and search for it using xPlore administrator. Also, search for a document
that has already been indexed.

Creating a collection storage location


xPlore stores data and indexes in an xDB database. The index for each collection can be routed to
a specific storage location. The storage location is not specific to one instance; it must be shared
among all instances.
After you create a storage location, you can select it when you create a domain or a new collection. A
collection can be stored in a location different from other collections in the domain. After you choose
the location, you must back up and restore to the same location. If the disk is full, set the collection
state to update_and_search and create a collection in a new storage location.
1. In xPlore administrator, choose System Overview in the tree.
2. Click Global Configuration and then choose the Storage Management tab.
3. Click Add Storage. Enter a name and path and save the storage location. The storage location is
created with unlimited size.
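The new location is recorded as a storage-location element in indexserverconfig.xml, in the same form
as the storage-location example shown earlier in Moving a collection to a new drive. The name, path,
and quota value below are hypothetical:
<storage-location status="not_full" quota_in_MB="0" path="D:/xPlore_storage/ssd01" name="ssd01"/>
You can then choose this location from the storage location dropdown list when you create a domain
or collection.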

Rebuilding the index


The collection must be online and in the read/write state to rebuild its index. You can perform normal
ingestion and search during index rebuild.
Note: Do not perform the following operations during rebuild: Heavy ingestion load, backup or
restore, attach or detach a domain or collection, create collection, or check consistency.
Rebuild the index for the following use cases:
You added a stop words list.
You added a new facet.
You want to strip out accents from words in the index.
You changed the data model to add indexable attributes for an object type.
You want to use index agent or DFC filters, and the objects have already been indexed.
Rebuild a collection index in the xPlore administrator UI. Custom routing is not applied when you
rebuild an index. To apply custom routing, refeed the documents. See Deleting a collection and
recreating indexes, page 165.
1. Choose a domain or collection in the Data Management tree. If you are rebuilding the entire
domain, open each collection to perform the next step.
2. In the right pane, click Rebuild Index.
The index is rebuilt in the background and then put into place when the rebuild is complete.
3. Reingest large documents. At the original ingestion, some documents are not embedded
in dftxml because the XML content exceeds the value of the file-limit attribute on the
xml-content element in indexserverconfig.xml. The index rebuild process generates a list
of object IDs for these documents. Use this list in the next step. The list is located in
xplore_home/data/domain_name/collection_name/index_name/ids.txt.
For example:
C:/xPlore/data/mydomain/default/dmftdoc_2er90/ids.txt

4. In the index agent UI, provide the path to the list of object IDs. See Indexing documents in normal
mode, page 79.
After the index is rebuilt, run ftintegrity or the State of Index job in Content Server 6.7 or higher. See
Using ftintegrity, page 73 or Running the state of index job, page 76.

Monitoring merges of index data


You can monitor the merge of index data from in-memory to the xDB database. When a collection is
merging, you see a Merging icon next to the collection in the Data Management view of the collection.
Because merging can slow requests, you can stop or start the merge from the UI. If merging is in
progress, click Stop Merging. If you want to initiate merging, click Start Merging.
For merging commands, see Final merge CLIs, page 192. Tuning index merges, page 319, describes the
types of merges and how to manage final merges.

Deleting a collection and recreating indexes


In some cases, such as a conflict in the index definition, you need to recreate indexes.
1. Remove the problematic collection:
a. Navigate to the domain in xPlore administrator left panel.
b. Click the X next to the collection name to delete it.

If there is only one collection to process documents, you cannot delete it. As a workaround, you
can create a temporary collection to be able to delete the problematic one.
2. Modify the index definition, if necessary.
3. Create a collection with the same name as the one you deleted.
4. If you created a temporary collection, remove it before refeeding documents.
5. Refeed the documents to launch a full reindexing: in the index agent UI, select Start new
reindexing operation.
If custom routing is defined, it is applied. Otherwise, default routing is applied.

Querying a collection
1. Choose a collection in the Data Management tree.
2. In the right pane, click Execute XQuery.
3. Check Get query debug to debug your query. The query optimizer is for technical support use.
To route queries to a specific collection, see Routing a query to a specific collection, page 257.
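For example, you can run a query like the following in the Execute XQuery box. This is a minimal
sketch that assumes the default dftxml document structure and a hypothetical domain named MyRepo;
adjust the library path to your own domain and collection:
for $doc in collection('/MyRepo/dsearch/Data/default')/dmftdoc
where $doc/dmftmetadata//object_name = 'my_test_document'
return $doc/dmftinternal/r_object_id
The query returns the object IDs of documents whose object_name matches the literal value.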

Checking xDB statistics


A verbose command, statistics-ls, lists all fields and tokens in the collection. You can use this
command to check whether certain Documentum attributes were tokenized.

Follow these steps to check xDB statistics:


1. Open a command window and navigate to xplore_home/dsearch/xhive/admin.
2. Launch XHCommand.bat or XHCommand.sh.
3. Enter the xDB statistics-ls command with relevant options.
Syntax:
statistics-ls [options] path

The -d argument (xhivedb) is the same for all xPlore installations. The path argument is the path
within xDB to your collection, which you can check in the XHAdmin tool. The output file is large
for most collections. Redirect output to a file using the -o option. (The file must exist before you run
the command.) For example:
statistics-ls -d xhivedb -o C:/temp/DefaultCollStats.txt
/TechPubsGlobal/dsearch/Data/default --lucene-params --details

The output gives the status of each collection (Merge State). For example:
LibPath=/PERFXCP1/dsearch/Data/default IndexName={dmftdoc}
ActiveSegment=EI-8b38b821-4e29-42b2-9fe0-8e6c82764a6b-211106232537097-luceneblobs-1
EntryName=LI-3bb3483d-38c9-4d14-8a90-5a13a9a19717
MergeState=NONE isFinalIndex=FINAL
LastMergeTime=12/09/2012-07:31:11 MinLSN=0 MinLSN=0
LibPath=/PERFXCP1/dsearch/Data/default IndexName={dmftdoc}
ActiveSegment=EI-8b38b821-4e29-42b2-9fe0-8e6c82764a6b-211106232537097-luceneblobs-1
EntryName=LI-2ea1578b-0d82-496a-9c81-ee15502b3cbe
MergeState=NONE isFinalIndex=NOT-FINAL
LastMergeTime=14/09/2012-10:15:48 MinLSN=786124
MinLSN=485357881901

Other statistics
You can check other statistics such as returnable fields, size of index, and number of documents. The
statistics command has the same arguments as statistics-ls.
For example:
statistics --docs-num -d xhivedb /TechPubsGlobal/dsearch/Data/default dmftdoc

Statistics options:
--lucene-sz: Size of Lucene fields (.fdt), Index to fields (.fdx), and total size of Lucene index in
bytes (all).
--lucene-rf: Statistics of returnable fields (configured in indexserverconfig.xml). Includes total
count of path mapping and value mapping and compression mapping consistency.
--lucene-list: Shows whether each index is final.
--lucene-params: Lists the xDB parameters set in xDB.properties.
--docs-num: Displays the number of documents in the collection. This value should match the
number displayed for a collection in xPlore administrator.
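For example, the following command reports the Lucene index sizes for the collection used in the
earlier example (substitute your own library path):
statistics --lucene-sz -d xhivedb /TechPubsGlobal/dsearch/Data/default dmftdoc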


Troubleshooting data management


Auditing collection operations
You can audit operations on collections. Auditing is enabled by default. To enable or disable auditing,
open Global Configuration in the xPlore administrator left pane. Select the Auditing tab and check
admin to enable or disable data management auditing.
For information on configuring the audit record, see Configuring the audit record, page 39.
When auditing is enabled, the following operations on a collection are recorded:
Create and delete collection
Add, remove, or change binding
Create, rebuild start and end, or delete index
Attach or detach
Adopt collection start and end
Backup start and end
You can view a report based on the admin audit records. Go to Diagnostic and Utilities > Reports
and choose Audit records for admin component. You can filter by date.

Force restart of xDB


If xDB fails to start up, you can force a start.
1. Edit indexserver-bootstrap.properties in the WEB-INF/classes directory of the application
server instance, for example,
xplore_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/classes.
2. Set the value of force-restart-xdb to true. If this property does not exist in
indexserver-bootstrap.properties, add the following line:
force-restart-xdb=true
3. Restart the xPlore instance.

Note: This property will be removed after restart.


Do not remove segments from xDB unless they are orphan segments. (See Orphaned segments CLIs,
page 189.) If you do, your backups cannot be restored.

A collection that has been force-detached cannot be deleted


After a collection has been force-detached, it cannot be removed with the Delete command in xPlore
administrator. This limit protects non-corrupted collections from being removed. (xPlore cannot
determine whether a collection is corrupted after it has been detached.)
To remove a corrupted collection, do the following. This example has a force-detached collection with
the name default in the domain defaultDomain.
1. Shut down all xPlore instances.
2. Remove the collection element from indexserverconfig.xml:
<collection state="detached" document-category="dftxml" usage="Data" name="default"/>
3. Remove the collection-related segment including the data segment and its corresponding
tracking DB, tokens, and xmlContent segment (if any) from XhiveDatabase.bootstrap in
xplore_home/config. In the following example, the segment ID starts with defaultDomain and
ends with default:
<segment id="defaultDomain#dsearch#Data#default" temp="false"
version="1" state="detach_point" usage="detachable_root" usable="false">
<file path="c:\DSSXhive\defaultDomain\default\xhivedb-defaultDomain#dsearch#Data#default-0.XhiveDatabase.DB" id="14"/>
<binding-server name="primary"/>
</segment>
<segment id="defaultDomain#dsearch#SystemInfo#TrackingDB#default"
temp="false" version="1" state="detach_point" usage="detachable_root"
usable="false">
<file path="c:\DSSXhive\defaultDomain\default\xhivedb-defaultDomain#dsearch#SystemInfo#TrackingDB#default-0.XhiveDatabase.DB" id="15"/>
<binding-server name="primary"/>
</segment>
4. Delete the physical data folder: xplore_home/Data/defaultDomain/default
5. Start xPlore instances.

xDB repair commands


You can troubleshoot the following issues by stopping xPlore and starting the xDB server in repair
mode. Perform the following steps:
1. Stop all xPlore instances and any xPlore-related Java processes.
2. Open a command-line window and navigate to xplore_home/dsearch/xhive/admin.
3. Enter the following command: XHCommand.bat (Windows) or XHCommand.sh (Linux). You
see the xdb prompt.
4. Start the xDB server with the run-server-repair command. The xDB port is recorded in the xDB
bootstrap file. Syntax:
run-server-repair --port <xDBport> -f <path_to_xDB bootstrap>

For example:
xdb>run-server-repair --port 9330 -f C:\xPlore\config\XhiveDatabase.bootstrap

When the xDB server starts successfully, you see a message like the following:
xDB 10_2@1448404 server listening at xhive://0.0.0.0:9330

5. Open a new command window with XHCommand to enter a repair command.


6. When finished, enter the command to stop the repair server mode:
stop-server --nodename primary

Command arguments
To list all available commands, enter help at the xdb prompt. To get arguments for any of the
commands, enter <command> help, for example:
xdb>help repair-merge

The path_to_index argument in repair commands is a library path, not a file system path. For example:
repository_name/dsearch/Data/default. The index_name value is dmftdoc. dmftdoc is the multi-path
index for Documentum repositories.

Force a final merge to improve query performance


The final merge is executed at a certain interval on the multi-path index. Query performance improves
after a final merge. By default, the final merge runs every 4 hours. In the xDB server repair mode, you
can trigger a final merge manually using the following syntax:
xdb>repair-merge -d xhivedb <path_to_index> <index_name>

For example:
repair-merge -d xhivedb LH1/dsearch/Data/default dmftdoc --final

A successful merge is reported like the following:


Repairing options: {REPAIR_MERGE_FINAL_SUB_INDEXES={}}
Index repair report:
index entry : LI-303cf8ed-d0f4-454b-b54f-dd48817cffd8 has been merged into entry:
LI-abf1d2de-1783-4ce7-a013-6cdf9e871bed
index entry : LI-a7f66aa5-a729-4774-8b97-0950b0703f3c has been merged into entry:
LI-abf1d2de-1783-4ce7-a013-6cdf9e871bed

Index corruption: Repair segments


For an INDEX_CORRUPTION error, run the repair-segments command. This repair takes a long
time for a large index. Back up the index before you run it. The command is not guaranteed to fix
all data corruptions. Syntax:
repair-segments -d xhivedb <path_to_index> dmftdoc

For example:
repair-segments -d xhivedb LH1/dsearch/Data/default dmftdoc

If all segments pass repair check, you see the following:



Repairing options: {REPAIR_LUCENE_INDEXES_SEGMENTS={}}


Index repair report:
Index LI-a7f66aa5-a729-4774-8b97-0950b0703f3c is ok
Index LI-303cf8ed-d0f4-454b-b54f-dd48817cffd8 is ok

Object dead: Fix broken blacklists


For an OBJECT_DEAD error, run the repair-blacklists command. Syntax:
repair-blacklists -d xhivedb <path_to_index> dmftdoc --check-dups

For example:
repair-blacklists -d xhivedb LH1/dsearch/Data/default dmftdoc --check-dups

A successful check is like the following:


Repairing options:
{REPAIR_LUCENE_CHECK_AND_FIX_BLACKLISTS={
REPAIR_INDEX_LIBRARY=LH1/dsearch/Data/default,
REPAIR_PERFORM_FIX_ACTIVITIES=false,
REPAIR_INDEX_CHECK_DUPLICATE_NODES_CONSISTENCY=true}}
Index repair report:
The black lists are :
processing index
C:\xPlore\data\LH1\default\lucene-index\dmftdoc\EI-d1071aae-a76c-46ee-825e-6955c313954b\LI-303cf8ed-d0f4-454b-b54f-dd48817cffd8
processing index
C:\xPlore\data\LH1\default\lucene-index\dmftdoc\EI-d1071aae-a76c-46ee-825e-6955c313954b\LI-a7f66aa5-a729-4774-8b97-0950b0703f3c
total docs processed = 1
total checked blacklisted objects = 0
total unaccounted for blacklisted objects = 0
total duplicate entries found = 0
total intranode dups found = 0


Chapter 8
Backup and Restore
This chapter contains the following topics:

About backup

About restore

Handling data corruption

Backup in xPlore administrator

File- or volume-based (snapshot) backup and restore

Offline restore

Troubleshooting backup and restore

About backup
Back up a domain or xPlore federation after you make xPlore environment changes: Adding or
deleting a collection, or changing a collection binding. If you do not back up, then restoring the
domain or xPlore federation puts the system in an inconsistent state. Perform all your anticipated
configuration changes before performing a full federation backup.
You can back up an xPlore federation, domain, or collection using xPlore administrator or use your
preferred volume-based or file-based backup technologies. The EMC Documentum xPlore Installation
Guide describes high availability and disaster recovery planning.
You can use external automatic backup products like EMC Networker. All backup and restore
commands are available as command-line interfaces (CLI) for scripting. See the chapter Automated
Utilities (CLI).
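For example, a scheduled backup script can call the backupFederation CLI. The backup path below is
hypothetical and the remaining arguments follow the example in Using the CLI; see the chapter
Automated Utilities (CLI) for the argument details:
xplore.bat "backupFederation c:/xPlore/backup/full, true, null"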
You cannot back up and restore to a different location. If the disk is full, set the collection state to
update_and_search and create a collection in a new storage location.
Note: If you remove segments from xDB, your backups cannot be restored.

Backup and restore consistency check


Perform a consistency check before backup and after restore.
Select Data Management in xPlore Administrator and then choose Check DB Consistency. This
check determines whether there are any corrupted or missing files such as configuration files or Lucene
indexes. Lucene indexes are checked to see whether they are consistent with the xDB records: Tree
segments, xDB page owners, and xDB DOM nodes. For a tool that checks consistency between the
index and the tracking database, see Running the standalone consistency checker, page 149.

Order of backup and restore in Documentum systems


The backup and restore date for xPlore must be at a point in time earlier than the backup of repository
content and metadata. Use ftintegrity after the restore to detect additional files that need indexing.
(These files or ACLs were added or updated after the xPlore backup.) See Using ftintegrity, page 73

Backup state: Hot, warm, and cold


You can back up the entire xPlore federation while you continue to search and index (hot backup).
You can back up a domain (Documentum repository index) or collection with indexing suspended
(warm) or with the system off (cold). When you back up using xPlore administrator, the state is set to
search_only for domain and collection backup and reverted after the backup completes.

Backup technology
xPlore supports the following backup approaches.
Native xDB backups: These backups are performed through xPlore administrator. They are
incremental, cumulative (differential) or full. A cumulative backup has all backups since the last full
backup. You can back up hot (while indexing), warm (search only), or cold (offline). See Backup
in xPlore administrator, page 179.
File-based backups: Back up the xPlore federation directory xplore_home/data, xplore_home/config,
and /dblog files. Backup is warm (search only) or cold (offline). Incremental file-based backups
are not recommended, since most files are touched when they are opened. In addition, Windows
file-based backup software requires exclusive access to a file during backup and thus requires
a cold backup.
Volume-based (snapshot) backups: Backup is warm (search only) or cold (offline). Can be
cumulative or full backup of disk blocks. Volume-based backups require a third-party product such
as EMC Timefinder.
A snapshot, which is a type of point-in-time backup, is a backup of the changes since the last
snapshot. A copy-on-write snapshot is a differential copy of the changes that are made every time
new data is written or existing data is updated. In a split-mirror snapshot, the mirroring process is
stopped and a full copy of the entire volume is created. A copy-on-write snapshot uses less disk
space than a split-mirror snapshot but requires more processing overhead.

Backup combinations
Periodic full backups are recommended in addition to differential backups. You can perform
incremental or cumulative backups only on the xPlore federation and not on a domain or collection.
Table 21. Backup scenarios

Level               Backup state   DR technology   Backup scope
collection          warm           xPlore          full only
domain              warm           xPlore          full only
xPlore federation   warm or hot    xPlore          full, incremental, or cumulative
xPlore federation   cold or warm   volume*         full or cumulative
xPlore federation   cold or warm   file*           full only

Synchronous and asynchronous replication


Replication can be asynchronous or synchronous. In synchronous replication, the primary system must
receive a confirmation from the remote system that the data was successfully received. So, the data
is always in sync, but performance can be slow because of large data volume or the long distance
between systems. In asynchronous replication, the primary system does not wait for the remote
system to confirm that the data was received. Performance is faster than synchronous replication, but
the data is not always in sync. There is also the potential for data loss, so consistency procedures
must be run upon recovery.
Some asynchronous replication technologies support consistency groups. For these applications,
the data is in sync. You define a group of storage areas for interdependent applications, such as the
Content Server, RDBMS, full-text indexes, to be coordinated. Furthermore, that consistency group
monitors all disks that are assigned to it (as well as the I/O writes to these disks) to ensure write-order
fidelity across these disks. EMC Symmetrix Remote Data Facility/Asynchronous (SRDF/A) supports
consistency groups.

Backup risk and space planning


In any backup strategy, there is a certain amount of inherent risk. For example, backup files can
be corrupted or lost.
To mitigate risk:
Completely test your procedures to make sure that you can recover and make your xPlore installation
operational within the required RTO and RPO. (An operational deployment is able to index and
return query results.)
A best practice is to test periodically your backups (for example, by performing a restore in a test
environment) to confirm that the backup process is working properly.
As the number of backups increases, the space required to store them will also increase. Therefore, you
should consider how long you are required to retain your backups (which is also referred to as a data
retention period). Plan for the appropriate amount of storage space.

About restore
All restore operations are performed offline. If you performed a hot backup using xPlore administrator,
the backup file is restored to the point at which backup began.

Each xPlore instance owns the index for one or more domains or collections, and a transaction log. If
there are multiple instances, one instance can own part of the index for a domain or collection. The
system uses the transaction log to restore data on an instance.
Restore a backup to the same location. If the disk is full, set the collection state to update_and_search
and create a collection in a new storage location.
Note: You cannot restore the backup of a previous installation of xPlore after you upgrade to 1.3,
because the xDB version has changed. Back up your installation immediately after upgrading xPlore.

Backup and restore consistency check


Perform a consistency check before backup and after restore.
Select Data Management in xPlore Administrator and then choose Check DB Consistency. This
check determines whether there are any corrupted or missing files such as configuration files or Lucene
indexes. Lucene indexes are checked to see whether they are consistent with the xDB records: Tree
segments, xDB page owners, and xDB DOM nodes. For a tool that checks consistency between the
index and the tracking database, see Running the standalone consistency checker, page 149.

Scripted restore
Use the CLI for scripted restore of a federation, collection, or domain. See the chapter Automated
Utilities (CLI). xPlore supports offline restore only. The xPlore server must be shut down to restore
a collection or an xPlore federation.

Order of backup and restore in Documentum systems


The backup and restore date for xPlore must be at a point in time earlier than the backup of repository
content and metadata. Use ftintegrity after the restore to detect additional files that need indexing.
(These files or ACLs were added or updated after the xPlore backup.) See Using ftintegrity, page 73

Handling data corruption


Detecting data corruption, page 175
Handling a corrupt domain, page 175
Repairing a corrupted index, page 175
Snapshot too old, page 176
Cleaning and rebuilding the index, page 176
Dead objects, page 177
Recovering from a system crash, page 178


Detecting data corruption


You can detect data corruption in the following ways:
An XhiveDataCorruptionException is reported in the xPlore server log.
Run the consistency checker on the xPlore federation. See Domain and collection menu actions,
page 155.
The primary instance does not start up.

Handling a corrupt domain


1. Use xPlore administrator to set the domain mode to maintenance. You can also use the CLI
setDomainMode. See Domain mode CLIs, page 190.
In maintenance mode, the following restrictions are applied:
The only allowed operations are repair and consistency check.
Only queries from xPlore administrator are evaluated.
Queries from a Documentum client are tried as NOFTDQL in the Content Server. xPlore
does not process them.
2. Detach the corrupted domain.
3. Restore the corrupted domain from backup. See Offline restore, page 180.
When xPlore is restarted, the domain mode is always set to normal (maintenance mode is not
persisted to disk).

Repairing a corrupted index


A collection that is corrupted or unusable cannot be queried; it is silently skipped. The console on
startup reports XhiveException: LUCENE_INDEX_CORRUPTED. The xPlore primary instance
may not start up.
1. If the xPlore primary instance does not start, do the following:
a. Force the server to start up: Set the value of force-restart-xdb in indexserver-bootstrap.properties
to true. The corrupt collection is marked unusable and updates are ignored. Restore the
corrupted domain or collection from a backup.
b. If the xPlore server starts up, choose the collection in the tree, click Configuration, and
then set the state to off_line. The offending collection and its index are marked as unusable
and updates are ignored.
c. Repair or restore from backup the corrupted collection or log. Continue with the next step.
2. Stop all xPlore instances.
3. Edit luceneindex.bootstrap in xplore_home/config. Change index/isUsable to true. (It is set to
false by the system.)
4. Restart xPlore instances.
5. Start the xDB admin tool xhadmin.bat or xhadmin.sh and drill down to the library that was
reported as corrupted.
6. Right-click and choose Library management > Check library consistency.
If the consistency check passes, check the number of folders named LI-* in the
xplore_home/data/domain_name/collection_name/lucene-index directories.
If the consistency check fails, perform the xDB command repair-segments.
1. Open a command window and navigate to xplore_home/dsearch/xhive/admin.
2. Launch XHCommand.bat or XHCommand.sh.
3. Enter the following xDB command.
xdb repair-segments -d database -p path target

database is the name of the xDB database, generally "xhivedb".


path is the full path to the library (collection) containing the index. Get the path in the
xPlore administrator collection view.
target (optional) is the Lucene index name reported in the error, such as dmftdoc.
For example:
repair-segments -d xhivedb -p emc101! /xplore/dsearch/Data/default
LI-af97431d-6c4d-41ae-882d-873e8e94fdcf

7. Restart all xPlore instances.


8. Run the consistency checker again.
If the consistency check passes, the system is usable.
If the consistency check fails, rebuild the index.

Snapshot too old


For the XhiveException: LUCENE_SNAP_SHOT_TOO_OLD
1. Locate the collection for which the error message was returned.
2. Rebuild the index using xPlore administrator.
3. Some snapshot errors are due to inconsistencies between the Lucene index and xDB. Contact
technical support for assistance if you are not able to resolve the issue.

Cleaning and rebuilding the index


Use this procedure for the following use cases:
Data is corrupted on disk and there is no backup
You change collections after backup and see errors when you try to restore. You see errors if you
added or deleted a collection or changed collection binding after backup.
1. If the xPlore system does not restart, try a force restart. Set the value of
force-restart-xdb in indexserver-bootstrap.properties to true. (This file is located
in the WEB-INF/classes directory of the application server instance, for example,
C:\xPlore\jboss5.1.0\server\DctmServer_PrimaryDsearch\deploy\dsearch.war\WEB-INF\classes.)
2. Restart the xPlore instance. If startup still reports errors, go to the next step to clean corrupted data.
3. Shut down all xPlore instances.
4. Delete everything under xplore_home/data.
5. Delete everything under xplore_home/config except the file indexserverconfig.xml.

6. Start xPlore instances.


7. Choose one:
You have a backup. Restore it using the procedure described in Offline restore, page 180.
If you created a collection after backup, and then restored the domain or xPlore federation,
the data files for the new collection are not deleted. The dsearch log reports an error like the
following:
com.xhive.error.XhiveException: IO_ERROR, Original message:
File C:\xPlore\data\doc1\444\xhivedb-doc1#dsearch#Data#444-0.
XhiveDatabase.DB already exists

Delete the file at the indicated path.


Recreate any collection that was added after backup.
Refeed documents for the new collection.
You do not have a backup. Refeed all documents. (Use migration instructions in EMC
Documentum xPlore Installation Guide.)

Dead objects
For the XhiveException: OBJECT_DEAD.
1. In xPlore administrator, make sure that all collections have a state of index_and_search.
2. Stop xPlore instances.
3. Start xDB in repair mode.
a. Change the memory in xdb.properties for all nodes. This file is located in
%XPLORE%/jboss5.1.0/server/%NODE_NAME%/deploy/dsearch.war/WEB-INF/classes/xdb.properties.
This change can remain after repair.
XHIVE_MAX_MEMORY=1536M
b. Start each instance in repair mode: Open a shell, go to the directory
xplore_home/dsearch/xhive/admin/, run xhcommand.bat (Windows) or ./XHCommand (Linux),
and input the instance name, port, and path to the bootstrap file on the host.
For example:
run-server-repair --address 0.0.0.0 --port 9330 --nodename primary
-f %XPLORE%/config/XhiveDatabase.bootstrap

4. Input the following xhcommand, specifying the domain, collection name, and parameter. The
repair command can take a long time if the index size is more than a few GB. To scan without
removing dead objects, remove the option --repair-index:
repair-blacklists -d xhivedb /%DOMAIN%/dsearch/Data/%COLLECTION% dmftdoc
--check-dups --repair-index

5. Stop repair mode using xhcommand. For example:


stop-server --nodename primary

6. Start all xPlore instances.


Check the standard output from step 4. You see a summary of "Total dead objects" and "Total potential
impacted normal objects" and a detailed report. For example:

("total
("total
("total
("total
("total

docs processed =
checked blacklisted objects = "
unaccounted for blacklisted objects = "
duplicate entries found = ");
intranode dups found = "

If "Total potential impacted normal objects" is not 0, a file is generated with the following name
convention: %DOMAIN%#dsearch#Data#%COLLECTION%_objects_2012-03-12-21-02-06.
Resubmit this file using the index agent UI.
1. Log in to the index agent UI.
2. Choose Object File.
3. Browse to the file and choose Submit.

Recovering from a system crash


Figure 14. System crash decision tree
1. See Using ftintegrity, page 73.


2. See Scripted federation restore, page 186.
3. See Deleting a collection and recreating indexes, page 165.

Backup in xPlore administrator


When you back up using xPlore administrator, it is a hot backup. In a hot backup, indexing and
search continues during the backup process. For a warm backup, suspend indexing backup on each
instance that the collection is bound to. Choose the instance and click Indexing Service > Operations
> Disable.
After you change the federation or domain structure, back up the xPlore federation or domain. Events
such as adding or deleting a collection or changing a collection binding require backup. If you do
not back up, then restoring the domain or xPlore federation puts the system in an inconsistent state.
Perform all your anticipated configuration changes before performing a full federation backup.
Note: Backup is not available for a subcollection. Back up the parent and all its subcollections.
Order of backup and restore in Documentum systems
The backup and restore date for xPlore must be at a point in time earlier than the backup of repository
content and metadata. Use ftintegrity after the restore to detect additional files that need indexing.
(These files or ACLs were added or updated after the xPlore backup.) See Using ftintegrity, page 73.
1. In a domain or collection, click Backup.
2. Select Default location or enter the path to your preferred location and click OK.
3. Incremental backup: By default, log files are deleted at each backup. For incremental backups,
change this setting before a full backup using xPlore administrator.
a. Choose Home > Global Configuration and then choose Engine.
b. Check true for keep-xdb-transactional-log. When you change this setting, the log file from
the full backup is not deleted at the next incremental backup.
c. If you are restoring a full backup and an incremental backup, perform both restore procedures
before restarting the xPlore instances.
4. Back up your jars or DLLs for custom content processing plugins or annotators.

File- or volume-based (snapshot) backup and


restore
Data files in the backup must be on a single volume.
This procedure assumes that no system changes (new or deleted collections, changed bindings) have
occurred since backup. Perform all your anticipated environment changes before backup. Make sure
that you have sufficient disk space for the backup and for temporary space (twice the present index
size).
Note: Do not mix CLI commands (suspend or resume disk writes) with native xPlore backup in
xPlore administrator.
1. Suspend ingestion for backup or restore. Search is still enabled.
a. Navigate to xplore_home/dsearch/xhive/admin.
b. Launch the command-line tool with the following command. You supply the administrator
password (same as xPlore administrator).
XHCommand suspend-diskwrites

2. Set all domains in the xPlore federation to the read_only state using the CLI. (Do not use native
xPlore to set state.) See Collection and domain state CLIs, page 191.
3. Use your third-party backup software to back up or restore. The white paper Backup and Recovery
of EMC Documentum Content Server using the NetWorker Module for Documentum available on
EMC Online Support (https://support.emc.com) provides information on backup using EMC
Networker Module for Documentum.
4. Resume xDB with the following command:
XHCommand suspend-diskwrites --resume

5. Set all backed up domains to the reset state and then turn on indexing. (This state is not displayed
in xPlore administrator and is used only for the backup and restore utilities.) Use the script in
Collection and domain state CLIs, page 191.

Offline restore
xPlore supports offline restore only. The xPlore server must be shut down to restore a collection,
domain, or xPlore federation. If you are restoring a full backup and an incremental backup, perform
both restore procedures before restarting the xPlore instances.
This procedure assumes that no system changes (new or deleted collections, changed bindings) have
occurred since backup. (Perform a full federation backup every time you make configuration changes
to the xPlore environment.)
If you are restoring a full backup and an incremental backup, restore both before restarting xPlore
instances.
If you are restoring a federation and a collection that was added after the federation backup, do the
following:
1. Restore the federation.
2. Start up and shut down xPlore.
3. Restore the collection.

For automated (scripted) restore, see Scripted federation restore, page 186, Scripted domain restore,
page 187, or Scripted collection restore, page 188. The following instructions include some
non-scripted steps in xPlore administrator.
1. Shut down all xPlore instances.
2. Federation only: Clean up all existing data files.
Delete everything under xplore_home/data.
Delete everything under xplore_home/config.
3. Detach the domain or collection:
1. Collection only: Set the collection state to off_line using xPlore administrator. Choose the
collection and click Configuration.
2. Domain or collection: Detach the domain or collection using xPlore administrator. Note:
Force-detach corrupts the domain or collection.
4. Domain only: Generate the orphaned segment list. Use the CLI purgeOrphanedSegments. See
Orphaned segments CLIs, page 189.
5. Stop all xPlore instances.
6. Run the restore CLI. See Scripted federation restore, page 186, Scripted domain restore, page
187, or Scripted collection restore, page 188.
7. Start all xPlore instances. No further steps are needed for federation restore. Do the following
steps for domain or collection restore.
8. Domain only: If orphaned segments are reported before restore, run the CLI
purgeOrphanedSegments. If an orphaned segment file is not specified, the orphaned segment IDs
are read from stdin.
9. Force-attach the domain or collection using xPlore administrator.
10. Perform a consistency check and test search. Select Data Management in xPlore Administrator
and then choose Check DB Consistency.
11. Run the ACL and group replication script to update any security changes since the backup. See
Manually updating security, page 52.
12. Run ftintegrity. For the start date argument, use the date of the last backup, and for the end date use
the current date. See Using ftintegrity, page 73.

Troubleshooting backup and restore


Volume-based backup of domain or collection is not
supported
Volume-based backup requires change of a domain or collection state to read-only. As a result, you
can back up an xPlore federation with volume-based backup. You cannot use volume-based backups
for domains or collections.

Federation and collection restore procedure not followed


Restoring a federation and then a collection that was backed up after the federation can lead to data
corruption. Start and then stop xPlore after you restore the federation. Then it is safe to restore the
collection.
Perform the following steps:
1. Restore the federation.
2. Start all xPlore instances.
3. Stop all xPlore instances.
4. Restore the collection.


Incremental restore must come before xPlore restart


If you do a full backup and later do an incremental backup, first restore the full backup. Then restore
the incremental backup before you restart the xPlore instances. If you restart the instances before
restoring the incremental backup, the restore procedure fails.

CLI troubleshooting
If a CLI does not execute correctly, check the following:
The output message may describe the source of the error.
Check whether the host and port are set correctly in xplore_home/dsearch/admin/xplore.properties.
Check the CLI syntax:
Linux requires double quotes before the command name and after the entire command and arguments.
Separate each parameter with a comma.
Do not put Boolean or null arguments in quotation marks. For example:
xplore "backupFederation null,false,null"
Check whether the primary instance is running: http://instancename:port/dsearch
Check whether the JMX MBean server is started. Open jconsole in xplore_home/jdk/bin and specify
Remote Process with the value service:jmx:rmi:///jndi/rmi://myhost:9331/dsearch. (9331 is the
default JMX port. If you have specified a base port for the primary instance that is not 9300, add
31 to your base port.) If jconsole opens, the JMX layer is OK.
Check the Admin web service. Open a browser with the following link; if the XML schema is shown,
the web service layer is operative (a command-line check is sketched after this list):
http://instancename:port/dsearch/ESSAdminWebService?wsdl
Check dsearch.log in xplore_home/jboss5.1.0/server/instance_name/logs for a CLI-related message.
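As a quick command-line alternative to the browser check above, you can fetch the WSDL with curl;
the host name and port are placeholders for your primary instance:
curl "http://myhost:9300/dsearch/ESSAdminWebService?wsdl"
If the command prints the XML schema, the web service layer is operative.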


Chapter 9
Automated Utilities (CLI)
This chapter contains the following topics:

CLI properties and environment

Using the CLI

CLI batch file

Scripted federation restore

Scripted domain restore

Scripted collection restore

Force detach and attach CLIs

Orphaned segments CLIs

Removing orphaned indexes

Domain mode CLIs

Collection and domain state CLIs

Activate spare instance CLI

Detecting the version of an instance

Cleaning up after failed index rebuild

Final merge CLIs

CLI properties and environment


The CLI tool is located in xplore_home/dsearch/admin. The tool wrapper is xplore.bat (Windows)
or xplore.sh (Linux). Edit the properties file xplore.properties to set the environment for the CLI
execution.
Table 22. CLI properties

host: Primary xPlore instance host, as a fully qualified hostname or IP address.
port: Primary xPlore instance port. If you change to the HTTPS protocol, change this port.
password: xPlore administrator password, set up when you installed xPlore (same as the xPlore
administrator login password).
bootstrap: Full path to the xDB bootstrap file, for example, xplore_home/config/XhiveDatabase.bootstrap.
verbose: Prints all admin API calls to the console. Default: true. For batch scripts, set to false.
protocol: Valid values: http or https.

The xPlore installer updates the file dsearch-set-env.bat or dsearch-set-env.sh. This file contains the
path to dsearch.war/WEB-INF.
The CLI uses the Groovy script engine.
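For reference, a minimal xplore.properties might look like the following sketch; the host name,
password, and bootstrap path are placeholders for your environment, and only the properties from
Table 22 are shown:
host=myhost.example.com
port=9300
password=myAdminPassword
bootstrap=C:/xPlore/config/XhiveDatabase.bootstrap
verbose=true
protocol=http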

Using the CLI


1. Open a command-line window.
2. Change the working directory to the location of xplore_home/dsearch/admin.
3. To run a CLI command with no parameters, use the following syntax appropriate for your
environment (Windows or Linux). The command is case-insensitive.
xplore.bat <command>
./xplore.sh <command>

Examples:
xplore.bat resumeDiskWrites
./xplore.sh resumeDiskWrites

4. Run a CLI command with parameters using the following syntax appropriate for your environment
(Windows or Linux). The command is case-insensitive. Use double quotes around the command,
single quotes around parameters, and a forward slash for all paths:
xplore.bat "<command> [parameters]"
./xplore.sh "<command> [parameters]"

Examples:
xplore backupFederation c:/xPlore/dsearch/backup, true, null
./xplore.sh "dropIndex dftxml, folder-list-index "

The command executes, prints a success or error message, and then exits.
5. (Optional) Run CLI commands from a file using the following syntax:
xplore.bat -f <filename>

Examples:
xplore.bat -f file.txt
./xplore.sh -f file.txt


Call the wrapper without a parameter to view a help message that lists all CLIs and their arguments.
You can also request help for a specific command. For example:
xplore help backupFederation
./xplore.sh help backupFederation

CLI batch file


Use the following syntax to reference a batch file containing xPlore commands. Substitute the full
path to your batch file:
xplore.bat -f <batch_file_name>
or
xplore.sh -f <batch_file_name>

For example, the following batch file sample.gvy suspends index writes and performs an incremental
backup of the xPlore federation. Use a forward slash for paths.
suspendDiskWrites
folder="c:/folder"
isIncremental=true
backupFederation folder, isIncremental, null
println "Done"

Call the batch file with a command like the following:


xplore.bat -f sample.gvy


Scripted backup
The default backup location is specified in indexserverconfig.xml as the value of the path attribute
on the element admin-config/backup-location. Specify any path as the value of [backup_path].

Scripted federation backup


xplore backupFederation [backup_path], <is_incremental>, null

is_incremental: Boolean. Set to false for a full backup, true for incremental. For incremental backups,
set keep-xdb-transactional-log to true in xPlore administrator. Choose Home > Global Configuration
> Engine.

Examples:
xplore "backupFederation null, true, null"
xplore "backupFederation c:/xplore/backup, false, null"

Scripted domain backup


xplore backupDomain <domain_name>, [backup_path]

Examples:
xplore "backupDomain myDomain, null"
xplore "backupDomain myDomain, c:/xplore/backup "

Scripted collection backup


xplore backupCollection collection(<domain_name>, <collection_name>), [backup_path]

Examples:
"backupCollection collection(myDomain, default), null"
"backupCollection collection(myDomain, default), c:/xplore/backup "

Scripted file or volume backup


Stop the index agent and suspend disk writes before backup. Resume disk writes and the index agent
after backup.
"suspendDiskWrites"
"resumeDiskWrites"

Scripted federation restore


Back up and restore your jars or DLLs for custom content processing plugins or annotators.
[backup_path]: Path to your backup file. If not specified, the default backup location
in indexserverconfig.xml is used: the value of the path attribute on the element
admin-config/backup-location. Specify any path as the value of [backup_path].
1. Stop all xPlore instances.
2. Delete existing data files:
All files under xplore_home/data
All files under xplore_home/config


3. Run the restore CLI. If no path is specified, the default location in indexserverconfig.xml is used.
"restoreFederation [backup_path] "

For example:
xplore "restoreFederation C:/xPlore/dsearch/backup/federation/
2011-03-23-16-02-02 "

4. Restart all xPlore instances.


Note: If you are restoring a federation and a collection that was added after the federation backup,
do the following:
1. Restore the federation.
2. Start up and shut down all xPlore instances.
3. Restore the collection.

Scripted domain restore


[backup_path]: Path to your backup file. If not specified, the default backup location
in indexserverconfig.xml is used: the value of the path attribute on the element
admin-config/backup-location. Specify any path as the value of [backup_path].
[bootstrap_path]: Path to the bootstrap file in the WEB-INF classes directory of the xPlore primary
instance.
1. Force-detach the domain using the CLI detachDomain. The second argument is whether to force
detach. Force detach can corrupt the data. Use only for restore operations.
"detachDomain [domain_name], true"

For example:
xplore "detachDomain defaultDomain, true"

2. Generate the orphaned segment list using the CLI listOrphanedSegments. If an orphaned segment
file is not specified, the IDs of orphaned segments are sent to stdout. See Orphaned segments
CLIs, page 189.
3. Stop all xPlore instances.
4. Run the restore CLI. If no bootstrap path is specified, the default location in the WEB-INF classes
directory of the xPlore primary instance is used.
"restoreDomain [backup_path], [bootstrap_path] "

5. Restart xPlore instances.


6. If orphaned segments are reported before restore, run the CLI purgeOrphanedSegments. If an
orphaned segment file is not specified, the segment IDs are read in from stdin. See Orphaned
segments CLIs, page 189.
7. Force-attach the domain using the CLI attachDomain:
xplore "attachDomain [domain_name], true"

For example:

xplore "attachDomain defaultDomain, true"

8. Perform a consistency check and test search.


9. Run the ACL and group replication script aclreplication_for_repository_name in
xplore_home/setup/indexagent/tools to update any security changes since the backup. See
Manually updating security, page 52.
10. Run ftintegrity. See Using ftintegrity, page 73.
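Put together, a domain restore might look like the following sketch; the domain name, backup folder,
orphan file, and bootstrap path are placeholders, null stands for a default value as in the other CLI
examples, and the consistency check, security replication, and ftintegrity steps still follow as described
above:
xplore "detachDomain myDomain, true"
xplore "listOrphanedSegments domain, c:/xplore/backup/domain/2012-01-15, c:/temp/orphans.lst, C:/xPlore/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/classes/indexserver-bootstrap.properties"
(stop all xPlore instances)
xplore "restoreDomain c:/xplore/backup/domain/2012-01-15, null"
(restart all xPlore instances)
xplore "purgeOrphanedSegments c:/temp/orphans.lst"
xplore "attachDomain myDomain, true"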

Scripted collection restore


[backup_path]: Path to your backup file. If not specified, the default backup location
in indexserverconfig.xml is used: The value of the path attribute on the element
admin-config/backup-location. Specify any path.
[bootstrap_path]: Path to the bootstrap file in the WEB-INF classes directory of the xPlore primary
instance.
1. Set the collection state to off-line.
2. Force-detach the collection using the CLI detachCollection. The last argument specifies a forced
detachment. Use only for restore, because force-detach can corrupt the collection.
xplore "detachCollection collection([domain_name], [collection_name]), true"

3. Generate the orphaned segment list using the CLI listOrphanedSegments. If an orphaned segment
file is not specified, the IDs of orphaned segments are sent to stdout. See Orphaned segments
CLIs, page 189.
4. Stop all xPlore instances.
5. Run the restore CLI. If no bootstrap path is specified, the default location in the WEB-INF classes
directory of the xPlore primary instance is used.
"restoreCollection [backup_path], [bootstrap_path]"

6. Restart xPlore instances.


7. If orphaned segments are reported before restore, run the CLI purgeOrphanedSegments. If an
orphaned segment file is not specified, the segment IDs are read in from stdin.
8. Force-attach the collection using the CLI attachCollection. The last argument is for forced
attachment.
xplore "attachCollection collection([domain_name], [collection_name]), true"

9. Perform a consistency check and test search.


10. Run the ACL and group replication script aclreplication_for_repository_name in
xplore_home/setup/indexagent/tools to update any security changes since the backup. See
Manually updating security, page 52.
11. Run ftintegrity. See Using ftintegrity, page 73.


Force detach and attach CLIs


You detach a domain or collection before you restore it. You attach it after you restore it.
1. Detach syntax:
"detachDomain [domain_name], true "
or
"detachCollection collection([domain_name], [collection_name]), true"

Examples:
"detachDomain myDomain, true"
or
"detachCollection collection(myDomain,default), true"

2. Attach syntax:
"attachDomain [domain_name], true"
or
"attachCollection collection([domain_name], [collection_name]), true"

Examples:
"attachDomain myDomain, true"
or
"attachCollection collection(myDomain,default), true"

Orphaned segments CLIs


Segments can be orphaned when content is added to or removed from the index after backup. The
restore operation does not reflect these changes, and any new segments that were used after backup
are orphaned. The xDB database in xPlore does not start up with orphaned segments unless you
force a restart. See Troubleshooting data management, page 167. Federation restore does not create
orphaned segments.
1. List orphaned segments before restore. If [orphan_file_path] is not specified, the IDs of orphaned
segments are sent to stdout. This file is used to purge orphaned segments after restore. Syntax:
"listOrphanedSegments collection|domain [backup_path]
[orphan_file_path] [bootstrap_path]"

For example:
"listOrphanedSegments domain, backup/myDomain/2009-10,
c:/temp/orphans.lst
C:/xplore/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/
WEB-INF/classes/indexserver-bootstrap.properties "
or
"listOrphanedSegments collection, backup/myDomain/default/2009-10, null,
C:/xplore/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/
WEB-INF/classes/indexserver-bootstrap.properties "

2. If orphaned segments are reported before restore, run the CLI purgeOrphanedSegments. If
[orphan_file_path] is not specified, the segment IDs are read in from stdin. For file path, use
forward slashes. Syntax:

xplore purgeOrphanedSegments [orphan_file_path]

For example:
purgeOrphanedSegments c:/temp/orphans.lst
or
purgeOrphanedSegments null

Arguments:
[backup_path]: Path to your backup file. If not specified, the default backup location
in indexserverconfig.xml is used: the value of the path attribute on the element
admin-config/backup-location. Specify any path as the value of [backup_path].
[bootstrap_path]: Path to the bootstrap file in the WEB-INF classes directory of the xPlore primary
instance.

Removing orphaned indexes


The following script stops any index rebuild in process and removes orphaned indexes.
Use this tool when an unsuccessful index rebuild causes searches to fail. You see
in-construction (orphaned) indexes in the following location:
xplore_home/data/domain_name/collection_name/lucene-index/dmftdoc (multiple EI DB files, more
than 3).
1. Open the script removeInConstructionIndexes.groovy in
xplore_home/dsearch/admin/sample_scripts.
2. Replace the following variable values with actual values: xdbConnectionString, xdbPassword,
domain, collection. For example:
xdbConnectionString = "xhive://myhost:9330"
xdbPassword = "password"
domain = "defaultDomain"
collection = "default"

3. Open a command line, change to the directory xplore_home/dsearch/admin, and run the following
command:
xplore.bat -f sample_scripts\removeInConstructionIndexes.groovy

Domain mode CLIs


If a domain index is corrupt, use the CLI setDomainMode to set the mode to maintenance. In
maintenance mode, the only allowed operations are repair and consistency check. Queries are allowed
only from xPlore administrator. Queries from a Documentum client are tried as NOFTDQL in the
Content Server. xPlore does not process them.
The mode does not persist across xPlore sessions; mode reverts to normal on xPlore restart.
Syntax:
"setDomainMode [domain_name], [maintenance|normal]"

For example:

"setDomainMode myDomain, maintenance "

Collection and domain state CLIs


Syntax:
"setState domain|collection, [domain_name],[collection_name],[state]"

Domain state
Set collection_name to null. Valid states: read_only and reset.
Example:
"setState domain, myDomain, null, reset "

Collection state
Valid states: index_and_search (read/write), update_and_search (read and update existing documents
only), search_only, index_only (write only), and off_line. The update_and_search state changes a flag
so that new documents cannot be added to the collection. Existing documents can be updated. The
state change is not propagated to subcollections.
Example:
"setState collection, myDomain, default, off_line "

Activate spare instance CLI


You can install a spare instance that you activate when another instance fails. For instructions on
activating the spare using xPlore administrator, see Replacing a failed instance with a spare, page 34.
Note: This CLI can be used only to replace a secondary instance. To replace a primary instance, see
Replacing a failed primary instance, page 35.
Syntax:
"activateSpareNode [failed_instance], [spare_instance] "

Example:
"activateSpareNode node2, spare1 "

Detecting the version of an instance


Use the CLI showVersion to get the version of the instance on the current host.
xplore.bat showVersion
./xplore.sh showVersion


Cleaning up after failed index rebuild


When an index rebuild fails, the collection configuration in indexserverconfig.xml is annotated with a
property named Build_dmftdoc. A temporary index folder is left behind. You may see the following
error message in dsearch.log:
Xhive exception message: INTERRUPTED

To remove the rebuild index property and clean up the temporary index folder, use the following CLI:
Syntax:
removeInConstructionIndexes [domain], [collection]

For example:
xplore "removeInConstructionIndexes myDocbase, myCollection"
./xplore.sh "removeInConstructionIndexes myDocbase, myCollection"

Final merge CLIs


You can use CLI commands to manually start and stop merges and view merge status.
Final merge is recorded in the audit record as a FINAL_MERGE event.

Getting merge status


Use the CLIs isFinalMergeOngoing and getFinalMergeStatus.
Syntax:
isFinalMergeOngoing [domain], [collection]
getFinalMergeStatus [domain], [collection]

For example:
xplore "isFinalMergeOngoing myDocbase, myCollection"
./xplore.sh "getFinalMergeStatus myDocbase, myCollection"

You can also see merge status using xPlore administrator. A Merging icon is displayed during the
merge progress.

Starting and stopping final merge


Use the CLIs startFinalMerge and stopFinalMerge.
Syntax:
startFinalMerge [domain], [collection]
stopFinalMerge [domain], [collection]

For example:
xplore "startFinalMerge myDocbase, myCollection"


./xplore.sh "stopFinalMerge myDocbase, myCollection "


Chapter 10
Search
This chapter contains the following topics:

About searching

Administering search

Query summary highlighting

Configuring summary security

Configuring wildcards and fragment search

Configuring Documentum search

Supporting subscriptions to queries

Troubleshooting search

Search APIs and customization

Routing a query to a specific collection

Building a query with the DFC search service

Building a query with the DFS search service

Building a DFC XQuery

Building a query using xPlore APIs

Adding context to a query

Using parallel queries

Custom access to a thesaurus

About searching
Specific attributes of the dm_sysobject support full-text indexing. Use Documentum Administrator
to make object types and attributes searchable or not searchable and to set allowed search operators
and default search operator.
Set the is_searchable attribute on an object type to allow or prevent searches for objects of that type
and its subtypes. Valid values: 0 (false) and 1 (true). The client application must read this attribute.
(The indexing process does not use it.) If is_searchable is false for a type or attribute, Webtop does
not display it in the search UI. Default: true.
Set allowed_search_ops to set the allowed search operators and default_search_op to set the default
operator. Valid values for allowed_search_ops and default_search_op:

Value  Operator
1      =
2      <>
3      >
4      <
5      >=
6      <=
7      begins with
8      contains
9      does not contain
10     ends with
11     in
12     not in
13     between
14     is null
15     is not null
16     not

The default_search_arg attribute sets a default argument for the default operator. The client
must read these attributes; the indexing process does not use them. Webtop displays the allowed
operators and the default operator.
Content Server client applications issue queries through the DFC search service or through DQL. DFC
6.6 and higher translates queries directly to XQuery for xPlore. DQL queries are handled by the
Content Server query plugin, which translates DQL into XQuery unless XQuery generation is turned
off. Not all DQL operators are available through the DFC search service. In some cases, a DQL search
of the Server database returns different results than a DFC/xPlore search. For more information on
DQL and DFC search differences, see DQL, DFC, and DFS queries, page 224.
DFC generates XQuery expressions by default. If XQuery is turned off in DFC, FTDQL queries are
generated. The FTDQL queries are evaluated in the xPlore server. If all or part of the query does not
conform to FTDQL, that portion of the query is converted to DQL and evaluated in the Content Server
database. Results from the XQuery are combined with database results. For more information on
FTDQL and SDC criteria, see the EMC Documentum Content Server DQL Reference.
xPlore search is case-insensitive and ignores white space or other special characters. Special characters
are configurable.
Related topics:
Handling special characters, page 108
Search reports, page 290
Troubleshooting slow queries, page 246
Changing search results security, page 51

196

EMC Documentum xPlore Version 1.3 Administration and Development Guide

Search

Query operators
Operators in XQuery expressions, DFC, and DQL are interpreted in the following ways:
DQL operators: All string attributes are searched with the ftcontains operator in XQuery. All other
attribute types use value operators (= != < >). In DQL, dates are automatically normalized to
UTC representation when translated to XQuery.
DFC: When you use the DFC interface IDfXQuery, your application must specify dates in UTC to
match the format in dftxml.
XQuery operators
The value operators = != < > specify a value comparison search. Search terms are not tokenized.
Can be used for exact match or range searching on dates and IDs.
Any subpath that can be searched with a value operator must have the value-comparison attribute
set to true for the corresponding subpath configuration in indexserverconfig.xml (see the sketch at
the end of this section). For example, an improper configuration of the r_modify_date attribute sets
full-text-search to true. A date of 2010-04-01T06:55:29 is tokenized into 5 tokens: 2010 04 01T06 55 29.
A search for "04" then returns any document modified in April, and the user gets many non-relevant
results. Therefore, r_modify_date must have value-comparison set to true. Then the date attribute is
indexed as one token, and a search for "04" does not hit all documents modified in April.
The ftcontains operator (XQFT syntax) specifies that the search term is tokenized before
searching against index.
If a subpath can be searched by ftcontains, set the full-text-search attribute to true in the
corresponding subpath configuration in indexserverconfig.xml.
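For example, a date attribute such as r_modify_date could be configured for value comparison along
the following lines; treat this as a sketch rather than the shipped configuration, because the type value
of date and the exact attribute list on your sub-path element are assumptions for illustration:
<sub-path value-comparison="true" full-text-search="false" type="date" path="dmftmetadata//r_modify_date"/>
A path that is searched only with ftcontains would instead set full-text-search to true and
value-comparison to false.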

Administering search
Common search service tasks
You can configure all search service parameters by choosing Global Configuration from the System
Overview panel in xPlore administrator. You can configure the same search service parameters on
a per-instance basis by choosing Search Service on an instance and then choosing Configuration.
The default values have been optimized for most environments.
Enabling search
Enable or disable search by choosing an instance of the search service in the left pane of the
administrator. Click Disable (or Enable).
Canceling running queries
Open an instance and choose Search Service. Click Operations. All running queries are displayed.
Click a query and delete it.
Viewing search statistics
Choose Search Service and click an instance:
Accumulated number of executed queries
Number of failed queries

Number of pending queries


Number of spooled queries
Number of execution plans
Number of streamed results
Maximum query result batch size
Total result bytes
Maximum hits returned by a query
Average number of results per query request
Maximum query execution time
Average duration of query execution
Queries per second
Use reports for additional information about queries. Search auditing must be turned on to accumulate
the report data. (It is on by default.) See Search reports, page 290.
Configuring query warmup
Configuring scoring and freshness
Configuring query summaries

Configuring query warmup


You can warm up the index and queries at xPlore startup. Queries are logged in the audit record for each
xPlore instance. The warmup utility on each instance warms up queries that were run on that instance.
Index warmup of Lucene on-disk files (with suffix .cfs) loads the stored fields and term dictionaries
for all domains. To disable index warmup, set load_index_before_query to false in query.properties.
Queries are warmed up from the audit log or from a file (file first, then user queries.) Warmup is
typically done through the audit log. By default, security evaluation is done on these queries. The
most recent queries from the last N users (last_N_unique_users in query.properties) are played.
File-based queries: xPlore can speed up Documentum folder-specific queries. By default, security
is not evaluated for these queries. If you need security evaluation, set the following properties in
query.properties: security_eval, user_name, super_user.
Search auditing logs warmup queries. Warmup activity is logged separately in the audit record and
reported in the admin report Audit Records for Warmup Component. Warmup is logged in the file
queryWarmer.log, located in xplore_home/dsearch/xhive/admin/logs. Use this log to verify when a
collection was last warmed up.

Enabling or disabling warmup


Edit indexserverconfig.xml. For information on editing this file, see Modifying indexserverconfig.xml,
page 43. Locate the performance element. If it does not exist, add it with the following content:
<performance>
<warmup status="on">
<properties>
<property value="../../dsearch/xhive/admin/QueryRunner.bat"
name="script"/>
<property value="600" name="timeout-in-secs"/>
</properties>
</warmup>
</performance>

Set the child element warmup status to on or off. You can set the warmup timeout in seconds. If the
warmup hangs, it is canceled after this timeout period.

Configuring warmup
Configure warmup in query.properties. This file is in xplore_home/dsearch/xhive/admin. Restart all
xPlore instances to enable your changes.
Table 24. Auto-warmup configuration

xplore_qrserver_host: Primary xPlore instance host. Not needed for file-based warmup.
xplore_qrserver_port: Port for the primary xPlore instance. Not needed for file-based warmup.
xplore_domain: Name of the domain in xPlore (usually the same as the Content Server name). You
must change this to a valid domain. Used only for file-based warmup. An incorrect domain name is
recorded in queryWarmer.log as FtSearchException: Invalid domain.
security_eval: Evaluate security for queries in warmup. Default: false. Used only for file-based warmup.
user_name: Name of a user or superuser who has permission to execute warmup queries. Required if
security_eval is set to true. Used only for file-based warmup.
super_user: Set to true if the user who executes warmup queries is a Content Server superuser.
Required if security_eval is set to true.
query_file: Name of a file that contains warmup queries. A query can be multi-line, but the file cannot
contain empty lines. If no name is specified, queries are read from the audit record. (Query auditing
is enabled by default.)
query_plan: Set to true to include query plans in query warmup.
batch_size: Batch size for warmup queries.
timeout: Maximum seconds to try a warmup query.
max_retries: Maximum number of times to retry a failed query.
print_result: Set to true to print results to queryWarmer.log in xplore_home/dsearch/xhive/admin/logs.
fetch_result_byte: Maximum number of bytes to fetch in a warmup query.
load_index_before_query: Set to true to warm up the index before any queries are processed.
read_from_audit: Set to true (default) to read queries from the audit record. Requires query auditing
enabled (default). (See Auditing queries, page 244.) Warmup reads from a file first if query_file is
specified.
number_of_unique_users: Number of users in the audit log for whom to replay queries. Default: 10.
number_of_queries_per_user: Number of queries to replay for each user. Default: 1.
data_path: Path to the Lucene index. Default: xplore_home/data.
index_cache_mb: Maximum size of the index cache in MB. Default: 100.
cache_index_components: Index components to warm up. Valid values: stored fields for security
warmup (fdt, fdx, fnm); term dictionary for wildcard and full-text queries (tis, tii); term frequencies,
used with the term dictionary (frq); term positions, used with the term dictionary (prx).
schedule_warmup: Enables the warmup schedule.
schedule_warmup_period: How often the warmup occurs. For example, a value of 1 in DAYS units
results in daily warmup.
schedule_warmup_units: Valid values: DAYS (default) | HOURS | MINUTES | SECONDS |
MILLISECONDS | MICROSECONDS.
initial_delay: Start warmup after the specified initial delay, and subsequently after
schedule_warmup_period. Default: 0. If any execution of the task encounters an exception, subsequent
executions are suppressed.
query_response_time: Replay only queries from the audit records whose response time (fetch time +
execution time), in milliseconds, is lower than or equal to this value. Default: 60000.
exclude_users: Exclude queries from the audit records that were run by these users. Set a
comma-separated list of users. Default: unknown.
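For example, a query.properties sketch that replays audited queries and warms up the index on a daily
schedule might use the following settings; the values are illustrative and only keys from Table 24 appear:
read_from_audit=true
number_of_unique_users=10
number_of_queries_per_user=1
load_index_before_query=true
schedule_warmup=true
schedule_warmup_period=1
schedule_warmup_units=DAYS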

Sample query file (queries.xq)


Queries in the query file can be multi-line, but the file cannot contain empty lines. The empty sample
queries.xq is in xplore_home/dsearch/xhive/admin.
declare option xhive:fts-analyzer-class
com.emc.documentum.core.fulltext.indexserver.core.index.xhive.
IndexServerAnalyzer;
declare option xhive:ignore-empty-fulltext-clauses true;
declare option xhive:index-paths-values
dmftmetadata//owner_name,dmftsecurity/acl_name,
dmftsecurity/acl_domain,/dmftinternal/r_object_id;
for $i score $s in collection(/dm_notes/dsearch/Data) /dmftdoc[( ( (
dmftinternal/i_all_types = 030000018000010d) ) and
( (
dmftversions/iscurrent = true) ) ) and ( (. ftcontains ( (((
augmenting) with stemming)) using stop words ("") ) )) ]
order by $s descending
return <dmrow>{if ($i/dmftinternal/r_object_id)
then $i/dmftinternal/r_object_id
else <r_object_id/>}{if ($i/dmftsecurity/ispublic)
then $i/dmftsecurity/ispublic
else <ispublic/>}{if ($i/dmftinternal/r_object_type)
then $i/dmftinternal/r_object_type
else <r_object_type/>}{if ($i/dmftmetadata/*/owner_name)
then $i/dmftmetadata/*/owner_name else <owner_name/>}{
if ($i/dmftvstamp/i_vstamp)
then $i/dmftvstamp/i_vstamp else <i_vstamp/>}{xhive:highlight(
$i/dmftcontents/dmftcontent/dmftcontentref)}</dmrow>

Warmup logging
All the queries that are replayed for warmup from a file or the audit record are tagged as a
QUERY_WARMUP event in the audit records. The log includes the query to get warmup queries. You
can see this type in the admin report Top N Slowest Queries. To view all warmup queries in the audit
record, run the report Audit records for warmup component in xPlore administrator.

Configuring scoring and freshness


xPlore uses the following ranking principles to score query results:
If a search term appears more than once in a document, the hit is ranked higher than a document in
which the term occurs only once. This comparison is valid for documents of the same content length.
If a search term appears in the metadata, the hit is ranked higher than when the term occurs in
the content.
When the search criteria are linked with OR, a document with both terms is ranked higher than
a document with one term.
If the search spans multiple instances, the results are merged based on score.
Query term source (original term, alternative lemma, or thesaurus) can be weighted in
indexserverconfig.xml.
Other scoring:
For Documentum DFC clients that have a ranking configuration file, the ranking file defines how the
source score (xPlore) and the DFC score are merged. The xPlore score could be ignored, partially
used, or fully used without any other influence.

The DFC client application can specify a set of Documentum attributes for sorting results using the
IDfQueryBuilder API. If the query contains an order by attribute, results are returned
based on the attribute and not on the computed score.
These ranking principles are applied in a complicated Lucene algorithm. The Lucene scoring details
are logged when xDB logging is set to DEBUG. See Configuring logging, page 297.
Some of the following settings require reindexing, as noted. Freshness is supported by default.
1. Edit indexserverconfig.xml. For information on viewing and updating this file, see Modifying
indexserverconfig.xml, page 43.
2. Add a boost-value attribute to a sub-path element. The default boost-value is 1.0. A change requires
reindexing. In the following example, a hit in the keywords metadata increases the score for a result:
<sub-path returnable="true" boost-value="2.0" path="dmftmetadata/keywords"/>

3. By default the Documentum attribute r_modify_date is used to boost scores in results (freshness
boost). You can remove the freshness boost factor, change how much effect it has, or boost a
custom date attribute.
To remove this boost, edit indexserverconfig.xml and set the property enable-freshness-score
to false on the parent category element. This change affects only query results and does not
require reindexing.
<category name="dftxml"><properties>
...
<property name="enable-freshness-score" value="false" />
</properties></category>

Change the freshness boost factor. Changes do not require reindexing. Only documents that are
six years old or less have a freshness factor. The weight for freshness is equal to the weight for the
Lucene relevancy score. Set the value of the property freshness-weight in index-config/properties
to a decimal between 0 (no boost) and 1.0 (override the Lucene relevancy score). For example:
<index-config><properties>
...
<property name="enable-subcollection-ftindex" value="false"/>
<property name="freshness-weight" value="0.75" />

To boost a different date attribute, specify the path to the attribute in dftxml as the value of
a freshness-path property. This change requires reindexing. In the following example, the
r_creation_date attribute is boosted:
<index-config><properties>
...
<property name="enable-subcollection-ftindex" value="false"/>
<property name="freshness-weight" value="0.75" />
<property name="freshness-path" value="dmftmetadata/.*/r_creation_date" />

4. Configure weighting for query term source: original term, alternative lemma, or thesaurus. Does
not require reindexing. By default, they are equally weighted. Edit the following properties in
search-config/properties. The value can range from 0 to 1000.
<property name="query-original-term-weight" value="1.0"/>
<property name="query-alternative-term-weight" value="1.0"/>
<property name="query-thesaurus-term-weight" value="1.0"/>

5. Restart the xPlore instances.



Supporting search in XML documents


When a document to be indexed contains XML content, you can specify how the XML markup and
content should be handled. It can be tokenized or not (tokenize="true | false"). It can be stored within
the input document or separately (store="embed | none"). (The option separate is not supported.) Set
these attributes on the for-element-with-name/xml-content element in indexserverconfig.xml.
All XML documents are parsed and entities are expanded unless the document size exceeds
max_text_threshold. CPS stores tokens for documents that exceed this size, but they are not embedded
in the index and summaries cannot be calculated.
By default, the XML structure (element names and attributes of an input document) is not indexed.
Only the values of XML elements are extracted and indexed. When the XML content size is less than
the threshold (file-limit on the xml-content element), xPlore parses the XML content and extracts the
text. The extracted text is added to the dmftcontentref element in dmftxml. You cannot search for XML
element names or attributes. You can configure xPlore to support XML zone search. XML structure is
preserved along with the content of the elements. The elements, attributes, and content can be searched.
DTD URLs are resolved. External entities are expanded and added to the XML content. DTDs and
external entities can be placed on the xPlore host.
Note: When the size of the file exceeds the size that is specified as file-limit on the xml-content element
in indexserverconfig.xml, XML element values are embedded into the dmftcontentref element of the
dmftxml record. You can still do a full-text search on the file contents if it does not fail XML parsing.

Adding DTD and external entity support


External DTDs with a URL are resolved by the xPlore parser. Add external entities and DTD files, if
necessary, that are referenced by your XML documents. Create a directory on the xPlore host for these
files. Edit indexserverconfig to add a property with the path to this directory, add your files to this
directory, and then restart xPlore instances. For example:
<index-config>
<properties>
...
<property value="C:\DTDFiles" name="xml-entity-file-path"/>
</properties>
...

Note: If your documents containing XML have already been indexed, they must be reindexed to
include parsing with the DTD and entities.

Adding XML zone search support


To support searching on XML content or attribute values (also known as zone search):
1. Change index-as-sub-path to true on the xml-content element in indexserverconfig.xml. XML
elements and their content are embedded into dmftxml as children of the dmftcontentref element.
Attributes on the elements are preserved.
When index-as-sub-path is set to true, the language identification relies on the metadata elements
configured in element-for-language-identification on the linguistic-process element. If the language
of content is different from the language of metadata, the content is not indexed correctly. Incorrect
tokens are stored and a query against content can fail to return expected results.

2. Merge the following two sub-path configurations into one:


<sub-path leading-wildcard="false" compress="false"
boost-value="1.0" include-descendants="true" returning-contents
="false" value-comparison="false" full-text-search="true"
enumerate-repeating-elements="false" type="string"
path="dmftcontents/dmftcontent"/>
<sub-path leading-wildcard="false" compress="false"
boost-value="1.0" include-descendants="false" returning-contents
="false" value-comparison="false" full-text-search="false"
enumerate-repeating-elements="false" type="string"
path="dmftcontents/dmftcontent//*"/>

And set the full-text-search attribute value to true in the consolidated sub-path:
<sub-path leading-wildcard="false" compress="false"
boost-value="1.0" include-descendants="false" returning-contents
="false" value-comparison="false" full-text-search="true"
enumerate-repeating-elements="false" type="string"
path="dmftcontents/dmftcontent//*"/>

Note: If your documents containing XML have already been indexed, they must be reindexed
to include the XML content.
Note: If the content exceeds the CPS max text threshold, XML content is not embedded.
The following illustration shows how XML content is processed depending on your configuration.
The table assumes that the document submitted for indexing does not exceed the size limit in index
agent configuration and the content limit in CPS configuration.


Figure 15. XML processing options

Handling XML parsing errors


Specify how to handle parsing errors with the on-embed-error attribute. Set this attribute on the
for-element-with-name/xml-content element. Valid values: embed_as_cdata | ignore | fail. The option
embed_as_cdata stores the entire XML content as a CData child element of the specified element,
for example, dmftcontentref. The ignore option does not store the XML markup. For the fail option,
none of the content in the document is searchable.
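For example, to fall back to storing the raw XML as CDATA when parsing fails, the xml-content
element from the earlier sketch could be written as follows (values illustrative):
<xml-content tokenize="true" store="embed" on-embed-error="embed_as_cdata"/>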


Zone search in XQueries


DFC client applications like Webtop or D2 generate XQueries, but they cannot perform zone searches.
In the following example, an XML document has the following structure:
<company>
<staff>
<firstname>Michelle</firstname>
<lastname>Specht</lastname>
<nickname>mspecht</nickname>
<salary>100000</salary>
</staff>
<staff>
<firstname>John</firstname>
<lastname>Heidrich</lastname>
<nickname>jheidrich</nickname>
<salary>200000</salary>
</staff>
...1000 more entries
</company>

A search for the word staff generates the following simple XQuery:
let $j:= for $x in collection(/XMLTest)/dmftdoc
[. ftcontains staff with stemming]

You can use IDfXQuery to generate the following query, which is much more specific and performs
better:
let $j := for $x in collection(/XMLTest)/dmftdoc
[dmftcontents/dmftcontent/dmftcontentref/company/staff
ftcontains John with stemming]

Rewriting VQL queries


Older Documentum applications supported XML zone searches with Verity Query Language (VQL).
FAST indexing did not support VQL queries. With xPlore, you can configure zone search support
using index-as-sub-path, or you can rewrite some VQL queries to XQuery equivalents.
Perform structured searches of XML documents using XQuery or the DFC interface IDfXQuery.
Join different objects using DQL (NOFTDQL), XQuery, or the DFC interface IDfXQuery.
Denormalize the relationship of a document to other objects or tables, such as email attachments,
using XQuery or the DFC interface IDfXQuery.
Perform boolean searches using DQL, XQuery, or the DFC interface IDfXQuery.
For a table of VQL examples and their equivalents in XQuery expression, see VQL and XQuery
Syntax Equivalents, page 355.


Adding a thesaurus
A thesaurus provides results with terms that are related to the search terms. For example, when a user
searches for car, a thesaurus expands the search to documents containing auto or vehicle. When
you provide a thesaurus, xPlore expands search terms in full-text expressions to similar terms. This
expansion takes place before the query is tokenized. Terms from the query and thesaurus expansion
are highlighted in search results summaries.
A thesaurus can have terms in multiple languages. Linguistic analysis of all the terms that are returned,
regardless of language, is based on the query locale.
Thesaurus support is available for DFC clients. The thesaurus is not used for DFC metadata searches
unless you use a DFC API for an individual query. For DQL queries, the thesaurus is used for both
search document contains (SDC) and metadata searches.
The thesaurus must be in SKOS format, a W3C specification. FAST-based thesaurus dictionaries must
be converted to the SKOS format. Import your thesaurus to the file system on the primary instance host
using xPlore administrator. You can also provide a non-SKOS thesaurus by implementing a custom
class that defines thesaurus expansion behavior. See Custom access to a thesaurus, page 268.

SKOS format
The format starts with a concept (term) that includes a preferred label and a set of alternative labels.
The alternative labels expand the term (the related terms or synonyms). Here is an example of such an
entry in SKOS:
<skos:Concept rdf:about="http://www.my.com/#canals">
<skos:prefLabel>canals</skos:prefLabel>
<skos:altLabel>canal bends</skos:altLabel>
<skos:altLabel>canalized streams</skos:altLabel>
<skos:altLabel>ditch mouths</skos:altLabel>
<skos:altLabel>ditches</skos:altLabel>
<skos:altLabel>drainage canals</skos:altLabel>
<skos:altLabel>drainage ditches</skos:altLabel>
</skos:Concept>

In this example, the main term is canals. When a user searches for canals, documents are returned
that contain words like canals, canal bends, and canalized streams. The SKOS format supports
two-way expansion, but it is not implemented by xPlore; a search on ditch does not return documents
with canals.
An SKOS thesaurus must use the following RDF namespace declarations:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:skos="http://www.w3.org/2004/02/skos/core#">
<skos:Concept ...
</skos:Concept>
</rdf:RDF>

Terms from multiple languages can be added like the following example:
<skos:Concept rdf:about="http://www.fao.org/aos/concept#25695">


<skos:prefLabel xml:lang="fr">Charrue socs</skos:prefLabel>


<skos:prefLabel xml:lang="es">Arados de vertedera</skos:prefLabel>
</skos:Concept>

Importing and viewing a thesaurus


You can view a list of thesauruses for each domain using xPlore administrator. Open Diagnostic and
Utilities > Thesaurus. Choose the domain (Documentum Content Server repository) for the thesaurus.
The list shows a thesaurus URI, indicates a default thesaurus, and the date it was imported.
To install a thesaurus, choose the domain (Documentum Content Server) for the thesaurus, then
browse to a local thesaurus or enter a URI to the thesaurus. Choose true for Default if this is the
default thesaurus for the domain. You can import additional thesauruses for the domain and specify
them in DFC or DQL queries.
xPlore registers this library in indexserverconfig.xml as the category thesaurusdb.
The thesaurus is stored in an xDB library under the domain, for example,
root-library/my_domain/dsearch/SystemInfo/ThesaurusDB. A special multi-path index
is defined in indexserverconfig.xml for the SKOS format to speed up thesaurus probes. The multi-path
index allows case-insensitive and space-insensitive lookups in an SKOS dictionary.
To modify your thesaurus, delete it in xPlore administrator and reimport it.

Enabling thesaurus support in a Content Server


Enable thesaurus support for Documentum repositories by editing the fulltext engine object
configuration for each Content Server. Set the value of the dm_ftengine_config property
thesaurus_search_enable. The following iAPI commands enable thesaurus support. The Content
Server query plugin and the DFC search service read these values.
retrieve,c,dm_ftengine_config
append,c,l,param_name
thesaurus_search_enable
append,c,l,param_value
true
save,c,l

Enabling phrase search in a thesaurus


You can configure the fulltext engine to match entire phrases against the thesaurus. By default, phrase
queries are not lemmatized, and the thesaurus is bypassed. The following iAPI command instructs
xPlore to match phrases in the thesaurus:
retrieve,c,dm_ftengine_config
append,c,l,param_name
use_thesaurus_on_phrase
append,c,l,param_value
true
save,c,l


DFC thesaurus APIs


Update DFC in the client application to the latest patch. Use the following APIs to enable thesaurus use
and to use a custom thesaurus in the DFC search service. These APIs are in IDfSimpleAttrExpression
and IDfFulltextExpression. They override the default settings in dm_ftengine_config. The String
argument for setThesaurusLibrary is a URI to the thesaurus that you have imported.
void setThesaurusEnabled(Boolean thesaurusSearchEnabled);
void setThesaurusLibrary(String thesaurusLibrary)

The following example enables thesaurus expansion for the metadata object_name. The global
thesaurus setting does not enable metadata search in the thesaurus.
IDfSimpleAttrExpression aSimpleAttrExpr = rootExpressionSet.addSimpleAttrExpression(
"object_name", IDfValue.DF_STRING, IDfSearchOperation.SEARCH_OP_CONTAINS,
false, false, "IIG");
aSimpleAttrExpr.setThesaurusEnabled(true);

Setting thesaurus support in DQL queries


For all DQL queries, the thesaurus is used only for full-text (SDC) queries and string metadata queries
using the like or equals operator.
If use_thesaurus_enable is set to true in dm_ftengine_config, then the thesaurus is used for DQL
queries. To override use_thesaurus_enable when it is set to false, use the DQL hint ft_thesaurus_search.
For example:
select object_name from dm_document search document contains test
enable(ft_thesaurus_search)

To specify a specific thesaurus in a DQL query, use the hint ft_use_thesaurus_library, which
takes a string URI for the thesaurus. The following example overrides the thesaurus setting in
dm_ftengine_config because it adds the ft_thesaurus_search hint. If thesaurus search is enabled, use
only the ft_use_thesaurus_library hint.
select object_name from dm_document search document contains test
enable(ft_thesaurus_search, ft_use_thesaurus_library (
http://search.emc.com/myDomain/myThesaurus.rdf))

Setting thesaurus properties in xQuery expressions


xPlore supports the thesaurus option in the W3C XQuery full-text specification.
To enable a thesaurus, add using thesaurus default to the xQuery expression, for example:
object_name ftcontains "IIG" with stemming using stop words default
using thesaurus default entire content

The following query enables thesaurus search with a phrase:


. ftcontains "programming language" with stemming
using stop words default using thesaurus default


Logging thesaurus actions


Set the log level to DEBUG in xPlore administrator: Services > Logging > dsearchsearch. The
following information is logged:
Thesaurus is found from the specified URI. For example:<message ><![CDATA[attempt to
verify existence of thesaurus URI:
http://search.emc.com/testenv/skos.rdf]]></message>
...
<message ><![CDATA[successfully verified existence of thesaurus URI:
http://search.emc.com/testenv/skos.rdf]]></message>

Argument values that are passed into getTermsFromThesaurus: input terms, relationship,
minValueLevel, maxValueLevel. For example:<message >
<![CDATA[calling getTermsFromThesaurus with terms [leaVe],
relationship null, minLevelValue -2147483648, maxLevelValue
2147483647]]></message>

Thesaurus XQuery statement. For example:<![CDATA[thesaurus lookup xquery: declare


option xhive:fts-analyzer-class
com.emc.documentum.core.fulltext.indexserver.core.index.xhive.
IndexServerAnalyzer;
declare namespace rdf =
http://www.w3.org/1999/02/22-rdf-syntax-ns#;
declare namespace skos = http://www.w3.org/2004/02/skos/core#;
declare variable $terms external;
declare variable $relation external;
doc(/testenv/dsearch/SystemInfo/ThesaurusDB)[xhive:metadata(
.,uri)=http://search.emc.com/testenv/skos.rdf]/rdf:RDF/skos:
Concept[skos:prefLabel contains text {$terms} entire content]/
skos:altLabel/text()]]>

Tokens that are looked up in the thesaurus. The query term leaVe is rendered
case-insensitive:><![CDATA[executing the thesaurus lookup query to get related
terms for
[leaVe]]]></message>
...
<![CDATA[Returned token: leave]]>
...
<![CDATA[Total tokens count for reader: 1]]>

Query plan for thesaurus XQuery execution. Provide the query plan to technical support if you are
not able to resolve an issue. For example:<![CDATA[thesaurus lookup execution plan:
query:6:1:Creating query
plan on node /testenv/dsearch/SystemInfo/ThesaurusDB
query:6:1:for expression ...[xhive:metadata(., "uri") = "
http://search.emc.com/testenv/skos.rdf"]/child::
{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF/child::
{http://www.w3.org/2004/02/skos/core#}
Concept[child::{http://www.w3.org/2004/02/skos/core#}prefLabel
[. contains text terms@0]]/child::
{http://www.w3.org/2004/02/skos/core#}altLabel/child::text()


query:6:1:Using query plan:


query:6:1:index(Concept)
[parent::{http://www.w3.org/1999/02/22-rdf-syntax-ns#}
RDF[parent::document()[xhive:metadata(., "uri") = "
http://search.emc.com/testenv/skos.rdf"]]]/child::
{http://www.w3.org/2004/02/skos/core#}altLabel/child::text()
]]>.

Related terms that are returned from the thesaurus. For example:<![CDATA[related terms
from thesaurus lookup query
[Absence from work, Absenteeism, Annual leave, Employee vacations,
Holidays from work, Leave from work, Leave of absence, Maternity
leave, Sick leave]]]>

You can also inspect the final Lucene query. This query is different from the original query because
it contains the expanded terms (alternate labels) from the thesaurus. In xPlore administrator, open
Services > Logging and expand xhive. Change the log level of com.xhive.index.multipath.query
to DEBUG. The query is in the xDB log as generated Lucene query clauses. xdb.log is in
xplore_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/logs. The tokens are noted as tkn. For
example:
generated Lucene query clauses(before optimization):
+(((<>/dmftmetadata<0>/dm_sysobject<0>/a_is_hidden<0>/ txt:false)^0.0)) +
(((<>/dmftversions<0>/iscurrent<0>/ txt:true)^0.0))
+(<>/ tkn:shme <>/dmftcontents<0>/ tkn:shme <>/dmftcontents<0>/dmftcontent<0>/
tkn:shme <>/dmftfolders<0>/ tkn:shme <>/dmftinternal<0>/
tkn:shme <>/dmftinternal<0>/r_object_id<0>/
tkn:shme <>/dmftinternal<0>/r_object_type<0>/
tkn:shme <>/dmftkey<0>/ tkn:shme <>/dmftmetadata<0>/
tkn:shme <>/dmftsecurity<0>/
tkn:shme <>/dmftsecurity<0>/ispublic<0>/ tkn:shme <>/dmftversions<0>/
tkn:shme <>/dmftvstamp<0>/
tkn:shme) _xhive_stored_payload_:_xhive_stored_payload_

Troubleshooting a thesaurus
Make sure that your thesaurus is in xPlore. You can view thesauri and their properties in the xDB
admin tool. Navigate to the /xhivedb/root-library/<domain>/dsearch/SystemInfo/ThesaurusDB library.
To view the default and URI settings, click the Metadata tab.
Make sure that your thesaurus is used. Compare the specified thesaurus URI in the XQuery to the
URI associated with the dictionary. View the URI in the xDB admin tool or the thesaurus list in
xPlore administrator. Compare this URI to the thesaurus URI used by the XQuery, in dsearch.log.
For example:
for $i score $s in collection(/testenv/dsearch/Data) /dmftdoc[.
ftcontains food products using thesaurus at http://www.emc.com/skos]
order by $s descending return $i/dmftinternal/r_object_id

If the default thesaurus on the file system is used, the log records a query like the following:
for $i score $s in collection(/testenv/dsearch/Data) /dmftdoc[.
ftcontains food products using thesaurus default] order by $s
descending return $i/dmftinternal/r_object_id


You can view thesaurus terms that were added to a query by inspecting the final query. Set
xhive.index.multipath.query = DEBUG in xPlore administrator. Search for generated Lucene query
clauses.

Configuring query lemmatization


xPlore supports search for similar or like terms, also known as lemmatization, by default. To speed
indexing and search performance, you can turn off lemmatization for indexing. See Configuring
indexing lemmatization, page 105.
Note: If you have already indexed documents, the lemmas cannot be removed from the index. They
will be found even if you use a phrase search. You must reindex all documents to remove lemmas
from the index.
Lemmatization in XQueries (including DFC 6.7 or higher).
You can turn on or off lemmatization of individual queries by using the XQFT modifiers with stemming
or without stemming. The XQFT default is without stemming. DFC produces XQuery syntax. The
DFC default is with stemming except for a phrase search.
Lemmatization in DQL and DFC queries
The default is with stemming. To turn off stemming in DQL or DFC queries, use a phrase search.
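For example, the following hand-written XQuery fragments show the two modifiers; the search
term is arbitrary:
. ftcontains "budget" with stemming
. ftcontains "budget" without stemming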

Configuring search on compound words


Compound words are frequent in some languages such as German or Chinese. By default, searching on
a compound term returns results that contain the compound term, any of the components, or alternate
lemmas of the compound term. For example, searching on Heilmittel returns results with Heilmittel,
Heil, or Mittel such as Schmiermittel or Finanzmittel. In some cases, the component is so common that
many irrelevant results are returned.
You can configure search to return only alternate lemmas and not components. This configuration
increases the precision of the results at the cost of a lower recall: the search retrieves fewer results, the
results are more relevant but some relevant results may not be returned.
1. Edit indexserverconfig.xml in xplore_home/config.
2. Set the property named query-components-from-compound-words to false under the search-config
element:
<search-config>
<properties>
...
<property value="false" name="query-components-from-compound-words"/>
...
</properties>
</search-config>

Configuring query summaries


Query summary highlighting, page 214
Configuring summary security, page 214

The indexing service stores the content of each object as an XML node in dftxml called dmftcontentref.
If the content exceeds the limit for indexing, only the metadata are indexed. When you examine the
dftxml for a document, the attribute islocalcopy has a value of true if the content is stored in that
element. When the value is false, only the metadata has been stored.
For all documents in which an indexed term has been found, xPlore retrieves the content node and
computes a summary. The summary is a phrase of text from the original indexed document that
contains the searched word. Search terms are highlighted in the summary.
Dynamic summaries have a performance impact. Unselective queries can require massive processing
to produce summaries. After the summary is computed, the summary is reprocessed for highlighting,
causing a second performance impact. You can disable dynamic summaries for better performance.
All of the following must be true in order to have dynamic summaries. If any condition is false, a
static summary is generated.
query-enable-dynamic-summary must be set to true
The result must be within the first X rows defined by max-dynamic-summary-threshold parameter.
The size of the extracted text must be less than the value of the extract-text-size-less-than attribute.
The query term must appear within the first X characters defined by the token-size attribute.
If security_mode is set to BROWSE, the user must have at least READ permission.
Static summaries are computed when the summary conditions do not match the conditions configured
for dynamic summaries. Static summaries are much faster to compute but less specific than dynamic
summaries.
1. Configure general summary characteristics.
a. If you want to turn off dynamic summaries in xPlore administrator, choose Services > Search
Service and click Configuration. Set query-enable-dynamic-summary to false.
The first n characters of the document are displayed, where n is the value of the parameter
query-summary-display-length.
b. To configure the number of characters displayed in the summary, choose Services > Search
Service in xPlore administrator. Set query-summary-display-length (default: 256 characters
around the search terms). If no search term is found, a static summary of the specified length
from the beginning of the text is displayed, and no terms are highlighted.
c. To configure the size of a summary fragment, edit indexserverconfig.xml. Search
terms can be found in many places in a document. Add a property for fragment size to
search-config/properties (default 64). The following example changes the default to 32,
allowing up to 8 fragments for a display length of 256:
<property value="32" name="query-summary-fragment-size"/>

2. Configure static summaries in indexserverconfig.xml. Specify the elements (Documentum
attributes) that are used for the static summary in
category-definitions/category/elements-for-static-summary. The max-size attribute sets the
maximum size of the static summary. Default: 65536 (bytes).
3. Configure dynamic summaries.
a. Configure the maximum size of content that is evaluated for a dynamic summary.
Set the maximum as the value of the extract-text-size-less-than attribute on the
category-definitions/category/do-text-extraction/save-tokens-for-summary-processing element.
The default is -1 (all documents). For faster summary calculation, set this value to a positive
value. Larger documents return a static summary.
b. Configure the number of characters at the beginning of the document in which the query
term must appear. If the query term is not found in this snippet, a static summary is
returned and term hits are not highlighted. Set the value of the token-size attribute on the
category-definitions/category/do-text-extraction/save-tokens-for-summary-processing element.
The default value is 65536 (64K). A value of -1 indicates no maximum content size, but this
value negatively impacts performance. For faster summary calculation, set this value lower.
c. Configure the maximum number of results that have a dynamic summary. Dynamic
summaries require much more computation time than static summaries. Set the value of
max-dynamic-summary-threshold (default: 50). Additional results have a static summary.
If most users do not go beyond the first page of results, set this value to the page size, for
example, 10, for faster performance.
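Putting 3a and 3b together, the relevant element in indexserverconfig.xml might look like the
following sketch. The element path and attribute names come from the steps above; the numeric
values are only examples, and any other attributes already present on the element in your installation
should be left unchanged:
<!-- inside category-definitions/category/do-text-extraction -->
<save-tokens-for-summary-processing
    extract-text-size-less-than="10485760"
    token-size="65536"/>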
With native xPlore security and the security_mode property of the dm_ftengine_config object set to
BROWSE, the user must have at least READ permission to see dynamic summaries.

Query summary highlighting


The search terms, including lemmatized terms, are highlighted within the summary that is returned to
the client search application. Wildcard search terms are also highlighted. For example, if the search
term is ran*, then the word rant in metadata is highlighted.
Highlighting does not preserve query context such as phrase search, AND search, NOT search, fuzzy
search, or range search. Each search term in the original query is highlighted separately.
Dynamic summaries have a performance impact. Unselective queries can require massive processing
to produce summaries. After the summary is computed, the summary is reprocessed for highlighting,
causing a second performance impact.

Configuring summary security


By default, users see search results for which they have BROWSE permission if SUMMARY is
not selected. If SUMMARY is in the select list, they see only results for which they have READ
permission.
To modify the permissions applied to FTDQL and non-FTDQL search summaries, change the
security_mode property of the dm_ftengine_config object. Use one of the following values:
BROWSE: Displays all results for which the user has at least BROWSE permission. If the user has
only BROWSE permission, the summary is blank.
READ: Displays all results for which the user has at least READ permission.
SUMMARY_BASED (default): If SUMMARY is not in the select list, displays all results for
which the user has at least BROWSE permission. If SUMMARY is selected, displays results for
which the user has at least READ permission.
The following iAPI example sets the summary mode to READ:
retrieve,c,dm_ftengine_config
append,c,l,param_name
security_mode
append,c,l,param_value
READ
save,c,l

Configuring fuzzy search


Fuzzy search can find misspelled words or letter reversals by calculating the similarity between terms
using a Lucene algorithm. Fuzzy search is not applied when wildcards are present. Fuzzy search is not
enabled by default. Fuzzy search is only available in XQueries, not in DQL queries. DFC clients must
be at least version 6.7 to enable fuzzy search.
Fuzzy search does not work if you enable exact phrase match in queries. For information about
configuring exact phrase match in queries, see About lemmatization, page 103.
When fuzzy search is enabled, a query runs with the search terms as well as a number of similar terms.
To reduce the performance impact of such a query, xPlore assumes that the first character of the search
term is correct and limits the number of similar terms used in the query. For example, searching on
qdministration does not return results with the term administration. Similarly, searching on explore
does not return results with the term xplore.
You can configure the number of leading characters to ignore, the number of similar terms to use in the
query, and the supported distance between terms.
Use the following procedure to enable and configure fuzzy search for all queries except DQL.
1. Check your current dm_ftengine_config parameters. Use iAPI, DQL, or DFC to check the
dm_ftengine_config object. First get the object ID, returned by this API command:
retrieve,c,dm_ftengine_config

2. Use the object ID to get the parameters:


?,c,select param_name, param_value from dm_ftengine_config where
r_object_id='<dm_ftengine_config_object_id>'

3. If the fuzzy_search_enable parameter does not exist, use iAPI, DQL, or DFC to modify the
dm_ftengine_config object. To add a parameter using iAPI in Documentum Administrator, use
append like the following:
retrieve,c,dm_ftengine_config
append,c,l,param_name
fuzzy_search_enable
append,c,l,param_value
true
save,c,l

4. To change the allowed similarity between a word and similar words, set the parameter
default_fuzzy_search_similarity in dm_ftengine_config. This default also applies to custom fuzzy
queries in DFC and DFS for full-text and properties. Set a value between 0 (terms can differ by
more than one letter) and 1. The default is 0.5.
To verify that your fuzzy search setting has been applied, view the query in dsearch.log. You
should see the following argument in the query with a similarity value that you have set:
using option xhive:fuzzy "similarity=xyz"

5. Edit the xdb.properties file located in the directory WEB-INF/classes of the primary instance.
6. Set the xdb.lucene.fuzzyQueryPrefixLength property to the number of leading characters that should
be ignored when assessing similar terms. Default: 1.
For example, when the prefix value is set to 0, searching on explore returns xplore, but this has a
large impact on performance. Set it to 0 only if matching on the first character is critical to your
business. Setting the prefix to a high value improves performance, but similar terms can be omitted
and you lose the benefit of the feature.
7. Set the xdb.lucene.fuzzyTermsExpandedNumber property to the maximum number of similar terms
used in the query. The most similar terms are used. A smaller value improves query response
time. Default: 10.
8. Make the same changes in the xdb.properties file for all instances.
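For example, with the documented defaults, the two keys in xdb.properties would look like the
following (add the keys if they are not already present):
xdb.lucene.fuzzyQueryPrefixLength=1
xdb.lucene.fuzzyTermsExpandedNumber=10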
Fuzzy search in DFC and DFS
You can enable fuzzy search on individual queries in DFC or DFS. Set fuzzy search in individual
full-text and property queries with APIs on IDfFulltextExpression and IDfSimpleAttrExpression. Use
the operators CONTAINS, DOES_NOT_CONTAIN, and EQUALS for String object types:
setFuzzySearchEnabled(Boolean fuzzySearchEnabled)
setFuzzySearchSimilarity(Float similarity): Sets a similarity value between 0 and 1. Overrides the
value of the parameter default_fuzzy_search_similarity in dm_ftengine_config.
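A minimal sketch of how these setters might be used, assuming you have already built a query with
the DFC search service and obtained an IDfFulltextExpression (named ftExpr here) from the query
builder:
// ftExpr is an IDfFulltextExpression added to the query builder's root expression set
ftExpr.setFuzzySearchEnabled(true);      // enable fuzzy matching for this expression
ftExpr.setFuzzySearchSimilarity(0.6f);   // example value; overrides default_fuzzy_search_similarity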
To disable fuzzy search, set the property fuzzy_search_enable in the dm_ftengine_config object to false.

Configuring index type checking


When a query is performed against indexed documents, the query data type (element data type in the
XQuery expression) is checked against the index data type (specified in the subpath definition in
indexserverconfig.xml). The query data type must be compatible with the index data type for the
query to be accepted and matched results returned.
You can adjust the level of compatibility between the two sets of data types to fine-tune query
performance by setting the xdb.lucene.strictIndexTypeCheck value in xdb.properties.
Set xdb.lucene.strictIndexTypeCheck to False to apply less strict data type checking to queries to
improve query performance. This is the default value. When xdb.lucene.strictIndexTypeCheck is
False, xPlore considers index data type string compatible with query data types dateTime and integer
(only in exact match) when handling queries.
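For example, to keep the default behavior explicitly, the key in xdb.properties would be set as follows
(add the key if it is not present):
xdb.lucene.strictIndexTypeCheck=false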
When xdb.lucene.strictIndexTypeCheck is set to False, compatibility is checked between the query
data types string, integer, double, date, dateTime, time, and float and the index data types string,
integer, double, date, dateTime, time, float, and long. All pairs that are compatible under strict
checking (described below) remain compatible, and index data type string is additionally compatible
with query data types integer (exact match only) and dateTime, as noted above.

When xdb.lucene.strictIndexTypeCheck is True, a stricter index type checking rule is enforced. This
may cause lower query performance if you did not specify index data types when creating subpath
definitions. Under strict checking, a query data type is compatible only with the same or a closely
related index data type (for example, numeric query types with numeric index types and temporal
query types with temporal index types); index data type string is no longer considered compatible with
query data types integer and dateTime.

For example, the following elements in a Content Server document are declared as type integer and
dateTime respectively:
<owner_permit dmfttype="dmint">7</owner_permit>
<r_creation_date dmfttype="dmdate">2010-09-27T22:54:48</r_creation_date>

However, without an explicit subpath definition in indexserverconfig.xml, both element values are
indexed as string values in multi-path indexes.
Performing the following queries will return different results depending on how you set the
xdb.lucene.strictIndexTypeCheck value:
/dmftdoc[dmftmetadata//owner_permit = xs:integer(7)]
/dmftdoc[dmftmetadata//r_creation_date = xs:dateTime("2010-09-27T22:54:48")]

If xdb.lucene.strictIndexTypeCheck = True, the elements will not be returned since the query data
types integer and dateTime do not match the index data type string.
If xdb.lucene.strictIndexTypeCheck = False, the elements are returned since the query data types
integer and dateTime are considered compatible with the index data type string.

Configuring wildcards and fragment search


The default behavior in xPlore matches that of commonly used search engines. Wildcard (fragment)
search is not performed in a full-text search unless the user adds an explicit wildcard. This provides
faster, more precise search results than a fragment search. You can configure fragment search support,
but it can cause slower performance in an unselective query or an advanced search for ends with.
Instead of fragment search, you can find similar terms using lemmatization, which is enabled by
default. You can also configure fuzzy search, which tolerates misspelled words.
Wildcard behavior is configurable for both content and metadata. See Configuring content wildcard
support, page 219.
Full-text search
xPlore searches for word fragments in the text of documents when the user adds a * wildcard. You
can turn off this behavior so that xPlore treats a wildcard as a literal (*) or configure xPlore to
implicitly add wildcards.
Metadata (properties) search
You can configure wildcard support (explicit or implicit) for properties when the search client uses
a begins with or ends with operator. You can also configure support for the contains operator, but it
has a heavy performance penalty. xPlore does not add wildcards to queries for the operators does
not contain, does not equal (<>), in, and not in.
Searching both content and metadata
In DFC clients like Webtop and CenterStage, xCP applications, or D2, a simple search searches
metadata as well as content. Simple search matches word fragments in metadata when there is a
wildcard (*) in the query term.
Phrase searches and diacritics
Wildcards in a phrase search are either ignored or treated as literals, depending on your configuration.
Wildcards in DQL
By default, DQL is transformed into XQuery by DFC. To bypass XQuery, you must turn off XQuery
generation in DFC to use DQL. This is not recommended, because most wildcard behavior can be
configured without turning off XQuery. Use DQL only for legacy applications.
Add the following setting to dfc.properties on the DFC client application:
dfc.search.xquery.generation.enable=false

The following DQL wildcards are supported. All other characters are treated as literals.
search document contains (SDC) wildcards: * and ?.
wildcards in a where clause: % and *.
In a DQL phrase search, fragments are matched. For example, dogs*cats in a phrase matches
dogslovecats. In addition, DQL queries that contain the DQL hint FT_CONTAIN_FRAGMENT in the
where clause match fragments instead of whole words.
Note: If both leading and trailing wildcards appear in a DQL metadata condition, the wildcards are
dropped even when fast_wildcard_compatible is true. This behavior is the same as FAST indexing. To
override, use the FT_CONTAIN_FRAGMENT hint.

Limitations of wildcards
A wildcard cannot be preceded by white space. For example, a search for word_* is treated as word *
(the _ character is in the special characters list and is treated as white space) and cannot be resolved.
If the character must be searchable, remove it from the special characters list so that it is no longer
treated as white space.
If you have configured xPlore to match phrases exactly in queries, using a period followed by a plus
sign (.+) as leading wildcard in XQueries may return inaccurate results.

Configuring content wildcard support


By default, xPlore searches match wildcards (* for multiple characters and ? for single character) in
metadata search and in the content of a document. You can change the default wildcard behavior for
content and metadata searches.
This wildcard support has changed in 1.3. For content searches in previous versions of xPlore, you
could set fast_wildcard_compatible to true. This setting is now deprecated, and the following new
parameters allow you to control the user search experience.
Use the ft_wildcards_mode parameter to support wildcard search in content. The ft_wildcards_mode
parameter specifies how wildcards are evaluated in full-text clauses (Webtop simple search, Webtop
advanced search Contains field, and global search in xCP 2.0). Valid values:
none: Wildcard is treated as a literal * character.
explicit: Wildcard character must be entered to search for fragments (default).
implicit: Wildcards are added implicitly around every search term (negative impact on performance).
trailing_implicit: A wildcard is added to the end of every search term.
DQL SDC queries do not support the ft_wildcards_mode parameter. If your application adds wildcards
to DQL queries, use fast_wildcard_compatible.
For information on configuring wildcard support in metadata searches, see Configuring metadata
wildcard search, page 220.
You can configure a cutoff to limit runaway queries in wildcard searches. See Limiting wildcards
and common terms in search results, page 221.
Configure the wildcard parameters in the Documentum dm_ftengine_config object. Edit the
dm_ftengine_config object in the Content Server.
1. Using iAPI in Documentum Administrator or Server Manager, first get the object ID:
retrieve,c,dm_ftengine_config where
object_name like 'DSearch%'

2. Use the object ID to get the dm_ftengine_config parameters and values. In the following example,
the value of r_object_id that was returned in step 1 is used to get the parameters.
?,c,select param_name, param_value from dm_ftengine_config
where r_object_id='080a0d6880000d0d'

3. If the wildcards configuration parameters are not returned, configure them. Append a param_name
and param_value element and set its value. For example:
retrieve,c,dm_ftengine_config
append,c,l,param_name
ft_wildcards_mode
append,c,l,param_value
explicit
save,c,l

4. To change an existing parameter, locate the position of the param_name attribute value of the
parameter. Use set as follows:
retrieve,c,dm_ftengine_config
dump,c,l //locates the position
set,c,l,param_value[i] //position of ft_wildcards_mode
implicit
save,c,l

Configuring metadata wildcard search


By default, a metadata search in DFC clients like Webtop and applications built with xCP 2.0 Designer
does not search for word fragments unless you explicitly use a wildcard (*). Wildcards are not
generated in advanced search for contains, starts with, or ends with. However, you can configure
wildcard support in metadata.
This wildcard support has changed in 1.3. For content searches in previous versions of xPlore, you can
set fast_wildcard_compatible to true. This setting is now deprecated and the following new parameters
allow you to control the user search experience.
For better performance, configure a cutoff for results. The cutoff applies to search results for all
wildcard queries except queries with a trailing wildcard. For cutoff configuration, see Limiting
wildcards and common terms in search results, page 221.
Use the following configuration parameters in the procedure below. All metadata wildcard
configuration has a negative effect on performance.
metadata_contains_wildcards_mode: Separately controls metadata search (Webtop advanced
search properties contains operator). Valid values: none | explicit (default) | implicit |
trailing_implicit.
metadata_startswith_wildcards_mode: Separately controls metadata search (Webtop advanced
search properties starts with operator). Valid values: none | explicit (default) | implicit. With
implicit, the wildcard is added at the end of the search term.
metadata_endswith_wildcards_mode: Separately controls metadata search (Webtop advanced
search properties ends with operator). Valid values: none | explicit (default) | implicit. With implicit,
the wildcard is added at the beginning of the search term. Leading wildcard search has the worst
performance of all searches because the entire index must be probed. To mitigate the performance
impact, you can configure support for leading wildcards on specific subpaths, as described in
Configuring leading wildcards, page 221. This requires a rebuild of the index.
metadata_equals_wildcards_mode: Separately controls metadata search (Webtop advanced search
properties equals operator). Valid values: none (default) | explicit.
For information on configuring wildcard support in content searches, see Configuring content wildcard
support, page 219.

Configuring leading wildcards


By default, only the Documentum attribute object_name supports leading wildcards. You can
configure support for other attributes that users search for with leading wildcards. For example, a
custom attribute customer has a variable prefix like SrcAtlantic or ResAtlantic. The user searches for
*Atlantic.
Configure only the subpaths that are used most often in searches. The additional indexes created
for the configured subpath have a space cost.
1. To support this custom metadata leading wildcard, add a subpath entry to indexserverconfig.xml
like the following:
<sub-path leading-wildcard="true" compress="false" boost-value="1.0"
include-descendants="false" returning-contents="true"
value-comparison="true" full-text-search="true"
enumerate-repeating-elements="false" type="string"
path="dmftmetadata//customer"/>

2. Adjust the metadata_endswith_wildcards_mode parameter. For example, you can set it to implicit.
3. To apply the changes, rebuild the index.

Limiting wildcards and common terms in search results


You can configure a cutoff that silently stops searching for terms containing wildcards when a
configurable limit is reached. You can also configure a frequency limit, so that common terms that
appear often (like stop words) are dropped from the search.
Open xdb.properties in the directory WEB-INF/classes of the primary instance. If these keys do
not exist, you can add them.
xdb.lucene.termsExpandedNumber: Sets the cutoff. Default: 1000.
xdb.lucene.ratioOfMaxDoc: Sets the maximum term frequency in the index. (Term frequency is
recorded during indexing.) For example, if the ratio is 0.5 or higher, the query does not search for
terms that occur in more than half of the documents in the index. A search for *nd would reject hits
for and because it occurs in more than half of the index.
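For example, the two keys might be set as follows (1000 is the documented default cutoff; 0.5 is the
ratio used in the example above):
xdb.lucene.termsExpandedNumber=1000
xdb.lucene.ratioOfMaxDoc=0.5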

Supporting wildcards in DQL


The fast_wildcard_compatible property was introduced in dm_ftengine_config to support search for
word fragments (FAST wildcard behavior). This setting is deprecated, because xPlore supports any
combination of wildcard search in text and metadata. (See Configuring content wildcard support,
page 219.)
Use this setting to support wildcards in DQL. Query performance is negatively affected when
fast_wildcard_compatible is set to true.

Configuring Documentum search


Query plugin configuration (dm_ftengine_config), page 222
Making types and attributes searchable, page 223
Folder descend queries, page 223
DQL, DFC, and DFS queries, page 224
Routing a query to a specific collection, page 257
Tracing Documentum queries, page 227

Query plugin configuration (dm_ftengine_config)


The Content Server query plugin settings are set during index agent installation. Change them
only if the Content Server or xPlore environment changes. The following settings affect query
processing. If you change them, you do not need to restart the Content Server. For a full description of
dm_ftengine_config attributes, see dm_ftengine_config, page 334.
Queries fail if the wrong (FAST) query plugin is loaded in the Content Server. Check the
Content Server log after you start the Content Server. The file repository_name.log is located
in $DOCUMENTUM/dba/log. Look for a line like the following. It references a plugin with
DSEARCH in the name.
[DM_FULLTEXT_T_QUERY_PLUGIN_VERSION]info: "Loaded FT Query Plugin:
.../DSEARCHQueryPlugin.dll...FT Engine version: X-Hive/DB 10"

To check your current dm_ftengine_config settings, use iAPI, DQL, or DFC. To view existing
parameters using iAPI in Documentum Administrator:
First get the object ID:
retrieve,c,dm_ftengine_config
... <dm_ftengine_config_object_id>

Use the object ID to get the parameters:
?,c,select param_name, param_value from dm_ftengine_config
where r_object_id='<dm_ftengine_config_object_id>'

1. Add a missing parameter using iAPI append like the following:


retrieve,c,dm_ftengine_config
append,c,l,param_name
acl_check_db
append,c,l,param_value
T
save,c,l

2. To change an existing parameter, use set like the following:


retrieve,c,dm_ftengine_config
set,c,l,param_name
acl_check_db

set,c,l,param_value
false
save,c,l

3. To remove an existing parameter, use remove instead of set.

Making types and attributes searchable


You can create or alter types to make them searchable and configure them for full-text support. For
information on making attributes non-searchable, see Making metadata non-searchable, page 80.
Allowing indexing
The a_full_text attribute is defined for the dm_sysobject type and subtypes (default: true). Content
and properties are indexed when a Save, Saveasnew, Checkin, Destroy, Branch, or Prune operation
is performed on the object. When a_full_text is false, only the properties are indexed. Users with
Sysadmin or Superuser privileges can change the a_full_text setting using Documentum Administrator.
Setting indexable formats
Properties of the format object determine which formats are indexable and which content files in
indexable formats are indexed. If the value of the can_index property of a content file format object
is set to true, the content file is indexable. If the primary content of an object is not in an indexable
format, you can ensure indexing by creating a rendition in an indexable format. For more details, see
EMC Documentum Search Development Guide.
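For example, a quick way to see which formats are currently marked as indexable is a DQL query like
the following sketch against the dm_format type:
?,c,select name, description from dm_format where can_index = 1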
Allowing search
Set the is_searchable attribute on an object type to allow or prevent searches for objects of that type
and its subtypes (default: true). Valid values: 0 (false) and 1 (true). The client application must read
this attribute. If is_searchable is false for a type or attribute, Webtop does not display it in the
search UI.
Lightweight sysobjects (LWSOs)
Lightweight sysobjects group the attribute values that are identical for a large set of objects. This
redundant information is shared among the LWSOs from the shared parent object. For LWSOs like
dm_message_archive, the client application must configure searchable attributes. Use CREATE
TYPE and ALTER TYPE FULLTEXT SUPPORT switches to specify searchable attributes. For
more information on this configuration, see EMC Documentum Content Server DQL Reference. For
information on supporting extended search with LWSOs, see EMC Documentum Search Development
Guide.
Aspects
Properties associated with aspects are not indexed by default. If you wish to index them, use an
ALTER ASPECT statement to identify the aspects you want indexed. For more information on this
statement, see Documentum Content Server DQL Reference Manual.

Folder descend queries


Folder descend query performance depends on the folder hierarchy (number of folders) and data
distribution across folders. The following conditions can degrade query performance:
Many folders, and a large portion of them are empty. Folder descend queries are consistently slow,
or slow the first time but faster the next time because they are cached. Decrease folder_cache_limit
in the dm_ftengine_config object. You can also add more memory for the xDB cache and the
xPlore host.
The query is unselective but the folder constraint is selective (too many descending folders).
Increase folder_cache_limit in the dm_ftengine_config object.
Many folders and low memory capacity. This environment causes high I/O and slow response time.
Decrease folder_cache_limit in the dm_ftengine_config object.
You can detect the folder descend problem in the timeout stack trace. The timeout occurs during
execution of the FilterFoldersFunction:
<event timestamp="2012-01-18 16:48:38,451" level="WARN"
thread="pool-12-thread-9" logger="com.emc.documentum.core.fulltext.search"
timeInMilliSecs="1326905318451">
<message ><![CDATA[Xhive exception message: INTERRUPTED]]></message>
<throwable><![CDATA[com.xhive.error.XhiveInterruptedException: INTERRUPTED
... at com.emc.documentum.core.fulltext.indexserver.services.folders.
FilterFoldersFunction.executeXQuery(FilterFoldersFunction.java:86)

Set the folder_cache_limit in the dm_ftengine_config object to the expected maximum number of
folders in the query (default = 2000). If the folder descend condition evaluates to less than the
folder_cache_limit value, then folder IDs are pushed into the index probe, making the query much
faster. If the condition exceeds the folder_cache_limit value, the folder constraint is evaluated
separately for each result.
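For example, to set folder_cache_limit with the same iAPI pattern used elsewhere in this chapter
(this assumes the parameter does not exist yet; if it does, use set with the parameter's position instead
of append, and 5000 is only an example value):
retrieve,c,dm_ftengine_config
append,c,l,param_name
folder_cache_limit
append,c,l,param_value
5000
save,c,l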

DQL, DFC, and DFS queries


DFC-based client applications use the DFC query builder package to translate a query into an XQuery
statement. DFS similarly generates XQuery statements. If your application issues DQL queries, the
Content Server query plugin for xPlore translates the DQL into an XQuery expression. Any part of the
query that is not full-text compliant (NOFTDQL) is evaluated in the Content Server database, and the
results are combined with results from the XQuery.
You can also turn off XQuery generation to force DQL for one of the following use cases:
Use fast_wildcard_compatible to support wildcard search in full-text. (Explicit wildcards are already
supported for metadata searches.)
To evaluate security on the Content Server. By default, search results are filtered for security
in xPlore before they are returned to the Content Server, causing much faster performance.
To get the most recent metadata updates before they are indexed.
Note: When XQuery generation is turned off, search performance is worse. Many search features are
not supported in DQL as described in the following table.

Table 27

Differences between DQL and DFC/DFS queries

DQL: Returned attribute values are read from the database. No latency when objects are updated.
DFC and DFS: Returned attribute values are read from xPlore. Short latency (in the minute range)
between Content Server and xPlore.

DQL: No latency for security evaluation.
DFC and DFS: Short latency (in the minute range) for security updates, but faster search results.

DQL: No VQL equivalent.
DFC and DFS: Extended object search (VQL-type support). See Rewriting VQL queries, page 206.

DQL: No facets.
DFC and DFS: Out-of-the-box and custom facets.

DQL: No hit count.
DFC and DFS: Hit count.

DQL: Hints file supported.
DFC and DFS: Hints file not supported unless XQuery generation is turned off.

DQL: Fragment search.
DFC and DFS: Fragment search supported (by default, the wildcard must be explicit) and configurable.

DQL: No fuzzy search.
DFC and DFS: Fuzzy search supported and configurable.

DQL: Sorting results is performed by the database on the results returned by xPlore. Using sort in
DQL is discouraged, in particular for unselective queries or queries with high security filtering.
DFC and DFS: Sorting results supported in xPlore with the appropriate index configuration.

DQL: Sequential queries.
DFC and DFS: Parallel queries across several collections.

DQL: No paging.
DFC and DFS: Paging of results.

DQL: Stemming.
DFC and DFS: Stemming can be disabled at the DFC level for EQUALS or NOT_EQUALS constraints,
to return only exact matches.

DQL: Thesaurus support only on the full-text constraint.
DFC and DFS: Thesaurus support on both full-text and metadata constraints.

Content Server and DFC client search differences


Support for new search features has been added progressively to versions of Content Server. The
following table illustrates the supported features by Content Server and DFC version. On all but the
most recent version of Content Server or DFC, these features require the latest hotfix.
DFC 6.6 search service and higher generates XQuery unless you set one of the following:
dfc.search.xquery.generation.enable is false
ftsearch_security_mode in dm_ftengine_config is 0
acl_check_db in dm_ftengine_config is True, T, or 1
Table 28

Supported search features

DFC search service generates DQL query. DFC on client: 6.5 SP2, SP3 (6.6 and higher if XQuery
generation is turned off).

DFC search service generates XQuery. DFC on client: 6.6, 6.7.x unless XQuery generation is
turned off.

Facets. Content Server: no dependency. DFC on client: 6.6 or higher.

Thesaurus. Content Server: 6.5 SP2, SP3, 6.6, 6.7.x. DFC on client: 6.5 SP2, SP3, 6.6, 6.7.x (DQL);
6.7.x (DQL and XQuery).

Fuzzy search. Content Server: 6.7.x. DFC on client: 6.7.x (XQuery and DQL).

Query subscription. Content Server: 6.7 SP1. DFC on client: 6.7 SP1.

Extended object search. DFC on client: 6.6 or higher.

Wildcards in metadata. DFC on client: 6.6 or higher.

API for exact match. DFC on client: 6.7.x (XQuery only).

Debugging enhancement. DFC on client: 6.7.x.

DQL Processing
The DFC and DFS search services by default generate XQuery expressions, not DQL, for xPlore.
DQL hints in a hints file are not applied. You can turn off XQuery generation in dfc.properties so
that DQL is generated and hints are applied. Do not turn off XQuery generation if you want xPlore
capabilities like facets.
If query constraints conform to FTDQL, the query is evaluated in the full-text index. If all or part of
the query does not conform to FTDQL, only the SDC portion is evaluated in the full-text index. All
metadata constraints are evaluated in the Content Server database, and the results are combined.
The following configurations turn off XQuery and render a query in DQL:
dfc.search.xquery.generation.enable = false in dfc.properties
ftsearch_security_mode is 0. See Changing search results security, page 51.
acl_check_db is true. See Changing search results security, page 51.

Enabling DQL hints (turn off XQuery)


The Webtop search components use the DFC query builder package to construct a query. The DFC
query builder adds the DQL hint TRY_FTDQL_FIRST. This hint prevents timeouts and resource
exceptions by querying the attributes portion of a query against the repository database. The query
builder also bypasses lemmatization by using a DQL hint for wildcard and phrase searches.
For information on using a hints file, see EMC Documentum Search Development Guide.
To use a DQL hints file or hints in a DQL query, disable XQuery generation by DFC or DFS. The hints
file allows you to specify certain conditions under which a database or standard query is done in place
of a full-text query. Turn off XQuery generation by adding the following setting to dfc.properties on
the DFC client application:
dfc.search.xquery.generation.enable=false

Unsupported DQL
xPlore does not support the DQL SEARCH TOPIC clause or pass-through DQL.

Comparing DQL and XQuery results


DQL and XQuery results can be compared by testing a query with the DFC search service. First, run
an XQuery expression (default). Next, turn off XQuery generation and generate DQL.
Sometimes you see different results if you query for metadata on an object that has not yet been
indexed. A query in DQL returns results directly from the repository. A query in XQuery does not
return results until xPlore has updated the index.

DQL hints migration


The following table lists the DQL hint support in xPlore. For a full description of the referenced DQL
hints, see "Including DQL hints" in Documentum Content Server DQL Reference Manual.
Table 29

DQL hint migration for xPlore

RETURN TOP N: Not needed by xPlore. The DFC search service asks for the exact number of results.

FT_CONTAIN_FRAGMENT: Default is FT_CONTAIN_WORD. This setting is deprecated. Modify
dm_ftengine_config to support fragments. See Configuring wildcards and fragment search, page 218.

ENABLE(dm_fulltext(qtf_lemmatize=0|1)): Turn on DQL generation in dfc.properties or turn off
lemmatization in xPlore indexserverconfig.xml.

FOR READ/BROWSE/DELETE/MODIFY option: Turn on DQL generation in dfc.properties. (Not a
hint, but used to configure queries in DQL hint manager.)

FT_COLLECTION: Use the DFC query builder addPartitionScope() or turn on DQL generation.

TRY_FTDQL_FIRST, NOFTDQL: Turn on DQL generation in dfc.properties.

FTDQL: Use default XQuery generation unless other hints are added.

ROW_BASED or database-specific hints: No equivalent.

All other hints: Turn on DQL generation in dfc.properties.

Tracing Documentum queries


You can trace queries using the MODIFY_TRACE apply method. On Windows, this command controls
tracing for all sessions. On Linux, tracing is session-specific. There are four possible levels of tracing
queries in Documentum environments. You can trace subsystems with one of the following values:
all
Traces everything (sum of cs, ftplugin, and ftengine).
cs
Traces Content Server search operations such as initializing full-text in-memory objects and the
options used in a query.
ftplugin
Traces the query plugin front-end operations such as DQL translation to XQuery, calls to the back
end, and fetching of each result.
ftengine
Traces back-end operations: HTTP transactions between the query plugin and xPlore, the request
stream sent to xPlore, the result stream returned from xPlore, and the query execution plan.
none

Turning tracing on or off


Use the iAPI command to turn off tracing:
apply,c,NULL,MODIFY_TRACE,SUBSYSTEM,S,fulltext,VALUE,S,none

Use the iAPI command to turn on tracing:


apply,c,NULL,MODIFY_TRACE,SUBSYSTEM,S,fulltext,VALUE,S,all

Tracing in the log


Trace messages are written to $DOCUMENTUM/dba/log/fulltext/fttrace_<repository_name>.log. The
log entry contains the following information:
Request query ID, so that you can find the translated query in the xPlore fulltext log
($DOCUMENTUM/dba/log/fulltext/fttrace_repository_name.log).
The XQuery that was translated from DQL
The request and response streams, to diagnose communication errors or memory stream corruption
dm_ftengine_config options
The query execution plan is recorded in the xPlore log dsearch.log in the logs subdirectory of the
JBoss deployment directory. You must set trace to ftengine or all.

Supporting subscriptions to queries


About query subscriptions
Installing the query subscription DAR
QuerySubscriptionAdminTool

About query subscriptions


Overview, page 229
Query subscription creation, page 229
Query subscription execution, page 230

Overview
Query subscriptions are a feature with which a user can:
Specify to automatically run a particular saved search (full-text or metadata-only) at specified
intervals (once an hour, day, week, or month) and return any new results.
The results can be discarded or saved. If the results are saved, they can be merged with or replace
the previous results.
Unsubscribe from a query.
Retrieve a list of their query subscriptions.
Be notified of the results via a dmi_queue_item in the subscribing user's Inbox and, optionally, an
email.
Execute a workflow, for example, a business process defined in xCP.
Query subscriptions run in Content Server 6.7 SP1 or higher with DFC 6.7 SP1 or higher. Support for
query subscriptions is installed with the Content Server. A DFC client like Webtop or CenterStage
must be customized using DFC 6.7 SP1 or higher to present query subscriptions to the user.
Because automatically running queries at specified intervals can negatively affect xPlore performance,
tune and monitor query subscription performance.

Query subscription creation


When a user subscribes to a query, the following objects are created:
A dm_relation object relates the dm_smart_list object to a single dm_ftquery_subscription object.
A dm_ftquery_subscription object specifies the attributes of the subscription and the most recent
query results.
A user can subscribe to a dm_smart_list object only once.
Note: A dm_smart_list object contains the query. Queries are saved as dm_smart_list objects through
the DFC Search Service API or any user interface that exposes that API, like Webtop or TaskSpace
Advanced Search.
Query Subscription Object Model, page 230 illustrates how the different objects that comprise query
subscriptions are related. A query can be subscribed to multiple times. A single dm_smart_list
object can have multiple dm_relation objects, and each single dm_relation object, in turn, is related
to a single dm_ftquery_subscription object. For example, both Subscription1 and SubscriptionN
are related to the same dm_smart_list, SmartListX, but through different dm_relation objects,
SubscriptionRelation1 and SubscriptionRelationN, respectively. Furthermore, Subscription1 and
SubscriptionN have different characteristics. For example, Subscription1 is executed once a week
whereas SubscriptionN executes once an hour. There is only one subscriber per subscription; that is,
the subscriber of Subscription1 is user1 and the subscriber for SubscriptionN is user2.


Figure 16

Query Subscription Object Model

Query subscription execution


When one of four pre-installed jobs run, the following sequence of actions occurs:
1. Matching query subscriptions are executed sequentially.
A matching dm_ftquery_subscription object is one that has a frequency attribute value that
matches the job method's -frequency value. For example, a job method frequency value of
hourly executes all matching dm_ftquery_subscription objects once an hour.
Note: All queries run under the subscriber user account.
2. One of the following conditions occurs:
a. If new results are found, then the new results are returned and one of the following occurs:
(Default) A dmi_queue_item is created and, optionally, an email is sent to the subscribing
user.
A custom workflow is executed. If the workflow fails, then a dmi_queue_item describing
the failure is created.
Note: You must create this workflow.

b. If no new results are found, then the next matching query subscription is executed.

3. Depending on the result_strategy attribute value of the dm_ftquery_subscription object, the new
results:
Replace the current results in the dm_ftquery_subscription object.
Merge with the current results in the dm_ftquery_subscription object.
Are discarded.
Note: The number of results returned per query as well as the total number of results saved are
set in the dm_ftquery_subscription object max_results attribute.
4. The next matching query subscription is executed.
5. After all matching query subscriptions have been executed, the job stops and a job report is saved.
Note: If the stop_before_timeout value (the default is 60 seconds) is reached, then the job is
stopped and any remaining query subscriptions are executed when the job runs next time.

Installing the query subscription DAR


If you have installed Content Server 6.7 SP1 or higher, you do not need to install the query subscription
DAR.
The installation files are located in xplore_home/setup/qbs.
The query subscription DAR file contains:
An SBO that provides functionality to subscribe, unsubscribe, and list the current subscriptions
for a user.
A TBO that provides functionality to run the saved search query and save results.
dm_ftquery_subscription object type
dm_qbs_relation object (dm_relation_type)
Jobs and an associated Java class that runs the subscribed queries.
A query subscription test program with which the administrator can validate that query subscriptions
were set up properly.
1. Copy qbs.dar, DarInstall.bat or DarInstall.sh, and DarInstall.xml from xplore_home/setup/qbs to a
temporary install directory.
2. Edit DarInstall.xml:
a. Specify the full path to qbs.dar including the file name, as the value of the dar attribute.
b. Specify your repository name as the value of the docbase attribute.
c. Specify the repository superuser name as the value of the username attribute.
d. Specify the repository superuser password as the value of the password attribute.
For example:
<emc.install dar="C:\Downloads\qbs.dar" docbase="xPlore1"
username="Administrator"
password="password" />

3. Edit DarInstall.bat (Windows) or DarInstall.sh (Linux).


a. Specify the path to the composerheadless package as the value of ECLIPSE. For example:
set ECLIPSE="C:\Documentum\product\6.6\install\composer\ComposerHeadless"

b. Specify the path to the file DarInstall.xml in a temporary working directory (excluding the file
name) as the value of BUILDFILE. For example:
set BUILDFILE="C:\DarInstall\temp"

c. Specify a workspace directory for the generated Composer files. For example:
set WORKSPACE="C:\DarInstall\work"

4. Launch DarInstall.bat (Windows) or DarInstall.sh (Linux) to install the query subscription SBO.
On Windows 2008, run the script as administrator.

Testing query subscriptions


Before starting testing query subscriptions, make sure that the following prerequisites are satisfied:
With Content Server 6.7 or lower, you installed the query subscription DAR file. See Installing the
query subscription DAR, page 231.
You installed JDK 6.
With Content Server 7.0 or higher, you activated query subscription jobs.
You included the following JAR files in the Java classpath:
aspectjrt.jar
qbs.jar
server-test.jar
dfc.jar
commons-lang-2.4.jar
log4j.jar

You execute the com.documentum.test.qbs.Tqbs.java program to test:


Subscribing to a query for a specific user ( -SubscribeAndVerify flag)
Unsubscribing from a query for a specific user (-UnsubscribeAndVerify flag)
Running a job and verifying that it completes successfully (-RunAndVerifyJob flag)
Deleting a smartlist and verifying that the associated dm_relation and dm_ftquery_subscription
objects are deleted (-VerifyDeleteCascading flag)
Note: You can also use qbsadmin.bat or qbsadmin.sh. See the qbsadmin.bat and qbsadmin.sh usage
instructions, page 241.
1. If you have not done so already, create a dm_smart_list object, which is a saved query.
You can use a Documentum client (such as Webtop) to save a search, which creates a
dm_smart_list object.
2. Execute the Tqbs.java class.
For example, executing the following command with the -h flag provides the syntax:
"%JAVA_HOME%\bin\java" -classpath "C:\documentum\config;.\lib\aspectjrt.jar;
.\lib\qbs.jar;.\lib\server-test.jar;.\lib\dfc.jar;.\lib\commons-lang-2.4.jar;
.\lib\log4j.jar" com.documentum.test.qbs.Tqbs -h

Subscription reports
When you support query subscriptions, monitor the usage and query characteristics of the users with
subscription reports. If there are many frequent or poorly performing subscriptions, increase capacity.

Finding frequent or slow subscription queries


You can run the following reports to troubleshoot query subscription activity. Use the report to find
both frequent and poorly performing queries.
QBS activity report by user: Find the users whose subscription queries consume the most resources
or perform poorly. Filter by date range. Set the number of users to display.
QBS activity report by ID: Find the subscription queries that consume the most resources or
perform poorly. Filter by date range and user name. Order by Total processing time (descending),
or Frequency (ascending). Set the number of IDs (N) to display. If you order by total processing
time, N subscriptions with the longest query times are displayed. If you order by job frequency,
N subscriptions with the shortest job frequency are displayed.
The Top N slowest query: Find the slowest subscription queries. Filter by Subscription query type.

Finding subscriptions that use too many resources


Subscribed queries can consume too many resources. Use the QBS activity report by ID to find poorly
performing queries. Use the QBS activity report by user to find users who consume the most resources.
A user can consume too many resources with the following subscriptions:
Too many subscriptions
Queries that are unspecific (too many results)
Queries that perform slowly (too many criteria)

Users complain that subscriptions do not return results


Troubleshooting steps:
Use Documentum Administrator to make sure that the job was run. Select the job report.
Check that the user query was run using QBS activity report by user. If the query returns no results,
set a lower frequency, depending on business needs.
Check the query itself to see if it is properly formulated to return the desired results. The query may
be searching on the wrong terms or attributes. In this case, reformulate the saved query.

Subscription logging
Subscribed queries are logged in dsearch.log with the event name QUERY_AUTO. The following
information is logged:
<event name="QUERY_AUTO" component="search" timestamp="2011-08-23T14:45:09-0700">
..
<application_context>
<query_type>QUERY_AUTO</query_type>
<app_name>QBS</app_name>
<app_data>
<attr name="subscriptionID" value="0800020080009561"/>
<attr name="frequency" value="DAILY"/>
<attr name="range" value="1015"/>
<attr name="jobintervalinseconds" value="86400"/>
</app_data>
</application_context>
</event>

Key:
subscriptionID is set by the QBS application
frequency is the subscription frequency as set by the client. Values: HOURLY, DAILY, WEEKLY,
MONTHLY.
range reports time elapsed since last query execution. For example, if the job runs hourly but the
frequency was set to 20 minutes, the range is between 0 and 40 minutes (2400 seconds). Not
recorded if the frequency is greater than one day.
jobintervalinseconds is how often the subscription is set to run, in seconds. For example, a value of
86400 indicates a setting of one day in the client. Not recorded if the frequency is greater than
one day.

dm_ftquery_subscription
Represents a subscribed query.

Description
Supertype: SysObject
Subtypes: None
Internal name: dm_ftquery_subscription
Object type tag: 08
A dm_ftquery_subscription object represents subscription-specific information but not the saved query
itself, which is contained in a dm_smart_list object.

Properties
The table describes the object properties.
Table 30

dm_ftquery_subscription type properties

frequency (CHAR(32)): How often the subscription is to run. Valid values match the -frequency query
subscription job parameter.

last_exec_date (TIME): Last date and time that the subscription was executed.

subscriber_name (CHAR(32)): Name of the user who uses this subscription.

zone_value (INTEGER): Zone to which this subscription belongs. With this value, all subscriptions
with the same frequency can be picked up by different jobs: you can customize jobs so that they run
on the same interval but with different values in the zone_value job argument. Specify this value to
run jobs when there are too many subscriptions for a single job.

result_strategy (INTEGER): Integer that indicates whether existing results that are saved in the
dm_smart_list are to be replaced with the new results (0, the default), merged with the new results (1),
or the new results are to be discarded (2). The saved query results are sorted as follows: if
result_strategy is set to 0, the new results are sorted from the highest to lowest score; if result_strategy
is set to 1, the new results are listed first and sorted from the highest to lowest score, and the existing
results are listed next, also sorted from the highest to lowest score.

workflow_id (ID): Process ID of the workflow to be executed by the job. If this value is null, then the
notification is executed through queue item creation when any result is returned.

dm_qbs_relation object
A dm_relation_type object that relates the subscription (dm_ftquery_subscription object) to the original
dm_smart_list object.

Table 31

dm_qbs_relation properties

object_name: dm_qbs_relation. All query subscription-created dm_relation objects have this name.

parent_type: dm_smart_list.

child_type: dm_ftquery_subscription.

security_type: CHILD.

direction_kind, integrity_kind: If a dm_smart_list object is deleted, then the corresponding
dm_relation and dm_ftquery_subscription objects are deleted.

Query subscription jobs


A job is executed for each query subscription at a specified interval. Results are returned via an Inbox
queue item and an email, or via a workflow. Query subscription jobs are inactive by default; if you
have not already done so, enable them in Documentum Administrator.

Overview
Each of these jobs executes all query subscriptions that are specified to run at the corresponding
interval:

dm_FTQBS_HOURLY: Executes all query subscriptions that are to be executed once an hour.
dm_FTQBS_DAILY: Executes all query subscriptions that are to be executed once a day.
dm_FTQBS_WEEKLY: Executes all query subscriptions that are to be executed once a week.
dm_FTQBS_MONTHLY: Executes all query subscriptions that are to be executed once a month.

Each job executes its query subscriptions in ascending order based on each subscription's
last_exec_date property value. If a query subscription is not executed, it is executed when the job
runs next.
Note: A job is stopped gracefully just before it is timed out.

Method arguments
-frequency: (Required) Selects the corresponding subscriptions. Valid values: hourly, daily, weekly,
monthly.

-stop_before_timeout: (Optional) Number of seconds before the timeout at which the job should stop.
Default value: 60.

-zone_value: (Optional) An integer that matches subscriptions with the same zone_value. If this
argument is specified, then a dm_ftquery_subscription's zone_value and frequency attributes must
match the corresponding method arguments in order for a subscription to be executed by the job.

-search_timeout: (Optional) Number of milliseconds that the job runs before it times out. Default:
60000.

-max_result: (Optional) Maximum number of query results that can be returned, as well as the
maximum number that can be saved in the subscription object. Default: 50.

Reports
Job reports are stored in:
$DOCUMENTUM\dba\log\sessionID\sysadmin
dm_FTQBS_HOURLY: FTQBS_HOURLYDoc.txt
dm_FTQBS_DAILY: FTQBS_DAILYDoc.txt
dm_FTQBS_WEEKLY: FTQBS_WEEKLYDoc.txt
dm_FTQBS_MONTHLY: FTQBS_MONTHLYDoc.txt

Custom jobs
The job method -zone_value parameter is meant for partitioning the execution of query subscriptions
among multiple custom jobs that run on the same interval. A custom job executes every
dm_ftquery_subscription that has the same zone_value and frequency attribute values as the custom
job. You must specify a -zone_value value for every custom job that runs on the same interval, and that
value must be unique among all those custom jobs. If a job does not specify a -zone_value value, then
it executes all subscriptions on the same interval regardless of each subscription's zone_value value.
Note: None of your custom jobs should have the same interval as any of the pre-installed jobs, because
the pre-installed jobs do not have a -zone_value specified and will execute all subscriptions on the
same interval regardless of their zone_value value.


Query subscription workflows


A workflow can be called by the query subscription job. Make sure that your workflow executes
correctly before using it with query subscription jobs. For example, execution of the workflow will fail
if the subscriber does not have at least RELATE permissions on the workflow.
Note: You can create a workflow using Documentum Process Builder.

Requirements
Activities:
One starting activity is required.
Only one starting activity can be specified.
The starting activity's name must be: QBS-Activity-1
Packages:
One package is required.
Only one package can be specified.
The package name must be: QBS-Package0
The package type must be dm_ftquery_subscription.
The subscription ID must be passed as the package.

IQuerySubscriptionSBO
Provides the functionality to subscribe to, unsubscribe from, and list query subscriptions.

Interface name
com.documentum.server.impl.fulltext.qbs.IQuerySubscriptionSBO

Imports
import com.documentum.server.impl.fulltext.qbs.IQuerySubscriptionSBO;
import com.documentum.server.impl.fulltext.qbs.QuerySubscriptionInfo;
import com.documentum.server.impl.fulltext.qbs.impl.QuerySubscriptionException;

DAR
QBS.dar

Methods
public IDfId subscribe (String docbaseName,IDfId smartListID,
String subscriber, String frequency, IDfId workFlowID, int
zoneValue, IDfTime lastExecDate, int resultStrategy) throws
DfException,QuerySubscriptionException

Validates the dm_smart_list object ID and subscriber name in the specified repository; validates
the frequency value against the -frequency method argument of all query

subscription jobs. Creates dm_ftquery_subscription and dm_relation objects. The object ID of the
dm_ftquery_subscription object is returned.
The workflow template ID can be set to null, if not applicable.
For zone_value, specify -1, if not applicable.
For lastExecDate, specify DfTime.DF_NULLDATE, if not applicable.
For resultStrategy: Integer that indicates whether existing results that are saved in the dm_smart_list
are replaced with the new results (0, the default), merged with the new results (1), or the new results
are discarded (2). Specify -1, if not applicable.
public IDfId subscribe (String docbaseName,IDfId smartListID, String
subscriber, String frequency, IDfId workFlowID, int zoneValue,
IDfTime lastExecDate, int resultStrategy, String subTypeName, Map
customAttrAndValue) throws DfException,QuerySubscriptionException

You can create a subtype of dm_ftquery_subscription that has custom attributes. It enables you to
display additional information related to the subscriptions in your application.
Creates a subscription with a subtype of dm_ftquery_subscription and its relation object based
on the passed-in parameters.
The method parameters are similar to the ones of the previous method with two additional
parameters: subTypeName and customAttrAndValue.
For subTypeName, specify the type name which is a subtype of dm_ftquery_subscription.
For customAttrAndValue, specify a map with attribute name and attribute value as a key-value pair.
For single-value attributes, supply the value in its original datatype; for repeating attributes, supply
a List of values.
public boolean unsubscribe (String docbaseName, IDfId smartListID, String
subscriber) throws DfException,QuerySubscriptionException

The unsubscribe service destroys the dm_relation and dm_ftquery_subscription objects that are
associated with the specified dm_smart_list and subscriber.
public List getSubscribedSmartList(String docbaseName, String
subscriber)throws DfException

Returns a list of all of the specified user's subscriptions.


public QuerySubscriptionInfo getSubscriptionInfo(String docbaseName,
IDfId smartlistId, String subscriber) throws DfException

Returns information for a subscription based on the dm_smart_list object ID and subscriber name.
The information includes: the dm_smart_list object ID and name, the subscription ID, the frequency,
the workflow ID, the last execution date, and the zone value.
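The following is a minimal sketch of obtaining and calling the SBO through the standard DFC service
pattern. It assumes a working dfc.properties and global registry; the repository name, credentials, and
smart list object ID are placeholders, and the workflow, zone value, and last execution date use the
not-applicable values described above.

import com.documentum.com.DfClientX;
import com.documentum.fc.client.IDfClient;
import com.documentum.fc.client.IDfSessionManager;
import com.documentum.fc.common.DfId;
import com.documentum.fc.common.DfLoginInfo;
import com.documentum.fc.common.DfTime;
import com.documentum.fc.common.IDfId;
import com.documentum.fc.common.IDfLoginInfo;
import com.documentum.server.impl.fulltext.qbs.IQuerySubscriptionSBO;

public class SubscribeSketch {
    public static void main(String[] args) throws Exception {
        // Standard DFC bootstrap: local client plus a session manager identity.
        IDfClient client = new DfClientX().getLocalClient();
        IDfSessionManager sessionManager = client.newSessionManager();
        IDfLoginInfo login = new DfLoginInfo();
        login.setUser("user1");            // placeholder user
        login.setPassword("password1");    // placeholder password
        sessionManager.setIdentity("myrepository", login); // placeholder repository

        // Obtain the SBO by its fully qualified interface name.
        IQuerySubscriptionSBO sbo = (IQuerySubscriptionSBO) client.newService(
                IQuerySubscriptionSBO.class.getName(), sessionManager);

        // Subscribe user1 to a saved query (dm_smart_list) with daily execution.
        // null workflow, -1 zone value, and DF_NULLDATE mean "not applicable";
        // result strategy 0 replaces previously saved results.
        IDfId smartListId = new DfId("080000f28002ef2c"); // placeholder object ID
        IDfId subscriptionId = sbo.subscribe("myrepository", smartListId, "user1",
                "daily", null, -1, DfTime.DF_NULLDATE, 0);
        System.out.println("Created subscription " + subscriptionId.getId());
    }
}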

For more information


For more information about invoking SBOs, see the EMC Documentum Foundation Classes
Development Guide.
For examples of calling this SBO, see the source code for the following classes:
com.documentum.test.qbs.Tqbs
com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool
To help you extend the functionality by creating subtypes of dm_ftquery_subscription, you can use
the reference project qbs.zip located at xplore_home/setup/qbs. The EMC Documentum Composer
6.7 User Guide describes how to designate a reference project. Use Composer 6.7 SP1 to load the
query-based subscriptions reference project.

IQuerySubscriptionTBO
Manages basic query subscription execution.

Interface name
com.documentum.server.impl.fulltext.qbs.IQuerySubscriptionTBO

Imports
import com.documentum.server.impl.fulltext.qbs.IQuerySubscriptionTBO;
import com.documentum.server.impl.fulltext.qbs.results.DfResultsSetSAXDeserializer;

DAR
QBS.DAR

Methods
public void setSmartListId(IDfId smartListId)
Sets the dm_smart_list object ID associated with the dm_ftquery_subscription object. This method
must be called before calling runRangeQuery().
public IDfResultsSet runRangeQuery(String docbaseName, IDfTime from)
throws DfException, IOException, InterruptedException

Executes a query saved in a dm_smart_list object from the specified date/time in the from parameter.
If from is not a nulldate, a range is added to the search query with a condition like "r_modify_date >
= from". If from is a nulldate, then no range condition is added to the search query.
public void setResults(IDfResultsSet results)
Saves the results to dm_ftquery_subscription.
public IDfResultsSet getResults() throws DfException
Gets the results that are saved in dm_ftquery_subscription.
public void setSearchTimeOut(long timeout)
Sets the number of milliseconds that the search runs before it times out.
public long getSearchTimeOut()
Gets the number of milliseconds that the search runs before it times out.
public void setMaxResult(int max)
Sets the maximum number of query results that can be returned, as well as the maximum number that
can be saved in the subscription object.
public int getMaxResult()
Gets the maximum number of query results that can be returned, as well as the maximum number
that can be saved in the subscription object.
public void setResultStrategy(int resultStrategy)
Sets an integer that indicates whether existing results that are saved in the dm_smart_list are replaced
with the new results (0, the default), merged with the new results (1), or the new results are discarded (2).
Note: doSave() updates the last_exec_date of the subscription based on this value.

public int getResultStrategy()


Gets the result strategy.
public void setQBSAttrs(Map<String,String> qbsInfo)
Sets query subscription information, such as the subscription ID.
public Map<String,String> getQBSAttrs()
Returns a hash map that contains key-value pairs for query subscription information.
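The following sketch shows one way to drive the TBO from a DFC client. It assumes that an
authenticated IDfSession is available and that fetching the dm_ftquery_subscription object through the
session returns the TBO implementation; the object IDs are placeholders.

import com.documentum.fc.client.IDfSession;
import com.documentum.fc.common.DfId;
import com.documentum.fc.common.DfTime;
import com.documentum.server.impl.fulltext.qbs.IQuerySubscriptionTBO;

public class RunSubscriptionSketch {
    // session: an authenticated IDfSession; the IDs below are placeholders.
    public static void run(IDfSession session) throws Exception {
        IQuerySubscriptionTBO tbo = (IQuerySubscriptionTBO)
                session.getObject(new DfId("080000f28002f115")); // dm_ftquery_subscription
        tbo.setSmartListId(new DfId("080000f28002ef2c")); // must be set before runRangeQuery()
        tbo.setSearchTimeOut(60000);   // 60-second search timeout
        tbo.setMaxResult(50);          // cap returned and saved results
        tbo.setResultStrategy(0);      // 0 = replace previously saved results
        // Execute the saved query with no date restriction and save the results
        // on the subscription object.
        tbo.setResults(tbo.runRangeQuery(session.getDocbaseName(), DfTime.DF_NULLDATE));
    }
}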

Notes
Extending this TBO is not supported.

For more information


See the Javadoc for more information about this TBO.
For more information about invoking TBOs, see the EMC Documentum Foundation Classes
Development Guide.

QuerySubscriptionAdminTool
Class name
com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool

Usage
You use
com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool to:

Subscribe to a query for a specific user (-subscribe flag)


Unsubscribe from a query for a specific user (-unsubscribe flag)
List all subscribed queries for a specific user (-listsubscription flag)
Note: All parameter values are passed as string values and must be enclosed in double quotes if
spaces are specified in the value.
To display the syntax, specify the -h flag. For example:
C:\Temp\qbsadmin>"%JAVA_HOME%\bin\java" -classpath "C:\Documentum\config;
.\lib\qbs.jar;.\lib\qbsAdmin.jar;.\lib\dfc.jar;.\lib\log4j.jar;
.\lib\commons-lang-2.4.jar;.\lib\aspectjrt.jar"
com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool -h

Note: In xplore_home/setup/qbs/tool, qbsadmin.bat and qbsadmin.sh demonstrate how to call this


class. In qbsadmin.bat and qbsadmin.sh, modify the path to the dfc.properties file. You can also change
the -h flag to one of the other flags.

Required JARs
qbs.jar
qbsAdmin.jar
dfc.jar


log4j.jar
commons-lang-2.4.jar
aspectjrt.jar

-subscribe example
C:\Temp\qbsadmin>"%JAVA_HOME%\bin\java"
-classpath "C:\Documentum\config;.\lib\qbs.jar;.\lib\qbsAdmin.jar;
.\lib\dfc.jar;.\lib\log4j.jar;.\lib\commons-lang-2.4.jar;
.\lib\aspectjrt.jar"
com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool
-subscribe D65SP2M6DSS user1 password password1 080000f28002ef2c daily

-subscribe output
subscribed 080000f28002ef2c for user user1 succeeded
with subscription id 080000f28002f115

-unsubscribe example
C:\Temp\qbsadmin>"%JAVA_HOME%\bin\java" -classpath
"C:\Documentum\config;.\lib\qbs.jar;.\lib\qbsAdmin.jar;
.\lib\dfc.jar;.\lib\log4j.jar;.\lib\commons-lang-2.4.jar;
.\lib\aspectjrt.jar"
com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool
-unsubscribe D65SP2M6DSS user1 password password1 080000f28002ef2c

-unsubscribe output
User user1 has no subscriptions on dm_smart_list object
(080000f28002ef2c)

-listsubscription example
C:\Temp\qbsadmin>"%JAVA_HOME%\bin\java" -classpath
"C:\Documentum\config;.\lib\qbs.jar;.\lib\qbsAdmin.jar;
.\lib\dfc.jar;.\lib\log4j.jar;.\lib\commons-lang-2.4.jar;
.\lib\aspectjrt.jar"
com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool
-listsubscription D65SP2M6DSS user1 password password1

-listsubscription output
Subscriptions for user1 are:
smartList: 080000f28002ef2c frequency: DAILY workFlowID: 0000000000000000
smartList: 080000f28002ef2f frequency: 5 MINUTES workFlowID: 0000000000000000

Troubleshooting search
When you set the search service log level to WARN, queries are logged. Auditing queries, page 244
describes how to view or customize reports on queries.


Testing a query in xPlore administrator


You can search on a full-text string in content or metadata. When query auditing is enabled (default is
true) and search log level is INFO or above, test queries are logged: Filter the search audit report for
QUERY_TEST.
1. In xPlore administrator, expand the Diagnostic and Utilities tree and choose Test search.
2. To search for a keyword, choose Keyword and enter the search string. If you enter multiple terms
in the Keyword field, the XQuery expression is generated using the AND condition.
3. To search using an XQuery expression, choose XQuery and enter the expression. Make sure that
you select the correct domain for the document. If you are copying a query from the log, remove
the phrases that begin with declare option xhive from the beginning of the query.
Security is not evaluated for results from a test search. As a result, the number of items returned does
not reflect hits that are removed after security is applied in the index server. A status of success or fail
indicates only whether the query executed; success does not indicate the presence of hits.

Testing a query in Documentum iAPI or DQL


Try a query like the following:
api>?,c,SELECT text,object_name FROM dm_document SEARCH DOCUMENT CONTAINS 'test'
WHERE (a_is_hidden = FALSE)

Saving generated dftxml


The Documentum index agent generates dftxml for a document, then deletes it after the document
has been indexed. To keep the generated dftxml for troubleshooting, add the following element to the
<exporter> section in indexagent.xml:
<keep_dftxml>true</keep_dftxml>

Clear the JBoss tmp and work directories for the index agent application, and restart the Index Agent.
With this change, the index agent saves the dftxml in the data directory.

Debugging from Webtop


If the query fails to return expected results in Webtop, perform a Ctrl-click on the Edit button in the
results page. The query is displayed in the events history as a select statement like the following:
IDfQueryEvent(INTERNAL, DEFAULT): [dm_notes] returned [Start processing] at
[2010-06-30 02:31:00:176 -0700]
IDfQueryEvent(INTERNAL, NATIVEQUERY): [dm_notes] returned
[SELECT text,object_name,score,summary,r_modify_date,...
SEARCH DOCUMENT CONTAINS ctrl-click WHERE (...]

If there is a processing error, the stack trace is shown.


Debugging from DFC


To log XQuery and XML results, set log4j.logger.com.documentum.fc.client.search=DEBUG, stdout
in dfc.properties for the DFC application. The file dfc.properties is located in the WEB-INF/classes
directory of a web application like Webtop or CenterStage.
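For example, the added line in dfc.properties is:
log4j.logger.com.documentum.fc.client.search=DEBUG, stdout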

Determining the area of failure


1. Start at the lowest level, xDB. Use the xDB admin tool to execute the XQuery. (Get the XQuery
from the log and then click the query icon in the admin tool.)
2. If the query runs successfully in xDB, use xPlore administrator to run the XQuery (Execute
XQuery in the domain or collection view).
3. If xPlore administrator runs the query successfully, check the query plugin trace log. See Tracing
Documentum queries, page 227.
4. If there are two counter.xml files in domain_name/Data/ApplicationInfo/group collection, delete
the file that contains the lower integer value.

Auditing queries
Auditing is enabled by default. Audit records are purged on a configurable schedule (default: 30 days).
To enable or disable query auditing, open System Overview in the xPlore administrator left pane.
Click Global Configuration and choose the Auditing tab. Click search to enable query auditing.
For information on configuring the audit record, see Configuring the audit record, page 39.
Audit records are saved in an xDB collection named AuditDB. You can view or create reports on the
audit record. Query auditing provides the following information:
The XQuery expression in a CDATA element.
The user name and whether the user is a superuser.
The application context, in an application_context element. The application context is supplied by
the search client application.
The query options in name/value pairs set by the client application, in the QUERY_OPTION
element.
The instance that processed the query, in the NODE_NAME element.
The xDB library in which the query was executed, in the LIBRARY_PATH element.
The number of hits, in the FETCH_COUNT element.
The number of items returned, in the TOTAL_HITS element.
The amount of time in msec to execute the query, in the EXEC_TIME element.
The time in msec elapsed to fetch results, in the FETCH_TIME element.
The following security events are recorded for user-generated queries. The audit record reports how
many times these caches were hit for a query. For details on configuring the caches, see Configuring
the security cache, page 54.
How many times the group-in cache was probed for a query, in the GROUP_IN_CACHE_HIT element.

How many times the group-out cache was probed for a query, in the GROUP_OUT_CACHE_HIT
element.
GROUP_IN_CACHE_FILL - How many times the query added a group to the group-in cache.
GROUP_OUT_CACHE_FILL - How many times the query added a group to the group-out cache.
TOTAL_INPUT_HITS_TO_FILTER - How many hits a query had before security filtering.
Number of hits filtered out by security because the user did not have sufficient permission, in the
HITS_FILTERED_OUT element.

The status of the query, in the STATUS element.


To view a query in the audit record, go to Diagnostic and Utilities > Reports and choose Audit
records for search component. You can filter by date and query type. From the results, click a
query of interest to view the XML entry in the report. The XQuery expression is contained within
the QUERY element.
To view warmup queries in the audit record, choose the query type Warmup Tool Query in the Audit
records for search component report.
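For illustration only, an abbreviated audit entry might resemble the following fragment. The element
names come from the list above; the values, nesting, and any additional elements are invented here
and can vary by release:

<QUERY><![CDATA[for $i score $s in /dmftdoc[. ftcontains "report"] ...]]></QUERY>
<NODE_NAME>primary</NODE_NAME>
<LIBRARY_PATH>/myrepo/dsearch/Data</LIBRARY_PATH>
<FETCH_COUNT>50</FETCH_COUNT>
<TOTAL_HITS>350</TOTAL_HITS>
<EXEC_TIME>120</EXEC_TIME>
<FETCH_TIME>35</FETCH_TIME>
<STATUS>SUCCESS</STATUS>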

Search is not available


Full-text service is disabled
Investigate the following possible causes:
No queries are allowed when the search service has not started. You see this error message:
The search has failed: The Full-text service is disabled

The username contains an illegal character for the xPlore host code page.
The wrong query plugin is in use. See Query plugin configuration (dm_ftengine_config), page 222.

Verifying the query plugin version


Queries fail if the wrong (FAST) query plugin is loaded in the Content Server. Check the
Content Server log after you start the Content Server. The file repository_name.log is located
in $DOCUMENTUM/dba/log. Look for a line like the following; it references a plugin with
DSEARCH in the name.
[DM_FULLTEXT_T_QUERY_PLUGIN_VERSION]info: "Loaded FT Query Plugin:
.../DSEARCHQueryPlugin.dll...FT Engine version: X-Hive/DB 10"

The Content Server query plugin properties of the dm_ftengine_config object are set during xPlore
configuration. If you have changed one of the properties, like the primary xPlore host, the plugin can
fail. Verify the plugin properties, especially the qrserverhost, with the following DQL:
1> select param_name, param_value from dm_ftengine_config
2> go


Connection refused or no collection available


If an API returns a connection refused error, check the value of the URL on the instance. Make
sure that it is valid and that search is turned on for the instance. If the search service is not enabled,
dsearch.log records the following exception:
com.emc.documentum.core.fulltext.common.search.FtSearchException:...
There is no node available to process this request type.

From Documentum DFC clients, the following exception is returned:


DfException: ..."EXEC_XQUERY failed with error:
ESS_DMSearch:ExecuteSearchPassthrough. Communication Error
Could not get node information using round-robin routing.

From Documentum DQL, the following error is returned:


dmFTSearchNew failed with error:
ESS_DMSearch:ExecuteSearch. Communication Error
Could not get node information using round-robin routing.

Error after you changed xPlore host


If you have to change the xPlore host, do the following:
Update indexserverconfig.xml with the new value of the URL attribute on the node element. For
information on viewing and updating this file, see Modifying indexserverconfig.xml, page 43.
Change the JBoss startup (script or service) so that it starts correctly.

Troubleshooting slow queries


Check the audited search events using xPlore administrator.

The attribute cannot be searched


All attributes can be found in a full-text search, but this search can be very slow or time out. Attributes
that are commonly searched should be defined in a subpath definition in indexserverconfig.xml.
Sometimes the datatype is incorrectly specified in indexserverconfig.xml. By default, a custom
attribute is treated as a string datatype. Boolean or datetime attributes are not returned. If the custom
attribute is not a string, set the appropriate type in the subpath definition. Valid values: string | integer |
double | date | datetime. Boolean is not supported.
After you change a subpath, you must reindex. If the attribute is for a new custom type, reindexing
is not required.

The query does not use the index


For slow queries, always check whether the query is running as NOFTDQL against the database (not
xPlore). If a multi-path index is not used to service the query, then the query runs slowly. DQL

and DFC Search service queries always use the index unless there is a NOFTDQL hint. Some
IDfXQuery-based queries might not use the index.
To detect this issue with query auditing, find the query using the TopNSlowestQueries report (with
user name and day). Click the query ID to get the query text in XML format. Obtain the query plan to
determine which indexes were probed, if any. (Provide the query plan to EMC technical support for
evaluation.) Rewrite the query to use the index.
Test the query in the xDB admin tool
Test the query in xDB admin and see if Using query plan: Index(dmftdoc)/child::dmftkey is in the
query debug output. If not, the query is NOFTDQL (database).
To detect queries that do not use the index (NOFTDQL queries), turn on full-text tracing in the
Content Server:
API>apply,c,NULL,MODIFY_TRACE,SUBSYSTEM,S,fulltext,VALUE,S,all

Look for temp table creation and inserts like the following:
Thu Feb 09 15:50:12 2012 790000: 6820[7756] 0100019f80023909
process_ftquery_to_temp --- will populate temp table in batch size 20000
Thu Feb 09 15:50:12 2012 790000: 6820[7756] 0100019f80023909
build_fulltext_temp --- begin: create the fulltext temporary table.
Thu Feb 09 15:50:13 2012 227000: 6820[7756] 0100019f80023909
BuildTempTbl --- temporary table dmft80023909004 was created successfully.
Thu Feb 09 15:50:13 2012 430000: 6820[7756] 0100019f80023909
Inserting row at index 0 into the table

Caches are too small


xPlore uses caches that reduce disk I/O. Response times are higher until the caches are loaded with
data. You see a slow query the first time and much faster when it is repeated. See Configuring the
security cache, page 54.
Suggested workaround: Increase the size of the xDB buffer cache for higher query rates. Stop all xPlore
instances. Change the value of the property xhive-cache-pages in indexserver-bootstrap.properties
to at least 512 KB (typically 1 GB). The maximum size depends on the physical memory available.
This file is located in the WEB-INF/classes directory of the application server instance, for example,
xplore_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/classes.
Restart the xPlore instance.

Security caches are not tuned


If a user is very underprivileged, or the user is the member of many groups, queries can slow due to
small group caches. For instructions on configuring the caches, see Configuring the security cache,
page 54.
For underprivileged users, examine the group_out_cache_fill element in the query audit record. If the
value exceeds the not-in-groups-cache-size, then the cache is too small.
For users who are members of many groups, examine the group_in_cache_fill element in the query
audit record. If the value exceeds the groups-in-cache-size, then the cache is too small.

Result sets are large


By default, xPlore gets the top 12,000 most relevant results per collection to support a facet window of
10,000 results. Webtop applications consume only 350 results, so the extra result memory is costly
for large user environments or multiple collections (multiple repositories). In an environment with
millions of documents and multiple collections, you could see longer response times or out of memory
messages.
Custom clients can consume a larger result set. Make sure that query auditing is enabled (default).
Examine the number of results in the TopNSlowestQueries report for a specific user and day. If the
number of results is more than 1000, the custom client might return all the results.
Workarounds:
Limit query result set size: Open xdb.properties, which is located in the directory WEB-INF/classes
of the primary instance. Set the value of queryResultsWindowSize to a number smaller than 12000.
Change the client to consume a smaller number of results by closing the result collection early or by
using the DQL hint ENABLE(RETURN_TOP_N).
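For example, the two workarounds above might look like the following; the window size of 2000, the
search term, and the top-N value are illustrative only:

# xdb.properties on the primary instance
queryResultsWindowSize=2000

select r_object_id from dm_document
search document contains 'benchmark' enable(return_top 100)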

xPlore security is disabled


When xPlore native security is disabled, Content Server security is much slower than xPlore native
security, because some or many results that are passed to the Content Server are discarded. To detect
the problem with query auditing enabled, examine the number of results in the TopNSlowestQueries
report for a specific user and day. If the number of results is more than 1000, xPlore security might
be disabled and the user is underprivileged. (When the user is underprivileged, many results are
discarded.)
Workaround: Enable xPlore native security. See Changing search results security, page 51.

FAST-compatible wildcard and fragment behavior is


enabled
Many Documentum clients do not enable wildcard searches for word fragments like "car" for "careful".
The FAST indexing server supported word fragment searches for leading and trailing wild cards in
metadata. If you enable FAST-compatible wildcard behavior for your Documentum application, you
see slower queries when the query contains a wildcard.
You can fine tune your wildcard support with settings for content or metadata, implicit, or explicit. See
Configuring wildcards and fragment search, page 218.

Slow queries due to frequent document updates


If the update rate is high and the query rate is low, queries can be
impacted. Add the following property to xdb.properties, which is located in
xplore_home/jboss5.1.0/server/instance_name/deploy/dsearch.war/WEB-INF/classes.
xdb.lucene.refreshBlacklistCacheDuringMerge

This property value is false by default. When set to true, blacklist caches are refreshed during non-final
merges.
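For example, add this line to xdb.properties:
xdb.lucene.refreshBlacklistCacheDuringMerge=true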

Slow warmup
Recent queries are run at startup to warm up the system. Some queries by testers can slow the system.
There are two properties to eliminate slow queries from warmup. You can add either property or both
to query.properties, which is located in xplore_home/dsearch/xhive/admin.
query_response_time: Specify a value in msec for maximum query response time (fetch +
execution). Set to 60000 (60 sec) or less.
exclude_users: Specify a comma-delimited list of users whose queries time out.
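For example, a query.properties entry that applies both settings might look like this; the user names
are hypothetical:

query_response_time=60000
exclude_users=loadtester1,loadtester2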

Insufficient CPU, disk I/O, or memory


If a query is slow the first time and much faster when it is reissued, the problem is likely due to
insufficient I/O capacity on the allocated drives. Concatenated drives often show much lower I/O
capacity than striped drives because only a subset of drives can service I/O requests.
Workarounds:
If the system has only one or two cores and a high query rate, add more CPUs.
If the system is large but receives complex or unselective queries, make sure that query auditing is
enabled. Examine the TopNSlowestQueries report for the specific user name and day on which the
problem was experienced. Look for high query rates with slow queries. Add more capacity.

Query of too many collections


A query probes each index for a repository (domain) sequentially. Results are collected across
repositories. If there are multiple repositories, or multiple collections within a domain, the query can
take more time. With query auditing enabled, try the query across repositories and then target it to
a specific repository.
Use one of the following solutions for this problem:
Merge collections using xPlore administrator. See Moving a collection, page 162.
Use parallel queries in DFC-based search applications by setting the following property to true in
dfc.properties:
dfc.search.xquery.option.parallel_execution.enable = true

Use the ENABLE(fds_collection collectionname) hint or the IN COLLECTION clause in DQL. See
Routing a query to a specific collection, page 257

User is very underprivileged


If the user is very underprivileged, the security filter discards tens of thousands of results. To detect
this problem with query auditing, find the query using the TopNSlowestQueries report for the specific
user and day. If the number in the Documents filtered out column is large, it is a security cache issue.

Workaround: Queries can generally be made more selective. If you cannot modify the query, organize
the repository so that the user has access to documents in certain containers such as rooms or cases.
Append the container IDs to the user's query.

Non-string metadata searches are too slow


By default, all metadata is indexed as string type. Add a subpath definition to indexserverconfig.xml
for non-string metadata. To speed up searches for non-string attributes, add a subpath like the
following. Valid values: string | integer | boolean | double | date | datetime.
<sub-path leading-wildcard="false" compress="false" boost-value="1.0"
include-descendants="false" returning-contents="false"
value-comparison="true" full-text-search="true"
enumerate-repeating-elements="false" type="datetime"
path="dmftmetadata//r_creation_date"/>

Note: If the metadata is used to compute facets, set returning-contents to true.

Unexpected search results


The wrong number of results are returned
There is latency between document creation or modification and indexing. First, check whether the
object has been indexed yet. You can use the following DQL. Substitute the actual object ID of the
document that exists on the Content Server but is not found in search results:
select r_object_id from dm_sysobject search document contains object_id

If the object has been indexed, check the following:


Check user permissions. Run the query as superuser or through xPlore administrator.
ACL and group databases can be out of synch. Run the manual update script aclreplication. See
Manually updating security, page 52.
Query tokens might not match indexed tokens (because of contextual differences). Run the
tokenization test on the query terms and on the sentence containing the terms in the document. See
Troubleshooting content processing, page 111
Make sure that the attribute was not excluded from tokenization. Check indexserverconfig.xml for a
subpath whose full-text-search attribute is set to false, for example:
<sub-path ...full-text-search="false" ...path="dmftmetadata//acl_name"/>

Make sure that counter.xml has not been deleted from the collection
domain_name/Data/ApplicationInfo/group. If it has been deleted, restart xPlore.
Try the query with Content Server security turned on. (See Changing search results security,
page 51.)
Summary can be blank if the summary security mode is set to BROWSE. (See Configuring
summary security, page 214.)

Changes to configuration are not seen


If you have edited indexserverconfig.xml, your changes are not applied unless the system is stopped.
Some changes, such as adding or modifying a subpath, do not take effect on a restart but only when you
rebuild the indexes of the collection.
Modifying indexserverconfig.xml, page 43 describes the procedure to view and update the index
configuration.
Try the following troubleshooting steps:
1. Make sure that the indexing status is DONE. See Checking the status of a document, page 147
2. Verify that the document was indexed to the correct collection. The collection for each document
is recorded in the tracking DB for the domain. Substitute the document ID in the following XQuery
expression, and execute it in the xDB admin tool:
for $i in collection("dsearch/SystemInfo")
where $i//trackinginfo/document[@id="TestCustomType_txt1276106246060"]
return $i//trackinginfo/document/collection-name

3. Set the save-tokens option to true for the target collection and restart xPlore, then reindex the
document. Check the tokens in the Tokens library to see whether the search term was properly
indexed.

Document is not indexed


Objects of a certain type are not indexed. The query is not recorded in dsearch.log, and the query is
executed in the database (case-sensitive). Check the properties for the object type in Documentum
Administrator. Check Enable Indexing: Register for indexing. You can submit the specific type
for indexing using the index agent UI.

Search is case sensitive


If a query is not full-text compliant, or if the object type is not indexing-enabled in Documentum
Administrator, the query is executed against the database. Database queries are case sensitive.

Document is indexed but not searchable


Try the following troubleshooting steps:
Make sure that the indexing status is DONE. (See Troubleshooting the index agent, page 85.)
Verify that the document was indexed to the correct collection. The collection for each document is
recorded in the tracking DB for the domain. Substitute the document ID in the following XQuery
expression, and execute it in the xDB admin tool:
for $i in collection("dsearch/SystemInfo")
where $i//trackinginfo/document[@id="TestCustomType_txt1276106246060"]
return $i//trackinginfo/document/collection-name


Set the save-tokens option to true for the target collection and restart xPlore, then reindex the
document. Check the tokens in the Tokens library to see whether the search term was properly
indexed.

Foreign language is not identified


Queries issued from Documentum clients are searched in the language of the session_locale. The
search client can set the locale through DFC or iAPI. Because queries are shorter than indexed content,
the language can be misidentified more easily than during indexing. To see the language that was
identified during indexing, open the dftxml in xPlore administrator. For example:
<dmftdoc dmftkey="090012a78000e77a" dss_tokens=":dftxml:1" lang="it">

To see the language that was identified in the query, view the query in dsearch.log. For example:
<message >
<![CDATA[QueryID=primary$f20cc611-14bb-41e8-8b37-2a4f1e135c70,
query-locale=en,...>

Search results can differ when searching with different locales, especially for compound terms that have
associated components. For example, a search for Stollwerk returned many more results when using
the German locale than the English locale. Stollwerk is lemmatized as stollwerk in English but as stoll
and werk in German. You can turn off lemmatization. See Configuring indexing lemmatization, page 105.

Document is not found after index rebuild


At the original ingestion, some documents are not embedded in dftxml. In these documents,
the XML content exceeds the value of the file-limit attribute on the xml-content element in
indexserverconfig.xml. The index rebuild process generates a list of object IDs for these documents.
The list is located in xplore_home/data/domain_name/collection_name/index_name/ids.txt, for
example:
C:/xPlore/data/mydomain/default/dmftdoc_2er90/ids.txt

Reingest these large documents.

Search for XML fails


Users can search for a specific element in an XML document. By default, XML content of an input
document is not indexed. To support search in XML content or attributes, change this setting
in indexserverconfig.xml . (For information on viewing and updating this file, see Modifying
indexserverconfig.xml, page 43.) If your documents containing XML have already been indexed, they
must be reindexed to include the XML content.
Change the value of the store attribute on the xml-content element to embed.
Change the value of the tokenize attribute on the xml-content element to true.
Change the value of the index-as-sub-path attribute on the xml-content element to true.
Verify the path value attribute on the xml-content element against the dftxml path. (For the dftxml
DTD, see Extensible Documentum DTD, page 348.) An XPath error can cause the query to fail.
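For illustration only, after these changes the xml-content element would carry attribute values like the
following; path="..." stands for your existing path value, and any other existing attributes (such as
file-limit) stay as they are:

<xml-content store="embed" tokenize="true" index-as-sub-path="true" path="..."/>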

Wildcard searches fail


By default, wildcards in a search match words in the content of a document, not fragments of a word.
(Wildcards are supported in attribute searches.) The FAST flag fds_contain_fragment parameter
is not evaluated for xPlore.
You can turn on fragment support: Set the fast_wildcard_compatible parameter in the
dm_ftengine_config object to true. Applies to both metadata and search document contains (simple
one-box) search.
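One common way to set the parameter is through iAPI. The following is a sketch only: it appends a new
param_name/param_value pair to the dm_ftengine_config object; if fast_wildcard_compatible already
exists in the repeating attributes, set the value at its existing index instead, and restart the Content
Server or reinitialize it for the change to take effect.

API> retrieve,c,dm_ftengine_config
API> append,c,l,param_name
fast_wildcard_compatible
API> append,c,l,param_value
true
API> save,c,l
API> reinit,c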

Debugging queries
You can debug queries for the following problems:
Query does not return expected results.
Query is very slow (reported in Top N Slowest Queries report).

Debugging a query that does not return results


One of the most common causes of no results is a search for non-string metadata (integer or date) that
is not in the index. The following example executes a query from the domain Execute XQuery
command. If you input a value in Test Search, you can get the XQuery expression to provide to
Execute XQuery.
for $i score $s in /dmftdoc[. ftcontains 9001001 with stemming]
order by $s descending return <d> {$i/dmftmetadata//r_object_id}
{ $i/dmftmetadata//object_name } { $i/dmftmetadata//r_modifier } </d>

No results are returned, because the searched value is a Documentum integer attribute. When you
execute the query with the get query debug option, you see that the value is treated as a string:
query:1:20:for expression .../child::dmftdoc[. contains text 9001001]

Every library is visited, but no result is found:


query:1:20:for expression .../child::dmftdoc[. contains text 9001001]
query:1:20:No query plan with indexes found, walking descendants
query:1:67:No indexes found to support order specs

You must stop the xPlore instances and add a subpath for the non-string attribute. In this example, the
following subpath was added to the dmftdoc category. Note that partial paths are supported, in case the
metadata value is found in more than one path:
<sub-path leading-wildcard="false" compress="true"
boost-value="1.0" description="award number"
include-descendants="false" returning-contents="true"
value-comparison="true" full-text-search="true"
enumerate-repeating-elements="true" type="integer"
path="dmftmetadata//award_no"/>

For the non-string value to be found, we must reindex the domain (or the specific collection, if known).
After reindexing, we have a different result in Test Search:
query:1:99:Using query plan:
query:1:99:index(dmftdoc)


query:1:99:Looking up "(false, true, 030012a7800001dc, 1001)" in index "dmftdoc"


query:1:290:Found an index to support all order specs. No sort required.

Debugging a slow query


Get the XQuery from the Top N Slowest Queries report by clicking the query ID. The query is
displayed in a query window.

Copy the query into a text editor and remove every declare option xhive... phrase, that is, everything
before let $libs. For example, remove the following:
declare option xhive:fts-analyzer-class
com.emc.documentum.core.fulltext.indexserver.core.index.xhive.IndexServerAnalyzer;
declare option xhive:ignore-empty-fulltext-clauses true;
declare option xhive:index-paths-value ...";

Also remove "is running]]" if it appears at the end of the query.


Open the Data Management tree and choose the library that contains your index. For Documentum
environments, the library name is the same as the repository name. Click Execute XQuery and input
the query from your text editor. Choose the option Get query debug. Click Execute XQuery.
Click the Query debug tab to see results like the following:
query:1:353:Path expr .../child::dmftdoc[(((
child::dmftmetadata/descendant-or-self::node()/child::a_is_hidden[. = "


false"] and child::dmftversions/child::iscurrent[. = "true"]) and


. contains text award) and (child::dmftmetadata/descendant-or-self::node(
)/child::r_modify_date[. >= ...] and child::dmftmetadata/
descendant-or-self::node()/child::r_modify_date[. <= ...]))]
query:1:353:Creating query plan on node /TechPubsGlobal/dsearch/Data
query:1:353:for expression .../child::dmftdoc[(((
child::dmftmetadata/descendant-or-self::node()/child::a_is_hidden[. = "
false"] and child::dmftversions/child::iscurrent[. = "true"]) and
. contains text award) and (child::dmftmetadata/descendant-or-self::node(
)/child::r_modify_date[. >= ...] and child::dmftmetadata/
descendant-or-self::node()/child::r_modify_date[. <= ...]))]
query:1:353:No indexes, walking descendants
query:1:353:Creating query plan on node /TechPubsGlobal/dsearch/Data/default
query:1:353:for expression .../child::dmftdoc[(((child::dmftmetadata/
descendant-or-self::node()/child::a_is_hidden[. = "false"] and
child::dmftversions/child::iscurrent[. = "true"]) and . contains
text award) and (child::dmftmetadata/descendant-or-self::node(
)/child::r_modify_date[. >= ...] and child::dmftmetadata/
descendant-or-self::node()/child::r_modify_date[. <= ...]))]
query:1:353:Found index "dmftdoc"
query:1:353:Using query plan:
query:1:353:index(dmftdoc)
query:1:353:Looking up "(false, true, award, 1980-01-01T00:00:00Z,
2010-01-01T00:00:00Z)" in index "dmftdoc"
query:1:353:Creating query plan on node /TechPubsGlobal/dsearch/Data/
collection1
query:1:353:for expression .../child::dmftdoc[(((child::dmftmetadata/
descendant-or-self::node()/child::a_is_hidden[. = "false"] and
child::dmftversions/child::iscurrent[. = "true"]) and . contains
text award) and (child::dmftmetadata/descendant-or-self::node(
)/child::r_modify_date[. >= ...] and child::dmftmetadata/
descendant-or-self::node()/child::r_modify_date[. <= ...]))]
query:1:353:Found index "dmftdoc"
query:1:353:Using query plan:
query:1:353:index(dmftdoc)
query:1:353:Looking up "(false, true, award, 1980-01-01T00:00:00Z,
2010-01-01T00:00:00Z)" in index "dmftdoc"
query:1:353:Creating query plan on node /TechPubsGlobal/dsearch/Data/
collection2
query:1:353:for expression .../child::dmftdoc[(((child::dmftmetadata/
descendant-or-self::node()/child::a_is_hidden[. = "false"] and
child::dmftversions/child::iscurrent[. = "true"]) and . contains
text award) and (child::dmftmetadata/descendant-or-self::node(
)/child::r_modify_date[. >= ...] and child::dmftmetadata/
descendant-or-self::node()/child::r_modify_date[. <= ...]))]
query:1:353:Found index "dmftdoc"
query:1:353:Using query plan:
query:1:353:index(dmftdoc)
query:1:353:Looking up "(false, true, award, 1980-01-01T00:00:00Z,
2010-01-01T00:00:00Z)" in index "dmftdoc"
query:1:353:Creating query plan on node /TechPubsGlobal/dsearch/Data/


NewIACollection
query:1:353:for expression .../child::dmftdoc[(((child::dmftmetadata/
descendant-or-self::node()/child::a_is_hidden[. = "false"] and
child::dmftversions/child::iscurrent[. = "true"]) and . contains
text award) and (child::dmftmetadata/descendant-or-self::node(
)/child::r_modify_date[. >= ...] and child::dmftmetadata/
descendant-or-self::node()/child::r_modify_date[. <= ...]))]
query:1:353:Found index "dmftdoc"
query:1:353:Using query plan:
query:1:353:index(dmftdoc)
query:1:353:Looking up "(false, true, award, 1980-01-01T00:00:00Z,
2010-01-01T00:00:00Z)" in index "dmftdoc"
query:1:643:Found an index to support all order specs. No sort required.

Getting the query execution plan


The query plan can be useful to EMC tech support for evaluating slow queries. The query plan shows
which indexes were probed and the order in which they were probed. Use one of the following options
to save or fetch the query plan:
Using DFC query builder API
save:
IDfXQuery.setBooleanOption(IDfXQuery.FtQueryOptions.
SAVE_EXECUTION_PLAN,true)

retrieve:
IDfXQuery.getExecutionPlan(session)

Using iAPI
save:
apply,c,NULL,MODIFY_TRACE,SUBSYSTEM,S,fulltext,VALUE,S,ftengine

retrieve: The query execution plan is written to dsearch.log, which is located in the logs subdirectory
of the JBoss deployment directory.
Using xPlore search API
save:
IDfXQuery.setSaveExecutionPlan(true)

retrieve:
IFtSearchSession.fetchExecutionPlan(requestId)

Debugging Webtop queries


For queries from Webtop or other WDK applications, you can get the query by performing a Ctrl-click
on the Edit button in the results page. A query window displays events history. The internal
nativequery event contains the XQuery. (You must view the source to get the proper form of the
XQuery, stripping out the enclosing brackets and timestamp.)

let $libs :=
('/TechPubsGlobal/dsearch/Data') let $results :=
for $dm_doc score $s in collection($libs)/dmftdoc[
(dmftmetadata//a_is_hidden = "false") and (
dmftversions/iscurrent = "true")
and (. ftcontains "award" with stemming)]
order by $s descending
return $dm_doc return (for $dm_doc in subsequence($results,1,351)
return <r>{for $attr in $dm_doc/dmftmetadata//*[local-name()=(
'object_name','r_modify_date','r_object_id','r_object_type',
'r_lock_owner','owner_name','r_link_cnt','r_is_virtual_doc',
'r_content_size','a_content_type','i_is_reference','r_assembled_from_id',
'r_has_frzn_assembly','a_compound_architecture','i_is_replica',
'r_policy_id')]
return <attr name="{local-name($attr)}" type="{$attr/@dmfttype}">{
string($attr)}</attr>}{xhive:highlight(($dm_doc/dmftcontents/
dmftcontent/dmftcontentref,$dm_doc/dmftcustom))}
<attr name="score" type="dmdouble">{string(dsearch:get-score($dm_doc))}
</attr></r>
)

Note: The XQuery portion of the query is almost identical to the query retrieved through xPlore
administrator. These queries were issued separately, which accounts for differences.
To debug the Webtop query, edit the query from View Source and enter it in the Execute XQuery
dialog in xPlore administrator.

Search APIs and customization


Routing a query to a specific collection
You can route a query to a specific collection in the following ways:
Route an individual query using the DQL in collection clause to specify the target of a SELECT
statement. By default, DFC does not generate DQL, but you can turn off XQuery generation. (See
Turning off XQuery generation to support DQL, page 259.)
For example:
select r_object_id from dm_document search document contains 'benchmark'
in collection('custom')

Route all queries that meet specific criteria using a DQL hint in dfcdqlhints.xml:
enable(fds_query_collection_collectionname), where collectionname is the collection name. If
you use a DQL hint, you do not need to change the application or the DFC query builder. You must
turn off XQuery generation. (See Turning off XQuery generation to support DQL, page 259.)
For more information on the hints file, refer to EMC Documentum Search Development Guide.
For example:
select r_object_id from dm_document search document contains 'benchmark'
enable(fds_query_collection_custom)


Implement the DFC query builder API addPartitionScope.
Implement the DFC IDfXQuery API collection().
Use a DFS PartitionScope object in a StructuredQuery implementation.

Use DQL
You can route a DQL query to a specific collection in the following ways. By default, DFC does
not generate DQL, but you can turn off XQuery generation. (See Turning off XQuery generation
to support DQL, page 259.)
Route an individual query using the DQL in collection clause to specify the target of a SELECT
statement. Use one of the two following syntaxes.
Collection names are separated by underscores:
select attr from type SDC where enable(
fds_query_collection_collection1_collection2_...)

Collection names are in quotation marks, separated by commas:
select attr from type SDC in collection
('collection1','collection2',...)

select r_object_id from dm_document search document contains 'report'
in collection ('default') enable(return_top 10)

Route all queries that meet specific criteria using a DQL hint in dfcdqlhints.xml
enable(fds_query_collection_collectionname) where collectionname is the collection name.
For more information on the hints file, refer to EMC Documentum Search Development Guide.
The following hints route queries for a specific type to a known target collection appended to
FDS_QUERY_COLLECTION_:
<RuleSet>
<Rule>
<Condition>
<From condition="any">
<Type>my_type</Type>
</From>
</Condition>
<DQLHint>ENABLE(FDS_QUERY_COLLECTION_MYTYPECOLLECTION)</DQLHint>
</Rule>
</RuleSet>

Implement a DFC query builder API


If you have created a BOF module that routes documents to specific collections, customize
query routing to the appropriate collection. Call addPartitionScope(source, collection_name) in
IDfQueryBuilder. See Building a query with the DFC search service, page 259.
For a detailed example of query routing, see "Improving Webtop Search Performance Using xPlore
Partitioning" on the EMC Community Network (ECN).

Turning off XQuery generation to support DQL


Add the following setting to dfc.properties on the DFC client application:
dfc.search.xquery.generation.enable=false

Debugging queries
You can debug queries by clicking a collection in xPlore administrator. Choose Execute XQuery for
the target collection or the top-level collection for the repository.
CAUTION: Do not use xhadmin to rebuild an index or change files that xPlore uses. If
you remove segments, your backups cannot be restored. This tool is not aware of xPlore
configuration settings in indexserverconfig.xml.

Building a query with the DFC search service


In a Documentum environment, the DFC APIs encapsulate complex functionality like stemming,
wildcards, fuzzy search, hit count, and facets. With the DFC search service, you can create queries for
one or more full-text indexed or non-indexed Servers. With Federated Search Services (FS2) product,
you can query external sources and the client desktop as well. The DFC interface IDfQueryBuilder
provides a programmatic interface. You can change the query structure, support external sources,
support asynchronous operations, change display attributes, and perform concurrent query execution
in a federation.
IDfQueryBuilder in the package com.documentum.fc.client.search allows you to build queries that
provide the following information:
Source query (getRootExpressionSet)
Source list (required, several scope methods)
Max result count (setMaxHitCount)
Container of source names
Transient search metadata manager bound to the query
Transient query validation flag
Facet definition (addFacetDefinition)
Specific folder location (addLocationScope)
Target for a particular collection (addPartitionScope)
Specify parallel execution across several collections
Sorting (addOrderByAttribute and addASCIIOrderByAttribute)
For examples of query building, refer to EMC Documentum Search Development Guide. For
information on creating and configuring facets and processing facet returns, see the chapter Facets.
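The following minimal sketch shows a few of these methods together. It assumes that qb is an
IDfQueryBuilder already obtained from the DFC search service (see EMC Documentum Search
Development Guide for how to create the query manager and builder); the repository and collection
names are placeholders, and the boolean argument to addOrderByAttribute is assumed to be the
ascending flag.

import com.documentum.fc.client.search.IDfQueryBuilder;

public final class QueryScopingSketch {
    // Scope and shape a query; verify method signatures against the DFC javadocs.
    static void scope(IDfQueryBuilder qb) throws Exception {
        qb.setMaxHitCount(350);                          // cap the number of hits returned
        qb.addPartitionScope("myrepository", "custom");  // route to the collection named "custom"
        qb.addOrderByAttribute("r_modify_date", false);  // sort by modify date, descending (assumption)
    }
}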


Building a query with the DFS search service


In a Documentum environment, the DFC and DFS APIs encapsulate complex functionality like
stemming, wildcards, fuzzy search, hit count, and facets.
DFS search services provide search capabilities against EMC Documentum repositories, as well
as against external sources through the Documentum Federated Search Services (FS2) server. For
complete information on the DFS search service and code examples, refer to EMC Documentum
Enterprise Content Services Reference.
Searches can either be blocking or nonblocking, depending on the Search Profile setting. By default,
searches are blocking. Nonblocking searches display results dynamically. Multiple successive calls
can be made to get new results and the query status. The query status contains the status for each
source repository, indicating if it is successful, if more results are expected, or if it failed with errors.
The cache contains the search results populated in the background for every search. The cache cleanup
mechanism is both time-based and size-based. You can modify the cache cleanup properties by
editing the dfs-runtime.properties file.
A structured query defines a query using an object-oriented model. The query is constrained by a set
of criteria contained in an ExpressionSet object. An ordered list of RepositoryScope objects defines
the scope of the query (the sources against which it is run). PartitionScope objects target the query to
specific collections. ExpressionScope objects apply an expression set to a specific repository.

Building a structured query


The following example sets the repository name, the folder path in the repository, and whether to
include subfolders; targets a specific collection; creates the full-text expression; and specifies the
object type.
private Query getScopedQuery ()
{
StructuredQuery structuredQuery = new StructuredQuery();
RepositoryScope scope = new RepositoryScope();
scope.setRepositoryName(m_docbase);
scope.setLocationPath("/SYSTEM");
scope.setDescend(true);
scope.setExcluded(true);
structuredQuery.setScopes(Arrays.asList(scope));
PartitionScope pScope = new PartitionScope();
pScope.setRepositoryName(m_docbase);
pScope.setPartitionName("partitionName");
structuredQuery.setPartitionScopes(Arrays.asList(pScope));
ExpressionScope eScope = new ExpressionScope();
eScope.setRepositoryName(m_docbase);
final ExpressionSet expressionSet = new ExpressionSet();
expressionSet.addExpression(new FullTextExpression("EMC"));
eScope.setExpressionSet(expressionSet);
structuredQuery.setExpressionScopes(Arrays.asList(eScope));
structuredQuery.addRepository(m_docbase);
structuredQuery.setObjectType("dm_document");
// Set expression
ExpressionSet expressionSet2 = new ExpressionSet();


expressionSet2.addExpression(new PropertyExpression(
"object_name", Condition.CONTAINS, new SimpleValue("test")));
structuredQuery.setRootExpressionSet(expressionSet2);
return structuredQuery;
}

Building a DFC XQuery


You can build XQuery expressions using DFC. However, if you are familiar with search customizations
using DFC, it may be easier to use the DFC search service to build queries (Building a query with
the DFC search service, page 259).
Perform the following steps to use the DFC APIs for building an XQuery:
1. Get a DFC client object and an IDfXQuery object. See Get IDfXQuery object, page 261.
2. Set the query target to xPlore. See Set the query target, page 261.
3. Create an XQuery statement. See Create the XQuery statement, page 261.
4. Set query options. See Set query options, page 262.
5. Execute the query. See Execute the query, page 263.
6. Retrieve results. See Retrieve the results, page 263.

Get IDfXQuery object


Create a new DFC client and get an IDfXQuery object:
IDfClientX clientx = new DfClientX();
IDfXQuery xquery = clientx.getXQuery();

Set the query target


An XQuery expression can be run against the EMC Documentum XML Store or against
the xPlore server. There are two implementations of IDfXQueryTargets in the package
com.documentum.xml.xquery: DfFullTextXQueryTargets for xPlore and DfStoreXQueryTargets for
XML Store. The following example sets the xPlore target:
IDfClientX clientx = new DfClientX();
IDfXQueryTargets fttarget = clientx.getXQueryTargets(IDfXQueryTargets.DF_FULLTEXT);

Create the XQuery statement


For Documentum search clients, the IDfXQuery interface in the package com.documentum.xml.xquery
runs user-defined XQuery expressions against xPlore indexes. XQuery expressions submitted through
the IDfXQuery interface use the xPlore native full-text security evaluation.
Create an XQuery expression to submit. The following example creates a query for a string in the
contents:
IDfXQuery xquery = clientx.getXQuery();
xquery.setXQueryString("unordered(for $i in collection('/docbase1/DSS/Data') "
    + "where ( ( $i/dmftdoc/dmftmetadata/*/r_creation_date[. >= "
    + "'2008-12-20T08:00:00'] ) and ...");

Set query options


You can set query options in the query before calling execution. For more information on these
options, refer to the javadocs for IDfXQuery in the package com.documentum.xml.xquery.
The following example sets timeout, batch size, turns on caching, and saves the execution plan for
debugging.
IDfXQuery xquery = clientx.getXQuery();
xquery.setTimeout(10000);
xquery.setBatchSize(200);
xquery.setSaveExecutionPlan(true);
xquery.setCaching(true);

Options:
Debugging:
Get and set client application name for logging
Get and set save execution plan to see how the query was executed
Query execution:
Get and set result batch size. For a single batch, set to 0.
Get and set target collection for query
Get and set query text locale
Get and set parallel execution of queries
Get and set timeout in ms
Security:
Get and set security filter fully qualified class name
Get and set security options used by the security filter
Get and set native security (false sets security evaluation in the Content Server)
Results:
Get and set results streaming
Get and set results returned as XML nodes
Get and set spooling to a file
Get and set synchronization (wait for results)
Get and set caching
Summaries:
Get and set return summary
Get and set return of text for summary
Get and set summary calculation
Get dynamic summary maximum threshold
Get and set length of summary fragments
Get summary security mode

Execute the query


After you have set the target, options, and XQuery in the instance of IDfXQuery, you execute, passing
in the DFC session identifier and the xPlore target.
xquery.execute(session, fttarget);

Retrieve the results


You can get the results as an input stream from the instance of IDfXQuery. You pass in the DFC
session identifier of the session in which you executed the query.
InputStream results = xquery.getInputStream(session);
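The stream contains the serialized query results. The following is a minimal sketch for reading and printing the stream, assuming UTF-8 output (standard Java I/O; not part of the IDfXQuery API):
BufferedReader reader = new BufferedReader(
    new InputStreamReader(xquery.getInputStream(session), "UTF-8"));
String line;
while ((line = reader.readLine()) != null)
{
    System.out.println(line); //each line of the serialized result
}
reader.close();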

Get the query execution plan


The query plan can be useful to EMC tech support for evaluating slow queries. The query plan shows
which indexes were probed and the order in which they were probed. Use one of the following options
to save or fetch the query plan:

DFC query builder API. Save: IDfXQuery.setBooleanOption(IDfXQuery.FtQueryOptions.SAVE_EXECUTION_PLAN, true). Retrieve: IDfXQuery.getExecutionPlan(session).
iAPI. Save: apply,c,NULL,MODIFY_TRACE,SUBSYSTEM,S,fulltext,VALUE,S,ftengine. Retrieve: The query execution plan is written to dsearch.log, which is located in the logs subdirectory of the JBoss deployment directory.
xPlore search API. Save: IDfXQuery.setSaveExecutionPlan(true). Retrieve: IFtSearchSession.fetchExecutionPlan(requestId).
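For example, a hedged sketch with the DFC query builder API, assuming getExecutionPlan returns the plan as a string:
xquery.setBooleanOption(IDfXQuery.FtQueryOptions.SAVE_EXECUTION_PLAN, true);
xquery.execute(session, fttarget);
String plan = xquery.getExecutionPlan(session); //assumption: plan returned as a string
System.out.println(plan);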

Building a query using xPlore APIs


You can build a query using xPlore XQuery APIs. If you are familiar with DFC or DFS applications, it
may be easier for you to create a query using their APIs. See Building a query with the DFC search
service, page 259 or Building a query with the DFS search service, page 260.
Access to indexing and search APIs is through IDSearchClient in the package
com.emc.documentum.core.fulltext.client. You execute a query by calling executeQuery for the
interface com.emc.documentum.core.fulltext.client.IFtSearchSession. Each API is more fully
described in the javadocs and in Search APIs, page 346.
Perform the following steps to use the xPlore APIs for query building:
1. Reference the jars in the SDK dist and lib directories in your classpath. Some of the classes in the
jars will be used for later examples. For more information, see Setting up the xPlore SDK, page 301.

2. Get a search session using IDSearchClient. The following example connects to the search service
and creates a session.
public void connect() throws Exception
{
String bootStrap = BOOT_STRAP;
DSearchServerInfo connection = new DSearchServerInfo(m_host, m_port);
IDSearchClient client = DSearchClient.newInstance(
"MySearchSession", connection);
m_session = client.createFtSearchSession(m_domain);
}
private String m_host = "myhost";
private int m_port = 9300;
private String m_domain = "DSS_LH1"; //this is the xPlore domain name
private IFtSearchSession m_session;

3. Create an XQuery statement. The following example creates a query for a string in the contents:
public void testQuery()
{
String xquery = "for $doc in doc('/DSS_LH1/dsearch/Data/default') where "
    + "$doc/dmftdoc[dmftcontents ftcontains 'strange'] return string(<R><ID>{"
    + "string($doc/dmftdoc/dmftmetadata//r_object_id)}</ID></R>)";
executeQuery(xquery, options); //see "Executing a query"
}

4. Set query options. When you use an xPlore API to set options, the settings override the global
configuration settings in the xPlore administration APIs. See the javadocs for IFtQueryOptions in
the package com.emc.documentum.core.fulltext.common.search and Set the query target, page 261.
Add options like the following, and then provide the options object to the executeQuery method
of IFtSearchSession. For example:
IFtQueryOptions options = new FtQueryOptions();
options.setSpooled(true);

5. Set query debug options. The enumeration FtQueryDebugOptions can be used to set debug options
for IDfXQuery in DFC version 6.7 or higher. To set options, use the following syntax:
public String getDebugInfo(IDfSession session, FtQueryDebugOptions
debugOption)
throws DfException;

For example:
String queryid = xquery.getDebugInfo(m_session,
IDfXQuery.FtQueryDebugOptions.QUERY_ID);

6. Execute the query. See Execute the query, page 263. Provide the query options and XQuery
statement to your instance of IFtSearchSession.executeQuery, like the following:
requestId = m_session.executeQuery(xquery, options);

7. Retrieve results. The method executeQuery returns an instance of IFtQueryRequest from which
you can retrieve results. See Retrieve the results, page 263.


The following example sets the query options, executes the query by implementing the
IFtSearchSession method executeQuery, and iterates through the results, printing them to the
console.
private void executeQuery (String xquery)
{
String requestId = null;
try
{
IFtQueryOptions options = new FtQueryOptions();
options.setSpooled(true);
options.setWaitForResults(true);
options.setResultBatchSize(5);
options.setAreResultsStreamed(false);
requestId = m_session.executeQuery(xquery, options);
Iterator<IFtQueryResultValue> results = m_session.getResultsIterator(
requestId);
while (results.hasNext())
{
IFtQueryResultValue r = results.next();
System.out.print("results = ");
//printQueryResult(r); See next step
System.out.println();
}
}
catch (FtSearchException e)
{
System.out.println("Failed to execute query");
}
}

8. Retrieve results. Results from IFtSearchSession.executeQuery are returned as an instance of
IFtQueryRequest from which you can retrieve results. Get each result value as an instance of
IFtQueryResultValue by iterating over the IFtQueryRequest instance.
requestId = m_session.executeQuery(xquery, options);
Iterator<IFtQueryResultValue> results = m_session.getResultsIterator(
requestId);
while (results.hasNext())
{
IFtQueryResultValue r = results.next();
printQueryResult(r);
}
private void printQueryResult(IFtQueryResultValue v)
throws FtSearchException
{
if (v.getSelectListType().getType() !=
IFtQuerySelectListItem.Type.NODE)
{
System.out.print(v.getValueAsString());
}
else
{
List<IFtQueryResultValue> children = (List<IFtQueryResultValue>) v.getValue();
for (IFtQueryResultValue child : children)
{
printQueryResult(child);
}
}}

Adding context to a query


Starting with DFC 6.7 SP1, you can add context information to a query. A Documentum client sets
query context using the DFC search service or IDfXQuery.
Context information is not used to execute the query. The context information is available in audit
events and reports. For example, query subscription and query warmup add context to indicate the type
of query. For information on creating reports from the audit record, see Editing a report, page 291.

DFC
IDfQueryProcessor method setApplicationContext(DfApplicationContext context).
DfApplicationContext can set the following context:
setApplicationName(String name)
setQueryType(String type). Valid values:
setApplicationAttributes(Map<String,String> attributesMap). Set user-defined attributes in a Map
object.

DFC example
The following example sets the query subscription application context and application name. This
information is used to report subscription queries.
Instantiate a query process from the search service, set the application name and query type, and add
your custom attributes to the application context object:
IDfQueryProcessor processor = m_searchService.newQueryProcessor(
queryBuilder, true);
DfApplicationContext anApplicationContext = new DfApplicationContext();
anApplicationContext.setApplicationName("QBS");
anApplicationContext.setQueryType("AUTO_QUERY");
Map<String,String> aSetOfApplicationAttributes =
new HashMap<String,String>();
aSetOfApplicationAttributes.put("frequency","300");
aSetOfApplicationAttributes.put("range","320");
anApplicationContext.setApplicationAttributes(
aSetOfApplicationAttributes);
processor.setApplicationContext(anApplicationContext);


IDfResultsSet results = processor.blockingSearch(60000);

The context is serialized to the audit record as follows:


<event name="AUTO_QUERY" component="search" timestamp="
2011-07-26T14:00:18-0700">
<QUERY_ID>PrimaryDsearch$706f93fa-e382-499c-b41a-239ae800da96
</QUERY_ID>
<QUERY>
<![CDATA[for $i in collection(/yttestenv/dsearch/Data)/dmftdoc[(
dmftmetadata//a_is_hidden = "false") and (dmftversions/iscurrent = "
true") and (. ftcontains "xplore" with stemming using stop words
default)]
return string(<R>{$i//r_object_id}</R>)]]></QUERY>
<USER_NAME>unknown</USER_NAME>
<IS_SUPER_USER/>
<application_context>
<app_name>QBS</app_name>
<app_data>
<attr name="subscriptionid" value="080f444580029954"/>
<attr name="frequency" value="300"/>
<attr name="range" value="320"/>
</app_data>
</application_context>
</event>

The event data is used to create a report. For example, a report that gets failed subscribed queries has
the following XQuery expression. This expression gets queries for which the app_name is QBS and
the queries are not executed:
let $lib := "/SystemData/AuditDB/PrimaryDsearch/"
let $failingQueries := collection($lib)//event[name="AUTO_QUERY"
and application_context[app_name="QBS" and app_data[attr[
@name="frequency"]/@value < attr[@name="range"]/@value]]]/QUERY_ID
return $failingQueries

The result of this XQuery is the following:


<QUERY_ID>PrimaryDsearch$706f93fa-e382-499c-b41a-239ae800da96
</QUERY_ID>

IDfXQuery
Use the API FtQueryOptions in the package com.emc.documentum.core.fulltext.common.search.
Call setApplicationName(String applicationName) to log the name of the search client application,
for example, webtop.
Call setQueryType(FtQueryType queryType) with the FtQueryType enum.

Using parallel queries


You can enable parallel queries in DFC by setting the following property to true in dfc.properties:

dfc.search.xquery.option.parallel_execution.enable = true

You can also use one of the following APIs to execute a query across several collections in parallel:
DFC API: IDfXQuery FTQueryOptions.PARALLEL_EXECUTION
xPlore API: IFtQueryOptions.setParallelExecution(true)
Parallel queries are not supported in DQL.
CAUTION: Parallel queries may not perform better than a query that probes each collection
in sequence. To probe all collections in parallel, set the API and compare performance with a
sequential query (the default).
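A minimal sketch using the xPlore API named above, with the session and XQuery statement from Building a query using xPlore APIs:
IFtQueryOptions options = new FtQueryOptions();
options.setParallelExecution(true); //probe all collections in parallel
String requestId = m_session.executeQuery(xquery, options);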

Custom access to a thesaurus


Thesaurus expansion of queries is supported without customization. You can add custom access to a
thesaurus that does not conform to the SKOS format. For example, you could add the Basistech Name
Indexer to match people, places, or organizations.
To customize access, you must create a custom thesaurus class that implements
IFtThesaurusHandler in the com.emc.documentum.core.fulltext.common.search package. Implement
getTermsFromThesaurus(), which returns a collection of string terms from the thesaurus. Multi-term
queries result in multiple calls to this method.
public Collection<String> getTermsFromThesaurus(Collection<String> terms,
String relationship, int minLevelValue, int maxLevelValue);

Use the input terms from the query to probe the thesaurus.
You can use the optional XQuery relationship and levels parameters of FTThesaurusOption to specify
special processing. For information on these parameters, see FTThesaurusOption.
In the following example, the relationship value is RT (related term), and minLevelValue and
maxLevelValue are 2:
using thesaurus at "thesaurusURI" relationship "RT" exactly 2 levels

In the following example, the relationship is BT (broader term), minLevelValue is Integer.MIN_VALUE, and maxLevelValue is 2.
using thesaurus at "thesaurusURI" relationship "BT" at most 2 levels

Setting up a custom thesaurus


Perform the following steps:
1. Implement a class that implements the IFtThesaurusHandler interface. See FASTThesaurusHandler.java
for an example. This file is provided in the SDK at
/samples/src/com/emc/documentum/core/fulltext/common/search/impl. A sample FAST
thesaurus is provided at /samples/thesaurus.
Note: Only one instance of the class is created on startup, so multiple search threads share the class.
Avoid thread synchronization issues or use thread-local storage.
2. Compile the custom class. Include dsearch-client.jar in your classpath when you compile. For
example:
javac -cp dsearch-client.jar com\emc\documentum\core\fulltext\common\search\impl\SimpleThesaurusHandler.java

3. Package the class in a jar file and put it into the library directory
xplore_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/lib.
The path in the jar file must match the package name. For example:
jar cvf com\emc\documentum\core\fulltext\common\search\impl\dsearch-thesaurus.jar com\emc\documentum\core\fulltext\common\search\impl\SimpleThesaurusHandler.class

4. Modify indexserverconfig.xml to specify the custom thesaurus. Define a new thesaurus element
under the domain that will use the custom thesaurus. Restart the xPlore instances after making this
change. The following example maps a thesaurus URI to a custom-defined class. When a query
specifies this URI, the custom class is used to retrieve related terms.
<domain storage-location-name="default" default-document-category="dftxml" name=... >
<collection ... >
...
<thesaurus uri="my_thesaurus"
class-name="com.emc.documentum.core.fulltext.common.search.impl.FASTThesaurusHandler"/>
</domain>

Accessing the custom thesaurus in a query


You can specify custom thesaurus access in DFC, DQL, or IDfXQuery:
DFC: setThesaurusLibrary(String uri). Use the URI that you defined in indexserverconfig.xml.
DQL: Use the ft_use_thesaurus_library hint.
IDfXQuery: Add a thesaurus option. See FTThesaurusOption. For example:
for $i score $s in collection('/testenv/dsearch/Data')
/dmftdoc[. ftcontains 'food products' using thesaurus default]
order by $s descending return $i/dmftinternal/r_object_id

You can access one thesaurus for full-text and one thesaurus for metadata. For example, you may have a
metadata thesaurus that lists various forms of company names. The following example uses the default
thesaurus to expand the full-text lookup and a metadata thesaurus to expand the metadata lookup:
IDfExpressionSet rootSet = queryBuilder.getRootExpressionSet();
//full-text expression uses default thesaurus
IDfFullTextExpression aFullTextExpression = rootSet.addFullTextExpression(
fulltextValue);
aFullTextExpression.setThesaurusSearchEnabled(true);
//simple attribute expression uses custom metadata thesaurus
IDfSimpleAttributeExpression aMetadataExpression =
rootSet.addSimpleAttrExpression("companyname", IDfValue.DF_STRING,
IDfSimpleAttrExpression.SEARCH_OP_CONTAINS, false, false,
companyNameValue);
aMetadataExpression.setThesaurusSearchEnabled(true);
aMetadataExpression.setThesaurusLibrary("http://search.emc.com/metadatathesaurus");


Sample thesaurus handler class


The FASTThesaurusHandler class is a sample implementation of the IFtThesaurusHandler interface.
(This class is included in the xPlore SDK.) When the class is instantiated by xPlore, it reads a FAST
dictionary file and stores the term mappings into memory. This results in quick lookups to return
related words from the FASTThesaurusHandler class during query execution.
This file is provided in the SDK at /samples/src/com/emc/documentum/core/fulltext/common/search/impl.
A sample FAST thesaurus is provided at /samples/thesaurus.
package com.emc.documentum.core.fulltext.common.search.impl;
import com.emc.documentum.core.fulltext.common.search.IFtThesaurusHandler;
import java.util.*;
import java.io.*;
public class FASTThesaurusHandler implements IFtThesaurusHandler
{
public Collection<String> getTermsFromThesaurus(
Collection<String> terms, String relationship,
int minLevelValue, int maxLevelValue)
{
Iterator<String> termIterator = terms.iterator();
Collection<String> result = new ArrayList<String>();
while (termIterator.hasNext())
{
String key = termIterator.next();
if (s_thesaurus.containsKey(key))
result.addAll(s_thesaurus.get(key));
}
return result;
}
private static Map<String, Collection<String>> s_thesaurus =
new HashMap<String, Collection<String>>();
static
{
try
{
String location = System.getenv("DOCUMENTUM") + "/DocumentumThesaurusFAST.txt";
FileInputStream fstream = new FileInputStream(location);
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String line;
while((line = br.readLine()) != null)
{
if (line.length() > 0)
{


String[] mapping = line.split("=", 2);
String key = mapping[0];
String related = mapping[1];
// do some format checking
if (key.length() < 1)
continue;
// related should have at least "[?]", where ? is any character
if (related.length() < 3)
continue;
if (related.charAt(0) != '[' || related.charAt(related.length()-1) != ']')
continue;
related = related.substring(1, related.length()-1);
String relatedTerms[] = related.split(",");
Collection<String> terms = new ArrayList<String>(
relatedTerms.length);
for (String term : relatedTerms)
{
terms.add(term);
}
s_thesaurus.put(key, terms);
}
}
}
catch (FileNotFoundException e)
{
System.out.println("FileNotFoundException while loading FAST Thesaurus: "
+ e.getMessage());
}
catch (IOException e)
{
System.out.println("IOException while loading FAST Thesaurus: "
+ e.getMessage());
}
}
}


Chapter 11
Facets
This chapter contains the following topics:

About Facets

Configuring facets in xPlore

Creating a DFC facet definition

Facet datatypes

Creating a DFS facet definition

Defining a facet handler

Sample DFC facet definition and retrieval

Tuning facets

Logging facets

Troubleshooting facets

About Facets
Faceted search, also called guided navigation, enables users to explore large datasets to locate items
of interest. You can define facets for the attributes that are used most commonly for search. Facets
are presented in a visual interface, removing the need to write explicit queries and avoiding queries
that do not return desired results. After facets are computed and the results of the initial query are
presented in facets, the user can drill down to areas of interest. At drilldown, the query is reissued
for the selected facets.
A facet represents one or more important characteristics of an object, represented by one or more
object attributes in the Documentum object model. Multiple attributes can be used to compute a facet,
for example, r_modifier or keywords. Faceted navigation permits the user to explore data in a large
dataset. It has several advantages over a keyword search or explicit query:
The user can explore an unknown dataset by restricting values suggested by the search service.
The data set is presented in a visual interface, so that the user can drill down rather than constructing
a query in a complicated UI.
Faceted navigation prevents dead-end queries by limiting the restriction values to results that are
not empty.
Facets are computed on discrete values, for example, authors, categories, tags, and date or numeric
ranges. Facets are not computed on text fields such as content or object name. Facet results are not
localized; the client application must provide localization.

Before you create facets, create indexes on the facet attributes. See Configuring facets in xPlore, page
274. Some facets are already configured by default.
For very specific use cases, if the out-of-the-box facet handlers do not meet your needs, you can define
custom facet handlers for facet computation. For example, if a facet potentially includes many distinct
values, you can define ranges to group the values.

Facets and security


Facets restrict results based on the xPlore security filter, so that users see only those documents for
which they have permission. If security is evaluated in the Content Server and not in xPlore, facets
are disabled.

API overview
Your search client application can define a facet using the DFC query builder API or DFS search
service. For information on using the DFC query builder API, see Building a query with the DFC
search service, page 259. For information on using the DFS search service, see Building a query with
the DFS search service, page 260. Define custom facet handlers using the xPlore facet handler API,
see Defining a facet handler, page 281. In most cases, the out-of-the-box facet handlers are sufficient.
Facets are computed in the following process. The APIs that perform these operations are described
fully in the following topics. For facets javadocs, see the DFC or DFS javadocs.
1. DFC or DFS search service evaluates the constraints and returns an iterator over the results.
2. Search service reads through the results iterator until the number of results specified in
query-max-result-size has been read (default: 10000).
3. For each result, the search service gets the attribute values and increments the corresponding facet
values. Subpath indexes speed this lookup, because the values are found in the index, not in
the xDB pages.
4. The search service performs the following on the list of all facet values:
a. Orders the facet values.
b. Keeps only the top facet values according to setMax (DFC) or setMaxFacetValues (DFS). Default: 10.
c. Returns the facet values and top results.
Configuring facets in xPlore


Facets are configured in indexserverconfig.xml. Your DFC-based application must also define the facet
using query builder APIs. See Building a query with the DFC search service, page 259.


Preconfigured facets in xPlore


The following facets are configured in indexserverconfig.xml. Do not add them to the application.
(They are configured in a sub-path element whose returning-contents attribute is set to true.)
a_application_type (Documentum application)
a_content_type (file format)
a_gov_room_id (room ID)
acl_domain (permission set owner)
acl_name (permission set)
city (Content Intelligence Services attribute for CenterStage)
company (Content Intelligence Services attribute)
continent (Content Intelligence Services attribute for CenterStage)
country (Content Intelligence Services attribute for CenterStage)
keywords (user-defined list of keywords)
location (CenterStage property)
owner_name (document owner)
person (Content Intelligence Services attribute)
r_full_content_size (document size)
r_modifier (last person who modified the document)
r_modify_date (last modification date)
r_object_type (object type)
state (Content Intelligence Services attribute for CenterStage)

Configuring your own facets


For each attribute that is used as a facet, configure a subpath in indexserverconfig.xml.
1. Edit indexserverconfig.xml as described in Modifying indexserverconfig.xml, page 43.
2. Set the returning-contents attribute to true to compute facets on this attribute.
3. To configure the attribute datatype, set the type attribute to: string, integer, boolean,
datetime, or double.
4. Set the compress attribute to true if the attribute can have a limited number of distinct values. If the
number of distinct values can be unlimited or very large, set compress to false.
In the following example, r_modifier is available for use as a facet:
<path-value-index path=...>
...
<sub-path returning-contents="true" compress="true" value-index="true"
tokenized="true" position-index="false" type="string"
path="dmftmetadata//r_modifier"/>
</path-value-index>

For more information on subpath configuration, see Subpaths, page 141.



Creating a DFC facet definition


The class DfFacetDefinition in the DFC package com.documentum.fc.client.search represents a facet.
You can set the following optional parameters using class methods. The valid values for these methods
are dependent on the datatype of the facet (underlying attribute). See Facet datatypes, page 276.
setGroupBy(String): Group by strategy, depending on data type.
setMax(int): Maximum number of facet values for the facet. Default: 10. A value of -1 specifies
unlimited number of facet values.
setName(String): Name of facet
setProperty(String): Used to set properties like timezone (for date type), range (for numeric type),
and caseInsensitive (for string type).
The IDfQueryBuilder API setMaxResultsForFacets() limits the overall number of query results that
are used to compute facets. The property query-facet-max-result-size in indexserverconfig.xml is
equivalent to the API setMaxResultsForFacets. This setting controls the maximum number of results
that are used to compute all facets in a single query.
The facet definition setMax API controls the size of the output in facets computation. For example,
if a user has an average of 5 documents, a setMaxResultsForFacets value of 50 could return the
top 10 users.
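For example, the following hedged sketch caps facet computation at 50 results and returns at most 10 values for an r_modifier facet, using a query builder instance as shown in Sample DFC facet definition and retrieval:
queryBuilder.setMaxResultsForFacets(50); //results used to compute all facets
DfFacetDefinition modifierFacet = new DfFacetDefinition("r_modifier");
modifierFacet.setMax(10); //return at most 10 facet values
queryBuilder.addFacetDefinition(modifierFacet);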

Facet datatypes
Each facet datatype requires a different grouping strategy. You can set the following parameters for
each datatype in the specified DfFacetDefinition method (DFC) or FacetDefinition object (DFS). The
main facet datatypes are supported: string, date, and numeric.

String facet datatype


A string type facet accepts the following parameters:
Set the maximum number of facet values. Default: 10. A value of -1 specifies unlimited results.
For example, a facet for r_modifier with a maximum of two returns only two values. Results of
documents modified by the first two modifiers are returned, but results for additional modifiers are
not returned. DFC: setMax(Integer max). DFS: setMaxFacetValues(int maxFacetValues)
Set the sort order. DFC: setOrderBy(ORDER orderby) DFS: FacetSort object. Values of the
ORDER enum (DFC) and FacetSort field: FREQUENCY (default)| VALUE_ASCENDING |
VALUE_DESCENDING | NONE
Set grouping strategy. You can specify the following optional properties using
DfFacetDefinition.setProperty:
Case sensitivity: By default, string facets are case sensitive. For example, if a result set has a
document with the value EMC and another with emc, two facets are returned, EMC and emc. Set
facet computation to ignore case by setting the optional property caseInsensitive to true.
Count null values as a facet: Set nullValueFacet to true.

keepDuplicateValues: Documents with repeating value attributes can have duplicate attribute
values. By default, duplicate entries are removed.
alpharange: Group by range. Set a property range that specifies ranges, for example: a:m,n:r,s:z.
Specify range using ASCII characters. Uses unicode order, not language-dependent order. For
example:
myFacetDefinition.setProperty("range", "a:m,n:r,s:z");

Facets are returned as IDfFacetValue. Following is an example of the XML representation of returned
facet string values:
<facet name="r_modifier">
<elem count="5" value="user2"/>
<elem count="3" value="user1"/>
</facet>

Date facet datatype


A date type facet accepts the following parameters:
Set the grouping strategy. When no value for max is specified, the most recent facets are returned
first. DFC and DFS: setGroupBy(String groupBy). Valid values are: day, week, month, quarter,
year, relativeDate (Microsoft Outlook style). You can set the following grouping options:
skipEmptyValues: Call setProperty(boolean skipEmptyValues) for the facet definition. This
optional property has a default value of false: The query returns all facet values between the
oldest non-zero facet value and the most recent non-zero value (including zero values). If true,
only the facet values with a count greater than zero are returned.
Set the time zone: Valid values for client time zone are expressed in UTC (relative to GMT), for
example, GMT+10. To set the time zone in DFC, call setProperty(String timezone) for the facet
definition. To set the time zone in DFS, call setProperties(PropertySet set) for a FacetDefinition
with a Property having a UTC value.
Set the sort order. A relativeDate facet must set the order by parameter to NONE. Values of the
ORDER enum (DFC) and FacetSort field: FREQUENCY (default) | VALUE_ASCENDING |
VALUE_DESCENDING | NONE. DFC: setOrderBy(ORDER orderby). DFS: FacetSort object.
Set the maximum number of facet values. Default: 10. A value of -1 specifies unlimited results. For
example, a facet for r_modification_date with a maximum of two returns only two values. Results of
documents modified for the first two modification dates are returned, but results for additional dates
are not returned. DFC: setMax(Integer max). DFS: setMaxFacetValues(Integer maxFacetValues)
Following is an example of the XML representation of returned facet date values:
<facet name="r_modify_date">
<elem count="5" value="2000-05-04T00:00:00">
<prop name="lowerbound">2000-05-04T00:00:00</prop>
<prop name="upperbound">2000-05-05T00:00:00</prop>
</elem>
<elem count="3" value="2000-05-03T00:00:00">
<prop name="lowerbound">2000-05-03T00:00:00</prop>
<prop name="upperbound">2000-05-04T00:00:00</prop>
</elem>
</facet>
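A date facet definition that produces values like these might look as follows. This is a hedged sketch that uses only DFC methods shown in this chapter; the grouping and ordering choices are illustrative.
DfFacetDefinition dateFacet = new DfFacetDefinition("r_modify_date");
dateFacet.setGroupBy("day"); //day, week, month, quarter, year, relativeDate
dateFacet.setOrderBy(DfFacetDefinition.ORDER.VALUE_DESCENDING);
dateFacet.setMax(-1); //unlimited facet values
queryBuilder.addFacetDefinition(dateFacet);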

Numeric facet datatype


A numeric type facet accepts the following parameters:
Set the maximum number of facet values. Default: 10. A value of -1 specifies unlimited results.
For example, a facet in DFC for r_content_size with setMax(2) has only two values. DFC:
setMax(Integer max). DFS: setMaxFacetValues(int maxFacetValues)
Set the sort order. A range must set the order by enum to NONE. Values of the ORDER enum (DFC)
and FacetSort field: FREQUENCY (default) | VALUE_ASCENDING | VALUE_DESCENDING | NONE.
DFC: setOrderBy(ORDER orderby). DFS: FacetSort object.
Group results in range order. To define the range, call setProperty(String range) for the facet
definition (DFC) or call setProperties(PropertySet set) for a FacetDefinition with a Property (DFS).
Valid values are a comma-separated list of ranges. Separate upper and lower bounds by a colon.
The lower bound is inclusive, the upper bound exclusive. Ranges cannot overlap. A range can
be unbounded, for example: 0:9,10:100,100:1000,1000:10000,10000: . Default: none (treated
as string).
DFC and DFS: setGroupBy(String groupBy). In the following example, a range property is
defined and provided to setGroupBy. The definition in indexserverconfig.xml indicates that this
type is integer.
DfFacetDefinition defAwardNumber = new DfFacetDefinition("dmftmetadata//award_no");
defAwardNumber.setOrderBy(DfFacetDefinition.ORDER.NONE);
defAwardNumber.setProperty("range", "1000:1009, 1010:1019, 1020:1029");
defAwardNumber.setGroupBy("range");
queryBuilder.addFacetDefinition(defAwardNumber);

Following is an example of the XML representation of returned facet numeric values for
range=0:10,10:100,100:
<facet name="r_full_content_size">
<elem count="5" value="0:10">
<prop name="lowerbound">0</prop>
<prop name="upperbound">10</prop>
</elem>
<elem count="3" value="10:100">
<prop name="lowerbound">10</prop>
<prop name="upperbound">100</prop>
</elem>
<elem count="0" value="100:">
<prop name="lowerbound">100</prop>
</elem>
</facet>

Creating a DFS facet definition


You can use the DFS data model to create facets in a structured query. The following topics describe
facet objects and their place in a structured query.

FacetValue
A FacetValue object groups results that have attribute values in common. The FacetValue has a label
and count for number of results in the group. For example, a facet on the attribute r_modifier could
have these values, with count in parentheses:
Tom Terrific (3)
Mighty Mouse (5)
A FacetValue object can also contain a list of subfacet values and a set of custom properties. For
example, a facet on the date attribute r_modify_date has a value of a month (November). The facet
has subfacet values of weeks in the specific month (Week from 11/01 to 11/08). xPlore computes
the facet, subfacet, and custom property values.

FacetDefinition
A FacetDefinition object contains the information used by xPlore to build facet values. The facet name
is required. If no attributes are specified, the name is used as the attribute. Facet definitions must be
specified when the query is first executed. A facet definition can hold a subfacet definition.
FacetSort is an enumeration that specifies the sort order for facet values. It is a field of the
FacetDefinition object. The possible sort orders include the following: FREQUENCY (default) |
VALUE_ASCENDING | VALUE_DESCENDING | NONE. A date facet must set the sort order
to NONE.

Adding facets to a structured query


Two fields in a StructuredQuery object relate to facets:
List of facet definitions, returned by getFacetDefinitions and set by setFacetDefinition.
Number of query results used by xPlore to compute the facets in a query, returned by
getMaxResultsForFacets and set by setMaxResultsForFacets.
See EMC Documentum Enterprise Content Services for more information on Query, StructuredQuery,
QueryExecution, and SearchProfile.

Facet results
A Facet object holds a list of facet values that xPlore builds.
A QueryFacet object contains a list of facets that have been computed for a query as well as the query
ID and QueryStatus. This object is like a QueryResult object. A call to getFacets returns a QueryFacet.
The getFacets method of the SearchService object calculates facets on the entire set of query results
for a specified Query. The method has the following signature:
public QueryFacet getFacets(
Query query, QueryExecution execution, OperationOptions options)
throws SearchServiceException

This method executes synchronously by default. The OperationOptions object contains an optional
SearchProfile object that specifies whether the call is blocking. For a query on several repositories that
support facets, the client application can retrieve facets asynchronously by specifying a SearchProfile

object as the OperationOptions parameter. Refer to EMC Documentum Enterprise Content Services for
more information on Query, StructuredQuery, QueryExecution, and SearchProfile.
You can call this method after a call to execute, using the same Query and queryId. Paging information
in QueryExecution has no impact on the facets calculation.

Paging facet results


Results can be retrieved from a specified index, page by page, as specified in the QueryExecution
object: QueryExecution(startingIndex, maxResultCount, maxResultsPerSource). Paging of results is a
new feature that FAST indexing did not support. Also, FAST returned only 350 results (configurable),
whereas xPlore paging can support higher numbers of results.
The following results are obtained from a result set of 5000 with the specified paging parameters:
(0, 25,150): Gets the first page. Search retrieves 150 results but only the first 25 are returned.
Results from 0 to 150 are cached. QueryStatus returns a hitCount of 5000.
(25, 25, 150): Gets the next page. If the next page is no longer in the cache, search is launched to
retrieve results from 25 to 50. Results from 25 to 50 are cached.
(150, 25, 150): Gets a page. Search is launched to retrieve results from 150 to 175. Results from
150 to 300 are cached.
(150, 151, 150): Gets a page. Results from 150 to 300 are returned. One result is missing because
page size is greater than maxResultsPerSource. Results from 150 to 300 are cached.
Example
// Get the SearchService
ISearchService service = m_facade.getService(ISearchService.class, null,
m_moduleName);
// Create the query
StructuredQuery query = new StructuredQuery();
query.addRepository("your_docbase");
query.setObjectType("dm_sysobject");
ExpressionSet set = new ExpressionSet();
set.addExpression(new FullTextExpression("your_query_term"));
query.setRootExpressionSet(set);
// Add a facet definition to the query: a facet on r_modify_date
// attribute.
FacetDefinition facetDefinition = new FacetDefinition("date");
facetDefinition.addAttribute("r_modify_date");
// request all facets
facetDefinition.setMaxFacetValues(-1);
// group results by month
facetDefinition.setGroupBy("month");
// set sort order
facetDefinition.setFacetSort(FacetSort.VALUE_ASCENDING);
query.addFacetDefinition(facetDefinition);
// exec options: we don't want to retrieve results, we just want the
// facets.
QueryExecution queryExecution = new QueryExecution(0, 0);


// Call getFacets method.


QueryFacet queryFacet = service.getFacets(query, queryExecution,
new OperationOptions());
// Can check query status: should be SUCCESS
QueryStatus status = queryFacet.getQueryStatus();
System.out.println(status.getRepositoryStatusInfos().get(0).getStatus());

// Display facet values


List<Facet> facets = queryFacet.getFacets();
for (Facet facet : facets)
{
for (FacetValue facetValue : facet.getValues())
{
System.out.println(facetValue.getValue() + "/" +
facetValue.getCount());
}
}

Defining a facet handler


Some facet handlers are available out-of-the-box in xPlore as part of the grouping strategies for
facets, such as:
Date ranges
Numeric ranges
Alphanumeric ranges
Facet datatypes, page 276 describes the available grouping strategies for the main facet datatypes. If a
custom facet belongs to one of these datatypes, you can use the out-of-the-box facet handlers. If the
out-of-the-box facet handlers do not meet your requirements, you can implement your own facet handler.

Creating a custom facet handler


Implement the interface IFacetFactory in the package
com.emc.documentum.core.fulltext.indexserver.services.facets.handler to define a custom facet
handler.
1. Create a custom class that implements IFacetFactory.
2. To compile the classes, add dsearch-server.jar to your classpath. It is located in
xplore_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/lib.
3. Package the custom handler classes as a jar.
4. Copy the custom handler classes jar into the xPlore war folder.
5. Edit indexserverconfig.xml as described in Modifying indexserverconfig.xml, page 43.
a. Reference the class that implements IFacetFactory:

<search-config>
...


<facet-handlers>
<facet-handler class-name="my.package.MyFacetFactory1"/>
<facet-handler class-name="my.package.MyFacetFactory2"/>
</facet-handlers>
</search-config>

b. If not already set, modify the subpath configuration for the facet as described in Configuring
your own facets, page 275. Reindexing is only required if you modify the subpath.
6. To use it in your application, reference the custom handler in the grouping strategy (GroupBy
parameter) of the facet.

Sample DFC facet definition and retrieval


Sample facet definition
The DFC query builder interface IDfQueryBuilder creates facets with the addFacet(FacetDefinition)
method. In the following example, a facet is created for the attribute r_modifier.
First, set up session variables, and substitute values appropriate for your repository and instance owner:
private static final String DOCBASE = "DSS";
private static final String USER = "dmadmin";
private static final String PASSWORD = "dmadmin";
private static final long SEARCH_TIMEOUT = 180000;

Get a session and instantiate the search service and query builder:
IDfClient client = DfClient.getLocalClient();
IDfSessionManager m_sessionManager = client.newSessionManager();
DfLoginInfo identity = new DfLoginInfo(USER, PASSWORD);
m_sessionManager.setIdentity(DOCBASE, identity);
IDfSearchService m_searchService = client.newSearchService(
m_sessionManager, DOCBASE);
IDfQueryManager queryManager = m_searchService.newQueryMgr();
IDfQueryBuilder queryBuilder = queryManager.newQueryBuilder("dm_sysobject");

Next, add the selected source and desired results:


queryBuilder.addSelectedSource(DOCBASE);
queryBuilder.addResultAttribute("r_object_id");
queryBuilder.addResultAttribute("object_name");

Start building the root expression set by adding attribute expressions:
IDfExpressionSet exprSet = queryBuilder.getRootExpressionSet();
final String DATE_FORMAT = "yyyy-MM-ddTHH:mm:ss";
queryBuilder.setDateFormat(DATE_FORMAT);
exprSet.addSimpleAttrExpression("r_modify_date", IDfAttr.DM_TIME,
    IDfSimpleAttrExpression.SEARCH_OP_GREATER_EQUAL, false, false,
    "1980-01-01T00:00:00");
exprSet.addSimpleAttrExpression("r_modify_date", IDfAttr.DM_TIME,
    IDfSimpleAttrExpression.SEARCH_OP_LESS_EQUAL, false, false,
    "2010-01-01T00:00:00");

The previous code builds a query without facets. Now for the facets definition that defines a facet for
person who last modified the document:
DfFacetDefinition definitionModifier = new DfFacetDefinition("r_modifier");
queryBuilder.addFacetDefinition(definitionModifier);

Another facet definition adds the last modification date and sets some type-specific options for the date:
DfFacetDefinition definitionDate = new DfFacetDefinition("r_modify_date");
definitionDate.setMax(-1);
definitionDate.setGroupBy("year");
queryBuilder.addFacetDefinition(definitionDate);

A subpath definition in indexserverconfig.xml defines a facet for keywords as follows:


<sub-path ...path="dmftmetadata//keywords"/>

Keywords facet:
DfFacetDefinition definitionKeywords = new DfFacetDefinition("keywords");
queryBuilder.addFacetDefinition(definitionKeywords);

To submit the query and process the results, instantiate IDfQueryProcessor, which is described in the
following topic.

Getting facet values from IDfQueryProcessor

The IDfQueryProcessor method getFacets() provides facet results.


Note: When several repositories return facets, the facets are merged. The merged results
conform to the facet definition of maximum number of results and sort order. If you call
IDfQueryProcessor.getFacets() before all sources finish query execution, the results can differ from
final results.
First, instantiate the processor and launch the search:
IDfQueryProcessor processor = m_searchService.newQueryProcessor(
queryBuilder, true);
try
{
processor.blockingSearch(SEARCH_TIMEOUT);
} catch (Exception e)
{
e.printStackTrace();
}

For debugging, you can check the query status:



System.out.println("processor.getQueryStatus() = " + processor.getQueryStatus());


System.out.println("processor.getQueryStatus().getHistory() = " +
processor.getQueryStatus().getHistory());

Get the non-facets results by calling getResults:


IDfResultsSet results = processor.getResults();
for (int i = 0; i < results.size(); i++)
{
IDfResultEntry result = results.getResultAt(i);
System.out.println(result.getId("r_object_id") + " = " + (result.hasAttr(
    "object_name") ? result.getString("object_name") : "no title"));
}

Get the facets results by calling getFacets:


List <IDfFacet> facets = processor.getFacets();
for (IDfFacet facet : facets)
{
System.out.println("--- Facet: " + facet.getDefinition().getName());
List<IDfFacetValue> values = facet.getValues();
for (IDfFacetValue value : values)
{
System.out.println("value = " + value);
}
}

Tuning facets
Limiting the number of facets to save index space and
computation time
Every facet requires a special index, and every query that contains facets requires computation time
for the facet. As the number of facets increases, the disk space required for the index increases. Disk
space depends on how frequently the facet attributes are found in indexed documents. As the number
of facets in an individual query increases, the computation time increases, depending on whether
the indexes are spread out on disk.

Limiting the number of results used to compute a facet


You can limit the number of results that are used to compute an individual facet. This setting varies the
specificity of a facet. The setting depends on how many possible values a facet attribute can have. For
example, for the Documentum attribute r_modifier, you can have 10,000 users but wish to return only
the top 10. If each user has an average of 5 documents, a setMaxResultsForFacets value of 50 could
return the top 10 users. The computation will stop after 50 results are obtained. The 50 documents can
belong to only five users, or they can belong to one user who contributes many documents. Set this
property in the client application that issues the query.


Logging facets
To turn on logging for facets, use xPlore administrator and open the dsearch-search family. Set
com.emc.documentum.core.fulltext.indexserver.services.facets to DEBUG:
Output is like the following:
<event timestamp="2009-08-05 14:37:18,953" level="DEBUG" thread="pool-3-thread-10"
logger="com.emc.documentum.core.fulltext.indexserver.services.facets.
impl.CompositeFacetsProcessor" timeInMilliSecs="1249475838953">
<message ><![CDATA[Begin facet computation]]></message>
</event>
<event timestamp="2009-08-05 14:37:18,953" level="DEBUG" thread="pool-3-thread-10...
<event timestamp="2009-08-05 14:37:18,953" level="DEBUG" thread="pool-3-thread-10"
logger="com.emc.documentum.core.fulltext.indexserver.services.facets.
impl.CompositeFacetsProcessor" timeInMilliSecs="1249475838953">
<message ><![CDATA[Facets computed using 13 results.]]></message>
</event>
<event timestamp="2009-08-05 14:37:18,953" level="DEBUG" thread="pool-3-thread-10"
logger="com.emc.documentum.core.fulltext.indexserver.services.facets.
impl.CompositeFacetsProcessor"
timeInMilliSecs="1249475838953">
<message ><![CDATA[Facet handler string(r_modifier) returned 11 values.]]>
</message>
</event>
<event timestamp="2009-08-05 14:37:18,953" level="DEBUG" thread="pool-3-thread-10"
logger="com.emc.documentum.core.fulltext.indexserver.services.facets.
impl.CompositeFacetsProcessor"
timeInMilliSecs="1249475838953">
<message ><![CDATA[Facet handler string(r_modify_date) returned 4 values.]]>
</message>
</event>
<event timestamp="2009-08-05 14:37:18,953" level="DEBUG" thread="pool-3-thread-10"
logger="com.emc.documentum.core.fulltext.indexserver.services.facets.
impl.CompositeFacetsProcessor"
timeInMilliSecs="1249475838953">
<message ><![CDATA[Sort facets]]></message>
</event>
<event timestamp="2009-08-05 14:37:18,953" level="DEBUG" thread="pool-3-thread-10"
logger="com.emc.documentum.core.fulltext.indexserver.services.facets.
impl.CompositeFacetsProcessor"
timeInMilliSecs="1249475838953">
<message ><![CDATA[End facet computation]]></message>
</event>


Troubleshooting facets
A query returns no facets
Check the security mode of the repository. Use the following IAPI commands (a value of 1 means
security is evaluated in xPlore; 0 means it is evaluated in the Content Server):
API> retrieve,c,dm_ftengine_config
...
0800007580000916
API> get,c,l,ftsearch_security_mode
...
0

If the command returns a 0, as in the example, set the security mode to evaluation in xPlore, not the
Content Server. Use the following IAPI commands:
retrieve,c,dm_ftengine_config
set,c,l,ftsearch_security_mode
1
save,c,l
reinit,c


Chapter 12
Using reports
This chapter contains the following topics:

About reports

Types of reports

Document processing (CPS) reports

Indexing reports

Search reports

Editing a report

Report syntax

Sample edited report

Troubleshooting reports

About reports
Reports provide indexing and query statistics, and they are also a troubleshooting tool. See the
troubleshooting sections for CPS, indexing, and search for ways to use the reports in troubleshooting.
Statistics on content processing and indexing are stored in the audit database. Use xPlore administrator
to query these statistics. Auditing supplies information to reports on administrative tasks or queries
(enabled by default). For information on enabling and configuring auditing, see Auditing collection
operations, page 167.
To run reports, choose Diagnostic and Utilities and then click Reports. To generate Documentum
reports that compare a repository to the index, see Using ftintegrity, page 73.

Types of reports
The following types of reports are available in xPlore administrator.


Table 35. List of reports

Audit records for search component: If search auditing is enabled in System > Global configuration, you can view and create audit reports. Filter for query type: interactive, subscription, warmup, test search, report, metrics, ftintegrity, consistency checker, or all.

Audit records for admin component: If admin auditing is enabled in System > Global configuration, you can view and create audit reports.

Audit records for final merge: Displays the following detailed information about final merges: domain, collection, instance, type, trigger time, start time, finish time, wait time, process time, total time, and status.

Audit records for warmup component: If search auditing is enabled in System > Global configuration, you can view and create audit reports on index and query warmup.

Average time of an object in each indexing stage: Reports the average bytes and time for the following processing stages: CPS Queue, CPS Executor, CPS processing, Index Queue, Index Executor, Index processing, StatusDB Queue, StatusDB Executor, StatusDB Update. This value indicates how long a document sits in the statusDB update queue after the document has completed indexing. The timing is affected by the following two CPS configuration parameters: status-requests-batch-size and status-thread-wait-time.

Content too large to index: Displays format, count, average size, maximum size, and minimum size. Summarized by domain (Documentum repository).

Document processing error summary: Use first to determine the most common problems. Displays error code, count, and error text.

Document processing error detail: Drill down for error codes. Report for each code displays the request ID, domain, date and time, format, and error text.

Documents ingested per month: Per month: totals for current year, including document count, bytes ingested, average processing latency, and CPS error count. Per day: totals for current month. Per hour: hourly totals for current day.

Get query text: Click the query ID from the report Top N slowest queries to get the XQuery expression.

QBS activity report by ID: Find the subscribed queries that take the longest processing time or are run the most frequently.

QBS activity report by user: Find users whose subscribed queries take the longest processing time.

Query counts by user: For each user, displays domain, number of queries, average response time, maximum and minimum response times, and last result count (sortable columns). Filter for query type: interactive, subscription, warmup, test search, report, metrics, ftintegrity, consistency checker, or all. Number of users sets last N users to display. To get slowest queries for a user, run the report Top N slowest queries.

Top query terms: Displays most common query terms, including number of queries and average number of hits.

Top N slowest queries: Displays the slowest queries. Select Number of results to display. Optionally, specify a user to get slowest queries for that user. Specify the date and time range. Sort by time to first result, processing time, number of results fetched, number of hits, number filtered out by security, or most recent queries. Filter for query type: interactive, subscription, warmup, test search, report, metrics (query of metrics database for reports or indexing statistics), ftintegrity, consistency checker, or all. Total search hits: count before security filtering. Number of results fetched: count after filtering.

User activity: Displays query and ingestion activity and errors for the specified user and specified period of time. Data can be exported to Microsoft Excel. Query links display the XQuery.


Document processing (CPS) reports


Run the Document processing error summary report to find the count for each type of problem. The
error count for each type is listed in descending order. The following types of processing errors are
reported: request and fetch timeout, invalid path, fetching errors, password protection or encryption,
file damage, unsupported format, language and parts of speech detection, or document size.
View detailed reports for each type of processing error. For example, the Document processing error
detail report for Error code 770 (File corrupt) displays object ID, domain, date, time, format, and error
text. You can then locate the document in xPlore administrator by navigating to the domain and
filtering the default collection for the object ID. Using the object ID, you can view the metadata in
Content Server to determine the document owner or other relevant properties.
Run the report Content too large to index to see how many documents are being rejected for size. If
your indexing throughput is acceptable, you can increase the size of documents being indexed. For
more information about indexing performance, see Indexing performance, page 324.
Run the report User activity to see ingestion activity and error messages for ingestion by a specific
user and time period.

Indexing reports
To view indexing rate, run the report Documents ingested per month/day/hour. The report shows
Average processing latency. The monthly report covers the current 12 months. The daily report covers
the current month. The hourly report covers the current day. From the hourly report, you can determine
your period of highest usage. You can divide the bytes processed by the document count to find the
average size of content ingested. For example, 2,822,469 bytes for 909 documents yields an average
size of about 3105 bytes. This size does not include non-indexable content.

Search reports
Enable auditing in xPlore administrator to view query reports (enabled by default).

Top N slowest queries


Find the slowest queries by selecting Top N slowest queries. To determine how many queries are
unselective, sort by Number of results fetched. Results are limited by default in Webtop to 350.
Sort Top N slowest queries by Number of hits denied access by security filter to see how many
underprivileged users are experiencing slow queries due to security filtering. For information on
changing the security cache, see Configuring the security cache, page 54.

Get query text


To examine a slow or failed query by a user, get the query ID from Top N slowest queries and then
enter the query ID into Get query text. Examine the query text for possible problems. The following
example shows a query with a slow response time. The user searched in Webtop for the string "xplore" (line
breaks added here):
declare option xhive:fts-analyzer-class
"com.emc.documentum.core.fulltext.indexserver.core.index.xhive.IndexServerAnalyzer";
for $i score $s in collection(
'/DSS_LH1/dsearch/Data') /dmftdoc[( ( ( (dmftmetadata//a_is_hidden = 'false') ) )
and ( (dmftinternal/i_all_types = '030a0d6880000105') )
and ( (dmftversions/iscurrent = 'true') ) )
and ( (. ftcontains ( (('xplore') with stemming) ) )) ]
order by $s descending return
<dmrow>{if ($i/dmftinternal/r_object_id) then $i/dmftinternal/r_object_id
else
<r_object_id/>}{if ($i/dmftsecurity/ispublic) then $i/dmftsecurity/ispublic
else <ispublic/>}{if ($i/dmftinternal/r_object_type) then
$i/dmftinternal/r_object_type
else <r_object_type/>}{if ($i/dmftmetadata/*/owner_name)
then $i/dmftmetadata/*/owner_name
else <owner_name/>}{if ($i/dmftvstamp/i_vstamp) then $i/dmftvstamp/i_vstamp
else <i_vstamp/>}{if ($i/dmftsecurity/acl_name) then $i/dmftsecurity/acl_name
else <acl_name/>}{if ($i/dmftsecurity/acl_domain) then $i/dmftsecurity/acl_domain
else <acl_domain/>}<score dmfttype="dmdouble">{$s}</score>{xhive:highlight(
$i/dmftcontents/dmftcontent/dmftcontentref)}</dmrow>

Use the xDB admin tool to debug the query. For instructions on using xhadmin, see Debugging
queries, page 259.

Query counts by user


Use Query counts by user to determine which users are experiencing the slowest query response times
or to see queries by a specific user. You can filter by date and domain.

User activity
Use User activity to display queries by the specified user for the specified time. Data can be exported
to Microsoft Excel. Click a query link to see the xQuery.
Note: This report can take a very long time to run. If you enter a short date range or a user name,
the report runs much faster.

Editing a report
You can edit any of the xPlore reports. Select a report in xPlore administrator and click Save as.
Specify a unique file name and title for the report. Alternatively, you can write a new copy of the report
and save it to xplore_home/jboss5.1.0/server/primary_instance/deploy/dsearchadmin.war/reports.
To see the new report in xPlore administrator, click somewhere else in xPlore administrator and
then click Reports.

Reports are based on the W3C XForms standard. For a guide to the syntax in a typical report, see
Report syntax, page 292.

Accessing the audit record


The audit record is stored in the xDB database for the xPlore federation. You can filter the audit
record by date using xPlore administrator. To view the entire audit record, drill down to the AuditDB
collection in Data Management > SystemData. Click AuditDB and then click auditRecords.xml.
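
For example, the following ad hoc XQuery sketch pulls one day of search events from the audit record.
The event structure (@component, @name, START_TIME, USER_NAME, TOTAL_HITS) follows the report queries
shown later in this chapter; the date literals are placeholders and must match the format stored in
START_TIME:
for $e in collection('/SystemData/AuditDB')//event[@component = "search"
    and @name = "QUERY"
    and START_TIME[. >= "2012-03-28T00:00:00" and . <= "2012-03-28T23:59:59"]]
return <row>{$e/USER_NAME}{$e/START_TIME}{$e/TOTAL_HITS}</row>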

Adding a variable
Reports require certain variables. The XForms processor substitutes the input value for the variable in
the query.
1. Declare the variable.
2. Reference it within the body of the query.
3. Define the UI control and bind it to the data.
These steps are highlighted in the syntax description, Report syntax, page 292.

Report syntax
xPlore reports conform to the W3C XForms specification. The original report XForms are located in
xplore_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearchadmin.war/reports. You
can edit a report in xPlore administrator and save it with a new name. Alternatively, you can copy the
XForms file and edit it in an XML editor of your choice.
These are the key elements that you can change in a report:
Table 36

Report elements

xhtml:head/input
Contains an element for each input field.

xhtml:head/query
Contains the XQuery that returns report results.

xforms:action
Contains xforms:setvalue elements.

xforms:setvalue
Sets a default value for an input field. The ref attribute specifies a path within the current XForms
document to the input field.

xforms:bind
Sets constraints for an input field. The nodeset attribute specifies a path within the current XForms
document to the input field.

xhtml:body
Contains the xhtml markup that is rendered in a browser (the report UI).

The following example highlights the input field startTime in the report Query Counts By
User (rpt_QueryByUser.xml). The full report is line-numbered for reference in the example (some
lines deleted for readability):

1 ...<xforms:model><xforms:instance><ess_report xmlns="">
2 <input>
3 <startTime/><endTime/>...
4 </input>

1 Defines an XForms instance, model (data definition), and ess_report.

2 Model data: Declares the input fields for start and end date and time variables.

1 <query><![CDATA[ ...
2 let $u1 := distinct-values(collection(/SystemData/AuditDB)//
  event[@component = "search"...
3 and START_TIME[ . >= $startTime]...
4 return <report ...>...<rowset>...
5 for $d in distinct-values(collection(/SystemData...
  and START_TIME[ . >= $startTime] and START_TIME[ . <= $endRange]]...
6 return let $k := collection(AuditDB)...and
  START_TIME[ . >= $startTime] and START_TIME[ . <= $endRange]
7 ... return ...
} </rowset></report> ]]></query></ess_report></xforms:instance>

1 Specifies the XQuery for the report. The syntax conforms to the XQuery specification.
xhtml:head/xforms:model/xforms:instance/ess_report/query

2 let: Part of an XQuery FLWOR expression that defines variables.


The first line specifies the collection for the report: let $u1 :=
distinct-values(collection(/SystemData/AuditDB)...

3 References the start time and end time variables and sets criteria for them in the query: as greater
than or equal to the input start time and less than or equal to the input end time:
and START_TIME[ . >= $startTime]
and START_TIME[. <= $endRange]]/USER_NAME)

4 return report/rowset: The return is an XQuery FLWOR expression that specifies what is returned
from the query. The transform plain_table.xsl, located in the same directory as the report, processes
the returned XML elements.
5 This expression iterates over the rows returned by the query. This particular expression evaluates
all results, although it could evaluate a subset of results.
6 This expression evaluates various computations such as average, maximum, and minimum
query response times.
7 The response times are returned as row elements (evaluated by the XSL transform).
1 <xforms:action ev:event="xforms-ready">
2 <xforms:setvalue ref="input/startTime" value="seconds-to-dateTime(
seconds-from-dateTime(local-dateTime()) - 24*3600)"/>...
</xforms:action>...
3 <xforms:bind nodeset="input/startTime" constraint="seconds-from-dateTime(

.) <= seconds-from-dateTime(../endTime)"/>
4 <xforms:bind nodeset="input/startTime" type="xsd:dateTime"/>...
</xforms:model>...</xhtml:head>
5 <xhtml:body>...<xhtml:tr class="">
6 <xhtml:td>Start from:</xhtml:td>
<xhtml:td><xforms:group>
7 <xforms:input ref="input/startTime" width="100px" ev:event="DOMActivate">
8 <xforms:message ev:event="xforms-invalid" level="ephemeral">
The "Start from" date should be no later than the "to" date.
</xforms:message>
9 <xforms:action ev:event="xforms-invalid">
<xforms:setvalue ref="../endTime" ev:event="xforms-invalid"
value="../startTime"/><xforms:rebuild/>
</xforms:action>
</xforms:input></xforms:group></xhtml:td></xhtml:tr>...

1. xforms:action completes the XForms model data.
2. Sets the model data using XPath expressions.
3. Binds the form control to the startTime variable and constrains the data that can be entered, using
the XPath constraint seconds-from-dateTime.
4. Binds the form control to the XML schema datatype dateTime.
5. xhtml:body: Defines the UI presentation in xhtml. The body contains elements that conform to
XForms syntax. The browser renders these elements.
6. The first table cell in this row contains the label Start from:
7. xforms:input contains elements that define the UI for this input control. Attributes on this element
define the width and the event that is fired.
8. xforms:message contains the message that is displayed when the entry does not conform to the
constraint.
9. xforms:action ev:event="xforms-invalid" defines the invalid state for the input control. Entries
after the end date are invalid.

Sample edited report


This example edits the Query counts by user report to add a column for number of failed queries.
1. Using an XML editor, open the report rpt_QueryByUser.xml located in
xplore_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearchadmin.war/reports.
2. Save the report with a new file name.
3. Change the report title in metadata/title, CDATA, to Failed Query Counts By User.
4. After the column element whose value is Query Cnt, add the following column:
<column type="integer">Failed Queries</column>

5. This step finds failed queries. Locate the variable definition for successful queries (for $j ...let $k
...) and add your new query. Find the nodes in a QUERY element whose TOTAL_HITS value is
equal to zero to get the failed queries:

let $z := collection(AuditDB)//event[@component = "search" and @name = "QUERY"


and START_TIME[ . >= $startTime and . <= $endRange] and USER_NAME = $j
and TOTAL_HITS = 0]

6. Define a variable for the count of failed queries and add it after the variable for successful query
count (let $queryCnt...):
let $failedCnt := count($z)

7. Return the failed query count cell, after the query count cell (<cell> { $queryCnt } ...):
<cell> { $failedCnt } </cell>

8. Redefine the failed query variable to get a count for all users. Add this line after <rowset...>let $k...:
let $z := collection(AuditDB)//event[@component = "search" and @name = "
QUERY" and START_TIME[ . >= $startTime and . <= $endRange] and USER_NAME
and TOTAL_HITS = 0]

9. Add the total count cell to this second rowset, after <cell> { $queryCnt } </cell>:
<cell> { $failedCnt } </cell>

10. Save and run the report. The result is like the following:


Figure 18

Customized report for query count

If your query has a syntax error, you get a stack trace that identifies the line number of the error. You
can copy the text of your report into an XML editor that displays line numbers, for debugging.
If the query runs slowly, it will time out after about one minute. You can run the same query in
the xDB admin tool.

Troubleshooting reports
If you update Internet Explorer or turn on enforced security, reports no longer contain content. Open
Tools > Internet Options and choose the Security tab. Click Trusted sites and then click Sites. Add
the xPlore administrator URL to the Trusted sites list. Set the security level for the Trusted sites zone
by clicking Custom level. Reset the level to Medium-Low.


Chapter 13
Logging
This chapter contains the following topics:

Configuring logging

CPS logging

Configuring logging
Note: Logging can slow the system and consume disk space. In a production environment, run the
system with minimal logging.
Basic logging can be configured for each service in xPlore administrator. Log levels can be set for
indexing, search, CPS, xDB, and xPlore administrator. You can log individual packages within these
services, for example, the merging activity of xDB. Log levels are saved to indexserverconfig.xml and
are applied to all xPlore instances. xPlore uses slf4j (Simple Logging Façade for Java) to perform
logging.
To set logging for a service, choose System Overview in the left panel. Choose Global Configuration
and then choose the Logging Configuration tab to configure logging for all instances. You can open
one of the logging families like xDB and set levels on individual packages.
To customize the instance-level log setting, edit the logback.xml file in each xPlore instance. The
logback.xml file is located in the WEB-INF/classes directory for each deployed instance war file, for
example, xplore_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war. Levels set
in logback.xml have precedence over log levels in xPlore administrator. Changes to logback.xml
take up to two minutes to take effect.
Each logger logs a package in xPlore or your customer code. The logger has an appender that specifies
the log file name and location. DSEARCH is the default appender. Other defined appenders in the
primary instance logback configuration are XDB, CPS_DAEMON, and CPS.
You can add a logger and appender for a specific package in xPlore or your custom code. The
following example adds a logger and appender for the package com.mycompany.customindexing:
<appender name="CUSTOM"
    class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>C:/xPlore/jboss5.1.0/server/DctmServer_PrimaryDsearch/logs/custom.log
</file>
<encoder>
<pattern>%date %-5level %logger{20} [%thread]
%msg%n</pattern>
<charset>UTF-8</charset>
</encoder>
<rollingPolicy class="ch.qos.logback.core.rolling.
FixedWindowRollingPolicy">
<maxIndex>100</maxIndex>
<fileNamePattern>C:/xPlore/jboss5.1.0/server/DctmServer_
PrimaryDsearch/logs/custom.log.%i</fileNamePattern>
</rollingPolicy>
<triggeringPolicy class="ch.qos.logback.core.rolling.
SizeBasedTriggeringPolicy">
<maxFileSize>10MB</maxFileSize>
</triggeringPolicy>
</appender>

<logger name="com.mycompany.customindexing" additivity="false"
level="INFO">
<appender-ref ref="CUSTOM"/>
</logger>

You can add your custom logger and appender to logback.xml. Add it to a logger family if you want
your log entries to go to one of the logs in xPlore administrator. This step is optional: if you don't
add your custom logger to a logger family, it still logs to the file that you specify in your appender.
Logger families are defined in indexserverconfig.xml. They are used to group logs in xPlore
administrator. You can set the log level for the family, or expand the family to set levels on individual
loggers.
The following log levels are available. Levels are shown in increasing severity and decreasing amounts
of information, so that TRACE displays more than DEBUG, which displays more than INFO.
TRACE
DEBUG
INFO
WARN
ERROR
Troubleshooting the index agent, page 85 provides information about the logging configuration for
the index agent.
Enabling logging in a client application, page 308 provides information about logging for xPlore
client APIs.
Tracing, page 304 indicates how to enable and configure tracing.

Viewing logs
You can view indexing, search, CPS, and xDB logs in xPlore administrator. Choose an instance in
the tree and click Logging. Indexing and search messages are logged to dsearch. Click the tab for
dsearch, cps, cps_daemon, or xdb to view the last part of the log. Click Download All Log Files
to get links for each log file.

Query logging
The xPlore search service logs queries. When you turn on query auditing (default is true), additional
information is saved to the audit record and is available in reports. Auditing queries, page 244 provides
more information about query logging.

For each query, the search service logs the following information for all log levels:
Start of query execution including the query statement
Total results processed
Total query time including query execution and result fetching
More query information is logged when native xPlore security (not Content Server security) is
enabled. When query auditing is enabled, you can filter for the following query types in the search
audit records report: interactive, subscription, warmup, test search, report, metrics, ftintegrity,
consistency checker, or all.
Set the log level in xPlore administrator. Open Services in the tree, expand and select Logging, and
click Configuration. You can set the log level independently for administration, indexing, search,
and default. Levels in decreasing amount of verbosity: TRACE, DEBUG, INFO, WARN (default),
and ERROR.
The log message has the following form:
2012-03-28 11:16:45,798 WARN [IndexWorkerThread-6]
c.e.d.c.f.i.core.index.plugin.XhivePlugin - Document id: 090023a380000202,
message: CPS Warning [Unknown error during text extraction(native code: 961,
native msg: access violation)].

To view a log, choose the instance and click Logging. The following examples from dsearch.log
show a query:
2012-03-28 12:19:02,664 INFO [RMI TCP Connection(9)-10.8.47.144]
c.e.d.c.fulltext.indexserver.search.SearchServer QueryID=PrimaryDsearch$6f35b53d-34b8-470d-b699-5b4364ef0815,
query-locale=en,query-string=let $j:= for $i score $s in /dmftdoc
[. ftcontains ASMAgentServer with stemming]
order by $s descending return <d> {$i/dmftmetadata//r_object_id}
{ $i/dmftmetadata//object_name } { $i/dmftmetadata//r_modifier } </d>
return subsequence($j,1,200) is running
...
2012-03-28 12:19:05,117 INFO [pool-14-thread-10]
c.e.d.c.f.i.admin.mbean.ESSAdminSearchManagement QueryID=PrimaryDsearch$6f35b53d-34b8-470d-b699-5b4364ef0815,
Result count=1,bytes count=187

CPS logging
CPS uses the xPlore slf4j logging framework. A CPS instance that is embedded in an xPlore instance
(installed with xPlore, not separately) uses the logback.xml file in WEB-INF/classes of the dsearch
web application. A standalone CPS instance uses logback.xml in the CPS web application, in the
WEB-INF/classes directory.
If you have installed more than one CPS instance on the same host, each instance has its own web
application and logback.xml file. To avoid one instance log overwriting another, make sure that each
file appender in logback.xml points to a unique file path.
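
For example, a minimal sketch of a CPS file appender in the second instance's logback.xml, modeled on
the appender example in Configuring logging; the instance name DctmServer_CPS2, the paths, and the
rollover values are placeholders for your own installation:
<appender name="CPS" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <!-- Point each CPS instance at its own log file so that one instance does not overwrite another -->
  <file>C:/xPlore/jboss5.1.0/server/DctmServer_CPS2/logs/cps.log</file>
  <encoder>
    <pattern>%date %-5level %logger{20} [%thread] %msg%n</pattern>
    <charset>UTF-8</charset>
  </encoder>
  <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
    <maxIndex>10</maxIndex>
    <fileNamePattern>C:/xPlore/jboss5.1.0/server/DctmServer_CPS2/logs/cps.log.%i</fileNamePattern>
  </rollingPolicy>
  <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
    <maxFileSize>10MB</maxFileSize>
  </triggeringPolicy>
</appender>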


Chapter 14
Setting up a Customization Environment
This chapter contains the following topics:

Setting up the xPlore SDK

Customization points

Adding custom classes

Tracing

Enabling logging in a client application

Handling a NoClassDef exception

Setting up the xPlore SDK


1. Download the xPlore software development kit (SDK) from the EMC Online Support
(https://support.emc.com).
2. Expand the download file to your development host. The SDK contains the following content:
README: Information about the SDK file.
LICENSE: The redistribution license for the Eclipse online help framework used by xPlore.
conf: Sample configuration files. The configuration parameters are described within the file
dsearchclientfull.properties. For information on sample-logback-for-xplore-client.xml, see
Enabling logging in a client application, page 308 and Tracing, page 304.
dist: xPlore distribution APIs and classes needed for customizations.
doc: This guide and javadocs.
lib: Java libraries that your application needs.
samples: Examples of a FAST thesaurus API and a natural language processing UIMA module.
3. Add the lib directory to your project or system classpath.
4. Add the distribution jar files to your build path.

Customization points
You can customize indexing and searching at several points in the xPlore stack. The following
information refers to customizations that are supported in a Documentum environment.
The following diagram shows indexing customization points.

Figure 19

Indexing customization points

1. Using DFC, create a BOF module that pre-filters content before indexing. See Custom content
filters, page 83.
2. Create a TBO that injects data from outside a Documentum repository, either metadata or content.
You can use a similar TBO to join two or more Documentum objects that are related. See Injecting
data and supporting joins, page 80.
3. Create a custom routing class that routes content to a specific collection based on your enterprise
criteria. See Creating a custom routing class, page 151.
The following diagram shows query customization points.


Figure 20

Query customization points

1. Using WDK, modify Webtop search and results UI. See EMC Documentum Search Development
Guide.
2. Using DFS, implement StructuredQuery, which generates an XQuery expression. xPlore processes
the expression directly. See Building a query with the DFS search service, page 260.
3. Using DFC or DFS, create NOFTDQL queries or apply DQL hints (not recommended except for
special cases).
DQL is evaluated in the Content Server. Implement the DFC interface IDfQuery and the DFS query
service. FTDQL queries are passed to xPlore. Queries with the NOFTDQL hint or which do not
conform to FTDQL criteria are not passed to xPlore. See DQL Processing, page 226.

4. Using DFC, modify Webtop queries. Implement the DFC search service, which generates XQuery
expressions. xPlore processes the expression directly. See EMC Documentum Search Development
Guide.
5. Using DFC, create XQueries using IDfXQuery. See Building a DFC XQuery, page 261.
6. Create and customize facets to organize search results in xPlore. See the Facets chapter.
7. Target a specific collection in a query using DFC or DFS APIs. See Routing a query to a specific
collection, page 257.


8. Use xPlore APIs to create an XQuery for an XQuery client. See Building a query using xPlore
APIs, page 263.

Adding custom classes


Custom classes are registered in indexserverconfig.xml. To route documents from all domains using a
custom class, edit this configuration file. Custom routing classes are supported in this version of xPlore.
1. Stop all instances in the xPlore system.
2. Edit indexserverconfig.xml in xplore_home/config using an XML-compliant editor.


3. Add the following element to the root element index-server-configuration, between the elements
system-metrics-service and admin-config, substituting your fully qualified class name.
<customization-config>
<collection-routing class-name="custom_routing_class_name"/>
</customization-config>

4. Place your class in the indexagent.war WEB-INF/classes directory. Your subdirectory path under
WEB-INF/classes must match the fully qualified routing class name.
5. Restart the xPlore instances, starting with the primary instance.

Tracing
xPlore tracing provides configuration settings for various formats of tracing information. You can
trace individual threads or methods. Use the file dsearchclientfull.properties, which is located
in the conf directory of the SDK. The configuration parameters are described within the file
dsearchclientfull.properties.
The xPlore classes are instrumented using AspectJ (tracing aspect). When tracing is enabled and
initialized, the tracing facility uses log4j API to log the tracing information.

Enabling tracing
Enable or disable tracing in xPlore administrator: Expand an instance and choose Tracing. Tracing
does not require a restart. Tracing files named ESSTrace.XXX.log are written to the Java IO temp
directory (where XXX is a timestamp generated by the tracing mechanism).
The tracing facility checks for the existence of a log4j logger and appender in the log4j.properties
file. When a logger and appender are not found, xPlore creates a logger named
com.emc.core.fulltext.utils.trace.IndexServerTrace.
When you enable tracing, a detailed Java method call stack is logged in one file. From that file, you
can identify the methods that are called, with parameters and return values.


Configuring tracing
You can configure the name, location, and format of the log file for the logger and its appender in
indexserverconfig.xml or in the log4j.properties file. The log4j configuration takes precedence. You
can configure tracing for specific classes and methods. A sample log4j.properties file is in the SDK
conf directory. The following example in log4j.properties debugs a specific package:
log4j.logger.com.emc.documentum.core.fulltext.client.common = DEBUG

The following table describes the child elements of <tracing-config> in indexserverconfig.xml.


Table 37

Tracing configuration elements in tracing-config/tracing

tracing enable
Turns tracing on or off. Values: true | false; default: false.

tracing mode
Determines whether the tracing records method entry and exit on separate lines as they occur
(standard) or whether everything (the method arguments and return value) is recorded on a single line
(compact). In compact mode, the trace entries appear in the order of method entrance. Values:
standard | compact; default: standard.

tracing verbosity
Amount of information saved. Values: standard | verbose.

output dir
Directory in which the trace file should be placed. A value should be specified. Values: any valid
directory; default: the current working directory. If the directory is not writable, tracing uses the
directory specified by the system property java.io.tmpdir, which defaults to C:\TEMP (Windows) or
"/tmp" or "/var/tmp" (Linux).

output file-prefix
In standard file creation mode, the tracing infrastructure names log files
<file_prefix>.timestamp.log. Values: any legal file name; default: xPloreTrace.

output max-file-size
Maximum size that the log file can reach before it rolls over. Values: any string that slf4j accepts
for MaxFileSize; default: 100MB.

output max-backup-index
Specifies the number of log files. The oldest log files are deleted first. Values: positive integer;
default: 1.

output file-creation-mode
Specifies whether to create one single tracing file or one file per thread. Values: single-file |
file-per-thread.

print-exception-stack
Specifies whether tracing should print the exception stack when a method call results in an exception.
If false, logs only the name and message of the exception. If true, records the entire stack trace
after the method exit. Values: true | false; default: false.

max-stack-depth
Limits the depth of calls that are traced. Values: integer; default: -1 (unlimited).

tracing-filters/method-name*
Repeating element that specifies methods to trace. When a thread enters a method which matches one of
the filters, tracing is turned on for that thread. All calls made within the context of that method
are traced. Tracing continues for that thread until the method that was matched is exited. Default:
not set; all methods are traced.

tracing-filters/thread-name
Case-sensitive repeating element that filters the trace output to only those threads whose names match
the filter. The filter is a regular expression (see the Javadoc for the class java.util.regex.Pattern
for syntax). For example, Thread-[4-6] matches the threads named Thread-4, Thread-5, or Thread-6.
Values: a regular expression conforming to the regular expression language; default: not set; all
threads are traced.

date-output format
Specifies the date format if timing-style is set to date. Values: a format string conforming to the
syntax supported by the Java class java.text.SimpleDateFormat.

date-output column-width
If a date format is specified, a value for the date column width is required. Values: positive integer.

date-output timing-style
Specifies the units for time recording of call entrance or exit (standard mode) or method entrance
(compact mode). In compact mode, the second column displays the duration of the method call. Values:
nanoseconds | milliseconds | milliseconds_from_start | seconds | date; default: milliseconds.

* The method-name filter identifies any combinations of packages, classes, and methods to trace. The
property value is one or more string expressions that identify what is traced. Syntax with asterisk as
wild card:
([qualified_classname_segment][*]|*).[.[method_name_segment][*]0]
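
The following sketch shows what a tracing-config element might look like when the elements in the
preceding table are combined. The exact nesting is inferred from the table and from the
tracing-config/tracing paths used below, and the filter values are only illustrative, so verify the
structure against the tracing-config section of your own indexserverconfig.xml before editing:
<tracing-config>
  <tracing enable="true" mode="compact" verbosity="standard">
    <output dir="C:/xPlore/trace" file-prefix="xPloreTrace"
        max-file-size="100MB" max-backup-index="5"
        file-creation-mode="single-file"/>
    <tracing-filters>
      <!-- Trace only calls made within matching methods and threads; patterns are examples only -->
      <method-name>com.mycompany.customindexing.*.*</method-name>
      <thread-name>IndexWorkerThread-[0-9]+</thread-name>
    </tracing-filters>
  </tracing>
</tracing-config>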

Trace log format


The format of each trace item entry is the following, with line breaks inserted for readability:
time-stamp [method-duration] [thread-name] [entry_exit_designation]
stack-depth-indicator qualified-class-name@object-identity-hash-code.
method(method-arguments)
[==>return-value|exception]

Key:
[method-duration] Appears only if tracing-config/tracing[@mode="compact"].
[entry_exit_designation] One of the following:

ENTER: The entry represents a method entry.
EXIT: The entry represents a method exit.
!EXC!: The entry represents a call that results in an exception.
[==>return-value|exception] Recorded if the mode is compact or the entry records a method exit.
If the entry records an exception thrown by a method, the return value is the exception name and
message, if any.
The trace data are aligned vertically, and spaces separate the fields. Because trace output is vertically
aligned, the trace data can be sorted by tools like awk or Microsoft Excel.
Output for tracing-config/tracing[@enable="true"]
1218670851046 [http-8080-1] [ENTER]
..com.emc.documentum.core.fulltext.indexserver.core.Node$ContainerInfo@12bf560.
<init>("http://myhost:8080",true)
1218670851046 [http-8080-1] [EXIT]
..com.emc.documentum.core.fulltext.indexserver.core.Node$ContainerInfo@12bf560.
<init> ==> <void>

Reading trace output


To troubleshoot problems, inspect the trace file ESSTrace.XXX.log in the Java IO temp
directory (where XXX is a number generated by the tracing mechanism). For example, the file
ESSTrace.1319574712514.log is found on Windows 2008 at C:\TEMP.
The following sample from a trace file shows the XML generated by xPlore for a test index document.
The final line shows the repository or source, the collection name, and category name.
1263340384815 [http-0.0.0.0-9300-1] [ENTER]
.com.emc.documentum.core.fulltext.indexserver.admin.controller.ESSAdminWebService@
124502c.testIndexDocument("
<dmftdoc dmftkey="In this VM_txt1263340384408"
ess_tokens=":In this VM_txt1263340384408:dmftdoc:1">
<dmftkey>In this VM_txt1263340384408</dmftkey>
<dmftmetadata>
<dm_document>
<r_object_id dmfttype="dmid">In this VM_txt1263340384408</r_object_id>
<object_name dmfttype="dmstring">In this VM.txt</object_name>
<r_object_type dmfttype="dmstring">dm_document</r_object_type>...
</dmftdoc>","DSS_LH1","superhot",null)

In the following snippet, a CPS worker thread processes the same document:
1263340387580[CPSWorkerThread-1] [ENTER]
....com.emc.documentum.core.fulltext.indexserver.cps.CPSElement@1f1df6b.<init>(
"dmftcontentref","
file:///C:/DOCUME~1/ADMINI~1.EMC/LOCALS~1/Temp/3/In this VM.txt",true,
[Lcom.emc.documentum.core.fulltext.indexserver.cps.CPSOperation;@82efed)

Now the document is indexed:


1263340395783 [IndexWorkerThread-1] [ENTER] com.emc.documentum.core.fulltext.
indexserver.core.index.xhive.IndexServerAnalyzer.addElement(
<object_name dmfttype="dmstring">In this VM.txt</object_name>,{


[]={[], "null", []}, [child::dmftkey[0]]={[child::dmftkey[0]], "


In this VM_txt1263340384408", ...

A search on a string "FileZilla" in the document renders this XQuery expression, source repository,
language, and collection in the executeQuery method:
1263340474627 [http-0.0.0.0-9300-1] [ENTER]
.com.emc.documentum.core.fulltext.indexserver.admin.controller.
ESSAdminWebService@91176d.executeQuery("for $i in /dmftdoc[. ftcontains
FileZilla] return
<d> {$i/dmftmetadata//r_object_id} { $i//object_name } { $i//r_modifier } </d>","
DSS_LH1","en","superhot")

A query for a string in the file is executed:


1263340474643 [RMI TCP Connection(116)-10.8.8.53] [EXIT]
..com.emc.documentum.core.fulltext.indexserver.search.SearchMessages.getFullMessage
==> "QueryID=PrimaryDsearch$ba06863d-7713-4e0e-8569-2071cff78f71,query-locale=
en,query-string=for $i in /dmftdoc[. ftcontains FileZilla] return
<d> {$i/dmftmetadata//r_object_id} { $i//object_name } { $i//r_modifier } </d>
is running"

The execution results are then logged in dsearch.log:


1263340475190[pool-10-thread-10] [EXIT]
..com.emc.documentum.core.fulltext.utils.log.ESSXMLLayout@1cff504.format ==> "
<event timestamp="2010-01-12 23:54:35,190" level="INFO" thread="pool-10-thread-10"
logger="com.emc.documentum.core.fulltext.search" timeInMilliSecs="1263340475190">
<message ><![CDATA[QueryID=PrimaryDsearch$ba06863d-7713-4e0e-8569-2071cff78f71,
execution time=547 Milliseconds]]></message>
</event>"

You can find all trace statements for the document being indexed by searching on the dmftkey
value. In this example, you search for "In this VM_txt1263340384408" in the trace log. You
can find all trace statements for the query ID. In this example, you search for the query ID
"PrimaryDsearch$ba06863d-7713-4e0e-8569-2071cff78f71" in the trace log.

Enabling logging in a client application


Logging for xPlore operations is described in Configuring logging, page 297.
You can enable logging for xPlore client APIs and set the logging parameters in your client application.
Use the file sample-logback-for-xplore-client.xml, which is located in the conf directory of the SDK.
Save the file as logback.xml and put it in your client application classpath. All the xPlore JARs (JARs
beginning with dsearch) will use the configuration.
The following procedure describes the high-level steps to use slf4j logger:
1. Import slf4j classes.
2. Use LoggerFactory.getLogger to get slf4j logger.
3. Log messages.


Figure 21

Sample class using slf4j logger


import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class yourclass
{
private static transient Logger logger=
LoggerFactory.getLogger(yourclass.class);
..
logger.info(msg);

The slf4j documentation provides more information on slf4j logging.

Handling a NoClassDef exception


If you see the following Java exception, you have not included all of the libraries (jar files) in the
SDK dist and lib directories:
...java.lang.NoClassDefFoundError: com/emc/documentum/fs/rt/ServiceException at
com.emc.documentum.core.fulltext.client.admin.api.FtAdminFactory.getAdminService(
FtAdminFactory.java:41)


Chapter 15
Performance and Disk Space
This chapter contains the following topics:

Planning for performance

Disk space and storage

System sizing for performance

Memory consumption

Measuring performance

Tuning the system

Tuning index merges

Indexing

Documentum index agent performance

Indexing performance

Excluding duplicates on ingestion

Throttling indexing

Search performance

Planning for performance


Plan your system sizing to match your performance and availability requirements. See the system
planning topic in EMC Documentum xPlore Installation Guide and the xPlore sizing tool on the
EMC Community Network. This information helps you plan for the number of hosts and storage.
The following diagram shows ingestion scaling. As you increase the number of documents in your
system, or the rate at which documents are added, do the following:
1. Add memory, disk, or CPU
2. Add remote CPS or more JVM memory
3. Increase the number of collections for ingestion specificity.
4. Add xPlore instances on the same or different hosts to handle further scaling needs.


Figure 22

Scaling ingestion throughput

Multiple collections increase the throughput for ingestion. You can create a collection, ingest
documents to it, then move it to be a subcollection of a parent collection. (See Moving a collection,
page 162. Fewer collections speed up search.
Use the rough guidelines in the following diagram to help you plan scaling of search. The order of
adding resources is the same as for ingestion scaling.


Figure 24

Scaling number of users or query complexity in search

Disk space and storage


Managing xPlore disk space
xPlore provides a sizing tool for disk space on the EMC Community Network. As a rule of thumb, you
need twice the final index space, in addition to the index space itself, for temporary Lucene merges.
For example, if the index size after migration is 60 GB, you need an additional 120 GB for merges and
optimizations. As the index grows, the disk space must grow correspondingly.
xPlore requires disk space for the following components. xDB and Lucene require most of the xPlore
space. Choose a domain in xPlore administrator to get the total size.
Table 38

How xPlore uses disk space

xDB
Space use: dftxml representation of document content and metadata, metrics, audit, and document
ACLs and groups.
Indexing: Next free space consumed by disk blocks for batches of XML files.
Search: Random access retrieval of particular elements and summary.

Lucene
Space use: Stores an index of content and metadata. Locations: xplore_home/data/../lucene-index.
Indexing: Information is updated through inserts and merges.
Search: Inverted index lookup, facet and security lookup.

xDB transaction (redo) log
Space use: Stores transaction information.
Indexing: Updates areas in xDB from the log.
Search: Sometimes provides a snapshot during retrieval.

Lucene temporary working area
Space use: Used for Lucene updates of non-transactional data.
Indexing: Uncommitted data is stored to the log. Allocate twice the final index size for merges.
Search: None.

The following procedures limit the space consumed by xPlore:


Status database
Purge the status database when the xPlore primary instance starts up. By default, the status DB is
not purged. See Managing the status database, page 39.
Saved tokens
If you have specified save-tokens for summary processing, edit indexserverconfig.xml to limit the
size of tokens that are saved. For information on viewing and updating this file, see Modifying
indexserverconfig.xml, page 43. Set the maximum size of the element content in bytes as the value
of the attribute extract-text-size-less-than. Tokens are not saved for larger documents. Set the
maximum size of tokens for the document as the value of the attribute token-size. For details on
extraction settings, see Configuring text extraction, page 137.
Lemmas
You can turn off alternative lemmatization support, or turn off lemmatization entirely, to save space.
See Configuring indexing lemmatization, page 105.

Estimating index size (Documentum environments)


The average size of indexable content within a document varies from one document type to another and
from one enterprise to another. You must calculate the average size for your environment. The easiest
estimate is to use the disk space that was required for a Documentum indexing server with FAST. If you
have not installed a Documentum indexing server, use the following procedure to estimate index size.
1. Perform a query to find the average size of documents, grouped by a_content_type, for example:
select avg(r_full_content_size),a_content_type from dm_sysobject group by
a_content_type order by 1 desc

2. Perform a query to return 1000 documents in each format. Specify the average size range, that is,
r_full_content_size greater than (average less some value) and less than (average plus some value).
Make the plus/minus value a small percentage of the average size. For example:
select r_object_id,r_full_content_size from dm_sysobject
where r_full_content_size > (1792855 - 1000) and
r_full_content_size < (1792855 + 1000) and
a_content_type = 'zip' enable (return_top 1000)


3. Export these documents and index them into a new, clean xPlore installation.
4. Determine the size on disk of the dbfile and lucene-index directories in xplore_home/data.
5. Extrapolate to your production size.
For example, you have ten indexable formats with an average size of 270 KB from a repository
containing 50000 documents. The Content Server footprint is approximately 12 GB. You get a sample
of 1000 documents of each format in the range of 190 to 210 KB. After export and indexing, these
10000 documents have an indexed footprint of 286 MB. Your representative sample was 20% of the
indexable content. Thus your calculated index footprint is 5 x sample_footprint=1.43 GB (dbfile 873
MB, lucene-index 593 MB).

Disk space vs. indexing rebuild performance


If you save indexing tokens for faster index rebuilding, they consume disk space. By default
they are not saved. Edit indexserverconfig.xml and set domain.collection.properties.property
"save-tokens" to true for a collection. For information on viewing and updating this file, see Modifying
indexserverconfig.xml, page 43.
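
For example, a sketch of what the collection property might look like. The surrounding element names
and the attribute form are assumptions based on the domain.collection.properties.property path
described above, so verify them against your own indexserverconfig.xml before editing:
<collection name="default">
  <properties>
    <!-- Save indexing tokens for faster index rebuilding; this consumes additional disk space -->
    <property name="save-tokens" value="true"/>
  </properties>
</collection>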

Tuning xDB properties for disk space


You can set the following property in xdb.properties, which is located in the directory WEB-INF/classes
of the primary instance. If this property is not listed, you can add it.
TEMP_PATH
Temporary path for Lucene index. If not specified, the current system property java.io.tmpdir is used.

Adding storage
The data store locations for xDB libraries are configurable. The xDB data stores and indexes can
reside on a separate data store, SAN or NAS. Configure the storage location for a collection in xPlore
administrator. You can also add new storage locations through xPlore administrator. See Changing
collection properties, page 161.

Storage types and locations


Table 39

Comparison of storage types performance

Used for Content Server
SAN: Common. NAS: Common (content). Local disk: Common. iSCSI: Rare. CFS: Rare.

Network
SAN: Fiber. NAS: Ethernet. Local disk: Local. iSCSI: Ethernet. CFS: Fiber.

Performance
SAN: Best. NAS: Slower than SAN, improved with 10GE. Local disk: Good until I/O limit reached.
iSCSI: Slower than SAN, improved with 10GE. CFS: Almost as fast as SAN.

High availability
SAN: Requires cluster technology. NAS: Provides shared drives for server takeover. Local disk:
Requires complete dual system. iSCSI: Requires cluster technology. CFS: Provides shared drives for
server takeover.

xPlore multi-instance
SAN: Requires network shared drives. NAS: Drives already shared. Local disk: Requires network shared
drives. iSCSI: Requires network shared drives. CFS: Drives already shared.

System sizing for performance


You can size several components of an xPlore system for performance requirements:
CPU capacity
I/O capacity (the number of disks that can write data simultaneously)
Memory for temporary indexing usage

Sizing for migration from FAST


When you compare sizing of the FAST indexing system to xPlore, use the following guidelines:
Size with the same allocations used for FAST, unless the FAST installation was very undersized or
you expect usage to change.
Use VMware-based deployments, which were not supported for FAST.
Include sizing for changes to existing documents:
A modification to a document requires the same CPU for processing as a new document.
A versioned document requires the same (additional) space as the original version.
Size for high availability and disaster recovery requirements.

Add processing instances


CPS processing of documents is typically the bottleneck in ingestion. CPS also processes queries. You
can add CPS instances either on the same host as the primary instance or on additional hosts (vertical
and horizontal scaling, respectively). You should have at least one CPS instance for each Documentum
repository. If you process documents for multiple collections, add an instance for each collection.
A remote CPS instance does not perform as well as a CPS instance on an indexing instance.
The remote instance adds overhead for the xPlore system. To add CPS instances, run the xPlore
configuration script and choose Create Content Processing Service Only.
Sizing for search performance
You can size several components of an xPlore system for search performance requirements:
CPU capacity
Memory for query caches

Using xPlore administrator, change the value of query-result-cache-size in the search service
configuration and restart the search service.

Memory consumption
Following are ballpark estimates for memory consumption by the various components in an xPlore
installation.
Table 40

Average memory consumption

Index agent: 1+ GB (4+ GB on a 64-bit host)
xPlore indexing and search services: 4 GB
CPS daemon: 2 GB

For best performance, add index agent processes and CPS instances on hosts other than the xPlore host.

Measuring performance
The following metrics are recorded in the metrics database. View statistics in xPlore administrator
to help identify specific performance problems. Select an xPlore instance and then choose Indexing
Service or Search Service to see the metric. Some metrics are available through reports, such as
document processing errors, content too large, and ingestion rate.
Table 41

Metrics mapped to performance problems

Ingestion throughput: bytes and docs indexed per second, response time, latency
Service: Indexing Service. Problem: Slow document indexing throughput.

Total number of documents indexed (or bytes)
Service: Indexing Service. Problem: Indexing runs out of disk space.

Formats
Service: Content Processing Service. Problem: Some formats are not indexable.

Languages
Service: Content Processing Service. Problem: A language was not properly identified.

Error count per collection
Service: Content Processing Service. Problem: Finding the collection where errors occurred.

Number of new documents and updates
Service: Content Processing Service.

Search response time
Service: Search Service. Problem: Query timeouts or slow query response.

To get a detailed message and count of errors, use the following XQuery in xPlore administrator:
for $i in collection('/SystemData/MetricsDB/PrimaryDsearch')
/metrics/record/Ingest[TypeOfRec='Ingest']/Errors/ErrorItem
return string(<R><Error>{$i/Error}</Error><Space>" "</Space>
<Count>{$i/ErrorCnt}</Count></R>)

To get the total number of errors, use the following XQuery in xPlore administrator:
sum(for $i in collection('/SystemData/MetricsDB/PrimaryDsearch')
/metrics/record/Ingest[TypeOfRec='Ingest']/Errors/ErrorItem/ErrorCnt return $i)

Tuning the system


System tuning requires editing of indexserverconfig.xml. For information on viewing and updating
this file, see Modifying indexserverconfig.xml, page 43.

Excluding diacritics and alternative lemmas


Diacritics are not removed during indexing and queries. In some languages, a word changes meaning
depending on a diacritic. You can turn off diacritics indexing to improve ingestion and query
performance. See Handling special characters, page 108.
Alternate lemmas are also indexed. A word like "swim" that is indexed as more than one part of speech
("swim" and "swimming") is more likely to be found in a search. You can turn off alternative lemmas to
improve ingestion and query performance. See Configuring indexing lemmatization, page 105.

Excluding xPlore files from virus scanners


Performance of both indexing and search can be degraded during virus scanning. Exclude xPlore
directories, especially the xplore_home/data directory.

Tuning memory pools


xPlore uses four memory caches. The last three are part of the xPlore instance memory and have a
fixed size:
OS buffer cache
Holds temporary files, xDB data, and Lucene index structures. Has largest impact on Lucene
index performance.
xDB buffer cache
Stores XML file blocks for ingestion and query. Increase for higher query rates: Change the value
of the property xhive-cache-pages in indexserver-bootstrap.properties (see the sketch after this
list). This file is located in the WEB-INF/classes directory of the application server instance.
Back up the xPlore federation after you change this file, and then restart all instances.
Lucene working memory
Used to process queries. Lucene working memory is consumed from the host JVM process.
Increasing the JVM memory usually does not affect performance.

xPlore caches
Temporary cache to buffer results. Using xPlore administrator, change the value of
query-result-cache-size in search service configuration and restart the search service.
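
For example, a sketch of the xDB buffer cache setting in indexserver-bootstrap.properties; the value
below is illustrative only and should be sized for your own query load and available RAM:
# indexserver-bootstrap.properties (WEB-INF/classes of the primary instance)
# Number of xDB cache pages; the value shown is an example, not a recommendation
xhive-cache-pages=65536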

Tuning virtual environments


VMware deployments require more instances than physical deployments. For example, VMware is
limited to eight cores.

Sizing the disk I/O subsystem


xPlore supports local disk, SAN, and NAS storage. These storage options do not have equal
performance. For example, NAS devices send more data and packets between the host and subsystem.
Jumbo frame support is helpful as is higher bandwidth.

Using compression
Indexes can be compressed to enhance performance. Compression uses more I/O memory. The
compress element in indexserverconfig.xml specifies which elements in the ingested document have
content compression to save storage space. Compressed content is about 30% of submitted XML
content. Compression can slow the ingestion rate by 10-20% when I/O capacity is constrained. See
Configuring text extraction, page 137.
If ingestion starts fast and gets progressively slower, set compress to false for subpath indexes in
indexserverconfig.xml. Modifying indexserverconfig.xml, page 43 describes how to view and update
this file.

Tuning index merges


This section describes the types of index merges and how to manage final merges.
You can monitor the merge of a collection as described in Monitoring merges of index data, page 165.

Types of merges
To improve indexing performance, the Lucene index database is split into small chunks called segments
that contain one or more indexed documents. Lucene adds segments as new documents are added to
the index. However, to improve query performance and save disk space, you can reduce the number of
segments by merging smaller segments into larger ones and ultimately into a single segment.
There are three levels of index merges that you can fine-tune to achieve an optimal and balanced level
of indexing and query performance.

Lucene internal merges


Lucene automatically merges smaller segments into larger ones regularly on a frequent basis. You can
adjust the values of two xDB properties to fine tune the indexing performance.
mergeFactor
When new documents are added to a Lucene index, they are initially stored in memory instead of
being immediately written to the disk. The mergeFactor variable determines how many documents
to store in memory before writing them to the disk, as well as how often to merge multiple segments
together. With the default value of 10, Lucene will store 10 documents in memory before writing
them to a single segment on the disk. The mergeFactor value of 10 also means that once the number
of segments on the disk has reached 10, Lucene will merge these segments into a single segment.
For example, if you set mergeFactor to 10, a new segment will be created on the disk for every 10
documents added to the index. When the 10th segment of size 10 is added, all 10 will be merged
into a single segment of size 100. When 10 such segments of size 100 have been added, they will be
merged into a single segment containing 1000 documents, and so on. Therefore, at any time, there
will be no more than 9 segments in each power of 10 index size.
Using a higher value for mergeFactor will cause Lucene to use more RAM, but will let Lucene write
data to disk less frequently, which will speed up the indexing process. A smaller mergeFactor will
use less memory and will cause the index to be updated more frequently, which will make it more
up-to-date, but will also slow down the indexing process.
Note: High values can cause a too many open files exception. You can increase the maximum
number of open files allowed on a Linux host by increasing the nofile setting to greater than 65000.
maxMergeDocs
Use the maxMergeDocs to specify the largest size of legitimate segments for merging. While
merging segments, Lucene will ensure that no segment with more than maxMergeDocs is created.
For example, if you set maxMergeDocs to 1000, when you add the 10,000th document, instead of
merging multiple segments into a single segment of size 10,000, Lucene will create a 10th segment
of size 1000, and keep adding segments of size 1000 for every 1000 documents added. A larger
maxMergeDocs is better suited for batch indexing.
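
For example, a sketch of how these two settings might be raised for batch ingestion. The xdb.lucene
prefix in the key names is an assumption based on the naming of the finalMerging* properties shown
later in this section, and the values are illustrative only; verify the exact key names in your own
xdb.properties before changing them:
# Assumed key names and illustrative values for batch indexing
xdb.lucene.mergeFactor = 20
xdb.lucene.maxMergeDocs = 10000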

Non-final merges
In addition to Lucene internal merges, you can configure the system to run non-final merges that merge
segments under a specified size (determined by the nonFinalMaxMergeSize property) into a fresh, new
index at a regular interval (determined by the cleanMergeInterval property).

Final merges
Perform final merges to merge all existing Lucene index entries into a single large Lucene index entry
to maximize query performance. You can manually run a final merge on a collection or schedule final
merges to run at regular intervals. The final merge is very I/O intensive and may cause noticeable
performance drop in both indexing and query performance when running, so you should closely
monitor and carefully schedule the running of final merges and avoid them during performance-critical
hours. You can use the Audit Records for Final Merge report to view detailed final merge log data to
quickly identify performance issues associated with final merges.

During a final merge, the system shrinks existing segments to empty by moving and consolidating
Lucene index entries from them into an empty segment. When an empty segment is not available, the
system does not shrink existing segments and waits for the next final merge to run, which consumes
more disk space. Under such circumstances, you can manually launch a final merge to accelerate
the merging process and free more disk space.
The final merge can require up to two times the size of the final index entries to move things around
during the interim process. For example, you can see disk space usage at 100G at one point but 300G
at another point when a final merge is in progress.

Setting final merge blackout periods


The final merge is very I/O intensive and may cause noticeable performance drop in both indexing
and query performance when running. To prevent scheduled final merges from running on an xPlore
instance during performance-critical periods such as business hours, you can set a final merge blackout
period.
Edit the xdb.properties file under \deploy\dsearch.war\WEB-INF\classes of the instance, and set the
finalMergingBlackout value in the following format:
xdb.lucene.finalMergingBlackout = StartHour-EndHour

For example, if you set the value to 8-20, the blackout period will be from 8 a.m. to 8 p.m., and
scheduled final merges will not start to run during this period every day.
Blackout periods do not stop an already running final merge process. A scheduled final merge started
before the blackout start hour will continue to run into or even past defined final merge blackout
periods without being affected. Also, manually started final merges ignore any blackout periods.
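As a concrete sketch, the complete xdb.properties entry for the 8 a.m. to 8 p.m. example above is a single line (adjust the hours for your environment):
xdb.lucene.finalMergingBlackout = 8-20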

Starting and stopping a final merge


The final merge is very I/O intensive and can noticeably degrade both indexing and query performance,
so avoid running final merges during performance-critical hours.
Before you start a final merge on a collection, we recommend that you stop other operations such as
consistency check, document migration, backup and restore on the collection; otherwise, the final
merge may significantly lower the performance of these operations.
To start a final merge manually:
1. Under Data Management in xPlore Administrator, go to the collection page and click the Start
Merging button.
2. A confirmation dialog box appears warning you of the performance impact of the final merge
operation. Click OK to start the final merge on the collection.
However, clicking Start Merging may not start the final merge immediately. If there is already another
final merge running on the current xPlore instance, the final merge you manually started waits in a
queue and will not run until the final merge in progress is completed or stopped, since at any point in
time, only one final merge operation can run on an xPlore instance. Manually started final merges take
precedence over scheduled final merges in a queue and run in a FIFO (First In First Out) manner.
For example, if there is already a final merge running and you manually start a final merge on
collection A and then another on collection B, when the current final merge completes, the final
merge on collection A runs first, followed by the one on collection B.
Unlike scheduled final merges, manually started final merges are not affected by any blackout period
settings.
If you find that a final merge is in progress and is causing severe performance drops on your system
(slowing ingestion and queries, or preventing backup), you can stop it immediately.
When a collection is merging, a Merging icon is displayed next to the collection in the Data
Management view of the collection. To stop a final merge manually, click Stop Merging.
You can also start and stop a final merge using CLI commands. See Final merge CLIs, page 192.

Scheduling final merges


The final merge is very I/O intensive and may cause noticeable performance drop in other system
operations such as indexing, querying, data consistency check, and backup and restore, so you should
schedule final merges outside performance-critical hours. For example, you may want to run final
merges every night or on every weekend so that system performance is not impacted during business
hours.
There are two ways you can schedule final merges:
Instance-specific scheduler
By default, a collection uses the Lucene internal final merge interval setting to run final merges at
a specified interval. You can set the interval by specifying the xdb.lucene.finalMergingInterval
value in xdb.properties; the setting applies to all collections in the xPlore instance that use this
scheduling option.
Once set, the instance-specific scheduler takes effect immediately. It runs the initial final merge and
sets the current time as the starting point to schedule subsequent final merges based on the specified
interval. However, the starting point is reset and final merges are rescheduled whenever the instance
that the collection is bound to is restarted, which makes the final merge schedule subject to change
and difficult to predict or manage.
For example, at 20:00 on the first day, you set final merges to run on a collection every 24 hours.
The first final merge runs immediately after the scheduling takes effect and the second final merge
runs at 20:00 the next day. On the third day, the xPlore instance the collection is bound to is restarted
at 9:00. This resets the starting point of the scheduler for the collection to 9:00 and reschedules
subsequent final merges to run at 9:00 every day. Do not use the instance-specific scheduler if you
want to maintain a precise final merge schedule.
Collection-specific scheduler
The collection-specific scheduler gives you more flexibility and precise control over when your final
merges run. With this option, you can schedule a one-off final merge, schedule final merges to run at
fixed intervals or at a specific time on a daily or weekly basis, or define your schedule in an advanced
format. Once set, the collection-specific scheduler takes effect immediately and the settings apply to
the current collection only.
Note: Final merge scheduling options are not available for collections with the state search_only or off_line.
To schedule final merges for a collection:
1. In xPlore Administrator, under Data Management, click the collection.
2. On the collection page, click Configuration.
3. In the Edit Collection page, choose a Final Merge Scheduler option.
Instance-specific scheduler
If you select this option, you can adjust the interval by specifying the
xdb.lucene.finalMergingInterval value in xdb.properties.
Collection-specific scheduler
Choose one of the following schedule formats and enter your values:
Fixed interval
Daily
Weekly
Advanced
If you define a schedule in advanced format that equates to one of the simpler formats listed
above, the simpler format will be selected instead after you save the settings. For example, if
you enter 6 in the Day of week field and save the settings, you will see the Weekly option selected
with Sat checked when you review the settings.
4. Click Save. The scheduler is effective immediately.
Note: For both schedulers, if a scheduled time falls into a blackout period, the final merge will not start.

Indexing
Documentum index agent performance
Index agent settings
The parameters described in this section can affect index agent performance. Do not change these
values unless you are directed to change them by EMC technical support.
In migration mode, set the parameters in the indexagent.xml located in
index_agent_WAR/WEB-INF/classes/.
In normal mode, also set the corresponding parameters in the dm_ftindex_agent_config object.
In normal mode, index agent configuration is loaded from indexagent.xml and from the
dm_ftindex_agent_config object. If there is a conflict, the settings in the config object override the
settings in indexagent.xml.
exporter.thread_count (indexagent.xml) / exporter_thread_count (dm_ftindex_agent_config)
Number of threads that extract metadata into dftxml using DFC.
connectors.file_connector.batch_size (indexagent.xml) / connectors_batch_size
(dm_ftindex_agent_config)
Number of items picked up for indexing when the index agent queries the repository for queue items.
exporter.queue_size (indexagent.xml) / exporter_queue_threshold (dm_ftindex_agent_config)
Internal queue of objects submitted for indexing.
indexer.queue_size (indexagent.xml) / indexer_queue_threshold (dm_ftindex_agent_config)
Queue of objects submitted for indexing.
indexer.callback_queue_size (only in indexagent.xml, used for both migration and normal mode)
Size of queue to hold requests sent to xPlore for indexing. When the queue reaches this size, the
index agent waits until the callback queue has reached 100% less the callback_queue_low_percent.

Measuring index agent performance


Verify index agent performance using the index agent UI details page. Find the details for Indexed
content KB/sec and Indexed documents/sec. All Averages measures the average rate over the period
from index agent startup to the current time. All Averages up to Last Activity measures the average
rate over the period from index agent startup to the last indexing activity.

Indexing performance
Various factors affect the rate of indexing. You can tune some indexing and xDB parameters and
adjust allowable document size.

Factors in indexing rate


The following factors affect indexing rate:
The complexity of documents
For example, a simple text document containing thousands of words can take longer to index
than a much larger Microsoft Word document full of pictures. MS Excel files take much longer
to index due to their complex cell structure.
The indexing server I/O subsystem capabilities
The number of CPS instances
For heavy ingestion loads or high availability requirements, add CPS instances to increase content
processing bandwidth.
The number of collections
Create multiple collections spread over multiple xPlore instances to scale xPlore. (Documents can
be indexed into specific target collections. For best search performance, route queries to specific
collections. See Routing a query to a specific collection, page 257.)
Recovery during heavy ingestion
If the system crashes during a period of heavy ingestion, transactional recovery could take a long
time as it replays the log. The recovery process is single-threaded. Avoid this bottleneck with
frequent incremental backups, which shorten the restore period. Alternatively, you can set up an
active/active high availability system so that failure in a single system does not disrupt business.

Creating temporary collections for ingestion


You can create a collection and ingest documents to that collection. After ingestion has completed,
move the collection to become a subcollection of an existing collection. See Moving a collection,
page 162.
Tunable indexing properties


The number of threads, batch size, tracking DB cache size, thread wait time, and queue size at each
stage of indexing impacts ingestion performance. The biggest impact on ingestion rate is with
threadpool size and processing buffer size. You can configure CPS and indexing settings using
xPlore administrator. For a list of these properties, see Document processing and indexing service
configuration parameters, page 339.
To scale up for large ingestion requirements or for high availability, add more CPS instances.

Document size and performance


Several configuration properties affect the size of documents that are indexed and consequently the
ingestion performance. Maximum document and text size, page 98, describes these settings.

Tuning xDB properties for indexing performance


Most applications do not need to modify xDB properties. The bottleneck for indexing is usually the
process of writing index files to disk, which you can address by increasing I/O capabilities. Indexing
can be slowed by the merge process in which additions or modifications to the index are merged into
xDB. With the guidance of Documentum technical support, you can set merge scheduling or interrupt
a merge. See Tuning index merges, page 319.
You can tune the following properties for indexing in xdb.properties, which is located in the directory
WEB-INF/classes of the primary instance. If these properties are not listed, you can add them.
ramBufferSizeMB: Size in megabytes of the RAM buffer for document additions, updates, and
deletions. For faster indexing, use as large a RAM buffer as possible for the host. Default: 3.
maxRamDirectorySize: Maximum RAM in bytes for in-memory Lucene index. Higher values use
more memory and support faster indexing. Default: 3000000.
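For example, a sketch of more aggressive settings for a host with ample RAM might look like the following in xdb.properties (the values are illustrative only, and the keys are shown exactly as named above; verify the exact key names in your xdb.properties before changing them):
ramBufferSizeMB = 64
maxRamDirectorySize = 16000000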

Excluding duplicates on ingestion


You can configure ingestion to avoid duplicates. There may be a performance impact on heavy
ingestion. If you do not have a retry mechanism set up to resubmit timed out indexing requests, and
your environment does not have multiple index agents, turn off duplicate prevention. (Default: true)
To prevent indexing of the same document more than once:
1. Shut down all xPlore instances.
2. Edit indexserverconfig.xml. Add a property to the index-config element as follows:
<property name="index-duplicate-prevention" value="true"/>

Throttling indexing
If your environment has periodic, frequent bursts of document updates that slow the system, you
can throttle ingestion based on document count, document size, or both. The throttle mechanism
is disabled by default.
Stop all xPlore instances and edit indexserverconfig.xml on the primary instance. Add the following
properties to the index-config element:
enable-throttle: Set to true to enable the throttle mechanism.
throttle-interval: Time in seconds to allow content up to throttle-threshold size.
throttle-threshold: Sets the total content size in KB that xPlore can process during the
throttle-interval. Content above this content size will be rejected.
throttle-document-count: Sets the number of documents to process during the throttle-interval.
To throttle by document size only, set throttle-document-count to a high value. To throttle by document
count only, set throttle-threshold to a high value.
Documents that exceed throttle-threshold or throttle-document-count will be processed after the
throttle-interval.
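As a sketch, the added properties in the index-config element might look like the following, using the same property format shown for duplicate prevention (the values are illustrative only; they allow roughly 100 MB or 1000 documents per 60-second interval):
<property name="enable-throttle" value="true"/>
<property name="throttle-interval" value="60"/>
<property name="throttle-threshold" value="102400"/>
<property name="throttle-document-count" value="1000"/>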

Search performance
Creating subcollections
You can ingest documents to a new collection and then move the collection to become a subcollection
for faster search performance. For moving subcollections, see Moving a collection, page 162.

Changing the security cache sizes


Monitor the query audit record to determine security performance. The value of
<TOTAL_INPUT_HITS_TO_FILTER> records how many hits a query had before security filtering.
The value of <HITS_FILTERED_OUT> shows how many hits were discarded because the user did
not have permissions for the results. The hits filtered out divided by the total number of hits is the
hit ratio. A high hit ratio (most hits filtered out) indicates an underprivileged user, who often has
slower query response times than other users.
There are two caches that affect security performance: Groups that a user belongs to, and groups that
a user does not belong to. Cache sizes are configured in indexserverconfig.xml. For information
on viewing and updating this file, see Modifying indexserverconfig.xml, page 43. The audit
record reports how many times these caches were hit for a query (GROUP_IN_CACHE_HIT,
GROUP_OUT_CACHE_HIT). The record reports how many times the query added a group to the
cache (GROUP_IN_CACHE_FILL, GROUP_OUT_CACHE_FILL). For information on how to
change these configuration settings, see Configuring the security cache, page 54.
For underprivileged users, increase the not-in-groups cache size to reduce the number of times
this cache must be checked. For highly privileged users (members of many groups), increase the
groups-in-cache size to reduce the number of times this cache must be checked.
If you have many ACLs, increase the value of acl-cache-size (number of permission sets in the cache).

Increasing query batch size


In a Documentum client application based on DFC, you can set the query batch size. Edit dfc.properties
on the search client to increase the value of dfc.batch_hint_size. Default: 50. Suggested size: 350.
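For example, the entry in dfc.properties on the search client would be:
dfc.batch_hint_size = 350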
Troubleshooting slow queries
Measuring performance

About search performance


Factors in query performance
The following features of full-text search can affect search performance:
Single-box search in Webtop
The default operator for multiple terms is AND. The user can explicitly combine terms with OR
(the old Webtop default), but performance is much slower. A DFC customization of Webtop search
or a DFC filter can also change the default from AND to OR.
Flexible metadata search (FTDQL)
Searches on multiple object attributes can affect performance, especially when the first term is
unselective.
Leading or trailing wildcards
By default, xPlore does not match parts of words. For example, WHERE object_name LIKE 'foo%'
matches 'foo bar' but not 'football'. Support for fragment matches (leading and trailing wildcards)
can be enabled, but this impacts performance. A more limited support for leading wildcards in
metadata search can also be enabled.
Security
Native xPlore security performs faster than security applied to results in the Content Server. The
latter option can be enabled, but this impacts performance.
Number of documents
Documents can be routed to specific collections based on age or other criteria. When queries are
routed to a collection, performance is much better. Scaling to more instances on the same or multiple
hosts, as well as use of 64-bit hosts, can also improve search performance.
Size of query result set
Consume results in a paged display for good performance. Webtop limits results to 350, with a
smaller page size (from 10 to 100). The first page of results loads while the remaining results
are fetched. CenterStage limits results to 150. Paging is especially important to limit result sets
for underprivileged users.
Number of collections
If queries are not run in parallel mode (across several collections at once), response time rises as the
number of collections rises. Queries can be targeted to specific collections to avoid this problem. If
you do not use targeted queries, try to limit the number of collections in your xPlore federation.
For information on targeted queries, see Routing a query to a specific collection, page 257. To set
parallel mode for DFC-based search applications, set the following property in dfc.properties to true:
dfc.search.xquery.option.parallel_execution.enable = true

Caches empty on system startup


At startup, the query and security caches have not been filled, so response times are slower. Make
sure that you have allocated sufficient memory for the file system buffer cache and good response
time from the I/O subsystem.
Response times slower during heavy ingestion
Slow queries during ingestion are usually an issue only during migration from FAST. If your
environment has large batch migrations once a month or quarterly, you can set the target collection
or domain to index-only during ingestion. Alternatively, you can schedule ingestion during an
off-peak time.

Measuring query performance


Make sure that search auditing is enabled. You can also turn on tracing information for query
execution. Select an instance, choose Tracing, and then choose Enable.
Examine the query load to see if the system is overloaded. Run the report Top N slowest queries.
Examine the Start time column to see whether slow queries occur at a certain period during the day
or certain days of the month.
Save the query execution plan to find out whether you need an additional index on a metadata element.
(For more information on the query plan, see Debugging queries, page 253.) Documentum clients can
save the plan with the following iAPI command:
apply,c,NULL,MODIFY_TRACE,SUBSYSTEM,S,fulltext,VALUE,S,ftengine

The recorded query execution plan is in dsearch.log, which is located in the logs subdirectory of the
JBoss deployment directory.

Tuning CPS and xDB for search


Tuning CPS for query performance
The following parameters for high-volume ingestion environments can improve
query response. Configure the following parameters in the CPS configuration file
PrimaryDsearch_local_configuration.xml, which is located in the CPS instance directory
xplore_home/dsearch/cps/cps_daemon. If these properties are not in the file, you can add them. These
settings apply to all CPS instances.
add_failure_documents_automatically: Specifies whether to add automatically documents that fail
processing to the failed document ID list stored in the file designated by the failure_document_id_file
parameter. Default: false. If you turn on this feature, make sure the failure_document_id_file is
exclusively used by the current CPS instance and not shared by other CPS instances.
connection_pool_size: Specifies how many connections CPS manager allocates to each daemon.
If CPU and memory allow, add to daemon_count, then increase connection_pool_size to use a
reasonable amount of memory.
cut_off_text: Set to true to cut off text of large documents that exceed max_text_threshold instead
of rejecting the entire document. Default: false. Documents that are partially indexed are recorded
in cps_daemon.log: docxxxx is partially processed. The dftxml is annotated with the element
partiallyIndexed.
daemon_count: Specifies the number of daemons that handle normal indexing and query requests
(not a dedicated query daemon). Set from 1 to 8. Default: 1. For information about adding CPS
daemons, see Adding CPS daemons for ingestion or query processing, page 114.
daemon_restart_threshold: Specifies how many requests a CPS daemon can handle before it restarts.
Default: 1000.
daemon_restart_memory_threshold: Set a value in bytes for maximum CPS memory consumption.
After this limit, CPS will restart. A maximum of 8 GB is recommended. Default: 4000000000.
daemon_restart_consistently: Specifies whether CPS should restart regularly after it is idle for 5
minutes. Default: true.
dump_context_if_exception: Specifies whether to dump stack trace if exception occurs. Default:
true.
failure_document_id_file: The file that contains IDs of failed documents to
be skipped. You can edit this file. IDs of failed documents are added to it
automatically if add_failure_documents_automatically is set to true. Default:
xplore_home/cps/skip_failure_document.txt.
io_block_unit: Logical block unit of the read/write target device. Default: 4096.
io_chunk_size: Size for each read/write chunk. Default: 4096.
linguistic_processing_time_out: Interval in seconds after which a CPS hang in linguistic processing
forces a restart. Valid values: 60 to 360. Default: 360.
load_content_directly: For internal use only.
query_dedicated_daemon_count: The number of CPS daemons dedicated to query processing.
Other CPS daemons handle ingestion when there is a dedicated query daemon. Valid values: 0
to 3. Default: 1.
retry_failure_in_separate_daemon: Specifies whether to retry failed documents in a newly spawned
CPS daemon. Default: true. A retry daemon is not limited by the value of daemon_count.
skip_failure_documents: Specifies whether CPS should skip documents that fail processing instead
of retrying them, to reduce CPS crashes. Default: true. Failed documents are retried once unless
this property is set to true.
skip_failure_documents_upper_bound: Specifies the maximum number of failed documents that
CPS will record in the failure document. Valid values: integers. Default: -1 (no upper bound)
text_extraction_time_out: Interval in seconds after which a CPS hang in text extraction forces a
restart. Valid values: 60 to 300. Default: 300.
use_direct_io: Requires CPS to read and write staging files to devices directly. Default: false. If
most incoming files are local, use the default caching. If most files are remote, use direct IO.

Tuning xDB properties for search performance


Search performance can be slowed during index merges. For information on scheduling and
interrupting merges, see Tuning index merges, page 319.
You can also limit the query result set size, which is 12000 results by default. This default value supports
facets. If your client application does not support facets, you can lower the result set size. Open
xdb.properties, which is located in the directory WEB-INF/classes of the primary instance. Set the
value of queryResultsWindowSize to a number smaller than 12000.
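For example, if your client application does not display facets, the entry might look like this (the value is illustrative, and the key is shown as named above; verify the exact key name in your xdb.properties):
queryResultsWindowSize = 6000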

Creating a CPS daemon dedicated to search


Queries are processed in CPS linguistic analysis. During periods of high ingestion, queries can be
slow. You can configure CPS instances for search only. You can configure a processing daemon
within a CPS instance that processes only queries.
To configure a CPS instance for search only, see Configuring CPS dedicated to indexing or search,
page 96.
To configure a CPS daemon dedicated to search, edit the CPS configuration file
PrimaryDsearch_local_configuration.xml, which is located in the CPS instance directory
xplore_home/dsearch/cps/cps_daemon. Change the following properties, or add them if they are
not present:
daemon_count: Sets the number of daemons that process indexing requests. Set from 1 to 8.
Default: 1. If query_dedicated_daemon_count is greater than 0, other daemons process only
indexing requests.
query_dedicated_daemon_count: Sets the number of CPS daemons dedicated to query processing
(no ingestion). Valid values: 0 to 3. Default: 0. Set to 1 and increase daemon_count to 2 or more.
See also: Tuning CPS and xDB for search, page 328.

Improving search performance with time-based collections


You can plan for time-based collections, so that only recent documents are indexed. If most of your
documents are not changed after a specific time period, you can migrate data to collections. Base your
migration on creation date, modification date, or a custom date attribute.
Route using a custom routing class, index agent configuration, or customized DFC query builder. You
can migrate a limited set of documents using time-based DQL in the index agent UI.
To determine whether a high percentage of your documents is not touched after a specific time period,
use two DQL queries to compare results:
1. Use the following DQL query to determine the number of documents modified and accessed in
the past two years (change the DQL to meet your requirements):
select count(*) from dm_sysobject where
datediff(year,r_creation_date,r_access_date)<2 and
datediff(year,r_creation_date,r_modify_date)<2
2. Use the following DQL query to determine the number of documents in the repository:
select count(*) from dm_sysobject
3. Divide the result of step 1 by the result of step 2. If the number is high, for example, 0.8, most
documents were modified and accessed in the past two years (80%, in this example).

Appendix A
Index Agent, CPS, Indexing, and Search Parameters
This appendix covers the following topics:

dm_ftengine_config

Index agent configuration parameters

Document processing and indexing service configuration parameters

Search service configuration parameters

API Reference

dm_ftengine_config
Attributes
Following are attributes specific to dm_ftengine_config. Some attribute values are set by the index
agent when it creates the dm_ftengine_config object.
Note: Not all attribute values are set at object creation. If you do not set values, the default
values are used. For instructions on changing the attribute values, see Query plugin configuration
(dm_ftengine_config), page 222.
For iAPI syntax to change attributes, see Query plugin configuration (dm_ftengine_config), page 222.
acl_check_db: If ftsearch_security_mode is set to xPlore and this value is set to true, xPlore filters the results based on its ACL information, and then Content Server filters again based on current database ACL information from the filtered results set. You must disable XQuery generation. See Changing search results security, page 51.
acl_domain: Owner of dm_fulltext_admin_acl (user specified at installation)
acl_name: dm_fulltext_admin_acl
default_fuzzy_search_similarity: Controls the degree of similarity between a search term and an index term. Default: 0.5. See Configuring fuzzy search, page 215.
dsearch_config_host: Specifies the fully qualified host name or IP address of the xPlore host that the index agent connects to.
dsearch_config_port: Specifies the HTTP or HTTPS port that the index agent connects to.
dsearch_domain: Name of repository
dsearch_override_locale: Overrides the locale of the query with the specified locale.
dsearch_qrserver_host: Specifies the fully qualified host name or IP address of the xPlore host that the Content Server query plugin connects to.
dsearch_qrserver_port: Specifies the HTTP or HTTPS port of the xPlore host that the Content Server query plugin connects to.
dsearch_qrserver_protocol: Sets HTTPS or HTTP as connection protocol.
dsearch_qrserver_target: xPlore index server servlet: partial URL combined with host, protocol, and port to create the full URL.
dsearch_qrygen_mode: For internal use only.
dsearch_result_batch_size: Sets the number of results fetched from xPlore in each batch. Default: 200.
fast_wildcard_compatible (replaces fds_contain_fragment): Sets the fragment search option. Default: false. See Configuring wildcards and fragment search, page 218.
filter_config_id: Most recent object ID of dm_filter_config type
folder_cache_limit: Specifies the maximum number of folder IDs included in the index probe. Default: 2000. If the folder descend condition evaluates to less than the folder_cache_limit value, then folder IDs are pushed into the index probe. Otherwise, the folder constraint is evaluated separately for each result. Raise the value if folder descend queries are slow or timing out. Lower the value if folder descend queries cause out of memory or too many clauses errors. See Query plugin configuration (dm_ftengine_config), page 222.
ft_collection_id: Repeating attribute that references a collection object of the type dm_fulltext_collection. Reserved for use by Content Server client applications.
ftsearch_security_mode: 0: Content Server. DFC search service will not use IDfXQuery and instead will generate DQL. 1: xPlore (default).
fuzzy_search_enable: Specifies whether fuzzy search is applied. Default: false. See Configuring fuzzy search, page 215.
group_name: dm_fulltext_admin
object_name: Dsearch Fulltext Engine Configuration / FAST
query_plugin_mapping_file: Path on Content Server host to mapping file. This file maps attribute conditions to the XQuery subpaths.
query_timeout: Sets the interval in milliseconds for a query to time out. Default: 60000.
security_mode: Sets summary security mode. See Configuring summary security, page 214.
thesaurus_search_enable: Set to true to enable thesaurus search.
use_thesaurus_on_phrase: Set to true to match entire phrases in the thesaurus.

Index agent configuration parameters


General index agent parameters
The index agent configuration file indexagent.xml is located in
xplore_home/jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.
Most of these parameters are set at optimal settings for all environments.

Generic index agent parameters


Table A.43 Indexagent configuration parameters in generic_indexer.parameter_list
acl_exclusion_list: Add this parameter to exclude specific ACL attributes from indexing. Contains an acl_attributes_exclude_list element. Check with technical support before you add or modify this list.
acl_attributes_exclude_list: Specifies a space delimited list of ACL attributes that will not be indexed.
collection: Specifies the name of a collection to which all documents will be routed. Default: The domain default collection.
dsearch_qrserver_host: Fully qualified host name or IP address of host for xPlore server
dsearch_qrserver_port: Port used by xPlore server. Default is 9200
dsearch_domain: Repository name
group_exclusion_list: Add this parameter to exclude specific group attributes from indexing. Contains a group_attributes_exclude_list element. Check with technical support before you add or modify this list.
group_attributes_exclude_list: Specifies a space delimited list of group attributes that will not be indexed.
index_type_mode: Object types to be indexed. Values: both (default) | aclgroup | sysobject. If you use two index agents, each can index either ACLs or sysobjects.
max_requests_in_batch: Maximum number of objects to be indexed in a batch. Default: 5
max_batch_wait_msec: Maximum wait time in milliseconds for a batch to reach the max_requests_in_batch size. When this timeout is reached, the batch is submitted to xPlore. The default setting (1) is for high indexing throughput. If your index agent has a low ingestion rate of documents and you want to have low latency, reduce both max_requests_in_batch and max_submission_timeout_sec.
max_pending_requests: Maximum number of indexing requests in the queue. Default: 10000
max_tries: Maximum number of tries to add the request to the internal queue when the queue is full. Default: 2
group_attributes_exclude_list: Attributes of a group to exclude from indexing

General index agent runtime parameters


Requests for indexing pass from the exporter queue to the indexer queue to the callback queue.
Table A.44 Index agent runtime configuration in indexer element
queue_size: Size of queue for indexing requests. When the queue reaches this limit, the index agent will wait for the queue to be lower than queue_size less (queue_size * queue_low_percent). For example, if the queue_size is 500 and queue_low_percent is 10%, then the agent will resume indexing when the queue is lower than 500 - (500 * .1) = 450.
queue_low_percent: Percent of queue size at which the index agent will resume processing the queue.
callback_queue_size: Size of queue to hold requests sent to xPlore for indexing. When the queue reaches this size, the index agent will wait until the callback queue has reached 100% less the callback_queue_low_percent.
callback_queue_low_percent: Percent of callback queue size at which the index agent will resume sending requests to xPlore.
wait_time: Time in seconds that the indexing thread waits before reading the next item in the indexing queue.
thread_count: Number of threads to be used by the index agent.
shutdown_timeout: Time the index agent should wait for thread termination and cleanup before shutdown.
runaway_timeout: Timeout for runaway requests.
content_clean_interval: Timeout to clean the local content area. Set to the same value as runaway_timeout. After the local content area is cleaned, documents may still remain in the queue for xPlore processing.
partition_config: You can add this element and its contents to map partitions to specific collections. See Mapping Server storage areas to collections, page 71.

Other index agent parameters


Table A.45 Other index agent parameters
contentSizeLimit: In exporter.parameter_list. Sets the maximum size for documents to be sent for indexing. The value is in bytes. Default: 20MB.

Document processing and indexing service configuration parameters
You can configure the following settings for the CPS and indexing services in xPlore administrator. For
CPS and indexing processing settings, choose Indexing Service in the tree and click Configuration.
The default values have been optimized for most environments.

Document processing (CPS) global parameters


The per-instance CPS settings relate to the instance and do not overlap with the CPS settings in
Indexing Service configuration.
CPS-requests-max-size: Maximum size of CPS queue. Default: 1000.
CPS-requests-batch-size: Maximum number of CPS requests in a batch. Default: 5.
CPS-threadpool-core-size: Minimum number of threads used to process a single incoming request.
Valid values: 1 - 100. Default: 10.
Note: If you decrease the threadpool size, ingestion rate can slow down.
CPS-threadpool-max-size: Maximum number of threads used to process a single incoming request.
Valid values: 1 - 100. Default: 100.
CPS-thread-wait-time: Time in milliseconds to accumulate requests in a batch. Range:
1-2147483647. Default: 1000.
CPS-executor-queue-size: Maximum size of CPS queue before spawning a new worker thread.
Default: 10.
CPS-executor-retry-wait-time: Wait time in milliseconds after queue and worker thread maximums
have been reached. Range: 1-2147483647. Default: 1000.
commit-option: Default: -1.
rebuild-index-batch-size: Number of documents to add to rebuild of index in a batch. Default: 1000.
rebuild-index-embed-content-limit: Maximum embedded content for index rebuilding. Larger
content is passed in a file, not embedded. Default: 2048.

Indexing global parameters


index-requests-max-size: Maximum size of internal index queue. Default: 1000.
index-requests-batch-size: Maximum number of index requests in a batch. Default: 10.
index-threadpool-core-size: Minimum number of threads in the thread pool to service requests.
Valid values: 1 - 100. Default: 10.
index-threadpool-max-size: Maximum number of threads allowed in the thread pool to service
requests. Valid values: 1 - 100. Default: 100.
index-thread-wait-time: Maximum wait time in milliseconds to accumulate requests in a batch.
Range: 1-2147483647. Default: 1000.
index-executor-queue-size: Maximum size of index queue before spawning a new worker thread.
Default: 10.
index-executor-retry-wait-time: Wait time in milliseconds after index queue and worker thread
maximums have been reached. Default: 1000.
status-requests-batch-size: Maximum number of status update requests in a batch. Default: 1000.
status-thread-wait-time: Maximum wait time in milliseconds to accumulate requests in a batch.
Default: 1000.
index-check-duplicate-at-ingestion: Set to true to check for duplicate documents. May slow
ingestion. Default: true.
enable-subcollection-ftindex: Set to true to create a multi-path index to search on specific
subcollections. Ingestion is slower, especially when you have multiple layers of subcollections. If
false, subcollection indexes are not rebuilt when you rebuild a collection index. Default: false.
rebuild-index-batch-size: Sets the number of documents to be reindexed. Default: 1000.
rebuild-index-embed-content-limit: Sets the maximum size of embedded content for language
detection in index rebuild. Larger content is streamed. Default: 2048.

CPS instance configuration parameters


You can configure the following CPS settings for each instance in xPlore administrator. The
values are recorded in the file PrimaryDsearch_local_configuration.xml. This file is located in
xplore_home/dsearch/cps/cps_daemon. The default values have been optimized for most environments.
Connection pool size: Maximum number of concurrent connections. Valid values: 1-100. Default:
4.
Increasing the number of connections consumes more memory. Decreasing can slow ingestion.
Port number: Listener port for CPS daemon, used by the CPS manager. Default: 9322.
This value is set during xPlore configuration.
Daemon path: Specifies the path to the installed CPS daemon (read-only).
This value is set during xPlore configuration.
Keep intermediate temp file: Keep content in a temporary CPS folder for debugging.
Enabling temp file has a large impact on performance. Disable (default) to remove temporary files
after the specified time in seconds. Time range in seconds: 1-604800 (1 week).
Restart threshold: Check After processed... and specify the number of requests after which
to restart the CPS daemon.
Disable if you do not want the daemon restarted. Decreasing the number can affect performance.
Heartbeat: Interval in seconds between the CPS manager and daemon.
Range: 1-600. Default: 60.
Embedded return: Check Yes (default) to return embedded results to the buffer. Check No to return
results to a file, and specify the file path for export.
Embedded return increases communication time and, consequently, impacts ingestion.
Export file path: Valid URI at which to store CPS processing results, for example, file:///c:/.
If the results are larger than Result buffer threshold, they are saved in this path. This setting
does not apply to remote CPS instances, because the processing results are always embedded in
the return to xPlore.
Result buffer size threshold: Number of bytes at which the result buffer returns results to file.
Valid values: 8 - 16 MB. Default: 1 MB (1048576 bytes). A larger value can accelerate processing but
can cause more instability.
Processing buffer size threshold: Specifies the number of bytes of the internal memory chunk used
to process small documents.
If this threshold is exceeded, a temporary file is created for processing. Valid values: 100 KB-10 MB.
Default: 2 MB (2097152 bytes). Increase the value to speed processing. Consumes more memory.
Load file to memory: Check to load the submitted file into memory for processing. Uncheck to pass
the file to a plug-in analyzer for processing (for example, the Documentum index agent).
Batch in batch count: Average number of batch requests in a batch request.
Range: 1-100. Default: 5. CPS assigns the number of Connection pool threads for each
batch_in_batch count. For example, defaults of batch_in_batch of 5 and connection_pool_size of 5
result in 25 threads.
Thread pool size: Number of threads used to process a single incoming request such as text
extraction and linguistic processing.
Range: 1-100. Default: 10. A larger size can speed ingestion when the CPU is not under heavy load.
Causes instability at heavy CPU load.
System language: ISO 639-1 language code that specifies the language for CPS.
Max text threshold: Sets the size limit, in bytes, for the text within documents. Range: 5MB - 2GB
expressed in bytes. Default: 10485760 (10 MB). Maximum setting: 2 GB. Larger values can slow
ingestion rate and cause more instability. Above this size, only the document metadata is tokenized.
Includes expanded attachments. For example, if an email has a zip attachment, the zip file is
expanded to evaluate document size. If you increase this threshold, ingestion performance can
degrade under heavy load.
Illegal char file: Specifies the URI of a file that defines illegal characters.
To create a token separator, xPlore replaces illegal characters with white space. This list is
configurable.
Request time out: Number of seconds before a single request times out.
Range: 60-3600. Default: 600.
Daemon standalone: Check to stop daemon if no manager connects to it. Default: unchecked.
IP version: Internet Protocol version of the host machine. Values: IPv4 or IPv6. Dual stack is not
supported.
Use express queue: This queue processes admin requests and query requests. Queries are processed
for language identification, lemmatization, and tokenization. The express queue has priority over the
regular queue. Set the maximum number of requests in the queue. Default: 128.
The regular queue processes indexing requests. Set the maximum number of requests in the queue.
Default: 1024.
When the token count is zero and the extracted text is larger than the configured threshold, a
warning is logged.
You can configure the following additional parameters in the CPS configuration file
PrimaryDsearch_local_configuration.xml, which is located in the CPS instance directory
xplore_home/dsearch/cps/cps_daemon. If these properties are not in the file, you can add them. These
settings apply to all CPS instances.
detect_data_len: The number of bytes used for language identification. The bytes are analyzed
from the beginning of the file. A larger number slows the ingestion process. A smaller number
increases the risk of language misidentification. Default: 65536.
max_batch_size: Limit for the number of requests in a batch. Valid values: 2 - 65535 (default:
65535).
Note: The index agent also has batch size parameters.
max_data_per_process: The upper limit in bytes for a batch of documents in CPS processing.
Default: 30 MB. Maximum setting: 2 GB.
normalize_form: Set to true to remove accents in the index, which allows search for the same
word without the accent.
slim_buffer_size_threshold: Sets memory buffer for CPS temporary files. Increase to 16384 or
larger for CenterStage or other client applications that have a high volume of metadata.
temp_directory: Directory for CPS temporary files. Default:
xplore_home/dsearch/cps/cps_daemon/temp.
temp_file_folder: Directory for temporary format and language identification. Default:
xplore_home/dsearch/cps/cps_daemon/temp.
daemon_restart_memory_threshold: Maximum memory consumption at which CPS is restarted.
use_direct_io: Requires CPS to read and write to devices directly.
io_block_unit: Logical block unit of the read/write target device.
io_chunk_size: Size for each read/write chunk.
cut_off_text: Set to true to cut off text of large documents that exceed max_text_threshold instead of
rejecting entire document.

Search service configuration parameters


You can configure the following settings for the search service in xPlore administrator. The default
values have been optimized for most environments.
query-default-locale: Default locale for queries. See Basistech documentation for identified
language codes. See your release notes for supported languages in this release. Default: en (English).
query-default-result-batch-size: Default size of result batches that are sent to the client. Default:
200. In a Documentum environment, dfc.batch_hint_size in dfc.properties overrides this setting.
Negative values default to a single batch.
query-result-cache-size: Default size of results buffer. When this limit is reached, no more results
are fetched from xDB until the client asks for more results. Default: 200.
query-result-spool-location: Path to location at which to spool results. Default:
xplore_home/dsearch/spool
query-default-timeout: Interval in milliseconds for a query to time out. Default: 60000 (one
minute). Negative values default to no timeout (not recommended). This setting is overridden by
the query_timeout parameter in dm_ftengine_config and by the Search Service and IDfXQuery
TIMEOUT APIs.
query-threadpool-core-size: Minimum number of threads used to process incoming requests.
Threads are allocated at startup, and idle threads are removed down to this minimum number. Valid
values: 1 - 100. Default: 10. Note: If you decrease the threadpool size, search performance can
decrease.
query-threadpool-max-size: Maximum number of threads used to process incoming requests. After
this limit is reached, service is denied to additional requests. Valid values: 1 - 100. Default: 100.
query-threadpool-queue-size: Maximum number in threadpool queue before spawning a new
worker thread. Default: 0.
query-threadpool-keepalive-time: Interval after which idle threads are terminated. Default: 600000.
query-threadpool-keep-alive-time-unit: Unit of time for query-thread-pool-keep-alive-time.
Default: milliseconds.
query-executor-retry-interval: Wait time in milliseconds after search queue and worker thread
maximums have been reached. Default: 100.
query-executor-retry-limit: Number of times to retry query. Default: 3.
query-thread-sync-interval: Used for xPlore internal synchronization. Interval after which results
fetching is suspended when the result cache is full. For a value of 0, the thread waits indefinitely
until space is available in the cache (freed up when the client application retrieves results). Default:
100 units.
query-thread-max-idle-interval: Query thread is freed up for reuse after this interval, because the
client application has not retrieved the result. (Threads are freed immediately after a result is
retrieved.) Default: 3600000.
query-summary-default-highlighter: Class that determines summary and highlighting. Default:
com.emc.documentum.core.fulltext.indexserver.services.summary.DefaultSummary. For query
summary configuration, see Configuring query summaries, page 212.
query-summary-display-length: Number of characters to return as a dynamic summary. Default: 64.
query-summary-highlight-begin-tag: HTML tag to insert at beginning of summary. Default: empty string.
query-summary-highlight-end-tag: HTML tag to insert at end of summary. Default: empty string.
query-enable-dynamic-summary: If context is not important, set to false to return as a summary the
first n chars defined by the query-summary-display-length configuration parameter. For summaries
evaluated in context, set to true (default).
query-index-covering-values: Supports Documentum DQL evaluation. Do not change unless
tech support directs you to.
query-facet-max-result-size: Documentum only. Sets the maximum number of results used to
compute facet values. For example, if query-facet-max-result-size=12, only 12 results for all facets
in a query are returned. If a query has many facets, the number of results per facet is reduced
accordingly. Default: 10000.

API Reference
CPS APIs
Content processing service APIs are available in the interface IFtAdminCPS in the package
com.emc.documentum.core.fulltext.client.admin.api.interfaces. This package is in the SDK jar file
dsearchadmin-api.jar.
To add a CPS instance using the API addCPS(String instanceName, URL url, String usage), the
following values are valid for usage: all, index, or search. If the instance is used for CPS alone, use
index. For example:
addCPS("primary","
http://1.2.3.4/services","
index")

CPS configuration keys for setCPSConfig()


CPS-requests-max-size: Maximum size of CPS queue
CPS-requests-batch-size: Maximum number of CPS requests in a batch
CPS-thread-wait-time: Maximum wait time in milliseconds to accumulate requests in a batch
CPS-threadpool-core-size: Minimum number of threads used to process a single incoming request
such as text extraction and linguistic processing. Valid values: 1 - 100.
CPS-threadpool-max-size: Number of threads used to process a single incoming request such as
text extraction and linguistic processing. Valid values: 1 - 100.
CPS-executor-queue-size: Maximum size of CPS executor queue before spawning a new worker
thread
CPS-executor-retry-wait-time: Wait time in milliseconds after executor queue and worker thread
maximums have been reached

Indexing engine APIs (DFC)


DFC exposes APIs to get information about the index engine that a repository uses. Get the
IDfFtConfig interface from a DFC session. Use IDfFtConfig to get the following information:
getCollectionPath(). Returns the complete path of the root collection, useful for constructing some
queries.
getEngine(). Returns DSEARCH, FAST, Lucene, or unknown.
getEngineConfig(). The following example assumes that a connection to the repository has
been established and saved in the class variable m_session (an instance of IDfSession):
import com.documentum.fc.client.fulltext.IDfFtConfig;

public void getEngineConfig() throws Exception
{
    IDfFtConfig cfg = m_session.getFtConfig();
    String coll = cfg.getCollectionPath();
    System.out.println("Collection path: " + coll);
    String eng = cfg.getEngine();
    System.out.println("Full-text engine: " + eng);
}
isCapabilitySupported(String capability). Supported inputs are scope_search,
security_eval_in_fulltext, xquery, relevance_ranking, hit_count, and search_topic. Returns 1 for
supported, 0 for unsupported, and -1 when it cannot be determined. Supported returns are as follows
(see the example after this list):
scope_search: XML element data can be searched, like the <IN> operator in previous versions of
Content Server. xPlore and FAST support this feature.
security_eval_in_fulltext: Security is evaluated in the full-text engine before results are returned
to Content Server, resulting in faster query results. This feature is available only in xPlore.
xquery: xPlore supports XQuery syntax.
relevance_ranking: The full-text engine scores results using configurable criteria. xPlore and
FAST support this feature.
hit_count: The full-text engine returns the total number of hits before returning results. FAST
supports this feature; xPlore does not support it. Note: The count that is returned for DQL queries
does not reflect the application of security, which reduces the actual count returned for the query.
search_topic: The full-text engine indexes all XML elements and their attributes. FAST supports
zone searching (search topic) for backward compatibility. The xPlore server returns false.
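For example, a minimal sketch that checks a single capability, reusing the m_session variable from the previous example and assuming an int return type (the documented return values are 1, 0, and -1):
// Check whether security is evaluated in the full-text engine.
IDfFtConfig cfg = m_session.getFtConfig();
int secInEngine = cfg.isCapabilitySupported("security_eval_in_fulltext");
if (secInEngine == 1) {
    System.out.println("Security is evaluated in the full-text engine.");
} else if (secInEngine == 0) {
    System.out.println("Security is evaluated in the Content Server.");
} else {
    System.out.println("Capability support could not be determined.");
}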
For information on creating and retrieving facets, see the Facets chapter.

Search APIs
Search service APIs are available in the following packages of the SDK jar file dsearchadmin-api.jar:
IFtAdminSearch in the package com.emc.documentum.core.fulltext.client.admin.api.interfaces.
IFtSearchSession in com.emc.documentum.core.fulltext.client.search
IFtQueryOptions in com.emc.documentum.core.fulltext.common.search.

Auditing APIs
Auditing APIs are available in the interface IFtAdminAudit in the package
com.emc.documentum.core.fulltext.client.admin.api.interfaces. This package is in the SDK jar file
dsearchadmin-api.jar.

xDB data management APIs


The data management APIs are available in the interface IFtAdminDataManagement in the package
com.emc.documentum.core.fulltext.client.admin.api.interfaces. This package is in the SDK jar file
dsearchadmin-api.jar.

Appendix B
Documentum DTDs

This appendix covers the following topics:

Extensible Documentum DTD

Extensible Documentum DTD


Viewing the dftxml representation of a document
Documentum repository content is stored in XML format. Customer-defined elements and attributes
can be added to the DTD as children of dmftcustom. Each element specifies an attribute of the object
type. The object type is the element in the path dmftdoc/dmftmetadata/type_name, for example,
dmftdoc/dmftmetadata/dm_document.
To view the dftxml representation that is generated by the index agent, add
the following element as a child of the exporter element in indexagent.xml in
xplore_home/jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.
<keep_dftxml>true</keep_dftxml>

To view the dftxml representation of a document that has been indexed, open xPlore administrator
and click the document in the collection view.
To find the path of a specific attribute in dftxml, use a Documentum client to look up the object ID of a
custom object. Using xPlore administrator, open the target collection and paste the object ID into the
Filter word box. Click the resulting document to see the dftxml representation.

DTD
This DTD is subject to change. Following are the top-level elements under dmftdoc.
Table B.46 dftxml top-level elements
dmftkey: Contains Documentum object ID (r_object_id)
dmftmetadata: Contains elements for all indexable attributes from the standard Documentum object model, including custom object types. Each attribute is modeled as an element and value. Repeating attributes repeat the element name and contain a unique value. Some metadata, such as r_object_id, are repeated in other elements as noted.
dmftvstamp: Contains the internal version stamp (i_vstamp) attribute.
dmftsecurity: Contains security attributes from the object model plus computed attributes: acl_name, acl_domain, and ispublic.
dmftinternal: Contains attributes used internally for query processing.
dmftversions: Contains version labels and iscurrent for sysobjects.
dmftfolders: Contains the folder ID and folder parents.
dmftcontents: Contains content-related attributes and one or more pointers to content files. The actual content can be stored within the child element dmftcontent as a CDATA section.
dmftcustom: Contains searchable information supplied by custom applications. Requires a TBO. See Injecting data and supporting joins, page 80.
dmftsearchinternals: Contains tokens used by static and dynamic summaries.

Example dftxml of a custom object type


<?xml version="1.0"?>
<dmftdoc dmftkey="090a0d6880008848" dss_tokens=":dftxml:1">
<dmftkey>090a0d6880008848</dmftkey>
<dmftmetadata>
<dm_sysobject>
<r_object_id dmfttype="dmid">090a0d6880008848</r_object_id>
<object_name dmfttype="dmstring">mylog.txt</object_name>
<r_object_type dmfttype="dmstring">techpubs</r_object_type>
<r_creation_date dmfttype="dmdate">2010-04-09T21:40:47</r_creation_date>
<r_modify_date dmfttype="dmdate">2010-04-09T21:40:47</r_modify_date>
<r_modifier dmfttype="dmstring">Administrator</r_modifier>
<r_access_date dmfttype="dmdate"/>
<a_is_hidden dmfttype="dmbool">false</a_is_hidden>
<i_is_deleted dmfttype="dmbool">false</i_is_deleted>
<a_retention_date dmfttype="dmdate"/>
<a_archive dmfttype="dmbool">false</a_archive>
<a_link_resolved dmfttype="dmbool">false</a_link_resolved>
<i_reference_cnt dmfttype="dmint">1</i_reference_cnt>
<i_has_folder dmfttype="dmbool">true</i_has_folder>
<i_folder_id dmfttype="dmid">0c0a0d6880000105</i_folder_id>
<r_link_cnt dmfttype="dmint">0</r_link_cnt>
<r_link_high_cnt dmfttype="dmint">0</r_link_high_cnt>
<r_assembled_from_id dmfttype="dmid">0000000000000000</r_assembled_from_id>
<r_frzn_assembly_cnt dmfttype="dmint">0</r_frzn_assembly_cnt>
<r_has_frzn_assembly dmfttype="dmbool">false</r_has_frzn_assembly>
<r_is_virtual_doc dmfttype="dmint">0</r_is_virtual_doc>
<i_contents_id dmfttype="dmid">060a0d688000ec61</i_contents_id>
<a_content_type dmfttype="dmstring">crtext</a_content_type>
<r_page_cnt dmfttype="dmint">1</r_page_cnt>
<r_content_size dmfttype="dmint">130524</r_content_size>
<a_full_text dmfttype="dmbool">true</a_full_text>
<a_storage_type dmfttype="dmstring">filestore_01</a_storage_type>
<i_cabinet_id dmfttype="dmid">0c0a0d6880000105</i_cabinet_id>
<owner_name dmfttype="dmstring">Administrator</owner_name>
<owner_permit dmfttype="dmint">7</owner_permit>
<group_name dmfttype="dmstring">docu</group_name>
<group_permit dmfttype="dmint">5</group_permit>
<world_permit dmfttype="dmint">3</world_permit>
<i_antecedent_id dmfttype="dmid">0000000000000000</i_antecedent_id>
<i_chronicle_id dmfttype="dmid">090a0d6880008848</i_chronicle_id>
<i_latest_flag dmfttype="dmbool">true</i_latest_flag>
<r_lock_date dmfttype="dmdate"/>
<r_version_label dmfttype="dmstring">1.0</r_version_label>
<r_version_label dmfttype="dmstring">CURRENT</r_version_label>
<i_branch_cnt dmfttype="dmint">0</i_branch_cnt>
<i_direct_dsc dmfttype="dmbool">false</i_direct_dsc>
<r_immutable_flag dmfttype="dmbool">false</r_immutable_flag>
<r_frozen_flag dmfttype="dmbool">false</r_frozen_flag>
<r_has_events dmfttype="dmbool">false</r_has_events>
<acl_domain dmfttype="dmstring">Administrator</acl_domain>
<acl_name dmfttype="dmstring">dm_450a0d6880000101</acl_name>
<i_is_reference dmfttype="dmbool">false</i_is_reference>
<r_creator_name dmfttype="dmstring">Administrator</r_creator_name>
<r_is_public dmfttype="dmbool">true</r_is_public>
<r_policy_id dmfttype="dmid">0000000000000000</r_policy_id>
<r_resume_state dmfttype="dmint">0</r_resume_state>
<r_current_state dmfttype="dmint">0</r_current_state>
<r_alias_set_id dmfttype="dmid">0000000000000000</r_alias_set_id>
<a_is_template dmfttype="dmbool">false</a_is_template>
<r_full_content_size dmfttype="dmdouble">130524</r_full_content_size>
<a_is_signed dmfttype="dmbool">false</a_is_signed>
<a_last_review_date dmfttype="dmdate"/>
<i_retain_until dmfttype="dmdate"/>
<i_partition dmfttype="dmint">0</i_partition>
<i_is_replica dmfttype="dmbool">false</i_is_replica>
<i_vstamp dmfttype="dmint">0</i_vstamp>
<webpublish dmfttype="dmbool">false</webpublish>
</dm_sysobject>
</dmftmetadata>
<dmftvstamp>
<i_vstamp dmfttype="dmint">0</i_vstamp>
</dmftvstamp>
<dmftsecurity>
<acl_name dmfttype="dmstring">dm_450a0d6880000101</acl_name>
<acl_domain dmfttype="dmstring">Administrator</acl_domain>
<ispublic dmfttype="dmbool">true</ispublic>
</dmftsecurity>
<dmftinternal>
<docbase_id dmfttype="dmstring">658792</docbase_id>
<server_config_name dmfttype="dmstring">DSS_LH1</server_config_name>
<contentid dmfttype="dmid">060a0d688000ec61</contentid>
<r_object_id dmfttype="dmid">090a0d6880008848</r_object_id>
<r_object_type dmfttype="dmstring">techpubs</r_object_type>
<i_all_types dmfttype="dmid">030a0d68800001d7</i_all_types>
<i_all_types dmfttype="dmid">030a0d6880000129</i_all_types>
<i_all_types dmfttype="dmid">030a0d6880000105</i_all_types>
<i_dftxml_schema_version dmfttype="dmstring">5.3</i_dftxml_schema_version>
</dmftinternal>
<dmftversions>
<r_version_label dmfttype="dmstring">1.0</r_version_label>
<r_version_label dmfttype="dmstring">CURRENT</r_version_label>
<iscurrent dmfttype="dmbool">true</iscurrent>
</dmftversions>
<dmftfolders>
<i_folder_id dmfttype="dmid">0c0a0d6880000105</i_folder_id>
</dmftfolders>
<dmftcontents>
<dmftcontent>

<dmftcontentattrs>
<r_object_id dmfttype="dmid">060a0d688000ec61</r_object_id>
<page dmfttype="dmint">0</page>
<i_full_format dmfttype="dmstring">crtext</i_full_format>
</dmftcontentattrs>
<dmftcontentref content-type="text/plain" islocalcopy="true" lang="en"
encoding="US-ASCII" summary_tokens="dmftsummarytokens_0">
<![CDATA[...]]>
</dmftcontentref>
</dmftcontent>
</dmftcontents>
<dmftdsearchinternals dss_tokens="excluded">
<dmftstaticsummarytext dss_tokens="excluded"><![CDATA[mylog.txt ]]>
</dmftstaticsummarytext>
<dmftsummarytokens_0 dss_tokens="excluded"><![CDATA[1Tkns ...]]>
</dmftsummarytokens_0></dmftdsearchinternals></dmftdoc>

Note: The attribute islocalcopy indicates whether the content was indexed. If it is false, only the metadata
was indexed, and no copy of the content exists in the index.
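
To show how this dftxml structure maps to a query path, the following is a minimal XQuery sketch that
looks up the document by its object_name and returns its dmftkey. The domain name SomeDomain is a
placeholder; the path mirrors the dmftmetadata/dm_sysobject elements shown above:

collection("/SomeDomain/dsearch/Data")/dmftdoc[
  dmftmetadata/dm_sysobject/object_name = "mylog.txt"]/dmftkey/string()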

Appendix C
XQuery and VQL Reference

This appendix covers the following topics:

Tracking and status XQueries

VQL and XQuery Syntax Equivalents


Tracking and status XQueries

You can issue the following XQuery expressions against the tracking database for each domain. Many of
these expressions are available in xPlore administrator or as audit reports. These XQuery expressions can be
submitted in the xDB console.

Object count from tracking DB


Get object count in a collection
count(//trackinginfo/document[collection-name="<Collection_name>"])

For example:
for $i in collection("dsearch/SystemInfo")
return count($i//trackinginfo/document)

Get object count in all collections (all indexed objects)
count(//trackinginfo/document)

Get object count in library
count(//trackinginfo/document[library-path="<LibraryPath>"])

Find documents
Find collection in which a document is indexed
//trackinginfo/document[@id="<DocumentId>"]/collection-name/string(.)

For example:
for $i in collection("dsearch/SystemInfo")
where $i//trackinginfo/document[@id="TestCustomType_txt1276106246060"]
return $i//trackinginfo/document/collection-name

Find library in which a document is indexed


//trackinginfo/document[@id="<DocumentId>"]/library-path/string(.)
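
For example, following the same pattern as the collection-name example above and reusing its sample
document ID:

for $i in collection("dsearch/SystemInfo")
where $i//trackinginfo/document[@id="TestCustomType_txt1276106246060"]
return $i//trackinginfo/document/library-path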

Get tracking information for a document


//trackinginfo/document[@id="<DocumentId>"]

Status information
Get operations and status information for a document
//trackinginfo/operation[@doc-id="<DocumentId>"]
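
For example, again reusing the sample document ID from the earlier examples:

for $i in collection("dsearch/SystemInfo")
return $i//trackinginfo/operation[@doc-id="TestCustomType_txt1276106246060"]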


VQL and XQuery Syntax Equivalents


xPlore does not support the Verity Query Language (VQL). The following table maps VQL syntax
examples to their equivalents in XQuery.
Table C.47 VQL and XQuery mapping

IN
for $i in collection("/XX/dsearch/Data")/dmftdoc[
  (dmftcontents/dmftcontent ftcontains ("test1"))]

NEAR/N
for $i in collection("/XX/dsearch/Data")/dmftdoc[
  (dmftcontents/dmftcontent ftcontains ("test1" ftand "test2"
  distance exactly N words))]

ORDERED
for $i in collection("/XX/dsearch/Data")/dmftdoc[
  (dmftcontents/dmftcontent ftcontains ("test1" ftand "test2") ordered)]

ENDS
for $i in collection("/XX/dsearch/Data")/dmftdoc[
  (dmftcontents/dmftcontent ftcontains ("test1")) and
  (ends-with(dmftmetadata/dm_sysobject/object_name, "test2"))]

STARTS
for $i in collection("/XX/dsearch/Data")/dmftdoc[
  (dmftcontents/dmftcontent ftcontains ("test1")) and
  starts-with(dmftinternal/r_object_type, "dm_docu")]
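
The table entries show only the filter conditions. As a complete, minimal FLWOR sketch, assuming a
domain named SomeDomain and an arbitrary proximity of 5 words:

for $i in collection("/SomeDomain/dsearch/Data")/dmftdoc[
  (dmftcontents/dmftcontent ftcontains ("test1" ftand "test2"
  distance exactly 5 words))]
return <result>{$i/dmftkey/string()}</result>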


xPlore Glossary

category: A category defines a class of documents and their XML structure.

collection: A collection is a logical group of XML documents that is physically stored in an xDB library.
A collection represents the most granular data management unit within xPlore.

content processing service (CPS): The content processing service (CPS) retrieves indexable content from
content sources and determines the document format and primary language. CPS parses the content into
index tokens that xPlore can process into full-text indexes.

domain: A domain is a separate, independent group of collections within an xPlore deployment.

DQL: Documentum Query Language, used by many Content Server clients.

FTDQL: Full-text Documentum Query Language.

ftintegrity: A standalone Java program that checks index integrity against Content Server repository
documents. The ftintegrity script calls the state of index job in the Content Server.

full-text index: Index structure that tracks terms and their occurrence in a document.

index agent: Documentum application that receives indexing requests from the Content Server. The agent
prepares and submits an XML representation of the document to xPlore for indexing.

ingestion: Process in which xPlore receives an XML representation of a document and processes it into
an index.

instance: An xPlore instance is one deployment of the xPlore WAR file to an application server container.
You can have multiple instances on the same host (vertical scaling), although it is more common to have
one xPlore instance per host (horizontal scaling). The following processes can run in an xPlore instance:
CPS, indexing, search, and xPlore administrator.

lemmatization: Lemmatization is a normalization process in which the lemmatizer finds a canonical or
dictionary form for a word, called a lemma. Content that is indexed is lemmatized unless lemmatization
is turned off. Terms in search queries are also lemmatized unless lemmatization is turned off.

Lucene: Apache open-source, Java-based full-text indexing and search engine.

node: In xPlore and xDB, node is sometimes used to denote instance. It does not denote host.

persistence library: Saves CPS, indexing, and search metrics. Configurable in indexserverconfig.xml.

state of index job: Content Server configuration installs the state of index job dm_FTStateOfIndex. This
job is run from Documentum Administrator. The ftintegrity script calls this job, which reports on index
completeness, status, and indexing failures.

status library: A status library reports on indexing status for a domain. There is one status library for
each domain.

stop words: Stop words are common words filtered out of queries to improve query performance. Stop
words can be searched when used in a phrase.

text extraction: Identification of terms in a content file.

token: Piece of an input string defined by semantic processing rules.

tracking library: An xDB library that records the object IDs and location of content that has been indexed.
There is one tracking database for each domain.

transactional support: Small in-memory indexes are created in rapid transactional updates, then merged
into larger indexes. When an index is written to disk, it is considered clean. Committed and uncommitted
data before the merge is searchable along with the on-disk index.

watchdog service: Installed by the xPlore installer, the watchdog service pings all xPlore instances and
sends an email notification when an instance does not respond.

xDB: xDB is a database that enables high-speed storage and manipulation of many XML documents. In
xPlore, an xDB library stores a collection as a Lucene index and manages the indexes on the collection.
The XML content of indexed documents can optionally be stored.

XQFT: W3C full-text XQuery and XPath extensions described in XQuery and XPath Full Text 1.0. Support
for XQFT includes logical full-text operators, wildcard option, anyall option, positional filters, and
score variables.

XQuery: W3C standard query language that is designed to query XML data. xPlore receives XQuery
expressions that are compliant with the XQuery standard and returns results.

Index

A
ACL replication
  job, 53
  script, 53
aclreplication, 53
ACLs
  large numbers of, 66
aspects, 61
attach, 189

B
backup
  incremental, 179
batch_hint_size, 343

C
capacity
  allocate and deallocate, 14
categories
  Documentum, 26
  overview, 23
collection
  configure, 161
  global, 24
  move to another instance, 162
  overview, 159
connectors_batch_size, 323
Content Server
  indexing, 27, 29

D
detach, 189
dm_FTStateOfIndex, 76
dm_fulltext_index_user, 28
dmi_registry, 28
document
  maximum size for ingestion, 341
document size
  maximum, 98
Documentum
  categories, 26
  domains, 25
domain
  create, 156
  Documentum, 25
  overview, 23
  reset state, 180
  restore with xDB, 180
DQL
  using IDfQueryBuilder, 259
DQL, compared to DFC/DFS, 224

E
events
  register, 28
exporter queue_size, 323
exporter_thread_count, 323

F
facets
  date, in DFC, 277
  numeric, in DFC, 278
  out of the box, 275
  results from IDfQueryProcessor, 283
  string in DFC, 276
FAST
  migration, sizing, 316
federation
  restore with xDB, 180
force
  detach and attach, 189
freshness-weight, 202
ftintegrity
  running, 74
full-text indexing
  Content Server documents, 27, 29
  index server, 29
  overview, 27
  software installation, 27, 29
  verifying indexes, 74
  xPlore, 29

G
Get query text, 290

H
highlighting
  in results summary, 214

I
IDfQueryBuilder, 259
incremental backup, 179
index agent
  role in indexing process, 27, 29
index server
  role in indexing process, 27, 29
indexer callback_queue_size, 323
indexer queue_size, 323
indexing
  queue items, 28
indexserverconfig.xml
  Documentum categories, 26
installing indexing software, 27, 29
instance
  deactivate, 34

J
jobs
  state of index, 76

M
metadata
  boost in results, 202

P
password
  change, 53
performance
  language identification, 342
  query summary, 214
primary instance
  replace, 35

Q
query
  counts by user, 291
query definition, 259
queue
  items, 28

R
recent documents
  boost in results, 202
reindexing, 28
report
  Get query text, 290
  Query counts by user, 291
  Top N slowest queries, 290
reset
  domain state, 180
restore
  domain, with xDB, 180
  federation, with xDB, 180

S
save-tokens, 106
security
  manually update, 53
  view in Content Server, 57
  view in log, 55
size
  content size limit, in Documentum, 66
  maximum, for ingestion, 341
sizing
  CPS, 316
  migration from FAST, 316
  search, 316
spare instance
  deactivate, 34
  replace primary, 35
state of index, 76
summary
  dynamic, 213
  performance, 214

T
text size
  maximum, 99
Top N slowest queries, 290

W
watchdog service, 37
