Content Server 6.5 SP2 Full-Text Indexing Deployment and Administration Guide
Content Server
Version 6.5 SP2
EMC Corporation
Corporate Headquarters:
Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com
Copyright © 1992 - 2009 EMC Corporation. All rights reserved.
Published September 2009
EMC believes the information in this publication is accurate as of its publication date. The information is subject to change
without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO REPRESENTATIONS
OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY
DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.
All other trademarks used herein are the property of their respective owners.
Preface
Intended audience
This manual is intended for the person installing Content Server and the full-text indexing software.
Typically, a system administrator installs the software.
Related documentation
• The Content Server Installation Guide contains information on installing Content Server.
• The Documentum Administrator online help system contains instructions for managing the index
queue and for starting and stopping the index server and index agent, in basic, consolidated,
and high-availability deployments. No Documentum Administrator support is provided for
multinode deployments.
• The EMC Documentum Search Development Guide contains complete information on querying.
Revision history
The following revisions have been made to this document:
Date             Description
July 2008        Initial publication for version 6.5.
March 2009       Initial publication for version 6.5 SP1.5.
September 2009   Initial publication for version 6.5 SP2.
Part 1
Overview
Chapter 1
Overview
Introduction
A repository can contain millions of objects. Locating objects that contain particular values can be a
challenge, whether you are trying to find a phrase, a date, or a name. You can make the task
somewhat easier by carefully structuring folders to store related objects together.
You can also use the Documentum Query Language (DQL) to rapidly locate objects whose metadata
contains the values for which you are searching. Full-text indexes go further: they enable you to
quickly search the content of every indexable file, as well as the properties of every indexable
object, anywhere in your repositories.
The full-text indexing system:
• Creates indexes of all the words in every indexable object’s properties and content file in a
repository, and
• Enables searches to be made against those indexes and quickly return results.
Note: Phonetic searching is not supported.
An indexable object is an object that is a SysObject, SysObject subtype, or, optionally, a lightweight
SysObject (LWSO). (Because the SysObject type is the parent type of the most commonly used objects,
most of your relevant objects are indexable.) All properties of SysObject and SysObject subtype
objects are indexed automatically. A content file (which can reside in any storage area) is indexed
only if its associated object’s a_full_text property value is “TRUE” and the content file’s format is a
supported file format.
Note: Unless otherwise required for clarity, any usage of the term, “full-text indexing”, will be
simplified to “indexing”; for example, a full-text index will be referred to as an “index”.
Features
The full-text indexing system supports these major features:
• Wide variety of file formats
See Appendix D, Supported and Unsupported Formats.
• Indexes every object’s properties
• Wide variety of characters
See Appendix C, Indexed and Non-Indexed Characters.
• Wide variety of languages
All standard Unicode character sets are supported and no special configuration is necessary.
By default, content files and properties in all supported languages (including two right-to-left
languages, Hebrew and Arabic) are indexed.
See Appendix E, Supported Languages.
• Grammatical normalization (also known as “lemmatization”)
Enabling grammatical normalization forces all grammatical forms (such as singular and plural) of
a word to be indexed and includes all other grammatical forms of a word in a search. For example,
a query on “cat” would return objects containing not only “cat”, but also those containing “cats”.
See Using grammatical normalization, page 14.
• Thesaurus
You can specify in a thesaurus all of the synonyms to include in a search that only contains one of
the synonyms. For example, if the thesaurus specifies “feline”, “polecat”, “cougar”, and “bobcat”
as synonyms of “cat”, then a search on the term “cat” will also include the synonyms, “feline”,
“polecat”, “cougar”, and “bobcat”.
See Enabling thesaurus searching, page 84.
Architectural overview
Full-text indexing in a repository is controlled by three software components:
• Content Server, which does the following:
— Manages the objects in a repository
— Generates the events that trigger full-text indexing operations
— Queries the full-text indexes
— Returns query results to client applications
Note: Although Content Server supports full-text indexing by default, Content Server itself
does not create or maintain full-text indexes. You need to install the full-text indexing software
components, which create and maintain the full-text indexes.
• The index agent, which exports documents from a repository and prepares them for indexing
• The index server, which is a third-party server product that creates and maintains the full-text
index for a repository. The index server also receives full-text queries from Content Server and
responds to those queries.
The index server’s operations are processor- and memory-intensive. Therefore, EMC
Documentum recommends that you install the index server on a host other than the Content
Server host. You can install multiple index servers on a single UNIX or Linux host, provided the
directories, ports, and environment variables for each index server are configured properly. You
can install only one index server on a Windows host.
Note: DQL, the Content Server query language, supports querying against full-text indexes. You
can run queries against full-text indexes only or against full-text indexes and the metadata tables.
For information about the types of queries supported by DQL, refer to the Search Development
Guide.
Figure 1, page 16, illustrates the relationships among the Content Server, index agent, and index
server. For indexing, the Content Server sends metadata and content for indexing to the index agent,
which in turn packages that information as Documentum Full-text XML (DFTXML) and sends it to
the index server. DFTXML is an XML format that contains the object’s properties and the location
of the object’s content file. The index server processes the information and updates the index. For
queries, Content Server sends the query to the index server, and the index server’s query server
process returns the results to Content Server.
The indexing process is not destructive to existing content or attributes in the repository. The content
files and object attributes are read, but not modified, during the indexing process.
Part 2
Deploy
Chapter 2
Prepare for Installation
Use the information in this chapter to prepare your environment for installing the full-text
indexing components.
These topics are included:
• Pre-installation procedure, page 21
• Preinstallation checklist, page 25
Pre-installation procedure
1. Install Content Server and configure a repository.
Use the instructions in the Content Server Installation Guide.
2. Host names
You need to identify the host where the index server and index agent are installed by a fully
qualified domain name (FQDN). For example, the host name isolde.documentum.com is
acceptable, but an IP address, for example, 172.04.8.275, is not acceptable.
3. Ports to use for the index agent
The index agent runs in the application server container. When an index agent instance is
configured, you need to designate two ports for the index agent and application server to use.
The default ports for the first index agent on a host are 9200 and 9080. The defaults for additional
index agents are 9220 for index agent 2, 9240 for index agent 3, and so on. If the index agent is
on the Content Server host, ensure that the ports are not the ports used for the application server
instance in which the Java method server and ACS server run.
4. Ports to use for the index server
The index server requires a contiguous range of 4000 free ports. You must designate which ports
to use during installation. The default range is from 13000 to 17000.
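The default port assignments described in steps 3 and 4 follow simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the spacing of 20 between index agents is inferred from the default pattern (9200 for the first index agent, then increasing by 20), and the range end follows this guide's convention of base port plus 4000.

```shell
# Hypothetical helpers; not part of the Documentum installers.

# Print the default first port for the Nth index agent on a host (N >= 1).
# Assumes the spacing of 20 implied by the listed defaults.
index_agent_default_port() (
  n=$1
  echo $((9200 + 20 * (n - 1)))
)

# Print the port range the index server would occupy for a given base port,
# using this guide's convention of "base through base + 4000".
# Fails if the range would run past the highest valid TCP port (65535).
index_server_port_range() (
  base=$1
  last=$((base + 4000))
  if [ "$last" -gt 65535 ]; then
    echo "base port $base leaves no room for 4000 ports" >&2
    exit 1
  fi
  echo "$base-$last"
)

index_agent_default_port 3     # prints 9240
index_server_port_range 13000  # prints 13000-17000
```

Comparing the two results before installation is a quick way to confirm that the index agent ports fall outside the index server range.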
5. Index server operating system and host
Note: Installing the index server installs Java on the host. The installed version of Java is the same
version as the Java version installed with Content Server.
You need to install the index server on a supported operating system. EMC Documentum
recommends using a host on which a clean installation of the operating system has been
performed.
Install the index server on a disk partition separate from the system partition and larger than
the system partition.
Constraints on third-party software on the index server host — The following restrictions
apply to the index server host:
• Do not run network security scanning software on a host where the index server is installed.
Network security scanners might lock index server processes, which can create intermittent
search and indexing failures.
• Do not run backup utilities while the index server is running.
Backup utilities might lock indexing processes.
• Do not run antivirus software on the %FASTSEARCH% directory, where the index server and
indexes are stored.
Antivirus software interferes with index server startup and proper functioning. Antivirus
software might quarantine log and other frequently-changed files.
EMC Documentum recommends testing any third-party monitoring tools on a development
system before the tools are deployed to a production system where the index server is installed.
Windows host requirements for the index server — The following restrictions apply to
Windows hosts:
• Do not run the Windows Index System on the index server host.
• On 32-bit Windows hosts, do not set the /3GB option in the boot.ini file.
• Disable automatic Windows updates on index server hosts.
• Do not install the index server on a domain controller.
Note: If a 5.2.x repository is running on a Windows host and you are performing a pre-upgrade
index migration, you must install the index agent and index server on a host other than the
Content Server host. For more information, refer to Chapter 4, Upgrade.
6. Host time settings
Set the time zone on the host where the index server runs to Greenwich Mean Time (GMT) or
Universal Time Coordinated (UTC). On Windows hosts, clear Automatically adjust clock for
daylight saving changes.
7. Ensuring correct network configuration
If you are installing the indexing software on a host other than the Content Server host, ensure
that the domain name service (DNS) entries for the two machines are correct so that they are able
to locate each other on the network.
To verify the DNS entries:
a. On the index server host, look up the Content Server host:
nslookup FQDN_of_Content_Server_host
where FQDN_of_Content_Server_host is the FQDN of the Content Server host.
This returns one or more IP addresses for the Content Server host.
b. Use the first IP address returned in step a for a reverse lookup:
nslookup IP_address_returned
The correct return value is the same FQDN you entered in step a.
c. If the two nslookup commands do not return the correct values, update the DNS servers used
by the two hosts to reflect the correct FQDNs.
d. If necessary, on Windows with more than one network card, update the host files to ensure
that the correct IP address for each host is listed first.
e. If the nslookup commands succeed and return the correct values, ping the index server
host from the Content Server host to ensure that it responds to the ping and to ensure that the
IP address that responds to the ping is the IP address defined in the ftengine config object.
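When comparing the FQDN used for the forward lookup with the name reported by the reverse lookup, differences in letter case or a trailing dot are not significant. The following is a minimal sketch of that comparison, assuming the two names have been copied from the nslookup output by hand; the function names are hypothetical and the script does not call nslookup itself.

```shell
# Hypothetical helpers for comparing the names seen in step 7.

# Normalize a host name: lowercase it and drop any trailing dot,
# since reverse lookups often report names as "host.example.com.".
normalize_fqdn() (
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/\.$//'
)

# Succeed only if the name from the reverse lookup matches the FQDN
# that was used for the forward lookup.
fqdn_roundtrip_ok() (
  [ "$(normalize_fqdn "$1")" = "$(normalize_fqdn "$2")" ]
)

# Example with the host name used earlier in this chapter; in practice the
# second argument would be copied from the output of the reverse nslookup.
if fqdn_roundtrip_ok "isolde.documentum.com" "ISOLDE.documentum.com."; then
  echo "forward and reverse lookups agree"
else
  echo "DNS mismatch: update the DNS entries" >&2
fi
```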
8. Index agent and index server installation account
You need to install the index agent and index server under the same user name under which you
installed Content Server (the Content Server installation owner). If you are installing the index
agent and index server on a host other than the Content Server host, ensure that the user exists
on that host. Refer to “Installation Owner Account” in the correct section for your operating
system, in Chapter 3, “Preparing for Installation,” of the Content Server Installation Guide, for more
information on the installation owner account.
9. Environment variables on UNIX and Linux hosts
You must set the following environment variables in the installation owner’s environment on
UNIX and Linux hosts before installing the index agent and index server:
                                            $DOCUMENTUM/fulltext/fast
                                            $DOCUMENTUM_SHARED/dfc
                                            $DOCUMENTUM_SHARED/IndexAgents/ftintegrity
FASTSEARCH   Location of the index server   $DOCUMENTUM/fulltext/IndexServer
DISPLAY      Controls the display           localhost:0.0
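As an illustration, the FASTSEARCH and DISPLAY entries above could be set in the installation owner's environment as follows. This is a sketch under stated assumptions: the function name is hypothetical, /opt/documentum is a placeholder installation root, and only the two variables whose names appear in the table are covered.

```shell
# Hypothetical helper: set the two variables whose names and values appear
# in the table above, relative to a given Documentum installation root.
set_index_server_env() {
  DOCUMENTUM=$1
  FASTSEARCH="$DOCUMENTUM/fulltext/IndexServer"   # location of the index server
  DISPLAY="localhost:0.0"                         # controls the display
  export DOCUMENTUM FASTSEARCH DISPLAY
}

# /opt/documentum is a placeholder; substitute the real installation root.
set_index_server_env /opt/documentum
echo "FASTSEARCH=$FASTSEARCH"
echo "DISPLAY=$DISPLAY"
```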
10. Ensuring that the index server environment is correct on UNIX and Linux hosts
The index server installation includes a script that sets required environment variables for
running the index server. The script is setupenv.sh or setupenv.csh, depending on the shell from
which you run, and it is located in the indexserver_install_dir/bin directory. You can source this
script to ensure that the environment variables are correct.
11. The deprecated DFC_DATA environment variable on UNIX hosts
The DFC_DATA environment variable was deprecated after the 5.1 EMC Documentum
release, but it is still used by Documentum installers for backward compatibility. If you are
installing the indexing software on a UNIX host where older EMC Documentum software
required setting DFC_DATA, the installer uses the value of DFC_DATA to create the /config
directory ($DFC_DATA/config). However, the startupIndexAgent.sh script expects to
find the $DOCUMENTUM_SHARED variable set and expects the /config directory to be
$DOCUMENTUM_SHARED/config.
If the /config directory is not $DOCUMENTUM_SHARED/config, edit the startupIndexAgent.sh
script so that it points to the valid /config directory path on the index agent host. Replace these
lines:
CLASSPATH=$DOCUMENTUM_SHARED/dctm.jar:$DOCUMENTUM_SHARED/config:
$DOCUMENTUM_SHARED/dfc/dfc.jar:$DOCUMENTUM_SHARED/dfc/dfcbase.jar:
$DOCUMENTUM_SHARED/dfc/log4j.jar
with:
CLASSPATH=$DOCUMENTUM_SHARED/dctm.jar:$DOCUMENTUM/config:
$DOCUMENTUM_SHARED/dfc/dfc.jar:$DOCUMENTUM_SHARED/dfc/dfcbase.jar:
$DOCUMENTUM_SHARED/dfc/log4j.jar
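The three CLASSPATH lines above form a single colon-separated value in which only the second entry, the /config directory, changes. The sketch below assembles that value for an arbitrary /config directory; the function name and the example paths are hypothetical, and the jar locations simply mirror the lines shown above.

```shell
# Hypothetical helper: assemble the index agent CLASSPATH for a given
# /config directory and shared directory.
index_agent_classpath() (
  config_dir=$1
  shared=$2
  printf '%s:%s:%s:%s:%s\n' \
    "$shared/dctm.jar" \
    "$config_dir" \
    "$shared/dfc/dfc.jar" \
    "$shared/dfc/dfcbase.jar" \
    "$shared/dfc/log4j.jar"
)

# Placeholder paths for illustration only.
index_agent_classpath /opt/documentum/config /opt/documentum/shared
```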
Preinstallation checklist
Use the following checklist to ensure that you have performed all required tasks before installing
or upgrading the full-text indexing software.
Chapter 3
Install
Overview
This chapter contains instructions for installing the full-text indexing software and creating
full-text indexes, whether you are upgrading from an earlier Documentum release or creating new
repositories.
The high-level procedure for installing the full-text indexing server and full-text indexing components
for a new installation is:
1. Install the index server and index agent configuration program.
Use the instructions in Installing the index server and the index agent configuration program,
page 30.
2. Configure the index agent.
Use the instructions in Configuring the index agent, page 36.
3. Start the index agent in normal mode.
To install a consolidated deployment, install the index agent and configure an index agent for each
repository, then index each repository.
To install a high availability configuration, you need to install the indexing software on the first host,
perform steps on the Content Server host, install the indexing software on the second host, and
perform additional steps on the Content Server host.
1. If you do not already have an existing full-text indexing system, install the full-text indexing
software on the first host and configure an index agent.
Note: On Windows, you need to log on as the same user, in the same domain, as the user who
installed Content Server.
Use the instructions in Installing the index server and the index agent configuration program,
page 30 and Configuring the index agent, page 36.
2. Log on to the Content Server host as the installation owner.
3. Ensure that no users are connected to the repository.
4. Shut down the first index agent.
5. Navigate to the directory where the create_fulltext_objects_ha.ebs script is located:
• On Windows, the %DM_HOME%\install\admin folder
• On UNIX or Linux, the $DM_HOME/install/admin directory
6. If you are upgrading the system, run the following command:
dmbasic -f create_fulltext_objects_ha.ebs -e HACleanupBeforeUpgradeStep -- repository_name Superuser_name Superuser_password
where repository_name is the name of the repository, Superuser_name is the user name of a user
with Superuser privileges in the repository, and Superuser_password is the Superuser’s password.
7. Run the create_fulltext_objects_ha.ebs script using this syntax, where repository_name is the name
of the repository, Superuser_name is the user name of a user with Superuser privileges in the
repository, and Superuser_password is the Superuser’s password:
dmbasic -f create_fulltext_objects_ha.ebs -e HAPreInstallStep -- repository_name Superuser_name Superuser_password
8. Install the full-text indexing software on the second indexing host and configure an index agent,
using the instructions in Installing the index server and the index agent configuration program,
page 30 and Configuring the index agent, page 36.
Do not start the new index agent. The repository now contains two full-text index objects, two
ft index agent config objects, and two ft engine config objects.
9. Log on to the Content Server host as the Content Server installation owner.
To install the index server and the index agent configuration program:
1. On HP-UX, you must set the following parameter values:
• Set maxdsiz or data seg size at 2 GB (0x80000000)
• Enable Largefiles
• Set maxusers to 256 or higher
• Set max_thread_proc to 256 or higher
• Set maxfiles to 1024 or higher
2. Ensure that the repository (for which you are installing the index server and index agent)
is running.
3. Log on to the index server and index agent host as the Content Server installation owner.
On Windows, this means you need to log on as the same user, in the same domain, as the user
who installed Content Server.
4. Copy the installation files from the EMC Documentum download site or distribution CDs to
a temporary location on the host.
5. Start the installation program.
• On Windows, double-click fulltextWinSuiteSetup.exe.
• On UNIX and Linux, type
% fulltextoperatingsystemSuiteSetup.bin
and press Enter, where operatingsystem is the operating system on which you are installing.
A Welcome dialog box is displayed.
6. Click Next.
The license agreement dialog box is displayed.
7. Click I accept the terms of the license agreement and click Next.
A dialog box is displayed that lists the products you can install.
8. Choose the products to install.
• To install the index agent, check Documentum Index Agent Configuration Program.
• To install the index server, check Index Server.
9. Click Next.
10. Indicate whether to install the developer documentation and click Next.
11. If required, install DFC.
a. On Windows, accept the default installation directory or type a different directory name
and click Next.
This is typically C:\Program Files\Documentum. On UNIX and Linux, the DFC directories
are determined by environment variables set before installation.
b. On Windows, accept the default user directory or type the name of a different directory
and click Next.
This is typically C:\Documentum.
Note: If the DOCUMENTUM environment variable is set on the host, you need to accept the
default directories. The installation program uses the value of the DOCUMENTUM variable to
generate the default directories and will not allow you to change the directories.
12. If a dfc.properties file does not exist on the host, provide connection information.
Note: If you are installing the software to create an index for a repository prior to version
5.3 for migration purposes, the connection information is in a dmcl.ini file on the host, not
a dfc.properties file.
a. Type the name of a host where a connection broker is running.
b. Type the port number used by the connection broker.
c. Click Next.
13. Read the explanatory information about full-text indexing, then click OK.
14. To install the index server, complete these steps.
a. Accept the default index server installation directory or type the name of a new directory,
then click Next.
Note: If the DOCUMENTUM environment variable is set on the host, you need to accept the
default directory. The installation program uses the value of the DOCUMENTUM variable to
generate the default directory and will not allow you to change the directory.
b. On Windows, type the password for the account you used to log on, then click Next.
The installer verifies the password.
c. Type the base port number for the index server, then click Next.
The index server requires 4,000 available ports in sequence; for example, if the base port
you designate is 3000, the index server uses ports 3000 through 7000. Do not use a port
in the ephemeral port range.
The default base port is 13000.
Note: You cannot change the chosen ports unless you remove and reinstall the index server.
d. Check the check box to enable support for grammatical normalization.
Choosing the parts of speech to be indexed can reduce the size of the indexes and the disk
space required for maintaining the indexes. You can enable grammatical normalization only
for the languages listed on the dialog box. If you enable grammatical normalization, it is
enabled by default for Japanese and Korean and you cannot disable it.
Note: Content files in languages that are not chosen or that are not available for normalization
are still indexed.
e. Choose the languages for which to perform grammatical normalization and the parts of
speech to be indexed by selecting the appropriate check boxes.
The recommended choice is to normalize only nouns.
f. Accept the default directory for the full-text indexes or type the name of a different directory,
then click Next.
The default on Windows is %DOCUMENTUM%. The default on UNIX or Linux is
$DOCUMENTUM. If you choose another directory, the name cannot contain any blank
spaces. The installer creates the directory \data\fulltext (/data/fulltext on UNIX or Linux)
under the location you designate.
On HP-UX (B.11.23 U 9000/800), you cannot install the Index Server in a directory that
contains an “_d” in the directory path.
Note: If the DOCUMENTUM environment variable is set on the host, you need to accept the
default directory. The installation program uses the value of the DOCUMENTUM variable to
generate the default directory and will not allow you to change the directory.
A summary dialog box is displayed, listing the products that will be installed.
15. Click Next.
The products are installed and a panel is displayed indicating success when the installation is
completed.
16. Ensure that the index server starts.
• On Windows, select Yes, restart my computer., then click Next.
— If the computer does not restart automatically, click Start > Shutdown > Restart and restart
the computer manually.
— If the index server does not automatically start, click Start > Programs > Administrative
Tools > Services and start the FAST InStream service.
— If the system restarts, the index server starts automatically as a Windows service.
• On UNIX and Linux, navigate to the $DOCUMENTUM/fulltext/IndexServer/bin directory
(the installation location), type startup.sh, and press Enter.
The index server is started.
Caution: Do not run backup utilities while the index server is running, because they may lock
indexing processes.
format, create a rendition in an indexable format. Appendix D, Supported and Unsupported Formats,
contains a complete list of indexable formats.
If the content file associated with a SysObject exists in a non-indexable format, its properties are
still indexed. To index the content, create a rendition of the SysObject in an indexable format. Use
Documentum Content Transformation Services or third-party client applications to create the
rendition.
Appendix D, Supported and Unsupported Formats lists the formats considered indexable by the
index server.
Some formats in the appendix are not represented in the repository by a format object. The
formats.csv file, which is located in $DM_HOME/install/tools, contains a complete list of supported
mime_types and the formats with which they are associated. If a supported mime_type is not
represented by a format object, create a format object in the repository and map the supported
mime_type to the format.
2. Set the param_value property at the corresponding index position to the list of error codes
representing the additional errors you wish to catch.
The error codes are the four-digit codes returned in the error message. For example, the
following error message contains the four-digit error code 1012:
[DM_FULLTEXT_E_SEARCH_NEW_FAIL]error:"dmFTSearchnew
failed with error: QRServer Error (1012):
Resource limit exceeded, Error from QRServer, error code: -2"
Separate multiple error codes with commas.
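To collect codes for the param_value property, the four-digit code can be pulled out of each error message and the results joined with commas. The following is a minimal sketch, assuming the code always appears in parentheses as in the message above; both function names are hypothetical, and 9999 is a made-up second code used only to show the joining step.

```shell
# Hypothetical helper: print the four-digit code that appears in
# parentheses in a single-line error message.
extract_error_code() (
  printf '%s\n' "$1" | sed -n 's/.*(\([0-9][0-9][0-9][0-9]\)).*/\1/p'
)

# Hypothetical helper: join all arguments with commas.
join_codes() (
  IFS=,
  echo "$*"
)

msg='[DM_FULLTEXT_E_SEARCH_NEW_FAIL]error:"dmFTSearchnew failed with error: QRServer Error (1012): Resource limit exceeded, Error from QRServer, error code: -2"'
code=$(extract_error_code "$msg")
join_codes "$code" 9999   # 9999 is a placeholder, not a real error code
```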
If the paths to the content files are different, do not modify the value of <all_filestores_local>, but
instead, create a file store map within the <exporter> element.
For example, if Content Server is on a host called Dandelion where filestore_01 is physically
located in the directory /Dandelion/Documentum/data/repository_name/content_storage_01, and
the index agent and index server are on a host from which the drive on the Content Server host is
shared as /mappingtoDandelion/repository_name/content_storage_01, create a map as follows:
<local_filestore_map>
<local_filestore>
<store_name>filestore_01</store_name>
<local_mount>/mappingtoDandelion/repository_name/content_storage_01</local_mount>
</local_filestore>
<!-- and so on for each filestore -->
</local_filestore_map>
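When a repository has several file stores, the entries in the map differ only in the store name and the local mount path. The sketch below prints one entry from those two values; the function name is hypothetical, the element names are the ones shown above, and the values are assumed to contain no characters that need XML escaping.

```shell
# Hypothetical helper: print one <local_filestore> entry for the
# <local_filestore_map> element, given a store name and its local mount.
local_filestore_entry() (
  printf '<local_filestore>\n'
  printf '  <store_name>%s</store_name>\n' "$1"
  printf '  <local_mount>%s</local_mount>\n' "$2"
  printf '</local_filestore>\n'
)

local_filestore_entry filestore_01 /mappingtoDandelion/repository_name/content_storage_01
```

The output is intended to be reviewed and pasted into the map by hand rather than written to the configuration file automatically.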
If you are indexing content stored on a NAS device or a Windows 2003 Server host, you may see
the following error message in the dmi_queue_item’s message attribute:
DocumentRetriever :ERROR Retrieval error: Couldn’t open file
<file path/name> ERROR Processor error status:
DataNotAvailable Not read permission
To resolve this error, edit the <local_mount> element or elements in the IndexAgent.xml file that
reference the storage area or areas on the NAS device. Add two backslashes immediately after
the opening <local_mount> element. For example, assume the following references a storage
area on a NAS device:
<local_mount>\\100.2.4.32\share3\c\data_for_example\content_storage_1</local_mount>
After editing, it is now:
<local_mount>\\\\100.2.4.32\share3\c\data_for_example\content_storage_1</local_mount>
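The edit described above amounts to prefixing the mount path with two additional backslashes. The following is a minimal sketch of that transformation on the path alone; the function name is hypothetical, and the path must be passed in single quotes so that the shell leaves its backslashes untouched.

```shell
# Hypothetical helper: prepend two extra backslashes to a UNC-style
# mount path, as described above.
add_unc_prefix() (
  # In the printf format, each "\\" produces one literal backslash.
  printf '\\\\%s\n' "$1"
)

add_unc_prefix '\\100.2.4.32\share3\c\data_for_example\content_storage_1'
```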
and press Enter, where operatingsystem is the operating system on which you are installing.
4. Select index server installation only.
The installation program detects the InstallProfile.xml file and asks you to confirm that you
require multinode installation.
5. Click Yes.
The correct index server processes are installed and configured on the host.
6. After the installation is completed, ensure that the index server starts.
• On Windows, select Yes, restart my computer, then click Next.
— If the computer does not restart automatically, click Start > Shutdown > Restart and restart
the computer manually.
— If the index server does not automatically start, click Start > Programs > Administrative
Tools > Services and start the FAST InStream service.
— If the system restarts, the index server starts automatically as a Windows service.
• On UNIX and Linux, run $DOCUMENTUM/fulltext/jboss4.2.0/IndexServer/bin/startup.sh.
The index server is started.
7. Install and configure an index agent as described in Configuring the index agent, page 42.
8. To configure directed routing, perform the steps described in Configuring directed routing,
page 44 on the administrative host.
9. Repeat Step 1 through Step 6 on each additional node.
10. Confirm that the index server processes are running correctly.
• On the administrative node, type dsadmin listmodules at a command prompt.
The running processes, their version numbers, host name, and port numbers are displayed.
• On each nonadministration node, use the nctrl command to verify that the processes are
running correctly:
$ cd $FASTSEARCH/bin
$ ../setupenv.sh
$ nctrl sysstatus
Each running module name is listed, as well as the process name, process ID, and status.
5. Start the index agent configuration program using the appropriate command for your platform.
• Windows: IndexAgent_Configuration_Program.exe -config setup.ini
• AIX: IndexAgent_Configuration_Program.aix -config setup.ini
• Solaris: IndexAgent_Configuration_Program.bin -config setup.ini
• HP-UX: IndexAgent_Configuration_Program.hp -config setup.ini
• Linux: IndexAgent_Configuration_Program.linux -config setup.ini
A Welcome dialog box is displayed.
6. Click Next.
7. On Windows, type in the installation owner’s password, then click Next.
8. Type in the port used to communicate with the application server for administration purposes,
then click Next.
The default port, 9200, is for the first index agent on the host. The default ports for additional index
agents are 9220 for index agent 2, 9240 for index agent 3, and so on.
9. Select the repository for which the index agent will prepare documents, then click Next.
The drop-down list contains the repositories that project to the connection brokers listed in
the dfc.properties file on the host.
10. Type in the user name and password for the Superuser account that the index agent will use to
connect to the repository.
11. Indicate whether to run the index agent in normal mode or migration mode.
12. Type in the host where the index server for this index agent is running and the base port number
for the index server, then click Next.
A summary dialog box is displayed.
13. Click Next.
14. To exit from the configuration program, click Finish.
15. To complete the installation process:
• If you are mapping the file stores, complete the instructions in Modifying the IndexAgent.xml
file to map file stores, page 38.
• If you are using directed routing, complete the instructions in Configuring directed routing,
page 44.
16. On Windows, to start the index agent manually, start the IndexAgent service, which starts the
IndexAgent instance.
On Linux, to start the index agent manually, run
$DOCUMENTUM_SHARED/jboss4.2.0/server/startIndexAgent1.sh.
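The default administration ports listed in step 8 (9200, 9220, 9240) suggest an increment of 20 per index agent. The following sketch computes the default port for the Nth index agent under that assumption. The increment is inferred from the three values given and is not stated as a rule in this guide; the function name default_agent_port is hypothetical.

```python
FIRST_AGENT_PORT = 9200  # default port for the first index agent (from step 8)
PORT_STEP = 20           # assumption: inferred from 9200, 9220, 9240


def default_agent_port(agent_number):
    """Return the presumed default administration port for index agent N (1-based)."""
    if agent_number < 1:
        raise ValueError("index agent numbers start at 1")
    return FIRST_AGENT_PORT + PORT_STEP * (agent_number - 1)


for n in (1, 2, 3):
    print(n, default_agent_port(n))
```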
If you have shared or mounted the drives containing the repository’s file stores and installed the
indexing software, the index agent configuration file needs to be manually edited to indicate that the
drives are shared. The changes depend on whether the file system paths to the content are identical
on the Content Server host and index server host.
Using shared or mounted drives improves performance, because the index agent does not need to
copy documents from Content Server to a staging area. See the Documentum System Planning Guide
for more information about sharing file store drives.
4. If the paths to the content files are different, do not modify the value of <all_filestores_local>, but
instead, create a file store map within the <exporter> element.
For example, if Content Server is on a host called Dandelion where filestore_01 is physically
located in the directory /Dandelion/Documentum/data/repository_name/content_storage_01 and
the index agent and index server are on a host from which the drive on the Content Server host is
shared as /mappingtoDandelion/repository_name/content_storage_01, create an alias as follows:
<local_filestore_map>
<local_filestore>
<store_name>filestore_01</store_name>
<local_mount>/mappingtoDandelion/repository_name/content_storage_01</local_mount>
</local_filestore>
<!-- and so on for each filestore -->
</local_filestore_map>
If you are indexing content stored on a NAS device or a Windows 2003 Server host, you may see
the following error message in the dmi_queue_item’s message attribute:
DocumentRetriever :ERROR Retrieval error: Couldn’t open file
<file path/name> ERROR Processor error status:
DataNotAvailable Not read permission
To resolve this error, edit the <local_mount> element or elements in the IndexAgent.xml file that
reference the storage area or areas on the NAS device. Add two backslashes immediately after
the opening <local_mount> element. For example, assume the following references a storage
area on a NAS device:
<local_mount>\\100.2.4.32\share3\c\data_for_example\content_storage_1</local_mount>
After editing, it is now:
<local_mount>\\\\100.2.4.32\share3\c\data_for_example\content_storage_1</local_mount>
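The backslash change can be expressed as a small string transformation. The sketch below is illustrative only: the function name escape_unc_mount is hypothetical, and the check that leaves already-edited values alone is an added safeguard, not something this guide prescribes. It returns the edited path and does not modify IndexAgent.xml.

```python
UNC_PREFIX = "\\\\"              # two literal backslashes, as in \\host\share
ESCAPED_PREFIX = UNC_PREFIX * 2  # four literal backslashes, the edited form


def escape_unc_mount(path):
    """Add two backslashes in front of a UNC path unless it is already edited."""
    if path.startswith(ESCAPED_PREFIX) or not path.startswith(UNC_PREFIX):
        return path
    return UNC_PREFIX + path


# Path taken from the example above.
before = r"\\100.2.4.32\share3\c\data_for_example\content_storage_1"
print(escape_unc_mount(before))
```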
All documents submitted for indexing are assigned to a collection, which is a logical set of data to
which the index server applies the same indexing rules. In a basic implementation, all documents
are typically assigned to a single default collection that is created automatically as part of the index
server installation process.
With directed routing, you want the index server to treat documents differently depending on which
file stores contain their content files — specifically, you want the index server to route them to
different nodes for inclusion in different columns. To enable the index server to route documents to
different nodes, you need to assign the documents to different collections.
The first step is to create additional collections. You create one collection for each node.
Once you have created the necessary full-text collections, you associate them with file stores in the
indexagent.xml configuration file, which resides on the index agent host machine.
<partition>
<storage_name>name_of_file_store</storage_name>
<collection_name>name_of_collection</collection_name>
</partition>
</partition_config>
The name_of_default_collection needs to match the value of the <fds_collection> element that
appears later in the indexagent.xml file. It is the name of the collection created during installation.
Include a <partition> element for each file store whose documents you want to assign to a specific
collection. The name_of_file_store needs to match the file store name from the repository, and
the name_of_collection needs to match the collection name you specified when you created the
collection in Creating full-text collections, page 45.
For example, the <partition_config> element below assigns documents from filestore_01,
filestore_02, filestore_03, and filestore_04 to three collections (repb01, repb02, and repb03).
Documents from other file stores are assigned to repb01, which is designated as the default
partition. Notice also that filestore_03 and filestore_04 are assigned to the same collection.
<partition_config>
<default_partition>
<collection_name>repb01</collection_name>
</default_partition>
<partition>
<storage_name>filestore_01</storage_name>
<collection_name>repb01</collection_name>
</partition>
<partition>
<storage_name>filestore_02</storage_name>
<collection_name>repb02</collection_name>
</partition>
<partition>
<storage_name>filestore_03</storage_name>
<collection_name>repb03</collection_name>
</partition>
<partition>
<storage_name>filestore_04</storage_name>
<collection_name>repb03</collection_name>
</partition>
</partition_config>
</indexer>
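To check which collection a given file store resolves to before deploying the configuration, the <partition_config> element can be read with a short script. The sketch below models the rule described above (an explicit <partition> entry wins; otherwise the default partition applies). It is not the index agent's own code, and the function name collection_for is hypothetical.

```python
import xml.etree.ElementTree as ET

# The <partition_config> example from the text.
PARTITION_CONFIG = """
<partition_config>
  <default_partition>
    <collection_name>repb01</collection_name>
  </default_partition>
  <partition>
    <storage_name>filestore_01</storage_name>
    <collection_name>repb01</collection_name>
  </partition>
  <partition>
    <storage_name>filestore_02</storage_name>
    <collection_name>repb02</collection_name>
  </partition>
  <partition>
    <storage_name>filestore_03</storage_name>
    <collection_name>repb03</collection_name>
  </partition>
  <partition>
    <storage_name>filestore_04</storage_name>
    <collection_name>repb03</collection_name>
  </partition>
</partition_config>
"""


def collection_for(config_xml, file_store):
    """Return the collection assigned to file_store, or the default collection."""
    root = ET.fromstring(config_xml)
    for partition in root.findall("partition"):
        if partition.findtext("storage_name") == file_store:
            return partition.findtext("collection_name")
    return root.findtext("default_partition/collection_name")


print(collection_for(PARTITION_CONFIG, "filestore_04"))
print(collection_for(PARTITION_CONFIG, "filestore_09"))  # not listed: default applies
```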
When using directed routing, the index agent assigns documents to collections based on which file
store contains their content files. The index server needs to route the documents to different nodes
based on which collection they belong to. To configure the index server to do this, you create a routing
configuration file (routing.cfg) and update the Status Server configuration file (NodeConf.xml) so
that it refers to the routing configuration file.
Adding a node
Use these instructions to add a node to an existing multinode configuration.
d. Within the <search_clusters><cluster> element, add a new <column> element for the new
node. The <column> element has this format:
<column host="fully_qualified_host_name"
port="base_port_number" mode="NORMAL"
partition_id="partition_number"
ft_mode="0" docapiport="15500">
The partition_number should be the integer specified as part of the <search-engine-column>
ID in the IndexProfile.xml file; if the IndexProfile.xml file includes
<search-engine-column id="col4">, the partition_number for the column is 4. For
example:
The port is the index server’s base port number plus the constant 3099. If the index server
is using the default base port 13000, then the port value is 16099. The first integer after
’webcluster’ is the partition_id of the column hosted on the node.
f. Rebuild the index by running the FIXML feeder on each node. Enter this command:
cobra fixmlfeeder.py -i path_to_temp_dir
The path_to_temp_dir is the path to the directory where you backed up the FIXML files
at step a.
g. When the feeding is complete, use the FAST InStream administration tool to reset all
collections to use the DFTXML (webcluster) pipeline (see step d above).
h. Resume indexing on each node. Enter this command to resume indexing:
rtsadmin localhost port webcluster partition_id 0 resetindex
9. Restart the index server on each node and restart the index agent.
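The port arithmetic and the resume command from the steps above can be combined in a small helper when several nodes must be handled. This is a sketch only: it builds the command strings shown in this procedure and does not run them. The function names rts_port and resetindex_command are hypothetical; the base port 13000 and the constant 3099 are the values given in the text.

```python
RTS_PORT_OFFSET = 3099     # constant added to the index server base port
DEFAULT_BASE_PORT = 13000  # default index server base port


def rts_port(base_port=DEFAULT_BASE_PORT):
    """Return the port to pass to rtsadmin for a given index server base port."""
    return base_port + RTS_PORT_OFFSET


def resetindex_command(partition_id, base_port=DEFAULT_BASE_PORT, host="localhost"):
    """Build the rtsadmin command that resumes indexing for one column."""
    return f"rtsadmin {host} {rts_port(base_port)} webcluster {partition_id} 0 resetindex"


# Example: a two-node configuration with columns 0 and 1.
for column in (0, 1):
    print(resetindex_command(column))
```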
Removing a node
Use these instructions to remove a node from an existing multinode configuration.
Chapter 4
Upgrade
Overview
This chapter provides the instructions for upgrading the full-text indexing components.
If you are using full-text indexing and are upgrading from 5.3 SP4 or later, you do not need to rebuild
your full-text indexes to migrate to version 6.0 or 6.5. If you are using full-text indexing, are
upgrading from 5.3 SP2 or SP3, and have applied the Get Well 4.3.1 hot fix, you do not need to
rebuild your full-text indexes to migrate to Documentum 6. If you are upgrading from 5.3 or 5.3 SP1,
you need to rebuild your full-text indexes.
In a consolidated deployment, a single index server provides indexing services to multiple
repositories. In any indexing configuration, the indexing software and Content Servers must have the
same version number. Therefore, to upgrade a consolidated deployment, the indexing software and
all repositories must be upgraded simultaneously.
If you have installed the December 2006 Fulltext Hotfix, the version numbers will be:
• On 5.3 SP2: 5.3.0.229
• On 5.3 SP3: 5.3.0.325
Caution: Do not use this procedure if the deployment is a multinode deployment. Contact
Professional Services (or qualified third-party integrators) for instructions on upgrading a
multinode deployment.
The uninstaller will present a panel that asks if you wish to delete the existing index. The default
is “no”, which preserves the index. Accept the default.
Note: This step asks you to preserve the index so that the FIXML files are not deleted. Removing
the index at this stage removes the FIXML files also. In subsequent steps, you will remove the
actual index entries and regenerate the index using the FIXML files.
6. Manually delete the IndexServer directory.
Note: This step is only necessary if the index server is on a different host than Content Server. If
the index server is on the same host as Content Server, uninstalling the index server will also
remove the IndexServer directory in the following location:
• On Windows: C:\Documentum\fulltext\IndexServer
• On UNIX: $DOCUMENTUM/fulltext/IndexServer
If you have installed the December 2006 Fulltext hotfix, the version numbers will be:
• On 5.3 SP2: 5.3.0.229
• On 5.3 SP3: 5.3.0.325
Caution: Do not use this procedure if the deployment is a multinode deployment. Contact
Professional Services (or qualified third-party integrators) for instructions on upgrading a
multinode deployment.
7. Install the index server and the index agent configuration program.
Follow the instructions in Installing the index server and the index agent configuration program,
page 30.
8. Shut down and restart Content Server.
Note: The index agent restarts only if the index agent’s application server instance is running.
9. Start the index server.
10. Configure the index agents.
Use the instructions in Configuring the index agent, page 36.
Chapter 5
Uninstall
Uninstall order
Uninstall Content Server, a repository, the index agent, and the index server in a particular order.
To uninstall an index agent, the repository and its servers need to be running. To uninstall an index server,
the repository needs to be shut down. If the index server is on the Content Server host, additional
issues arise because of shared libraries in the software installations.
To uninstall a multinode configuration, stop all running processes on all hosts.
Uninstall the software components in this order:
1. Shut down and uninstall the index agent.
2. Shut down the repository.
3. Shut down and uninstall the index server.
4. Delete the repository, if required.
5. Uninstall the Content Server software, if required.
6. Uninstall the Index Agent Configuration Program, if required.
Use the instructions in Starting and stopping the index agent, page 79 or in Documentum
Administrator online help, depending on which repository version the index agent is running
against.
3. Start the Index Agent Configuration Program.
• On Windows, click Start > Programs > Documentum > Index Agent Configuration Program.
• On UNIX and Linux, navigate to $DOCUMENTUM_SHARED/IndexAgents and start the
configuration program for your operating system:
— On AIX, IndexAgent_Configuration_Program.aix
— On Solaris, IndexAgent_Configuration_Program.bin
— On HP-UX, IndexAgent_Configuration_Program.hp
— On Linux, IndexAgent_Configuration_Program.linux
A Welcome dialog box is displayed.
4. Click Next.
5. Select Delete index agent and click Next.
6. Read the information and click Next.
The index agent is deleted.
7. To run the configuration program again, check the check box.
8. Click Next.
9. If you checked the check box to run the configuration program again, skip to Step 5; otherwise,
the program exits.
The index agent software and configuration program are still on the host.
6. Click Finish.
Part 3
Administer
Chapter 6
Manage a Full-text Index
This chapter contains the information you need to create and verify the accuracy and completeness
of the full-text index. The chapter contains the following topics:
• Managing and administering the index agent and index server, page 65
• Managing the index queue, page 66
• The “State of the Index” job, page 69
• Turning indexing on and off, page 72
• Suspending and resuming indexing, page 73
• Reindexing a repository, page 74
• Creating a new index, page 74
• Pointing a repository to a previously created index, page 75
• Troubleshooting indexing timeouts, page 75
If you are installing a high-availability configuration, create and verify the index on both hosts.
A prune method is used to delete multiple object versions from a version tree. The method generates
a destroy event for each version that is deleted. Content Server can only generate these destroy events
for the default full-text indexing queue users. The events cannot be generated for the second indexing
queue user in a high-availability configuration.
This means that when a prune method is used, the index entries for the objects destroyed cannot be
removed from the non-default index. However, false positive hits on the index for such objects are
filtered from query results, and are not returned to end users or applications generating the queries.
Load operations do not generate save events for objects created during the load; special events are
generated instead. The events cannot be queued to the second full-text user.
The job does recognize object types chosen for exclusion from indexing through Documentum
Administrator. Objects of those types are not included in the generated report.
Execute the job from Documentum Administrator.
Arguments
The following table lists the arguments for the job.
The default is F.
The default is F.
-matchsysobjversion Boolean If set to T, the job performs the version stamp
comparison only for SysObjects.
The default is F.
-matchallversion Boolean If set to T, the job performs the version stamp
comparison for SysObjects and lightweight objects.
The default is F.
In addition, the job is installed with the -queueperson and -windowinterval arguments set. The
-queueperson and -windowinterval arguments are standard arguments for administration jobs and
are explained in the Content Server Administration Guide.
This file records all objects in the index and the repository with identical object IDs but
nonmatching i_vstamp values. For each object, it records the object’s object ID, i_vstamp value in
the repository, and i_vstamp value in the index.
• docids-dctm-only.txt
This report contains the object IDs and i_vstamp values of objects in the repository but not in
the index.
• docids-index-only.txt
This report contains the object IDs and i_vstamp values of objects in the index but not in the
repository.
In addition to the report and the four standard result files, the job may generate additional result
files depending on the arguments defined for the job run:
• If -dumpfailedid is set to T, the job generates a file called docids-failedids.txt. This file contains the
object IDs of all objects whose content experienced some failure during indexing.
• If -usefilter is set to T, the job generates two additional reports.
— docids-filtered.txt, which records the objects that have been filtered from the repository for
indexing. The objects are identified by object ID and i_vstamp values.
— docids-error-inindex.txt, which records the object ID and i_vstamp values of any objects in
the index that should have been filtered out before indexing.
The report and result files are in %DOCUMENTUM%\dba\log\sessionID\sysadmin
($DOCUMENTUM/dba/log/sessionID/sysadmin).
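The standard result files correspond to three simple comparisons between the repository and the index. The sketch below illustrates those comparisons on in-memory data. It is not the job's implementation; the object IDs and i_vstamp values are made up for illustration, and the function name compare_index is hypothetical.

```python
def compare_index(repository, index):
    """Compare two {object_id: i_vstamp} mappings.

    Returns (mismatched, dctm_only, index_only):
    - mismatched: IDs in both with different i_vstamp values,
      mapped to (repository value, index value)
    - dctm_only: objects in the repository but not in the index
    - index_only: objects in the index but not in the repository
    """
    shared = repository.keys() & index.keys()
    mismatched = {
        oid: (repository[oid], index[oid])
        for oid in shared
        if repository[oid] != index[oid]
    }
    dctm_only = {oid: v for oid, v in repository.items() if oid not in index}
    index_only = {oid: v for oid, v in index.items() if oid not in repository}
    return mismatched, dctm_only, index_only


# Hypothetical sample data.
repository = {"0900000180000101": 3, "0900000180000102": 1, "0900000180000103": 0}
index = {"0900000180000101": 2, "0900000180000102": 1, "0900000180000104": 5}

mismatched, dctm_only, index_only = compare_index(repository, index)
print(mismatched)
print(dctm_only)
print(index_only)
```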
where adminhost is the name of the index server’s administrative host. The port is the index server’s
base port number plus the constant 3099. If the index server is using the default base port, then
the port value is 16099.
where adminhost is the name of the index server’s administrative host. The port is the index server’s
base port number plus the constant 3099. If the index server is using the default base port, then
the port value is 16099.
The first integer after ’webcluster’ represents a column, and each node has one column. The
commands to suspend and reset indexing reference the nodes through the column numbers. Column
numbers start with 0 and increment by one for each column, or node. The second integer must
always be 0.
To determine how many nodes are running, if needed, you can use the dsadmin tool to find all the
configured nodes. You can use grep (UNIX) or findstr (Windows) to limit the returned values to
Indexer modules. For example:
$ dsadmin listmodules | grep Indexer
[2006-12-11T21:47:17Z]: INFO dsadmin RTS Indexer 3.0.98-Release mice.mylabs.com 15674
[2006-12-11T21:47:17Z]: INFO dsadmin RTS Indexer 3.0.98-Release cats.mylabs.com 15674
Count the number of returned rows to determine how many indexing nodes are running. This
example output indicates that two nodes are running.
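If the output is captured to a file or a variable, the count can also be scripted. The following sketch parses the sample output shown above, assuming each module is reported on a single line that ends with the host name and port number. The function name indexer_hosts is hypothetical, and the sketch does not call dsadmin itself.

```python
SAMPLE_OUTPUT = """\
[2006-12-11T21:47:17Z]: INFO dsadmin RTS Indexer 3.0.98-Release mice.mylabs.com 15674
[2006-12-11T21:47:17Z]: INFO dsadmin RTS Indexer 3.0.98-Release cats.mylabs.com 15674
"""


def indexer_hosts(listmodules_output):
    """Return the host name from every 'RTS Indexer' row of dsadmin listmodules output."""
    hosts = []
    for line in listmodules_output.splitlines():
        if "RTS Indexer" in line:
            fields = line.split()
            hosts.append(fields[-2])  # second-to-last field is the host name
    return hosts


hosts = indexer_hosts(SAMPLE_OUTPUT)
print(len(hosts), hosts)
```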
Reindexing a repository
An existing index may become corrupted because of disk failure or corruption, host failure, or the
unexpected shutdown of the index server. When the index is corrupted, you might want to reindex
the repository.
To reindex a repository, use an index agent running in migration mode to resubmit all qualifying
repository objects for indexing. Reindexing the repository rebuilds the existing index by replacing
entries in the index. EMC Documentum recommends running the index server in suspended mode if
you reindex. For information on suspended mode, refer to Index server modes, page 112.
Chapter 7
Manage Full-text Indexing Components
This chapter contains information on starting and stopping the index agent and index server,
disabling index agents, viewing index server properties, and viewing or modifying index agent
properties. You need to have system administrator or superuser privileges to perform these tasks.
Topics in this chapter are:
• Administration tools, page 77
• Starting and stopping the full-text indexing system, page 78
• Viewing or modifying index agent properties, page 81
• Viewing index server properties, page 82
• Reviewing the log files, page 82
• Administration operations, page 84
• Large file rejection error, page 90
• Increasing capacity, page 91
Administration tools
To administer a Version 6 or later full-text indexing system, use Documentum Administrator.
Depending on the administration task you want to perform, use either the Index Management node
or the resource agent for the index agent and index server.
Note: Using console mode in Remote Desktop for the administration of the index server is not
supported.
If the repository has a high-availability indexing configuration running, in which multiple, redundant
index agents and index servers are installed and multiple, redundant indexes are maintained, the
paired index agents and index servers are displayed in Documentum Administrator, listed under
the name of the index. The default names are:
• For the first index, repositoryname_ftindex_01
• For the first index server, FAST Fulltext Engine Configuration
• For the first and all subsequent index agents, hostname_IndexAgent1
• For the next index, repositoryname_ftindex_01_ha
• For the next index server, FAST Fulltext Engine Configuration - 2
Caution: Stopping the index agent interrupts full-text indexing operations, including updates
to the index and queries to the index. An index agent that is stopped does not pick up index
queue items or process documents for indexing.
2. Click Administration > Indexing Management > Index Agents and Index Servers.
3. Select the correct index agent.
4. To stop the index agent, click Tools > Stop.
5. Confirm that you want the index agent stopped.
The index agent’s status changes to stopped.
6. Stop the application server:
• On Windows, stop the application server using its service. The service name is IndexAgentN,
where N is the number of the index agent.
• On UNIX, stop the application server by running the following script:
$DOCUMENTUM_SHARED/jboss4.2.0/domains/DctmDomain/stopIndexAgentN.sh
where N is the number of the index agent.
Caution: Stopping the index server interrupts full-text indexing operations, including updates
to the index and queries to the index.
where hostname is the name of the host on which the agent is running and N is the number of the
particular index agent. For example, if the index agent is the first index agent on the host named
LondonOne, which is a Windows host, the log file is:
C:\Documentum\jboss4.2.0\domains\DctmDomain\servers\DctmServer_IndexAgent1_LondonOne\logs\IndexAgent1.log
The numbers disambiguating the index agents are used when there are multiple index agents running
against a single index server. For example, in a consolidated indexing deployment, in which more
than one repository is indexed by a single index server, you might have three index agents running,
with each corresponding to a particular repository.
$DOCUMENTUM_SHARED/jboss4.2.0/domains/DctmDomain/servers/DctmServer_IndexAgentN_hostname/logs/DctmServer_IndexAgentN_hostname.log
where hostname is the name of the host on which the agent is running and N is the number of the index
agent running in that application server instance. For example, if the index agent is the first index
agent on the host named LondonOne, which is a Windows host, the associated application server
log file is:
C:\Documentum\jboss4.2.0\domains\DctmDomain\servers\DctmServer_IndexAgent1_LondonOne\logs\DctmServer_IndexAgent1_LondonOne.log
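On hosts with several index agents, the application server log location can be derived from the pattern above. The sketch below builds the UNIX form of the path. The function name app_server_log_path is hypothetical, and /opt/documentum/shared is an assumed stand-in for the value of $DOCUMENTUM_SHARED on your host.

```python
import posixpath


def app_server_log_path(documentum_shared, hostname, agent_number):
    """Build the UNIX application server log path for index agent N on a host."""
    server = f"DctmServer_IndexAgent{agent_number}_{hostname}"
    return posixpath.join(
        documentum_shared,
        "jboss4.2.0",
        "domains",
        "DctmDomain",
        "servers",
        server,
        "logs",
        server + ".log",
    )


# "/opt/documentum/shared" is an assumed value of $DOCUMENTUM_SHARED.
print(app_server_log_path("/opt/documentum/shared", "LondonOne", 1))
```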
Administration operations
Use the instructions in this section to perform configuration and administrative tasks.
Note: Using console mode in Remote Desktop for the administration of the index server is not
supported.
The synonym file defines which words you want to be returned by a particular search. Create a text
file with synonym entries in the following format:
term=[[spelling variations],[abbreviations],[[synonyms],[]]]
The file may have any name. Save the file in the UTF-8 code page.
The brackets in each entry are literal characters that separate spelling variations, abbreviations, and
synonyms. The empty brackets at the end of the synonym entry are required.
The following examples contain only synonyms:
car=[[],[],[[automobiles,autos],[]]]
cars=[[],[],[[automobiles,autos],[]]]
auto=[[],[],[[automobiles,autos],[]]]
autos=[[],[],[[automobiles,autos],[]]]
No limitations exist on the length of an entry or the number of entries in the synonym file.
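Because the nested brackets are easy to mistype, entries can be generated from plain lists. The following sketch formats one entry in the layout shown above and reproduces the first example line. The function name synonym_entry is hypothetical, and the comma-separated form inside each bracket pair is taken from the examples in this section.

```python
def bracket(items):
    """Join items with commas and wrap them in literal square brackets."""
    return "[" + ",".join(items) + "]"


def synonym_entry(term, spellings=(), abbreviations=(), synonyms=()):
    """Format one synonym file entry: term=[[spellings],[abbreviations],[[synonyms],[]]]."""
    synonym_group = bracket([bracket(synonyms), "[]"])  # trailing empty brackets are required
    return term + "=" + bracket([bracket(spellings), bracket(abbreviations), synonym_group])


# Entries for several terms that share the same synonyms.
for term in ("car", "cars", "auto", "autos"):
    print(synonym_entry(term, synonyms=["automobiles", "autos"]))
```

Write the generated lines to a text file using UTF-8 encoding, as required for the synonym file.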
You can import a synonym file created as described in Creating the synonym file, page 85 or you can
import an existing Verity thesaurus file.
To import the synonym file, connect to the index server host as the index server installation owner
and run the ImportDictionary.py script. The script is executed by Cobra, which is included in the
index server installation. The following conditions need to be met to run ImportDictionary.py:
• The FASTSEARCH environment variable must be set ($FASTSEARCH on UNIX,
%FASTSEARCH% on Windows).
• The script must be run from the $FASTSEARCH/bin directory (UNIX) or %FASTSEARCH%\bin
folder (Windows).
• The synonym file needs to be found in the $FASTSEARCH/bin directory (UNIX) or
%FASTSEARCH%\bin folder (Windows).
The syntax is
cobra ImportDictionary.py [-I filename -M verity|fast -L language -T transaction_type]
or
cobra ImportDictionary.py [--importfile=filename --mode=verity|fast --language=language --transactiontype=transaction_type]
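When the import is scripted, the short-form command line can be assembled from its parts. The sketch below only builds the argument list and does not run Cobra. The function name import_dictionary_args and the file name synonyms.txt are hypothetical; the flags and the createnew transaction type are the ones documented in this section.

```python
def import_dictionary_args(filename, mode="fast", language=None, transaction_type=None):
    """Build the short-form argument list for cobra ImportDictionary.py."""
    if mode not in ("verity", "fast"):
        raise ValueError("mode must be 'verity' or 'fast'")
    args = ["cobra", "ImportDictionary.py", "-I", filename, "-M", mode]
    if language is not None:
        args += ["-L", language]
    if transaction_type is not None:
        args += ["-T", transaction_type]
    return args


# Hypothetical file name; the file must be in the FASTSEARCH bin directory.
args = import_dictionary_args("synonyms.txt", transaction_type="createnew")
print(" ".join(args))
```

A list in this form could be passed to a process launcher started from the $FASTSEARCH/bin directory (UNIX) or %FASTSEARCH%\bin folder (Windows), subject to the conditions listed above.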
Argument                    Description
-I filename or              The name of the synonym file. May be any arbitrary name.
--importfile filename
-M verity|fast or           Whether an existing Verity thesaurus file or a new index server
--mode verity|fast          synonym file is being imported. Allowable values are verity and fast.
                            If the argument is not included, the default value is fast.
-L or --language            The language of the synonym file. Allowable values are the names
                            of the languages listed in Appendix E, Supported Languages. If the
                            argument is not included, the default value is English.
-T transaction_type or      The allowable values are:
--transactiontype           • createnew, which instructs the index server to create a new synonym
transaction_type              dictionary using the terms in the file filename.
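The conditions and syntax above can be sketched in Python as follows. This is a minimal sketch rather than part of the product: the function names are hypothetical, and only the short options and the values documented above (verity, fast, English, createnew) are used.

```python
import os


def check_preconditions(synonym_file, environ=None, cwd=None):
    """Return a list of problems that would prevent running ImportDictionary.py."""
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    fastsearch = environ.get("FASTSEARCH")
    if not fastsearch:
        return ["FASTSEARCH environment variable is not set"]
    problems = []
    bin_dir = os.path.join(fastsearch, "bin")
    same_dir = (os.path.normcase(os.path.abspath(cwd))
                == os.path.normcase(os.path.abspath(bin_dir)))
    if not same_dir:
        problems.append("current directory is not " + bin_dir)
    if not os.path.isfile(os.path.join(bin_dir, synonym_file)):
        problems.append(synonym_file + " was not found in " + bin_dir)
    return problems


def build_import_command(filename, mode="fast", language="English",
                         transaction_type="createnew"):
    """Build the argument list for cobra, using the documented short options."""
    if mode not in ("verity", "fast"):
        raise ValueError("mode must be 'verity' or 'fast'")
    return ["cobra", "ImportDictionary.py",
            "-I", filename, "-M", mode, "-L", language, "-T", transaction_type]
```

The returned list could be passed to subprocess.run once check_preconditions returns an empty list.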
Logging
Log sample
##########################################################
##########################################################
##########################################################
##########################################################
##########################################################
##########################################################
Starting Dictionary documentum_synonym_en import
Creating Dictman command file DictmanCmdFile
##########################################################
##########################################################
##########################################################
##########################################################
##########################################################
##########################################################
returned with error code -5
Error : Launch failed for application C:\Documentum\fulltext\IndexServer\bin\nctrl
Error : Application C:\Documentum\fulltext\IndexServer\bin\nctrl returned with error code -5
When ft_convertible is TRUE, converted_ft_query contains the converted full-text equivalent of the
WHERE clause. If a WHERE clause was not specified in the query’s SELECT statement, converted_ft_query
is empty. The ft_search_type indicates how the query was executed against the index.
When tracing is turned on and non-FTDQL queries are run, the SQL statements created for the
DQL queries are logged to the full-text log file.
log4j.category.com.documentum=INFO
where objectID is the object ID of the document containing the rejected content.
Increasing capacity
Use the information in this section if you require increased indexing or querying capacity.
Chapter 8
Back up and Restore Full-text Indexes
Overview
An index installation backup consists of these required directories and files:
• FIXML files — Represent the content and properties of repository objects. These files are used by
the index server to create the binary index files.
• Binary index files — Represent the final index against which queries are executed.
• Configuration files, including custom files — Contain configuration specifications for a
specific index server installation, for example, index agent and multinode installation profile
configuration files. Examples of options that system-wide configuration files set include
grammatical normalization, the specific rendition formats to index, processing of batched returns for
queries that are not FTDQL, and a thesaurus for thesaurus searching.
• Status server data files — Contain state and routing information for all documents in the indexing
system.
You can create only a cold, full backup of an index because all of an index’s files are changed
periodically over the course of a day; that is, the static copy of an index is completely replaced by its
dynamic copy. The dynamic copy is constantly being updated (as a result of additions or deletions of
content and objects in the associated repository); whereas the static copy (against which queries run)
remains unchanged until the dynamic copy replaces it.
To perform a cold backup, you must shut down the entire indexing system. During the backup, no
content is indexed and no query results are returned. If you cannot perform a cold
backup because your backup window (that is, the time that is available for the indexing system to
be backed up) is shorter than the time it takes to complete the backup, then consider a high availability
configuration.
For more information about full-text indexing high availability configurations, see the Documentum
System Planning Guide.
Index backups require approximately three times more storage than the actual index size, because
of the static and dynamic copies of the index.
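The storage estimate above is simple arithmetic, sketched here in Python. The function names are hypothetical, and the factor of three is the approximate figure stated above, not an exact requirement.

```python
def required_backup_space_gb(index_size_gb, factor=3.0):
    """Approximate storage needed to back up an index of the given size."""
    return index_size_gb * factor


def has_room_for_backup(index_size_gb, free_gb, factor=3.0):
    """Return True if the free space covers the approximate backup size."""
    return free_gb >= required_backup_space_gb(index_size_gb, factor)
```

For instance, a 50 GB index would need roughly 150 GB of backup storage under this estimate.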
When your full-text index installation fails, you recover it by restoring it to a previously consistent
state (that is, a particular point in time) from your backups. Restoring to a particular point in time
is also known as a “point-in-time recovery”. If you have partitioned your full-text index storage
areas or have a multinode configuration, then you may only need to recover those storage areas or
nodes that have failed.
If the associated Content Server has also failed, then you will need to coordinate your full-text index
recovery with the Content Server recovery.
Disk status: 11% used - 440.51 GB free - 51.91 GB used - 492.42 GB total
3. Stop administrator and all indexing and search processes on each node. For example:
• On Windows:
net stop FastDSService
• On Unix and Linux:
cd $DOCUMENTUM/fulltext/IndexServer/bin
./setupenv.sh
./shutdown.sh
1. Stop all index agents (if they were not already stopped) on each failed node.
2. Stop administrator and all indexing and search processes (if they were not already stopped)
on each failed node. For example:
• On Windows:
net stop FastDSService
Disk status: 11% used - 440.51 GB free - 51.91 GB used - 492.42 GB total
Chapter 9
Verify Full-text Index Completeness and Accuracy
Overview
Use the ftintegrity utility to verify the completeness, the accuracy, or both, of an index. Use this tool after
recovering an index, migrating an index, or adding an index to an existing repository.
When run in completeness mode, the utility generates three reports:
• res-comp-common.txt
This file contains the object IDs of all documents that are found in both the index and in the
repository.
• res-comp-dctmonly.txt
This file contains the object IDs of all indexable documents that are found in the repository but
not in the index.
Note: The ftintegrity tool is not aware of object types that you have chosen not to index, through
either a custom filter or the specification of selective indexing in Documentum Administrator. If
you have chosen not to index one or more object types, the object IDs of instances of those types
will still appear in this file.
• res-comp-fastonly.txt
This file contains the object IDs of documents found in the index, but not in the repository.
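Conceptually, the three reports correspond to set operations on the object IDs found in the repository and in the index. The following Python sketch illustrates that relationship only; it is not the utility's implementation, and the function name and the object IDs used with it are hypothetical.

```python
def compare_ids(repository_ids, index_ids):
    """Group object IDs the way the three completeness reports do."""
    repository = set(repository_ids)
    index = set(index_ids)
    return {
        # IDs found in both the index and the repository
        "res-comp-common.txt": sorted(repository & index),
        # IDs found in the repository but not in the index
        "res-comp-dctmonly.txt": sorted(repository - index),
        # IDs found in the index but not in the repository
        "res-comp-fastonly.txt": sorted(index - repository),
    }
```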
When running in accuracy mode, you can check a subset of documents or all documents in the index.
For each document the utility performs the following operations:
1. Gets metadata from the repository
2. Chooses a random metadata item and generates a random test for that item.
For example, it might generate the query: subject like “%foo”, r_content_size>4000
3. Runs a test program named “iftdql” to find that document.
The utility is installed when the index agent is configured. Each index agent instance has an
associated parameter file, called ftintegrity.params.txt. This file identifies the repository on which the
utility operates. It also accepts parameters, which you must complete before running the utility, that
identify the user name and password used to connect to that repository. The utility is found in:
• On Windows
Drive:\Program Files\Documentum\IndexAgents\ftintegrity
The utility is run from the command line. Before running the utility, you must modify its associated
parameter file to add the user name and password. For instructions on modifying the file, refer to
Modifying the parameter file, page 100. For instructions on running the utility, refer to Running
the index verification tool, page 103.
After you run the utility, you may wish to resubmit documents for indexing if the utility reports
missing documents. For resubmission instructions, refer to Resubmitting objects to the index agent,
page 105.
Caution: The instructions below save a Superuser name and password to the file system in a
plain text parameter file. For security reasons, you may wish to remove that information from
the file after running the ftintegrity tool. It is recommended that you save the parameter file in a
location accessible only to the repository Superuser and installation owner.
3. Add the following two lines immediately after the first line
-U username
-P password
where username is the user name of the Superuser whose account was used to install the index
agent and password is the Superuser’s password.
4. Save the ftintegrity.params.txt file to %DOCUMENTUM%\fulltext\IndexServer\bin (Windows)
or $DOCUMENTUM/fulltext/IndexServer/bin (UNIX).
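The edit to the parameter file, and the later cleanup suggested in the caution, can be sketched in Python as operations on the file's lines. The function names are hypothetical, and the sketch assumes that each parameter sits on its own line, as the -i parameter description requires.

```python
def add_credentials(lines, username, password):
    """Return a copy of lines with -U and -P inserted after the first line."""
    if not lines:
        raise ValueError("parameter file is empty")
    return [lines[0], "-U " + username, "-P " + password] + list(lines[1:])


def strip_credentials(lines):
    """Return a copy of lines without the -U and -P entries."""
    return [line for line in lines if not line.startswith(("-U ", "-P "))]
```

Running strip_credentials on the file's lines after the verification is complete removes the plain-text Superuser name and password again.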
Additional parameters
Table 5, page 101 lists the full set of parameters that you can provide on the command line or
in the parameter file for ftintegrity.
Parameter Description
-h Prints list of parameters with short descriptions
-i filename Reads the command line parameters from the specified file. In the
file, each parameter must be on a separate line.
-m [b|a|c] Sets the mode for the operation. Values are:
• b, meaning perform both accuracy and completeness checks
If the default value was accepted when installing the index server,
this value is 13000.
-f host The fully qualified domain name of the host on which the index
server resides
-c collection Identifies a collection to be checked. If this is not specified, all
collections in the index are checked.
-Q path Specifies the command line for the iftdql program. This program is
a small standalone tool that allows you to run Content Server-like
queries without the Content Server to test full-text.
For example:
C:\Program Files\Documentum\IndexAgents\ftintegrity\iftdql.exe
Note: The ftintegrity tool is not aware of object types that you have chosen not to index, through
either a custom filter or the specification of selective indexing in Documentum Administrator. If
you have chosen not to index one or more object types, the object IDs of instances of those types
will appear in the generated res-comp-dctmonly.txt file.
3. To verify completeness only, open a command prompt and type the following command and
press Enter:
cobra ftintegrity.py -i ftintegrity.params.txt -m c
Note: The ftintegrity tool is not aware of object types that you have chosen not to index, through
either a custom filter or the specification of selective indexing in Documentum Administrator. If
you have chosen not to index one or more object types, the object IDs of instances of those types
will appear in the generated res-comp-dctmonly.txt file.
To verify accuracy only and query all indexed objects, open a command prompt and type the
following command and press Enter:
cobra ftintegrity.py -i ftintegrity.params.txt -m a
4. To verify accuracy only and query a subset of indexed objects, open a command prompt and
type the following command and press Enter:
cobra ftintegrity.py -i ftintegrity.params.txt -m a -d integer
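The command variants above differ only in the -m value and the optional -d value, which the following Python sketch makes explicit. The function name is hypothetical; the options are the ones shown in the commands above, and this guide shows -d only together with accuracy mode.

```python
def build_ftintegrity_command(mode, params_file="ftintegrity.params.txt",
                              sample_size=None):
    """Build the cobra argument list for one ftintegrity run.

    mode is "b" (both checks), "a" (accuracy) or "c" (completeness).
    sample_size, if given, is passed as the integer value of -d.
    """
    if mode not in ("b", "a", "c"):
        raise ValueError("mode must be 'b', 'a' or 'c'")
    command = ["cobra", "ftintegrity.py", "-i", params_file, "-m", mode]
    if sample_size is not None:
        command += ["-d", str(int(sample_size))]
    return command
```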
Additional samples of the tool’s output are in Appendix B, Sample Output of ftintegrity Utility.
will generate queue items for those objects, but they are not indexed and the queue items are deleted
later in the process.
Objects may be resubmitted when the index agent is running in either migration mode or normal
mode. The file of object IDs is designated in the file_name parameter of the indexagent.xml
configuration file for a particular index agent. By default, the file_name parameter is set to the file
name ids.txt. The indexagent.xml file is in
C:\Documentum\bea9.2\domains\DctmDomain\upload\IndexAgentN\IndexAgentN.war\WEB-INF\classes
(on Windows) or
$DOCUMENTUM_SHARED/bea9.2/domains/DctmDomain/upload/IndexAgentN/IndexAgentN.war/WEB-INF/classes
(on UNIX), where N is the number of the index agent.
The index agent periodically checks for the existence of ids.txt. If the file is found, the objects are
resubmitted for indexing.
The file designated in the file_name parameter can have any arbitrary name. For example, the output
file res-comp-dctmonly.txt might be renamed to resubmit.txt or to the default ids.txt. The name
may be a simple name (for example, ids.txt) or a fully-qualified name (for example, C:\Program
Files\Documentum\MyFiles\ids.txt). If the file has a simple name, the index agent tries to resolve
the name by following the CLASSPATH. If you change the name from the default name, you must
modify the indexagent.xml file and substitute the name you assign the file in the file_name parameter.
A particular file must be made available to one index agent only. If multiple index agents are
installed on a host, at installation time the indexagent.xml files for all index agents name the ids.txt
file in the file_name parameter.
For example, if the file is located in
C:\Program Files\Documentum\bea9.2\domains\DctmDomain\upload\IndexAgent1\IndexAgent1.war\WEB-INF\classes,
only that instance will find the file.
When the index agent has read the entire file and submitted all objects for indexing, the input file
is renamed to original_filename.done. For example, if the input file is ids.txt, when all objects are
submitted, it is renamed ids.txt.done.
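The file handling described above can be sketched in Python as follows. This is a minimal sketch with hypothetical function names: it copies a report into one index agent's WEB-INF/classes directory under the name that the file_name parameter expects, and later checks for the .done suffix.

```python
import os
import shutil


def stage_for_resubmission(report_path, classes_dir, file_name="ids.txt"):
    """Copy a list of object IDs to where a single index agent looks for it."""
    target = os.path.join(classes_dir, file_name)
    shutil.copyfile(report_path, target)
    return target


def resubmission_finished(target):
    """Return True once the input file has been renamed to <name>.done."""
    return not os.path.exists(target) and os.path.exists(target + ".done")
```

Because a file must be made available to one index agent only, classes_dir should point at the directory of exactly one index agent instance.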
5. If you rename res-comp-dctmonly.txt to any name other than ids.txt, you must modify the
file_name parameter in the indexagent.xml file for the particular index agent to reflect the new
name.
The objects are automatically resubmitted for indexing.
Part 4
Reference
Appendix A
Components
Index server
The index server has two functions: it creates full-text indexes and responds to full-text queries
from Content Server.
An index server node is any physical host on which an index server instance runs, regardless of
whether multiple instances of the index server’s individual software process are running.
The index server is represented in the repository by the ft engine config object (dm_ftengine_config).
There is one ft engine config object for each fulltext index object.
Document processors are the largest consumer of CPU power in the index server. In multinode
configurations, a document processor communicates with the indexers on all other hosts in the
configuration.
• Indexer
The indexer creates the searchable full-text index from the intermediate FIXML format. It consists
of two processes. The frtsobj process interfaces with the document processor and spawns different
findex processes as necessary to build the index from FIXML.
• Query and results servers
The QR server is a permanently running process that accepts queries from Content Server,
passes queries to the fsearch processes, and merges the results when there are multiple fsearch
processes running. In a multinode configuration, the QR Server is installed on the node where
administrative processes are running.
• Search servers
Search servers locate items in the index as specified in a query. The fsearchctrl, fdispatch, and
fsearch processes work together to fulfil requests from the qrserver process.
In a single-node deployment, all of these processes reside on one host.
In a multinode deployment, copies of the document processors, the indexer, and the search servers
reside on each node. The administrative processes and the query and results servers reside on only
one of the nodes.
Index agent
The index agent is a multithreaded Java application running in the application server servlet
container. Each index agent runs in its own application server instance, and each index agent is
associated with only one repository.
Install the index agent and the index server on the same host.
Certain operations on SysObjects in the repository generate queue items indicating that the object and
any associated content files must be submitted for full-text indexing. When the operation occurs,
Content Server generates an event that is stored as a queue item for the dm_fulltext_user user. The
index agent reads the queue items, then exports the content files and metadata from the repository
and prepares the documents and metadata for indexing by converting them to DFTXML.
The events that generate a queue item for the dm_fulltext_user user are:
• dm_checkin
• dm_destroy
• dm_move_content
This event occurs when content is moved to another storage area using MIGRATE_CONTENT.
• dm_readonlysave
This event occurs when Content Server sets a property of an immutable object.
• dm_save
An index agent in normal mode is represented by an ft index agent config object. The properties of
the ft index agent config object primarily record status information about the index agent, including
the mode in which the index agent is running and when the index agent began processing queue
items. The properties also record configuration information about the index agent, such as the
number of queue items processed in a single batch, the number of exporter threads, and the time
interval at which the index agent polls the repository for queue items. This information may be
viewed using Documentum Administrator. For more information about the ft index agent config
object, refer to the Documentum Object Reference Manual.
An index agent in migration mode is represented by an XML configuration file, IndexAgent.xml, on
the index agent host. Do not modify the parameters in the configuration file unless you are enabling
file store mapping. Mapping file stores for improved indexing performance is documented in the
Documentum System Planning Guide.
Index agent modes, page 114, contains complete information on the modes in which the index agent
runs.
Normal mode
Content Server 5.3 and later generates a queue item when an event such as a check-in or save
requires that a new or modified object be indexed. In normal mode, the index agent reads
the queue item, prepares the object for indexing, and updates the queue item. When the index
agent successfully submits the object for indexing, the index agent deletes the queue item from the
repository. If the object is not submitted successfully, the queue item remains in the repository and
the error or warning generated by the attempt to index the object is stored in the queue item. The
index agent can run in normal mode only against a 5.3 or later repository.
An index agent in normal mode and an index agent in migration mode cannot simultaneously
update the same index.
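The queue item life cycle in normal mode can be modeled with a short Python sketch. This is a conceptual model under stated assumptions, not the index agent's code: the queue is an in-memory dictionary, and the submit callable and the message key are hypothetical stand-ins for the indexing request and the stored error or warning.

```python
def process_queue(queue, submit):
    """Process queue items the way normal mode is described above.

    queue maps a queue item ID to a dict holding an "object_id".
    submit is called with each object ID and raises an exception on failure.
    """
    for item_id in list(queue):
        item = queue[item_id]
        try:
            submit(item["object_id"])
        except Exception as error:
            # Failed submission: the queue item remains and records the error.
            item["message"] = str(error)
        else:
            # Successful submission: the queue item is deleted.
            del queue[item_id]
    return queue
```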
Migration mode
In migration mode, the index agent prepares all indexable objects in a repository for indexing
in object ID order. The high-water mark records the ID of the most recent object indexed. The
index agent reads the value in the queue item, exports the next batch of indexable objects from
the repository, and updates the queue item. The index agent can run in migration mode to create
new indexes against a 5.2.5.x or 5.3.x repository.
For example, if you need to move the indexing component installation to a new host, you can install
the components on the new host and create a new index there in migration mode, while the existing
indexing component installation maintains the existing index in normal mode.
In migration mode, the index agent prepares all indexable objects for indexing in object ID order. A
single queue item records the ID of the most recent object indexed.
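The high-water mark behavior can be illustrated with a small Python sketch. It is a conceptual model only: the function name and batch size are hypothetical, and object IDs are assumed to be equal-length strings that sort in object ID order.

```python
def migrate(object_ids, batch_size, high_water_mark=None):
    """Yield (batch, high_water_mark) pairs in object ID order.

    Objects at or below the starting high-water mark are skipped, so a run
    can resume from the ID of the most recent object indexed.
    """
    pending = sorted(
        object_id for object_id in object_ids
        if high_water_mark is None or object_id > high_water_mark
    )
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        high_water_mark = batch[-1]  # ID of the most recent object indexed
        yield batch, high_water_mark
```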
File mode
File mode is used for submitting a list of object IDs to the index agent for indexing.
File mode can be used when the index agent is in any operational mode. However, it is most useful
immediately after a new index is created. After a new index is created, the ftintegrity tool is run to
determine whether all indexable objects in the repository are included in the index. The ftintegrity
tool produces a list of object IDs of SysObjects that are not included in the index. The list is used to
submit the objects to the index agent for indexing.
Full-text index
An index is made up of nodes. Depending on your deployment configuration, an index may have a
single node or multiple nodes.
Each node has one column. A column is a physical set of processes and partitions that contain a
portion of the indexed data. Each column has three partitions and the processes that search those
partitions.
The data in each column is unique. That is, a document’s indexed data resides on only one node. A
column may contain data from one repository or multiple repositories.
Partitions
A partition is the smallest physical grouping of documents within a column. By default, each column
has three partitions: a small partition, a medium partition, and a large partition. The documents most
recently submitted for indexing are initially indexed into the small partition. The index server moves
index data from the small partition to the medium partition, and finally to the large partition. The
index data for all documents eventually is in the large partition.
While new documents are being added to a partition, the partition is offline and in the process of
being rebuilt. New documents are not added to the index while the partition is offline. A copy of the
partition remains online and searchable. When the offline partition is completely rebuilt, it becomes
online and searchable, and the online copy is taken offline and newly-processed FIXML is indexed.
Collections
All documents in an index belong to a collection, a logical set of data. The full-text indexing engine
uses collections to segregate data logically. If the index stores data from only one repository in a
single-node deployment, the data is typically in one collection.
In consolidated deployments, in which multiple repositories are indexed in the same index, the
indexed data from each repository is associated with a separate collection.
In a multinode deployment, a particular collection might be spread out over multiple nodes or it
might be stored on only one node. The default behavior in a multinode deployment is to store
the collections across nodes. However, you can configure the system to use directed routing. This
behavior requires you to store content files in particular storage areas, which are mapped to particular
collections, and in turn, those collections are mapped to particular nodes. Which content files go to
which storage area is a business decision.
Directed routing
Directed routing refers to the ability to direct index data to particular nodes. Using a multinode
deployment with directed routing is recommended if you need to separately back up and restore
individual collections or if you have a very large repository (in this release, 20 million or more
documents).
Directed routing is implemented by mapping each file store storage area to a particular collection,
and then mapping each collection to a particular column. Any arbitrary number of storage areas may
be mapped to a particular collection. The mapping may be modified after installation to add more
collections, though the index agent must be restarted after the mapping is modified.
Note: The index agent restarts only if the index agent’s application server instance is running.
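The two-level mapping behind directed routing can be pictured with a short Python sketch. The storage area names, collection names, column numbers, and the function name are all hypothetical; the actual configuration is a manual process and is not performed with code like this.

```python
def route(storage_area, area_to_collection, collection_to_column):
    """Resolve a storage area to its collection and that collection's column."""
    collection = area_to_collection[storage_area]
    return collection, collection_to_column[collection]


# Any number of storage areas may map to one collection (hypothetical names).
AREA_TO_COLLECTION = {
    "filestore_01": "collection_a",
    "filestore_02": "collection_a",
    "filestore_03": "collection_b",
}
# Each collection maps to one particular column (hypothetical numbers).
COLLECTION_TO_COLUMN = {"collection_a": 0, "collection_b": 1}
```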
Use directed routing only in the following circumstances:
• If large amounts of the data will be unchanging (never updated) and you want to reduce the
amount of data to index.
• If the full-text indexing system is growing and you need to incrementally add hardware to the
system with little or no need to reorganize the indexed data.
Directed routing allows you to fill a node and move on to newly added nodes. If you were
not using directed routing in these situations, the system would perform additional work to
redistribute the data across all of the new nodes.
Configuring directed routing is a manual process requiring EMC Documentum Professional Services
(or qualified third-party integrators).
Refer to Install a high-availability deployment, page 28, for high-level information on installing a
high-availability configuration and on the objects that must be created and attribute values updated
after installation.
Location objects
Each fulltext index object points to a location object that represents the location of the dmfulltext.ini
file.
The a_full_text property is defined for the SysObject type and is inherited by all SysObject subtypes.
It is a Boolean property that controls whether an object’s associated content files are indexed.
When a_full_text is TRUE, content files are indexed whenever a Save, Checkin, Destroy,
Readonlysave, or MoveContent operation generates an index queue item for the object. Any changes
to the object’s content are added to the index.
The a_full_text property is set to TRUE whenever a SysObject is created. Users with system
administrator or superuser privileges can change the a_full_text setting.
The value of the fulltext_location property of the server config object contains the name of the location
object that identifies the full-text configuration file, dmfulltext.ini.
Initialization files
This section describes the entries in the server initialization file that support full-text indexing and the
dedicated full-text initialization file.
The dmfulltext.ini file is created when the server is installed. It contains information used by Content
Server to find the index agent plug-in.
Appendix B
Sample Output of ftintegrity Utility
The following report is the output of a failed completeness and accuracy run of the ftintegrity utility:
29 C% cobra ftintegrity.py -i ftintegrity.params.txt -m b
Mon Feb 07 15:48:37 2005 [PROGRESS]: Completeness Test Started
Mon Feb 07 15:48:38 2005 [PROGRESS]: Fetched all 11 docids from Documentum
Mon Feb 07 15:48:38 2005 [WARNING ]: Suspiciously low document count. Following errors could be due to invalid result from the command below.
Mon Feb 07 15:48:38 2005 [WARNING ]: Please run it to validate correctness
Mon Feb 07 15:48:38 2005 [WARNING ]: C:\PROGRA~1\DOCUME~1\java\1449C1~1.2_0\bin\java -classpath C:\PROGRA~1\DOCUME~1\INDEXA~1\INDEXA~1\webapps\INDEXA~1\WEB-INF\lib\server-impl.jar;C:\PROGRA~1\DOCUME~1\dctm.jar;C:\Documentum\config com.documentum.server.impl.fulltext.ftintegrity.FTDumpIDs -m:0 -D:waasdfasdfasfasfasdfasf -U:wongw -P:groucho2
Mon Feb 07 15:48:38 2005 [INFO ]: Get Documentum IDs: 0.988 s
Mon Feb 07 15:48:38 2005 [DEBUG ]: Got 11 docids from Dctm
Mon Feb 07 15:48:38 2005 [INFO ]: Get Number of Documents in Collection: 0.281 s
Mon Feb 07 15:48:38 2005 [INFO ]: Number of documents in collection documentum: 1351
Mon Feb 07 15:48:39 2005 [PROGRESS]: Fetched all 1351 docids from FAST
Mon Feb 07 15:48:39 2005 [INFO ]: Get FAST IDs: 0.780 s
Mon Feb 07 15:48:39 2005 [DEBUG ]: Got 1351 docids from fast
Mon Feb 07 15:48:39 2005 [INFO ]: Transform FAST ID file: 0.084 s
Mon Feb 07 15:48:39 2005 [INFO ]: Transform Documentum ID file: 0.026 s
Mon Feb 07 15:48:39 2005 [INFO ]: Compare IDs: 0.064 s
Mon Feb 07 15:48:39 2005 [PROGRESS]: Completeness Test Finished
Mon Feb 07 15:48:39 2005 [PROGRESS]: Accuracy Test Started
Mon Feb 07 15:48:39 2005 [DEBUG ]: Getting Dctm Ids from res-comp-common.txt
Mon Feb 07 15:48:39 2005 [FATAL ]: Failed to read any docids from Documentum.
The command below failed.
type res-comp-common.txt
Terminated.
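When reviewing long runs, it can help to pull only the warnings and fatal errors out of the output. The following Python sketch assumes that each record occupies one line in the timestamp, level, and message layout shown above; the function names are hypothetical, and the timestamp parsing assumes English day and month abbreviations.

```python
import re
from datetime import datetime

# Example record: Mon Feb 07 15:48:39 2005 [PROGRESS]: Completeness Test Finished
LOG_LINE = re.compile(
    r"^(\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}) \[(\w+)\s*\]: (.*)$"
)


def parse_line(line):
    """Return (timestamp, level, message), or None for any other line."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    stamp, level, message = match.groups()
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"), level, message


def problems(lines):
    """Return the parsed WARNING and FATAL records from the given lines."""
    parsed = (parse_line(line) for line in lines)
    return [record for record in parsed
            if record is not None and record[1] in ("WARNING", "FATAL")]
```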
The following report is the output of a successful completeness and accuracy run of the ftintegrity tool:
30 C% cobra ftintegrity.py -i ftintegrity.params.txt -m b
Mon Feb 07 15:50:29 2005 [PROGRESS]: Completeness Test Started
Mon Feb 07 15:50:35 2005 [PROGRESS]: Fetched all 1351 docids from
Documentum
Mon Feb 07 15:50:35 2005 [INFO ]: Get Documentum IDs: 5.223 s
Mon Feb 07 15:50:35 2005 [DEBUG ]: Got 1351 docids from Dctm
Mon Feb 07 15:50:35 2005 [INFO ]: Get Number of Documents in Collection: 0.219 s
Mon Feb 07 15:50:35 2005 [INFO ]: Number of documents in collection documentum: 1351
Mon Feb 07 15:50:36 2005 [PROGRESS]: Fetched all 1351 docids from FAST
Mon Feb 07 15:50:36 2005 [INFO ]: Get FAST IDs: 0.819 s
Mon Feb 07 15:50:36 2005 [DEBUG ]: Got 1351 docids from fast
Mon Feb 07 15:50:36 2005 [INFO ]: Transform FAST ID file: 0.114 s
Mon Feb 07 15:50:36 2005 [INFO ]: Transform Documentum ID file: 0.099 s
Mon Feb 07 15:50:36 2005 [INFO ]: Compare IDs: 0.136 s
Mon Feb 07 15:50:36 2005 [PROGRESS]: Completeness Test Finished
Mon Feb 07 15:50:36 2005 [PROGRESS]: Accuracy Test Started
Mon Feb 07 15:50:36 2005 [DEBUG ]: Getting Dctm Ids from res-comp-common.txt
Mon Feb 07 15:50:36 2005 [PROGRESS]: Fetched all 1351 docids from Documentum
Mon Feb 07 15:50:36 2005 [PROGRESS]: Starting test of 0 documents
Mon Feb 07 15:50:36 2005 [PROGRESS]: Launching iftdql: C:\PROGRA~1\DOCUME~1\INDEXA~1\FTINTE~1\iftdql C:\PROGRA~1\DOCUME~1\INDEXA~1\FTINTE~1\FASTQueryPlugin.dll
Appendix C
Indexed and Non-Indexed Characters
Indexed characters
The following Unicode characters are indexed and are searchable:
• Alphabetic characters
• Numeric characters
• Extender characters
Extender characters extend the value or shape of a preceding alphabetic character. These are
typically length and iteration marks.
• Custom characters enclosing Chinese, Japanese, and Korean letters and months
These are derived from a number of custom character ranges that have bidirectional properties,
falling in the 3200-32FF range. The specific character ranges are:
— 3200-3243
— 3260-327B
— 327F-32B0
— 32C0-32CB
— 32D0-32FE
For a detailed list of UTF-8 special characters see http://doc.infosnel.nl/extreme_utf-8.html.
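The custom character ranges listed above can be checked programmatically. The following is a minimal sketch, not part of the index server: the function name in_listed_cjk_range and the constant LISTED_CJK_RANGES are hypothetical, and the code tests only membership in the five ranges given in this appendix.

```python
# Inclusive hexadecimal code point ranges copied from the list in this appendix.
LISTED_CJK_RANGES = [
    (0x3200, 0x3243),
    (0x3260, 0x327B),
    (0x327F, 0x32B0),
    (0x32C0, 0x32CB),
    (0x32D0, 0x32FE),
]


def in_listed_cjk_range(ch: str) -> bool:
    """Return True if the single character ch falls in one of the listed ranges."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in LISTED_CJK_RANGES)


if __name__ == "__main__":
    # U+3200 is the first code point of the first listed range.
    print(in_listed_cjk_range("\u3200"))
    # U+3250 lies in the 3200-32FF block but outside every listed range.
    print(in_listed_cjk_range("\u3250"))
```

A check like this covers only the custom ranges. Alphabetic, numeric, and extender characters are indexed under the separate rules described earlier in this list.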
For example, when the email address MyName@company.com is indexed, the @ and . characters are not indexed and act as word separators, so the address appears as “MyName company com” in the index. The text is treated as three words. Documents returned by a search for MyName@company.com are treated as if they contain the words “MyName company com”.
Note: Because the index treats that email address as three words, the document containing
MyName@company.com would also be returned if the user conducted a phrase search on “MyName
company com”.
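The effect described in this example can be approximated with a short script. This is a rough sketch under stated assumptions, not the index server's tokenizer: the function name approximate_index_terms is hypothetical, and Python's str.isalnum stands in for the indexed character classes, so extender characters and the custom CJK ranges are not modeled accurately.

```python
from typing import List


def approximate_index_terms(text: str) -> List[str]:
    """Split text into words, treating every non-alphanumeric character as a separator.

    Assumption: str.isalnum() only approximates the set of indexed characters.
    """
    terms: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch.isalnum():
            current.append(ch)
        elif current:
            # A non-indexed character ends the current word.
            terms.append("".join(current))
            current = []
    if current:
        terms.append("".join(current))
    return terms


if __name__ == "__main__":
    print(approximate_index_terms("MyName@company.com"))
    # prints ['MyName', 'company', 'com']
```

Because the address and the phrase “MyName company com” produce the same three terms, a phrase search on either form matches the same documents.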
Appendix D
Supported and Unsupported Formats
The formats listed in Table 6, page 129 are supported for full-text indexing.
Note: Arabic and Hebrew — For non-binary formats, only logical text representation is supported;
visual text representation is not supported. For those binary formats listed below that support
right-to-left text in the native format, support is provided for indexing Hebrew and Arabic text, with
the exception that PDF files cannot be indexed.
Appendix E
Supported Languages
Danish da Norwegian_Nynorsk nn
2. There is no dictionary for Chinese because Chinese is not an inflected language, so grammatical normalization cannot be performed on Chinese text.
Index
V
verification of completeness and accuracy, 99
viewing
index server properties, 82
W
Windows host requirement, 22