General Text Matcher (GTM)
written by:
Ryan Green
Joseph P. Turian
I. Dan Melamed
Luke Shen
Ali Argyle
Ben Wellington
Daniel Galron
The latest version of this software is version 1.4.
To be notified of upgrades, please sign up for the GTM-announce mailing list via the web form at
http://www.cs.nyu.edu/mailman/listinfo/gtm-announce.
It is moderated and very low-volume.
Introduction
GTM measures the similarity between texts. To read more about using
this software to automatically evaluate machine translation, please
read Proteus technical report #03-005, "Evaluation of Machine
Translation and its Evaluation". To learn more about our work, please
visit the Proteus Project homepage.
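As a rough illustration of what a text-similarity score can look like, the
sketch below computes word-overlap precision, recall and F-measure between a
candidate text and a reference text. It is a simplified stand-in and not GTM's
actual algorithm, which is described in the README and the technical report.
The class name OverlapScoreSketch and all of its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified word-overlap scorer. An illustration only, not GTM's implementation. */
final class OverlapScoreSketch {

    /** Counts how often each whitespace-separated token occurs in the text. */
    static Map<String, Integer> countTokens(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Total number of tokens represented by a count map. */
    static int totalTokens(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }

    /** Number of candidate tokens that can be paired one-to-one with reference tokens. */
    static int hits(Map<String, Integer> candidate, Map<String, Integer> reference) {
        int hits = 0;
        for (Map.Entry<String, Integer> e : candidate.entrySet()) {
            hits += Math.min(e.getValue(), reference.getOrDefault(e.getKey(), 0));
        }
        return hits;
    }

    /** Harmonic mean of overlap precision and recall; 0.0 when nothing matches. */
    static double fMeasure(String candidateText, String referenceText) {
        Map<String, Integer> candidate = countTokens(candidateText);
        Map<String, Integer> reference = countTokens(referenceText);
        int hits = hits(candidate, reference);
        if (hits == 0) {
            return 0.0;
        }
        double precision = (double) hits / totalTokens(candidate);
        double recall = (double) hits / totalTokens(reference);
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        System.out.println(fMeasure("the cat sat on the mat", "the cat is on the mat"));
    }
}
```

This sketch ignores word order entirely; it is only meant to show the
precision/recall framing of comparing a candidate against a reference.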
GTM is written in Java, and is open source, released under the
BSD license (available in the LICENSE file).
The Proteus Project is supported by grants and contracts from the
National Science Foundation (NSF) and the Defense Advanced Research
Projects Agency (DARPA), as well as by gifts from Sun Microsystems.
Demo
A simple applet demonstrating the core functionality of GTM is
available at http://nlp.cs.nyu.edu/call_gtm.html.
Documentation
All documentation is in the README file,
which is also included in the distribution.
Downloads
1.4 (January 2008):
- Fixed a bug that occurred when processing blank lines.
- Added an option to use a file to indicate which lines to compare.
- Added options for dumping hits and/or runs.
1.3:
- Changed the text option to interpret newlines as
segment separators and to number the segments
consecutively. Previously, all lines in a file
were merged into one segment.
- Made the SGML reader case-insensitive.
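The 1.3 change to the text option can be pictured with a small sketch: each
line of the input becomes its own segment, and segments are numbered in order.
The class and method names are hypothetical, numbering from 1 is an
assumption, and GTM's own reader may differ in details such as how blank
lines are treated.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrates line-per-segment reading. Not GTM's actual reader. */
final class SegmentSketch {

    /**
     * Splits text on line breaks and numbers the resulting segments
     * consecutively, starting at 1 (an assumption). Blank lines inside the
     * text are kept as empty segments so that numbering stays aligned with
     * line numbers.
     */
    static Map<Integer, String> numberSegments(String text) {
        Map<Integer, String> segments = new LinkedHashMap<>();
        if (text.isEmpty()) {
            return segments;
        }
        int number = 1;
        // "\\R" matches any line break; trailing empty strings are dropped by split.
        for (String line : text.split("\\R")) {
            segments.put(number++, line);
        }
        return segments;
    }

    public static void main(String[] args) {
        numberSegments("first segment\nsecond segment\n")
                .forEach((n, s) -> System.out.println(n + ": " + s));
    }
}
```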
The integrity of the following files can be verified by checking the
message digests. The files are signed with Ali's key.
- Source TGZ: gtm-1.3-source.tgz (33 KB)
  - MD5 (gtm-1.3-source.tgz) = c040548cff1655560d2994b1e67aa4df
  - SHA1 (gtm-1.3-source.tgz) = 20815ce3e95e1b2dc531eaecde92325a71564b45
  - Signature: gtm-1.3-source.tgz.asc (Signed with Ali's key)
- Source ZIP: gtm-1.3-source.zip (65 KB)
  - MD5 (gtm-1.3-source.zip) = e207d0837767d873f9b865f2d7389706
  - SHA1 (gtm-1.3-source.zip) = 0f20303f2623e42d450af54b0453b52f3fec2932
  - Signature: gtm-1.3-source.zip.asc (Signed with Ali's key)
- Binary TGZ: gtm-1.3-binary.tgz (34 KB)
  - MD5 (gtm-1.3-binary.tgz) = 8a9f41c7f8e0d373425dd226d01ff25d
  - SHA1 (gtm-1.3-binary.tgz) = 62ece243241da4e0fdb34a1857eb64d662dad089
  - Signature: gtm-1.3-binary.tgz.asc (Signed with Ali's key)
- Binary ZIP: gtm-1.3-binary.zip (64 KB)
  - MD5 (gtm-1.3-binary.zip) = 4cd6ceb967442ba1bb4dd1da5be6143e
  - SHA1 (gtm-1.3-binary.zip) = 863d9943da9ca17b534e3c651ffd19107148b551
  - Signature: gtm-1.3-binary.zip.asc (Signed with Ali's key)
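One way to check a downloaded file against the digests listed above is to
compute the digest locally and compare the hex strings. The sketch below uses
the standard java.security.MessageDigest class; the class name
DigestCheckSketch and its helper methods are hypothetical, and the example
assumes the archive sits in the current working directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Computes MD5 / SHA-1 digests and compares them with published values. */
final class DigestCheckSketch {

    /** Creates a digest for a standard algorithm name such as "MD5" or "SHA-1". */
    static MessageDigest newDigest(String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
    }

    /** Formats bytes as lowercase hexadecimal, two characters per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Hex digest of an in-memory byte array. */
    static String hexDigestOfBytes(byte[] data, String algorithm) {
        return toHex(newDigest(algorithm).digest(data));
    }

    /** Hex digest of a file, read in chunks so large files are not loaded at once. */
    static String hexDigestOfFile(Path file, String algorithm) throws IOException {
        MessageDigest digest = newDigest(algorithm);
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return toHex(digest.digest());
    }

    /** True if the file's digest equals the published hex string (case-insensitive). */
    static boolean matches(Path file, String algorithm, String expectedHex) throws IOException {
        return hexDigestOfFile(file, algorithm).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws IOException {
        // Assumed location: the archive has been downloaded into the working directory.
        Path file = Paths.get("gtm-1.3-source.tgz");
        System.out.println("MD5 matches:  "
                + matches(file, "MD5", "c040548cff1655560d2994b1e67aa4df"));
        System.out.println("SHA1 matches: "
                + matches(file, "SHA-1", "20815ce3e95e1b2dc531eaecde92325a71564b45"));
    }
}
```

A matching digest only shows that the file was not corrupted in transit. The
.asc signatures are a separate check and need an OpenPGP tool together with
the signer's public key.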
1.2:
- Fixed mispackaged jar in the binary package
1.1:
1.0:
- Default exponent is now 1.0.
- Tie-breaking is now deterministic.
- Re-enabled the sanity check suite.
- Removed legacy GPL library.
- Pointer to TR 03-005.
- Samples no longer include non-standard ASCII text.
- Added documentation note about non-standard ASCII.
0.3:
- Documentation updates.
- Temporarily disabled the sanity check suite.
0.2:
- Verbose output can be enabled using "-v".
- Sanity check suite included.
0.1:
- Initial release.
- Scoring of segment and document matches.
Feedback
Questions? Comments? Suggestions?
Bugs? Code patches? Feature requests?
This is beta software and we welcome all feedback.
Please email feedback to Joseph Turian and I. Dan Melamed.
Our email addresses are [lastname] 'at' cs 'dot' nyu 'dot' edu
index.html,v 1.22 2005/11/07 20:48:40 melamed Exp