Daniel Marcu

2017

Biomedical Event Extraction using Abstract Meaning Representation
Sudha Rao | Daniel Marcu | Kevin Knight | Hal Daumé III
BioNLP 2017

We propose a novel, Abstract Meaning Representation (AMR) based approach to identifying molecular events/interactions in biomedical text. Our key contributions are: (1) an empirical validation of our hypothesis that an event is a subgraph of the AMR graph, (2) a neural network-based model that identifies such an event subgraph given an AMR, and (3) a distant supervision based approach to gather additional training data. We evaluate our approach on the 2013 Genia Event Extraction dataset and show promising results.

2016

Name Tagging for Low-resource Incident Languages based on Expectation-driven Learning
Boliang Zhang | Xiaoman Pan | Tianlu Wang | Ashish Vaswani | Heng Ji | Kevin Knight | Daniel Marcu
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Natural Language Communication with Robots
Yonatan Bisk | Deniz Yuret | Daniel Marcu
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Unsupervised Neural Hidden Markov Models
Ke M. Tran | Yonatan Bisk | Ashish Vaswani | Daniel Marcu | Kevin Knight
Proceedings of the Workshop on Structured Prediction for NLP

Extracting Structured Scholarly Information from the Machine Translation Literature
Eunsol Choi | Matic Horvat | Jonathan May | Kevin Knight | Daniel Marcu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Understanding the experimental results of a scientific paper is crucial to understanding its contribution and to comparing it with related work. We introduce a structured, queryable representation for experimental results and a baseline system that automatically populates this representation. The representation can answer compositional questions such as: “Which are the best published results reported on the NIST 09 Chinese to English dataset?” and “What are the most important methods for speeding up phrase-based decoding?” Answering such questions usually involves lengthy literature surveys. Current machine reading for academic papers does not usually consider the actual experiments, but mostly focuses on understanding abstracts. We describe annotation work to create an initial ⟨scientific paper; experimental results representation⟩ corpus. The corpus is composed of 67 papers which were manually annotated with a structured representation of experimental results by domain experts. Additionally, we present a baseline algorithm that characterizes the difficulty of the inference task.

2015

Parsing English into Abstract Meaning Representation Using Syntax-Based Machine Translation
Michael Pust | Ulf Hermjakob | Kevin Knight | Daniel Marcu | Jonathan May
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2012

HyTER: Meaning-Equivalent Semantics for Translation Evaluation
Markus Dreyer | Daniel Marcu
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Automatic Parallel Fragment Extraction from Noisy Data
Jason Riesa | Daniel Marcu
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

A New Method for Automatic Translation Scoring-HyTER
Daniel Marcu
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Government MT User Program

It is common knowledge that translation is an ambiguous, 1-to-n mapping process, but to date, our community has produced no empirical estimates of this ambiguity. We have developed an annotation tool that enables us to create representations that compactly encode an exponential number of correct translations for a sentence. Our findings show that naturally occurring sentences have billions of translations. Having access to such large sets of meaning-equivalent translations enables us to develop a new metric, HyTER, for translation accuracy. We show that our metric provides better estimates of machine and human translation accuracy than alternative evaluation metrics using data from the most recent Open MT NIST evaluation and we discuss how HyTER representations can be used to inform a data-driven inquiry into natural language semantics.

2011

Meaning-equivalent semantics for understanding, generation, translation, and evaluation
Daniel Marcu
Proceedings of the 8th International Workshop on Spoken Language Translation: Keynotes

Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation
Jason Riesa | Ann Irvine | Daniel Marcu
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Creating Value at the Boundary Between Humans and Machines
Daniel Marcu
Proceedings of the Second Joint EM+/CNGL Workshop: Bringing MT to the User: Research on Integrating MT in the Translation Industry

For a long time, machine translation and professional translation vendors have had a contentious relation. However, new tools, computing platforms, and business models are changing the fundamentals of this relationship. I will review the main trends in the area while emphasizing both past causes of failure and main drivers of success.

Hierarchical Search for Word Alignment
Jason Riesa | Daniel Marcu
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Trusted Translations Deliver Compelling Results for the Travel Industry
Daniel Marcu
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Commercial MT User Program

Utilizing Automated Translation with Quality Scores to Increase Productivity
Daniel Marcu | Kathleen Egan | Chuck Simmons | Ning-Ning Mahlmann
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program

Automated translation can assist with a variety of translation needs in government, from speeding up access to information for intelligence work to helping human translators increase their productivity. However, government entities need to have a mechanism in place so that they know whether or not they can trust the output from automated translation solutions. In this presentation, Language Weaver will present a new capability, "TrustScore": an automated scoring algorithm that communicates how good the automated translation is, using a meaningful metric. With this capability, each translation is automatically assigned a score from 1 to 5 in the TrustScore. A score of 1 would indicate that the translation is unintelligible; a score of 3 would indicate that meaning has been conveyed and that the translated content is actionable. A score approaching 4 or higher would indicate that meaning and nuance have been carried through. This automatic prediction of quality has been validated by testing done across significant numbers of data points in different companies and on different types of content. After outlining TrustScore, and how it works, Language Weaver will discuss how a scoring mechanism like TrustScore could be used in a translation productivity workflow in government to assist linguists with day-to-day translation work. This would enable them to further benefit from their investments in automated translation software. Language Weaver will also share how TrustScore is used in commercial deployments to cost-effectively publish information in near real time.

Re-structuring, Re-labeling, and Re-aligning for Syntax-Based Machine Translation
Wei Wang | Jonathan May | Kevin Knight | Daniel Marcu
Computational Linguistics, Volume 36, Number 2, June 2010

We introduce a new generation of commercial translation software, based primarily on statistical learning and statistical language models.

2002

A Phrase-Based, Joint Probability Model for Statistical Machine Translation
Daniel Marcu | Daniel Wong
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

Processing Comparable Corpora With Bilingual Suffix Trees
Dragos Stefan Munteanu | Daniel Marcu
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

The Importance of Lexicalized Syntax Models for Natural Language Generation Tasks
Hal Daume III | Kevin Knight | Irene Langkilde-Geary | Daniel Marcu | Kenji Yamada
Proceedings of the International Natural Language Generation Conference

An Unsupervised Approach to Recognizing Discourse Relations
Daniel Marcu | Abdessamad Echihabi
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

A Noisy-Channel Model for Document Compression
Hal Daume III | Daniel Marcu
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

Using a large monolingual corpus to improve translation accuracy
Radu Soricut | Kevin Knight | Daniel Marcu
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers

The existence of a phrase in a large monolingual corpus is very useful information, and so is its frequency. We introduce an alternative approach to automatic translation of phrases/sentences that operationalizes this observation. We use a statistical machine translation system to produce alternative translations and a large monolingual corpus to (re)rank these translations. Our results show that this combination yields better translations, especially when translating out-of-domain phrases/sentences. Our approach can also be used to automatically construct parallel corpora from monolingual resources.

Translation by the numbers: Language Weaver
Bryce Benjamin | Kevin Knight | Daniel Marcu
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: System Descriptions

Pre-market prototype - to be available commercially in the second or third quarter of 2003.