DOI: 10.1145/1645953.1645977

A translation model for matching reviews to objects

Published: 02 November 2009


We develop a generic method for the review matching problem: matching unstructured text reviews to a list of objects, each of which has a set of attributes. To this end, we propose a translation model for generating reviews from a structured description of objects. We develop an EM-based method to estimate the model parameters and use this model to find, given a review, the object most likely to be the topic of the review. We conduct extensive experiments on two large-scale datasets: a collection of restaurant reviews from Yelp and a collection of movie reviews from IMDb. The experiments show that our translation-model-based method outperforms traditional tf-idf-based methods as well as a recent mixture-model-based method for the review matching problem.
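The abstract does not give the model's equations, so what follows is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes an IBM Model 1-style translation model in which each review word is "translated" from an attribute word of the object chosen uniformly at random, plus a hypothetical `<null>` attribute that absorbs generic words such as "great". It also assumes that reviews with known matching objects are available for EM training, and that the translation probabilities are interpolated with a background unigram model (Jelinek-Mercer smoothing) at a hypothetical weight `lam = 0.8`. Matching then picks the object that maximizes the review's log-likelihood. All function names and the toy restaurant data are illustrative.

```python
import math
from collections import Counter, defaultdict

NULL = "<null>"  # hypothetical empty attribute that can "generate" generic review words


def train_translation_model(pairs, iterations=10):
    """EM estimate of t[(w, a)] ~ p(review word w | attribute word a), IBM Model 1 style.

    pairs: list of (object_attribute_tokens, review_tokens) with known matches.
    Assumes iterations >= 1.
    """
    vocab = {w for _, review in pairs for w in review}
    uniform = 1.0 / len(vocab)
    t = {}  # empty on the first pass, so every lookup falls back to uniform
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of times a generated w
        total = defaultdict(float)  # expected number of words generated by a
        for obj, review in pairs:
            src = list(obj) + [NULL]
            for w in review:
                # E-step: posterior probability that each source attribute a generated w
                norm = sum(t.get((w, a), uniform) for a in src)
                for a in src:
                    post = t.get((w, a), uniform) / norm
                    count[(w, a)] += post
                    total[a] += post
        # M-step: normalise expected counts so that t(. | a) sums to 1 for each a
        t = {(w, a): c / total[a] for (w, a), c in count.items()}
    return t


def background_model(pairs):
    """Maximum-likelihood unigram model over all training review words."""
    counts = Counter(w for _, review in pairs for w in review)
    n = sum(counts.values())
    return {w: c / n for w, c in counts.items()}


def log_likelihood(review, obj, t, background, lam=0.8, floor=1e-9):
    """log p(review | obj): translation model interpolated with the background model."""
    src = list(obj) + [NULL]
    score = 0.0
    for w in review:
        p_trans = sum(t.get((w, a), 0.0) for a in src) / len(src)
        # floor keeps the log finite for words never seen in training
        score += math.log(lam * p_trans + (1 - lam) * background.get(w, floor))
    return score


def match(review, objects, t, background):
    """Return the name of the object most likely to be the topic of the review."""
    return max(objects, key=lambda name: log_likelihood(review, objects[name], t, background))


if __name__ == "__main__":
    # Hypothetical toy data: (object attributes, a review of that object)
    pairs = [
        (["italian", "pizza", "downtown"], ["great", "pizza", "crust", "italian"]),
        (["japanese", "sushi", "uptown"], ["fresh", "sushi", "fish", "great"]),
        (["italian", "pasta", "uptown"], ["pasta", "italian", "great"]),
        (["japanese", "ramen", "downtown"], ["ramen", "broth", "great"]),
    ]
    t = train_translation_model(pairs)
    bg = background_model(pairs)
    objects = {
        "luigis": ["italian", "pizza", "downtown"],
        "sakura": ["japanese", "sushi", "uptown"],
    }
    print(match(["pizza", "crust"], objects, t, bg))  # luigis
```

The `<null>` attribute and the background interpolation play similar roles: review words that say nothing about any attribute should not force the model to credit a specific object, and words unseen in training should not drive the likelihood to zero. A purely unsupervised EM setting, where the matching object of each training review is itself latent, would need an extra layer of expectations over candidate objects and is not shown here.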









    Published In

    CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
    November 2009
    2162 pages
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.



    Association for Computing Machinery

    New York, NY, United States




    Author Tags

    1. language model
    2. review matching
    3. translation model


    • Research-article


    CIKM '09

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%




    Bibliometrics & Citations


    Article Metrics

    • Downloads (last 12 months): 2
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 23 Dec 2024.



    Cited By

    • (2020) End-to-End Neural Matching for Semantic Location Prediction of Tweets. ACM Transactions on Information Systems 39:1, 1-35. DOI: 10.1145/3415149. Online publication date: 5-Sep-2020.
    • (2019) Towards Spatial Word Embeddings. Advances in Information Retrieval, 53-61. DOI: 10.1007/978-3-030-15719-7_7. Online publication date: 7-Apr-2019.
    • (2018) Review on Recent Advances in Information Mining From Big Consumer Opinion Data for Product Design. Journal of Computing and Information Science in Engineering 19:1. DOI: 10.1115/1.4041087. Online publication date: 17-Sep-2018.
    • (2018) Matching Descriptions to Spatial Entities Using a Siamese Hierarchical Attention Network. IEEE Access 6, 28064-28072. DOI: 10.1109/ACCESS.2018.2837666. Online publication date: 2018.
    • (2016) Towards the Effective Linking of Social Media Contents to Products in E-Commerce Catalogs. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, 1049-1058. DOI: 10.1145/2983323.2983747. Online publication date: 24-Oct-2016.
    • (2015) Matching Reviews to Object Based on 2-Stage CRF. Web Technologies and Applications, 92-103. DOI: 10.1007/978-3-319-25255-1_8. Online publication date: 13-Nov-2015.
    • (2013) Matching Reviews to Database Objects Based on Labeled Latent Dirichlet Allocation Model. Proceedings of the 2013 10th Web Information System and Application Conference, 48-51. DOI: 10.1109/WISA.2013.18. Online publication date: 10-Nov-2013.
    • (2012) Associating structured records to text documents. Proceedings of the 21st International Conference on World Wide Web, 451-452. DOI: 10.1145/2187980.2188072. Online publication date: 16-Apr-2012.
    • (2012) Object matching in tweets with spatial models. Proceedings of the fifth ACM international conference on Web search and data mining, 43-52. DOI: 10.1145/2124295.2124303. Online publication date: 8-Feb-2012.
    • (2011) A simple word trigger method for social tag suggestion. Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1577-1588. DOI: 10.5555/2145432.2145601. Online publication date: 27-Jul-2011.
