DOI: 10.1109/DBKDA.2010.34
Article

Adaptation of Apriori to MapReduce to Build a Warehouse of Relations between Named Entities across the Web

Published: 11 April 2010

Abstract

The Semantic Web has made it possible to use the Internet to extract useful content, a task that can require an infrastructure spanning the Web. With Hadoop, a free implementation of the MapReduce programming paradigm created by Google, we can process these data reliably across hundreds of servers. This article describes how the Apriori algorithm was adapted to MapReduce to search for relations between named entities in the thousands of Web pages that arrive daily through RSS feeds. First, every feed is polled five times per day, and each entry is stored in a database using MapReduce. Second, the entries are read and their content is sent to the OpenCalais Web service to detect named entities; for each Web page, the set of all itemsets found is generated and stored in the database. Third, all generated sets, from first to last, are counted and their support is recorded. Finally, various analytical tasks are executed to present the relationships found. Our tests show that the third step, executed over 3,000,000 sets, was 4.5 times faster on five servers than on a single machine. This approach lets us distribute processing easily and automatically across as many machines as are available, and process datasets that a single server, even a very powerful one, could not manage alone. We believe this work is a step forward in processing Semantic Web data efficiently and effectively.
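
A minimal, single-process sketch of the second and third steps, assuming each page has already been reduced to a list of entity names (the OpenCalais call is not reproduced). It follows the abstract's description: every itemset is generated per page and support is then counted with a map/shuffle/reduce pass, rather than classic Apriori's level-wise candidate pruning. All names (`page_itemsets`, `map_phase`, `shuffle`, `reduce_phase`, `confidence`), the `max_size` cap, and the sample pages are hypothetical. The rule-confidence helper is only one example of an "analytical task" and is not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Iterator, List, Tuple

Itemset = FrozenSet[str]


def page_itemsets(entities: Iterable[str], max_size: int = 3) -> Iterator[Itemset]:
    """Yield every non-empty itemset of distinct entities found on one page.

    max_size is a hypothetical cap, not from the paper: a page with n
    entities has 2**n - 1 non-empty subsets, so some bound is usually needed.
    """
    items = sorted(set(entities))
    for k in range(1, min(max_size, len(items)) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def map_phase(pages: Dict[str, List[str]], max_size: int = 3) -> Iterator[Tuple[Itemset, int]]:
    # Mapper: for each page, emit (itemset, 1) once per itemset it contains.
    for _page_id, entities in pages.items():
        for itemset in page_itemsets(entities, max_size):
            yield itemset, 1


def shuffle(pairs: Iterable[Tuple[Itemset, int]]) -> Dict[Itemset, List[int]]:
    # Stand-in for Hadoop's shuffle/sort: group emitted values by key.
    groups: Dict[Itemset, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[Itemset, List[int]]) -> Dict[Itemset, int]:
    # Reducer: the support count of an itemset is the sum of its emitted 1s.
    return {key: sum(values) for key, values in groups.items()}


def confidence(support: Dict[Itemset, int], lhs: Itemset, rhs: Itemset) -> float:
    # Confidence of the rule lhs -> rhs: support(lhs | rhs) / support(lhs).
    return support[lhs | rhs] / support[lhs]


# Made-up sample data: page id -> named entities detected on that page.
pages = {
    "p1": ["Google", "Hadoop", "Yahoo"],
    "p2": ["Google", "Hadoop"],
    "p3": ["Google", "OpenCalais"],
}
support = reduce_phase(shuffle(map_phase(pages)))
print(support[frozenset({"Google", "Hadoop"})])  # 2
print(confidence(support, frozenset({"Hadoop"}), frozenset({"Google"})))  # 1.0
```

On Hadoop, the mapper and reducer would run as distributed tasks and the framework would perform the shuffle itself. Because summing counts is associative, the reducer logic could also serve as a combiner to reduce network traffic.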

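The reported 4.5-fold gain on five servers can be expressed with the standard definitions of speedup and parallel efficiency. The symbols $T_1$, $T_p$, $S_p$ and $E_p$ are introduced here for illustration; the abstract reports only the ratio:

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}, \qquad
S_5 = 4.5 \;\Rightarrow\; E_5 = \frac{4.5}{5} = 0.9
```

In other words, the third step reached roughly 90% of ideal linear scaling on five machines, with the run time falling to about $T_1 / 4.5 \approx 0.22\,T_1$.
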
Published In

DBKDA '10: Proceedings of the 2010 Second International Conference on Advances in Databases, Knowledge, and Data Applications
April 2010
254 pages
ISBN: 9780769539812

Publisher

IEEE Computer Society
United States

Author Tags

1. Apriori algorithm
2. MapReduce paradigm
3. association rules
4. web mining

Cited By

• (2018) How to exploit high performance computing in population-based metaheuristics for solving association rule mining problem. Distributed and Parallel Databases 36:2, 369-397. DOI: 10.1007/s10619-018-7218-4. Online publication date: 1-Jun-2018.
• (2015) Parallel Eclat for Opportunistic Mining of Frequent Itemsets. Proceedings, Part I, of the 26th International Conference on Database and Expert Systems Applications, Volume 9261, 401-415. DOI: 10.1007/978-3-319-22849-5_27. Online publication date: 1-Sep-2015.
• (2012) PARMA. Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 85-94. DOI: 10.1145/2396761.2396776. Online publication date: 29-Oct-2012.
• (2012) Semantic metadata in the news production process. Proceeding of the 16th International Academic MindTrek Conference, 125-133. DOI: 10.1145/2393132.2393158. Online publication date: 3-Oct-2012.
• (2012) Integrating linked data into the content value chain. Proceedings of the 8th International Conference on Semantic Systems, 94-102. DOI: 10.1145/2362499.2362513. Online publication date: 5-Sep-2012.

    View Options

    View options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media