DOI: 10.1145/2320765.2320812
research-article

Open business intelligence: on the importance of data quality awareness in user-friendly data mining

Published: 30 March 2012

    Abstract

    Citizens demand more and more data for making decisions in their daily lives. Mechanisms that allow citizens to understand and analyze linked open data (LOD) in a user-friendly manner are therefore much needed. To this end, this position paper introduces the concept of Open Business Intelligence (OpenBI). OpenBI enables non-expert users to (i) analyze and visualize LOD, thus generating actionable information by means of reporting, OLAP analysis, dashboards, or data mining; and (ii) share the newly acquired information as LOD to be reused by anyone. One of the most challenging issues of OpenBI relates to data mining, since non-experts (such as citizens) need guidance during preprocessing and the application of mining algorithms, owing to the complexity of the mining process and the low quality of the data sources. This is even worse when dealing with LOD, not only because of the different kinds of links among data, but also because of its high dimensionality. As a consequence, in this position paper we advocate that data mining for OpenBI requires data quality-aware mechanisms for guiding non-expert users in obtaining and sharing the most reliable knowledge from the available LOD.

      Published In

      EDBT-ICDT '12: Proceedings of the 2012 Joint EDBT/ICDT Workshops
      March 2012
      265 pages
      ISBN:9781450311434
      DOI:10.1145/2320765

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Conference

      ICDT '12

      Acceptance Rates

      Overall Acceptance Rate 7 of 10 submissions, 70%

      Article Metrics

      • Downloads (Last 12 months): 17
      • Downloads (Last 6 weeks): 1

      Cited By

      • (2024) Open Data Strategy in Competitive Intelligence: Analyzing the Scientific Trends. Proceedings of the 17th International Conference on Industrial Engineering and Industrial Management (ICIEIM) – XXVII Congreso de Ingeniería de Organización (CIO2023), pages 207-213. DOI: 10.1007/978-3-031-57996-7_36. Online publication date: 26-Apr-2024
      • (2022) Multi-Object Tracking Using Gradient-Based Learning Model in Video Surveillance. International Journal of Software Innovation, 9:4, pages 1-17. DOI: 10.4018/IJSI.289168. Online publication date: 29-Jun-2022
      • (2022) Big Data and Predictive Analytics for Business Intelligence: A Bibliographic Study (2000–2021). Forecasting, 4:4, pages 767-786. DOI: 10.3390/forecast4040042. Online publication date: 23-Sep-2022
      • (2022) Understanding open data business models from innovation and knowledge management perspectives. Business Process Management Journal, 28:2, pages 532-554. DOI: 10.1108/BPMJ-06-2021-0373. Online publication date: 14-Mar-2022
      • (2021) Big Data Quality for Data Mining in Business Intelligence Applications. Integration Challenges for Analytics, Business Intelligence, and Data Mining, pages 64-91. DOI: 10.4018/978-1-7998-5781-5.ch004. Online publication date: 2021
      • (2021) An Evidence-Based Approach on Public Health Decisions in Low-Middle Income Countries: Use Case of Senegal at the Verge of COVID-19. 2021 International Conference on Digital Age & Technological Advances for Sustainable Development (ICDATA), pages 193-200. DOI: 10.1109/ICDATA52997.2021.00046. Online publication date: Jun-2021
      • (2020) Towards Semantic ETL for integration of textual scientific documents in a Big Data environment: a theoretical approach. 2020 6th IEEE Congress on Information Science and Technology (CiSt), pages 133-138. DOI: 10.1109/CiSt49399.2021.9357280. Online publication date: 5-Jun-2020
      • (2018) Business Intelligence and the Web. Information Systems Frontiers, 15:3, pages 307-309. DOI: 10.1007/s10796-013-9435-8. Online publication date: 24-Dec-2018
      • (2017) Data Analysis for Software Process Improvement: A Systematic Literature Review. Recent Advances in Information Systems and Technologies, pages 48-59. DOI: 10.1007/978-3-319-56535-4_5. Online publication date: 28-Mar-2017
      • (2015) Using Semantic Web Technologies for Exploratory OLAP: A Survey. IEEE Transactions on Knowledge and Data Engineering, 27:2, pages 571-588. DOI: 10.1109/TKDE.2014.2330822. Online publication date: 1-Feb-2015
