DOI: 10.1145/3183713.3199515
keynote

Machine Learning for Data Management: Problems and Solutions

Published: 27 May 2018

Abstract

Machine learning has made great strides in recent years, and its applications are spreading rapidly. Unfortunately, the standard machine learning formulation does not match data management problems well. For example, most learning algorithms assume that the data is contained in a single table and consists of i.i.d. (independent and identically distributed) samples. This mismatch leads to a proliferation of ad hoc solutions, slow development, and suboptimal results. Fortunately, a body of machine learning theory and practice is being developed that dispenses with such assumptions and promises to make machine learning for data management much easier and more effective [1]. In particular, representations like Markov logic, which includes many types of deep networks as special cases, allow us to define very rich probability distributions over non-i.i.d., multi-relational data [2]. Despite the generality of these models, learning their parameters is still a convex optimization problem, which allows for efficient solution. Learning structure (in the case of Markov logic, a set of formulas in first-order logic) is intractable, as in more traditional representations, but can be done effectively using inductive logic programming techniques. Inference is performed using probabilistic generalizations of theorem proving, and takes linear time and space in tractable Markov logic, an object-oriented specialization of Markov logic [3]. These techniques have led to state-of-the-art, principled solutions to problems like entity resolution, schema matching, ontology alignment, and information extraction. Using tractable Markov logic, we have extracted from the Web a probabilistic knowledge base with millions of objects and billions of parameters, which can be queried exactly in subsecond times using an RDBMS backend [4]. With these foundations in place, we expect the pace of machine learning applications in data management to continue to accelerate in the coming years.
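To make the idea of a probability distribution over multi-relational data concrete, the following is a minimal sketch of a Markov logic network in its standard form, where a possible world x has probability proportional to exp(sum_i w_i * n_i(x)), with w_i the weight of formula i and n_i(x) the number of its true groundings in x. It is not the keynote's system: the two-person domain, the Smokes and Cancer predicates, the single formula, and the weight 1.5 are all illustrative assumptions, and inference is done by brute-force enumeration of worlds, which is exponential in the number of ground atoms and is not the linear-time inference of tractable Markov logic.

```python
import itertools
import math

# Hypothetical domain and predicates, chosen only for illustration.
PEOPLE = ["anna", "bob"]
ATOMS = [(pred, p) for pred in ("Smokes", "Cancer") for p in PEOPLE]


def count_smoking_implies_cancer(world):
    """Count true groundings of the formula Smokes(p) => Cancer(p)."""
    return sum(
        1
        for p in PEOPLE
        if (not world[("Smokes", p)]) or world[("Cancer", p)]
    )


# Each entry pairs a weight with a function that counts true groundings.
# The weight 1.5 is an assumed value, not a learned one.
FORMULAS = [(1.5, count_smoking_implies_cancer)]


def log_score(world):
    """Unnormalized log-probability: sum of weight * true-grounding count."""
    return sum(w * n(world) for w, n in FORMULAS)


def all_worlds():
    """Enumerate every truth assignment to the ground atoms."""
    for values in itertools.product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))


def partition_function():
    """Normalizing constant Z, summed over all possible worlds."""
    return sum(math.exp(log_score(w)) for w in all_worlds())


def probability(world):
    """Probability of one fully specified world."""
    return math.exp(log_score(world)) / partition_function()


def conditional(query, evidence):
    """P(query atom is true | evidence), by brute-force enumeration."""
    numerator = 0.0
    denominator = 0.0
    for w in all_worlds():
        if all(w[atom] == value for atom, value in evidence.items()):
            score = math.exp(log_score(w))
            denominator += score
            if w[query]:
                numerator += score
    return numerator / denominator


if __name__ == "__main__":
    p = conditional(("Cancer", "anna"), {("Smokes", "anna"): True})
    print(f"P(Cancer(anna) | Smokes(anna)) = {p:.4f}")
```

In this sketch the formula is soft: a world in which Anna smokes but does not have cancer is less probable rather than impossible, and raising the weight makes the rule behave more like a hard logical constraint.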

References

[1] I. Goodfellow, Y. Bengio & A. Courville, Deep Learning, MIT Press, 2016.
[2] L. Getoor & B. Taskar (eds.), Introduction to Statistical Relational Learning, MIT Press, 2007.
[3] P. Domingos & D. Lowd, Markov Logic: An Interface Layer for Artificial Intelligence, Morgan & Claypool, 2009.
[4] M. Niepert & P. Domingos, "Learning and inference in tractable probabilistic knowledge bases," in Proc. 31st Conf. on Uncertainty in AI, 2015.

Published In

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN: 9781450347037
DOI: 10.1145/3183713

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. graphical models
2. non-i.i.d. data
3. probabilistic databases
4. probabilistic theorem proving

Qualifiers

• Keynote

Conference

SIGMOD/PODS '18

Acceptance Rates

SIGMOD '18 Paper Acceptance Rate: 90 of 461 submissions, 20%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%

Cited By

• (2024) RocolSys: An Automatic Row-Column Data Storage System for HTAP. Web and Big Data (368-372). DOI: 10.1007/978-981-97-7244-5_27. Online publication date: 28-Aug-2024
• (2023) Towards an Integrated Rough Set and Data Modelling Framework for Data Management and Knowledge Extraction. Artificial Intelligence and Smart Environment (800-805). DOI: 10.1007/978-3-031-26254-8_116. Online publication date: 8-Mar-2023
• (2023) Precision Medicine and Telemedicine. Springer Handbook of Automation (1249-1263). DOI: 10.1007/978-3-030-96729-1_58. Online publication date: 17-Jun-2023
• (2022) An Efficient Algorithm for Mapping Deep Learning Applications on the NoC Architecture. Applied Sciences 12:6 (3163). DOI: 10.3390/app12063163. Online publication date: 20-Mar-2022
• (2021) Scalable and Usable Relational Learning With Automatic Language Bias. Proceedings of the 2021 International Conference on Management of Data (1440-1451). DOI: 10.1145/3448016.3457275. Online publication date: 9-Jun-2021
• (2020) April. Proceedings of the 29th ACM International Conference on Information & Knowledge Management (3465-3468). DOI: 10.1145/3340531.3417422. Online publication date: 19-Oct-2020
• (2020) Learning Over Dirty Data Without Cleaning. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (1301-1316). DOI: 10.1145/3318464.3389708. Online publication date: 11-Jun-2020
• (2019) Data Management Model for Internet of Everything. Mobile Web and Intelligent Information Systems (331-341). DOI: 10.1007/978-3-030-27192-3_26. Online publication date: 26-Aug-2019
