research-article

GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example

Published: 20 June 2023
    Abstract

    Data scientists deal with a wide variety of file formats and data representations. Probably the most difficult to handle are custom data formats that liberally define their own flat or nested structure with multiple custom delimiters, multi-line records, or undocumented semantics of attribute sequences, co-appearances, and repetitions. As a prerequisite for exploratory ML model training, data scientists need to map these data representations into regular frames or matrices. Unfortunately, existing tools and frameworks provide only limited support for this process, which causes redundant manual effort and unnecessary data quality issues. In this paper, we initiate work on automatic matrix and frame reader generation by example. A user provides a sample of raw text data and its mapped matrix or frame representation. Our GIO framework first identifies the mapping rules from raw to structured data, and then generates the source code of an efficient, multi-threaded reader for full raw datasets of this format. To facilitate manual improvements, both the mapping rules and the generated reader can be modified as needed. Our experiments show that GIO correctly identifies the mapping rules for basic text formats like CSV, LibSVM, and MatrixMarket; custom text formats from publishing, automotive, and health care; and various nested formats such as JSON and XML. Additionally, the automatically generated readers achieve performance competitive with hand-coded readers and tuned libraries like RapidJSON.
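    To make the "by example" idea concrete, here is a deliberately simplified Python sketch of the two steps the abstract names: identify mapping rules from a raw sample plus its expected frame, then read full data with the rules using row-partitioned parallelism. It is not GIO's algorithm, and unlike GIO it interprets the rules directly instead of generating reader source code. Assumptions: one record per line; frame cells are given as the exact strings that appear in the raw text; a value's first occurrence in its line is the right one; and every column is preceded by a consistent "key" (the longest common suffix of the text before it). `ColumnRule`, `infer_rules`, `parse_line`, and `read_parallel` are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRule:
    """Hypothetical mapping rule locating one column's value inside a raw line."""
    key: str                # text immediately preceding the value ("" = start of line)
    occurrence: int         # value follows the n-th occurrence of `key` (0 = line start)
    terminators: frozenset  # characters ending the value (empty = read to end of line)


def _common_suffix(strings):
    # Longest common suffix = reverse of the longest common prefix of reversed strings.
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def _find_all(s, sub):
    # Start indices of all (possibly overlapping) occurrences of a non-empty `sub`.
    hits, i = [], s.find(sub)
    while i >= 0:
        hits.append(i)
        i = s.find(sub, i + 1)
    return hits


def infer_rules(sample_lines, sample_frame):
    """Infer one ColumnRule per column from raw sample lines and their expected cells."""
    if not sample_frame or len(sample_lines) != len(sample_frame):
        raise ValueError("need one frame row per sample line")
    rules = []
    for j in range(len(sample_frame[0])):
        prefixes, followers = [], set()
        for line, row in zip(sample_lines, sample_frame):
            pos = line.find(row[j])  # simplification: the first occurrence is the right one
            if pos < 0:
                raise ValueError(f"cell {row[j]!r} not found in {line!r}")
            prefixes.append(line[:pos])
            end = pos + len(row[j])
            if end < len(line):
                followers.add(line[end])  # character that closed the value
        key = _common_suffix(prefixes)
        if key == "":
            if any(prefixes):
                raise ValueError(f"no consistent key for column {j}")
            occurrence = 0  # value always starts the line
        else:
            counts = {len(_find_all(p, key)) for p in prefixes}
            if len(counts) != 1:
                raise ValueError(f"key {key!r} is ambiguous for column {j}")
            occurrence = counts.pop()
        rules.append(ColumnRule(key, occurrence, frozenset(followers)))
    return rules


def parse_line(line, rules):
    """Apply the inferred rules to one raw line, returning its cells as strings."""
    row = []
    for r in rules:
        if r.occurrence == 0:
            start = 0
        else:
            hits = _find_all(line, r.key)
            if len(hits) < r.occurrence:
                raise ValueError(f"key {r.key!r} missing in {line!r}")
            start = hits[r.occurrence - 1] + len(r.key)
        end = start
        while end < len(line) and line[end] not in r.terminators:
            end += 1
        row.append(line[start:end])
    return row


def read_parallel(lines, rules, workers=4):
    """Row-partitioned reading: each worker parses one contiguous block of lines."""
    size = max(1, -(-len(lines) // workers))  # ceiling division
    blocks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so row order is preserved.
        parts = list(pool.map(lambda block: [parse_line(ln, rules) for ln in block], blocks))
    return [row for part in parts for row in part]


rules = infer_rules(["id=7;temp=21.5;hum=40", "id=12;temp=19.0;hum=55"],
                    [["7", "21.5", "40"], ["12", "19.0", "55"]])
print(read_parallel(["id=3;temp=18.25;hum=61", "id=100;temp=-2.0;hum=9"], rules))
# → [['3', '18.25', '61'], ['100', '-2.0', '9']]
```

    The same code infers positional CSV rules (key "," with occurrence 1 or 2) from a sample such as `["1,2,3", "4,5,6"]`. Two caveats apply to this sketch: the rules generalize only as far as the sample varies (from a single sample line, each key would absorb that line's own values), and CPython threads share the GIL, so the thread pool shows the row-block structure of a multi-threaded reader rather than its speedup.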

    Supplemental Material

    MP4 File
    PDF File
    Read me
    ZIP File
    Source Code

    Cited By

    • (2024) Effective Entry-Wise Flow for Molecule Generation. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 207-220. DOI: 10.1109/ICDE60146.2024.00023. Online publication date: 13 May 2024.

    Information

    Published In

    Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2
    June 2023
    2310 pages
    EISSN: 2836-6573
    DOI: 10.1145/3605748
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 June 2023
    Published in PACMMOD Volume 1, Issue 2

    Author Tags

    1. custom data format
    2. data loading
    3. efficient readers
    4. raw data

    Qualifiers

    • Research-article

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 112
    • Downloads (last 6 weeks): 5
    Reflects downloads up to
