DOI: 10.5555/2032397.2032436

Database-as-a-service for long-tail science

Published: 20 July 2011

Abstract

Database technology remains underused in science, especially in the long tail -- the small labs and individual researchers that collectively produce the majority of scientific output. These researchers increasingly require iterative, ad hoc analysis over ad hoc databases but cannot individually invest in the computational and intellectual infrastructure required for state-of-the-art solutions.
We describe a new "delivery vector" for database technology called SQLShare that emphasizes ad hoc integration, query, sharing, and visualization over pre-defined schemas. To empower non-experts to write complex queries, we synthesize example queries from the data itself and explore limited English hints to augment the process. We integrate collaborative visualization via a web-based service called VizDeck that uses automated visualization techniques with a card game metaphor to allow creation of interactive visual dashboards in seconds with zero programming.
We present data on the initial uptake and usage of the system and report preliminary results from testing out new features with the datasets collected during the initial pilot deployment. We conclude that the SQLShare system and associated services have the potential to increase uptake of relational database technology in the long tail of science.

Published In

SSDBM'11: Proceedings of the 23rd International Conference on Scientific and Statistical Database Management
July 2011
601 pages
ISBN: 9783642223501

Sponsors

  • Paradigm4 Inc.
  • Microsoft Research
  • Gordon and Betty Moore Foundation
  • eScience Institute

Publisher

Springer-Verlag, Berlin, Heidelberg

Cited By

  • (2018) ERMrest. Proceedings of the 30th International Conference on Scientific and Statistical Database Management, pp. 1-12. DOI: 10.1145/3221269.3222333. Online publication date: 9-Jul-2018
  • (2018) Contextual Intelligence for Unified Data Governance. Proceedings of the First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, pp. 1-9. DOI: 10.1145/3211954.3211955. Online publication date: 10-Jun-2018
  • (2016) SQLShare. Proceedings of the 2016 International Conference on Management of Data, pp. 281-293. DOI: 10.1145/2882903.2882957. Online publication date: 26-Jun-2016
  • (2016) Species distribution modeling in the cloud. Concurrency and Computation: Practice & Experience 28(4), pp. 1056-1079. DOI: 10.1002/cpe.3030. Online publication date: 25-Mar-2016
  • (2015) Query from examples. Proceedings of the VLDB Endowment 8(13), pp. 2158-2169. DOI: 10.14778/2831360.2831369. Online publication date: 1-Sep-2015
  • (2015) GEN. Proceedings of the 27th International Conference on Scientific and Statistical Database Management, pp. 1-5. DOI: 10.1145/2791347.2791363. Online publication date: 29-Jun-2015
  • (2015) Towards automated prediction of relationships among scientific datasets. Proceedings of the 27th International Conference on Scientific and Statistical Database Management, pp. 1-5. DOI: 10.1145/2791347.2791358. Online publication date: 29-Jun-2015
  • (2015) A form-based query interface for complex queries. Journal of Visual Languages and Computing 29(C), pp. 15-53. DOI: 10.1016/j.jvlc.2015.03.001. Online publication date: 1-Aug-2015
  • (2014) The database group at the University of Washington. ACM SIGMOD Record 43(1), pp. 39-44. DOI: 10.1145/2627692.2627701. Online publication date: 13-May-2014
  • (2014) Helping scientists reconnect their datasets. Proceedings of the 26th International Conference on Scientific and Statistical Database Management, pp. 1-12. DOI: 10.1145/2618243.2618263. Online publication date: 30-Jun-2014
