Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Indexing the Pickup and Drop-Off Locations of NYC Taxi Trips in PostgreSQL – Lessons from the Road

  • Conference paper
  • First Online:
Advances in Spatial and Temporal Databases (SSTD 2017)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10411))

Included in the following conference series:

Abstract

In this paper, we present our experience in indexing the drop-off and pick-up locations of taxi trips in New York City. The paper presents a comprehensive experimental analysis of classic and state-of-the-art spatial database indexing schemes. The paper evaluates a popular spatial tree indexing scheme (i.e., GIST-Spatial), a Block Range Index (BRIN-Spatial) provided by PostgreSQL as well as a new indexing scheme, namely Hippo-Spatial. In the experiments, the paper considers five evaluation metrics to compare and contrast the performance of the three indexing schemes: storage overhead, index initialization time, query response time, maintenance overhead, and throughput. Furthermore, the benchmark takes into account parameters that affect the index performance, which include but is not limited to: data size, spatial query selectivity, and spatial area density, The paper finally analyzes the experimental evaluation results and highlights the key insights and lessons learned. The results emphasize the fact that there is no one size that fits all when it comes to indexing massive-scale spatial data. The results also prove that modern database systems can maintain a lightweight index (in terms of storage and maintenance overhead) that is also fast enough for spatial data analytics applications. The source code for the experiments presented in the paper is available here: https://github.com/DataSystemsLab/hippo-postgresql.

This work is supported by the National Science Foundation Grant 1654861.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    Source code: https://github.com/DataSystemsLab/hippo-postgresql.

References

  1. New york city taxi and limousine commission. http://www.nyc.gov/html/tlc/html/about/trip_record_data.html

  2. Aoki, P.M.: Generalizing “search” in generalized search trees. In: Proceedings of the 14th International Conference on Data Engineering, pp. 380–389. IEEE (1998)

    Google Scholar 

  3. Block range index. https://www.postgresql.org/docs/9.6/static/brin.html

  4. Postgis - spatial and geographic objects for postgresql. http://postgis.net

  5. Postgresql: a powerful, open source object-relational database system. https://www.postgresql.org/

  6. Bontempo, C., Zagelow, G.: The IBM data warehouse architecture. Commun. ACM 41(9), 38–48 (1998)

    Article  Google Scholar 

  7. Earth science data. https://earthdata.nasa.gov

  8. Comer, D.: Ubiquitous b-tree. ACM Comput. Surv. CSUR 11(2), 121–137 (1979)

    Article  MathSciNet  MATH  Google Scholar 

  9. Corral, A., Vassilakopoulos, M., Manolopoulos, Y.: Algorithms for joining R-trees and linear region quadtrees. In: Güting, R.H., Papadias, D., Lochovsky, F. (eds.) SSD 1999. LNCS, vol. 1651, pp. 251–269. Springer, Heidelberg (1999). doi:10.1007/3-540-48482-5_16

    Chapter  Google Scholar 

  10. Finkel, R.A., Bentley, J.L.: Quad trees: a data structure for retrieval of composite keys. Acta Inf. 4(1), 1–9 (1974)

    Article  MATH  Google Scholar 

  11. Fusco, F., Stoecklin, M.P., Vlachos, M.: Net-fli: on-the-fly compression, archiving and indexing of streaming network traffic. VLDB J. 3(1–2), 1382–1393 (2010)

    Google Scholar 

  12. Goldstein, J., Ramakrishnan, R., Shaft, U.: Compressing relations and indexes. In: Proceedings of the International Conference on Data Engineering, ICDE, pp. 370–379. IEEE (1998)

    Google Scholar 

  13. Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD, pp. 47–57. ACM (1984)

    Google Scholar 

  14. Hellerstein, J.M.: Generalized search tree. In: Liu, L., Tamer Özsu, M. (eds.) Encyclopedia of Database Systems, pp. 1222–1224. Springer, US (2009)

    Google Scholar 

  15. Hellerstein, J.M., Naughton, J.F., Pfeffer, A.: Generalized search trees for database systems, September 1995

    Google Scholar 

  16. Kamel, I., Faloutsos, C.: Hilbert R-tree: an improved R-tree using fractals. In: Proceedings of the International Conference on Very Large Data Bases, VLDB, September 1994

    Google Scholar 

  17. Kamel, I., Khalil, M., Kouramajian, V.: Bulk insertion in dynamic R-trees. In: Proceedings of the International Symposium on Spatial Data Handling, SDH, pp. 31–42 (1996)

    Google Scholar 

  18. Kornacker, M., Mohan, C., Hellerstein, J.M.: Concurrency and recovery in generalized search trees. ACM SIGMOD Rec. 26, 62–72 (1997). ACM

    Article  Google Scholar 

  19. Lee, M.-L., Hsu, W., Jensen, C.S., Cui, B., Teo, K.L.: Supporting frequent updates in R-trees: a bottom-up approach. In: Proceedings of the International Conference on Very Large Data Bases, VLDB, pp. 608–619, September 2003

    Google Scholar 

  20. Samet, H., Webber, R.E.: Storing a collection of polygons using quadtrees. ACM Trans. Graph. TOG 4(3), 182–222 (1985)

    Article  Google Scholar 

  21. Sidirourgos, L., Kersten, M.L.: Column imprints: a secondary index structure. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD, pp. 893–904. ACM (2013)

    Google Scholar 

  22. Ślezak, D., Eastwood, V.: Data warehouse technology by infobright. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD, pp. 841–846. ACM (2009)

    Google Scholar 

  23. Stonebraker, M., Rowe, L.A.: The design of postgres. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD, pp. 340–355. ACM (1986)

    Google Scholar 

  24. Tayeb, J., Ulusoy, Ö., Wolfson, O.: A quadtree-based dynamic attribute indexing method. Comput. J. 41(3), 185–200 (1998)

    Article  MATH  Google Scholar 

  25. Weiss, R.: A technical overview of the oracle exadata database machine and exadata storage server. Oracle White Paper. Oracle Corporation, Redwood Shores (2012)

    Google Scholar 

  26. Xu, X., Han, J., Lu, W.: RT-tree: an improved R-tree indexing structure for temporal spatial databases. In: Proceeding of the International Symposium on Spatial Data Handling, SSDH, pp. 1040–1049, July 1990

    Google Scholar 

  27. Yu, J., Moraffah, R., Sarwat, M.: Hippo in action: scalable indexing of a billion New York city taxi trips and beyond. In: Proceedings of the International Conference on Data Engineering, ICDE. IEEE (2017)

    Google Scholar 

  28. Jia, Y., Sarwat, M.: Two birds, one stone: a fast, yet lightweight, indexing scheme for modern database systems. Proc. VLDB Endowment 10(4), 385–396 (2016)

    Article  Google Scholar 

  29. Zukowski, M., Heman, S., Nes, N., Boncz, P.: Super-scalar RAM-CPU cache compression. In: Proceedings of the International Conference on Data Engineering, ICDE, pp. 59–59. IEEE (2006)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jia Yu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Yu, J., Sarwat, M. (2017). Indexing the Pickup and Drop-Off Locations of NYC Taxi Trips in PostgreSQL – Lessons from the Road. In: Gertz, M., et al. Advances in Spatial and Temporal Databases. SSTD 2017. Lecture Notes in Computer Science(), vol 10411. Springer, Cham. https://doi.org/10.1007/978-3-319-64367-0_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-64367-0_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64366-3

  • Online ISBN: 978-3-319-64367-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics