DOI: 10.1145/3604237.3626901
research-article

Multi-Modal Financial Time-Series Retrieval Through Latent Space Projections

Published: 25 November 2023

Abstract

Financial firms commonly process and store billions of time-series data points, generated continuously and at high frequency. To support efficient data storage and retrieval, specialized time-series databases and systems have emerged. These databases support indexing and querying of time-series through a constrained Structured Query Language (SQL)-like format, enabling queries such as "Stocks with monthly price returns greater than 5%" that must be expressed in rigid formats. However, such queries do not capture the intrinsic complexity of high-dimensional time-series data, which can often be better described by images or language (e.g., "A stock in low volatility regime"). Moreover, the storage, computational time, and retrieval complexity required to search in the time-series space are often non-trivial. In this paper, we propose and demonstrate a framework to store multi-modal data for financial time-series in a lower-dimensional latent space using deep encoders, such that the latent-space projections capture not only the time-series trends but also other desirable properties of the financial time-series data (such as price volatility). Our approach also supports user-friendly query modalities, namely natural language text and sketches of time-series, for which we have developed intuitive interfaces. We demonstrate the advantages of our method in terms of computational efficiency and accuracy on real historical data as well as synthetic data, and highlight the utility of latent-space projections in the storage and retrieval of financial time-series data with intuitive query modalities.
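
The retrieval idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic returns, the fixed random linear map standing in for a trained deep encoder, the cosine-similarity search, and all names (`encode`, `retrieve`, `projection`, `latent_dim`) are assumptions made for illustration. The sketch only shows the mechanics of storing each series as a short latent vector and answering a query by nearest-neighbour search in that space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 200 windows of 128 daily returns each.
n_series, length, latent_dim = 200, 128, 16
returns = rng.normal(0.0, 0.01, size=(n_series, length))

# Placeholder "encoder" (assumption): a fixed random linear map. The paper
# describes trained deep encoders; this only mimics the shape of the pipeline.
projection = rng.normal(size=(length, latent_dim)) / np.sqrt(latent_dim)


def encode(x):
    """Project series of shape (..., length) to unit-norm latent vectors."""
    z = np.asarray(x) @ projection
    return z / np.linalg.norm(z, axis=-1, keepdims=True)


def retrieve(query, index, k=5):
    """Return the indices of the k stored embeddings most similar to the query."""
    scores = index @ encode(query)  # cosine similarity against every stored vector
    return np.argsort(-scores)[:k]


# Built once: each series is stored as 16 floats instead of 128.
index = encode(returns)

# Query with a slightly perturbed copy of stored series 42.
query = returns[42] + rng.normal(0.0, 1e-4, size=length)
top = retrieve(query, index, k=5)
print(top)
```

In a multi-modal setting, a text or sketch query would be mapped into the same latent space by its own encoder before the same nearest-neighbour step; that part is omitted here because it depends on trained models.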

Cited By

  • (2025) LLM-infused bi-level semantic enhancement for corporate credit risk prediction. Information Processing & Management 62:4 (104091). DOI: 10.1016/j.ipm.2025.104091. Online publication date: Jul-2025.
  • (2024) Avaliando a Performance de SGBDs na Inserção e Consulta de Dados de Séries Temporais. Anais da XIX Escola Regional de Banco de Dados (ERBD 2024), 170-173. DOI: 10.5753/erbd.2024.238695. Online publication date: 18-May-2024.
  • (2024) Anomaly Detection Scheme Using Global and Local Features in Time Series Data. 2024 15th International Conference on Information and Communication Technology Convergence (ICTC), 1997-1998. DOI: 10.1109/ICTC62082.2024.10827487. Online publication date: 16-Oct-2024.

Published In

ICAIF '23: Proceedings of the Fourth ACM International Conference on AI in Finance
November 2023
697 pages
ISBN: 9798400702402
DOI: 10.1145/3604237

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. Time-series
2. datasets
3. neural networks
4. text tagging

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

ICAIF '23
