research-article

Multi-Modal Financial Time-Series Retrieval Through Latent Space Projections

Authors:

Andrea Coletta,

Elizabeth Fons,

Sriram Gopalakrishnan,

Svitlana Vyetrenko,

Manuela Veloso

ICAIF '23: Proceedings of the Fourth ACM International Conference on AI in Finance

Pages 498 - 506

https://doi.org/10.1145/3604237.3626901

Published: 25 November 2023

Abstract

Financial firms commonly process and store billions of time series, generated continuously and at high frequency. To support efficient storage and retrieval, specialized time-series databases and systems have emerged. These databases support indexing and querying of time series through a constrained, Structured Query Language (SQL)-like format, enabling queries such as "Stocks with monthly price returns greater than 5%" that must be expressed in rigid formats. However, such queries do not capture the intrinsic complexity of high-dimensional time-series data, which can often be better described by images or language (e.g., "A stock in low volatility regime"). Moreover, the storage, computation time, and retrieval complexity required to search in the time-series space are often non-trivial. In this paper, we propose and demonstrate a framework that uses deep encoders to store multi-modal data for financial time series in a lower-dimensional latent space, such that the latent-space projections capture not only the time-series trends but also other desirable properties of the financial data (such as price volatility). Moreover, our approach supports user-friendly queries expressed as natural-language text or as sketches of time series, for which we have developed intuitive interfaces. We demonstrate the advantages of our method in terms of computational efficiency and accuracy on real historical data as well as synthetic data, and highlight the utility of latent-space projections in the storage and retrieval of financial time-series data with intuitive query modalities.
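To make the storage-and-retrieval idea concrete, here is a minimal sketch under explicit assumptions. The paper's learned deep encoders are replaced by a hypothetical hand-crafted projection, `paa_encode` (z-normalization followed by Piecewise Aggregate Approximation). The query modality is a sketch given as a list of y-values, and the data are synthetic random walks from a hypothetical `random_walk` helper. `LatentIndex` is likewise an illustrative name, not the authors' code. The sketch shows only shape matching in a low-dimensional space. It does not model the text modality or properties such as volatility, which the paper's encoders are trained to capture.

```python
import math
import random


def paa_encode(series, n_segments=8):
    """Hypothetical stand-in for a deep encoder: z-normalize, then PAA.

    Z-normalization makes similarity depend on shape rather than price level;
    PAA averages the series over n_segments equal-width windows.
    """
    if len(series) < n_segments:
        raise ValueError("series is shorter than the number of segments")
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series)) or 1.0
    z = [(x - mean) / std for x in series]
    width = len(z) / n_segments
    vec = []
    for i in range(n_segments):
        lo, hi = int(i * width), int((i + 1) * width)
        vec.append(sum(z[lo:hi]) / (hi - lo))
    return vec


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class LatentIndex:
    """Stores only the low-dimensional projections, not the raw series."""

    def __init__(self, n_segments=8):
        self.n_segments = n_segments
        self.ids, self.vectors = [], []

    def add(self, series_id, series):
        # Encode once at insertion time.
        self.ids.append(series_id)
        self.vectors.append(paa_encode(series, self.n_segments))

    def query(self, sketch, k=3):
        """Brute-force k-nearest-neighbour search for a sketched query."""
        q = paa_encode(sketch, self.n_segments)
        ranked = sorted(zip(self.ids, self.vectors),
                        key=lambda iv: euclidean(q, iv[1]))
        return [series_id for series_id, _ in ranked[:k]]


def random_walk(drift, vol, n=256, start=100.0, seed=0):
    """Synthetic geometric random walk (illustrative data, not from the paper)."""
    rng = random.Random(seed)
    prices = [start]
    for _ in range(n - 1):
        prices.append(prices[-1] * math.exp(drift + vol * rng.gauss(0.0, 1.0)))
    return prices


index = LatentIndex()
for seed in range(5):
    index.add(f"up-{seed}", random_walk(0.002, 0.005, seed=seed))
    index.add(f"down-{seed}", random_walk(-0.002, 0.005, seed=100 + seed))

rising_sketch = [1, 2, 3, 4, 5, 6, 7, 8]  # a user-drawn upward line
print(index.query(rising_sketch, k=3))
```

With this synthetic data, the rising sketch should retrieve only `up-*` series. Each 256-point series is encoded once when it is stored, so a query costs one encoding plus distance computations over 8-dimensional vectors. At scale, the brute-force scan would typically be replaced by an approximate nearest-neighbour index such as FAISS.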

Cited By

Lu S, Su Y, Zhang X, Chai J, Yu L. (2025). LLM-infused bi-level semantic enhancement for corporate credit risk prediction. Information Processing & Management 62, 4 (104091). Online publication date: Jul-2025. https://doi.org/10.1016/j.ipm.2025.104091
Lima M, Lichtnow D. (2024). Avaliando a Performance de SGBDs na Inserção e Consulta de Dados de Séries Temporais [Evaluating the Performance of DBMSs in Inserting and Querying Time-Series Data]. Anais da XIX Escola Regional de Banco de Dados (ERBD 2024) [Proceedings of the XIX Regional Database School], 170–173. Online publication date: 18-May-2024. https://doi.org/10.5753/erbd.2024.238695
Kim M, Kim S, Lee S, Kwon Y. (2024). Anomaly Detection Scheme Using Global and Local Features in Time Series Data. 2024 15th International Conference on Information and Communication Technology Convergence (ICTC), 1997–1998. Online publication date: 16-Oct-2024. https://doi.org/10.1109/ICTC62082.2024.10827487

Index Terms

Multi-Modal Financial Time-Series Retrieval Through Latent Space Projections
1. Information systems
  1. Information retrieval
  2. Information systems applications
2. Theory of computation
  1. Theory and algorithms for application domains

Index terms have been assigned to the content through auto-classification.


Information

Published In

ICAIF '23: Proceedings of the Fourth ACM International Conference on AI in Finance

November 2023

697 pages

ISBN:9798400702402

DOI:10.1145/3604237

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 November 2023


Qualifiers

Research-article
Research
Refereed limited

Conference

ICAIF '23

ICAIF '23: 4th ACM International Conference on AI in Finance

November 27 - 29, 2023

Brooklyn, NY, USA

Article Metrics

Total Citations: 3
Total Downloads: 258
Downloads (last 12 months): 147
Downloads (last 6 weeks): 15

Reflects downloads up to 18 Feb 2025
