DOI: 10.1145/3604237.3626899
research-article
Open access

A Fast Non-Linear Coupled Tensor Completion Algorithm for Financial Data Integration and Imputation

Published: 25 November 2023

Abstract

Missing data imputation is crucial in finance for accurate financial analysis, risk management, investment strategies, and other financial applications. Tensor factorization and completion have recently gained momentum in financial data imputation, primarily because of breakthroughs in applying deep neural networks to nonlinear tensor analysis. However, these approaches are prone to overfitting on sparse tensors that contain only a small number of observations. This paper focuses on learning highly reliable embeddings for the tensor imputation problem and applies orthogonal regularization to tensor factorization, reconstruction, and completion. The proposed neural network architecture for sparse tensors, called “RegTensor”, includes several components: an embedding learning module for each tensor mode, a multilayer perceptron (MLP) that models nonlinear interactions among the embeddings, and a regularization module that mitigates the overfitting caused by a large tensor rank. The algorithm factorizes both single tensors and multiple tensors (coupled tensor factorization) efficiently, without incurring high training and optimization costs. We have applied it in a variety of practical scenarios, including the imputation of bond characteristics and of financial analysts' EPS forecast data. Experimental results show significant performance improvements: 40%-74% better than linear tensor completion models and 2%-52% better than state-of-the-art nonlinear models.
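The components named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' RegTensor implementation. The function names (`init_model`, `predict`, `orth_penalty`, `loss`), the concatenation of embeddings, the two-layer ReLU MLP, the penalty form ||EᵀE − I||²_F, and the weight `lam` are all assumptions chosen for illustration. Only the forward pass and the loss are shown; training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_model(dims, rank, hidden):
    """One embedding table per tensor mode plus a small two-layer MLP.
    (Hypothetical layout; sizes are illustrative.)"""
    return {
        "emb": [rng.normal(scale=0.1, size=(d, rank)) for d in dims],
        "W1": rng.normal(scale=0.1, size=(len(dims) * rank, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.1, size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def predict(model, idx):
    """Predict tensor entries at coordinates idx, an (n, n_modes) int array."""
    # Look up one embedding per mode and concatenate them (an assumption;
    # other ways of combining embeddings are possible).
    feats = np.concatenate(
        [E[idx[:, m]] for m, E in enumerate(model["emb"])], axis=1
    )
    h = np.maximum(feats @ model["W1"] + model["b1"], 0.0)  # ReLU layer
    return (h @ model["W2"] + model["b2"]).ravel()

def orth_penalty(E):
    """Assumed orthogonal regularizer: squared Frobenius norm of E^T E - I."""
    gram = E.T @ E
    return float(np.sum((gram - np.eye(E.shape[1])) ** 2))

def loss(model, idx, values, lam=0.01):
    """Mean squared error on observed entries only, plus the penalty."""
    mse = float(np.mean((predict(model, idx) - values) ** 2))
    reg = sum(orth_penalty(E) for E in model["emb"])
    return mse + lam * reg

# Usage: a 5 x 4 x 3 tensor with two observed entries.
model = init_model(dims=(5, 4, 3), rank=2, hidden=8)
idx = np.array([[0, 1, 2], [4, 3, 0]])
values = np.array([1.0, -0.5])
print(loss(model, idx, values))
```

In this sketch, a coupled setting could be imitated by letting two models reference the same array for a shared mode (for example, a firm mode that appears in both tensors) and summing their losses. Whether the paper couples tensors this way is not stated in the abstract.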

Published In

ICAIF '23: Proceedings of the Fourth ACM International Conference on AI in Finance
November 2023
697 pages
ISBN: 9798400702402
DOI: 10.1145/3604237
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Coupled Tensor Decomposition
  2. FinTech
  3. Non-linear Tensor Factorization
  4. Sparse Tensor Completion

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICAIF '23
