Short paper, CIKM '21 Conference Proceedings
DOI: 10.1145/3459637.3481989

Form 10-Q Itemization

Published: 30 October 2021

Abstract

The quarterly financial statement, or Form 10-Q, is one of the most frequently required filings for US public companies to disclose financial and other important business information. Because of the massive volume of 10-Q filings and the wide variation in their reporting formats, retrieving item-specific information from 10-Q filings that lack a machine-readable hierarchy has been a long-standing challenge. This paper presents a solution for itemizing 10-Q filings that complements a rule-based algorithm with a Convolutional Neural Network (CNN) image classifier. The solution demonstrates a pipeline that can be generalized to rapid data retrieval over large volumes of textual data using only typographic items. The extracted text can serve as unlabeled, content-specific data for training transformer models (e.g., BERT) or as input to various domain-focused natural language processing (NLP) applications.
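The abstract describes pairing a rule-based algorithm with a CNN classifier but does not give the rules themselves. As a rough illustration only, the sketch below assumes the filing has already been converted to plain text and uses hypothetical regular expressions for "PART I/II" and "Item N." headings. A pluggable `fallback` callable stands in for the learned classifier; the paper's CNN classifies images, which this string-based stub does not attempt to reproduce. The names `itemize`, `PART_HEADING`, `ITEM_HEADING`, and `toy_fallback` are assumptions for illustration, not the authors' code.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical heading patterns (assumptions, not the paper's rules).
# Real 10-Q filings vary far more than this, which is the gap a learned
# classifier is meant to cover.
PART_HEADING = re.compile(r"^\s*part\s+(i{1,2})\b", re.IGNORECASE)
ITEM_HEADING = re.compile(r"^\s*item\s+(\d{1,2}[a-z]?)\b", re.IGNORECASE)

# A fallback takes a line the rules could not place and returns an item id
# such as "1A", or None if the line is not a heading.
Fallback = Callable[[str], Optional[str]]


def itemize(text: str, fallback: Optional[Fallback] = None) -> Dict[str, str]:
    """Split plain-text 10-Q content into {"Part I Item 2": body_text, ...}."""
    sections: Dict[str, List[str]] = {}
    part = "I"  # assume Part I until a "PART II" heading appears
    current: Optional[str] = None
    for line in text.splitlines():
        part_match = PART_HEADING.match(line)
        if part_match:
            part = part_match.group(1).upper()
            current = None
            continue
        # Rule-based pass first; consult the fallback only when it fails.
        item_match = ITEM_HEADING.match(line)
        item_id = item_match.group(1).upper() if item_match else None
        if item_id is None and fallback is not None:
            item_id = fallback(line)
        if item_id is not None:
            current = f"Part {part} Item {item_id}"
            sections.setdefault(current, [])
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return {key: "\n".join(body) for key, body in sections.items()}


# Hypothetical toy stand-in for the classifier: recognizes one heading
# that lacks the word "Item".
def toy_fallback(line: str) -> Optional[str]:
    return "1A" if line.strip().upper() == "RISK FACTORS" else None


sample = """PART I - FINANCIAL INFORMATION
Item 1. Financial Statements
Condensed consolidated balance sheets follow.
Item 2. Management's Discussion and Analysis
Revenue increased.
PART II - OTHER INFORMATION
RISK FACTORS
Risks have not changed materially.
"""

result = itemize(sample, fallback=toy_fallback)
for heading, body in result.items():
    print(f"{heading}: {body}")
```

Running the rules first and calling the fallback only on lines they cannot place is one plausible reading of "complementing" a rule-based algorithm with a classifier; the hand-off in the paper may differ. The sketch also hints at why rules alone are brittle: a body sentence that happens to begin with "Item 2" would be misread as a heading, and a heading without an "Item" prefix is invisible unless the fallback recognizes it.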

Supplementary Material

MP4 File (CIKM_Form_10Q_10min.mp4)
Presentation video of the Form 10-Q Itemization paper. It briefly reviews the problem statement, the demo website interface, the pipeline design, and its performance.

Published In

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN: 9781450384469
DOI: 10.1145/3459637
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. earnings reports
  2. financial reports
  3. products and services
  4. regulation
  5. sec filing
  6. text tagging

Qualifiers

  • Short-paper

Conference

CIKM '21

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 37
  • Downloads (last 6 weeks): 7
Reflects downloads up to 30 August 2024.

Cited By
  • (2023) Predicting the Silent Majority on Graphs: Knowledge Transferable Graph Neural Network. Proceedings of the ACM Web Conference 2023, 274-285. DOI: 10.1145/3543507.3583287. Online publication date: 30-Apr-2023.
  • (2023) Transformers Pay Attention to Convolutions Leveraging Emerging Properties of ViTs by Dual Attention-Image Network. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2296-2307. DOI: 10.1109/ICCVW60793.2023.00244. Online publication date: 2-Oct-2023.
  • (2023) News-Based Sparse Machine Learning Models for Adaptive Asset Pricing. Data Science in Science, 2:1. DOI: 10.1080/26941899.2023.2187895. Online publication date: 3-Apr-2023.
  • (2022) Company-as-Tribe: Company Financial Risk Assessment on Tribe-Style Graph with Hierarchical Graph Neural Networks. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2712-2720. DOI: 10.1145/3534678.3539129. Online publication date: 14-Aug-2022.
  • (2022) AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 3270-3280. DOI: 10.1109/WACV51458.2022.00333. Online publication date: Jan-2022.
  • (2022) Speculative Container Scheduling for Deep Learning Applications in a Kubernetes Cluster. IEEE Systems Journal, 16:3, 3770-3781. DOI: 10.1109/JSYST.2021.3129974. Online publication date: Sep-2022.
  • (2022) DS-UNeXt: depthwise separable convolution network with large convolutional kernel for medical image segmentation. Signal, Image and Video Processing, 17:5, 1775-1783. DOI: 10.1007/s11760-022-02388-9. Online publication date: 16-Nov-2022.