DOI: 10.5555/1620754.1620794
research-article

Predicting risk from financial reports with regression

Published: 31 May 2009

Abstract

We address a text regression problem: given a piece of text, predict a real-world continuous quantity associated with the text's meaning. In this work, the text is an SEC-mandated financial report published annually by a publicly-traded company, and the quantity to be predicted is volatility of stock returns, an empirical measure of financial risk. We apply well-known regression techniques to a large corpus of freely available financial reports, constructing regression models of volatility for the period following a report. Our models rival past volatility (a strong baseline) in predicting the target variable, and a single model that uses both can significantly outperform past volatility. Interestingly, our approach is more accurate for reports after the passage of the Sarbanes-Oxley Act of 2002, giving some evidence for the success of that legislation in making financial reports more informative.
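
The text regression setup described in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes volatility is measured as the sample standard deviation of simple daily returns, uses log-scaled word counts as features, and fits a ridge-regularized linear model by gradient descent as a stand-in for the "well-known regression techniques" the abstract mentions. The function names, the toy reports, and the target values are all hypothetical.

```python
import math
from collections import Counter


def volatility(prices):
    """Sample standard deviation of simple daily returns (an assumed definition)."""
    returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(var)


def featurize(text, vocab):
    """Map a document to log(1 + count) for each word in a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [math.log1p(counts[word]) for word in vocab]


def fit_ridge(X, y, l2=0.01, lr=0.1, epochs=2000):
    """Fit linear weights and a bias by gradient descent on squared error with an L2 penalty."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        # The L2 penalty shrinks the weights; the bias is left unpenalized.
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Linear prediction for one feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Hypothetical toy data: three tiny "reports" with made-up volatility targets.
vocab = ["risk", "growth"]
reports = ["risk risk loss", "growth growth stable", "risk growth"]
targets = [0.9, 0.1, 0.5]

X = [featurize(report, vocab) for report in reports]
w, b = fit_ridge(X, targets)
```

On this toy data the fitted model assigns a higher predicted value to text dominated by "risk" than to text dominated by "growth". A model that also uses past volatility, as the abstract describes, could be approximated here by appending the previous period's volatility to each feature vector.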


Published In

NAACL '09: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
May 2009
716 pages
ISBN:9781932432411

Publisher

Association for Computational Linguistics

United States

Acceptance Rates

Overall Acceptance Rate 21 of 29 submissions, 72%

Cited By

  • (2024) ECC Analyzer: Extracting Trading Signal from Earnings Conference Calls using Large Language Model for Stock Volatility Prediction. Proceedings of the 5th ACM International Conference on AI in Finance, pp. 257-265. DOI: 10.1145/3677052.3698689. Online publication date: 14-Nov-2024
  • (2022) MONOPOLY. Proceedings of the 30th ACM International Conference on Multimedia, pp. 2276-2285. DOI: 10.1145/3503161.3548380. Online publication date: 10-Oct-2022
  • (2021) Goal-Directed Extractive Summarization of Financial Reports. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 2817-2821. DOI: 10.1145/3459637.3482113. Online publication date: 26-Oct-2021
  • (2020) Predicting In-Game Actions from Interviews of NBA Players. Computational Linguistics 46:3, pp. 667-712. DOI: 10.1162/coli_a_00383. Online publication date: 1-Nov-2020
  • (2020) Knowledge Graph-based Event Embedding Framework for Financial Quantitative Investments. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2221-2230. DOI: 10.1145/3397271.3401427. Online publication date: 25-Jul-2020
  • (2020) Multimodal Multi-Task Financial Risk Forecasting. Proceedings of the 28th ACM International Conference on Multimedia, pp. 456-465. DOI: 10.1145/3394171.3413752. Online publication date: 12-Oct-2020
  • (2019) Differentially private iterative gradient hard thresholding for sparse learning. Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 3740-3747. DOI: 10.5555/3367471.3367561. Online publication date: 10-Aug-2019
  • (2018) Hybrid Deep Sequential Modeling for Social Text-Driven Stock Prediction. Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 1627-1630. DOI: 10.1145/3269206.3269290. Online publication date: 17-Oct-2018
  • (2018) Output Fisher embedding regression. Machine Language 107:8-10, pp. 1229-1256. DOI: 10.1007/s10994-018-5698-0. Online publication date: 1-Sep-2018
  • (2018) Relative-error approximate versions of Douglas-Rachford splitting and special cases of the ADMM. Mathematical Programming: Series A and B 170:2, pp. 417-444. DOI: 10.1007/s10107-017-1160-5. Online publication date: 1-Aug-2018
