Dataset for stock prediction from tweets and historical stock prices.

koa-fin/sn2

sep-dataset

This repository releases the dataset for "Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models" [Paper].

Note: For text data, only the raw dataset is used in our work. The preprocessed dataset was used to conduct ablation studies with existing models.

Dataset Overview

Price and tweet data from 2020 to 2022 for 55 stocks: the top 5 stocks in each of 11 industries.

The full list of stocks and their companies can be found in stocktable.pdf.

Data Components

This dataset comprises two main components: tweet data and price data. Each is released in raw and preprocessed form.

Data Format

We collect data in the same format as the Stocknet Dataset.

Because the number of tweets has increased exponentially, we also employ a clustering pipeline to obtain the most representative tweets for each day.

Raw Tweet Data

Format: JSON
Keys: see Introduction to Tweet JSON

Preprocessed Tweet Data

Format: JSON
Keys: 'text', 'created_at', 'user_id_str'
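
As a rough illustration of working with the preprocessed tweet keys listed above, the sketch below keeps only 'text', 'created_at', and 'user_id_str' from each record. It assumes one JSON object per line, which is a guess based on Stocknet-style files and may not match the actual file layout here; the function name `read_preprocessed_tweets` and the sample record are hypothetical.

```python
import json

# Keys documented for the preprocessed tweet data.
TWEET_KEYS = ("text", "created_at", "user_id_str")

def read_preprocessed_tweets(lines):
    """Parse an iterable of JSON strings (ASSUMED: one tweet object per line)
    and keep only the documented keys. Missing keys become None."""
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        tweets.append({key: record.get(key) for key in TWEET_KEYS})
    return tweets

# Made-up record for illustration only; not taken from the dataset.
sample_lines = [
    '{"text": "example tweet", "created_at": "2020-01-02", "user_id_str": "1", "lang": "en"}',
    "",
]
tweets = read_preprocessed_tweets(sample_lines)
```

With a real file, the same function could be called on an open file handle, for example `read_preprocessed_tweets(open(path, encoding="utf-8"))`, where `path` points to one of the preprocessed tweet files.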

Raw Price Data

Format: CSV
Entries: date, open price, high price, low price, close price, adjusted close price, volume
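
A minimal sketch of reading the raw price CSV with Python's standard library, assuming the columns appear in the order listed above and that the file starts with a header row. Both are assumptions, as are the `PriceRow` and `read_raw_prices` names and the sample values.

```python
import csv
import io
from typing import NamedTuple

class PriceRow(NamedTuple):
    # Field order follows the documented entries; the names are illustrative.
    date: str
    open: float
    high: float
    low: float
    close: float
    adj_close: float
    volume: float

def read_raw_prices(fileobj, has_header=True):
    """Read rows of (date, open, high, low, close, adjusted close, volume).
    ASSUMES this column order; set has_header=False if there is no header row."""
    reader = csv.reader(fileobj)
    if has_header:
        next(reader, None)  # discard the header row
    rows = []
    for record in reader:
        if not record:
            continue  # skip empty lines
        date, *numbers = record[:7]
        rows.append(PriceRow(date, *(float(x) for x in numbers)))
    return rows

# Made-up values for illustration only; not taken from the dataset.
sample_csv = (
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2020-01-02,100.0,110.0,90.0,105.0,104.5,12345\n"
)
prices = read_raw_prices(io.StringIO(sample_csv))
```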

Preprocessed Price Data

Format: TXT
Entries: date, close price, open price, high price, low price, close price change, volume
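Note: open, high, low, and close prices are normalized by the previous close price, $p_t = \tilde{p}_t / \tilde{p}^c_{t-1} - 1$, where $\tilde{p}_t$ is the raw price at day $t$ and $\tilde{p}^c_{t-1}$ is the raw close price at day $t-1$.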
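
The normalization in the note above can be sketched as follows. This is an illustration of the formula only, not the preprocessing script used for this release; the function name `normalize_prices`, the dictionary layout, and the sample values are hypothetical, and whether the raw or adjusted close is used as the divisor is not stated here (the sketch uses the raw close).

```python
PRICE_FIELDS = ("open", "high", "low", "close")

def normalize_prices(rows):
    """Apply p_t = raw_p_t / raw_close_{t-1} - 1 to each price field.

    rows: list of dicts with 'date', 'open', 'high', 'low', 'close',
    sorted chronologically. The first day has no previous close, so the
    output starts at the second day.
    """
    normalized = []
    for previous, current in zip(rows, rows[1:]):
        prev_close = previous["close"]
        entry = {"date": current["date"]}
        for field in PRICE_FIELDS:
            entry[field] = current[field] / prev_close - 1
        normalized.append(entry)
    return normalized

# Made-up values for illustration only; not taken from the dataset.
raw_rows = [
    {"date": "2020-01-02", "open": 98.0, "high": 101.0, "low": 97.0, "close": 100.0},
    {"date": "2020-01-03", "open": 102.0, "high": 105.0, "low": 99.0, "close": 104.0},
]
normalized_rows = normalize_prices(raw_rows)
```

Under this reading, the normalized close is the day-over-day relative change in the close price, so a move from 100.0 to 104.0 yields roughly 0.04.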

Citation

If you use this dataset, please cite our paper.

@inproceedings{koa2024learning,
  title={Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models},
  author={Koa, Kelvin J.L. and Ma, Yunshan and Ng, Ritchie and Chua, Tat-Seng},
  booktitle={Proceedings of the ACM on Web Conference 2024},
  pages={4304--4315},
  year={2024}
}
