Bitcoin Price-Sentiment Analysis: Data Mining Project Report
June, 2014
Sarajevo
Contents
1. Project Definition
2.
3.
1. Project Definition
Bitcoin is a peer-to-peer version of electronic cash that allows online
payments to be sent directly from one party to another without going
through a financial institution. During the last quarter of 2013 the price
of bitcoin grew very rapidly and reached a record of $1,242 per coin on
November 29. For comparison, on the same day the spot price of gold was
$1,240 per ounce. Currently there are more than 12 million bitcoins in
circulation, and the rate at which new bitcoins are created is halved
roughly every four years until the maximum of 21 million coins is reached.
After the record price in November, the price plunged to around $600 and
then stagnated around that level with slight ups and downs. Today the
price of bitcoin is $617, after slight growth in May 2014. Because the
price of a virtual currency briefly surpassed the price of an ounce of
gold, we will analyze whether there is a correlation between Twitter
posts, called tweets, and the price of bitcoin. If there is a correlation,
it can be a good starting point for predicting future plunges or jumps in
the bitcoin price.
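The report does not say which correlation measure is used. As a minimal sketch, assuming the Pearson coefficient and two equal-length series (for example a per-period sentiment score and the bitcoin price for the same periods), the check could look like this. The function name `pearson` is a hypothetical helper, not part of the project.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

A value near +1 or -1 would suggest a strong linear relationship between the two series, while a value near 0 would suggest none. Either way it says nothing about which series leads the other.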
which we must discard to get the data we need. After that, we must adjust
the raw data so that it can be inserted into tables that can later be used
for analysis.
discarded all data records with a word count lower than 3 and records whose
tweets contain non-ASCII characters, because those are the ones we cannot
analyze with confidence. After this filtering we had around 1.1 million
records, amounting to 252 MB. With the filtered data we begin sentiment
analysis. Sentiment analysis is done with a list of words rated for
valence, arousal and dominance. Successful sentiment analysis should give
a three-dimensional map of tweet moods, but it also removes records that
cannot be analyzed. After sentiment analysis we were left with 80 MB of
data, or 335,000 individual records, each of which has new derived
attributes related to sentiment analysis: mood, mean valence, mean
arousal, mean dominance and
intensity of the mood. The mood attribute has 20 different values, and each
of those can have a different arousal, valence, dominance and intensity.
After preparing the data acquired from Twitter, we must obtain historical
bitcoin price data with time information from one of the largest bitcoin
exchange websites. The next step is to match each tweet with the
corresponding bitcoin price using the relevant timestamp. Finally, we need
to adjust our data set for WEKA. Because WEKA can import CSV files, we
convert our data set to that format.
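The preparation steps above (filtering, sentiment scoring, price matching, CSV export) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code. The tiny `LEXICON` and its values are made up, all function names are hypothetical, and matching each tweet to the most recent price at or before its timestamp is an assumed rule. The mood and intensity attributes are not derived here because the report does not describe how they are computed.

```python
import bisect
import csv
import io
from statistics import mean

# Hypothetical lexicon: word -> (valence, arousal, dominance).
# The values are illustrative only, not taken from the project's word list.
LEXICON = {
    "good": (7.5, 5.0, 6.5),
    "crash": (2.5, 6.5, 3.5),
    "buy": (6.0, 5.5, 6.0),
}


def is_analyzable(text):
    """Keep tweets with at least 3 words and only ASCII characters."""
    return len(text.split()) >= 3 and text.isascii()


def score_tweet(text):
    """Return mean (valence, arousal, dominance) over lexicon words, or None."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not hits:
        return None  # no lexicon words: the tweet cannot be analyzed
    return tuple(mean(dim) for dim in zip(*hits))


def price_at(timestamp, price_times, prices):
    """Return the most recent price at or before `timestamp`, or None.

    `price_times` must be sorted ascending and aligned with `prices`.
    """
    i = bisect.bisect_right(price_times, timestamp) - 1
    return prices[i] if i >= 0 else None


def build_csv(tweets, price_times, prices):
    """Filter, score and price-match (timestamp, text) pairs; return CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", "valence", "arousal", "dominance", "price"])
    for ts, text in tweets:
        if not is_analyzable(text):
            continue
        scores = score_tweet(text)
        price = price_at(ts, price_times, prices)
        if scores is None or price is None:
            continue  # drop tweets that cannot be scored or priced
        writer.writerow([ts, *scores, price])
    return out.getvalue()
```

Calling `build_csv` with a list of `(timestamp, text)` pairs and two aligned lists of price timestamps and prices returns CSV text with a header row. That text can be written to a file and opened in WEKA.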