The document discusses methods for inferring links between financial assets based on time series data, such as using correlation analysis of asset returns after controlling for common factors like market movements through techniques like principal component analysis. It provides examples of calculating returns from price data and removing the influence of the largest principal component from the returns before analyzing correlations between the assets. Various statistical and network analysis methods are presented for inferring relationships between financial entities from time series data on their performance.
1 of 36
More Related Content
Financial Networks V - Inferring Links
1. Center for Financial Studies at the Goethe University
PhD Mini-course
Frankfurt, 25 January 2013
Financial Networks
V. Inferring Links
Dr. Kimmo Soramäki
Founder and CEO
FNA, www.fna.fi
2. I. Mapping II. Mapping
Systemic Risk Financial Markets
2
3. Observing vs inferring
• Observing links
– Exposures, payment flow, trade, co-ownership, joint
board membership, etc.
– Cause of link is known
• Inferring links
– Observing the effects and inferring a relationship
– Cause of link is unknown
– Time series on asset prices, trade volumes, balance
sheet items
3
4. Agenda
V. Inferring Links
• Prices and Returns
• Controlling for common factors
• Correlation and dependence
• Significant correlations
• Multiple Comparisons
VI. Correlation Networks
• Distance and Hierarchical Clustering
• Minimum Spanning Tree & PMFG
• Filtering
• Visual Layouts
4
6. Prices vs Returns
• Benefit of using returns vs prices for correlations:
• Normalization. All variables are denoted in same units
• Required by some matrix algebra
• Returns (on asset prices) have low serial correlation
• Returns are what matter once investement is made
• We'll use returns
6
7. Log vs Arithmetics Returns
• Most often logarithmic returns are used in Finance. Why?
• Benefit of using log returns vs arithmetic returns
– Assumptions in statistical estimates (log-normal returns)
– Easier calculations (integration, compounding) lead to smaller
algorithmic complexity
• Normaliy assumptions make log returns better
• We'll use log returns
7
8. Data Issues
• Which prices
– daily (low, high, close)
– time-zone issues
– high-frequency
• Missing prices
– Errors in source data
– Partial holidays
– Weekends/holidays
– Exit and entry of assets
• Handling options
– Look up from other sources
– Edit values, e.g. replace with same as previous (prices) or 0 (returns)
– Exclude days/series from analysis
– Decide on weekend returns depending on data
8
9. Ticker Name Sector Weight
DAX stocks
ADS Adidas clothing and footwear 2.0
ALV Allianz insurance 6.7
BAS BASF speciality chemicals 9.6
BAYN Bayer speciality chemicals 7.6
BEI Beiersdorf personal products 0.9
BMW BMW automobile manufacturers 3.3
CBK Commerzbank credit banks 1.0
CON Continental car parts manufacturers 0.8
DAI Daimler automobile manufacturers 5.8
DB1 Deutsche Börse securities brokers 1.5
DBK Deutsche Bank credit banks 5.1
DPW Deutsche Post logistics 1.9
DTE Deutsche Telekom fixed-line telecommunication 5.4
EOAN E.ON multi-utilities 6.2
FME Fresenius Medical Care health care 2.2
FRE Fresenius health care 1.6
HEI HeidelbergCement building materials 0.9
HEN3 Henkel personal products 1.5
IFX Infineon Technologies semiconductors 1.3
LHA Deutsche Lufthansa airlines 0.8
LIN Linde industrial gases 3.8
LXS Lanxess specialty chemicals 0.7
MRK Merck pharmaceuticals 1.0
MUV2 Munich Re re-insurance 2.9
RWE RWE multi-utilities 2.2
SAP SAP software 7.7
SDF K+S commodity chemicals 1.2
SIE Siemens diversified industrials 10.0
TKA ThyssenKrupp diversified industrials 1.3
VOW3 Volkswagen Group automobile manufacturers 3.4
9
11. calculatereturns -command
• File (-file) : File with price data. The file must contain a table where each
row has the observation date as its first element, and asset prices as
subsequent elements separated by a field delimiter. The first line read must
contain names for each asset column (except the first column which
contains the date). Mandatory.
• Save As (-saveas) : File where result should be saved. Mandatory.
• Date format (-dateformat) : Format of date in input file. Default value yyyy-
MM-dd. Optional.
• Method of calculation (-method) : Arithmetic (arithmetic) or logarithmic
(log) returns. Mandatory. By default 'log'.
• Number of observations (-obs) : Larger than or equal to 2. Optional. By
default '2'.
• Length of interval (-interval) : Larger than or equal to 1. Optional. By
default '1'.
• Date order (-dateorder) : Date order in the file. Optional. Allowed values:
[asc, desc]. By default 'asc'.
11
14. Common Factors
• A common factor may be affecting all the returns
– E.g bullish/bearish market
• If we know the market return, we can deduct it from the returns to
look at excess returns to market
• If we don't know, we can try to control for this unknown common
factor via statistical methods
• Principal Component Analysis allows us to identify and remove the
common factor
14
15. Eigenvectors - example
A plot of the normalised
data (mean subtracted)
-> First component
explains maximum
variance
The data has two variables
(x and y), the covariance
matrix is 2x2 and thus
there are two Eigenvectors
The eigenvectors of
the covariance matrix are
overlayed as dotted lines
15
16. Eigenvectors of asset correlations
• The eigenvectors of a correlation matrix are orthogonal to each other. The
data can be projected in terms of its eigenvectors
• Each eigenvector represents a Priciple Component
– if values are similar, the component affects all returns similarly
• Eigenvalues scale with the share of variance explaned by each component
– Divide each by the sum to get share
• Laloux et al (1999) find on S&P500 (and other markets)
– The largest eigenvalue is well separated from the bulk and corresponds to the whole market
as the corresponding eigenvector has roughly equal components
– The next largest eigenvectors carry information about the real correlations and can be used
in identifying clusters of strongly interacting assets
– The smaller eigenvectors are noise (95%)
L. Laloux, P. Cizeau, J.-P. Bouchaud, M. Potters, Noise Dressing of Financial Correlation Matrices. Phys. Rev. Lett. 83 (1999)
1467 ; T. Heimo, J. Saramaki, J-P. Onnela, K. Kaski (2007). Spectral and network methods in the analysis of correlation
matrices of stock returns, Physica A 383, pp. 147–151 16
17. pca -command
• File (-file) : File with input data.
• Components (-components) : Components to keep, e.g. 2- or 1 or 2-10
Optional.
• Observations (-obs) : Number of observations in each period.
• Interval (-interval) : Number of observations between each period.
• Saveas (-saveas) : Name of file to store amended data.
Examples
# calculate returns corresponding to all but principal component
pca -file dax-returns-2011.csv -obs 30 -interval 30 -components 2- -saveas dax-montly1.csv
# calculate returns corresponding to second to 10th largest component
pca -file dax-returns-2011.csv -obs 30 -interval 30 -components 2-10 -saveas dax-montly2.csv
17
20. buildbycorrelation -command
• File (-file) : File from which to read data. To skip lines or use non-default field
delimiter use the 'delimiter' and 'skiplines' arguments. Mandatory.
• Save as property (-saveas) : Property name of saved result. Optional. By default
'correlation'.
• Number of observations (-obs) : Larger than or equal to 2. Optional.
• Length of interval (-interval) : Larger than or equal to 1. Optional.
• -commitinterval : Larger than or equal to 1. Optional. By default '10000'.
• Save standard deviation (-savestdev) : Option to save standard deviation as vertex
property. Optional. By default 'false'.
• Save average returns (-savereturns) : Option to save average returns as vertex
property. Optional. By default 'true'.
• Preserve (-preserve) : Do not overwrite existing networks in database. Optional. By
default 'true'.
20
21. • Missing values (-missing) : Handling of missing or non-numeric values in data.
Alternatives:
– Zero : missing values are considered as 0.
– NaN : default value corresponds to 'NaN and missing values' setting in 'Default format
settings'.
– Alert : missing values cause the calculation to fail.
Optional. Allowed values: [Zero, NaN, Alert]. By default 'Zero'.
• Start date (-start) : Starting date of calculation. Optional.
• End date (-end) : Ending date of calculation. Optional.
• Date format (-dateformat) : Format of date in input file. Default value corresponds to
'Date format' setting in 'Default format settings'. Optional.
• Date order (-dateorder) : Order of dates in the input file. If dates are ascending order,
the newest network has fewer observations than other networks. If dates are
descending order, the oldest network has fewer observations. Optional. Allowed
values: [asc, desc]. By default 'desc'.
21
23. Examples
# Calculate 100 day pairwise correlations with 1 day sliding window, consider
any missing/non-numeric return as 0
buildbycorrelationd -file daxreturns-2011.csv -missing Zero -days 100 -interval
1 -preserve false
# Calculate 100 day pairwise correlations with no sliding window, consider any
missing/non-numeric return as 0
buildbycorrelationd -file daxreturns-2011.csv -missing Zero -days 100 -interval
100 -preserve false
# Calculate pairwise correlations for whole data, alert about missing values
buildbycorrelation -file daxreturns-2011.csv -missing Alert -preserve false
23
24. Heatmap -command
• Creates time series of heatmaps
• Like viz -command
• Allows mapping of data to cell size, shape, color, label, hover
• Allows definition of color ranges and domains
# create heatmap from correlations
heatmap -sortv vertex_id -p correlation -symmetric true -
cellsizedefault 13 -transition 0 -cellhover correlation -colordomain (-1)-
1 -palette darkblue-lightgray-darkred -saveas daxheat-def-Y
24
26. About Color Perception
A and B are the same
shade of gray
http://upload.wikimedia.org/wikipedia/comm
ons/9/93/Optical_illusion_greysquares.gif
27. Anscombes Quartet
Constructed in 1973 by Francis Anscombe to demonstrate both the
importance of graphing data before analyzing it and the effect
of outliers on statistical properties
Mean of x 9
Variance of x 11
Mean of y ~7.50
Variance of y ~4.1
Correlation ~0.816
Linear regression:
y = 3.00 + 0.500x
Chatterjee, Sangit; Firat, Aykut (2007). "Generating Data with
Identical Statistics but Dissimilar Graphics: A Follow up to the
Anscombe Dataset". American Statistician 61 (3): 248–254.
27
28. Significance of Correlations
• We are estimating 'true' correlations by means of
samples
• Our estimates have errors
• Question: What is the chance that random data
exhibits an observed correlation?
28
29. Example - distribution of correlation in 30
trials with random numbers
20 pairs 50 pairs
100 pair 200 pairs
29
30. Test for significance
• Errors:
– Type I: False positive - we identify a correlation where there is
none
– Type II: False negative - we identify no correlation when there is
one
• We can control for Type I
– Fisher transformation -> Normal distr. -> Confidence Interval
• Possible to estimate Type II error rate - only more data
will bring this down
30
32. No signifance test, Principal Only significant correlations ( =0.05),
Component Removed Principal Component Removed
32
33. Problem of Multiple Comparisons
• Occurs when considering multiple simultanous estimates
– e.g. with correlation matrix of n=30 we have 30*30/2-30 = 420
estimates
– > too many false positives by random chance alone
– the system may not be accurate enough
• Familywise error rate (FWER) - Probability of making one or more
false positives among all the hypotheses when performing multiple
hypotheses tests.
– Solved with e.g. Bonferroni correction
• False discovery rate (FDR) - Proportion of significant correlations
that are false positives
– Benjamini-Hochberg -procedure
• Question: which is more important? Missing correlations that exist,
or disregarding ones that do? 33
34. 33-35
bonferroni -parameter
Enable Bonferroni correction (-bonferroni) : Option to use Bonferroni
correction in significance testing. Optional. By default 'false'.
Example
buildbycorrelation -file daxreturns.csv -missing Alert -obs
30 -interval 1 -alpha 0.05 -bonferroni true -preserve false
34
35. Only significant correlations Bonferroni correction, Only
( =0.05), Principal Component significant correlations ( =0.05),
Removed Principal Component Removed
35
36. Blog, Library and Demos at www.fna.fi
Dr. Kimmo Soramäki
kimmo@soramaki.net
Twitter: soramaki