0501 Indexing and Selecting Data

This document discusses indexing and selecting data from pandas DataFrames, Series, and Panels. It loads stock price data for SPY and TLT into DataFrames, then combines them into a Panel. It demonstrates selecting data from the Series, DataFrames, and Panel using .loc indexing by label. Selection is shown by single labels, label slices, and skipping rows. Indexes are examined for the Series, DataFrames and Panel.

Uploaded by

Kumara S

2/8/2015 0501 Indexing and Selecting Data

INDEXING AND SELECTING DATA

LIBRARIES
In [154]:

%pylab
import pandas as pd
from pandas.io.data import DataReader

Using matplotlib backend: Qt4Agg
Populating the interactive namespace from numpy and matplotlib

Data Load

DataFrame

In [155]:

df1 = DataReader("SPY", "yahoo", "20030101", "20150612")
df1.head()

Out[155]:

                 Open       High        Low      Close    Volume  Adj Close
Date
2003-01-02  88.849998  91.300003  88.540001  91.070000  44516300  71.506562
2003-01-03  90.910004  91.379997  90.500000  91.349998  32222600  71.726413
2003-01-06  91.239998  93.489998  91.169998  92.959999  40984500  72.990557
2003-01-07  92.900002  93.370003  92.199997  92.730003  38640400  72.809968
2003-01-08  92.199997  92.400002  91.050003  91.389999  38702200  71.757821

http://localhost:8889/notebooks/0501%20Indexing%20and%20Selecting%20Data.ipynb 1/16

In [156]:

df1.describe()

Out[156]:

              Open         High          Low        Close        Volume    Adj Close
count  3133.000000  3133.000000  3133.000000  3133.000000  3.133000e+03  3133.000000
mean    133.114730   133.896227   132.266524   133.123869  1.424012e+08   119.240794
std      31.127787    31.080510    31.169782    31.131838  1.074667e+08    35.583383
min      67.949997    70.000000    67.099998    68.110001  8.055800e+06    59.877756
25%     112.580002   113.199997   112.010002   112.589996  6.207150e+07    95.767011
50%     127.839996   128.470001   126.959999   127.820000  1.148069e+08   110.950210
75%     145.830002   146.610001   145.029999   145.820007  1.883378e+08   129.994176
max     213.240005   213.779999   212.910004   213.500000  8.710263e+08   213.500000

In [157]:

df1.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3133 entries, 2003-01-02 00:00:00 to 2015-06-12 00:00:00
Data columns (total 6 columns):
Open         3133 non-null float64
High         3133 non-null float64
Low          3133 non-null float64
Close        3133 non-null float64
Volume       3133 non-null int64
Adj Close    3133 non-null float64
dtypes: float64(5), int64(1)
memory usage: 171.3 KB

Series

For the Series, we use one of the DataFrame's columns, e.g. the adjusted close ('Adj Close').


In [158]:

s = df1['Adj Close']
s.head()

Out[158]:
Date
2003-01-02    71.506562
2003-01-03    71.726413
2003-01-06    72.990557
2003-01-07    72.809968
2003-01-08    71.757821
Name: Adj Close, dtype: float64

Panel

We load a second DataFrame ('TLT') and combine it with the first one ('SPY') to create a Panel.

In [159]:

df2 = DataReader("TLT", "yahoo", "20030101", "20150612")
df2.head()

Out[159]:

                 Open       High        Low      Close  Volume  Adj Close
Date
2003-01-02  87.699997  87.900002  86.209999  86.279999  192100  52.739388
2003-01-03  86.150002  86.540001  85.830002  86.480003  311200  52.861643
2003-01-06  86.169998  86.290001  85.809998  86.250000   35600  52.721051
2003-01-07  86.190002  86.750000  86.169998  86.559998   69800  52.910540
2003-01-08  86.879997  87.139999  86.779999  86.989998  160100  53.173381

In [160]:

p = pd.Panel({'df1': df1, 'df2': df2})
p.describe

Out[160]:

<bound method Panel.describe of <class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 3133 (major_axis) x 6 (minor_axis)
Items axis: df1 to df2
Major_axis axis: 2003-01-02 00:00:00 to 2015-06-12 00:00:00
Minor_axis axis: Open to Adj Close>
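
Panel was removed from pandas in release 0.25, so the cell above only runs on older installations. As a rough sketch of how the same two frames could be combined today, the example below uses pd.concat with a dict, which produces columns with two levels (item, field). The tiny frames are an assumption made for illustration: they hold only the first three rows and two of the columns shown in the outputs above, and the names spy, tlt and wide are not from the notebook.

```python
import pandas as pd

dates = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"])

# First rows of SPY and TLT, copied from the head() outputs above
spy = pd.DataFrame({"Close": [91.070000, 91.349998, 92.959999],
                    "Adj Close": [71.506562, 71.726413, 72.990557]}, index=dates)
tlt = pd.DataFrame({"Close": [86.279999, 86.480003, 86.250000],
                    "Adj Close": [52.739388, 52.861643, 52.721051]}, index=dates)

# The dict keys label each frame, playing the role of the Panel's items axis
wide = pd.concat({"df1": spy, "df2": tlt}, axis=1)

# Columns are now a two-level MultiIndex: (item, field)
item_labels = list(wide.columns.get_level_values(0).unique())

# Selecting one item returns an ordinary DataFrame again
spy_again = wide["df1"]

# A single value is addressed by row label and an (item, field) column tuple
value = wide.loc["2003-01-03", ("df1", "Adj Close")]
```

The rows play the part of the major axis and the second column level plays the part of the minor axis.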

Index

A Series has a single index.
Here it is the dates on which prices were recorded.

In [161]:

s.index

Out[161]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2003-01-02, ..., 2015-06-12]
Length: 3133, Freq: None, Timezone: None

DataFrames have two indexes (the rows and the columns).
Rows: Dates 
Columns: Open, High, Low, ...

In [162]:

df1.index

Out[162]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2003-01-02, ..., 2015-06-12]
Length: 3133, Freq: None, Timezone: None

In [163]:

df1.columns

Out[163]:

Index([u'Open', u'High', u'Low', u'Close', u'Volume', u'Adj Close'], dtype='object')

Panels have three indexes, or axes (items, rows and columns).
Items: df1 and df2
Rows or major index: Dates
Columns or minor index: Open, High, Low, ...

In [164]:

p.axes

Out[164]:
[Index([u'df1', u'df2'], dtype='object'),
 <class 'pandas.tseries.index.DatetimeIndex'>
 [2003-01-02, ..., 2015-06-12]
 Length: 3133, Freq: None, Timezone: None,
 Index([u'Open', u'High', u'Low', u'Close', u'Volume', u'Adj Close'], dtype='object')]
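
The relationship between these indexes can be checked on a small frame. The sketch below is illustrative only: df is a hypothetical two-column frame built from the first rows shown earlier, not the notebook's df1. It shows that a column taken from a DataFrame keeps the row index, and that .axes simply lists the row index followed by the column index.

```python
import pandas as pd

dates = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"])
df = pd.DataFrame({"Open": [88.849998, 90.910004, 91.239998],
                   "Close": [91.070000, 91.349998, 92.959999]}, index=dates)

s = df["Close"]          # a single column is a Series

row_index = df.index     # DatetimeIndex with the dates
col_index = df.columns   # Index with the column names
axes = df.axes           # [row index, column index]

# The Series carries the same row labels as the DataFrame it came from
same_labels = s.index.equals(df.index)
```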


Selection by Index

.loc

It selects based on label values:
Series: s.loc[indexer]
DataFrame: df.loc[row_indexer, column_indexer]
Panel: p.loc[item_indexer, major_indexer, minor_indexer]

Series

In [165]:

s.head()

Out[165]:

Date
2003-01-02    71.506562
2003-01-03    71.726413
2003-01-06    72.990557
2003-01-07    72.809968
2003-01-08    71.757821
Name: Adj Close, dtype: float64

In [166]:

s.loc["2003-01-07"]

Out[166]:

72.809968000000012


In [167]:

s.loc[0]    # KeyError: 0 is not an index label, and .loc cannot select by position

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-167-4753738e9717> in <module>()
----> 1 s.loc[0]    # KeyError: 0 is not an index label, and .loc cannot select by position

C:\Anaconda\lib\site-packages\pandas\core\indexing.pyc in __getitem__(self, key)
   1200             return self._getitem_tuple(key)
   1201         else:
-> 1202             return self._getitem_axis(key, axis=0)
   1203 
   1204     def _getitem_axis(self, key, axis=0):

C:\Anaconda\lib\site-packages\pandas\core\indexing.pyc in _getitem_axis(self, key, axis)
   1343 
   1344         # fall thru to straight lookup
-> 1345         self._has_valid_type(key, axis)
   1346         return self._get_label(key, axis=axis)
   1347 

C:\Anaconda\lib\site-packages\pandas\core\indexing.pyc in _has_valid_type(self, key, axis)
   1305                 raise
   1306             except:
-> 1307                 error()
   1308 
   1309         return True

C:\Anaconda\lib\site-packages\pandas\core\indexing.pyc in error()
   1292                         "cannot use label indexing with a null key")
   1293                 raise KeyError("the label [%s] is not in the [%s]" %
-> 1294                                (key, self.obj._get_axis_name(axis)))
   1295 
   1296             try:

KeyError: 'the label [0] is not in the [index]'
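
Because .loc raises KeyError for a label that is not in the index, code that looks up arbitrary dates may want to handle that case explicitly. The following is a minimal sketch under stated assumptions: the helper name lookup and the five-row Series are hypothetical (the values are the first rows of s shown above), and a missing date label is used to trigger the error.

```python
import pandas as pd

dates = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06",
                        "2003-01-07", "2003-01-08"])
s = pd.Series([71.506562, 71.726413, 72.990557, 72.809968, 71.757821],
              index=dates, name="Adj Close")

def lookup(series, label, default=None):
    """Return series.loc[label], or default when the label is absent."""
    try:
        return series.loc[label]
    except KeyError:
        return default

found = lookup(s, "2003-01-07")     # label present in the index
missing = lookup(s, "2003-01-04")   # a Saturday, so there is no such row
```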

In [168]:
s.loc["2003-01-07":"2003-01-09"]

Out[168]:

Date
2003-01-07    72.809968
2003-01-08    71.757821
2003-01-09    72.872778
Name: Adj Close, dtype: float64
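
Note that the label slice above includes both endpoints (2003-01-07 and 2003-01-09). A small sketch of that behaviour, compared with positional slicing where the stop is excluded, is given below. The five-row Series is an assumption built from the first rows of s shown earlier.

```python
import pandas as pd

dates = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06",
                        "2003-01-07", "2003-01-08"])
s = pd.Series([71.506562, 71.726413, 72.990557, 72.809968, 71.757821],
              index=dates, name="Adj Close")

# Label slice: both endpoints are included
by_label = s.loc["2003-01-03":"2003-01-07"]

# Positional slice: the stop position is excluded
by_position = s.iloc[1:3]

# On a sorted index the slice bounds do not have to be existing labels;
# 2003-01-04 is a Saturday, so the slice starts at the next available date
from_weekend = s.loc["2003-01-04":"2003-01-07"]
```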


In [169]:

s.loc[::2]

Out[169]:

Date
2003-01-02    71.506562
2003-01-06    72.990557
2003-01-08    71.757821
2003-01-10    73.069074
2003-01-14    73.281076
2003-01-16    72.252483
2003-01-21    70.077531
2003-01-23    69.653531
2003-01-27    66.897539
2003-01-29    67.902578
2003-01-31    67.572797
2003-02-04    67.038873
2003-02-06    66.308653
2003-02-10    65.963176
2003-02-12    64.463475
...
2015-05-04    211.320007
2015-05-06    208.039993
2015-05-08    211.619995
2015-05-12    209.979996
2015-05-14    212.210007
2015-05-18    213.100006
2015-05-20    212.880005
2015-05-22    212.990005
2015-05-27    212.699997
2015-05-29    211.139999
2015-06-02    211.360001
2015-06-04    210.130005
2015-06-08    208.419998
2015-06-10    210.960007
2015-06-12    209.929993
Name: Adj Close, Length: 1567

DataFrames


In [170]:

df1.head()

Out[170]:

                 Open       High        Low      Close    Volume  Adj Close
Date
2003-01-02  88.849998  91.300003  88.540001  91.070000  44516300  71.506562
2003-01-03  90.910004  91.379997  90.500000  91.349998  32222600  71.726413
2003-01-06  91.239998  93.489998  91.169998  92.959999  40984500  72.990557
2003-01-07  92.900002  93.370003  92.199997  92.730003  38640400  72.809968
2003-01-08  92.199997  92.400002  91.050003  91.389999  38702200  71.757821

In [171]:

df1.loc["2003-01-07"]

Out[171]:
Open               92.900002
High               93.370003
Low                92.199997
Close              92.730003
Volume       38640400.000000
Adj Close          72.809968
Name: 2003-01-07 00:00:00, dtype: float64

In [172]:

df1.loc["2003-01-07", "Adj Close"]

Out[172]:
72.809968000000012
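
The type returned by .loc on a DataFrame depends on the indexers: a single row label gives a Series, a row label plus a column label gives a scalar, and slices or lists give a DataFrame. The sketch below illustrates this on a hypothetical three-column frame built from the first rows of df1 shown above; the variable names are assumptions made for illustration.

```python
import pandas as pd

dates = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06",
                        "2003-01-07", "2003-01-08"])
df = pd.DataFrame({
    "Open": [88.849998, 90.910004, 91.239998, 92.900002, 92.199997],
    "Close": [91.070000, 91.349998, 92.959999, 92.730003, 91.389999],
    "Adj Close": [71.506562, 71.726413, 72.990557, 72.809968, 71.757821],
}, index=dates)

row = df.loc["2003-01-07"]                   # Series indexed by column name
value = df.loc["2003-01-07", "Adj Close"]    # scalar

# Row slice plus a list of columns gives a smaller DataFrame
block = df.loc["2003-01-06":"2003-01-08", ["Open", "Close"]]
```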


In [173]:

df1.loc[:,"Adj Close"]

Out[173]:

Date
2003-01-02    71.506562
2003-01-03    71.726413
2003-01-06    72.990557
2003-01-07    72.809968
2003-01-08    71.757821
2003-01-09    72.872778
2003-01-10    73.069074
2003-01-13    73.045519
2003-01-14    73.281076
2003-01-15    72.550856
2003-01-16    72.252483
2003-01-17    71.184641
2003-01-21    70.077531
2003-01-22    69.229532
2003-01-23    69.653531
...
2015-05-22    212.990005
2015-05-26    210.699997
2015-05-27    212.699997
2015-05-28    212.460007
2015-05-29    211.139999
2015-06-01    211.570007
2015-06-02    211.360001
2015-06-03    211.919998
2015-06-04    210.130005
2015-06-05    209.770004
2015-06-08    208.419998
2015-06-09    208.449997
2015-06-10    210.960007
2015-06-11    211.649994
2015-06-12    209.929993
Name: Adj Close, Length: 3133

In [174]:

df1.loc[:, ["Open", "Close"]]

Out[174]:

                  Open       Close
Date
2003-01-02   88.849998   91.070000
2003-01-03   90.910004   91.349998
2003-01-06   91.239998   92.959999
2003-01-07   92.900002   92.730003
2003-01-08   92.199997   91.389999
2003-01-09   91.820000   92.809998
2003-01-10   91.949997   93.059998
2003-01-13   93.540001   93.029999
2003-01-14   92.690002   93.330002
2003-01-15   93.540001   92.400002
2003-01-16   92.500000   92.019997
2003-01-17   90.989998   90.660004
2003-01-21   90.870003   89.250000
2003-01-22   88.769997   88.169998
2003-01-23   88.750000   88.709999
2003-01-24   88.589996   86.379997
2003-01-27   85.730003   85.199997
2003-01-28   85.629997   85.830002
2003-01-29   85.419998   86.480003
2003-01-30   86.790001   84.430000
2003-01-31   84.150002   86.059998
2003-02-03   86.139999   86.230003
2003-02-04   85.309998   85.379997
2003-02-05   85.750000   84.849998
2003-02-06   84.370003   84.449997
2003-02-07   84.910004   83.419998
2003-02-10   83.459999   84.010002
2003-02-11   84.370003   83.430000
2003-02-12   83.160004   82.099998
2003-02-13   82.150002   82.349998
...                ...         ...
2015-05-01  209.399994  210.720001
2015-05-04  211.229996  211.320007

Panels


In [175]:

p.describe

Out[175]:

<bound method Panel.describe of <class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 3133 (major_axis) x 6 (minor_axis)
Items axis: df1 to df2
Major_axis axis: 2003-01-02 00:00:00 to 2015-06-12 00:00:00
Minor_axis axis: Open to Adj Close>

In [176]:

p.loc["df1", "2003-01-07", "Adj Close"]

Out[176]:

72.809968000000012

.iloc

It selects based on integer position in the index:
Series: s.iloc[indexer]
DataFrame: df.iloc[row_indexer, column_indexer]
Panel: p.iloc[item_indexer, major_indexer, minor_indexer]

Series

In [177]:

s.head()

Out[177]:
Date
2003-01-02    71.506562
2003-01-03    71.726413
2003-01-06    72.990557
2003-01-07    72.809968
2003-01-08    71.757821
Name: Adj Close, dtype: float64

In [179]:

s.iloc[4]

Out[179]:
71.757820999999993


In [180]:

s.iloc["2003-01-07"]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-180-0a927e23536a> in <module>()
----> 1 s.iloc["2003-01-07"]

C:\Anaconda\lib\site-packages\pandas\core\indexing.pyc in __getitem__(self, key)
   1200             return self._getitem_tuple(key)
   1201         else:
-> 1202             return self._getitem_axis(key, axis=0)
   1203 
   1204     def _getitem_axis(self, key, axis=0):

C:\Anaconda\lib\site-packages\pandas\core\indexing.pyc in _getitem_axis(self, key, axis)
   1461 
   1462             else:
-> 1463                 key = self._convert_scalar_indexer(key, axis)
   1464 
   1465                 if not com.is_integer(key):

C:\Anaconda\lib\site-packages\pandas\core\indexing.pyc in _convert_scalar_indexer(self, key, axis)
    167         ax = self.obj._get_axis(min(axis, self.ndim - 1))
    168         # a scalar
--> 169         return ax._convert_scalar_indexer(key, typ=self.name)
    170 
    171     def _convert_slice_indexer(self, key, axis):

C:\Anaconda\lib\site-packages\pandas\core\index.pyc in _convert_scalar_indexer(self, key, typ)
    641                     type(self).__name__),FutureWarning)
    642                 return key
--> 643             return self._convert_indexer_error(key, 'label')
    644 
    645         if is_float(key):

C:\Anaconda\lib\site-packages\pandas\core\index.pyc in _convert_indexer_error(self, key, msg)
    781             msg = 'label'
    782         raise TypeError("the {0} [{1}] is not a proper indexer for this index "
--> 783                         "type ({2})".format(msg, key, self.__class__.__name__))
    784 
    785     def get_duplicates(self):

TypeError: the label [2003-01-07] is not a proper indexer for this index type (DatetimeIndex)
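
Since .iloc accepts only integer positions, a label has to be translated into a position first if positional access is needed. One way to do that, sketched here under the assumption of a unique index, is Index.get_loc. The five-row Series is again built from the first rows of s shown above.

```python
import pandas as pd

dates = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06",
                        "2003-01-07", "2003-01-08"])
s = pd.Series([71.506562, 71.726413, 72.990557, 72.809968, 71.757821],
              index=dates, name="Adj Close")

# Translate the label into its integer position (unique index assumed)
pos = s.index.get_loc(pd.Timestamp("2003-01-07"))

value = s.iloc[pos]            # same element that s.loc["2003-01-07"] returns
previous = s.iloc[pos - 1]     # positions make "the row before" easy to reach
```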


In [181]:

s.iloc[:10:-2]

Out[181]:
Date
2015-06-12    209.929993
2015-06-10    210.960007
2015-06-08    208.419998
2015-06-04    210.130005
2015-06-02    211.360001
2015-05-29    211.139999
2015-05-27    212.699997
2015-05-22    212.990005
2015-05-20    212.880005
2015-05-18    213.100006
2015-05-14    212.210007
2015-05-12    209.979996
2015-05-08    211.619995
2015-05-06    208.039993
2015-05-04    211.320007
...
2003-03-03    66.025987
2003-02-27    66.222282
2003-02-25    66.324360
2003-02-21    66.881838
2003-02-19    66.881838
2003-02-14    66.073102
2003-02-12    64.463475
2003-02-10    65.963176
2003-02-06    66.308653
2003-02-04    67.038873
2003-01-31    67.572797
2003-01-29    67.902578
2003-01-27    66.897539
2003-01-23    69.653531
2003-01-21    70.077531
Name: Adj Close, Length: 1561
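
The slice [:10:-2] follows ordinary Python slicing rules, which explains why the output starts at the last row and has 1561 entries. The sketch below reproduces the selected positions with a plain list of the same length as the Series (3133); it uses no pandas at all, and the variable names are illustrative.

```python
n = 3133                       # number of rows in the Series above
positions = list(range(n))

# With a negative step and no start, slicing begins at the last position
# and walks backwards, stopping before it reaches position 10
selected = positions[:10:-2]

# slice.indices() shows the concrete (start, stop, step) Python resolves to
resolved = slice(None, 10, -2).indices(n)

first, last, count = selected[0], selected[-1], len(selected)
```

Position 3132 is the last row (2015-06-12) and position 12 is 2003-01-21, matching the first and last lines of the output.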

DataFrames


In [182]:

df1.head()

Out[182]:

                 Open       High        Low      Close    Volume  Adj Close
Date
2003-01-02  88.849998  91.300003  88.540001  91.070000  44516300  71.506562
2003-01-03  90.910004  91.379997  90.500000  91.349998  32222600  71.726413
2003-01-06  91.239998  93.489998  91.169998  92.959999  40984500  72.990557
2003-01-07  92.900002  93.370003  92.199997  92.730003  38640400  72.809968
2003-01-08  92.199997  92.400002  91.050003  91.389999  38702200  71.757821

In [183]:

df1.iloc[3]

Out[183]:

Open               92.900002
High               93.370003
Low                92.199997
Close              92.730003
Volume       38640400.000000
Adj Close          72.809968
Name: 2003-01-07 00:00:00, dtype: float64

In [184]:
df1.iloc[3,5]

Out[184]:

72.809968000000012
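
Positional indexing on a DataFrame follows the same conventions as Python lists: positions start at 0, the stop of a slice is excluded, and negative positions count from the end. A short sketch on a hypothetical three-column frame (first rows of df1 from above) follows. Because this frame has only three columns, 'Adj Close' sits at column position 2 here rather than 5.

```python
import pandas as pd

dates = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06",
                        "2003-01-07", "2003-01-08"])
df = pd.DataFrame({
    "Open": [88.849998, 90.910004, 91.239998, 92.900002, 92.199997],
    "Close": [91.070000, 91.349998, 92.959999, 92.730003, 91.389999],
    "Adj Close": [71.506562, 71.726413, 72.990557, 72.809968, 71.757821],
}, index=dates)

row = df.iloc[3]          # fourth row, returned as a Series
value = df.iloc[3, 2]     # fourth row, third column
head2 = df.iloc[:2]       # positions 0 and 1; the stop position is excluded
last = df.iloc[-1]        # negative positions count from the end
```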

Panels

In [185]:

p.describe

Out[185]:
<bound method Panel.describe of <class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 3133 (major_axis) x 6 (minor_axis)
Items axis: df1 to df2
Major_axis axis: 2003-01-02 00:00:00 to 2015-06-12 00:00:00
Minor_axis axis: Open to Adj Close>


In [186]:

p.iloc[0,3,5]

Out[186]:
72.809968000000012

.ix

When the axis index is not integer-based, .ix supports selection both by label and by position in the index:

Series: s.ix[indexer]
DataFrame: df.ix[row_indexer, column_indexer]
Panel: p.ix[item_indexer, major_indexer, minor_indexer]

When the axis index is integer-based, .ix selects by label only (like .loc).

In [187]:

s.head()

Out[187]:
Date
2003-01-02    71.506562
2003-01-03    71.726413
2003-01-06    72.990557
2003-01-07    72.809968
2003-01-08    71.757821
Name: Adj Close, dtype: float64

In [188]:

s.ix["2003-01-07"]

Out[188]:
72.809968000000012

In [189]:

s.ix[3]

Out[189]:
72.809968000000012
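
The .ix indexer was deprecated in pandas 0.20 and removed in 1.0, so the two cells above need to be rewritten with .loc or .iloc on current versions. The sketch below shows one possible translation, including the mixed case (row by position, column by label) that .ix used to allow. The small frame and the variable names are assumptions for illustration.

```python
import pandas as pd

dates = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06",
                        "2003-01-07", "2003-01-08"])
df = pd.DataFrame({
    "Close": [91.070000, 91.349998, 92.959999, 92.730003, 91.389999],
    "Adj Close": [71.506562, 71.726413, 72.990557, 72.809968, 71.757821],
}, index=dates)
s = df["Adj Close"]

by_label = s.loc["2003-01-07"]    # what s.ix["2003-01-07"] resolved to
by_position = s.iloc[3]           # what s.ix[3] resolved to

# Mixed selection: turn the position into a label and use .loc ...
mixed_loc = df.loc[df.index[3], "Adj Close"]
# ... or turn the label into a position and use .iloc
mixed_iloc = df.iloc[3, df.columns.get_loc("Adj Close")]
```

Being explicit about label versus position avoids the ambiguity .ix had on integer indexes.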

Attributes


In [193]:

df1.Close['2003-01-03']

Out[193]:

91.349997999999999
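
Attribute access is a shortcut for bracket access, but it can only be written for column names that are valid Python identifiers. A name such as 'Adj Close' contains a space, so it has to be selected with brackets. The sketch below illustrates this on a hypothetical two-column frame built from the first rows of df1.

```python
import pandas as pd

dates = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"])
df = pd.DataFrame({
    "Close": [91.070000, 91.349998, 92.959999],
    "Adj Close": [71.506562, 71.726413, 72.990557],
}, index=dates)

via_attribute = df.Close["2003-01-03"]      # attribute access
via_brackets = df["Close"]["2003-01-03"]    # equivalent bracket access

# 'Adj Close' cannot be typed as df.Adj Close, so brackets are required
adj = df["Adj Close"]["2003-01-03"]

# str.isidentifier() tells which column names can be used as attributes
usable = {name: name.isidentifier() for name in df.columns}
```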
