
pandas correlation, visualization 5

The document outlines the use of the Pandas library in Python for data analysis, specifically focusing on calculating correlations between dataset columns using the corr() method. It provides examples of reading a CSV file, displaying the first few rows, and generating correlation matrices. Additionally, it demonstrates various plotting techniques to visualize data, including histograms, box plots, and line plots.

Uploaded by

haridivya6650
Copyright
© All Rights Reserved

In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

Finding Relationships
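A useful feature of the Pandas module is the corr() method.

The corr() method calculates the pairwise correlation between the numeric columns in your data set. By default it uses the Pearson coefficient, which ranges from -1 (perfect inverse linear relationship) through 0 (no linear relationship) to 1 (perfect linear relationship).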
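To make the meaning of the coefficient concrete, here is a minimal sketch on a toy DataFrame. The frame `toy` and its column names are made up for illustration and are not part of the data set used in this notebook.

```python
import pandas as pd

# Hypothetical toy data: one column rises exactly with x, one falls exactly.
toy = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "rising": [2, 4, 6, 8],    # exact linear increase with x
    "falling": [8, 6, 4, 2],   # exact linear decrease with x
})

# corr() returns a square, symmetric matrix with 1.0 on the diagonal.
matrix = toy.corr()
print(matrix.round(2))
```

Here `x` and `rising` should correlate at 1, and `x` and `falling` at -1, because both relationships are exactly linear.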

In [2]: df = pd.read_csv('data (1).csv')
df.head()

Out[2]:    Duration  Pulse  Maxpulse  Calories
        0        60    110       130     409.1
        1        60    117       145     479.0
        2        60    103       135     340.0
        3        45    109       175     282.4
        4        45    117       148     406.0

In [3]: df.corr()

Out[3]:            Duration      Pulse   Maxpulse   Calories
        Duration   1.000000  -0.155408   0.009403   0.922717
        Pulse     -0.155408   1.000000   0.786535   0.025121
        Maxpulse   0.009403   0.786535   1.000000   0.203813
        Calories   0.922717   0.025121   0.203813   1.000000

In [4]: df.corr()[["Calories"]].sort_values(by="Calories", ascending=False)

Out[4]:            Calories
        Calories   1.000000
        Duration   0.922717
        Maxpulse   0.203813
        Pulse      0.025121
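In the output above, Duration has by far the strongest correlation with Calories (0.92), while the first row is only the column's correlation with itself. A possible variation, sketched below, drops that self-correlation and ranks by absolute value so that strong negative correlations are not pushed to the bottom. The frame `sample` holds only the five rows shown by df.head(), so its coefficients will differ from those of the full data set; the names `sample`, `target`, and `ranked` are illustrative.

```python
import pandas as pd

# Only the five rows displayed by df.head(); not the full data set.
sample = pd.DataFrame({
    "Duration": [60, 60, 60, 45, 45],
    "Pulse": [110, 117, 103, 109, 117],
    "Maxpulse": [130, 145, 135, 175, 148],
    "Calories": [409.1, 479.0, 340.0, 282.4, 406.0],
})

target = "Calories"

# Take the target's column of the correlation matrix, remove the
# self-correlation, and sort by strength regardless of sign.
ranked = (
    sample.corr()[target]
    .drop(target)
    .abs()
    .sort_values(ascending=False)
)
print(ranked)
```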

Plotting with Pandas


In [5]: df.columns

Out[5]: Index(['Duration', 'Pulse', 'Maxpulse', 'Calories'], dtype='object')

In [6]: df.plot()
plt.show()

Histogram

In [7]: df.Maxpulse.plot.hist(figsize=(10,10),cmap='brg');
In [8]: #boxplot

In [9]: df.Maxpulse.plot.box(figsize=(4,3), cmap='gist_earth');


In [10]: #subplot

In [11]: df.plot.box(figsize=(22,4),subplots=True);

In [12]: #histogram

In [13]: df.plot.hist(figsize=(6,18),subplots=True);
In [14]: #line

In [15]: df.plot.line(figsize=(6,18),subplots=True);
In [16]: df.Maxpulse.plot.line(figsize=(4,3), cmap='gist_earth');

In [17]: df.columns

Out[17]: Index(['Duration', 'Pulse', 'Maxpulse', 'Calories'], dtype='object')

In [18]: #bar

In [19]: df.Maxpulse.value_counts().plot.bar(figsize=(6,2), cmap='twilight_shifted');

In [20]: #area chart

In [21]: df.plot.area('Pulse','Calories', figsize=(6,3), cmap='Pastel2');


In [22]: #barh

In [23]: df.sort_values(by='Pulse')[:5].plot.barh('Calories','Pulse',figsize=(6,4));

In [24]: #scatter

In [25]: #pie

In [26]: #kde

In [27]: #hexbin
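The scatter, pie, kde, and hexbin cells are empty in the source. As a rough sketch only, and not the author's code, three of them could be drawn with the same df.plot accessor as follows. The column choices, the `gridsize` value, and the five-row `sample` frame (copied from the df.head() output) are assumptions. The kde plot is left out because it also needs SciPy.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Only the five rows displayed by df.head(); not the full data set.
sample = pd.DataFrame({
    "Duration": [60, 60, 60, 45, 45],
    "Pulse": [110, 117, 103, 109, 117],
    "Maxpulse": [130, 145, 135, 175, 148],
    "Calories": [409.1, 479.0, 340.0, 282.4, 406.0],
})

fig, (ax_scatter, ax_pie, ax_hexbin) = plt.subplots(1, 3, figsize=(15, 4))

# scatter: one point per row, two numeric columns
sample.plot.scatter(x="Duration", y="Calories", ax=ax_scatter, title="scatter")

# pie: share of rows for each distinct Duration value
sample["Duration"].value_counts().plot.pie(ax=ax_pie, title="pie")

# hexbin: 2-D histogram with hexagonal bins (more useful on larger data)
sample.plot.hexbin(x="Pulse", y="Maxpulse", gridsize=10, ax=ax_hexbin, title="hexbin")
```

In a notebook, replacing `sample` with `df` and ending the cell with plt.show() would display the figure.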
