
Learn Data Analysis With Pandas - Introduction

Pandas is a Python library used for data analysis and manipulation. It allows data to be stored and manipulated in data structures called DataFrames. DataFrames can be created from dictionaries, lists, CSV files or SQL databases. Specific rows can be selected using logical operators. The apply() function transforms columns by applying a function to each value. New columns can be added to existing DataFrames.


Introduction to Pandas
Pandas DataFrame creation
The fundamental Pandas object is called a DataFrame. It is a 2-dimensional, size-mutable, potentially heterogeneous, tabular data structure.
A DataFrame can be created in multiple ways: by passing a dictionary or a list of lists to the pd.DataFrame() constructor, or by reading data from a CSV file.

# Ways of creating a Pandas DataFrame
import pandas as pd

# Passing in a dictionary:
data = {'name': ['Anthony', 'Maria'], 'age': [30, 28]}
df = pd.DataFrame(data)

# Passing in a list of lists:
data = [['Tom', 20], ['Jack', 30], ['Meera', 25]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Reading data from a csv file:
df = pd.read_csv('students.csv')
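
The three creation routes above can be compared side by side. The following is a minimal sketch, not part of the original cheatsheet: the CSV content is made up for illustration, and io.StringIO stands in for a real file such as students.csv so the example runs without any file on disk.

```python
import io

import pandas as pd

# 1. From a dictionary: each key becomes a column name.
from_dict = pd.DataFrame({'Name': ['Tom', 'Jack', 'Meera'],
                          'Age': [20, 30, 25]})

# 2. From a list of lists: each inner list is one row,
#    so the column names must be supplied separately.
from_rows = pd.DataFrame([['Tom', 20], ['Jack', 30], ['Meera', 25]],
                         columns=['Name', 'Age'])

# 3. From CSV text. StringIO is an assumption standing in for a file path;
#    the first line of the text is read as the header row.
csv_text = "Name,Age\nTom,20\nJack,30\nMeera,25\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# All three hold the same three rows and two columns.
print(from_dict.shape, from_rows.shape, from_csv.shape)
print(list(from_csv.columns))
```

Note that the dictionary form carries its own column names, while the list-of-lists form relies on the columns argument.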

Pandas
Pandas is an open source library that is used to analyze data in Python. It takes in data, such as a CSV file or a SQL database, and creates an object with rows and columns called a DataFrame. Pandas is typically imported with the alias pd.

import pandas as pd
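
Reading from a SQL database might look like the sketch below. It is an illustration only: the in-memory SQLite database, the students table and its rows are all assumptions made up for this example, not something from the original cheatsheet.

```python
import sqlite3

import pandas as pd

# Build a throwaway in-memory database (hypothetical table and rows).
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [('Anthony', 30), ('Maria', 28)])
conn.commit()

# read_sql_query runs the query and returns the result as a DataFrame.
students = pd.read_sql_query(
    "SELECT name, age FROM students ORDER BY name", conn)
conn.close()

print(students)
```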

Selecting Pandas DataFrame rows using logical operators


In pandas, specific rows can be selected if they satisfy certain conditions, written with comparison operators such as >, < and !=. The result is a DataFrame that is a subset of the original DataFrame.
Multiple conditions can be combined with OR (using | ) and AND (using & ), and each condition must be enclosed in parentheses.

# Selecting rows where age is over 20
df[df.age > 20]

# Selecting rows where name is not John
df[df.name != "John"]

# Selecting rows where age is less than 10
# OR greater than 70
df[(df.age < 10) | (df.age > 70)]
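
To make the row selection above concrete, here is a small sketch on made-up data (the names and ages are assumptions for illustration). It also shows the intermediate boolean mask that each condition produces, and an AND combination.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'name': ['Ann', 'John', 'Raj', 'Lena'],
                   'age': [8, 35, 72, 19]})

# A condition evaluates to a boolean Series, one True/False per row.
mask = df['age'] > 20

# Indexing with that mask keeps only the rows where it is True.
over_20 = df[mask]

# Rows where name is not John.
not_john = df[df['name'] != 'John']

# OR: age under 10 or over 70. Each condition is parenthesized.
young_or_old = df[(df['age'] < 10) | (df['age'] > 70)]

# AND: over 18 and not named John.
adults_not_john = df[(df['age'] > 18) & (df['name'] != 'John')]

print(over_20)
print(young_or_old)
print(adults_not_john)
```

The parentheses matter because | and & bind more tightly than comparisons in Python; without them, an expression such as df['age'] < 10 | df['age'] > 70 would be grouped differently and typically raises an error.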

Pandas apply() function
The Pandas apply() function can be used to apply a function to every value in a column, or to every row or column of a DataFrame, and to transform that column or row into the resulting values.
Called on a single column, apply() passes each value of that column to the function. Called on a whole DataFrame, it passes each column to the function by default; to pass each row instead, specify the argument axis=1 in the apply() function call.

# This function doubles the input value
def double(x):
    return 2*x

# Apply this function to double every value in a specified column
df.column1 = df.column1.apply(double)

# Lambda functions can also be supplied to `apply()`
df.column2 = df.column2.apply(lambda x: 3*x)

# Applying to a row requires it to be called on the entire DataFrame
df['newColumn'] = df.apply(
    lambda row: row['column1'] * 1.5 + row['column2'],
    axis=1
)
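
The difference between applying to a column, to every column, and to every row can be seen in the sketch below. The data values are made up for illustration; the column names follow the ones used above.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [10, 20, 30]})


def double(x):
    """Return twice the input value."""
    return 2 * x


# On a single column, the function receives one value at a time.
doubled = df['column1'].apply(double)

# On a DataFrame with the default axis=0, the function receives
# one whole column at a time.
col_totals = df.apply(lambda col: col.sum())

# With axis=1, the function receives one whole row at a time.
row_totals = df.apply(lambda row: row['column1'] + row['column2'], axis=1)

# For simple arithmetic, plain column operations give the same result
# as the row-wise apply, without a Python-level call per row.
vectorized = df['column1'] + df['column2']

print(doubled.tolist())
print(col_totals.to_dict())
print(row_totals.tolist())
```

A row-wise apply() is most useful when the per-row logic cannot easily be expressed as column arithmetic.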

Pandas DataFrames adding columns


Pandas DataFrames allow for the addition of columns after the DataFrame has already been created, by using the format df['newColumn'] and setting it equal to the new column's values. When the values are given as a list, the list must have one entry per row of the DataFrame.

# Specifying each value in the new column:
df['newColumn'] = [1, 2, 3, 4]

# Setting each row in the new column to the same value:
df['newColumn'] = 1

# Creating a new column by doing a
# calculation on an existing column:
df['newColumn'] = df['oldColumn'] * 5
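
The three ways of adding a column can be tried together in the sketch below. The DataFrame contents and the column names fromList, constant, timesFive and tooShort are assumptions chosen for illustration. The last step shows what happens when a list does not have one value per row.

```python
import pandas as pd

# Hypothetical four-row DataFrame.
df = pd.DataFrame({'oldColumn': [1, 2, 3, 4]})

# One explicit value per row.
df['fromList'] = [10, 20, 30, 40]

# A single value is repeated for every row.
df['constant'] = 1

# A calculation on an existing column is applied element by element.
df['timesFive'] = df['oldColumn'] * 5

# A list with the wrong number of values is rejected.
try:
    df['tooShort'] = [1, 2]
    length_error = None
except ValueError as exc:
    length_error = exc

print(df)
print(type(length_error).__name__)
```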
