Python Libraries
Create a Module
To create a module, save the code you want in a file with the file
extension .py:
Example
Save this code in a file named mymodule.py
def greeting(name):
    print("Hello, " + name)
Use a Module
Now we can use the module we just created, by using the import statement:
Example
Import the module named mymodule, and call the greeting function:
import mymodule
mymodule.greeting("Jonathan")
Variables in Module
A module can contain functions, as already described, but also variables of
all types (lists, dictionaries, objects, etc.):
Example
Save this code in the file mymodule.py
person1 = {
    "name": "John",
    "age": 36,
    "country": "Norway"
}
Example
Import the module named mymodule, and access the person1 dictionary:
import mymodule
a = mymodule.person1["age"]
print(a)
Re-naming a Module
You can create an alias when you import a module, by using the as keyword:
Example
Create an alias for mymodule called mx:
import mymodule as mx
a = mx.person1["age"]
print(a)
Introduction to Matplotlib
Import matplotlib
Pandas
Pandas is an open-source library built on top of the NumPy library. It is a Python
package that offers various data structures and operations for manipulating numerical
data and time series. It is popular mainly because it makes importing and analysing
data much easier. Pandas is fast and offers high performance and productivity for users.
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular
fashion in rows and columns. A DataFrame can hold any number of records, and we
can perform many operations on them, such as arithmetic operations, column/row
selection, and column/row addition.
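As a rough illustration of the operations just listed, the sketch below builds a small DataFrame and applies an arithmetic operation, a column selection, a row selection, and a column addition. The column names and values are hypothetical and chosen only for the example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"calories": [420, 380, 390],
                   "duration": [50, 40, 45]})

# arithmetic operation applied to a whole column
doubled = df["duration"] * 2

# column selection (by name) and row selection (by position)
calories = df["calories"]
first_row = df.iloc[0]

# column addition: a new column computed from existing ones
df["calories_per_minute"] = df["calories"] / df["duration"]

print(df)
```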
Creating an empty dataframe :
The most basic DataFrame that can be created is an empty DataFrame. An empty
DataFrame is created by calling the DataFrame constructor with no arguments.
# import pandas as pd
import pandas as pd
# calling the DataFrame constructor with no arguments
df = pd.DataFrame()
print(df)
Output :
Empty DataFrame
Columns: []
Index: []
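Creating a dataframe using a list :
import pandas as pd
# list of strings
lst = ['Hello', 'This', 'is', 'python', 'Class', 'BMS']
# calling the DataFrame constructor on the list
df = pd.DataFrame(lst)
print(df)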
Output:
0
0 Hello
1 This
2 is
3 python
4 Class
5 BMS
Example
Create a simple Pandas DataFrame:
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
# load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
Result
   calories  duration
0       420        50
1       380        40
2       390        45
Locate Row
As you can see from the result above, the DataFrame is like a table with
rows and columns.
Pandas uses the loc attribute to return one or more specified rows.
Example
Return row 0:
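A minimal sketch of this lookup, reusing the calories/duration data from the earlier example. The second call, which passes a list of labels, is an addition to show how more than one row can be returned.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)

# refer to the row index: a single label returns that row as a Series
row0 = df.loc[0]
print(row0)

# a list of labels returns the selected rows as a DataFrame
rows01 = df.loc[[0, 1]]
print(rows01)
```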
Named Indexes
With the index argument, you can name your own indexes.
Example
Add a list of names to give each row a name:
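import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
# pass the row names through the index argument
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)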
Result
      calories  duration
day1       420        50
day2       380        40
day3       390        45
Example
Return "day2":
#refer to the named index:
print(df.loc["day2"])
Result
calories    380
duration     40
Name: day2, dtype: int64
CSV files contain plain text and are a well-known format that can be read by
everyone, including Pandas.
Example
Load a comma separated file (CSV file) into a DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
# or, if the CSV file is stored at a URL:
df = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")
print(df.to_string())
If you print a large DataFrame with many rows without to_string(), Pandas will
only show the first 5 rows and the last 5 rows.
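To see the difference between the default printout and to_string(), the sketch below uses a hypothetical 100-row DataFrame (the column name value is made up) and compares how many lines each form produces. It assumes pandas' default display options have not been changed.

```python
import pandas as pd

# Hypothetical DataFrame with 100 rows, more than the default display limit
df = pd.DataFrame({"value": range(100)})

short_view = str(df)         # the text that print(df) shows (shortened)
full_view = df.to_string()   # every row, one per line

print(short_view)
print(len(short_view.splitlines()), "lines in the default view")
print(len(full_view.splitlines()), "lines with to_string()")
```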
Note that when a new column is added from a list, the length of the list must match
the length of the index; otherwise pandas raises an error.
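A minimal sketch of this rule, assuming a hypothetical DataFrame of names and heights: a list with one value per row is accepted as a new column, while a shorter list raises a ValueError.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav", "Anuj"],
                   "Height": [5.1, 6.2, 5.1, 5.2]})

# a list with one value per row becomes a new column at the end
df["Address"] = ["Delhi", "Bangalore", "Chennai", "Patna"]
print(df)

# a list of the wrong length is rejected with a ValueError
length_error = None
try:
    df["Age"] = [21, 23]
except ValueError as err:
    length_error = err
    print("Error:", err)
```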
By using DataFrame.insert()
It gives the freedom to add a column at any position we like and not just at the end. It
also provides different options for inserting the column values.
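A short sketch of DataFrame.insert(), again with hypothetical data. The first argument is the position of the new column, the second its name, and the third its values, which may be a list or a single value repeated for every row.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav", "Anuj"],
                   "Height": [5.1, 6.2, 5.1, 5.2]})

# insert(loc, column, value): place "Age" at position 1, between Name and Height
df.insert(1, "Age", [21, 23, 24, 21])

# a single scalar is also accepted and is repeated for every row
df.insert(0, "Team", "A")

print(df)
```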
By using a dictionary
We can use a Python dictionary to add a new column in pandas DataFrame. Use an
existing column as the key values and their respective values will be the values for a
new column.
# Import pandas package
import pandas as pd
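A minimal sketch of the dictionary approach, assuming hypothetical names and cities. It uses Series.map, which is one way (not necessarily the one originally intended here) to look up each value of an existing column in a dictionary and use the result as the new column.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav", "Anuj"],
                   "Height": [5.1, 6.2, 5.1, 5.2]})

# keys are values of the existing "Name" column,
# values are the entries for the new column
address = {"Jai": "Delhi", "Princi": "Bangalore",
           "Gaurav": "Patna", "Anuj": "Chennai"}

# Series.map looks up each name in the dictionary
df["Address"] = df["Name"].map(address)

print(df)
```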
The pandas head() method returns the top n (5 by default) rows of a DataFrame or
Series. It returns the headers and the specified number of rows,
starting from the top.
Syntax: DataFrame.head(n=5)
Parameters:
n: integer value, number of rows to be returned
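import pandas as pd
# load a CSV file into a DataFrame (file name assumed, as in the read_csv example)
data = pd.read_csv('data.csv')
# take the first 5 rows
data_top = data.head()
# display
print(data_top)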
There is also a tail() method for viewing the last rows of the DataFrame.
The tail() method returns the headers and a specified number of rows,
starting from the bottom.
print(df.tail())
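The behaviour of head() and tail() can be checked on a small made-up DataFrame. The sketch below assumes a hypothetical single column n holding the numbers 1 to 20.

```python
import pandas as pd

# Hypothetical DataFrame with 20 rows
df = pd.DataFrame({"n": range(1, 21)})

top_default = df.head()    # first 5 rows (default n=5)
top_three = df.head(3)     # first 3 rows
bottom_two = df.tail(2)    # last 2 rows

print(top_three)
print(bottom_two)
```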
Rows or columns can be removed by index label or column name using the drop()
method.
Syntax:
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None,
inplace=False, errors='raise')
Parameters:
labels: String or list of strings referring to row or column names.
axis: int or string value, 0 or 'index' for rows and 1 or 'columns' for columns.
index or columns: Single label or list. These are an alternative to passing labels
with axis and cannot be combined with it.
level: Used to specify the level when the DataFrame has a multi-level index.
inplace: Makes the changes in the original DataFrame if True.
errors: When errors='ignore', labels that do not exist are skipped and the rest
are dropped.
Return type: DataFrame with the dropped values removed
In this code, a list of index labels is passed and the rows corresponding to those labels
are dropped using the .drop() method.
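# importing pandas module
import pandas as pd
# load the data (file name assumed, as in the read_csv example)
data = pd.read_csv('data.csv')
# dropping the rows whose index labels are passed in a list
# (labels 0 and 1 are assumed here, i.e. the default integer index)
data = data.drop([0, 1])
# display
print(data.head())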
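A self-contained sketch of drop(), using a small hypothetical DataFrame so that it runs without any data file. The row labels, column names, and values are made up for the example.

```python
import pandas as pd

# Hypothetical data with names as the index labels
data = pd.DataFrame(
    {"Team": ["A", "B", "A", "C"], "Age": [25, 27, 22, 29]},
    index=["Avery", "John", "Jonas", "Marcus"],
)

# drop rows by passing a list of index labels (returns a new DataFrame)
rows_dropped = data.drop(["Avery", "John"])

# drop a column by name
cols_dropped = data.drop(columns=["Age"])

# errors='ignore' skips labels that do not exist instead of raising
ignored = data.drop(["Nobody"], errors="ignore")

print(rows_dropped)
print(cols_dropped)
```

Because inplace defaults to False, data itself is unchanged after these calls.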
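# frames is a list of the DataFrames to combine, e.g. [df, df1] (assumed)
frames = [df, df1]
res1 = pd.concat(frames)
res1
# DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same result
res = pd.concat([df, df1])
res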
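A runnable sketch of concatenation, assuming two small hypothetical DataFrames with the same columns. It also shows the ignore_index option, which renumbers the rows of the result.

```python
import pandas as pd

# Hypothetical frames with identical columns
df = pd.DataFrame({"Name": ["Jai", "Princi"], "Age": [27, 24]})
df1 = pd.DataFrame({"Name": ["Abhi", "Ayushi"], "Age": [17, 14]})

# stack the rows of both frames; the original index labels are kept
res1 = pd.concat([df, df1])

# ignore_index=True renumbers the rows from 0
res = pd.concat([df, df1], ignore_index=True)

print(res1)
print(res)
```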
Merging DataFrame
Pandas has options for high-performance in-memory merging and joining. When
we need to combine very large DataFrames, joins serve as a powerful way to
perform these operations swiftly. A join is performed on two DataFrames at a
time, denoted as the left and right tables. The key is the common column that the two
DataFrames are joined on. It is good practice to use keys that have unique
values throughout the column to avoid unintended duplication of row values. Pandas
provides a single function, merge(), as the entry point for all standard database join
operations between DataFrame objects.
There are four basic ways to handle the join (inner, left, right, and outer), depending
on which rows must retain their data.
Merge method   SQL join name      Description
left           LEFT OUTER JOIN    Use keys from left frame only
right          RIGHT OUTER JOIN   Use keys from right frame only
outer          FULL OUTER JOIN    Use union of keys from both frames
inner          INNER JOIN         Use intersection of keys from both frames
import pandas as pd
df = pd.DataFrame(data1)
df1 = pd.DataFrame(data2)
Now we set how = 'left' in order to use keys from left frame only.
# using keys from left frame (merges on the columns the two frames share)
res = pd.merge(df, df1, how='left')
res
Now we set how = 'right' in order to use keys from right frame only.
# using keys from right frame (merges on the columns the two frames share)
print(pd.merge(df, df1, how='right'))
Now we set how = 'outer' in order to get union of keys from dataframes.
# getting union of keys (merges on the columns the two frames share)
res2 = pd.merge(df, df1, how='outer')
res2
Now we set how = 'inner' in order to get intersection of keys from dataframes.
# getting intersection of keys (merges on the columns the two frames share)
print(pd.merge(df, df1, how='inner'))
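The four how settings can also be compared side by side. The sketch below assumes two small hypothetical frames, named left and right here, that share a column called key. It records which keys survive each kind of merge.

```python
import pandas as pd

# Hypothetical frames sharing a "key" column
left = pd.DataFrame({"key": ["K0", "K1", "K2"],
                     "Name": ["Jai", "Princi", "Gaurav"]})
right = pd.DataFrame({"key": ["K1", "K2", "K3"],
                      "Address": ["Delhi", "Patna", "Pune"]})

# run the same merge with each join type and collect the resulting keys
keys = {}
for how in ["left", "right", "outer", "inner"]:
    res = pd.merge(left, right, how=how, on="key")
    keys[how] = res["key"].tolist()
    print(how, "->", keys[how])
```

Rows that have no match on the other side are kept by left, right, and outer merges, with the missing columns filled with NaN.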
Joining DataFrame
To join DataFrames, we use the .join() function, which combines the columns of
two potentially differently-indexed DataFrames into a single result
DataFrame.
# importing pandas module
import pandas as pd
res = df.join(df1)
res
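A self-contained sketch of join(), assuming two hypothetical DataFrames with different but overlapping index labels. By default join() keeps every row of the calling DataFrame. The how argument changes which index labels are kept.

```python
import pandas as pd

# Hypothetical frames with overlapping index labels
df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav"]},
                  index=["K0", "K1", "K2"])
df1 = pd.DataFrame({"Address": ["Delhi", "Patna"]},
                   index=["K0", "K2"])

# default (left) join: all rows of df, NaN where df1 has no matching label
res = df.join(df1)

# inner join: only the index labels present in both frames
res_inner = df.join(df1, how="inner")

print(res)
print(res_inner)
```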
import pandas as pd
df = pd.DataFrame(data1)
import pandas as pd
names=['key', 'Y'])
# multi indexed
result