Python Abstract
Abstract
In this post, we provide an overview of the common functionality of NumPy and
Pandas, and we will see how closely these libraries resemble existing toolboxes in R and
MATLAB. This similarity, together with the added flexibility, has led to wide acceptance of
Python in the scientific community in recent years. The topics covered in this post are:
1. Overview of NumPy
2. Overview of Pandas
3. Using Matplotlib
This post is an excerpt from a live hands-on training conducted by CloudxLab on 25th
Nov 2017. It was attended by more than 100 learners from around the globe. The
participants came from the United States, Canada, Australia, Indonesia,
India, Thailand, Philippines, Malaysia, Macao, Japan, Hong Kong, Singapore, United
Kingdom, Saudi Arabia, Nepal, and New Zealand.
NumPy
NumPy’s main object is the homogeneous multidimensional array. It is a table of elements
that all have the same type (homogeneous), for example all integers or all strings; in
practice the elements are usually numbers. In NumPy, dimensions are called axes. The
number of axes is called the rank.
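As a minimal sketch of these terms, the snippet below builds a small two-axis array and inspects it. The variable names and values are illustrative assumptions, not taken from the training material; ndim, shape, and dtype are standard ndarray attributes.

```python
import numpy as np

# A 2 x 3 array: two axes, the first of length 2 and the second of length 3.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

rank = a.ndim      # number of axes
shape = a.shape    # length of each axis
dtype = a.dtype    # the single element type shared by every entry

# Mixing an int and a float upcasts both to a float type,
# so the array stays homogeneous.
mixed = np.array([1, 2.5])

print(rank, shape, dtype, mixed.dtype)
```

The upcast in the last array is what "homogeneous" means in practice: NumPy picks one element type that can hold every value.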
There are several ways to create an array in NumPy, such as np.array, np.zeros, and np.ones.
Each of them provides some flexibility.
Command to create an array, followed by an example:

np.array
    [...]
    <type 'numpy.ndarray'>
    >>> type(b)
    <type 'numpy.ndarray'>

np.ones
    [...]
           [ 1, 1, 1, 1],
           [ 1, 1, 1, 1]])

np.random.rand(2,3)
    >>> np.random.rand(2,3)
    array([[ 0.55365951, 0.60150511, 0.36113117],
           [ 0.5388662 , 0.06929014, 0.07908068]])

np.empty((2,3))
    >>> np.empty((2,3))
    array([[ 0.21288689, 0.20662218, 0.78018623],
           [ 0.35294004, 0.07347101, 0.54552084]])
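The creation routines named above can be tried together in one short script. This is only a sketch: the shapes and variable names are assumptions chosen for illustration, and the random and empty arrays will hold different values on every run.

```python
import numpy as np

a = np.array([1, 2, 3, 4])   # built from an existing Python list
z = np.zeros((2, 3))         # 2 x 3 array filled with 0.0
o = np.ones((3, 4))          # 3 x 4 array filled with 1.0
r = np.random.rand(2, 3)     # 2 x 3 uniform samples from [0, 1)
e = np.empty((2, 3))         # 2 x 3 array, uninitialized (arbitrary contents)

print(type(a))
print(o)
print(r.shape, e.shape)
```

Note that np.zeros, np.ones, and np.empty take the shape as a single tuple, while np.random.rand takes each dimension as a separate argument. np.empty skips initialization, so its contents should be overwritten before use.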
Pandas
Like NumPy, Pandas is one of the most widely used Python libraries in data science. It
provides high-performance, easy-to-use data structures and data analysis tools. Whereas NumPy
provides objects for multi-dimensional arrays, Pandas provides an in-memory 2D
table object called a DataFrame. It is like a spreadsheet with column names and row labels.
Built on these 2D tables, Pandas offers many additional features, such as
creating pivot tables, computing columns based on other columns, and plotting graphs.
Pandas can be imported into Python using:
>>> import pandas as pd
A Pandas Series object is created using the pd.Series function. Each row has an index
label, which by default is an integer starting from 0. Like NumPy, Pandas also
provides basic mathematical functionality such as addition, subtraction, conditional
operations, and broadcasting.
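A short sketch of the Series behaviour described above follows. The values and variable names are made up for illustration; only pd.Series and the standard arithmetic and comparison operators are used.

```python
import pandas as pd

s = pd.Series([2, -1, 3, 5])   # default index: 0, 1, 2, 3

doubled = s * 2                            # the scalar is broadcast to every element
total = s + pd.Series([10, 20, 30, 40])    # elementwise addition, aligned on the index
negative = s < 0                           # conditional operation gives a boolean Series

# An explicit index replaces the default integer labels.
labeled = pd.Series([68, 83], index=["alice", "bob"])

print(doubled.tolist(), total.tolist(), negative.tolist(), labeled["bob"])
```

Arithmetic between two Series matches elements by index label rather than by position, which is the main difference from plain NumPy arrays.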
A Pandas DataFrame object represents a spreadsheet with cell values, column names, and row
index labels. A DataFrame can be thought of as a dictionary of Series. DataFrame rows and
columns are simple and intuitive to access. Pandas also provides SQL-like functionality to
filter and sort rows based on conditions. For example,
>>> people_dict = {
...     "weight": pd.Series([68, 83, 112], index=["alice", "bob", "charles"]),
...     "birthyear": pd.Series([1984, 1985, 1992], index=["bob", "alice", "charles"], name="year"),
...     "children": pd.Series([0, 3], index=["charles", "bob"]),
...     "hobby": pd.Series(["Biking", "Dancing"], index=["alice", "bob"]),
... }
>>> people = pd.DataFrame(people_dict)
>>> people
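To make the access and filtering claims concrete, here is a hedged sketch that reuses two of the columns from the example above. The threshold of 80 and the variable names are assumptions for illustration.

```python
import pandas as pd

# Two Series with different indexes: the DataFrame aligns them by label
# and fills the missing combination with NaN.
people = pd.DataFrame({
    "weight": pd.Series([68, 83, 112], index=["alice", "bob", "charles"]),
    "children": pd.Series([0, 3], index=["charles", "bob"]),
})

weights = people["weight"]             # one column, returned as a Series
bob = people.loc["bob"]                # one row, selected by label
heavy = people[people["weight"] > 80]  # SQL-like filter using a boolean mask

print(people)
print(heavy)
```

Because alice has no entry in the children Series, her cell in that column is NaN, which is how Pandas marks missing data.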
New columns and rows can easily be added to a DataFrame. In addition to this basic
functionality, a DataFrame can be sorted by a particular column.
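The sketch below illustrates adding a column, adding a row, and sorting. The height column, the row labelled dana, and all of their values are hypothetical and exist only for this example.

```python
import pandas as pd

people = pd.DataFrame(
    {"weight": [68, 83, 112], "height": [172, 181, 185]},  # height in cm is hypothetical
    index=["alice", "bob", "charles"],
)

# New column computed from existing columns (body mass index = kg / m^2).
people["bmi"] = people["weight"] / (people["height"] / 100) ** 2

# New row assigned by label; the values follow the column order.
people.loc["dana"] = [59, 165, 59 / 1.65 ** 2]

# Sort by a particular column, heaviest first.
by_weight = people.sort_values(by="weight", ascending=False)

print(by_weight)
```

sort_values returns a new DataFrame and leaves the original row order untouched.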
DataFrames can also be easily exported to and imported from CSV, Excel, JSON, HTML, and
SQL databases. Some other essential methods that are present in DataFrames are:
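As one illustration of the export and import support mentioned above, the following sketch round-trips a small DataFrame through CSV text and also serializes it to JSON. It uses an in-memory buffer instead of a file so it can run anywhere; the data and variable names are assumptions.

```python
import io

import pandas as pd

people = pd.DataFrame(
    {"weight": [68, 83, 112], "birthyear": [1985, 1984, 1992]},
    index=["alice", "bob", "charles"],
)

# With no path argument, to_csv returns the CSV text as a string.
csv_text = people.to_csv()

# Read it back, using the first column as the row index.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

# JSON export works the same way when no path is given.
json_text = people.to_json()

print(csv_text)
print(restored)
```

Passing a file path to to_csv or read_csv instead of a buffer writes to or reads from disk.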
Submitted by:
Shambhavi b
Usha M
Lakshmi s nandi