Notes On Pandas
manpreet
Before Pandas, Python was used mainly for data munging and
preparation and contributed little to data analysis itself. Pandas
closed this gap. With Pandas, we can carry out the five typical steps
in processing and analyzing data, regardless of where the data comes from:
load, prepare, manipulate, model, and analyze.
Pandas provides tools for loading data into in-memory data objects from
different file formats.
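As a small illustration of loading data into an in-memory object, here is a minimal sketch using pd.read_csv. The CSV text and the column names (name, age) are made up for this example, and the data is read from a string so that no file on disk is needed.

```python
import io

import pandas as pd

# Hypothetical CSV content (not from the notes), kept in memory.
csv_text = "name,age\nAlice,30\nBob,25\n"

# read_csv accepts a file path or any file-like object; StringIO wraps the string.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (2, 2)
print(list(df.columns))  # ['name', 'age']
```

Other formats follow the same pattern with their own reader functions, for example pd.read_json or pd.read_excel.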
Windows
Anaconda (from https://www.continuum.io) is a free Python distribution for
the SciPy stack. It is also available for Linux and Mac.
Pandas deals with the following three data structures:
Series
DataFrame
Panel
These data structures are built on top of NumPy arrays, which makes
them fast.
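To see the NumPy connection, the sketch below pulls the underlying array out of a Series and runs a vectorized operation on it. It assumes a reasonably recent pandas (to_numpy was added in version 0.24), and the values are arbitrary.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# The data behind a Series can be exposed as a NumPy array.
arr = s.to_numpy()
print(isinstance(arr, np.ndarray))  # True

# Arithmetic is applied element-wise without an explicit Python loop.
doubled = s * 2
print(doubled.tolist())  # [2, 4, 6]
```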
Series
A Series is a one-dimensional array-like structure with homogeneous data.
For example, the following series is a collection of integers 10, 23, 56, …
10 23 56 17 52 61 73 90 26 72
Key Points
Homogeneous data
Size Immutable
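The series of integers shown above can be built as follows. This is a minimal sketch: it only checks that all values share one integer dtype and that an existing value can be changed in place.

```python
import pandas as pd

# The ten integers listed in the example above.
s = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])

print(len(s))        # 10
print(s.dtype.kind)  # 'i' (a single integer dtype for every element)

# Existing values can be updated by position.
s.iloc[0] = 11
print(s.iloc[0])     # 11
```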
DataFrame
A DataFrame is a two-dimensional, table-like structure whose columns can
hold different data types. For example,
Column    Type
Name      String
Age       Integer
Gender    String
Rating    Float
Key Points
Heterogeneous data
Size Mutable
Data Mutable
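A DataFrame with the four columns from the table above might look like the sketch below. The row values and the extra Senior column are invented for illustration; only the column names and their types come from the notes.

```python
import pandas as pd

# Made-up rows; column names and types follow the table above.
df = pd.DataFrame({
    "Name": ["Asha", "Ben"],
    "Age": [32, 28],
    "Gender": ["Female", "Male"],
    "Rating": [3.45, 4.6],
})

# Each column keeps its own dtype (heterogeneous data).
print(df.dtypes)

# Size mutable: a new column can be added (Senior is a hypothetical name).
df["Senior"] = df["Age"] > 30

# Data mutable: a single cell can be overwritten.
df.loc[0, "Rating"] = 3.9

print(df.shape)  # (2, 5)
```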
Panel
A Panel is a three-dimensional data structure with heterogeneous data. A
panel is hard to depict graphically, but it can be pictured as a container
of DataFrames. Note that Panel was deprecated in pandas 0.20 and removed in
pandas 0.25, so it is not available in current releases.
Key Points
Heterogeneous data
Size Mutable
Data Mutable
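The "container of DataFrame" picture can be sketched without the Panel class, which recent pandas releases do not provide. The example below is one possible stand-in, not the Panel API: it keeps two DataFrames in a dict and stacks them with pd.concat so that the dict keys become an outer index level. The item names and values are hypothetical.

```python
import pandas as pd

# Hypothetical items: each key maps to one DataFrame (one "slice" of the panel).
frames = {
    "item1": pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]}),
    "item2": pd.DataFrame({"a": [5, 6], "b": [7.0, 8.0]}),
}

# Passing a dict to concat uses its keys as the outer level of a two-level index.
stacked = pd.concat(frames)

print(stacked.shape)          # (4, 2)
print(stacked.index.nlevels)  # 2

# Select one item to get back its DataFrame.
print(stacked.loc["item2"])
```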