
Notes On Pandasmanpreet

Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools using powerful data structures. It was created in 2008 by Wes McKinney to enable high performance and flexible analysis of data. Pandas allows users to load, prepare, manipulate, model and analyze data regardless of source using its core data structures: Series, DataFrames, and Panels. Pandas is widely used in domains like finance, economics, statistics and analytics.

Introduction to Python Pandas

Pandas is an open-source Python library providing high-performance
data manipulation and analysis tools built on its powerful data structures.
The name Pandas is derived from "panel data", an econometrics term
for multidimensional data.

In 2008, developer Wes McKinney started developing pandas when he
needed a high-performance, flexible tool for data analysis.

Prior to Pandas, Python was mainly used for data munging and
preparation and contributed very little to data analysis itself. Pandas
solved this problem. Using Pandas, we can accomplish five typical steps
in the processing and analysis of data, regardless of the origin of the
data: load, prepare, manipulate, model, and analyze.

Python with Pandas is used in a wide range of academic and commercial
fields, including finance, economics, statistics, and analytics.

Key Features of Pandas

• Fast and efficient DataFrame object with default and customized indexing.

• Tools for loading data into in-memory data objects from different file formats.

• Data alignment and integrated handling of missing data.

• Reshaping and pivoting of data sets.

• Label-based slicing, indexing and subsetting of large data sets.

• Columns can be inserted into or deleted from a data structure.

• Group by data for aggregation and transformations.

• High-performance merging and joining of data.

• Time series functionality.
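
A few of the features listed above can be sketched in one short script. This is only an illustration, not part of the original notes: the `sales` and `regions` tables and their values are invented sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration.
sales = pd.DataFrame(
    {
        "rep": ["Steve", "Lia", "Steve", "Lia"],
        "amount": [100.0, np.nan, 250.0, 80.0],
    }
)
regions = pd.DataFrame({"rep": ["Steve", "Lia"], "region": ["East", "West"]})

# Handling of missing data: replace the NaN amount with 0.
sales["amount"] = sales["amount"].fillna(0.0)

# Group by a column and aggregate the amounts per rep.
totals = sales.groupby("rep", as_index=False)["amount"].sum()

# Merge (join) the totals with the regions table on the shared key.
report = totals.merge(regions, on="rep")

# Label-based subsetting: rows by condition, columns by name.
east = report.loc[report["region"] == "East", ["rep", "amount"]]

print(report)
print(east)
```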

If you install the Anaconda Python distribution, Pandas is installed by
default with the following −

Windows
• Anaconda (from https://www.continuum.io) is a free Python distribution for
the SciPy stack. It is also available for Linux and Mac.

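
One simple way to confirm that Pandas is available after installation is to import it and print its version. The snippet assumes nothing beyond a working Python environment with Pandas installed.

```python
import pandas as pd

# Prints the version string of the installed pandas package.
print(pd.__version__)
```

If the import fails with ModuleNotFoundError, Pandas is not installed in the active environment.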
Pandas deals with the following three data structures −

• Series

• DataFrame

• Panel (older pandas releases only)

These data structures are built on top of NumPy arrays, which means they
are fast.
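
The NumPy connection can be seen by asking a Series for its underlying array. This is a small sketch with made-up values. It holds for plain numeric data, while other dtypes may be stored differently.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 23, 56])

# to_numpy() exposes the values as a NumPy array.
values = s.to_numpy()

print(type(values))
print(values.sum())
```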

Data Structure   Dimensions   Description

Series           1            1D labeled homogeneous array, size-immutable.

DataFrame        2            General 2D labeled, size-mutable tabular structure
                              with potentially heterogeneously typed columns.

Panel            3            General 3D labeled, size-mutable array.

Series
Series is a one-dimensional array-like structure with homogeneous data.
For example, the following series is a collection of integers 10, 23, 56, …

10 23 56 17 52 61 73 90 26 72
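
The series above could be built as follows. This is a minimal sketch: the variable name `s` is arbitrary and the index is the default one Pandas assigns.

```python
import pandas as pd

# The ten integers from the example, with the default 0..9 index.
s = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])

print(s)
print(s.ndim)     # number of dimensions
print(s.iloc[2])  # third value, selected by position
```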

Key Points

• Homogeneous data

• Size immutable

• Values of data mutable
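
Two of these points, a single data type and mutable values, can be checked with a short sketch. The labels "a", "b", "c" and the replacement value 99 are made up for illustration.

```python
import pandas as pd
from pandas.api import types as ptypes

s = pd.Series([10, 23, 56], index=["a", "b", "c"])

# Homogeneous: the whole Series has one dtype.
print(s.dtype)
print(ptypes.is_integer_dtype(s.dtype))

# Values are mutable: overwrite one element by label.
s.loc["b"] = 99
print(s.tolist())

# The length is unchanged by the assignment.
print(len(s))
```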

DataFrame
DataFrame is a two-dimensional array with heterogeneous data. For
example,

Name    Age   Gender   Rating

Steve   32    Male     3.45

Lia     28    Female   4.6

Vin     45    Male     3.9

Katie   38    Female   2.78

The table represents the data of a sales team of an organization with
their overall performance rating. The data is represented in rows and
columns. Each column represents an attribute and each row represents a
person.
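
The sales-team table can be written as a DataFrame like this. It is a minimal sketch that builds the table from a dictionary of columns, which is only one of several ways to construct a DataFrame.

```python
import pandas as pd

# Each dictionary key becomes a column; each list position becomes a row.
df = pd.DataFrame(
    {
        "Name": ["Steve", "Lia", "Vin", "Katie"],
        "Age": [32, 28, 45, 38],
        "Gender": ["Male", "Female", "Male", "Female"],
        "Rating": [3.45, 4.6, 3.9, 2.78],
    }
)

print(df)
print(df.shape)    # (rows, columns)
print(df["Age"])   # one column (an attribute)
print(df.iloc[1])  # one row (a person)
```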

Data Type of Columns

The data types of the four columns are as follows −

Column   Type

Name     String

Age      Integer

Gender   String

Rating   Float
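
The column types can be inspected with the dtypes attribute. In this sketch the exact label shown for string columns depends on the Pandas version (commonly `object` or a dedicated string type), so the checks use the helper functions in `pandas.api.types` rather than comparing names.

```python
import pandas as pd
from pandas.api import types as ptypes

df = pd.DataFrame(
    {
        "Name": ["Steve", "Lia", "Vin", "Katie"],
        "Age": [32, 28, 45, 38],
        "Gender": ["Male", "Female", "Male", "Female"],
        "Rating": [3.45, 4.6, 3.9, 2.78],
    }
)

# One dtype per column.
print(df.dtypes)

# Version-independent checks of the column kinds.
print(ptypes.is_string_dtype(df["Name"]))
print(ptypes.is_integer_dtype(df["Age"]))
print(ptypes.is_float_dtype(df["Rating"]))
```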

Key Points

• Heterogeneous data

• Size mutable

• Data mutable
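
Size and data mutability can be illustrated on the same sales-team table. The "Senior" column and the new rating of 4.8 are hypothetical and exist only for this sketch.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Steve", "Lia", "Vin", "Katie"],
        "Age": [32, 28, 45, 38],
        "Gender": ["Male", "Female", "Male", "Female"],
        "Rating": [3.45, 4.6, 3.9, 2.78],
    }
)

# Size mutable: insert a new column at position 2 ...
df.insert(2, "Senior", df["Age"] > 35)

# ... and delete an existing one.
del df["Gender"]

# Data mutable: overwrite a single value, selected by label.
df.loc[df["Name"] == "Lia", "Rating"] = 4.8  # hypothetical new rating

print(df)
```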

Panel
Panel is a three-dimensional data structure with heterogeneous data. It
is hard to represent a panel graphically, but it can be pictured as a
container of DataFrames. Note that Panel was deprecated in pandas 0.20.0
and removed in pandas 0.25.0, so it is available only in older releases.

Key Points

• Heterogeneous data

• Size mutable

• Data mutable
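
On pandas versions that do not ship Panel, the "container of DataFrames" idea can be approximated with a dictionary of DataFrames, optionally stacked into one DataFrame with a two-level index. This is a substitute technique, not the Panel API itself. The quarter labels "Q1" and "Q2" and the rating values are hypothetical.

```python
import pandas as pd

# Two DataFrames with the same rows and columns (hypothetical values).
q1 = pd.DataFrame({"Age": [32, 28], "Rating": [3.45, 4.6]}, index=["Steve", "Lia"])
q2 = pd.DataFrame({"Age": [32, 28], "Rating": [3.9, 4.1]}, index=["Steve", "Lia"])

# A plain dict acts as the container; each key adds a third dimension.
container = {"Q1": q1, "Q2": q2}

# Stack the frames; the dict keys become the outer index level.
stacked = pd.concat(container, names=["quarter", "name"])

print(stacked)
print(stacked.loc[("Q2", "Lia"), "Rating"])
```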
