Pandas

Pandas is a powerful Python library for data analysis and manipulation. It provides data structures like Series and DataFrame and tools for data wrangling, manipulation, merging, joining and visualization. Pandas allows importing and exporting data from various sources and formats and is useful for working with time series data.

Uploaded by

sumitpython.41
Copyright
© All Rights Reserved

Introduction to Pandas

Pandas is a flexible open-source library for data analysis and manipulation in Python. It
provides the data structures and functions needed to work with structured, labeled data such as
tables and time series. Wes McKinney began developing it in 2008. Pandas is built on top of NumPy
and is part of the broader ecosystem of Python data analysis tools.

Key Features of Pandas

1. Data Structures: Pandas introduces two primary data structures: Series and DataFrame.

 • Series: A one-dimensional labeled array capable of holding any data type. Labels,
known as the index, provide a way to access elements.

 • DataFrame: A two-dimensional labeled data structure with columns of potentially
different types. It's similar to a spreadsheet or SQL table.

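2. Data Alignment and Handling Missing Data: Pandas aligns data on index labels automatically
when objects are combined, and marks values that have no counterpart as missing (NaN). Methods
like isna(), dropna(), and fillna() detect, drop, or fill those values during data cleaning.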

3. Data Wrangling: Pandas provides powerful tools for merging, reshaping, selecting, and
filtering data. Operations like joining tables (using merge()), concatenation, and group
operations (groupby()) are straightforward and efficient.

4. Input/Output Tools: Pandas supports reading from and writing to various file formats like
CSV, Excel, SQL databases, and more, using methods such as read_csv(), read_excel(),
to_csv(), and to_sql().

5. Time Series Functionality: With robust time series capabilities, Pandas can handle frequency
conversion, moving window statistics, and date range generation.
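
The label alignment mentioned in feature 2 can be illustrated with a minimal sketch. The Series names and index labels here are made up for illustration; only standard pandas calls are used.

```python
import pandas as pd

# Two Series whose indexes only partly overlap (labels are illustrative).
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Arithmetic aligns on labels, not positions; a label present on only
# one side produces NaN in the result.
total = s1 + s2
print(total)

# add() with fill_value treats the missing side as 0 instead.
filled = s1.add(s2, fill_value=0)
print(filled)
```

Aligning on labels rather than positions means that reordering or subsetting one input does not silently pair up the wrong values.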

Basic Operations in Pandas

Importing Pandas

    import pandas as pd

Creating Data Structures

1. Series

    data = [1, 3, 5, 7, 9]
    series = pd.Series(data)  # a default integer index (0 to 4) is assigned
    print(series)

2. DataFrame

    data = {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago'],
    }
    df = pd.DataFrame(data)  # each dictionary key becomes a column
    print(df)
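
Both structures above use the default integer index. As a small sketch of how index labels give access to elements, the following builds a Series with explicit labels; the variable names and values are illustrative assumptions, not part of the original examples.

```python
import pandas as pd

# A Series with explicit string labels instead of the default 0..n-1 index.
temps = pd.Series([21.5, 19.0, 23.2], index=["Mon", "Tue", "Wed"])

by_label = temps["Tue"]      # look up by index label
by_position = temps.iloc[0]  # look up by integer position
print(by_label, by_position)

# A DataFrame keeps one dtype per column, so columns can differ in type.
people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
print(people.dtypes)
```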

Reading Data

    df = pd.read_csv('data.csv')

Writing Data

    df.to_csv('output.csv', index=False)  # index=False leaves the row index out of the file
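
The two calls above need files on disk. A self-contained way to try the same round trip is to write to an in-memory text buffer; this sketch assumes nothing beyond the standard library's io.StringIO, and the sample data is illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Write CSV text into an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Read the same text back into a new DataFrame.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```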

Data Selection and Filtering

1. Selecting Columns

    ages = df['Age']
    print(ages)

2. Selecting Rows by Label

    row = df.loc[0]  # the row whose index label is 0
    print(row)

3. Selecting Rows by Position

    row = df.iloc[0]  # the first row, whatever its label
    print(row)

4. Filtering Rows

    filtered_df = df[df['Age'] > 30]  # keep only rows where the condition is True
    print(filtered_df)
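
With the default integer index, loc[0] and iloc[0] return the same row, which can hide the difference between them. The sketch below uses a hypothetical string index to separate the two, and also shows combining two filter conditions.

```python
import pandas as pd

# Illustrative data with string labels as the index.
df = pd.DataFrame(
    {"Age": [25, 30, 35], "City": ["New York", "Los Angeles", "Chicago"]},
    index=["alice", "bob", "charlie"],
)

by_label = df.loc["bob"]  # label-based lookup
by_position = df.iloc[1]  # position-based lookup (second row)

# Combine conditions with & (and) or | (or); each needs its own parentheses.
over_26_not_chicago = df[(df["Age"] > 26) & (df["City"] != "Chicago")]

# loc also accepts a boolean mask together with a list of columns.
cities_30_plus = df.loc[df["Age"] >= 30, ["City"]]
print(over_26_not_chicago)
print(cities_30_plus)
```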

Data Manipulation

1. Adding a Column

    df['Salary'] = [50000, 60000, 70000]  # one value per existing row
    print(df)

2. Deleting a Column

    df.drop('Salary', axis=1, inplace=True)
    print(df)

3. Renaming Columns

    df.rename(columns={'Name': 'Full Name'}, inplace=True)
    print(df)

4. Handling Missing Data

    df.fillna(0, inplace=True)  # replace every missing value with 0
    print(df)
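
Filling every gap with 0 is not always appropriate, for example in a text column. The following sketch, using made-up data with two missing values, counts the gaps and then either drops or fills them column by column.

```python
import numpy as np
import pandas as pd

# Illustrative data: one missing age and one missing city.
df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, np.nan, 35],
        "City": ["New York", None, "Chicago"],
    }
)

missing_per_column = df.isna().sum()  # count of missing values per column
complete_rows = df.dropna()           # keep only rows with no missing values

# Fill each column with its own replacement; returns a new DataFrame.
filled = df.fillna({"Age": df["Age"].mean(), "City": "Unknown"})
print(missing_per_column)
print(filled)
```

Without inplace=True, dropna() and fillna() return new objects and leave df unchanged.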

Grouping and Aggregation

Pandas provides the groupby() method, which splits rows into groups by key so that an
aggregation such as mean() or sum() can be applied to each group.

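    # Select the numeric column first: taking the mean of text columns
    # raises a TypeError in pandas 2.0 and later.
    grouped = df.groupby('City')['Age'].mean()
    print(grouped)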
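
With one row per city, grouping has little visible effect. Here is a sketch with several rows per group and more than one aggregate; the sales table and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical order data with repeated cities.
sales = pd.DataFrame(
    {
        "City": ["Chicago", "Chicago", "New York", "New York", "New York"],
        "Amount": [100, 150, 200, 50, 80],
    }
)

# Named aggregation: output_column=(input_column, function).
summary = sales.groupby("City").agg(
    total=("Amount", "sum"),
    average=("Amount", "mean"),
    orders=("Amount", "count"),
)
print(summary)
```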

Merging and Joining

1. Concatenation

    df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
    df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})
    result = pd.concat([df1, df2])  # stacks rows; the original index labels are kept
    print(result)

2. Merging

    left = pd.DataFrame({'key': ['K0', 'K1'], 'A': ['A0', 'A1']})
    right = pd.DataFrame({'key': ['K0', 'K1'], 'B': ['B0', 'B1']})
    merged = pd.merge(left, right, on='key')  # inner join on the shared 'key' column
    print(merged)
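
In the merge above every key appears on both sides, so the join type does not matter. The sketch below uses illustrative frames with one unmatched key on each side to show how the how parameter changes the result.

```python
import pandas as pd

# Illustrative frames: K2 exists only on the left, K3 only on the right.
left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"key": ["K0", "K1", "K3"], "B": ["B0", "B1", "B3"]})

# The default how="inner" keeps only keys found in both frames.
inner = pd.merge(left, right, on="key")

# how="outer" keeps every key; indicator=True adds a _merge column
# recording where each row came from.
outer = pd.merge(left, right, on="key", how="outer", indicator=True)
print(inner)
print(outer)
```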

Advanced Features

1. Time Series

    rng = pd.date_range('2021-01-01', periods=10, freq='D')  # 10 consecutive days
    ts = pd.Series(range(10), index=rng)
    print(ts)
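
Building on the same daily series, frequency conversion and moving window statistics can be sketched as follows. The weekly frequency and the window size of 3 are arbitrary choices for illustration.

```python
import pandas as pd

rng = pd.date_range("2021-01-01", periods=10, freq="D")
ts = pd.Series(range(10), index=rng)

# Frequency conversion: downsample daily values to weekly sums
# (the "W" alias uses weeks ending on Sunday).
weekly = ts.resample("W").sum()

# Moving window statistic: mean over the current and two previous days.
# The first two entries are NaN because the window is not yet full.
rolling_mean = ts.rolling(window=3).mean()
print(weekly)
print(rolling_mean)
```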

2. Pivot Tables

    # The Name column was renamed to 'Full Name' earlier, so use the new label.
    pivot = df.pivot_table(values='Age', index='City', columns='Full Name', aggfunc='mean')
    print(pivot)
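
A pivot table is easier to read when several rows share the same index and column values. This sketch assumes a hypothetical Department column, which is not in the original data, to show how aggfunc combines such rows.

```python
import pandas as pd

# Illustrative data; the Department column is an assumption for this example.
staff = pd.DataFrame(
    {
        "City": ["Chicago", "Chicago", "New York", "New York"],
        "Department": ["Sales", "Ops", "Sales", "Sales"],
        "Age": [30, 40, 25, 35],
    }
)

# One row per city, one column per department, mean age in each cell.
# Combinations with no data (New York / Ops) appear as NaN.
pivot = staff.pivot_table(
    values="Age", index="City", columns="Department", aggfunc="mean"
)
print(pivot)
```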

Visualization with Pandas

Pandas uses Matplotlib as its default plotting backend, so DataFrame.plot() works once
Matplotlib is installed.

    import matplotlib.pyplot as plt

    # The Name column was renamed to 'Full Name' earlier, so use the new label.
    df.plot(x='Full Name', y='Age', kind='bar')
    plt.show()

Conclusion

Pandas is an essential tool for data scientists and analysts, offering powerful data manipulation
capabilities and integration with other Python libraries such as NumPy and Matplotlib. Whether
you are exploring a small dataset or processing a larger one that fits in memory, Pandas provides
the flexibility and efficiency required to perform complex data analysis tasks. As you become more
familiar with its functionality, you'll find Pandas indispensable for your data-driven projects.
