Pandas
Pandas is a powerful and flexible open-source data analysis and manipulation library for Python. It
provides data structures and functions for working with structured (tabular, labeled) data. Developed
by Wes McKinney starting in 2008, Pandas is built on top of NumPy and is part of the broader ecosystem of
Python data analysis tools.
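1. Data Structures: Pandas introduces two primary data structures: Series and DataFrame.
Series: A one-dimensional labeled array capable of holding any data type. Its labels,
known as the index, provide a way to access elements.
DataFrame: A two-dimensional labeled table whose columns can hold different data types;
each column is a Series that shares the DataFrame's row index.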
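2. Data Alignment and Handling Missing Data: Arithmetic between Series or DataFrames aligns
automatically on index labels, and labels present in only one operand produce missing values (NaN).
Missing data can be detected with isna() and handled with methods such as dropna() and fillna().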
3. Data Wrangling: Pandas provides powerful tools for merging, reshaping, selecting, and
filtering data. Operations like joining tables (using merge()), concatenation, and group
operations (groupby()) are straightforward and efficient.
4. Input/Output Tools: Pandas supports reading from and writing to various file formats like
CSV, Excel, SQL databases, and more, using methods such as read_csv(), read_excel(),
to_csv(), and to_sql().
5. Time Series Functionality: Pandas supports date range generation (date_range()), frequency
conversion and resampling (resample()), and moving window statistics (rolling()).
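The alignment, missing-data, and grouping behavior described in the list above can be sketched as follows. The Series values, the `sales` table, and its `region`/`amount` columns are made up for illustration.

```python
import pandas as pd

# Two Series whose index labels only partially overlap
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

# Addition aligns on labels; 'a' and 'd' exist in only one Series, so they become NaN
total = s1 + s2
print(total)

# Detect and handle the missing values
print(total.isna().sum())   # number of NaN entries
print(total.dropna())       # keep only labels present in both Series
print(total.fillna(0))      # replace NaN with 0

# Group-wise aggregation with groupby()
sales = pd.DataFrame({
    'region': ['East', 'West', 'East', 'West'],
    'amount': [100, 200, 150, 75],
})
by_region = sales.groupby('region')['amount'].sum()
print(by_region)
```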
Importing Pandas
```python
import pandas as pd
```
1. Series
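The following is a minimal sketch of creating a Series and accessing its elements. The temperature values, the weekday labels, and the name `temp_c` are invented for illustration.

```python
import pandas as pd

# A Series holds one-dimensional data together with an index of labels
temps = pd.Series([21.5, 23.0, 19.8], index=['Mon', 'Tue', 'Wed'], name='temp_c')
print(temps)

print(temps['Tue'])        # access by label
print(temps.iloc[0])       # access by integer position
print(temps[temps > 20])   # boolean filtering keeps the matching labels
print(temps.mean())        # vectorized statistics
```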
2. DataFrame
```python
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
print(df)
```
Reading Data
```python
df = pd.read_csv('data.csv')
```
Writing Data
```python
df.to_csv('output.csv', index=False)
```
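The file names `data.csv` and `output.csv` above are placeholders. The self-contained round trip below is a sketch that swaps in an in-memory buffer (`io.StringIO`) for the CSV file and an in-memory SQLite database for `to_sql()`. The table name `people` is an illustrative assumption.

```python
import io
import sqlite3
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# CSV round trip through an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False omits the row labels
buffer.seek(0)                   # rewind the buffer before reading it back
from_csv = pd.read_csv(buffer)
print(from_csv)

# SQL round trip using an in-memory SQLite database
conn = sqlite3.connect(':memory:')
df.to_sql('people', conn, index=False)
from_sql = pd.read_sql('SELECT * FROM people WHERE Age > 26', conn)
print(from_sql)
conn.close()
```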
1. Selecting Columns
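Here is a minimal sketch of column selection. It recreates the Name/Age/City table from the DataFrame example so the snippet runs on its own. Single brackets return a Series; a list of column names returns a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

names = df['Name']               # one column name -> Series
subset = df[['Name', 'City']]    # list of column names -> DataFrame
print(type(names).__name__, type(subset).__name__)
print(subset)
```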
2. Filtering Rows
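This sketch shows boolean filtering on the same illustrative Name/Age/City table. A condition on a column yields a True/False mask, and indexing the DataFrame with that mask keeps only the matching rows.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# A single condition
older = df[df['Age'] > 28]
print(older)

# Combine conditions with & (and) or | (or); each condition needs parentheses
ny_or_young = df[(df['City'] == 'New York') | (df['Age'] < 30)]
print(ny_or_young)

# query() expresses a filter as a string expression
result = df.query('Age >= 30 and City != "Chicago"')
print(result)
```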
Data Manipulation
1. Adding a Column
```python
df['Salary'] = [50000, 60000, 70000]
print(df)
```
2. Deleting a Column
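This is a minimal sketch of two common ways to remove a column, using a small illustrative table. `drop()` returns a new DataFrame and leaves the original unchanged, while `del` modifies the DataFrame in place.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
})

# drop() returns a new DataFrame without the listed columns
without_salary = df.drop(columns=['Salary'])
print(without_salary)

# del removes the column from df itself
del df['Age']
print(df)
```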
3. Renaming Columns
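Here is a sketch of renaming columns on an illustrative two-column table. `rename()` changes only the labels it is given. Assigning a full list to `.columns` replaces every label, and the list length must match the number of columns.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Map old labels to new ones; unmapped columns keep their names
renamed = df.rename(columns={'Name': 'FullName'})
print(renamed.columns.tolist())

# Replace all labels at once on a copy, leaving df untouched
relabeled = df.copy()
relabeled.columns = ['full_name', 'age']
print(relabeled.columns.tolist())
```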
1. Concatenation
```python
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})
result = pd.concat([df1, df2])
print(result)
```
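By default, `pd.concat()` keeps each input's original row labels, so stacking these two frames yields the index 0, 1, 0, 1. The sketch below, reusing the same `df1` and `df2`, shows two common variations: `ignore_index=True` renumbers the rows, and `axis=1` places the frames side by side.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

stacked = pd.concat([df1, df2])                          # index: 0, 1, 0, 1
renumbered = pd.concat([df1, df2], ignore_index=True)    # index: 0, 1, 2, 3
side_by_side = pd.concat([df1, df2], axis=1)             # columns: A, B, A, B
print(renumbered)
print(side_by_side)
```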
2. Merging
```python
left = pd.DataFrame({'key': ['K0', 'K1'], 'A': ['A0', 'A1']})
right = pd.DataFrame({'key': ['K0', 'K1'], 'B': ['B0', 'B1']})
merged = pd.merge(left, right, on='key')
print(merged)
```
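When the key columns do not match exactly, the `how` parameter of `pd.merge()` decides which keys survive. The sketch below uses illustrative frames in which `K2` appears only on the left and `K3` only on the right.

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K3'], 'B': ['B0', 'B1', 'B3']})

inner = pd.merge(left, right, on='key')                     # default how='inner': keys in both
left_join = pd.merge(left, right, on='key', how='left')     # all left keys; B is NaN for K2
outer = pd.merge(left, right, on='key', how='outer')        # union of keys from both frames
print(outer)
```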
Advanced Features
1. Time Series
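This sketch exercises the three time series features mentioned earlier: date range generation, moving window statistics, and frequency conversion. The dates and values are arbitrary illustrations.

```python
import pandas as pd

# One week of daily timestamps
idx = pd.date_range('2024-01-01', periods=7, freq='D')
ts = pd.Series([1, 2, 3, 4, 5, 6, 7], index=idx)

# Moving-window statistic: 3-day rolling mean (the first two values are NaN)
rolling_mean = ts.rolling(window=3).mean()
print(rolling_mean)

# Frequency conversion: sum the daily values into 2-day buckets
two_day = ts.resample('2D').sum()
print(two_day)

# Label-based slicing with date strings (both endpoints included)
window = ts.loc['2024-01-03':'2024-01-05']
print(window)
```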
2. Pivot Tables
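The following is a minimal pivot table sketch over a hypothetical `sales` table. Regions become rows, products become columns, and each cell holds the summed amount. `fill_value=0` covers the West/B combination, which has no data.

```python
import pandas as pd

sales = pd.DataFrame({
    'region': ['East', 'East', 'West', 'East'],
    'product': ['A', 'B', 'A', 'A'],
    'amount': [100, 150, 200, 120],
})

# Rows = region, columns = product, cells = total amount (0 where no rows exist)
table = pd.pivot_table(sales, values='amount', index='region',
                       columns='product', aggfunc='sum', fill_value=0)
print(table)
```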
Conclusion
Pandas is an essential tool for data scientists and analysts, offering powerful data manipulation
capabilities and close integration with other Python libraries such as NumPy and Matplotlib. Whether
you are working with a small spreadsheet or millions of rows that fit in memory, Pandas provides the
flexibility and efficiency required to perform complex data analysis tasks. As you become more familiar
with its functionality, you will likely find Pandas indispensable for your data-driven projects.