
ALL YOU NEED TO KNOW SERIES

PYTHON

Become a Successful Data Professional
ABOUT BRAINALYST

Brainalyst is a pioneering data-driven company dedicated to transforming data into actionable insights and
innovative solutions. Founded on the principles of leveraging cutting-edge technology and advanced analytics,
Brainalyst has become a beacon of excellence in the realms of data science, artificial intelligence, and machine
learning.

OUR MISSION

At Brainalyst, our mission is to empower businesses and individuals by providing comprehensive data solutions
that drive informed decision-making and foster innovation. We strive to bridge the gap between complex data and
meaningful insights, enabling our clients to navigate the digital landscape with confidence and clarity.

WHAT WE OFFER

1. Data Analytics and Consulting


Brainalyst offers a suite of data analytics services designed to help organizations harness the power of their
data. Our consulting services include:

• Data Strategy Development: Crafting customized data strategies aligned with your business
objectives.

• Advanced Analytics Solutions: Implementing predictive analytics, data mining, and statistical
analysis to uncover valuable insights.

• Business Intelligence: Developing intuitive dashboards and reports to visualize key metrics and
performance indicators.

2. Artificial Intelligence and Machine Learning


We specialize in deploying AI and ML solutions that enhance operational efficiency and drive innovation.
Our offerings include:

• Machine Learning Models: Building and deploying ML models for classification, regression,
clustering, and more.

• Natural Language Processing: Implementing NLP techniques for text analysis, sentiment analysis,
and conversational AI.

• Computer Vision: Developing computer vision applications for image recognition, object detection,
and video analysis.

3. Training and Development


Brainalyst is committed to fostering a culture of continuous learning and professional growth. We provide:

• Workshops and Seminars: Hands-on training sessions on the latest trends and technologies in
data science and AI.

• Online Courses: Comprehensive courses covering fundamental to advanced topics in data
analytics, machine learning, and AI.

• Customized Training Programs: Tailored training solutions to meet the specific needs of
organizations and individuals.

4. Generative AI Solutions

As a leader in the field of Generative AI, Brainalyst offers innovative solutions that create new content and
enhance creativity. Our services include:

• Content Generation: Developing AI models for generating text, images, and audio.

• Creative AI Tools: Building applications that support creative processes in writing, design, and
media production.

• Generative Design: Implementing AI-driven design tools for product development and
optimization.

OUR JOURNEY

Brainalyst’s journey began with a vision to revolutionize how data is utilized and understood. Founded by
Nitin Sharma, a visionary in the field of data science, Brainalyst has grown from a small startup into a renowned
company recognized for its expertise and innovation.

KEY MILESTONES:

• Inception: Brainalyst was founded with a mission to democratize access to advanced data analytics and AI
technologies.

• Expansion: Our team expanded to include experts in various domains of data science, leading to the
development of a diverse portfolio of services.

• Innovation: Brainalyst pioneered the integration of Generative AI into practical applications, setting new
standards in the industry.

• Recognition: We have been acknowledged for our contributions to the field, earning accolades and
partnerships with leading organizations.

Throughout our journey, we have remained committed to excellence, integrity, and customer satisfaction.
Our growth is a testament to the trust and support of our clients and the relentless dedication of our team.

WHY CHOOSE BRAINALYST?

Choosing Brainalyst means partnering with a company that is at the forefront of data-driven innovation. Our
strengths lie in:

• Expertise: A team of seasoned professionals with deep knowledge and experience in data science and AI.

• Innovation: A commitment to exploring and implementing the latest advancements in technology.

• Customer Focus: A dedication to understanding and meeting the unique needs of each client.

• Results: Proven success in delivering impactful solutions that drive measurable outcomes.

JOIN US ON THIS JOURNEY TO HARNESS THE POWER OF DATA AND AI. WITH BRAINALYST, THE FUTURE IS
DATA-DRIVEN AND LIMITLESS.

TABLE OF CONTENTS
1. Preface
2. Introduction to Python
• What is Python
• Python Basics
• Purpose of Python
• What Can Python Do
• Advantages of Python
• Disadvantages of Python
• Why Python
• Free Open Source vs. Licensed Software
3. Getting Started with Python
• Installing Anaconda
• Difference between Anaconda and Miniconda
• Downloading and Installing Anaconda
• Using Jupyter Notebook
• Important Shortcuts in Jupyter Notebook
4. Fundamentals of Python
• Variables
• Data Types
• Operators
• Syntax Rules
• Control Flow Statements
• Conditional Statements (if, elif, else)
• Loop Statements (for, while)
• Function Definitions
5. Lists and Tuples in Python
• Creating Lists and Tuples
• Accessing Values in Lists and Tuples
• Slicing and Indexing
• List and Tuple Methods
6. Dictionaries in Python
• Creating Dictionaries
• Accessing Elements in Dictionaries
• Adding and Removing Elements
• Dictionary Methods
7. Sets in Python
• Creating Sets
• Set Operations
• Intersection
• Union
• Symmetric Difference
• Difference
• Adding and Removing Elements
8. User-Defined Functions (UDFs)
• Defining UDFs
• Parameters and Return Values
• *args and **kwargs
• Lambda Functions
• Using map() with Lambda Functions
9. String Operations
• Creating and Manipulating Strings
• String Methods
• Escape Characters
10. NumPy Basics
• Introduction to NumPy
• Creating NumPy Arrays
• Array Manipulation
• NumPy Array Methods
• Broadcasting and Vectorization
11. Data Visualization with Python
• Introduction to Data Visualization
• Plotting with Matplotlib
• Plotting with Seaborn
• Creating Various Charts and Graphs
12. Advanced Python Topics
• Object-Oriented Programming (OOP)
• Working with Files
• Exception Handling
• Regular Expressions
• Working with Dates and Times

Preface
Welcome to “Python: Basic to Advanced,” a comprehensive guide designed to help
you master Python, one of the most powerful and versatile programming languages
available today. Whether you are a beginner starting your programming journey or an
experienced developer looking to deepen your understanding of Python, this handbook
will serve as an invaluable resource.

Python’s simplicity and readability have made it a popular choice for a wide range
of applications, from web development and data science to artificial intelligence and
automation. This handbook covers everything from the fundamental concepts of Python
programming to advanced topics, ensuring you have the knowledge and skills to tackle
any challenge.

We begin with an introduction to Python, exploring its history, advantages, and core
concepts. You will then learn how to get started with Python, including setting up your
development environment and using Jupyter Notebook for interactive programming.
As you progress through the chapters, you will delve into data structures, control flow
statements, functions, and more. The handbook also covers essential libraries such as
NumPy and Matplotlib, which are crucial for data analysis and visualization.

Throughout this handbook, we emphasize hands-on practice with detailed examples
and exercises. By the end of this journey, you will be well-equipped to develop robust
and efficient Python applications, analyze data, and create stunning visualizations.

I am Nitin Sharma, CEO and Founder of Brainalyst – A Data-Driven Company. This
handbook is the result of extensive collaboration and dedication. I would like to
acknowledge the unwavering support from Brainalyst – A Data-Driven Company and its
talented team. Their expertise and commitment have been invaluable in bringing this
comprehensive guide to life.

Join us on this journey to unlock the full potential of Python programming. Let’s
explore, analyze, and innovate together.

Nitin Sharma
Founder/CEO
Brainalyst – A Data-Driven Company

Disclaimer: This material is protected under copyright, Brainalyst © 2021-2024. Unauthorized use and/or
duplication of this material or any part of this material, including data, in any form without explicit written
permission from Brainalyst is strictly prohibited. Any violation of this copyright will attract legal action.

BRAINALYST - PYTHON

PYTHON: BASIC TO ADVANCED
Python
Introduction to Python:
• Python is a high-level, versatile, and interpreted programming language known for its simplicity and
readability. Created by Guido van Rossum and first released in 1991, Python has gained immense
popularity in various domains, including web development, data science, artificial intelligence,
automation, scientific computing, and more.

What is Python:
• Python is a general-purpose programming language whose design philosophy emphasizes code
readability, aided by significant whitespace. Its clear, concise syntax encourages solving problems
in fewer lines of code.

Python Basics:
• Python is an interpreted, high-level programming language with dynamic semantics.
• It is object-oriented, which allows for modeling real-world objects in code.
• Python is favored for its simplicity and readability, making it suitable for various applications.

Rapid Application Development (RAD):


• Python’s built-in data structures and dynamic features make it ideal for Rapid Application Development.
• It is often used as a scripting or glue language to connect different software components.

Dynamic Typing and Binding:


• Python utilizes dynamic typing, meaning you don’t need to declare variable types.
• Dynamic binding allows objects to be modified during runtime, enhancing flexibility.
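A minimal sketch of dynamic typing and binding in practice (the names `x` and `Config` are illustrative):

```python
# Dynamic typing: no type declaration; the type travels with the value.
x = 42
print(type(x).__name__)   # int

# Rebinding the same name to a value of a different type is allowed.
x = "forty-two"
print(type(x).__name__)   # str

# Dynamic binding: objects can be modified at runtime,
# e.g. by attaching a new attribute to an instance.
class Config:
    pass

c = Config()
c.debug = True            # attribute created on the fly
print(c.debug)            # True
```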


Readability and Reduced Maintenance Costs:


• Python’s syntax prioritizes readability, reducing the cost of program maintenance.
• It enforces a clean and consistent code style.

Modularity and Code Reuse:

• Python supports modules and packages, encouraging modularity and code reuse.
• This aids in organizing code into manageable, reusable components.

Extensive Standard Library:

• Python includes an extensive standard library that provides pre-built modules and functions for
various tasks.
• This library simplifies common programming challenges.
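As a small illustration, the modules below all ship with Python itself, so no third-party installation is needed:

```python
import collections
import json
import math

# math: common mathematical functions.
print(math.sqrt(16))                      # 4.0

# collections: higher-level data structures such as Counter.
print(collections.Counter("banana"))      # Counter({'a': 3, 'n': 2, 'b': 1})

# json: serialize Python objects to JSON strings and back.
data = json.dumps({"lang": "Python"})
print(json.loads(data)["lang"])           # Python
```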

Interpretation and No Compilation:


• Python is an interpreted language, meaning there is no compilation step.
• This results in a quick edit-test-debug cycle, enhancing productivity.

Debugging Simplicity:
• Python programs are easy to debug, as errors raise exceptions instead of causing segmentation
faults.
• The interpreter prints stack traces when exceptions are not handled.

Purpose of Python:
• Python serves a wide range of purposes, such as:
• Web Development: Python is used for building web applications and websites using frameworks
like Django and Flask.
• Data Science: Python is a leading language for data analysis, machine learning, and scientific
computing with libraries such as NumPy, Pandas, and scikit-learn.
• Artificial Intelligence: Python is popular for developing AI and machine learning models with
frameworks like TensorFlow and PyTorch.
• Automation: Python can automate repetitive tasks and scripting, making it ideal for system
administration.
• Game Development: Python has libraries like Pygame for developing 2D games.
• IoT (Internet of Things): Python can be used to program and control IoT devices.
• Desktop Applications: Python can create cross-platform desktop applications using frameworks
like PyQt and Tkinter.
• Network Programming: Python is employed for building network applications.

What Can Python Do:


• Python can do the following and more:
• Create web applications.
• Analyze data.


• Build artificial intelligence models.


• Automate tasks.
• Develop games.
• Control IoT devices.
• Create desktop applications.
• Write network applications.
• Work with databases.

Advantages of Python:
• Readability: Python’s clear and simple syntax makes it easy for beginners to learn and understand.
• Large Standard Library: Python has an extensive library that simplifies programming tasks.
• Cross-Platform: Python is available on multiple platforms, making it versatile.
• Community Support: Python has a large, active community that provides support, libraries, and frameworks.
• Open Source: Python is open source and free, reducing costs for development.
• Scalability: Python is scalable and used in both small scripts and large applications.

Disadvantages of Python:
• Performance: Python is not as fast as some other languages due to its interpreted nature.
• Global Interpreter Lock (GIL): The GIL can limit multi-threading performance in CPU-bound applications.
• Not Ideal for Mobile Development: While Python can be used for mobile apps, it’s not the best
choice for resource-intensive applications.

Why Python:
• Python’s popularity stems from its simplicity, readability, extensive libraries, and versatility. It’s
widely adopted in diverse fields, and its large community ensures ongoing development and support.

Free Open Source vs. Licensed Software:


• The primary difference between open source and licensed software lies in their distribution and
usage terms:
• Open Source Software (OSS): Open source software is released with a license that allows anyone
to view, use, modify, and distribute the source code freely. It often has a community of contributors
who maintain and improve the software.
• Licensed Software: Licensed software is proprietary, and users must pay for a license to access and
use it. The source code is typically not available, and the software is often created and maintained
by a single organization.
• The choice between open source and licensed software depends on factors like cost, support,
customization, and your specific project requirements. Open source software is attractive due to its
flexibility, reduced costs, and a sense of community involvement, while licensed software may offer
dedicated support and proprietary features.

Pg. No.3 2021-2024


BRAINALYST - PYTHON

• Python is open source, which contributes to its widespread use and community-driven development.
Open source software is often chosen for its accessibility, collaborative potential, and cost-effectiveness.

• The terms GUI, IDE, and reporting tool have specific meanings:

GUI (Graphical User Interface):


• A GUI, or Graphical User Interface, is a type of user interface that allows users to interact with
computer programs using graphical elements such as icons, buttons, windows, and menus, instead of
solely relying on text-based commands.
• Python provides several libraries and frameworks for creating GUI applications, making it easy to
develop desktop applications with graphical interfaces. Some popular Python GUI libraries include
Tkinter, PyQt, Kivy, and wxPython.

Why Install Anaconda

• Anaconda is a popular open-source distribution of the Python and R programming languages used
for data science, machine learning, and scientific computing.
• It bundles the interpreter with a large collection of packages and tools in a single install.


• Simplified Package Management: Anaconda simplifies package management and dependencies.
It comes with a package manager called Conda, which makes it easy to install, update,
and manage packages and libraries used in data science and scientific computing. Conda can
create isolated environments, ensuring that different projects don’t interfere with each other.

• Data Science Ecosystem: Anaconda includes a vast collection of data science, machine learning,
and scientific computing libraries and tools. Popular libraries like NumPy, pandas, scikit-learn,
Matplotlib, and Jupyter are included in Anaconda by default. This saves you the trouble of
manually installing these packages.

What is the difference between Anaconda and Miniconda?


• Anaconda and Miniconda are both distribution and package management tools for Python and other
programming languages, primarily used in data science, machine learning, and scientific computing.

• Key Differences:

• The most significant difference between Anaconda and Miniconda is the number of
pre-installed packages. Anaconda comes with a comprehensive set of data science and
scientific computing packages, while Miniconda only includes the essentials to get you
started.
• Anaconda provides a graphical user interface (Anaconda Navigator) for managing
environments and packages, making it beginner-friendly. Miniconda, on the other hand, is
more command-line-centric, which may be preferred by advanced users.
• Anaconda is a larger download because it includes a vast number of pre-installed
packages. Miniconda is much smaller due to its minimalist approach.
• While Anaconda is an all-in-one solution for many users, Miniconda is often used when
you want to create a custom environment tailored to your specific needs.

Python Version
• Python has had several major versions over the years. Python 3 is the only major version still
supported today; Python 2 reached its end of life in January 2020.
• Python 1.0 (January 26, 1994): This is the first official version of Python. It laid the foun-
dation for the language’s development.
• Python 2.0 (October 16, 2000): Python 2 introduced many new features and improvements
over Python 1.0. It became one of the most widely used versions and remained popular for
many years.
• Python 3.0 (December 3, 2008): Python 3 was a significant and backward-incompatible
update. It aimed to clean up and simplify the language. Key changes included the removal of
print as a statement, the introduction of the print() function, and changes to the way strings
and Unicode were handled. Python 3 is the current and recommended version of Python.
• Python 4: There is no official Python 4 release. The Python community has indicated that any
future major versions of Python will be backward-compatible with Python 3.
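The print change is the most visible Python 2 → 3 difference. In Python 3, print is an ordinary function, so it takes keyword arguments and can be passed around like any other value:

```python
# print() is a function in Python 3, so sep and end are keyword arguments.
print("Hello", "World", sep=", ", end="!\n")   # Hello, World!

# Being a function, it can be bound to another name like any value.
emit = print
emit("same function, new name")
```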


How to Download Anaconda


• Visit the Anaconda Website:
• Go to the Anaconda website at https://www.anaconda.com/products/individual. Ensure that
you’re downloading Anaconda Individual Edition, which is suitable for personal use.

• Choose the Right Version:


• You’ll see download options for Windows, macOS, and Linux. Select the installer that matches
your operating system (e.g., Windows or macOS).

• Download the Installer:


• Click on the “Download” button for the selected version, and your browser will start
downloading the Anaconda installer.

• Launch the Installer:


• Once the download is complete, run the Anaconda installer by double-clicking the downloaded
file. On Windows, it’s an executable (.exe) file, and on macOS, it’s a disk image (.dmg) file.

• Follow the Installation Wizard:


• The Anaconda installation wizard will guide you through the setup process. Follow the
on-screen instructions, and you’ll be asked to agree to the license agreement, select an
installation location, and choose whether to add Anaconda to your system’s PATH environment
variable (recommended for easier command-line usage).

• Complete the Installation:


• After selecting your preferences, click “Install” or “Next,” and the installation will begin. It
might take a few minutes to complete, depending on your system’s performance.

• Installation Complete:
• Once the installation is finished, you’ll receive a confirmation message. You can now close the
installer.

• Start Anaconda Navigator (Optional):


• You can start Anaconda Navigator, a graphical user interface (GUI) for managing your Python
environments and packages, by searching for “Anaconda Navigator” in your applications or
programs.


In Jupyter Notebook, you can work in two primary states:


Edit Mode:
• In Edit Mode, you can edit the content of a specific cell, such as Python code or text.
• When a cell is in Edit Mode, its border is typically highlighted in green.
• To enter Edit Mode, click inside a cell, or press Enter when the cell is selected.

Command Mode:
• In Command Mode, you interact with the Notebook as a whole, manipulating cells and
performing various tasks.
• When a cell is in Command Mode, its border is typically highlighted in blue.
• To enter Command Mode, press Esc or click outside the cell’s content area.

Within these two modes, cells themselves come in three main types:

Code Mode:
• This is where you write and execute Python code.
• You can run a code cell by pressing Shift + Enter or by clicking the “Run” button.
• The output of the code execution, such as results and error messages, is displayed below the
cell.

Raw NBConvert Mode:


• Raw cells are used for storing content that should not be executed as code.
• This mode is often used for storing plain text or data that is not meant to be run.
• Raw cells are typically used when you need to include non-executable information in your
Notebook.

Markdown Mode:
• Markdown cells contain formatted text, which allows you to create rich-text documentation.
• You can use Markdown syntax to style and structure your text, including headings, lists, links,
and more.
• Markdown cells are often used for explanations, documentation, and commentary within
your Notebook.


In Jupyter Notebook, you can add and delete cells to customize your document.
To add a cell:
• While in Command Mode (blue border), you can add a new cell either above or below the
current cell.
• To add a cell above the current cell, press A (for “above”).
• To add a cell below the current cell, press B (for “below”).
• A new cell of the same type as the current cell (Code or Markdown) will appear, and you can
start typing in it.
To delete a cell:
• While in Command Mode, select the cell you want to delete by clicking it (the selected cell
will have a blue border).
• To delete the selected cell, press the X key (similar to the “cut” command). The cell will be
removed from the Notebook.

Comparing Notepad and Jupyter Notebook


• Notepad and Jupyter Notebook are both text editors, but they serve different purposes and have
distinct features.
Notepad:
• Simple Text Editor: Notepad is a basic, lightweight text editor that comes pre-installed with
Windows.
• Text Files: It’s primarily used for creating and editing plain text files, such as configuration
files, scripts, or simple notes.
• No Code Execution: Notepad doesn’t support code execution, debugging, or interactive development.
• Limited for Coding: It’s not well-suited for software development, data analysis, or scientific
computing.
• No Syntax Highlighting: Notepad lacks syntax highlighting for code or data formats.
• No Markdown Support: It doesn’t support Markdown, which is commonly used for creating
documentation.
• No Collaboration Features: Notepad doesn’t have features for collaboration or sharing documents.
Jupyter Notebook:
• Interactive Development: Jupyter Notebook is an interactive development environment
widely used for data science, scientific computing, and research.
• Notebook Documents: It allows you to create notebook documents that mix code (Python, R,
etc.), rich-text (Markdown), and multimedia elements (images, plots).
• Code Execution: You can run code cells one by one, which is especially useful for data analysis
and visualization.


• Rich Text: Jupyter Notebook supports Markdown, enabling the creation of rich documentation
alongside code.
• Syntax Highlighting: It provides syntax highlighting for various programming languages.
• Data Visualization: Jupyter supports data visualization libraries, making it easy to create and
display plots and graphs.
• Extensions: You can install extensions and widgets to enhance its capabilities.
• Collaboration: Jupyter Notebooks can be shared with others, making it suitable for collaborative work.

Important Shortcuts
Command Mode Shortcuts (Press Esc to enter):
• H: Show all shortcuts.
• A: Insert a new cell above.
• B: Insert a new cell below.
• D, D (Press D twice): Delete the current cell.
• Y: Change the cell type to code.
• M: Change the cell type to Markdown.
• R: Change the cell type to raw.
• 1 to 6: Convert the cell to a heading with the corresponding level.
• Up Arrow or K: Select the cell above.
• Down Arrow or J: Select the cell below.
• Shift + Up Arrow or Shift + K: Extend the selection above.
• Shift + Down Arrow or Shift + J: Extend the selection below.
• Shift + M: Merge selected cells.
• Shift + Up/Down: Select multiple cells.
• Shift + L: Turn on/off line numbers.
Edit Mode Shortcuts (Press Enter to enter):
• Ctrl + Enter: Run the current cell.
• Shift + Enter: Run the current cell and move to the next one.
• Alt + Enter: Run the current cell and insert a new one below.
• Ctrl + S: Save the notebook.
• Ctrl + Z: Undo.
• Ctrl + Shift + Z or Ctrl + Y: Redo.
• Ctrl + /: Comment/uncomment lines.
• Tab: Code completion or indent.
• Shift + Tab: Tooltip with documentation.
• Ctrl + Shift + - (minus key): Split the cell at the cursor position.

• Other Shortcuts (In both modes):


• Esc + F: Find and replace in the notebook.
• Shift + Space: Scroll up.
• Space: Scroll down.
• Shift + M: Merge selected cells.
• L: Turn on/off line numbers.

To browse all the shortcuts in Jupyter Notebook from the menus instead of the keyboard:
• Menu Bar:

• In the Jupyter Notebook, you can find the menu bar at the top.
• Click on different menu options to find the list of available shortcuts.
• For example, under “Help,” you will find a “Keyboard Shortcuts” option. Clicking on it will
display a list of keyboard shortcuts.


In Jupyter Notebook, you can edit and format text within Markdown cells using various
formatting options like style, size, bold, and color.
Markdown Cell:
• Ensure you are in a Markdown cell (not a code cell).
• To create a Markdown cell, select a cell and change its type to “Markdown” using the toolbar
or keyboard shortcuts (M for Markdown).

Styling Text:
• To style text, you can use Markdown formatting. For example:
• To italicize text, wrap it with asterisks or underscores (*italic* or _italic_).
• To make text bold, wrap it with double asterisks or underscores (**bold** or __bold__).

Changing Text Size:


• You can’t directly control text size in standard Markdown. It’s determined by the rendering.
You can simulate different text sizes by using HTML tags.
• For example, <small>small text</small>, <big>large text</big>, or use headings like ##
Heading 2 for larger text.

Changing Text Color:


• Changing text color in Markdown directly is not supported. You can use HTML for this:
• <font color="red">Red text</font> will display red text.

Adding Links:
• To add hyperlinks, use the [text](URL) format. For example: [OpenAI’s website](https://www.openai.com).
• Lists and Bullet Points:
• To create lists, use *, -, or 1. for bullet points or numbered lists.

Math Formulas:
• For mathematical equations, you can use LaTeX notation within Markdown cells, enclosed in
dollar signs. For example, $$E=mc^2$$ will render the famous equation.

Preview:
• After entering your Markdown text and formatting, press Shift + Enter to render the cell and
see how it looks.




Fundamentals of Python:
• In Python, variables, data types, and operators are fundamental concepts used for working
with data and performing operations.

Variable:
• A variable is a name that can be used to store data values.
• Variables are created when you assign a value to them.
• Variable names are case-sensitive and can contain letters, numbers, and underscores.
• Variable names must start with a letter (a-z, A-Z) or an underscore (_).

Data Types:
• Python supports various data types that define the kind of data a variable can hold. Common
data types include integers (int), floating-point numbers (float), strings (str), booleans (bool),
lists, tuples, dictionaries, and sets.
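A short sketch of the common built-in types (the values and names below are arbitrary examples):

```python
age = 30                         # int
price = 19.99                    # float
name = "Ada"                     # str
active = True                    # bool
scores = [85, 92, 78]            # list: mutable sequence
point = (3, 4)                   # tuple: immutable sequence
user = {"id": 1, "name": name}   # dict: key-value mapping
tags = {"python", "data"}        # set: unique, unordered elements

# type() reports the type of any value.
for value in (age, price, name, active, scores, point, user, tags):
    print(type(value).__name__)
```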

Operators:
• Operators are used to perform operations on variables and values. Python supports arithmetic,
comparison, assignment, logical, bitwise, membership, and identity operators.
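The main operator families can be sketched as follows (operand values are illustrative):

```python
a, b = 7, 3

print(a + b, a // b, a % b, a ** b)   # arithmetic: 10 2 1 343
print(a > b, a == b)                  # comparison: True False
print(a > 0 and b > 0)                # logical: True
print(a & b, a | b)                   # bitwise: 3 7

nums = [1, 2, 3]
print(2 in nums)                      # membership: True
print(nums is nums)                   # identity: True
```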


Variables:
• A variable in Python is a named storage location used to store data.
• It can hold various types of data, such as numbers, strings, lists, or custom objects.
• You can think of a variable as a label or a name that refers to a value.

Variable Rules:
Naming Rules:
• Variable names must start with a letter (a-z, A-Z) or an underscore (_).
• After the first character, variable names can contain letters, numbers (0-9), or underscores.
• Variable names are case-sensitive, meaning myVar and myvar are treated as different variables.
• Use descriptive names that convey the purpose of the variable (e.g., age instead of a).

Reserved Words:
• Avoid using Python’s reserved words or keywords as variable names (e.g., if, while, for), and
avoid shadowing built-in names like print. Here’s a list of Python reserved words:
https://docs.python.org/3/reference/lexical_analysis.html#keywords.

Style Conventions:
• Follow the Python PEP 8 style guide for naming conventions (https://www.python.org/dev/peps/pep-0008/).
It suggests using lowercase letters and underscores for variable names (e.g., my_variable_name).

Variable Syntax:
• Assigning a value to a variable is done using the = operator. For example:
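A minimal example of the = assignment syntax (the names are illustrative):

```python
# Single assignment: the name on the left is bound to the value on the right.
count = 10

# Multiple targets can be assigned in one statement.
x, y = 1, 2

# Reassignment simply rebinds the name to a new value.
count = count + 5
print(count, x, y)   # 15 1 2
```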

What to Avoid:
• Avoid using single-letter variable names (like x, y) for anything other than loop counters.
• Avoid using ambiguous variable names that don’t clearly convey the purpose of the variable.
• Don’t reuse variable names for different types of data within the same scope (e.g., using total for
both numbers and strings).
• Be mindful of variable scoping. Variables declared inside functions have local scope, while those
declared outside functions have global scope. Avoid reusing global variable names within functions.


In Jupyter Notebook, cells are the building blocks of your interactive documents.

Two common types of cells you’ll work with are Input cells and Output cells:
Input Cell:
• An input cell is where you write and execute your code.
• You can type Python code, Markdown text, or other supported content in an input cell.
• To run the code within the input cell, you can press Shift + Enter (or Shift + Return), and the output
will appear below it.
• Input cells are typically marked with “In [ ]:” to indicate the order in which they were executed.


Output Cell:

• After running an input cell that contains code, the results or output of that code will be
displayed in an output cell below the input cell.
• Output cells can contain text, tables, plots, error messages, or any other output generated by
your code.
• Output cells are typically marked with “Out [ ]:” to match them with the corresponding input
cell. The number inside the brackets corresponds to the execution order.

Functions
• A function is a block of reusable code that performs a specific task.
• Python provides built-in functions (like print(), len()) and allows you to create your own functions using the def keyword.
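A minimal sketch of a user-defined function (greet is a made-up name used for illustration):

```python
# def creates a reusable block of code that can be called by name.
def greet(name):
    """Return a greeting for the given name."""
    return "Hello, " + name + "!"

message = greet("World")
print(message)  # Hello, World!
```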

Types and print():


• Python is dynamically typed, meaning you don’t have to declare the type of a variable.
• Common data types include integers (int), floating-point numbers (float), strings (str), and
more.
• To print output, you can use the print() function. For example:
• print("Hello, World!")
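A short sketch of dynamic typing in action (the variable names are illustrative):

```python
# Python infers the type from the assigned value; no declarations are needed.
count = 5            # int
price = 3.14         # float
name = "Python"      # str
print(type(count), type(price), type(name))
```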

• There are well-established naming conventions and rules in Python, and they are typically referred to as PEP 8 (Python Enhancement Proposal 8) guidelines. PEP 8 provides recommendations for naming identifiers (variables, functions, classes, etc.) in Python code. Here are some key rules and guidelines from PEP 8:


Variable and Function Names:


• Use lowercase letters for variable and function names.
• Separate words with underscores for readability (snake_case).
Constants:
• Use uppercase letters for constants.
• Separate words with underscores.
Variable Names:
• Choose descriptive and meaningful variable names to improve code readability.
• Avoid using single-character variable names (e.g., a, b) unless they represent loop counters
in short loops.
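The conventions above can be sketched as follows (the names are illustrative examples, not prescribed ones):

```python
# PEP 8 naming sketch.
MAX_RETRIES = 3              # constant: UPPER_CASE with underscores

def compute_total(items):    # function: snake_case
    item_count = len(items)  # variable: snake_case, descriptive
    return sum(items), item_count

print(compute_total([1, 2, 3]))  # (6, 3)
```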

Comments
• In Python, comments are used to provide explanatory notes or descriptions within your
code. Comments are not executed as part of the program and are meant for developers to
understand the code. Python supports both single-line and multi-line comments.

Single-Line Comments:
• Single-line comments are used to comment a single line of code.
• You can use the # symbol to start a single-line comment, and everything after # on that line
is considered a comment.

Triple-Quoted Strings as Comments:


• Python also allows you to use triple-quoted strings ("""...""") as comments.
• These are typically used when you want to provide multi-line comments.
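Both comment styles can be sketched together:

```python
# This is a single-line comment.
x = 10  # comments may also follow code on the same line

"""
A triple-quoted string like this is often used as a
multi-line comment; it is not executed as program logic.
"""
print(x)  # 10
```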


Questions and revision – a recap of what we learned


• How to launch Anaconda Navigator

What we will cover in class – 2


• Data types
• Operators
• Syntax rules
• Control flow
• Conditionals
• Loops

Data types
• In Python, data types are classifications that specify which type of value a variable can hold. They are important because they define how data can be manipulated, what operations can be performed on the data, and how data is stored in memory. Python is known for its simplicity, and one way it maintains this simplicity is by using flexible, dynamic data types.
Why Data Types are Needed:
• Data types are essential for several reasons:
• Memory Allocation: Data types determine how much memory is allocated for a variable. For example, an integer variable requires a different amount of memory compared to a floating-point variable.
• Operations: Different data types support different operations. For instance, you can perform
arithmetic operations on numeric data types, but not on text data.


• Data Validation: Data types help ensure that the data you store in a variable is valid for the
type. For example, you can’t store text in an integer variable.
• Data Interpretation: Data types help Python understand how to interpret and display the
data.
• Performance: Using appropriate data types can lead to more efficient code and better performance.

The four core data types


• In Python, four fundamental data types form the building blocks for most data and variables.

Int (Integer):
• An integer is a whole number, positive or negative, without any decimal point.
• Example: 5, -10, 0
• Use cases: Integers are used to represent whole numbers in Python. They are suitable for
counting items, indexing lists, and performing mathematical operations.

Float (Floating-Point):
• A float is a number that includes a decimal point, or it can be expressed using scientific notation.
• Example: 3.14, -0.005, 2.0e-3
• Use cases: Floats are used when you need to work with real numbers, including fractions and numbers with decimal places. They are commonly used in scientific calculations and for storing measurements.

Bool (Boolean):
• A boolean represents a binary value that can be either True or False.
• Example: True, False
• Use cases: Booleans are primarily used for decision-making and control flow in Python. They
help in writing conditions, loops, and defining the logic of a program.


Str (String):
• A string is a sequence of characters, enclosed in either single or double quotes.
• Example: "Hello, World!", 'Python is fun'
• Use cases: Strings are used to store and manipulate text data. They are fundamental for handling textual information, from simple messages to complex documents.

Here’s why these four data types are essential:


• int represents whole numbers, and it’s crucial for counting, indexing, and performing mathematical
operations.
• float handles real numbers, including decimal and fractional values, making it suitable for scientific
and financial calculations.
• bool provides the basis for decision-making and logic within programs. It’s essential for creating
conditional statements.
• str is vital for handling textual data, from simple string manipulation to complex document processing. It’s the basis for working with text in Python.
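The four types above can be shown side by side (the values are illustrative):

```python
# One value of each of the four core data types.
quantity = 7          # int
ratio = 0.5           # float
is_ready = True       # bool
label = "sample"      # str

for value in (quantity, ratio, is_ready, label):
    print(value, type(value).__name__)
```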


• You can’t convert a string like "5.7" directly to an int.


• Doing so raises a ValueError.


Why you can’t convert such a string directly to int


• The error you’re encountering is a ValueError, and it occurs because you’re trying to convert the string "5.7" directly to an integer. Since the string contains a decimal point, it cannot be directly converted to an integer: integers represent whole numbers without decimal parts.
• To avoid this error, you should first convert the string to a float and then to an integer:
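A sketch of the failing conversion and the two-step fix:

```python
s = "5.7"

# int("5.7") raises ValueError because of the decimal point.
try:
    direct = int(s)
except ValueError:
    direct = None

# Convert to float first, then truncate to int.
value = int(float(s))
print(direct, value)  # None 5
```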

• In Python, True, False, and None are special constants representing Boolean values and a lack of
value (NoneType) respectively.

True and False:


• True and False are the two Boolean constants in Python.
• They are used to represent the truth values of logic and conditions.
• They are often used in control structures like if statements and loops to make decisions.
None:
• None is a special constant that represents the absence of a value or a null value.
• It is often used to indicate that a variable or function has no specific value.
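A small sketch using all three constants (the variable names are illustrative):

```python
is_active = True
result = None              # no value assigned yet

# `is None` is the idiomatic way to test for the absence of a value.
if is_active and result is None:
    result = "pending"
print(result)  # pending
```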


Operators

• In Python, operators are special symbols or keywords used to perform various operations on data, variables, or values. Python provides a wide range of operators for tasks such as arithmetic, comparison, and logical operations.
Arithmetic Operators:
• Arithmetic operators are used to perform basic mathematical operations.
• + # Addition
• - # Subtraction
• * # Multiplication
• / # Division
• % # Modulus (remainder)
• // # Floor Division (integer division)
• ** # Exponentiation

• Importance: Arithmetic operators are used for performing mathematical calculations in Python.

• Rules:
• Operate on numeric data types (integers and floating-point numbers).
• Use parentheses to control the order of operations (like in regular math).
• Division (/) results in a floating-point number, even if the operands are integers.


• Floor division (//) returns an integer and discards the fractional part.
• Modulus (%) returns the remainder after division.
• Exponentiation (**) raises the left operand to the power of the right operand.

• When to Use: Arithmetic operators are used for basic calculations such as addition, subtraction, multiplication, division, and more. They are essential for numeric computations in Python.
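The rules above can be sketched with two sample operands:

```python
a, b = 7, 2
print(a + b)    # 9
print(a - b)    # 5
print(a / b)    # 3.5  (true division always yields a float)
print(a // b)   # 3    (floor division discards the fractional part)
print(a % b)    # 1    (remainder)
print(a ** b)   # 49   (7 raised to the power 2)
```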

Comparison Operators:
• Comparison operators are used to compare two values and return a Boolean result.

• == # Equal to
• != # Not equal to
• < # Less than
• > # Greater than
• <= # Less than or equal to
• >= # Greater than or equal to

• Importance: Comparison operators are used for comparing values and returning Boolean
results.

• Rules:
• Compare values of any data type.
• Result in True or False.


• When to Use: Comparison operators are vital for making decisions in conditional statements
and loops. They are used to test conditions and control the flow of your program.
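Each comparison evaluates to a Boolean, as a short sketch shows:

```python
x = 10
print(x == 10)  # True
print(x != 10)  # False
print(x < 5)    # False
print(x >= 10)  # True
```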

Logical Operators:
• Logical operators are used to combine conditional statements.

• Importance: Logical operators are used for combining or negating conditions.

• Rules:
• Operate on Boolean values.
• and returns True if both conditions are True.
• or returns True if at least one condition is True.
• not negates the condition.

• When to Use: Logical operators are essential for creating complex conditions and controlling
program flow based on multiple conditions.
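The three logical operators can be sketched with illustrative conditions:

```python
age = 25
has_id = True
print(age >= 18 and has_id)  # True: both conditions hold
print(age < 18 or has_id)    # True: at least one condition holds
print(not has_id)            # False: negation
```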


Bitwise Operators

• We will discuss these later.

Assignment Operators:
• Assignment operators are used to assign values to variables.
• = # Assign a value
• += # Add and assign


• -= # Subtract and assign


• *= # Multiply and assign
• /= # Divide and assign
• %= # Modulus and assign
• //= # Floor division and assign
• **= # Exponentiate and assign

• Importance: Assignment operators are used for assigning values to variables.

• Rules:
• Used for assigning values to variables.
• Combining arithmetic operations and assignment (e.g., +=, -=) in a single statement.

• When to Use: Assignment operators are used extensively to store and manipulate data. They
are essential for variable assignment and updating.
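A sketch of compound assignment, tracing one variable through several updates:

```python
total = 10
total += 5    # same as total = total + 5 -> 15
total -= 3    # 12
total *= 2    # 24
total //= 5   # floor-divide and assign -> 4
print(total)  # 4
```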


Syntax rules

• Control flow statements in Python allow you to control the order in which your code is executed. Here, I’ll explain the syntax rules for the main control flow statements in Python, including conditional statements (if, elif, else), loop statements (for and while), and function definitions.

Conditional Statements (if, elif, else):

If Statement:
• Syntax:

• Explanation: The if statement is used to execute a block of code if the specified condition is
True.
• Usage: The if statement is used to conditionally execute a block of code when the specified
condition is True.
• Working: The condition is evaluated. If it’s True, the code block under the if statement is
executed. If the condition is False, the code block is skipped.

• Limitations: The if statement is limited to checking a single condition.
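A minimal sketch of the if statement (the condition and values are illustrative):

```python
temperature = 30

if temperature > 25:           # condition
    print("It's hot outside")  # runs only when the condition is True
```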

elif Statement (Optional):


• Syntax:

• Explanation: The elif statement allows you to check multiple conditions in sequence. If the
first condition is not met, it moves on to the next elif condition.
• Usage: The elif statement is used to check multiple conditions sequentially when the previous condition(s) are False.
• Working: Conditions are evaluated in sequence. The first True condition’s block of code is
executed, and the rest are skipped.
• Limitations: You can use multiple elif statements, but it’s not always the most efficient way
to handle complex conditionals.

else Statement (Optional):


• Syntax:

• Explanation: The else statement is used to specify code that is executed when the initial if condition
is False.
• Usage: The else statement complements the if statement and is used to specify code executed when
the initial condition is False.
• Working: If the if condition is True, the code block under if is executed. If it’s False, the code block
under else is executed.
• Limitations: It’s designed to handle binary conditions (True/False), and you cannot specify multiple conditions.
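Putting if, elif, and else together in one sketch (the grading thresholds are made up for illustration):

```python
score = 72

if score >= 90:
    grade = "A"
elif score >= 70:      # checked only if the first condition is False
    grade = "B"
else:                  # runs when no condition above is True
    grade = "C"
print(grade)  # B
```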


Tricky questions and answers


1. Question: What is the key difference between if and elif?
Answer: The key difference is that if is used for the initial condition check, and only one if block is
executed. In contrast, elif comes after if and is used to check additional conditions sequentially until
a True condition is found and executed. It provides an alternative condition to the original if.

2. Question: Is it possible to write code that does not contain an if but only elif and else?
Answer: No, it’s not possible. elif and else statements always follow an initial if statement. They provide alternative conditions to be executed if the initial if condition is False.


3. Question: How can you represent the switch-case construct (available in some other languages) using Python’s if-elif-else?
Answer: Python does not have a switch-case construct. You can achieve similar functionality using a
dictionary of functions or if-elif-else statements.

4. Question: Explain the purpose of using the else statement with no condition (e.g., else:) in
an if-else block.
Answer: An else block with no condition in Python acts as a catch-all. It is executed when none of
the preceding if or elif conditions are True. It provides a default action when no other conditions are
met.

5. Question: What happens if you have a condition that is always True in an if-elif-else chain?
Answer: If a condition is always True in an if-elif-else chain, the code block associated with that
condition will execute, and the rest of the conditions (if any) will be skipped. This is why the order
of conditions is important.

6. Question: How can you achieve the behavior of else if (common in some other languages)
in Python?
Answer: In Python, you use elif to achieve the same behavior as else if in other languages. It allows
you to check additional conditions after the initial if condition.

7. Question: Can you have nested if-elif-else statements? If so, what is the limit?
Answer: Yes, you can nest if-elif-else statements inside other if-elif-else statements. There is no strict
limit to the nesting depth, but it’s essential to maintain code readability and avoid excessive nesting.

8. Question: How do you prevent code redundancy when multiple conditions require the
same action?
Answer: To prevent redundancy, you can assign a variable or calculate the result once and use it in
multiple if or elif conditions. This promotes code reusability and improves maintainability.

9. Question: Explain the purpose of the pass statement in an if-elif-else block.


Answer: The pass statement is a placeholder used when a code block is syntactically required but
has no functionality. It’s often used during code development and testing.

10. Question: Can an if-elif-else block contain only elif conditions without an initial if condition?
Answer: No, an if-elif-else block must start with an initial if condition. elif and else are designed to
provide alternative conditions and actions based on the outcome of the initial if condition.


While Loop:
• A while loop in Python is used to repeatedly execute a block of code as long as a given condition is
True. It’s suitable for scenarios where you don’t know in advance how many times the loop should
run.
• In practice, the for loop is used more often in Python.

Syntax:
• The basic syntax of a while loop in Python is as follows:

• condition: A Boolean expression that determines whether the loop should continue or terminate.
How It Works:
• The condition is evaluated. If it’s True, the code block within the loop is executed.
• After the code block execution, the condition is evaluated again.
• If the condition remains True, the loop continues to execute.
• This process repeats until the condition becomes False.
• When the condition is False, the loop terminates, and the program continues with the code
after the loop
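The steps above can be sketched with a simple countdown (the values are illustrative):

```python
# The loop body updates count so the condition eventually becomes False.
count = 3
while count > 0:
    print(count)
    count -= 1     # forgetting this line would create an infinite loop
print("done")
```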
Use Cases:
• Unknown Number of Iterations: When you don’t know in advance how many times the loop
should run.
• Continuous Monitoring: To monitor a condition and respond when the condition becomes
False.
• User Interaction: To repeatedly ask for user input until a valid response is received.

Common Mistakes:
• Forgetting to Update the Condition: Ensure that the condition within the while loop is modified inside the loop to eventually become False. Forgetting this can lead to infinite loops.
• Incorrect Initialization: Initialize variables used in the condition outside the loop, or the loop
may not run at all.
• No Exit Condition: Be cautious not to create loops without a way to exit. Always have an exit
strategy, such as breaking the loop under certain conditions.
• Infinite Loops: Care must be taken to avoid creating infinite loops, as they can consume system resources and cause your program to become unresponsive.
• User Input: When using a while loop for user input, provide a clear way for users to exit the
loop or terminate the program.

• Condition Evaluation: Ensure the condition in the while loop is structured in a way that it
eventually becomes False. Otherwise, the loop will run indefinitely.
• Initialization: Initialize variables used in the loop outside the loop to avoid issues with reinitialization.


• Control flow statements in Python, including loops like for and while, allow you to execute code conditionally or repeatedly.

For Loop:
• A for loop in Python is used to iterate over a sequence (such as a list, tuple, string, or range)
and execute a block of code for each item in the sequence. The loop continues until all items
in the sequence have been processed.

Syntax: The basic syntax of a for loop in Python is:

• variable: A variable that takes the value of each item in the sequence during each iteration.
• sequence: A collection of items (e.g., a list, tuple, string) that the loop iterates through.
How It Works:
• The for loop starts by assigning the first item in the sequence to the variable.
• The code block within the loop is executed with the variable set to the first item.
• The loop repeats steps 1 and 2 for each item in the sequence.
• The loop terminates when all items in the sequence have been processed.

Iterating over a list of numbers:
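A minimal sketch of iterating over a list (the numbers are illustrative):

```python
numbers = [10, 20, 30]
total = 0
for n in numbers:   # n takes each value in the list, in order
    total += n
print(total)  # 60
```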


Use Cases:
• Processing Data: for loops are commonly used to process data stored in collections (lists,
tuples, dictionaries) or strings. For each element in the collection, you can perform specific
operations.
• Iterating Over Ranges: You can use a for loop with the range() function to generate a sequence of numbers to iterate through.
• File Handling: for loops are used to read files line by line.
Common Mistakes:
• Modifying the Sequence: Avoid modifying the sequence within the loop. It can lead to unexpected behavior.
• Infinite Loops: Ensure that the loop’s condition eventually becomes False to prevent infinite
loops.
• Indentation: Proper indentation is crucial in Python. Ensure that the code block within the
loop is indented correctly.
When to Use:
• Use a for loop when you have a collection of items and you want to perform a set of operations on each item in the collection.
Limitations:
• You should not modify the sequence within the loop because it can lead to unexpected behavior.
• Be cautious when iterating over ranges, especially in a while loop, to prevent infinite loops.
Important Notes:
• For efficiency, consider using list comprehensions for simple operations on sequences.
• Ensure that your loop conditions eventually become False to prevent infinite loops.
• Pay attention to indentation and colon usage, as they are essential for the correct structure
of loops in Python.
• Always initialize the loop variables before using them in a loop.
• Be cautious when modifying the sequence within a for loop, as this can lead to unexpected
behavior.


Range Function:
• In Python, the range function is used to generate a sequence of numbers within a specified range. It’s
often used in for loops to iterate a specific number of times.
Syntax:
• The basic syntax of the range function is as follows:

• Start (optional): The starting value of the sequence. The default is 0.


• Stop: The stopping value. The generated sequence includes all numbers up to, but not
including, this value.

• Step (optional): The step size, indicating the interval between numbers in the sequence. The default is 1.

Usage:
• When using range(stop), it generates a sequence starting from 0 to stop - 1.
• When using range(start, stop), it generates a sequence from start to stop - 1.
• When using range(start, stop, step), it generates a sequence from start to stop - 1,
with a step size of step.
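The three call forms can be sketched side by side:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(2, 6)))       # [2, 3, 4, 5]
print(list(range(10, 0, -2)))  # [10, 8, 6, 4, 2]
```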

Important Notes:
• The range function generates a sequence of numbers efficiently without creating a list
in memory. This is beneficial for large ranges.
• The start, stop, and step values can be negative, which allows you to generate sequences in reverse or decremental order.
• The sequence generated by range is “half-open,” meaning it includes the start value
but excludes the stop value. For example, range(2, 6) generates numbers from 2 to 5.
• The range function is often used in for loops to iterate a specific number of times. For
example, for i in range(5) will iterate five times.
• To convert the range object into a list of numbers, you can use the list function. For
example, list(range(5)) will return [0, 1, 2, 3, 4].
Use Cases:
• Iterating over a sequence of numbers.
• Specifying the number of iterations in a loop.
• Creating custom sequences of numbers for various purposes.
Common Mistakes:
• Forgetting that the stop value is not included in the generated sequence. Make sure to
adjust your loop accordingly.
• Specifying a negative step value when you intend to generate a sequence in increasing
order.
Limitations:
• The range function generates sequences of integers only. It does not support floating-point numbers.


Data structures
• Data structures are a fundamental concept in computer science and programming. They enable you to efficiently store, organize, and manipulate data. In Python, there are several built-in data structures, including lists, tuples, dictionaries, and sets.
Properties of Data Structures:
• One-Dimensional: Lists, tuples, dictionaries, and sets are all one-dimensional, which means
they store data in a linear sequence. For multi-dimensional data structures, you can use lists
of lists or other nested structures.
• Heterogeneous: These data structures allow you to store elements of different data types.
For example, you can have a list that contains integers, strings, and lists, all within the same
list.
• No Broadcasting: Unlike some numerical libraries like NumPy, these built-in data structures
do not inherently support broadcasting. Broadcasting typically involves applying operations
element-wise or across arrays.
• Vectorization: While these data structures do not support vectorized operations like NumPy arrays, you can perform vectorized operations using list comprehensions or similar techniques. Vectorization in this context means efficiently applying an operation to all elements of a data structure.


Lists and Tuples in Python:


Lists:
Creating a List:
• Lists are created by enclosing a comma-separated sequence of values within square brackets
[ ].
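A minimal sketch of list creation, including mixed element types:

```python
mixed = [1, "two", 3.0, True]   # a list may hold elements of different types
print(mixed)
print(len(mixed))  # 4
```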


Creating Lists with a Range:


• To create a list with a range of values, use the list() constructor with the range() function. The range() function generates a sequence of numbers.

• In this example, range(1, 11) generates numbers from 1 (inclusive) to 11 (exclusive). The list() constructor converts this range into a list.
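A sketch of the conversion described above:

```python
numbers = list(range(1, 11))   # wrap the range object in list()
print(numbers)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```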

Tuples:
Creating a Tuple:
• Tuples are created by enclosing a comma-separated sequence of values within parentheses
( ).

Accessing Elements:
• Like lists, tuples are ordered, and you can access their elements using indices, starting from
0.
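A minimal sketch of creating and indexing a tuple (the values are illustrative):

```python
point = (3, 4)      # parentheses create a tuple
print(point[0])     # 3  (0-based indexing, like lists)
print(point[1])     # 4
```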


Creating Tuples with a Range:


• To create a tuple with a range of values, you can use the tuple() constructor with the range()
function in a similar way.

• In this example, range(2, 11, 2) generates even numbers from 2 (inclusive) to 11 (exclusive) with a step of 2. The tuple() constructor converts this range into a tuple.
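The same conversion as a sketch:

```python
evens = tuple(range(2, 11, 2))   # wrap the range object in tuple()
print(evens)  # (2, 4, 6, 8, 10)
```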

Range Function Parameters:


• The range() function takes three parameters:
• Start: The starting value of the range (inclusive).
• Stop: The stopping value of the range (exclusive).
• Step: The interval between numbers. It’s optional and defaults to 1.
Use Cases:
• Creating lists and tuples with specified ranges is useful for various scenarios. You can generate sequences of numbers for iterations, indexes, or any other purpose. This approach is efficient and memory-friendly for generating large sequences of numbers.


Important Note:
• Keep in mind that the range() function generates numbers within the specified range, but it
does not create a list or tuple by itself. You need to wrap it with list() or tuple() to convert it
into the desired data structure.

Accessing Values in Lists and Tuples in Python


• In Python, you can access values in lists and tuples using indexing.
• Accessing Values by Index:
• To access a specific value in a list or tuple, use square brackets with the index of the item you want
to retrieve. Python uses 0-based indexing, meaning the first element has an index of 0, the second
element has an index of 1, and so on.

Negative Indexing:
• You can also use negative indexing to access elements from the end of the list or tuple. -1
represents the last element, -2 the second-to-last, and so on.
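Positive and negative indexing can be sketched together (the list is illustrative):

```python
colors = ["red", "green", "blue"]
print(colors[0])    # red   (first element)
print(colors[-1])   # blue  (last element)
print(colors[-2])   # green (second-to-last)
```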


Slicing:
• Slicing allows you to access multiple elements in a sequence. It uses the colon : to specify a range of
indices.
• Slicing is a powerful feature in Python for working with sequences like lists, tuples, and strings. It
allows you to create new sequences by extracting a portion of an existing sequence.

Basic Slicing Syntax:


• Slicing uses the colon (:) operator to specify a range within a sequence. The general syntax
is sequence[start:end], where start is inclusive, and end is exclusive. This means that the
resulting slice includes elements from the start index up to, but not including, the end index.
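A sketch of the inclusive-start, exclusive-end rule:

```python
letters = ["a", "b", "c", "d", "e"]
print(letters[1:4])  # ['b', 'c', 'd'] -- index 1 included, index 4 excluded
```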

Slicing with Omitted Start or End:


• You can omit the start or end in the slicing syntax. Omitting the start is equivalent to starting from the beginning, and omitting the end is equivalent to ending at the sequence’s last element.


Slicing with Step Value:


• You can specify a step value in slicing by adding a second colon followed by the step size. The
step determines how many elements are skipped in the sequence.

Negative Indexing and Slicing:


• Slicing supports negative indices. Negative indices count from the end of the sequence, with
-1 referring to the last element.
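Omitted bounds, step values, and negative indices can all be sketched on one list:

```python
nums = [0, 1, 2, 3, 4, 5, 6]
print(nums[:3])    # [0, 1, 2]     omitted start = from the beginning
print(nums[4:])    # [4, 5, 6]     omitted end = to the last element
print(nums[::2])   # [0, 2, 4, 6]  step of 2
print(nums[-3:])   # [4, 5, 6]     last three elements
```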

• The starting index is inclusive (it’s part of the slice).


• The ending index is exclusive (it’s not part of the slice).
• Omitting the start or end index will slice from the beginning or until the end, respectively.

Slicing vs. Indexing:


• The primary difference between indexing and slicing is that indexing returns a single element
at a specified index, while slicing returns a subsequence as a new sequence. Indexing is used
to access individual items, and slicing is used for extracting and manipulating subarrays.


• In Python, the id() function is used to get the identity (unique identifier) of an object. It returns an integer that represents the memory address of the object. Each object in Python has a unique id, which can be considered a unique identifier for that object.

• In this example, a and b are two variables. We assign the integer value 10 to a and 5 to b.
When we use id(a) and id(b), Python returns the unique memory addresses for these two
variables.
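The example described above can be sketched as follows (the actual addresses printed will differ on every run):

```python
a = 10
b = 5
print(id(a), id(b))      # two distinct addresses for two distinct objects

c = a                    # c now refers to the same object as a
print(id(a) == id(c))    # True
```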

How it works
• Identity of Objects: Each object in Python has a unique identity, which is determined
by its memory address. The id() function retrieves this unique identifier.
• Memory Addresses: Python stores objects in memory, and each object is assigned a
specific memory location. The id() function returns the memory address as an integer
value.
• Object Comparison: You can use id() to compare objects to see if they are the same.
If two variables have the same id, they refer to the same object in memory.


• Mutable vs. Immutable Objects: The behavior of id() can vary depending on whether the object is mutable or immutable. Mutable objects (e.g., lists, dictionaries) keep the same id after in-place modifications, while immutable objects (e.g., integers, strings) get a new id when “modified”, because the name is rebound to a new object.

• In this example, the id of x and y is the same initially. However, when we modify x, it is bound to a new memory address because integers are immutable.
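The rebinding behavior can be sketched with two names sharing one int:

```python
x = 100
y = x
print(id(x) == id(y))  # True: both names refer to the same int object

x += 1                 # ints are immutable, so x is rebound to a new object
print(id(x) == id(y))  # False
print(y)               # 100 -- y is unchanged
```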
• Garbage Collection: Python manages memory using a mechanism called garbage collection. When an object is no longer referenced, Python reclaims the memory occupied by that object. The id() value is not guaranteed to remain the same once the object is garbage-collected.
• Caveats: While id() is useful for comparing object identity, it is not typically used for
common programming tasks. Python provides other methods for object comparison,
such as the == operator for comparing values and the is operator for checking identity.


• In Python, Object-Oriented Programming (OOP) is a programming paradigm that uses objects and classes for designing and structuring code. OOP is a fundamental concept in Python and many other modern programming languages. It offers a way to model real-world entities and their interactions in a software application. OOP is based on several key principles and concepts:
• Objects: Objects are the core building blocks of OOP. They represent real-world entities or concepts and encapsulate both data (attributes) and the methods (functions) that operate on that data. For example, in a banking application, you can have objects representing customers, accounts, and transactions.
• Classes: A class is a blueprint or template for creating objects. It defines the structure
and behavior of objects. The class acts as a blueprint that defines the attributes (data)
and methods (functions) that the objects created from it will have.

• In Python, you can use the dir() function to get a list of all the attributes and
methods available for a particular object, including built-in objects like tuples
and lists.
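A sketch of dir() on a built-in tuple (only two of the many listed attributes are checked here):

```python
t = (1, 2, 3)
attrs = dir(t)            # every attribute/method name of a tuple
print("count" in attrs)   # True
print("index" in attrs)   # True
print(t.count(2))         # 1
```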


Methods for updating lists in Python


1. extend()
• The extend() method is used to add multiple elements to the end of a list from an iterable
(e.g., another list, tuple, or string). It modifies the original list in place.

• Syntax:

2. insert()
• The insert() method is used to add an element at a specific position in the list. It takes two arguments: the index at which to insert the element and the value to be inserted. It also modifies the original list in place.

• Syntax:


3. append()
• The append() method is used to add an element to the end of the list. It modifies the original list in place.

• Syntax:

• Important Note: append() adds only one element at a time to the end of the list.

Key Points:
• insert() allows you to specify the index where the element should be inserted.
• It modifies the original list in place.
• It can be used to insert a single element at a specific location within the list.
Differences:
• extend() is used for adding multiple elements from an iterable to the end of
the list.
• append() adds a single element to the end of the list.
• insert() inserts an element at a specified index within the list.

Methods for removing elements from a list in Python


1. remove()
• The remove() method is used to remove the first occurrence of a specific value from a list. If
the value is not found in the list, it raises a ValueError.
• Syntax:
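A minimal sketch of remove() (the variable names are illustrative):

```python
values = [1, 2, 3, 2]
values.remove(2)   # only the first occurrence is removed
print(values)      # [1, 3, 2]
```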


Key Points:
• remove() deletes the first occurrence of the specified value from the list.
• If the value is not found in the list, it raises a ValueError.
• It modifies the original list in place.
2. pop()
• The pop() method is used to remove an element from the list at a specified index. It returns
the removed element. If the index is not provided, it removes and returns the last element.
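A minimal sketch of pop() (the variable names are illustrative):

```python
stack = ["a", "b", "c"]
last = stack.pop()    # no index: removes and returns the last element
first = stack.pop(0)  # removes and returns the element at index 0
print(last, first, stack)   # c a ['b']
```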

Key Points:
• pop() removes and returns the element at the specified index.
• If no index is provided, it removes and returns the last element.
• It modifies the original list in place.
• If the index is out of range, it raises an IndexError.

3. clear()
• The clear() method is used to remove all elements from the list, effectively emptying it. After
calling clear(), the list becomes an empty list.
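A minimal sketch of clear() (the variable name is illustrative):

```python
cart = ["pen", "book"]
cart.clear()   # empties the list in place
print(cart)    # []
```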


Key Points:
• clear() removes all elements from the list.
• After calling clear(), the list is empty (contains no elements).
• It modifies the original list in place.
Differences:
• remove() is used to delete the first occurrence of a specific value from the list.
• pop() removes and returns an element at a specified index, or the last element if no
index is provided.
• clear() removes all elements from the list, leaving it empty.
Important Note:
• When using remove() and pop(), ensure that the element or index you are removing
exists in the list, or appropriate error handling should be in place to avoid exceptions.
• clear() is a handy method for emptying a list when you want to reuse it.
Tuples
• Tuples are similar to lists, but unlike lists, they are immutable, meaning their elements can-
not be modified once the tuple is created.

Adding Elements (Concatenation):


• Tuples are immutable, so you cannot directly add elements to an existing tuple. However, you
can concatenate two or more tuples to create a new tuple.
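A minimal sketch of tuple concatenation (the variable names are illustrative):

```python
t1 = (1, 2)
t2 = (3, 4)
combined = t1 + t2   # builds a new tuple; t1 and t2 are unchanged
print(combined)      # (1, 2, 3, 4)
```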

• Finding Index of an Element (index()):


• The index() method is used to find the index (position) of the first occurrence of a specific
element in the tuple. If the element is not found, it raises a ValueError.
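A minimal sketch of the tuple index() method (the variable name is illustrative):

```python
letters = ("a", "b", "c", "b")
print(letters.index("b"))   # 1 -- position of the first occurrence
```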


Key Differences Between Lists and Tuples:


• Lists are mutable (you can change their contents), while tuples are immutable (you cannot
change their contents once created).
• Lists are defined using square brackets [...], and tuples are defined using parentheses (...).
• Lists are typically used for collections of items that may need to be modified, while tuples are
used for collections of items that should not be modified.
• Due to their immutability, tuples are generally faster and consume less memory than lists,
making them more suitable for certain use cases.
• Lists have more built-in methods for modification (e.g., append, extend, remove), while tu-
ples have fewer methods since they are not intended to change.
sort()
• The sort() method is used to sort the elements of the list in ascending order. You can use the
reverse argument to sort in descending order.
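A minimal sketch of sort() (the variable name is illustrative):

```python
nums = [3, 1, 2]
nums.sort()               # ascending, in place
print(nums)               # [1, 2, 3]
nums.sort(reverse=True)   # descending
print(nums)               # [3, 2, 1]
```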

copy()
• The copy() method is used to create a shallow copy of a list. A shallow copy is a new list that
contains references to the same elements as the original list. In other words, it creates a new
list, but the elements themselves are not duplicated; they still point to the same objects in
memory. This means that changes made to the elements in the original list will also affect the
elements in the copied list and vice versa.


Syntax:
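A minimal sketch showing shallow-copy behavior (the variable names are illustrative):

```python
original = [1, [2, 3]]
shallow = original.copy()
shallow[0] = 99       # top-level change: the original is unaffected
shallow[1].append(4)  # the nested list is shared with the original
print(original)       # [1, [2, 3, 4]]
print(shallow)        # [99, [2, 3, 4]]
```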

When to Use copy():


• Creating an Independent Copy: Use copy() when you want to create an independent copy of
a list, meaning you need a new list with the same elements as the original, but you want to
work with them independently.
• Preserving the Original Data: When you want to preserve the original list and work with a
copy, use copy() to avoid unintended modifications to the original data.
• Avoiding Side Effects: In scenarios where changes in one list should not affect another list,
creating a copy using copy() is the way to go.
Use Cases:
• Creating a backup of a list.
• Implementing undo/redo functionality, where you need to maintain the history of data
changes.
• When working with data that should not be accidentally modified, such as configuration set-
tings.
Important Note:
• copy() creates a shallow copy, which means that if the list contains mutable objects
(e.g., other lists or dictionaries), they will be shared between the original and the cop-
ied list. To create a deep copy, which duplicates the original list as well as all its nested
elements, you can use the copy module’s deepcopy() function.
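A minimal deepcopy() sketch contrasting with the shallow copy above (the variable names are illustrative):

```python
import copy

original = [1, [2, 3]]
deep = copy.deepcopy(original)
deep[1].append(4)   # nested objects are duplicated, not shared
print(original)     # [1, [2, 3]]
print(deep)         # [1, [2, 3, 4]]
```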


count()
• The count() method is used to count the number of occurrences of a specific element in a list.
It allows you to find out how many times a particular value appears in the list.

Syntax:

• list: The list in which you want to count occurrences.


• value: The element for which you want to count occurrences.
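A minimal sketch of count() (the variable name is illustrative):

```python
grades = ["A", "B", "A", "C", "A"]
print(grades.count("A"))   # 3
print(grades.count("D"))   # 0 -- value not present
```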

How count() Works:


• The count() method is called on a list and takes one argument, which is the value you want
to count.
• It iterates through the list, element by element, and compares each element to the specified
value.
• If a matching element is found, it increments a counter by 1.
• Once it has iterated through the entire list, it returns the final count.
• Important Notes:
• The count() method returns an integer value representing the count of occurrences. If the
value is not found in the list, it returns 0.
• It’s important to note that count() only counts the direct occurrences of the specified ele-
ment and doesn’t consider nested lists or elements within nested lists. If you have nested
lists, you would need to iterate through them manually to count occurrences.


Question 1


Question 2


Question 3


Questions: for lists and tuples


Using a For Loop:
• You can also achieve the same result using a for loop. This method is a bit more verbose but can be
useful if you need to perform additional operations along the way.

• Here, you iterate through each element in L1 and append the result of dividing it by 100 to the new
list L2.
• Regarding the use of the extend method, you cannot use it directly for this specific task because it is
designed to extend a list with the elements of another iterable (e.g., another list). It is not intended
for performing element-wise operations like division. The extend method concatenates two lists,
whereas you want to create a new list with transformed elements. For element-wise operations, list
comprehension or loops are the appropriate tools.

Question 2
• In this code, you iterate through each element in L1 using a for loop. If the element is greater than
20, it is appended to the new list L2. The result will be a list containing elements greater than 20.

Question 3
• To create a new list L2 with elements greater than 20 from the list L1 after dividing each number in
the list by 100, you can use a for loop or list comprehension.


• In this code, you first create the list L1. Then, you iterate through each element in L1 using a for loop.
For each element, you divide it by 100 and check if the result is greater than 20. If it is, you append
it to the new list L2.
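The steps above can be sketched as follows (the contents of L1 are hypothetical, since the original exercise data is not shown):

```python
L1 = [1500, 4000, 900, 2500]   # hypothetical data
L2 = []
for x in L1:
    result = x / 100
    if result > 20:
        L2.append(result)
print(L2)   # [40.0, 25.0]
```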

List Comprehension
• List comprehension is a concise and powerful way to create lists in Python. It allows you to create
a new list by applying an expression to each item in an existing iterable (e.g., a list, tuple, or range)
and optionally filtering the items based on a condition.

• Basic Syntax:

• new_list: The resulting list that will be created.


• Expression: The operation or transformation to be applied to each item.
• Item: A variable that represents each item in the iterable.
• Iterable: The original iterable (e.g., list, tuple, range) from which items are taken.
• Condition (optional): A filter that determines whether the item should be included in the new list.

Example 1: Creating a List of Squares:

• In this example, we generate a list of squares from 1 to 5 using a list comprehension. For each
item x in the range, the expression x**2 is applied.
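A minimal version of this example:

```python
squares = [x**2 for x in range(1, 6)]
print(squares)   # [1, 4, 9, 16, 25]
```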

Example 2: Filtering Even Numbers:

• Here, we create a list of even numbers by filtering the range of numbers to include only those
where x % 2 equals 0.
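A minimal version of this example:

```python
evens = [x for x in range(10) if x % 2 == 0]
print(evens)   # [0, 2, 4, 6, 8]
```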

Using Conditional Expression (Ternary Operator):


• You can also use a conditional expression within the expression part to apply different operations
based on a condition:
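A minimal sketch of a conditional expression inside a comprehension (the variable name is illustrative):

```python
labels = ["even" if x % 2 == 0 else "odd" for x in range(4)]
print(labels)   # ['even', 'odd', 'even', 'odd']
```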

Important Notes:
• List comprehensions are efficient and often more readable than traditional for loops.
• Keep list comprehensions concise; they can become hard to read if too complex.
• You can use multiple for and if clauses for more complex operations.

Questions:


Dictionaries in Python
• A dictionary is a versatile and widely used data structure in Python. It’s a collection of key-value
pairs that provides a way to store, access, and manipulate data efficiently.

Basic Syntax:

• my_dict: The dictionary variable.


• “key1”, “key2”: Keys, which must be unique and immutable (e.g., strings, numbers, or tuples).
• “value1”, “value2”: Values associated with the keys. Values can be of any data type.
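A minimal sketch matching the keys and values described above:

```python
my_dict = {"key1": "value1", "key2": "value2"}
print(my_dict["key1"])   # value1
```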


Example:

Common Operations:
• Looping through a dictionary using for loops or dictionary methods.
• Checking if a key exists in a dictionary.
• Removing key-value pairs using del or pop().
Use Cases:
• Dictionaries are used to represent data with a key-value relationship.
• Ideal for storing settings, configurations, or any data with named attributes.
• Used to count occurrences of items (e.g., word frequencies in text).
• Efficient for data retrieval when the key is known.
Rules and Notes:
• Keys must be unique within a dictionary.
• Keys are typically strings or numbers.
• Values can be of any data type.
• Dictionaries are unordered in versions before Python 3.7; from Python 3.7 onward they maintain insertion order.
• Dictionaries are mutable; you can modify their contents.
Limitations:
• Dictionaries do not allow duplicate keys. If you use the same key multiple times, only
the last occurrence is retained.
• Keys must be hashable, meaning they are immutable. Lists, other dictionaries, and
most custom objects cannot be used as keys.
Common Mistakes:
• Trying to access a non-existent key without checking its existence using in or get().
• Modifying the dictionary structure during iteration, which can result in unexpected
behavior.
Note:
• Dictionaries don’t have a concept of indexing or position like lists or tuples.
Instead of using numerical indexes, dictionaries use keys to access their values.
• Key-Value Pairs: Dictionaries store data as key-value pairs. Each key is unique
and maps to a specific value. The key is used to access the associated value.
• No Order (Before Python 3.7): In Python versions before 3.7, dictionaries do
not guarantee any specific order of key-value pairs. This means that the order
in which you add items to a dictionary may not be preserved. Accessing ele-
ments by index, like in lists, doesn’t make sense for dictionaries.
• Preserved Order (Python 3.7+): In Python 3.7 and later, the insertion order of
key-value pairs is guaranteed to be preserved. So, when iterating through a dic-
tionary, the items are returned in the order they were added.
• Accessing Values: To access a value in a dictionary, you provide the key inside
square brackets [] or use the get() method. There is no need to use numerical
indexes.

• Note: dictionaries have no dedicated method for adding a value; you add or update key-value pairs simply by assigning to a key, e.g. my_dict["new_key"] = "new_value".


Accessing Elements:
• Access by Key: To access a value in a dictionary, use the key enclosed in square brackets or the get()
method.

• If the key doesn’t exist in the dictionary, accessing it using square brackets will raise a KeyError,
while get() will return None.
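A minimal sketch (person and its contents are hypothetical):

```python
person = {"name": "Asha", "age": 30}
print(person["name"])               # Asha
print(person.get("email"))          # None -- missing key, no exception
print(person.get("email", "n/a"))   # get() also accepts a default
print("age" in person)              # True -- membership test on keys
```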

Using in to Check for Key Existence:

• You can check if a key exists in a dictionary using the in operator.

Updating Elements:
• To update the value associated with a key, simply access the key and assign a new value to it.
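A minimal sketch (the keys are illustrative):

```python
settings = {"theme": "light"}
settings["theme"] = "dark"   # assigning to an existing key updates it
settings["font"] = "mono"    # assigning to a new key adds a pair
print(settings)              # {'theme': 'dark', 'font': 'mono'}
```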

Removing Elements:
• del statement: You can use the del statement to remove a key-value pair by specifying the key.

• pop() Method: The pop() method removes a key-value pair by specifying the key and re-
turns the corresponding value. If the key is not found, you can provide a default value.
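A minimal sketch of del and pop() (the dictionary contents are hypothetical):

```python
stock = {"apple": 5, "banana": 2, "pear": 7}
del stock["apple"]              # remove a pair by key
qty = stock.pop("banana")       # remove by key and get the value back
missing = stock.pop("kiwi", 0)  # a default avoids a KeyError
print(stock, qty, missing)      # {'pear': 7} 2 0
```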


• popitem() Method: The popitem() method removes and returns a key-value pair from the dictionary as a tuple. In Python 3.7+ it removes the last inserted pair (LIFO order); in earlier versions the pair removed is arbitrary.
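A minimal sketch of popitem() (the keys are illustrative):

```python
d = {"a": 1, "b": 2}
pair = d.popitem()   # Python 3.7+: removes the last inserted pair
print(pair, d)       # ('b', 2) {'a': 1}
```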

Notes:
• When using del, deleting a key that doesn't exist raises a KeyError. Using pop() with a default value lets you handle a missing key gracefully instead of raising an exception.

• Dictionaries are unordered collections, which means the order of key-value
pairs may not be preserved in earlier Python versions (before 3.7). However,
Python 3.7+ maintains the insertion order.
• Make sure to handle cases where the key may not exist, especially when re-
moving items, to avoid errors.

Set
• A set in Python is an unordered collection of unique elements. Unlike a list or tuple, a set does not
allow duplicate values. Sets are commonly used for tasks that involve storing and managing distinct
items.

Key Features of Sets:


• Unordered Collection: Sets don’t maintain the order of elements. When you iterate through a
set, the order in which you added elements is not preserved.
• Uniqueness: Sets only store unique elements. If you attempt to add a duplicate element to a
set, it won’t raise an error, but the element will not be added again.
• Mutable: Sets are mutable, which means you can add or remove elements after creating a set.
• No Indexing: Sets do not support indexing, slicing, or any sequence-like behavior because
they are unordered.
Creating Sets:
• You can create a set using curly braces {} or the set() constructor. For example:
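A minimal sketch (the variable names are illustrative):

```python
s1 = {1, 2, 2, 3}      # duplicates are dropped automatically
s2 = set([3, 4, 4])    # set() constructor from any iterable
print(s1, s2)          # {1, 2, 3} {3, 4}
```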


Adding Elements:
• You can add elements to a set using the add() method:

Removing Elements:
• Use the remove() or discard() method to remove elements. The difference is that remove()
raises a KeyError if the element is not found, while discard() does not:
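A minimal sketch of add(), remove(), and discard() (the set contents are illustrative):

```python
tags = {"python", "data"}
tags.add("numpy")
tags.remove("data")       # raises KeyError if the element is absent
tags.discard("missing")   # silently does nothing if absent
print(tags)
```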

• Set Operations:
• Sets support various set operations, such as union, intersection, and difference:

Common Use Cases:


• Removing duplicates from a list.
• Checking for the existence of unique elements in a collection.
• Mathematical operations on sets, such as union, intersection, and difference.
• Implementing algorithms like breadth-first search, where you need to keep track of unique
visited nodes.
Differences Between Sets and Dictionaries:
• Sets store single, unique elements, while dictionaries store key-value pairs.
• Sets use curly braces {} or set() to create, while dictionaries use curly braces or dict() and
consist of key-value pairs.
• Sets are unordered and do not support indexing, while dictionaries maintain order and allow
you to access values by their keys.
• When iterating over a dictionary, you access key-value pairs, whereas with sets, you iterate
over individual elements.
Limitations:
• Sets themselves are not hashable, so you cannot have a set of sets (a frozenset can be used as an element instead).
• Set elements must be hashable, so mutable objects (like lists) cannot be stored in a set; allowing them to change would break the set's uniqueness guarantees.


In Python sets, there are several set operations that allow you to manipulate and com-
bine sets.
Intersection (&):
• The intersection of two sets contains elements that are common to both sets.
• It is represented using the & operator.
• For example, if you have two sets A and B, A & B will give you a new set containing the ele-
ments that are present in both A and B.

Union (|):
• The union of two sets contains all unique elements from both sets.
• It is represented using the | operator.
• For example, if you have two sets A and B, A | B will give you a new set containing all unique
elements from both sets.

Symmetric Difference (^):


• The symmetric difference of two sets contains elements that are unique to each set, exclud-
ing the elements common to both sets.
• It is represented using the ^ operator.
• For example, if you have two sets A and B, A ^ B will give you a new set containing elements
that are in A or B, but not in both.

Difference (-):
• The difference between two sets contains elements that are in the first set but not in the
second set.
• It is represented using the - operator.
• For example, if you have two sets A and B, A - B will give you a new set containing elements
that are in A but not in B.
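The four operators above can be sketched together on two small example sets:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}
print(A & B)   # {3, 4} -- intersection
print(A | B)   # {1, 2, 3, 4, 5} -- union
print(A ^ B)   # {1, 2, 5} -- symmetric difference
print(A - B)   # {1, 2} -- difference
```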


Discard:
• The discard method is used to remove an element from a set, if it exists in the set. If the
element is not in the set, it doesn’t raise an error.
• This method is handy when you want to remove an element from a set, but you’re not sure
if it exists in the set.

User-Defined Functions (UDFs)


• User-defined functions in Python are custom functions created by the user to perform specific tasks. They allow you to encapsulate a block of code that can be reused throughout your program.
Defining a UDF:
• You can define a UDF using the def keyword followed by the function name and a set of parentheses.
The function name should follow the same naming rules as variables. You can pass zero or more
parameters (arguments) within the parentheses.


Rules for UDFs:


• Function names must start with a letter or an underscore (_).
• The rest of the name can contain letters, numbers, and underscores.
• Function names are case-sensitive.
• Function names should not be the same as Python’s reserved words.

Function Body:
• The function body contains a block of code that defines the functionality of the function. It
may include statements, expressions, loops, conditionals, and other Python constructs.

Parameters (Arguments):
• Parameters are values passed to the function when it is called. You can specify parameters
inside the parentheses. Parameters are local variables to the function and can be used within
the function body.

Return Statement:
• A UDF can return a value using the return statement. This value can be of any data type. If the
return statement is omitted, the function returns None.
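A minimal UDF sketch tying these pieces together (the function names are illustrative):

```python
def greet(name):
    """Return a greeting for the given name."""
    return "Hello, " + name + "!"

print(greet("Python"))   # Hello, Python!

def log_only(message):
    print(message)       # no return statement...

result = log_only("hi")
print(result)            # ...so the function returns None
```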


Use of UDFs:
• Code Reusability: UDFs allow you to reuse code for repetitive tasks.
• Modularity: They promote code organization by breaking down complex tasks into smaller,
manageable functions.
• Abstraction: UDFs provide a higher level of abstraction, making code more readable and
maintainable.
Limitations:
• Functions can be relatively slow, especially when used in tight loops.
• Nesting too many functions can lead to performance issues and reduced readability.
• Overuse of global variables within functions can make code harder to understand and main-
tain.

• Important Notes:
• UDFs should have a clear purpose and be named descriptively to improve code read-
ability.
• Document your functions using docstrings to explain their purpose and usage.
• Functions should perform a single task or serve a single responsibility (following the
Single Responsibility Principle).


*args, and **kwargs

• In Python, you can use the *args and **kwargs constructs in function definitions to work with vari-
able-length argument lists. These are often referred to as “arbitrary argument lists” and can be very
useful when you want to create functions that can accept a varying number of arguments.


1. *args (Arbitrary Positional Arguments):


• The *args syntax allows a function to accept a variable number of non-keyword (positional)
arguments. When you use *args in a function definition, it collects any extra positional argu-
ments into a tuple.

• When you call this function, any extra arguments you pass will be collected into the args
tuple:
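A minimal sketch of *args (the function name total is illustrative):

```python
def total(*args):
    # extra positional arguments arrive as a tuple
    return sum(args)

print(total(1, 2, 3))   # 6
print(total())          # 0 -- an empty tuple sums to zero
```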

2. **kwargs (Arbitrary Keyword Arguments):


• The **kwargs syntax allows a function to accept a variable number of keyword arguments.
When you use **kwargs in a function definition, it collects any extra keyword arguments into
a dictionary.

• When you call this function, any extra keyword arguments you pass will be collected into the
kwargs dictionary:
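A minimal sketch of **kwargs (the function name describe is illustrative):

```python
def describe(**kwargs):
    # extra keyword arguments arrive as a dictionary
    return kwargs

info = describe(color="red", size=10)
print(info)   # {'color': 'red', 'size': 10}
```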


Common Use Cases:


• *args and **kwargs are frequently used in functions that need to process a variable
number of arguments, making code more flexible.
• They are often used in decorators to wrap functions with additional behavior without
knowing the number or names of the arguments.

Important Notes:
• The names args and kwargs are not fixed; you can use any names you prefer, but args
and kwargs are conventional.
• When defining a function, regular positional arguments must come before *args, and
*args must come before **kwargs.

Part 8 (02:14:00 – 02:49:00)


Input function
• The input() function in Python is used to take user input from the keyboard during
runtime. It allows you to interact with your Python programs by providing input val-
ues from the command line.

Usage
• The input() function reads a line of text from the user and returns it as a
string.

Rules and Usage:


• Prompt (Optional): You can provide an optional string argument to input(), which is
used as a prompt to the user.

• Reading as String: input() always returns a string. If you want to use the input as a number,
you need to convert it explicitly.

• User Input: The function waits for the user to type something and press “Enter.” Once the
user enters a value and presses “Enter,” the input is read.
Important Note:
• Be cautious when using input(), especially if you plan to use the input for critical
or sensitive operations. Always validate and sanitize user input, especially if it’s
used for security-related tasks.

Limitations:
• input() is primarily used in console-based Python programs. In graphical user inter-
faces (GUI) or web applications, user input is typically handled differently.

Lambda Functions in Python:


• Lambda functions, often referred to as anonymous functions, are a powerful and concise feature in
Python. They allow you to create small, inline functions without the need to define a full function
using the def keyword.

1. Lambda Function Syntax:


• A lambda function is defined using the lambda keyword, followed by one or more ar-
guments and an expression. The syntax is as follows:

2. Single Expression:
• Lambda functions are limited to a single expression, and the result of the expression
is returned.
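Minimal lambda sketches (the names square and add are illustrative):

```python
square = lambda x: x**2      # lambda arguments: expression
add = lambda a, b: a + b
print(square(4), add(2, 3))  # 16 5
```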


3. Use of Lambda Functions:


• Small Functions: Lambdas are ideal for creating short, simple functions.
• Functional Programming: They are commonly used in functional programming con-
structs like map, filter, and reduce.
• Anonymous Functions: When you need a quick, throwaway function.
4. Rules for Lambda Functions:
• Lambdas can take any number of arguments but can have only one expression.
• The expression is evaluated and returned when the lambda function is called.
• Lambda functions can be assigned to variables and used like regular functions.
5. Lambda vs. Regular Functions:
• Lambda functions are more concise and suitable for simple operations.
• Regular functions defined with def are more versatile and can handle complex oper-
ations and multiple expressions.
6. Use Cases for Lambda Functions:
• Sorting: You can use lambda functions as the key argument in sorting operations to
specify custom sorting criteria.
• Filtering: In combination with filter(), lambdas can be used to filter elements from
iterables.
• Mapping: With map(), lambdas can transform data within an iterable.
7. Limitations and What to Avoid:
• Avoid using lambdas for complex logic; they are meant for simple operations.
• Clarity can be compromised if your lambda becomes too lengthy.
• Overusing lambda functions can make code harder to read.


• Lambda functions and the map() function are powerful tools that allow you to stream-
line your code by applying functions to iterable data.
• Python’s map() function is a built-in function that’s widely used for applying a spec-
ified function to each item in an iterable (such as a list) and returning a map object
containing the results.

The map() Function:


• The map() function is used to apply a given function to all items in an input iterable (e.g., list, tuple)
and return a new iterable with the results. Its structure is:
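A minimal sketch of map() with a lambda (prices is an example list):

```python
prices = [100, 250, 400]
doubled = list(map(lambda p: p * 2, prices))   # apply to every element
print(doubled)   # [200, 500, 800]
```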

• map() is a fantastic choice for transforming data without needing a manual loop.
How map() Works:
• The map() function iterates through the items in the input iterable.
• For each item, it applies the specified function.
• The results are collected and returned as a map object, which can be converted to a list
or another iterable.
Rules and Limitations:
• The applied function should accept the same number of arguments as there are input
iterables.
• map() returns a map object, which can be converted to a list or another iterable for prac-
tical use.
• While efficient for straightforward transformations, map() may not replace loops for
more complex operations.


Why Use map():


• Conciseness: It enables you to perform operations on all elements in an iterable with just
one line of code.
• Readability: map() makes your code more concise and easier to understand compared to
explicit loops.

Exploring Python’s str Method


• In Python, str stands for "string," a fundamental data type used to represent text. The str type offers a wide range of functionality for creating, manipulating, and formatting text.


1. Creating Strings:
• Single Quotes: You can create a string using single quotes like this: 'Hello, World!'.
• Double Quotes: Strings can also be defined using double quotes: "Hello, Python!".
• Triple Quotes: Triple quotes (''' or """) are used for multi-line strings:

2. String Concatenation:
• Strings can be combined using the + operator:
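A minimal concatenation sketch:

```python
greeting = "Hello" + ", " + "World!"
print(greeting)   # Hello, World!
```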

3. String Indexing and Slicing:


• Strings are sequences of characters, and you can access individual characters using indi-
ces. For example:
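A minimal indexing and slicing sketch:

```python
text = "Python"
print(text[0])     # P  -- first character
print(text[-1])    # n  -- last character
print(text[1:4])   # yth -- slice from index 1 up to (not including) 4
```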

4. String Methods:
• Python provides a multitude of string methods for various operations. Some commonly
used methods include:
• str.upper(): Converts a string to uppercase.
• str.lower(): Converts a string to lowercase.
• str.strip(): Removes leading and trailing whitespace.
• str.replace(): Replaces occurrences of a substring with another.
• str.split(): Splits a string into a list using a specified delimiter.
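The methods above in a minimal sketch (the sample strings are illustrative):

```python
s = "  Hello, World  "
print(s.upper())                             # uppercased copy
print(s.strip())                             # 'Hello, World'
print(s.strip().replace("World", "Python"))  # 'Hello, Python'
print("a,b,c".split(","))                    # ['a', 'b', 'c']
```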


5. Escape Characters:
• You can use escape characters to include special characters within strings. For example:
• \n: Represents a newline.
• \t: Represents a tab.
• \\: Represents a backslash.
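A minimal sketch of escape characters:

```python
print("line1\nline2")   # two lines
print("col1\tcol2")     # tab-separated
print("C:\\temp")       # prints C:\temp
```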


Importing Packages in Python


• Python’s extensive library ecosystem is one of its strengths. You can easily expand Python’s capabil-
ities by importing packages (also known as libraries or modules).
Importing a Package:
• You import a package using the import keyword followed by the package name. For
instance, to import the popular package NumPy, known for numerical operations:

Using an Alias:
• Python allows you to use an alias for the package, which can make your code cleaner.
Common aliases are used to save time when typing. For NumPy, you can use the alias
np:
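A minimal sketch of importing NumPy with its conventional alias (assumes NumPy is installed):

```python
import numpy as np   # np is the conventional alias

arr = np.array([1, 2, 3])
print(arr * 2)       # element-wise: [2 4 6]
```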

pip install for External Packages:


• Python has a package manager called pip. It’s used to install external packages not
included in the Python standard library. To install a package using pip, open your
command-line or terminal and run:


NumPy ndarray:
• NumPy is a fundamental package for scientific computing in Python. Its ndarray (short for n-dimen-
sional array) is a versatile data structure that underpins most numerical operations in Python.

n-dimensional:
• NumPy ndarrays can have any number of dimensions. They are often used for 1D (vectors), 2D
(matrices), and higher-dimensional arrays.
• You can create arrays with different shapes and dimensions, making it flexible for various applica-
tions.


Homogeneous:
• All elements within a NumPy ndarray must have the same data type (e.g., integers, floats).
• This homogeneity ensures efficient memory usage and optimized performance.

Allow Broadcasting:
• Broadcasting is a powerful feature of NumPy arrays.
• It allows for element-wise operations between arrays of different shapes and dimensions.
• NumPy automatically adjusts the smaller array to match the shape of the larger array during oper-
ations.

Allow Vectorization:
• Vectorization is the process of applying operations to entire arrays without using explicit loops.
• NumPy supports vectorized operations, making code concise and computationally efficient.
• For example, you can add two NumPy arrays together, element by element, without the need for
explicit loops.

Fast and Memory Efficient:


• NumPy is implemented in C and Fortran, making it incredibly fast for numerical computations.
• It uses contiguous blocks of memory for arrays, ensuring efficient memory usage.


Type Conversion:
np.array():
• NumPy’s np.array() function is used to create an array from a list, tuple, or any iterable
object.
• It automatically infers the data type, making it a versatile way to create NumPy arrays.

In-Built Methods:
np.zeros():
• np.zeros(shape) creates an array filled with zeros.
• The shape is defined as a tuple, specifying the number of elements along each dimension.
np.ones():
• np.ones(shape) creates an array filled with ones.
• Like np.zeros(), you specify the shape as a tuple.

np.full():
• np.full(shape, fill_value) creates an array filled with a specific value (fill_value).
• The shape is defined as a tuple.

np.arange():
• np.arange(start, stop, step) creates an array with evenly spaced values.
• It is similar to Python’s built-in range(), but it generates a NumPy array.

np.linspace():
• np.linspace(start, stop, num) generates an array of evenly spaced values over a specified
range.
• You define the number of values (num) you want, not the step size.

np.random.random():
• np.random.random(size) generates random numbers in the range [0.0, 1.0).
• You specify the shape of the output array using the size parameter.

np.random.randint():
• np.random.randint(low, high, size) generates random integers between low (inclusive)
and high (exclusive).
• The size parameter specifies the shape of the output.
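The creation routines above in one minimal sketch (assumes NumPy is installed; the shapes and ranges are illustrative):

```python
import numpy as np

print(np.zeros((2, 3)))                 # 2x3 array of zeros
print(np.ones(3))                       # [1. 1. 1.]
print(np.full((2, 2), 7))               # 2x2 array filled with 7
print(np.arange(0, 10, 2))              # [0 2 4 6 8]
print(np.linspace(0, 1, 5))             # 5 evenly spaced values from 0 to 1
print(np.random.random(2))              # two floats in [0.0, 1.0)
print(np.random.randint(1, 7, size=3))  # e.g. three dice rolls
```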

Array Manipulation Methods:


Transpose (array.T):
• Transposes the array, swapping rows and columns.


• Useful for matrix operations.


• Reshape (array.reshape(new_shape)): changes the array's shape while keeping the total number of elements constant.

Example:
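A small illustrative example of reshape and transpose:

```python
import numpy as np

arr = np.arange(6)        # [0 1 2 3 4 5]
m = arr.reshape((2, 3))   # same 6 elements, new shape 2x3
t = m.T                   # transpose swaps rows and columns -> 3x2

print(m.shape)   # (2, 3)
print(t.shape)   # (3, 2)
```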

Type Conversion:
np.array():
• Use it when you need to convert a Python list, tuple, or other iterable into a NumPy array.
• This is especially handy when working with numerical data, as NumPy arrays provide
efficient operations and calculations.

In-Built Methods:
np.zeros(shape):
• When you want to create an array filled with zeros for initialization.
• Useful when setting up arrays for later data insertion or calculation.


np.ones(shape):
• Great for initializing an array with ones.
• Often used in situations like creating a mask or a starting point for addition.

np.full(shape, fill_value):
• When you need to initialize an array with a specific constant value.
• Useful for tasks such as setting up a grid with predefined values.

np.arange(start, stop, step):


• To generate a range of values with a specified step size.
• It’s handy for creating sequences of numbers to use in iterations.

np.linspace(start, stop, num):


• When you require evenly spaced values over a defined range.
• Commonly used for plotting data with evenly spaced x-values.

np.random.random(size):
• When you need random floating-point values between 0 and 1.
• Useful for generating random data or introducing variability into simulations.

np.random.randint(low, high, size):


• For generating random integers within a specified range.
• Often used in scenarios like shuffling data or selecting random samples.

Array Manipulation Methods:


Transpose (array.T):
• Use it when you need to swap rows and columns, especially when dealing with matrices.
• Valuable in linear algebra, statistics, and other areas requiring data transformation.

Reshape (array.reshape(new_shape)):
• When you need to change the shape of an array while keeping the total number of elements constant.
• Useful for preparing data for specific algorithms or reshaping images for deep learning models.


Creating a One-Dimensional Array: in-built methods

Creating a Two-Dimensional Array:
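A minimal sketch of both cases, with illustrative values:

```python
import numpy as np

one_d = np.array([1, 2, 3])                # from a Python list
two_d = np.array([[1, 2, 3], [4, 5, 6]])   # from a nested list

print(one_d.ndim)    # 1
print(two_d.ndim)    # 2
print(two_d.shape)   # (2, 3)
```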


• In NumPy, np.full is a function used to create an array with a specified shape (dimensions) and fill it
with a constant value. It is a convenient way to initialize arrays with a predefined value.

• np.arange is a function provided by the NumPy library. It is used to create an array with regularly
spaced values within a specified range.
• The limitation of the built-in range function in Python is that it only supports generating sequences
of integer values. This means you can create sequences like range(1, 10) to get integers from 1 to 9,
but you can’t create sequences with floating-point numbers or specify a non-integer step size. For
tasks that involve non-integer values or customized step sizes, range falls short.
• This is where NumPy's np.arange function comes to the rescue. np.arange provides a more versatile solution for generating sequences of numbers. You can specify the start, stop, and step values as parameters, allowing you to create sequences of integers or floating-point numbers with a non-integer step.
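A small example of the non-integer step that the built-in range cannot provide:

```python
import numpy as np

# range() cannot do this: a non-integer step of 0.25
vals = np.arange(0.0, 1.0, 0.25)
print(vals)   # [0.   0.25 0.5  0.75]
```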


• np.linspace is a function in NumPy, a popular Python library for numerical and scientific computing.
Use Cases:
• Creating evenly spaced values: np.linspace is commonly used when you need to create a sequence of numbers that are evenly spaced, such as dividing a range into equal intervals.
• Generating time intervals: It’s useful in time series data or simulations where you want to
create evenly spaced time intervals.
• Visualization: It’s frequently used in data visualization to generate data points for plotting
graphs.

Why Use np.linspace:


• Provides control over the number of values: You can specify the number of values you want
in the sequence using the num parameter.
• Handles both endpoints: The endpoint parameter allows you to include or exclude the stop
value, providing flexibility in defining the sequence’s boundaries.
• Generates evenly spaced values: It ensures that the values are uniformly distributed between
start and stop.
• Common in scientific computing: It's widely used in scientific and engineering applications to create arrays with specific characteristics.

Limitations:
• Limited to linear sequences: np.linspace generates linear sequences where the difference between values is constant. It cannot be used to create sequences with non-linear spacing.
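A brief sketch showing the num and endpoint parameters (values are illustrative):

```python
import numpy as np

# Five evenly spaced values from 0 to 1, endpoint included by default
pts = np.linspace(0.0, 1.0, num=5)
print(pts)        # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False excludes the stop value
open_pts = np.linspace(0.0, 1.0, num=5, endpoint=False)
print(open_pts)   # [0.  0.2 0.4 0.6 0.8]
```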


np.random.random and np.random.randint are NumPy functions for generating random numbers in Python.
np.random.random:
• Function signature: np.random.random(size=None)
• size: The shape of the output array. It specifies how many random numbers you want to generate. By default, it is None, which generates a single random float.

Purpose:
• np.random.random is used to generate random float values from a uniform distribution
in the half-open interval [0.0, 1.0). In other words, it generates random floats between 0
(inclusive) and 1 (exclusive).

Use Cases:
• Simulation: It’s often used in simulations and modeling where you need random input
data.
• Random Sampling: When you need to randomly sample data, such as for bootstrapping in
statistics or creating randomized datasets.
Why Use np.random.random:
• Uniform distribution: It provides random numbers uniformly distributed between 0 and
1.
• Reproducibility: You can set the seed for the random number generator to make your
experiments reproducible.

Limitation:
• Limited range: np.random.random only generates random floats in the range [0.0, 1.0),
which is not suitable for all applications.

np.random.randint:
Purpose:
• np.random.randint generates random integers from a discrete uniform distribution.
Use Cases:
• Simulations: It’s widely used in simulations and games where you need random events.
• Random Sampling: When you need random samples for experimentation or generating
test data.


Why Use np.random.randint:


• Discrete values: It provides random integers, which are often more suitable for specific
applications.
• Control over the range: You can specify the range within which the random integers
should be generated.
Limitation:
• Discrete values: np.random.randint generates integers, which may not be suitable for applications requiring random floats.
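A combined sketch of both functions (the shapes and the dice-style range are illustrative):

```python
import numpy as np

np.random.seed(42)  # setting a seed makes results reproducible

floats = np.random.random(size=(2, 3))     # floats in [0.0, 1.0)
ints = np.random.randint(1, 7, size=10)    # integers 1..6, like dice rolls

print(((floats >= 0.0) & (floats < 1.0)).all())   # True
print(((ints >= 1) & (ints < 7)).all())           # True
```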


In Python, you can access data from an array using various techniques.
Indexing:
• Accessing a specific element at a given index.

Slicing:
• Extracting a portion of the array.
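A minimal sketch of indexing and slicing (values are illustrative):

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

print(arr[0])     # 10  (first element)
print(arr[-1])    # 50  (last element)
print(arr[1:4])   # [20 30 40]  (slice; the end index is excluded)

m = np.array([[1, 2, 3], [4, 5, 6]])
print(m[1, 2])    # 6  (row 1, column 2)
print(m[:, 0])    # [1 4]  (all rows, first column)
```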


How to update
• Updating NumPy arrays in a vectorized manner is a powerful and efficient way to modify array elements or perform operations on them.
1. Scalar Operations:
• You can update all elements of an array by applying a scalar operation. For example, adding 5 to all elements of an array arr can be written as arr + 5.

2. Element-Wise Operations:
• You can perform element-wise operations between arrays of the same shape. For example, adding two arrays element-wise: a + b.

3. Conditional Updates:
• You can use boolean indexing to update specific elements in an array based on a condition.

4. Broadcasting:
• NumPy allows operations between arrays of different shapes, as long as they are compatible. Broadcasting can simplify vectorized updates. For example, adding a 1D array to a 2D array.
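The four update styles above can be sketched as follows (values are illustrative):

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# 1. Scalar operation: add 5 to every element
arr = arr + 5                       # [6 7 8 9]

# 2. Element-wise operation between same-shape arrays
arr = arr + np.array([1, 1, 1, 1])  # [7 8 9 10]

# 3. Conditional update via boolean indexing
arr[arr > 8] = 0                    # [7 8 0 0]

# 4. Broadcasting: add a 1-D array to each row of a 2-D array
grid = np.zeros((2, 3))
grid = grid + np.array([1, 2, 3])   # both rows become [1. 2. 3.]

print(arr)    # [7 8 0 0]
print(grid)
```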


You can combine data using arrays and NumPy's hstack and vstack functions to horizontally and vertically stack arrays, respectively.
1. hstack (Horizontal Stack):
• hstack is used to horizontally stack (concatenate along columns) multiple arrays. This is often used when you want to combine data with the same number of rows but different attributes or features.

2. vstack (Vertical Stack):


• vstack is used to vertically stack (concatenate along rows) multiple arrays. It’s typically used
when you want to stack data with the same number of columns but different observations or
data points.
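A minimal sketch of both stacking directions (the arrays are illustrative):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

h = np.hstack((a, b))   # same rows, more columns -> shape (2, 4)
v = np.vstack((a, b))   # same columns, more rows  -> shape (4, 2)

print(h.shape)   # (2, 4)
print(v.shape)   # (4, 2)
```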


Sequence of operations

• Operators are evaluated following a specific order known as the "order of operations." This order ensures that mathematical expressions are evaluated correctly. The order of operations can be remembered using the acronym PEMDAS (Parentheses, Exponents, Multiplication and Division, Addition and Subtraction), or BODMAS (Brackets, Orders, Division and Multiplication, Addition and Subtraction).


Breakdown of the order in which arithmetic operations are evaluated:


• Parentheses (P): Expressions enclosed in parentheses are evaluated first.
• Exponents (E): Exponentiation operations (raising a number to a power) are performed next.
• Multiplication and Division (M and D): Multiplication and division are evaluated from left to right.
This means that if there are multiple multiplication and division operations in an expression, they
are performed in the order they appear.
• Addition and Subtraction (A and S): Finally, addition and subtraction are performed from left to
right. Like multiplication and division, if there are multiple addition and subtraction operations,
they are executed in the order they appear.
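A few quick checks of this order in plain Python:

```python
# Multiplication before addition
print(2 + 3 * 4)     # 14, not 20

# Parentheses override the default order
print((2 + 3) * 4)   # 20

# Exponent before multiplication
print(2 ** 3 * 2)    # 16

# Same-precedence operators evaluate left to right
print(10 - 4 + 2)    # 8
```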

Questions- assignment


Part – 3 (00:41:00 – 00:48:00)


What we learned

Properties

With the Pandas library, you can work with two primary data structures: Series and DataFrame.
Series:
• A Series is a one-dimensional labeled array capable of holding data of any type.
• It’s similar to a column in a spreadsheet or a single-variable dataset.
• Series has both data and labels (index), making it easy to perform data manipulations.
• You can create a Series from a Python list, array, or dictionary.
• Series is homogeneous, meaning it can contain data of the same data type.
• It’s also size-immutable; you cannot change the size of a Series once it’s created.
• You can access elements using labels or integer-based indices.
• Common operations on Series include data selection, filtering, and aggregation.

• You can perform element-wise operations between Series, and they will align based on the
index.
• Series is the building block of a DataFrame, where each column is essentially a Series.

DataFrame:
• A DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure.
• It’s similar to a spreadsheet or a SQL table.
• DataFrames are commonly used for data manipulation, cleaning, exploration, and analysis.
• Each column in a DataFrame is a Series.
• DataFrames have both rows and columns, with each row representing a record and each
column a variable.
• You can create a DataFrame from various data sources, including dictionaries, lists, Series,
and external files (e.g., CSV or Excel).
• DataFrames are versatile, supporting various data types within the same structure.
• They allow for easy indexing, selection, and filtering of data.
• DataFrames can be transposed (rows become columns, and vice versa).
• You can merge, join, or concatenate DataFrames to combine data from different sources.
• DataFrames provide powerful functionality for data analysis and manipulation, such as
groupby operations, pivot tables, and time series analysis.

Properties of Series:
• Homogeneous: Series contains data of a single data type.
• Indexed: Each element in a Series has an associated label (index).
• Size-immutable: You cannot change the size of a Series once created.
• Element-wise operations: You can apply operations to each element in a Series.

Properties of DataFrames:
• Heterogeneous: DataFrames can hold data of different data types.
• Tabular structure: DataFrames have a two-dimensional structure.
• Indexed: Both rows and columns have labels (row and column indices).
• Size-mutable: You can add or remove rows and columns.
• Versatile: DataFrames support various data manipulation and analysis operations.


Creating a Series:
• You can create a Series in Python using the Pandas library.
• Common ways to create a Series:
• From a Python list: my_series = pd.Series([1, 2, 3, 4])
• From a NumPy array: my_series = pd.Series(np.array([1, 2, 3, 4]))
• From a dictionary: my_series = pd.Series({'A': 1, 'B': 2, 'C': 3})
• Series is homogeneous; it holds data of the same data type.
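A short sketch of the three creation routes (the values are illustrative):

```python
import pandas as pd
import numpy as np

from_list = pd.Series([1, 2, 3, 4])
from_array = pd.Series(np.array([1, 2, 3, 4]))
from_dict = pd.Series({'A': 1, 'B': 2, 'C': 3})  # dict keys become the index

print(from_list.tolist())        # [1, 2, 3, 4]
print(list(from_dict.index))     # ['A', 'B', 'C']
```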


• Note: the default index (DI) is system-generated; it cannot be updated or modified.

• User-defined index (UDI): supplied by the user and customizable. If the user has not given any UDI, the UDI will be the same as the DI.


Accessing Data:
• You can access data in a Series using indexing.
• To update elements in a Pandas Series, you can assign new values to specific indices or labels.
• Access by label: my_series['A'] (if the index is a label)
• Access by position: my_series[0] (if the index is a position)
• You can also use slicing to access a range of elements: my_series[1:3]
• Note: positional access starts at 0 and uses the default index.


Updating Data:
• You can update the data in a Series using indexing.
• Change a value by label: my_series['A'] = 10
• Change a value by position: my_series[0] = 10
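A combined sketch of access and update; note that recent pandas versions prefer .iloc for positional access on a labeled Series:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['A', 'B', 'C'])

print(s['A'])       # 1  (access by label)
print(s.iloc[0])    # 1  (access by position)
print(s.iloc[1:3])  # slice of positions 1 and 2

s['A'] = 10         # update by label
s.iloc[2] = 30      # update by position
print(s.tolist())   # [10, 2, 30]
```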


In Python's Pandas library, iloc and loc are used for accessing elements in DataFrames and Series.
1. iloc (Integer Location):
• iloc is primarily used for integer-based indexing. You can access elements by their integer
position within the DataFrame or Series.
• The indexing starts at 0, similar to standard Python indexing.
• The syntax for iloc is data.iloc[row, column].

Properties of iloc:
• It uses integer-based index positions.
• You can use integer slices and lists for selection.
• It doesn’t include the endpoint of slices (similar to Python slicing).
• It allows you to select specific rows and columns by their numeric positions.

Use Cases for iloc:


• When you want to access data using integer-based positions.
• For numerical operations or when you know the exact position of the data you need.
• To perform slicing operations using integer positions.

2. loc (Label-based Location):


• loc is used for label-based indexing, meaning you can access elements using their labels or indices.
• The syntax for loc is data.loc[row_label, column_label].
Properties of loc:
• It uses label-based indices.
• It includes the endpoint of slices (unlike iloc).
• You can use labels or label-based conditions for data selection.
• You can also use Boolean indexing to filter data based on labels.

Use Cases for loc:


• When you want to access data using labels or indices.


• For selecting data based on specific criteria or conditions.


• When working with labeled datasets and index names.

• Here’s a quick comparison:

• Use iloc when you need to access data by integer position.


• Use loc when you want to access data by labels or indices, and for label-based conditions or criteria.
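A small illustrative comparison; the DataFrame, its row labels, and values are made up for the example:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Ann', 'Bob', 'Cid'],
                   'score': [85, 92, 78]},
                  index=['r1', 'r2', 'r3'])

# iloc: integer positions; the end of a slice is excluded
print(df.iloc[0, 1])           # 85
print(df.iloc[0:2])            # rows at positions 0 and 1 only

# loc: labels; the end of a slice is included
print(df.loc['r1', 'score'])   # 85
print(df.loc['r1':'r2'])       # rows r1 AND r2 (endpoint included)
```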


• Note: iloc works only with the default integer index, so use loc when the index contains text (labels).


Indexing:
• Series has two main components: data and an index.
• The index is like a label for each data point.
• You can customize the index while creating the Series.

In-Depth Methods:
• Series offers methods for various data operations:
• .head(): View the first few elements.
• .tail(): View the last few elements.
• .describe(): Get summary statistics.
• .max(): Find the maximum value.
• .min(): Find the minimum value.
• .mean(): Calculate the mean.
• .sum(): Calculate the sum.
• .value_counts(): Count unique values.

Properties:
• Homogeneous: Series holds data of a single data type.
• Indexed: Each element in a Series has an associated label (index).
• Size-immutable: You cannot change the size of a Series once created.
• Element-wise operations: You can apply operations to each element in a Series.

Limitations:
• Size: Limited to available memory.
• Homogeneous: Data type must be the same across all elements.
• Immutability: You cannot change the size or data type of a Series after creation.


Load the Dataset:


• Use libraries like Pandas to load the dataset into a DataFrame.
• Inspect the first few rows using head() to get a sense of the data.

Data Exploration:
• Check the dimensions of the dataset using shape to see how many rows and columns
are present.
• Use info() to get information about the data types and missing values.
• Describe the data statistically with describe() to get summary statistics.
• Check for missing values using isna() or isnull() combined with sum().

Data Cleaning:
• Handle missing values by either removing rows or imputing values.
• Remove duplicates if they exist.
• Correct data types if necessary, like converting strings to numbers.
• Rename columns for clarity.
• Handle outliers if needed.

Data Analysis:
• Perform various analyses depending on the dataset and your objectives. This can include:
• Aggregations (grouping, summing, averaging)
• Filtering data based on criteria
• Applying statistical tests or machine learning models

Data Visualization:
• Create visualizations to gain insights. Use libraries like Matplotlib or Seaborn for this.
• Common plots include histograms, bar charts, scatter plots, and heatmaps.

Exporting Data:
• If you’ve made changes to the dataset, save it to a new file.


What is the difference between a CSV file and an Excel file?


File Format:
• CSV: It is a plain text file where values in each row are separated by commas (or other delimiters
like semicolons or tabs). CSV files have a .csv extension.
• Excel: Excel files are binary files created and used by Microsoft Excel. They have extensions like
.xls, .xlsx, and .xlsm.

• Import data from a CSV (Comma-Separated Values) file using the Pandas library.

Read the CSV File:


• Use the pd.read_csv() function to read the CSV file. You need to pass the file path as an argument to this function.
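A minimal sketch; an inline string stands in for a file on disk here, and the file name in the comment is hypothetical:

```python
import pandas as pd
from io import StringIO

# Small inline CSV used for illustration instead of a real file
csv_text = "name,age\nAnn,25\nBob,32\n"
df = pd.read_csv(StringIO(csv_text))   # pd.read_csv("data.csv") for a file on disk

print(df.shape)   # (2, 2)
print(df.head())
```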


View the Data:


• You can view the contents of the DataFrame by simply printing it.

Check the type of data


• Check the number of dimensions
• Check the shape (number of rows and number of columns)

Understand the process of EDA: understanding the data is the first step (practical implementation)
• Step 1: data import


Step 2: simple methods


• Exploratory Data Analysis (EDA) is the process of understanding and working with your dataset. EDA is crucial for getting insights into your data and preparing it for further analysis.

type():
• The type() function helps you determine the type of your data structure. For example, it can
tell you if you’re dealing with a Pandas DataFrame or Series.

• Limitations: It only provides basic information about the data type.


• Use Case: Useful to quickly check the type of a variable.

dtype:
• The dtype attribute is used to check the data type of the elements in a Pandas Series or DataFrame. It's essential to ensure your data is interpreted correctly.

• Limitations: It won’t show you the data type of multiple columns at once.
• Use Case: Vital for confirming data types, especially for numerical operations.

shape:
• The shape attribute returns a tuple representing the dimensions of your DataFrame. It's useful for understanding the size of your dataset in terms of rows and columns.

• Limitations: It doesn’t give details about individual columns.


• Use Case: Crucial to understand the dataset’s structure and size.


count():
• The count() method provides the count of non-null elements in each column. It's crucial for identifying missing data points.

• Limitations: It counts only non-null values; it does not tell you where the missing values are.


• Use Case: Helps identify columns with missing data.

columns:
• The columns attribute lists the column names of your DataFrame. It’s handy for selecting
specific columns or understanding the dataset’s structure.

• Limitations: It only provides names and not column properties.


• Use Case: Helps access and manage specific columns.
ndim:
• The ndim attribute returns the number of dimensions of your data structure. It's usually 2 for a DataFrame and 1 for a Series.

• Limitations: It reports only the number of dimensions, not the size along each.


• Use Case: Confirms whether you are working with a one-dimensional Series or a two-dimensional DataFrame.

info():
• The info() method offers a concise summary of your DataFrame, including column data
types and non-null counts. It’s a great initial overview of your dataset.

• Limitations: May not display complete information for large datasets.


• Use Case: Provides an initial overview of the dataset’s structure.


head() and tail():


• head() and tail() methods display the first or last n rows of your DataFrame. They're excellent for quickly glancing at your data's structure and values.

• Limitations: Limited to displaying a fixed number of rows.


• Use Case: Initial data exploration and understanding.

describe():
• The describe() method generates summary statistics of your data, including count, mean,
standard deviation, minimum, and maximum values. It provides valuable insights into the
distribution of your numerical data.

• Limitations: Only applies to numerical columns.


• Use Case: Useful for gaining insights into the central tendency and spread of data.


• count: This is the count of non-missing (non-null) values for each column. It tells you how
many data points are available for each numerical column.
• mean: The mean (average) of the data in each column. It gives you an idea of the central value around which the data is distributed.
• std: The standard deviation, which measures the spread or dispersion of the data. It tells you
how much individual data points typically deviate from the mean.
• min: The minimum value in each column, which is the smallest observed value.
• 25%: The 25th percentile (1st quartile) value, indicating that 25% of the data points are less
than or equal to this value. It’s a measure of data distribution.
• 50%: The 50th percentile (2nd quartile or median) value. It represents the middle value of
the data when arranged in ascending order.
• 75%: The 75th percentile (3rd quartile) value. It’s another measure of data distribution.
• max: The maximum value in each column, which is the largest observed value.

Step 3: methods and attributes
Methods:
• Methods are functions that you can call on an object (like a DataFrame or a Series).
• They are typically followed by parentheses, e.g., head(), describe(), info().
• Methods perform some action on the object, and they may accept arguments within the parentheses to modify their behavior.
• For example, head() is a method that returns the first few rows of a DataFrame. You can specify the
number of rows you want to see by providing an argument like head(10) to see the first 10 rows.
Attributes:
• Attributes are values or properties associated with an object. They provide information about the
object but don’t perform actions.
• Attributes are accessed without parentheses, e.g., shape, dtypes, columns.


• Attributes provide information or characteristics of the object they are attached to. For instance,
shape is an attribute that tells you the dimensions (rows and columns) of a DataFrame.

What is subsetting
• Data cleaning in Python often involves subsetting, which means selecting specific columns from a
DataFrame or rows based on certain conditions. There are several ways to achieve this in Pandas,
a popular Python library for data manipulation. Let’s explore the different methods for subsetting:

4 different ways

Using Square Brackets []:


• You can subset a DataFrame by selecting one or more columns using square brackets.
• This method is straightforward and is suitable when you know the column names.

Using .loc[]:
• The .loc[] method allows you to select rows and columns by label (column names and row
indices).
• You can specify both row and column labels using .loc[].


Using .iloc[]:
• The .iloc[] method is used for integer-based indexing. It allows you to select rows and columns by integer positions.
• This method is useful when you want to select data by its numerical position.
Selecting by Column Name:
• You can also select columns by directly referencing their names.
• Note: Remember to replace 'Column_Name', 'Column1', 'Column2', 0, 1, 2, and 1:3 in the examples with the actual column names or integer positions you want to select. Subsetting allows you to focus on specific parts of your data for analysis, visualization, or further processing, which is a crucial step in data cleaning and preparation.
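A sketch of the subsetting routes above (column names and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Ann', 'Bob', 'Cid'],
                   'age': [25, 32, 41],
                   'city': ['Pune', 'Delhi', 'Agra']})

one_col = df['name']               # square brackets, single column -> Series
two_cols = df[['name', 'age']]     # list of names -> DataFrame

by_label = df.loc[0:1, ['name']]   # .loc: row and column labels (endpoint included)
by_position = df.iloc[0:2, 0:2]    # .iloc: integer positions (endpoint excluded)

print(one_col.shape)       # (3,)
print(two_cols.shape)      # (3, 2)
print(by_label.shape)      # (2, 1)
print(by_position.shape)   # (2, 2)
```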


• In Python, particularly when working with Pandas DataFrames, the rename function is used
to change the labels of the rows or columns. It’s a powerful tool for data preprocessing and
cleaning.

• mapper: This parameter allows you to specify the mapping of the old labels to the new labels. It can be a dictionary, a function, or None (default).

• index and columns: These parameters let you specify whether you want to rename the row
(index) labels or column labels. You should choose one or the other.


• axis: This parameter is an alternative to using index and columns. It can be set to 0 for index
labels or 1 for column labels.

• copy: By default, a new DataFrame with the updated labels is returned. If you set copy to
False, the original DataFrame is modified in place.

• inplace: If set to True, the original DataFrame is modified in place (overwrites the existing
one). If False (default), a new DataFrame is returned.

• level: When working with MultiIndex DataFrames, this parameter allows you to specify the
level you want to rename.
Rules:
• When using rename, you can either specify a dictionary to map old labels to new labels, or
you can provide a function that transforms the labels.
• If you want to rename both rows and columns, you can use mapper to specify the changes
for both. If you want to rename only rows or columns, you can use the index and columns
parameters.
• The function doesn’t directly affect the original DataFrame unless you set inplace to True or
assign the result back to the original DataFrame.
• The level parameter is used when working with MultiIndex DataFrames, allowing you to rename specific levels within the index.

Limitations:
• The rename function is generally limited to changing labels; it doesn’t perform complex data
transformations.
• It might not be the most efficient choice for very large DataFrames due to the need to create
a copy.
• If labels are not unique, renaming could lead to unexpected results.

When to Use:
• Use rename when you need to make your DataFrame's row or column labels more meaningful.
• It’s helpful for data preprocessing when the original labels are not descriptive or need to be
standardized.
• You might also use it when working with MultiIndex DataFrames to change level names.
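A minimal sketch of rename (the labels and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Dictionary mapper: old label -> new label
renamed = df.rename(columns={'a': 'alpha', 'b': 'beta'})
print(list(renamed.columns))   # ['alpha', 'beta']

# A function works too, e.g. upper-casing every column name
upper = df.rename(columns=str.upper)
print(list(upper.columns))     # ['A', 'B']

# inplace=True modifies df itself and returns None
df.rename(index={0: 'row0'}, inplace=True)
print(list(df.index))          # ['row0', 1]
```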

• Particularly when working with Pandas DataFrames, the drop operation is used to eliminate one or more rows or columns from a DataFrame. Here's a comprehensive explanation of the drop operation:


• labels: This parameter specifies what to drop. It can be a single label or a list of labels, indicating the rows or columns to remove.

• axis: By default, it’s set to 0, which means dropping rows. If you want to drop columns, set it
to 1.

• index and columns: These parameters are alternatives to specifying the axis. Use index for
rows and columns for columns.

• level: When working with MultiIndex DataFrames, you can specify the level at which to drop
labels.

• inplace: If set to True, the original DataFrame is modified in place, and nothing is returned. If
False (default), a new DataFrame with the specified columns removed is returned.

• errors: This parameter defines how to handle labels that are not found. The default 'raise' will raise an error; 'ignore' will suppress the error.

Explanation:
Dropping Columns:
• When you want to remove one or more columns from a DataFrame, set the axis or use the columns parameter.
• Specify the names of the columns you want to drop using the labels parameter.
• If you set inplace to True, the original DataFrame is modified; otherwise, a new DataFrame with the specified columns removed is returned.


Dropping Rows:
• To eliminate rows, use the axis parameter or the index parameter. By default, axis is set
to 0 for rows.
• Specify the row labels you want to drop using the labels parameter.
• Similar to column dropping, setting inplace to True modifies the original DataFrame.
Rules:
• The drop method allows you to remove specific rows or columns by label, providing you
with flexibility in data cleaning and preprocessing.
• You can use the labels parameter to specify the labels you want to drop, and it can be a
single label or a list of labels.
• Be cautious when using inplace=True as it modifies the original DataFrame, which could
be irreversible.

Use Cases:
• Data Cleaning: Remove irrelevant or redundant columns to simplify data analysis.
• Data Preprocessing: Drop rows with missing or incorrect data.
• Selective Data Extraction: Create a new DataFrame by excluding specific columns.
• Preparing Data for Modeling: Eliminate target columns when preparing data for machine
learning tasks.
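A minimal sketch of drop (labels and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

no_c = df.drop(columns='c')           # drop one column
no_ab = df.drop(['a', 'b'], axis=1)   # same idea via axis=1
no_row0 = df.drop(index=0)            # drop a row by its label

print(list(no_c.columns))   # ['a', 'b']
print(no_row0.shape)        # (2, 3)

# errors='ignore' suppresses the KeyError for missing labels
same = df.drop(columns='missing', errors='ignore')
print(same.shape)           # (3, 3)
```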

• In Python, particularly when working with Pandas, filtering is a crucial operation for selecting specific rows or columns from a DataFrame based on certain conditions.

Filtering Rows:
• To filter rows based on specific conditions, you can use boolean indexing. Here's a breakdown:

Boolean Indexing:
• Boolean indexing is the process of selecting rows from a DataFrame based on a condition.
• You create a boolean mask, a series of True/False values, where each value corresponds
to whether the condition is met for that row.
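A minimal sketch of boolean indexing (the data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Ann', 'Bob', 'Cid'],
                   'age': [25, 32, 41]})

mask = df['age'] > 30          # Boolean Series: [False, True, True]
adults = df[mask]              # keep only rows where the mask is True
print(adults['name'].tolist())   # ['Bob', 'Cid']

# Combine conditions with & (and) / | (or); wrap each in parentheses
both = df[(df['age'] > 30) & (df['name'] != 'Cid')]
print(both['name'].tolist())     # ['Bob']
```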


• In Python, particularly when working with Pandas, membership checks are used to test for the presence of a value within a Series or DataFrame.

Checking for Membership:


• In Pandas, the plain Python in operator checks a Series' index, not its values. For element-wise membership tests, use the .isin() method: it returns a Boolean Series or DataFrame, indicating whether each element in the Series or DataFrame equals one of the values you're searching for.

Here’s how it works:


For Series:
• When you call .isin() on a Pandas Series, it returns a Boolean Series with True for elements that match the specified values and False for those that don't.


For DataFrames:


• When working with DataFrames, you can use .isin() to check if a value is present in any column.
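A minimal sketch; note that the element-wise membership check uses .isin(), while the plain in operator checks a Series' index:

```python
import pandas as pd

s = pd.Series(['red', 'green', 'blue'], index=['a', 'b', 'c'])

# Element-wise membership: returns a Boolean Series
print(s.isin(['red', 'blue']).tolist())   # [True, False, True]

# Plain `in` checks the INDEX, not the values
print('a' in s)      # True
print('red' in s)    # False

df = pd.DataFrame({'color': ['red', 'blue'], 'size': ['S', 'M']})
print(df.isin(['red', 'M']))   # Boolean DataFrame, element by element
```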

Use Cases:
• Filtering: You can use .isin() to filter rows based on the presence of specific values in a column.

• Membership Checks: It's helpful for checking if a value exists in a dataset before performing operations on it.

Rules:
• The in operator is case-sensitive. Ensure the value’s case matches the case in your data.
• Use the any() or all() function along with the in operator for more complex conditions when
working with DataFrames.

Limitations:
• Membership checks perform exact matches. For pattern-based searches, you might
need other methods, such as str.contains() with regular expressions.

• Sorting in Pandas is a fundamental data manipulation operation that allows you to arrange
your data in a specific order. Sorting is particularly useful for organizing data in a way that
makes it easier to analyze and draw insights from it.

• s.sort_values(ascending=True) sorts the Series in ascending order, which is the default be-
havior.
• To sort in descending order, you can use s.sort_values(ascending=False).

Use Cases:
• Data Exploration: Sorting helps you explore data effectively by arranging it based on relevant
columns.
• Data Presentation: For presenting data in a readable manner, especially in tables and reports.
• Data Filtering: You can use sorting as a preliminary step to filter data based on specific con-
ditions.
Methods:
• .sort_values(): The primary method for sorting in Pandas, available for both Series and Data-
Frames.
• .sort_index(): Sorts by the index (row labels) instead of values.
Parameters:
• by: Specifies the column(s) by which to sort.
• ascending: Determines the sorting order (default is ascending).
• inplace: Modifies the original data if set to True.
• axis: For DataFrames, you can choose to sort rows (axis=0) or columns (axis=1).
Limitations:
• Sorting can be resource-intensive for large datasets. It’s essential to consider performance
when sorting extensive data.
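The methods and parameters above can be illustrated with a small made-up dataset:

```python
import pandas as pd

s = pd.Series([30, 10, 20])
ascending = s.sort_values()                  # 10, 20, 30 (the default order)
descending = s.sort_values(ascending=False)  # 30, 20, 10

df = pd.DataFrame({"name": ["Ann", "Bob", "Cal"],
                   "score": [88, 95, 88],
                   "age": [25, 31, 22]})

# Sort by score (descending), breaking ties by age (ascending).
ranked = df.sort_values(by=["score", "age"], ascending=[False, True])
```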

Handling Duplicates:
• Dealing with duplicate values is common when working with real-world data.
Detecting Duplicates:
• df.duplicated(subset=None, keep=’first’): Returns a boolean Series indicating duplicate rows.
• subset specifies the columns to consider for duplicates.
• keep determines which duplicates to mark (‘first’, ‘last’, or False to mark all duplicates).

Removing Duplicates:
• df.drop_duplicates(subset=None, keep=’first’, inplace=False): Removes duplicate rows.
• subset specifies the columns to consider for duplicates.
• keep determines which duplicates to keep.
• inplace=True modifies the DataFrame in place.

Counting Duplicates:
• df.duplicated().sum(): Counts the total number of duplicates.
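These three operations, on a tiny example DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 2, 3, 3],
                   "value": ["a", "b", "b", "c", "d"]})

n_dupes = df.duplicated().sum()      # 1: only the second (2, "b") row repeats fully
deduped = df.drop_duplicates()       # keeps the first row of each duplicate group
by_id = df.drop_duplicates(subset=["id"], keep="last")   # last row per id
```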
Use Cases:
• Sorting is useful for organizing data for analysis and presentation.
• Handling duplicates ensures data accuracy and consistency.
Limitations:
• Sorting can be resource-intensive for large datasets.
• Removing duplicates can lead to data loss, so consider the implications before dropping rows.
Best Practices:
• When handling duplicates, understand the data and business context to decide which dupli-
cates to keep or remove.
• Before sorting, make sure you know the data’s characteristics and choose the appropriate
columns for sorting.



Outliers
• Detecting outliers using various methods like IQR (Interquartile Range), percentiles, and standard
deviation (z-scores) is a common data analysis task.
1. Outlier:
• An outlier is a data point that significantly deviates from the rest of the data in a dataset. It
can be either an unusually small or unusually large value.
2. IQR (Interquartile Range) Method:
• The IQR is a measure of statistical dispersion calculated as the difference between the third
quartile (Q3) and the first quartile (Q1) of the data.
• IQR = Q3 - Q1.


3. Percentile Method:
• Percentiles divide the data into equal parts. The median, for example, is the 50th percentile.
• You can use a specific percentile value (e.g., 90th percentile) to set a threshold for outliers.

4. Standard Deviation (z-scores) Method:


• The standard deviation measures the spread of data.
• Z-scores quantify how many standard deviations a data point is from the mean (average).
• Data points with z-scores far from the mean may be considered outliers.

Calculations for IQR Method:


• Calculate Q1 (the 25th percentile) and Q3 (the 75th percentile) of the data.
• Calculate IQR = Q3 - Q1.
• Define lower bound (Lower Cutoff) as Q1 - 1.5 * IQR and upper bound (Upper Cutoff)
as Q3 + 1.5 * IQR.
• Any data point below Lower Cutoff or above Upper Cutoff is considered an outlier.
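The IQR steps above, sketched with a small sample Series (the value 95 is planted as an outlier):

```python
import pandas as pd

s = pd.Series([11, 12, 12, 13, 12, 11, 14, 13, 95])   # made-up data; 95 is extreme

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1                                    # 13 - 12 = 1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr    # 10.5 and 14.5

outliers = s[(s < lower) | (s > upper)]          # just the value 95
```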

Calculations for Percentile Method:


• Choose a specific percentile value (e.g., 90th percentile).
• Define a threshold based on this percentile value (Lower Cutoff or Upper Cutoff).
• Any data point exceeding this threshold is considered an outlier.
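A sketch of the percentile approach, using the 90th percentile as the upper cutoff on made-up data:

```python
import pandas as pd

s = pd.Series(range(1, 101))            # the values 1 through 100

upper_cutoff = s.quantile(0.90)         # the 90th percentile (about 90.1 here)
outliers = s[s > upper_cutoff]          # the ten largest values
```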

Calculations for Standard Deviation Method:


• Calculate the mean (average) and standard deviation of the data.
• Calculate the z-score for each data point: Z = (Data Point - Mean) / Standard Devia-
tion.
• Define a threshold based on the number of standard deviations from the mean (Low-
er Cutoff and Upper Cutoff).
• Any data point with a z-score below Lower Cutoff or above Upper Cutoff is consid-
ered an outlier.
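The z-score steps, sketched on the same kind of sample data with a cutoff of 2 standard deviations:

```python
import pandas as pd

s = pd.Series([11, 12, 12, 13, 12, 11, 14, 13, 95])   # made-up data

z = (s - s.mean()) / s.std()    # s.std() is the sample standard deviation (ddof=1)
outliers = s[z.abs() > 2]       # points more than 2 standard deviations from the mean
```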

Limitations and Rules for Identifying Outliers:


• The choice of method and thresholds depends on the dataset and the specific analy-
sis you are performing.
• No single method works best for all situations, and it’s often useful to use multiple
methods to cross-validate results.
• Common threshold values include 1.5 or 2 times the IQR for the IQR method, a spe-
cific percentile value for the Percentile method, and z-scores above a certain value
(e.g., 2 or 3) for the Standard Deviation method.
• Outliers should be carefully evaluated; they may be data errors or indicate valuable
insights.
• Outliers should be treated according to the context; you can remove, transform, or
analyze them separately depending on your analysis goals.


• Treating outliers in a dataset is an important step in data preprocessing to ensure that
they don't unduly influence statistical analyses or machine learning models. There are
several methods to handle outliers in Python, including clipping, transformation, and removal.
• Clipping Method for Outlier Treatment:
• Clipping is a simple and effective method to handle outliers. It involves setting a low-
er and upper threshold range for values in your dataset. Any data point that falls
below the lower threshold is set to the lower threshold value, and any data point
that exceeds the upper threshold is set to the upper threshold value. This effectively
limits the range of data points to be within a specific range.

How to Clip Outliers in Python:


• In Python, you can use the numpy library to apply the clipping method. The numpy.clip() function
allows you to clip values in an array to a specified range.
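A minimal sketch (the threshold values 10 and 40 are arbitrary; in practice derive them from the IQR or percentiles):

```python
import numpy as np
import pandas as pd

data = pd.Series([5, 22, 25, 27, 30, 95])    # made-up data

lower, upper = 10, 40                        # example thresholds, not a rule
clipped = np.clip(data, lower, upper)        # 5 becomes 10, 95 becomes 40

same = data.clip(lower=lower, upper=upper)   # pandas' own equivalent
```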


Rules for Clipping Outliers:


• Choose Appropriate Thresholds: Select the lower and upper thresholds based on your data
and analysis goals. You can set them according to domain knowledge or statistical criteria.
• Understand Data Impact: Consider the impact of clipping on your data. Clipping too aggres-
sively may result in data loss or information distortion, so use it judiciously.
• Preserve Data Characteristics: Be aware that clipping doesn’t eliminate outliers; it truncates
them. Ensure that the data retains its statistical properties as needed for your analysis.

Limitations and Considerations:


• Clipping is a simple and quick method to handle outliers, but it may not always be the most
appropriate. Depending on the context, other methods like transformation or statistical tests
may be more suitable.
• It doesn’t provide insights into the nature or cause of outliers; it simply constrains their val-
ues.
• Clipping should be used with caution, and the choice of thresholds should be based on a clear
understanding of the data and analysis requirements.

Additional Note:
• Clipping is just one of many methods to handle outliers. Other methods include:
• Transformation (e.g., log transformation, square root transformation) to reduce the impact
of extreme values.
• Winsorizing, which sets the extreme values to a specified percentile value.
• Outlier removal, where extreme values are removed from the dataset.
• Robust statistical methods, which are less affected by outliers (e.g., the median instead of the
mean).
• The choice of method depends on the specific characteristics of your data and the goals of
your analysis.

Practical implementation – outlier


• Box plots, also known as box-and-whisker plots, are a useful visualization for identifying and
understanding outliers in a dataset. They provide a visual summary of the distribution of a
dataset and help you identify extreme values.

1. Creating a Box Plot in Python:


• You can create a box plot in Python using libraries like Matplotlib or Seaborn. Here, we’ll use
Matplotlib for simplicity.

• This code will generate a basic box plot of your data.
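The code listing referred to here is not reproduced in this text; a minimal equivalent might look like the following (the sample data is made up, and the non-interactive Agg backend is used so the script runs without a display):

```python
import matplotlib
matplotlib.use("Agg")          # non-interactive backend, safe for scripts
import matplotlib.pyplot as plt

data = [11, 12, 12, 13, 12, 11, 14, 13, 95]   # 95 is an obvious outlier

fig, ax = plt.subplots()
result = ax.boxplot(data)      # returns the plot artists, including fliers (outliers)
ax.set_title("Box plot of sample data")
fig.savefig("boxplot.png")

# The flier artists hold the points drawn beyond the whiskers.
outlier_points = result["fliers"][0].get_ydata()
```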

2. Understanding the Components of a Box Plot:


• Box (IQR): The box in the middle of the plot represents the Interquartile Range (IQR). It spans
from the first quartile (Q1) to the third quartile (Q3). The length of the box shows the spread of
the middle 50% of the data.
• Median (Q2): The line inside the box is the median (Q2), which is the middle value of the dataset
when it is ordered.
• Whiskers: The whiskers extend from the box to the minimum and maximum values within a
specified range.
• Outliers: Data points outside the whiskers are considered outliers and are displayed as individ-
ual points.


3. Identifying Outliers:
• Outliers are data points that fall below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR. These are data
points that significantly deviate from the central 50% of the data.
• In the box plot, outliers are shown as individual points outside the whiskers. They can be above
(upper outliers) or below (lower outliers) the whiskers.
4. Reading a Box Plot:
• If the data is symmetrically distributed:
• The box is roughly centered within the whiskers.
• The median line is in the middle of the box.
• Outliers may be present, but they are evenly distributed on both sides of the whiskers.
If the data is skewed:
• The box may be shifted to one side of the whiskers.
• The median may be closer to the thicker part of the box.
• Outliers may be clustered on one side of the whiskers.
• The length of the box (IQR) and the spread of the whiskers indicate the data’s spread and
variability.
5. How to Define Outliers Using a Box Plot:
• As mentioned earlier, outliers are defined as data points below Q1 - 1.5 * IQR or above Q3 + 1.5
* IQR.
• It’s important to choose a specific threshold (e.g., 1.5 or 3) based on the context of your analysis.
6. Interpreting the Box Plot:
• A box plot helps you quickly visualize the central tendency, spread, and the presence of outliers
in your data.
• The length of the box and whiskers provides information about data variability.
• Outliers are clearly displayed, making it easy to identify extreme values.


• In Python, when working with pandas, finding and handling missing values is a crucial part of
data preprocessing. You can use various methods and attributes provided by pandas to locate
missing values in Series and DataFrames.

1. Detecting Missing Values in a Series:


• In pandas, you can use the .isna() and .isnull() methods interchangeably to check for missing
values in a Series (a single column of data). Both methods return a Series of Boolean values, in-
dicating whether each element is missing or not.
• Series.isna() returns True for missing values and False for non-missing values.
• Series.isnull() is an alias for Series.isna() and provides the same result.
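For example, on a small Series with planted missing values:

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, 30, None])    # None is treated as missing too

flags = s.isna()          # [False, True, False, True]
count = flags.sum()       # 2 missing values
```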


• This code will return a Boolean Series with True values indicating the positions of missing values.

2. Detecting Missing Values in a DataFrame:


• In a DataFrame (a tabular data structure with multiple columns), you can use the .isna() and .isnull()
methods to identify missing values. However, it’s more common to use the .notna() and .notnull()
methods, which are complementary. These methods also return Boolean DataFrames but indicate
whether each element is not missing.
• DataFrame.isna() and DataFrame.isnull() return True for missing values and False for non-missing
values.
• DataFrame.notna() and DataFrame.notnull() return True for non-missing values and False for miss-
ing values.
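A short sketch on a made-up DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, 3],
                   "b": ["x", "y", None]})

missing = df.isna()       # True where a value is missing
present = df.notna()      # the complement: True where a value is present
```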

• This code will return two DataFrames: one indicating missing values and the other indicating
non-missing values.

3. Handling Missing Values:


• Once you have identified missing values, you can handle them using methods like:


• .dropna(): Remove rows or columns with missing values.


• .fillna(): Fill missing values with a specific value or strategy, like mean or median imputation.
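A brief sketch of both methods on a hypothetical DataFrame (the fill values are illustrative choices, not requirements):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 31],
                   "city": ["Pune", "Delhi", None]})

dropped = df.dropna()     # keeps only the fully populated first row

# Fill each column with its own strategy: mean for numbers, a label for text.
filled = df.fillna({"age": df["age"].mean(), "city": "Unknown"})
```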

Calculate the Percentage of Missing Values


• To calculate the percentage of missing values in each column, you can use the following code:
• Calculating the percentage of missing values in a dataset is a useful step in data preprocessing. This
information helps you understand the extent of missing data and decide how to handle it effectively.
You can calculate the percentage of missing values for each column in a pandas DataFrame.
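A sketch of that calculation, on a made-up DataFrame named data:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"a": [1, np.nan, 3, np.nan],
                     "b": [np.nan, 2, 3, 4]})

missing_pct = data.isnull().sum() / len(data) * 100
# a: 50.0 (2 of 4 values missing), b: 25.0 (1 of 4)
```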

• data.isnull().sum() counts the number of missing values in each column.


• len(data) provides the total number of rows in the DataFrame.
• By dividing the number of missing values by the total number of rows and multiplying by 100, you
get the percentage of missing values.


• Handling missing values is a critical part of data preprocessing in Python, as they can adversely af-
fect the results of your analysis or machine learning models. The choice of a missing value treatment
method depends on the nature of the data and your specific analysis goals.

1. Removing Missing Values (Deletion):


• Method: This involves removing rows or columns with missing values.
• When to Use: This method is suitable when the missing values are randomly distributed,
and you have a large dataset. It’s commonly used with Time Series data.
• Rules: Consider removing a column with more than a specific threshold (e.g., 50%) of
missing values.
• Carefully assess the impact on the analysis as removing data may lead to loss of informa-
tion.
• Limitations: It can lead to a loss of valuable information, especially if missing data is not
missing at random.
2. Imputation:
• Method: Imputation involves filling missing values with estimated or calculated values.
• When to Use: Imputation is a common method when data is missing at random and you
want to retain the entire dataset.
• Rules: For numerical data, you can use the mean, median, or mode of the column.
• For categorical data, you can use the most frequent category.
• More advanced techniques like regression imputation or k-Nearest Neighbors imputa-
tion can be used for more complex cases.
• Limitations: Imputed values may not be accurate, and the choice of imputation method
can introduce bias.
3. Forward Fill and Backward Fill:
• Method: For time series data, you can propagate the last known value forward (forward
fill) or use the next available value backward (backward fill).
• When to Use: This method is appropriate for time series data when missing values are
likely to be close to their neighboring values.
• Rules: You can specify a limit to how many consecutive values can be filled using this
method.
• Limitations: It may not be suitable for data with high variability.
4. Interpolation:
• Method: Interpolation fills missing values by estimating values based on available data
points.
• When to Use: Use interpolation for time series data or data with a clear trend.
• Rules: Linear interpolation is a simple and common method, while more advanced meth-
ods like polynomial or spline interpolation can be used for complex data.
• Limitations: The accuracy of interpolation depends on the quality and nature of data.
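Forward fill, backward fill, and linear interpolation (methods 3 and 4 above) can be sketched on a made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

filled_fwd = s.ffill()          # carry the last known value forward
filled_bwd = s.bfill()          # pull the next known value backward
estimated = s.interpolate()     # linear interpolation between known points
```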

5. Advanced Imputation Techniques:


• Method: For complex data with systematic patterns, advanced imputation techniques
like k-Nearest Neighbors, regression, or matrix factorization can be applied.
• When to Use: When the data has complex relationships, and simple imputation methods
are not effective.
• Rules: These methods require a deeper understanding of the data and domain knowl-
edge.
• Parameters and tuning may be necessary.
• Limitations: These methods may be computationally expensive and require more effort
in model building.

6. Domain-Specific Strategies:
• Method: Depending on the domain and the data, specific strategies for handling missing
values may be required.
• When to Use: When domain knowledge suggests a particular method. For example, you
might have business rules for handling missing customer data.
• Rules: Follow domain-specific guidelines and best practices.
• Limitations: Specific strategies may not be applicable to all datasets.

7. Data Augmentation:
• Method: If you have a small dataset with missing values, you can augment it by generat-
ing synthetic data to replace missing values.
• When to Use: Use data augmentation when you have a limited amount of data and miss-
ing values are preventing meaningful analysis.
• Rules: Use appropriate data generation techniques to create plausible replacements for
missing data.
• Limitations: Generated data should be representative of the original dataset.
8. Indicator Variables:
• Method: Create binary indicator variables (0 or 1) to flag the presence or absence of
missing values.
• When to Use: This method helps preserve the information about the missingness pat-
tern.
• Rules: Consider creating indicator variables for specific columns with missing values.
• This is especially useful in predictive modeling.
• Limitations: It increases the dimensionality of the data.

Choosing the Right Method:


• The choice of method depends on the nature of missing values, the size of the dataset,
and the goals of your analysis.
• Imputation is a common first step, but it’s important to consider other methods as well.
• Domain knowledge and context play a significant role in deciding which method to use.


What is dropna and what are the arguments inside it?


• In Python, when working with pandas, the dropna() method is used to remove missing values (NaN
or None) from a DataFrame or Series. It is a commonly used method for data preprocessing when
you want to eliminate rows or columns that contain missing data. The dropna() method accepts
several arguments that allow you to customize the behavior of the operation.

• Break down the arguments and their meanings:


axis (default 0):
• This argument specifies whether you want to remove rows (axis=0) or columns (axis=1)
with missing values.
• If axis=0, you will drop rows with missing values.
• If axis=1, you will drop columns with missing values.
how (default ‘any’):
• This argument specifies how to determine whether to drop a row or column. It can take the
following values:
• ‘any’ (default): Drops the row or column if it contains any missing value.
• ‘all’: Drops the row or column only if all values are missing.


thresh (default None):


• This argument specifies the minimum number of non-missing values that a row or column
must contain to be retained. If a row or column has at least thresh non-missing values, it will
not be dropped.
• Set thresh to an integer value to control the threshold for dropping.
subset (default None):
• This argument is used when you want to drop rows or columns based on the presence of
missing values in specific columns.
• Pass a list of column names to specify the subset of columns to consider when dropping rows
or columns.
• Rows or columns will be dropped if they contain missing values in any of the specified col-
umns.
inplace (default False):
• This argument, when set to True, modifies the DataFrame in place and does not return a new
DataFrame. If set to False (the default), a new DataFrame with missing values removed is
returned, leaving the original DataFrame unchanged.
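The arguments above, exercised on a small made-up DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, np.nan],
                   "b": [4, 5, np.nan],
                   "c": [7, 8, np.nan]})

any_rule = df.dropna(how="any")        # drops any row containing a missing value
all_rule = df.dropna(how="all")        # drops only the fully empty last row
thresh2 = df.dropna(thresh=2)          # keep rows with at least 2 non-missing values
by_col = df.dropna(subset=["b"])       # consider only column "b" when deciding
```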

• Tip: in a Jupyter notebook, typing Stores.dropna? (where Stores is your DataFrame) and hitting Enter displays the method's documentation.


In Python and data analysis, “group” and “bins” are terms used to categorize
and organize data, especially when dealing with numerical values.
1. Group and Bins:
• Grouping refers to the process of dividing a dataset into categories or groups based on cer-
tain criteria or attributes. These categories are created to facilitate the analysis of data, par-
ticularly when dealing with large datasets.


• Bins are specific intervals or ranges into which data is divided. Binning is a method of group-
ing continuous data into discrete categories or intervals. Each bin represents a subset of data
points falling within a particular range.
• For example, if you have a dataset of ages, you can group them into bins like “0-10,” “11-20,”
“21-30,” and so on. Each of these bins is a category that represents a range of ages.
2. Categorical Data:
• Categorical data is a type of data that represents categories or labels. It includes data that can
be divided into groups but does not have a natural order or ranking.

• Categorical data can be further divided into:


• Nominal data: Categories with no specific order or ranking, like colors or city names.
• Ordinal data: Categories with a specific order or ranking, but the intervals between
categories may not be uniform, like education levels (e.g., “high school,” “college,”
“master’s degree”).
• Categorical data is typically represented using strings or labels, and statistical mea-
sures like mode are used to describe the central tendency.
3. Continuous Data:
• Continuous data represents numerical values that can take on an infinite number of possi-
bilities within a given range. This type of data can be measured with great precision and can
include decimal values.
• Continuous data is typically used for variables like age, height, weight, and temperature. It
can be summarized using statistical measures like mean, median, and standard deviation.

Here’s a key distinction:

• Categorical data represents categories or labels and is typically non-numeric.


• Continuous data represents numerical values that can take any value within a range.

Why Grouping and Binning are Important:


• Grouping and binning are essential techniques in data analysis for summarizing and visualizing data
effectively.
• They can make it easier to understand patterns, relationships, and distributions within the data.
• Grouping and binning are often used in data visualization, histogram creation, and in machine learn-
ing tasks such as feature engineering.
• For example, when creating a histogram to visualize the distribution of ages in a population, you
would group the ages into bins (e.g., 0-10, 11-20, etc.) to create a clear representation of the data.
This grouping allows you to see how many individuals fall into each age range, providing insights
into the age distribution of the population.


• Group
• Bins
• pd.cut()
• np.where()
• pd.qcut()
• The functions pd.cut() and pd.qcut() in Python are used for binning, which is a way to divide a con-
tinuous numerical variable into discrete intervals or bins. These functions are commonly used in
data analysis and machine learning for data preprocessing.

pd.cut() (Pandas Cut):


• Purpose: It is used to bin a continuous variable into discrete intervals.
• Usage: You provide the data to be binned and specify the bin edges or the number of bins.

• Rules:
• Binning can be done based on fixed-width bins (specifying bin edges) or adaptive-width bins.
• You can label the resulting bins, and Pandas will create a new categorical column with the bin
labels.

• When to Use:
• Use pd.cut() when you want to divide a continuous variable into predefined or custom bins.
• It’s useful when you have prior knowledge of how the data should be divided, e.g., age groups
or income brackets.
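For example, binning made-up ages into labeled groups (the bin edges and labels here are arbitrary choices):

```python
import pandas as pd

ages = pd.Series([5, 17, 25, 43, 68])

groups = pd.cut(ages,
                bins=[0, 12, 19, 59, 100],                  # fixed bin edges
                labels=["child", "teen", "adult", "senior"])
# each age falls into the bin whose range contains it
```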


pd.qcut() (Pandas Quantile Cut):


• Purpose: It is used to bin a continuous variable into intervals based on quantiles, ensuring roughly
equal-sized bins.
• Usage: You provide the data and the number of bins, and it calculates the bin edges based on quan-
tiles.

• Rules:
• It is especially useful when you want to ensure that each bin has approximately the same
number of data points.

• When to Use:
• Use pd.qcut() when you want to distribute data evenly across bins.
• It’s suitable for when you want to handle data with a wide range or skewed distribution.
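A sketch: splitting eight made-up values into four quantile bins of equal size:

```python
import pandas as pd

scores = pd.Series([3, 8, 12, 20, 27, 35, 50, 90])

quartiles = pd.qcut(scores, q=4, labels=["Q1", "Q2", "Q3", "Q4"])
counts = quartiles.value_counts()     # each bin receives 2 of the 8 values
```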


np.where() (NumPy Where):


• Purpose: It is used to perform conditional element-wise operations on arrays.
• Usage: You provide the condition, and it returns the elements from one of two arrays based on that
condition.

• Rules:
• The condition is a boolean array that has the same shape as the input arrays.
• It’s commonly used to replace values in an array based on a condition.

• When to Use:
• Use np.where() when you need to create a new array based on a condition without using ex-
plicit loops.
• It’s used for tasks like data cleaning or feature engineering.
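For example, labeling made-up sales figures by a threshold:

```python
import numpy as np
import pandas as pd

sales = pd.Series([120, 80, 200, 45])

# Element-wise condition, no explicit loop: "high" where >= 100, else "low".
labels = np.where(sales >= 100, "high", "low")
```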


Group By – Summaries of Data (understood via pivot tables in Excel)


• A pivot table is a powerful data analysis tool in spreadsheet software like Microsoft Excel, Goo-
gle Sheets, or other similar applications. It allows you to summarize and analyze large sets of data
quickly and efficiently by creating a compact, organized table. Here’s how to use a pivot table and
why it’s valuable for data analysis:
• Creating a Pivot Table: Data Preparation: Ensure that your dataset is organized in a tabular
format with column headers and consistent data.
• Select Data: Highlight the range of data you want to analyze. This should include the column
headers.
• Insert Pivot Table: In Excel, go to the “Insert” tab and select “PivotTable.” In Google Sheets, go to
“Data” and choose “Pivot table.”
• Design the Pivot Table: In the PivotTable Field List (Excel) or Pivot Table Editor (Google Sheets),
drag and drop the fields from your dataset into specific areas of the pivot table:


• Rows: Place a field here to group your data along the rows of the pivot table. For example, if
you’re analyzing sales data, you can place the “Product” field here to see sales by product.
• Columns: Place a field here to create column headers that segment your data further. For in-
stance, you can put the “Year” field here to compare sales across different years.
• Values: Add a field to this section to calculate values such as sums, averages, counts, or other
summary statistics. You can, for example, place the “Sales” field here and choose to calculate the
sum of sales.
• Filters (Optional): If you want to apply filters to your data, you can place a field in this area to
limit the data shown in the pivot table.

Benefits of Using a Pivot Table:


• Data Summarization: Pivot tables allow you to summarize and aggregate large data-
sets, making it easier to understand and derive insights.
• Customization: You can quickly reorganize the pivot table to view data from different
angles by dragging and dropping fields.
• Interactive Analysis: Pivot tables are often interactive, enabling you to filter, sort, and
drill down into the data to explore various aspects.
• Reduced Manual Calculations: Pivot tables handle calculations automatically, elimi-
nating the need for manual data analysis and computations.
• Visual Presentation: Pivot tables present data in a structured, readable format, making
it easier to create reports and charts for presentation.
• Consolidation: You can consolidate data from multiple sources into one pivot table for
a comprehensive view of the information.


Group By in Python
Introduction:
• Grouping is a fundamental operation in data analysis. In Python, the groupby operation is
commonly used with libraries like Pandas to group data based on one or more columns and
perform aggregate functions on the grouped data. This allows for data summarization, anal-
ysis, and visualization.

How to Use Group By:


• To use the groupby operation in Python, follow these steps:
• Import Libraries: First, import the necessary libraries, such as Pandas.
• Load Data: Load your dataset into a Pandas DataFrame.
• Group Data: Use the groupby method on the DataFrame, specifying the column(s) by which
you want to group the data.
• Aggregate: Apply one or more aggregate functions (e.g., sum, mean, count) to the grouped
data to summarize the information.
• View Results: Examine the grouped and aggregated data, often as a new DataFrame.

Rules:
• The column(s) used for grouping must be categorical or discrete data. Numeric data can be
grouped if it represents categories (e.g., integers representing categories).
• You can group by multiple columns to create a hierarchical structure for grouping.
• Aggregate functions should be chosen based on the type of data and the analysis goals.

Common Aggregate Functions:


• sum(): Calculates the sum of numeric values within each group.
• mean(): Computes the average of numeric values in each group.
• count(): Counts the number of observations in each group.
• max() and min(): Find the maximum and minimum values in each group.
• size(): Returns the size of each group, including missing values.
• median(): Calculates the median of numeric values in each group.
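The steps above can be sketched with a small made-up sales table:

```python
import pandas as pd

df = pd.DataFrame({"region": ["East", "West", "East", "West", "East"],
                   "sales": [100, 80, 120, 90, 60]})

totals = df.groupby("region")["sales"].sum()          # East: 280, West: 170

# Several aggregates at once:
summary = df.groupby("region")["sales"].agg(["sum", "mean", "count"])
```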


Limitations:
• Grouping large datasets can consume significant memory and slow down processing.
• Grouping by multiple columns can lead to complex hierarchical structures, which may re-
quire careful handling.
• Some aggregate functions may not be applicable to all data types. For example, you can’t cal-
culate the median of non-numeric data.

When to Use Group By:


• Group by is used when you want to analyze and summarize data based on specific categories
or attributes.
• It’s valuable for exploratory data analysis to uncover patterns and trends within the data.
• Grouping is often employed for creating summary reports and visualizations.


Pivot Tables in Python
Introduction:
• A pivot table is a powerful data analysis tool used to summarize and transform data in Py-
thon. It’s commonly associated with Pandas, a popular data manipulation library. Pivot tables
help reorganize and aggregate data for better analysis and visualization.

How to Use Pivot Tables:


• Import Libraries: Start by importing the necessary libraries, primarily Pandas for data manipulation.
• Load Data: Load your dataset into a Pandas DataFrame.
• Create a Pivot Table: Use the pivot_table() method on the DataFrame. Specify the index (rows), columns, and values. The index and columns are typically categorical variables, while the values are numeric.
• Aggregate Data: Apply an aggregation function to the values to summarize the data (e.g., sum, mean, count).
• View Results: Examine the pivot table, often as a new DataFrame, which is now organized for analysis.
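The steps above can be sketched as follows (the dataset and column names are hypothetical):

```python
import pandas as pd

# Hypothetical dataset: sales by region and quarter
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "sales": [100, 150, 200, 250],
})

# Rows = region, columns = quarter, values aggregated with sum
pivot = df.pivot_table(index="region", columns="quarter",
                       values="sales", aggfunc="sum")
print(pivot)
```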

Rules:
• Index and columns should be categorical variables or discrete data. Numeric variables can be
used if they represent categories (e.g., integers representing categories).
• Aggregation functions should be chosen based on the type of data and analysis goals.
• You can create multi-level pivot tables by specifying multiple index and column variables.

Common Aggregate Functions:


• sum(): Calculates the sum of numeric values in each group.
• mean(): Computes the average of numeric values in each group.
• count(): Counts the number of observations in each group.
• max() and min(): Find the maximum and minimum values in each group.
• size(): Returns the size of each group, including missing values.
• median(): Calculates the median of numeric values in each group.

Limitations:

• Pivot tables can be memory-intensive, especially for large datasets.
• They may not be suitable for all types of data, particularly if your dataset is already in a format optimized for analysis.
• Complex hierarchical structures created by multi-level pivot tables can be challenging to interpret.

When to Use Pivot Tables:


• Pivot tables are ideal for restructuring and summarizing data, especially for reporting and
visualization purposes.
• They are valuable in situations where you want to analyze data from different perspectives
quickly.
• Pivot tables are often used in data exploration and for creating summary reports, dashboards,
and visualizations.

How to Plot Pivot Tables:


• After creating a pivot table in Pandas, you can use libraries like Matplotlib, Seaborn, or Plotly
to visualize the results.
• Common plots for pivot tables include bar charts, line charts, and heatmaps, depending on
the data and analysis goals.
• Plotting allows you to present the summarized data in a more visually appealing and under-
standable format.


pd.crosstab() in Python:

• pd.crosstab() is a function in the Pandas library that is used to compute cross-tabulations, also known as contingency tables, of two or more categorical variables. Cross-tabulations provide a way to summarize and analyze the relationship between different categorical variables. Here’s an in-depth explanation of pd.crosstab():

What is pd.crosstab()?
• pd.crosstab() is a Pandas function for creating cross-tabulations. It is used to display
the frequency or count of observations that fall into various categories of two or more
categorical variables.
Use Cases:
• Exploring Relationships: You can use pd.crosstab() to explore the relationships between different categorical variables in your dataset. For example, you can examine the relationship between gender and product preference or the relationship between education level and employment status.

• Statistical Analysis: Cross-tabulations are useful in statistical analysis, hypothesis testing, and chi-squared tests to determine whether there is a significant association between two categorical variables.

• Reporting and Visualization: pd.crosstab() can help in generating tables for reports and visualizations, making it easier to communicate and present categorical data relationships.


Syntax:

• index: The categorical variable to be used as the row index in the table.
• columns: The categorical variable(s) to be used as column headers.
• values: (optional) The variable to be aggregated within the cells.
• aggfunc: (optional) The aggregation function to be applied to values.
• rownames and colnames: (optional) Names for the row and column indexes.
• margins: (optional) If True, adds a row and column for row and column margins.
• margins_name: (optional) Name for the row and column margins.
• dropna: (optional) If True, removes rows or columns containing only NaNs.
• normalize: (optional) If True, returns proportions instead of counts.
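A minimal pd.crosstab() sketch with hypothetical survey data, using the margins parameter described above:

```python
import pandas as pd

# Hypothetical survey data
df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F"],
    "preference": ["A", "A", "B", "B", "A"],
})

# Frequency table of gender vs. product preference, with row/column totals
table = pd.crosstab(df["gender"], df["preference"], margins=True)
print(table)
```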

How to Use pd.crosstab():


• Use pd.crosstab(): Apply pd.crosstab() to create the cross-tabulation.

• View the Result: Examine the cross-tabulation result, which is a Pandas DataFrame.

Different Ways to Use pd.crosstab():


• You can use pd.crosstab() with more than two categorical variables by specifying multiple columns in the columns parameter.
• You can use the values and aggfunc parameters to aggregate and summarize a numeric variable based on the cross-tabulation of categorical variables.
Limitations:
• pd.crosstab() is primarily for analyzing categorical variables. It may not be suitable
for continuous or highly granular data.
• It is not intended for calculating complex statistics or performing regression analysis.


Join
• “join” is an operation used to combine rows from two or more tables based on a related column between them. Joins are fundamental for retrieving data from multiple tables in a relational database and are a cornerstone of SQL query functionality. There are several types of joins, each serving different purposes.
Types of Joins:
INNER JOIN:
• An inner join returns only the rows that have matching values in both tables.
• If there is no match for a row in one table, it will not appear in the result set.
• The result contains only the common data between the tables.

LEFT (OUTER) JOIN:


• A left join returns all the rows from the left (or first) table and the matching rows from
the right (or second) table.
• If there are no matches for a row in the right table, NULL values are returned for the right
table’s columns in the result.

RIGHT (OUTER) JOIN:


• A right join is similar to a left join but returns all rows from the right (second) table and
the matching rows from the left (first) table.
• Rows from the left table with no matches in the right table result in NULL values for the
left table’s columns in the result.

FULL (OUTER) JOIN:


• A full join returns all rows from both tables, including those with and without matches.
• Rows without matches in one table have NULL values in the columns from the other table.

SELF JOIN:
• A self join is used to combine rows from a single table.
• It is particularly useful when dealing with hierarchical data or when you want to compare
rows within the same table.


How Joins Work:


• Joins are typically performed by specifying the related columns from each table in the SQL
query using the ON keyword.
• The database engine compares values in the specified columns in both tables and includes
rows that satisfy the join condition in the result set.
• The result of a join is a new table, known as the result set or joined table, which contains
columns from both tables.
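Although joins are a SQL concept, they can be demonstrated from Python with the built-in sqlite3 module. A small sketch with hypothetical customer/order tables, contrasting INNER and LEFT joins:

```python
import sqlite3

# In-memory database with two small related tables (hypothetical schema)
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cal');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# INNER JOIN: only customers with at least one order appear
inner_rows = con.execute("""
    SELECT c.name, o.amount
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
""").fetchall()

# LEFT JOIN: every customer appears; Cal has no orders, so amount is NULL (None)
left_rows = con.execute("""
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
""").fetchall()

print(inner_rows)
print(left_rows)
```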


Introduction:
• Merging data in Python involves combining multiple datasets or DataFrames based on common columns or indices. It is a critical operation in data manipulation, particularly when working with structured data using libraries like Pandas. This process helps in bringing data from different sources together for analysis, reporting, and visualization.
How to Merge Data:
• To merge data in Python, you typically use the pd.merge() function in Pandas, which is
similar to SQL JOIN operations. Here’s an in-depth explanation:


• left and right: The DataFrames or datasets to be merged.


• how: Specifies the type of join to be performed (e.g., ‘inner’, ‘outer’, ‘left’, ‘right’).
• on: The column(s) on which to merge the DataFrames.
• left_on and right_on: Columns to merge from the left and right DataFrames when they
have different column names.
• left_index and right_index: Set to True to use the index as a key for merging.
• sort: Sort the result by the join key(s).
• suffixes: Specify suffixes for overlapping column names from the left and right DataFrames.
• copy: Set to False to avoid copying data when merging.
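A minimal pd.merge() sketch contrasting the 'inner' and 'outer' values of the how parameter (the data is hypothetical):

```python
import pandas as pd

# Two related DataFrames sharing an 'id' key (hypothetical data)
left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Ben", "Cal"]})
right = pd.DataFrame({"id": [1, 2, 4], "score": [90, 85, 70]})

inner = pd.merge(left, right, on="id", how="inner")  # ids 1 and 2 only
outer = pd.merge(left, right, on="id", how="outer")  # ids 1-4, NaN where missing
print(inner)
print(outer)
```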

When to Use Merging:


• Combining Related Data: Use merging when you have related data spread across multiple tables and you want to bring it together for analysis.
• Data Integration: When working with data from different sources, merge the data to create a unified dataset for analysis.
• Enriching Data: Merge data to add additional columns or information to an existing dataset.
Rules:
• The column(s) used for merging should be present in both DataFrames and have the
same data type.
• When merging on multiple columns, the columns should be specified as a list.
• Ensure that the merge keys are unique and do not contain duplicate values.

Common Types of Joins:


INNER JOIN (Default):
• Retains only the matching rows from both DataFrames.

OUTER JOIN:
• Retains all rows from both DataFrames and fills in missing values with NaN.


LEFT JOIN:
• Retains all rows from the left DataFrame and the matching rows from the right DataFrame. Fills in missing values with NaN.

RIGHT JOIN:
• Retains all rows from the right DataFrame and the matching rows from the left DataFrame. Fills in missing values with NaN.
Limitations:
• Merging large datasets can be memory-intensive, so it’s important to consider available
resources.
• Merging on non-unique or inconsistent keys can lead to unexpected results.
• Be cautious with column name conflicts when merging DataFrames.


Append

• The append() method in Python is a built-in method used to add an element or object to a list. It is
a common operation when working with lists, which are a fundamental data structure in Python.


How to Use the append() Method:


• The append() method is used to add an element to the end of a list.

Syntax:

• list: The list to which you want to add the element.
• element: The element or object you want to add to the list.

When to Use the append() Method:

• Dynamic Lists: Use append() when you want to build a list dynamically, adding elements as your program runs or based on specific conditions.
• Sequential Data Entry: It’s useful for collecting data or results sequentially, such as in loops or when reading data from a source one item at a time.

Why Use the append() Method:


• Efficient for Growing Lists: append() is efficient for adding elements to the end of a list. It
avoids the need to create a new list and copy existing elements, making it faster than other
list operations.
• Dynamic Data Structures: It allows you to create dynamic data structures that can grow or
shrink as needed.

Rules:
• append() can only be used with lists. It cannot be used with other data types like tuples or dictionaries.
• The element added with append() becomes the last element of the list.

Limitations:
• The append() method only adds elements to the end of the list. If you need to insert an element at a specific position in the list, you should use the insert() method.
• append() is an in-place operation, so it modifies the original list. If you need to create a new list with additional elements, you should use concatenation or list comprehension.
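A minimal example of the append() behaviour described here:

```python
my_list = [1, 2, 3]
my_list.append(4)   # modifies my_list in place; returns None
print(my_list)      # [1, 2, 3, 4]
```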

• In this example, the append() method is used to add the element 4 to the end of the
my_list list.


Introduction:
• Appending data in Python is the process of adding rows or records from one dataset
to another. It is a common operation when working with structured data using librar-
ies like Pandas. Appending is useful when you have multiple datasets with the same
structure, and you want to combine them vertically to create a larger dataset.

How to Append Data:


• To append data in Python, you can use the pd.concat() function in Pandas. Here’s an in-depth
explanation:
Syntax:
• The basic syntax of pd.concat() for appending data is as follows:

• objs: A list of DataFrames or Series to be concatenated.
• axis: Specifies the axis along which to concatenate (0 for rows, 1 for columns).
• join: Defines how to handle the overlapping indexes (e.g., ‘outer’ for union, ‘inner’ for intersection).
• ignore_index: If True, resets the index in the result.
• keys: Adds a hierarchical index to the result, creating a multi-level index.
• verify_integrity: Checks for duplicate index values and raises an exception if found.
• sort: Sorts the result by the values.
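A minimal pd.concat() sketch that appends two DataFrames with the same columns (hypothetical monthly batches):

```python
import pandas as pd

# Two datasets with the same columns (hypothetical monthly batches)
jan = pd.DataFrame({"day": [1, 2], "sales": [100, 120]})
feb = pd.DataFrame({"day": [1, 2], "sales": [90, 110]})

# axis=0 stacks rows; ignore_index=True avoids duplicate index values
combined = pd.concat([jan, feb], axis=0, ignore_index=True)
print(combined)
```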

When to Use Appending:


• Combining Data: Use appending when you have multiple datasets with the same structure
and want to combine them into a single dataset.
• Time Series Data: Append data when dealing with time series data, where new records are
collected over time and need to be added to an existing dataset.

Why Use Appending:


• Appending data allows you to create a larger dataset without the need to modify the original
datasets.
• It is useful for consolidating data collected over time or from different sources.


• Appending is often more efficient than merging or joining when the datasets have the same
structure.

Rules:
• The columns in the DataFrames to be appended should have the same names and data
types.
• The order of columns should match in the DataFrames.
• Make sure the index values are unique if you don’t want any issues with index duplication.

Limitations:
• Appending data can lead to duplicate index values if not handled properly.
• It may not be the best choice for combining datasets with different structures or when
you need to perform complex data integration operations.

Some Methods Used with Functions

• In Python, the .apply() and .applymap() functions are powerful tools in the Pandas library for applying functions to DataFrames and Series. These functions allow for flexible data manipulation and transformation.
.apply() Function:
How to Use .apply():
• The .apply() function is used to apply a function along the axis of a DataFrame or Series.

• func: The function to be applied to each column or row.


• axis: Specifies the axis along which the function is applied: 0 applies it to each column, 1 applies it to each row.
• args: A tuple of additional arguments to be passed to the function.
• **kwds: Additional keyword arguments to be passed to the function.
When to Use .apply():
• Custom Operations: Use .apply() when you need to perform a custom operation on each
element or column of a DataFrame or Series.
• Aggregating Data: It is helpful for aggregating data along rows or columns, calculating
summary statistics, or applying complex functions.
Different Use Cases for .apply():
• Calculating the mean or median of each column.
• Applying a custom function to clean or transform data.
• Normalizing data within each column.
• Handling missing values using a custom imputation strategy.
Rules:
• The function passed to .apply() should take a Series or DataFrame as input and return a
Series, DataFrame, or scalar value.
• The function should be designed to operate on a single column or row, not the entire
DataFrame.
Limitations:
• Using .apply() with complex functions can be slower than using built-in Pandas methods
for the same tasks.
• It may not be suitable for operations that require high performance, such as element-wise
mathematical operations.
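A short sketch of .apply() along both axes (the data is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# axis=0: the function receives each column as a Series
col_range = df.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: the function receives each row as a Series
row_sum = df.apply(lambda row: row.sum(), axis=1)
print(col_range)
print(row_sum)
```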
apply() with Lambda Function:
How to Use .apply() with Lambda:
• The .apply() function with a lambda function is used to apply a custom function to each element of a Series or DataFrame. Its basic syntax is as follows:

• column_name: The name of the column to which you want to apply the function.
• lambda x: A lambda function that defines the operation to be applied to each element.

When to Use .apply() with Lambda:


• Custom Element-Wise Operations: Use .apply() with a lambda function when you need to perform custom operations on each element of a specific column.
• Data Transformation: It is helpful for transforming data, especially when the transformation logic is relatively simple and can be defined using a lambda function.

Different Use Cases for .apply() with Lambda:


• Applying a custom mathematical operation to each element of a column.
• Cleaning and formatting text data in a specific column.
• Creating a new column based on the values of an existing column.

Rules:
• The lambda function should take an element as input and return the transformed element.
• Ensure that the lambda function logic is defined clearly and concisely.

Limitations:
• Using .apply() with complex or slow lambda functions can be inefficient, especially for
large DataFrames.
• It may not be suitable for operations that require multiple columns or complex logic.
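A minimal sketch of .apply() with a lambda on one column (the column name and the 18% tax rate are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0]})

# Apply a lambda to each element of the 'price' column (assumed 18% tax)
df["price_with_tax"] = df["price"].apply(lambda x: x * 1.18)
print(df)
```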

.applymap() Function:
How to Use .applymap():
• The .applymap() function is used to apply a function element-wise to a DataFrame. Its
basic syntax is as follows:

• func: The function to be applied to each element in the DataFrame.


When to Use .applymap():
• Element-Wise Operations: Use .applymap() when you need to apply a function to each element in a DataFrame.
• Cleaning and Data Transformation: It is useful for cleaning and transforming data, especially when you want to maintain the DataFrame structure.


Different Use Cases for .applymap():


• Removing leading or trailing whitespace from all string elements.
• Applying a function to round or format numeric values.
• Standardizing text data (e.g., converting to lowercase).

Rules:
• The function passed to .applymap() should operate on a single element (scalar) and return a scalar value.

Limitations:
• .applymap() can be slower than vectorized operations when dealing with large DataFrames.
• It may not be suitable for complex operations involving multiple columns or rows.
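A minimal .applymap() sketch rounding every element (note that pandas 2.1+ prefers the equivalent DataFrame.map, keeping applymap as a deprecated alias that still works):

```python
import pandas as pd

df = pd.DataFrame({"a": [1.234, 2.345], "b": [3.456, 4.567]})

# Round every element of the DataFrame to one decimal place
rounded = df.applymap(lambda x: round(x, 1))
print(rounded)
```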


.applymap() with User-Defined Function:

How to Use .applymap() with User-Defined Function:
• You can use a user-defined function with .applymap() to apply a custom operation to each element of a DataFrame. Define the function and then use it with .applymap(). The syntax is as follows:

When to Use .applymap() with User-Defined Function:


• Complex Element-Wise Operations: Use .applymap() with a user-defined function when you
need to perform more complex or customized operations on each element of a DataFrame.
• Data Cleaning and Transformation: It is valuable for data preprocessing tasks that involve
multiple steps or conditional logic.

Different Use Cases for .applymap() with User-Defined Function:


• Parsing and reformatting text data in specific columns.
• Complex data validation and cleaning.
• Handling missing values using a custom strategy.

Rules:
• The user-defined function should take an element as input and return the transformed element.
• Ensure that the function is well-documented and handles different edge cases.

Limitations:
• .applymap() is slower compared to vectorized operations for large DataFrames.
• It may not be suitable for extremely complex operations or tasks that require interaction
between multiple columns.
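A sketch of .applymap() with a user-defined cleaning function (the data and function name are hypothetical):

```python
import pandas as pd

def clean_cell(x):
    """Strip whitespace and lowercase strings; leave other values unchanged."""
    if isinstance(x, str):
        return x.strip().lower()
    return x

df = pd.DataFrame({"name": ["  Ann ", "BEN"], "age": [30, 25]})
cleaned = df.applymap(clean_cell)
print(cleaned)
```

Because the function checks the type of each element, it can safely run over a DataFrame that mixes text and numeric columns.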


Question

Define a function that takes a Pandas Series as an argument.


• Use the sum() method of the Series to calculate the sum of its elements.
• Return the result.

• In this code, we define a lambda function sum_series that takes a Pandas Series as its argument and returns the sum of the elements using the .sum() method of the Series.
• When you call the lambda function sum_series(sample_series), it calculates the sum of sample_series and returns the result.
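One possible solution, using the names described above (sample_series values are assumed for illustration):

```python
import pandas as pd

# Lambda that takes a Pandas Series and returns the sum of its elements
sum_series = lambda s: s.sum()

sample_series = pd.Series([1, 2, 3, 4])
result = sum_series(sample_series)
print(result)  # 10
```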


• In Python, data type conversion is the process of changing the data type of a variable or a Pandas Series to another data type. This can be necessary for various reasons, such as ensuring compatibility, performing operations, or data cleaning.


1. pd.to_datetime():
• How to Use pd.to_datetime():
• The pd.to_datetime() function in Pandas is used to convert a Series of strings or
numbers to datetime objects.

• arg: The input data to be converted to datetime.


• format: A format string specifying the format of the input dates.
• errors: How to handle parsing errors (‘raise’, ‘coerce’, ‘ignore’).
• Other optional parameters for handling different date formats and timezones.

When to Use pd.to_datetime():


• Use pd.to_datetime() when you have a Series containing date or time information
that you want to convert to datetime objects.
• It is useful for handling different date formats and standardizing them.

Different Use Cases for pd.to_datetime():


• Converting a column of date strings to datetime objects for time series analysis.
• Handling data from various sources with different date formats.

Rules:
• The arg parameter should be a Pandas Series or a list-like object containing date or
time information.
• If errors is set to ‘raise’, parsing errors will raise an exception. ‘coerce’ will convert
errors to NaT (Not-a-Time), and ‘ignore’ will ignore errors.

Limitations:
• pd.to_datetime() may not handle extremely unusual date formats.
• The format parameter can be challenging to define for custom date formats.
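A minimal pd.to_datetime() sketch showing how errors='coerce' handles an unparseable value:

```python
import pandas as pd

dates = pd.Series(["2023-01-15", "2023-02-20", "not a date"])

# errors='coerce' turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(dates, format="%Y-%m-%d", errors="coerce")
print(parsed)
```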

2. pd.to_numeric():
How to Use pd.to_numeric():
• The pd.to_numeric() function is used to convert a Series to numeric data types.
The basic syntax is as follows:


When to Use pd.to_numeric():


• Use pd.to_numeric() when you want to ensure that a Series contains numeric data
types.
• It is helpful for cleaning and standardizing numeric data.

Different Use Cases for pd.to_numeric():


• Converting columns with mixed data types to a consistent numeric data type.
• Handling datasets with numeric columns that contain non-numeric values.
Rules:
• The arg parameter should be a Pandas Series or a list-like object containing data to be converted.
• If errors is set to ‘raise’, conversion errors will raise an exception. ‘coerce’ will convert errors to NaN, and ‘ignore’ will ignore errors.

Limitations:
• pd.to_numeric() may not handle complex data types or mixed data types within a
single Series.
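A minimal pd.to_numeric() sketch on a Series of strings with one non-numeric value:

```python
import pandas as pd

mixed = pd.Series(["1", "2.5", "oops", "4"])

# 'oops' cannot be parsed, so errors='coerce' turns it into NaN
nums = pd.to_numeric(mixed, errors="coerce")
print(nums)
```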

3. Series.astype():
• How to Use Series.astype():

• The astype() method is used to convert the data type of a Pandas Series to a specified data type. The basic syntax is as follows:

When to Use Series.astype():


• Use astype() when you want to explicitly change the data type of a Series to match
your requirements.
• It is useful for scenarios where you have control over the desired data type.

Different Use Cases for Series.astype():


• Converting a Series of integers to floating-point numbers.
• Changing a categorical variable to a string data type.

Rules:
• The dtype parameter should be a valid Pandas or NumPy data type.
• Data loss or unexpected results may occur if the data type conversion is not appropriate.
Limitations:
• astype() does not perform type inference; you need to specify the desired data type explicitly.
• It does not handle complex data transformations or conversions outside the scope of supported data types.
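A minimal astype() sketch converting the same Series to two different types:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

floats = s.astype(float)   # int64 -> float64
strings = s.astype(str)    # int64 -> object dtype holding strings
print(floats.dtype, strings.dtype)
```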


• In Python, the datetime module provides functionality for working with dates and times. The Pandas library, an essential tool for data manipulation, extends this functionality with its Timestamp data type. To explore the methods available on Pandas timestamps, you can print the attributes and methods of the Timestamp object using dir(pd.Timestamp).

Attributes:
• year: Returns the year of the timestamp.
• month: Returns the month of the timestamp (1-12).
• day: Returns the day of the timestamp (1-31).
• hour: Returns the hour of the timestamp (0-23).
• minute: Returns the minute of the timestamp (0-59).
• second: Returns the second of the timestamp (0-59).
• microsecond: Returns the microsecond of the timestamp (0-999999).
• nanosecond: Returns the nanosecond of the timestamp (0-999999999).
• day_of_week: Returns the day of the week as an integer (0=Monday, 6=Sunday).
• day_name(): Returns the name of the day of the week (e.g., “Monday”).
• month_name(): Returns the name of the month (e.g., “January”).
• quarter: Returns the quarter of the year (1-4).

Methods:
• strftime(): Format the timestamp as a string using format codes (e.g., %Y-%m-%d for “YYYY-MM-DD”).
• date(): Extract the date part (year, month, and day) of the timestamp.
• time(): Extract the time part (hour, minute, second, microsecond) of the timestamp.

• replace(): Create a new timestamp with specified parts replaced.
• to_period(): Convert the timestamp to a period.
• to_pydatetime(): Convert the timestamp to a Python datetime.datetime object.
• tz_convert(): Convert the timestamp to a new time zone.
• tz_localize(): Localize a timestamp by specifying a time zone.
• is_leap_year: True if the year of the timestamp is a leap year (an attribute, not a method).
• is_month_end: True if the timestamp is the last day of the month.
• is_month_start: True if the timestamp is the first day of the month.
• is_quarter_end: True if the timestamp is the last day of the quarter.
• is_quarter_start: True if the timestamp is the first day of the quarter.

Arithmetic Operations:
• You can perform various arithmetic operations with timestamps, such as addition, subtraction, and
comparison. For example, you can calculate the time difference between two timestamps or add a
certain number of days to a timestamp.
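A short sketch of Timestamp attributes, methods, and arithmetic (the date is arbitrary):

```python
import pandas as pd

ts = pd.Timestamp("2023-03-15 10:30:00")

print(ts.year, ts.month_name(), ts.day_name())  # attribute and method access

later = ts + pd.Timedelta(days=3)               # add a fixed duration
diff = later - ts                               # subtracting gives a Timedelta
print(later, diff)
```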

Various methods available in the datetime module in Python:


datetime.date():
• Returns the date part (year, month, day) of a datetime object.


datetime.time():
• Returns the time part (hour, minute, second, microsecond) of a datetime object.

datetime.replace():
• Creates a new datetime object with specified parts (year, month, day, etc.) replaced.

datetime.year:
• Returns the year of the datetime object.

datetime.month:
• Returns the month of the datetime object (1-12).

datetime.day:
• Returns the day of the datetime object (1-31).

datetime.hour:
• Returns the hour of the datetime object (0-23).

datetime.minute:
• Returns the minute of the datetime object (0-59).

datetime.second:
• Returns the second of the datetime object (0-59).

datetime.microsecond:
• Returns the microsecond of the datetime object (0-999999).

datetime.weekday():
• Returns the day of the week as an integer (0=Monday, 6=Sunday).

datetime.isoweekday():
• Returns the day of the week as an integer (1=Monday, 7=Sunday).

datetime.isocalendar():
• Returns a tuple with ISO year, ISO week number, and ISO weekday.

datetime.isoformat():
• Returns a string representation of the datetime object in ISO 8601 format.

datetime.ctime():
• Returns a string representation of the datetime object in the format “Tue Dec 17 23:55:59 2019”.


datetime.strftime(format):
• Formats the datetime object as a string using format codes.

datetime.timetuple():
• Returns a time.struct_time object.

datetime.timestamp():
• Returns the Unix timestamp (seconds since January 1, 1970).

datetime.fromtimestamp(timestamp):
• Creates a datetime object from a Unix timestamp.

datetime.utcfromtimestamp(timestamp):
• Creates a UTC datetime object from a Unix timestamp.

datetime.fromisoformat(date_string):
• Creates a datetime object from an ISO 8601 formatted string.

datetime.combine(date, time):
• Combines a date object and a time object into a datetime object.

datetime.now(tz):
• Returns the current datetime object, optionally in the specified time zone (tz).

datetime.utcnow():
• Returns the current UTC datetime object.

datetime.min:
• The smallest representable datetime object.

datetime.max:
• The largest representable datetime object.

datetime.resolution:
• The smallest possible difference between two datetime objects.
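A short sketch exercising several of these datetime methods (the date is arbitrary):

```python
from datetime import datetime

dt = datetime(2023, 3, 15, 10, 30, 0)

print(dt.isoformat())            # '2023-03-15T10:30:00'
print(dt.strftime("%Y-%m-%d"))   # '2023-03-15'
print(dt.weekday())              # 2 (Wednesday; Monday is 0)

moved = dt.replace(year=2024)    # new object with only the year swapped
print(moved.year)
```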


• When you print dir(datetime.datetime), you are inspecting the attributes and methods available for the datetime class within the datetime module in Python. This class is used to represent and manipulate date and time objects.

Properties:
• year: Represents the year of a datetime object.
• Properties’ Use: You can access and modify the year part of a datetime object using this property.
• month: Represents the month of a datetime object (1-12).
• Properties’ Use: Use it to access and modify the month part of a datetime object.
• day: Represents the day of the datetime object (1-31).
• Properties’ Use: Access and modify the day part of a datetime object with this property.
• hour: Represents the hour of the datetime object (0-23).
• Properties’ Use: Access and modify the hour part of a datetime object using this property.
• minute: Represents the minute of the datetime object (0-59).
• Properties’ Use: Access and modify the minute part of a datetime object.
• second: Represents the second of the datetime object (0-59).
• Properties’ Use: Use this property to access and modify the second part of a datetime object.
• microsecond: Represents the microsecond of the datetime object (0-999999).
• Properties’ Use: Access and modify the microsecond part of a datetime object.
• tzinfo: Represents the time zone information for the datetime object.
• Properties’ Use: You can access and modify the time zone information associated with a datetime
object.

Methods:
• replace(): Creates a new datetime object with specified parts replaced.
• Methods’ Use: You can change specific attributes (year, month, day, etc.) while keeping the rest the
same.
• strftime(): Formats the datetime object as a string using format codes.
• Methods’ Use: Use this to represent the datetime as a string with a custom format.
• timestamp(): Converts the datetime object to a Unix timestamp (seconds since January 1, 1970).
• Methods’ Use: Convert a datetime object to a timestamp for various calculations.
• date(): Extracts the date part (year, month, day) of the datetime object.
• Methods’ Use: Obtain the date component from a datetime object.
• time(): Extracts the time part (hour, minute, second, microsecond) of the datetime object.
• Methods’ Use: Retrieve the time component from a datetime object.
Rules:
• Properties allow you to access and modify specific attributes of a datetime object.
• Methods provide functionality for creating new datetime objects, formatting, and extracting date
and time components.


Limitations:
• Datetime objects are limited to the range of dates that can be represented on the underlying system (e.g., some systems cannot represent timestamps before 1970 or after 2038).
• Time zone handling can be complex, and the tzinfo attribute is None for naive datetime objects.
