Python Guide 2025
ALL YOU NEED TO KNOW SERIES
PYTHON
Become a Successful Data Professional
ABOUT BRAINALYST
Brainalyst is a pioneering data-driven company dedicated to transforming data into actionable insights and
innovative solutions. Founded on the principles of leveraging cutting-edge technology and advanced analytics,
Brainalyst has become a beacon of excellence in the realms of data science, artificial intelligence, and machine
learning.
OUR MISSION
At Brainalyst, our mission is to empower businesses and individuals by providing comprehensive data solutions
that drive informed decision-making and foster innovation. We strive to bridge the gap between complex data and
meaningful insights, enabling our clients to navigate the digital landscape with confidence and clarity.
WHAT WE OFFER
• Data Strategy Development: Crafting customized data strategies aligned with your business
objectives.
• Advanced Analytics Solutions: Implementing predictive analytics, data mining, and statistical
analysis to uncover valuable insights.
• Business Intelligence: Developing intuitive dashboards and reports to visualize key metrics and
performance indicators.
• Machine Learning Models: Building and deploying ML models for classification, regression,
clustering, and more.
• Natural Language Processing: Implementing NLP techniques for text analysis, sentiment analysis,
and conversational AI.
• Computer Vision: Developing computer vision applications for image recognition, object detection,
and video analysis.
• Workshops and Seminars: Hands-on training sessions on the latest trends and technologies in
data science and AI.
• Customized Training Programs: Tailored training solutions to meet the specific needs of
organizations and individuals.
GENERATIVE AI SOLUTIONS
As a leader in the field of Generative AI, Brainalyst offers innovative solutions that create new content and
enhance creativity. Our services include:
• Content Generation: Developing AI models for generating text, images, and audio.
• Creative AI Tools: Building applications that support creative processes in writing, design, and
media production.
• Generative Design: Implementing AI-driven design tools for product development and
optimization.
OUR JOURNEY
Brainalyst’s journey began with a vision to revolutionize how data is utilized and understood. Founded by
Nitin Sharma, a visionary in the field of data science, Brainalyst has grown from a small startup into a renowned
company recognized for its expertise and innovation.
KEY MILESTONES:
• Inception: Brainalyst was founded with a mission to democratize access to advanced data analytics and AI
technologies.
• Expansion: Our team expanded to include experts in various domains of data science, leading to the
development of a diverse portfolio of services.
• Innovation: Brainalyst pioneered the integration of Generative AI into practical applications, setting new
standards in the industry.
• Recognition: We have been acknowledged for our contributions to the field, earning accolades and
partnerships with leading organizations.
Throughout our journey, we have remained committed to excellence, integrity, and customer satisfaction.
Our growth is a testament to the trust and support of our clients and the relentless dedication of our team.
Choosing Brainalyst means partnering with a company that is at the forefront of data-driven innovation. Our
strengths lie in:
• Expertise: A team of seasoned professionals with deep knowledge and experience in data science and AI.
• Customer Focus: A dedication to understanding and meeting the unique needs of each client.
• Results: Proven success in delivering impactful solutions that drive measurable outcomes.
JOIN US ON THIS JOURNEY TO HARNESS THE POWER OF DATA AND AI. WITH BRAINALYST, THE FUTURE IS
DATA-DRIVEN AND LIMITLESS.
TABLE OF CONTENTS
1. Preface
2. Introduction to Python
• What is Python
• Python Basics
• Purpose of Python
• What Can Python Do
• Advantages of Python
• Disadvantages of Python
• Why Python
• Free Open Source vs. Licensed Software
3. Getting Started with Python
• Installing Anaconda
• Difference between Anaconda and Miniconda
• Downloading and Installing Anaconda
• Using Jupyter Notebook
• Important Shortcuts in Jupyter Notebook
4. Fundamentals of Python
• Variables
• Data Types
• Operators
• Syntax Rules
• Control Flow Statements
• Conditional Statements (if, elif, else)
• Loop Statements (for, while)
• Function Definitions
5. Lists and Tuples in Python
• Creating Lists and Tuples
• Accessing Values in Lists and Tuples
• Slicing and Indexing
• List and Tuple Methods
6. Dictionaries in Python
• Creating Dictionaries
• Accessing Elements in Dictionaries
• Adding and Removing Elements
• Dictionary Methods
7. Sets in Python
• Creating Sets
• Set Operations (Intersection, Union, Symmetric Difference, Difference)
• Adding and Removing Elements
8. User-Defined Functions (UDFs)
• Defining UDFs
• Parameters and Return Values
• *args and **kwargs
• Lambda Functions
• Using map() with Lambda Functions
9. String Operations
• Creating and Manipulating Strings
• String Methods
• Escape Characters
10. NumPy Basics
• Introduction to NumPy
• Creating NumPy Arrays
• Array Manipulation
• NumPy Array Methods
• Broadcasting and Vectorization
11. Data Visualization with Python
• Introduction to Data Visualization
• Plotting with Matplotlib
• Plotting with Seaborn
• Creating Various Charts and Graphs
12. Advanced Python Topics
• Object-Oriented Programming (OOP)
• Working with Files
• Exception Handling
• Regular Expressions
• Working with Dates and Times
Preface
Welcome to “Python: Basic to Advanced,” a comprehensive guide designed to help
you master Python, one of the most powerful and versatile programming languages
available today. Whether you are a beginner starting your programming journey or an
experienced developer looking to deepen your understanding of Python, this handbook
will serve as an invaluable resource.
Python’s simplicity and readability have made it a popular choice for a wide range
of applications, from web development and data science to artificial intelligence and
automation. This handbook covers everything from the fundamental concepts of Python
programming to advanced topics, ensuring you have the knowledge and skills to tackle
any challenge.
We begin with an introduction to Python, exploring its history, advantages, and core
concepts. You will then learn how to get started with Python, including setting up your
development environment and using Jupyter Notebook for interactive programming.
As you progress through the chapters, you will delve into data structures, control flow
statements, functions, and more. The handbook also covers essential libraries such as
NumPy and Matplotlib, which are crucial for data analysis and visualization.
Join us on this journey to unlock the full potential of Python programming. Let’s
explore, analyze, and innovate together.
Nitin Sharma
Founder/CEO
Brainalyst - A Data-Driven Company
Disclaimer: This material is protected under copyright, Brainalyst © 2021-2024. Unauthorized use and/or duplication of this material or any part of this material, including data, in any form without explicit and written permission from Brainalyst is strictly prohibited. Any violation of this copyright will attract legal action.
BRAINALYST - PYTHON
PYTHON: BASIC TO ADVANCED
Python
Introduction to Python:
• Python is a high-level, versatile, and interpreted programming language known for its simplicity and
readability. Created by Guido van Rossum and first released in 1991, Python has gained immense
popularity in various domains, including web development, data science, artificial intelligence, automation, scientific computing, and more.
What is Python:
• Python is a general-purpose programming language that emphasizes code readability and encourages the use of fewer lines of code through its clear, concise syntax. Python’s design philosophy emphasizes code readability with the use of significant whitespace.
Python Basics:
• Python is an interpreted, high-level programming language with dynamic semantics.
• It is object-oriented, which allows for modeling real-world objects in code.
• Python is favored for its simplicity and readability, making it suitable for various applications.
Debugging Simplicity:
• Python programs are easy to debug, as errors raise exceptions instead of causing segmentation
faults.
• The interpreter prints stack traces when exceptions are not handled.
Purpose of Python:
• Python serves a wide range of purposes, such as:
• Web Development: Python is used for building web applications and websites using frameworks
like Django and Flask.
• Data Science: Python is a leading language for data analysis, machine learning, and scientific computing with libraries such as NumPy, Pandas, and scikit-learn.
• Artificial Intelligence: Python is popular for developing AI and machine learning models with
frameworks like TensorFlow and PyTorch.
• Automation: Python can automate repetitive tasks and scripting, making it ideal for system administration.
• Game Development: Python has libraries like Pygame for developing 2D games.
• IoT (Internet of Things): Python can be used to program and control IoT devices.
• Desktop Applications: Python can create cross-platform desktop applications using frameworks
like PyQt and Tkinter.
• Network Programming: Python is employed for building network applications.
• Advantages of Python:
• Readability: Python’s clear and simple syntax makes it easy for beginners to learn and understand.
• Large Standard Library: Python has an extensive library that simplifies programming tasks.
• Cross-Platform: Python is available on multiple platforms, making it versatile.
• Community Support: Python has a large, active community that provides support, libraries,
and frameworks.
• Open Source: Python is open source and free, reducing costs for development.
• Scalability: Python is scalable and used in both small scripts and large applications.
• Disadvantages of Python:
• Performance: Python is not as fast as some other languages due to its interpreted nature.
• Global Interpreter Lock (GIL): The GIL can limit multi-threading performance in CPU-bound
applications.
• Not Ideal for Mobile Development: While Python can be used for mobile apps, it’s not the
best choice for resource-intensive applications.
Why Python:
• Python’s popularity stems from its simplicity, readability, extensive libraries, and versatility. It’s widely adopted in diverse fields, and its large community ensures ongoing development and support.
• Python is open source, which contributes to its widespread use and community-driven development. Open source software is often chosen for its accessibility, collaborative potential, and cost-effectiveness.
• Anaconda: Anaconda is a popular Python distribution that bundles the interpreter with tools for data work:
• Data Science Ecosystem: Anaconda includes a vast collection of data science, machine learning, and scientific computing libraries and tools. Popular libraries like NumPy, pandas, scikit-learn, Matplotlib, and Jupyter are included in Anaconda by default. This saves you the trouble of manually installing these packages.
• Key Differences:
• The most significant difference between Anaconda and Miniconda is the number of
pre-installed packages. Anaconda comes with a comprehensive set of data science and
scientific computing packages, while Miniconda only includes the essentials to get you
started.
• Anaconda provides a graphical user interface (Anaconda Navigator) for managing environments and packages, making it beginner-friendly. Miniconda, on the other hand, is more command-line-centric, which may be preferred by advanced users.
• Anaconda is a larger download because it includes a vast number of pre-installed packages. Miniconda is much smaller due to its minimalist approach.
• While Anaconda is an all-in-one solution for many users, Miniconda is often used when
you want to create a custom environment tailored to your specific needs.
Python version
• Python has had several major versions over the years. Python 3 is the current and recommended version; Python 2 reached end of life in January 2020 but is still encountered in legacy code.
• Python 1.0 (January 26, 1994): This is the first official version of Python. It laid the foundation for the language’s development.
• Python 2.0 (October 16, 2000): Python 2 introduced many new features and improvements
over Python 1.0. It became one of the most widely used versions and remained popular for
many years.
• Python 3.0 (December 3, 2008): Python 3 was a significant and backward-incompatible
update. It aimed to clean up and simplify the language. Key changes included the removal of
print as a statement, the introduction of the print() function, and changes to the way strings
and Unicode were handled. Python 3 is the current and recommended version of Python.
• Python 4: There is no official Python 4 release as of this writing. The Python community has indicated that any future major versions of Python will be backward-compatible with Python 3.
• Installation Complete:
• Once the installation is finished, you’ll receive a confirmation message. You can now close the
installer.
Command Mode:
• In Command Mode, you interact with the Notebook as a whole, manipulating cells and performing various tasks.
• When a cell is in Command Mode, its border is typically highlighted in blue.
• To enter Command Mode, press Esc or click outside the cell’s content area.
• Within the Notebook, cells come in three main types (Code, Markdown, and Raw); the first two are described below:
Code Mode:
• This is where you write and execute Python code.
• You can run a code cell by pressing Shift + Enter or by clicking the “Run” button.
• The output of the code execution, such as results and error messages, is displayed below the
cell.
Markdown Mode:
• Markdown cells contain formatted text, which allows you to create rich-text documentation.
• You can use Markdown syntax to style and structure your text, including headings, lists, links,
and more.
• Markdown cells are often used for explanations, documentation, and commentary within
your Notebook.
In Jupyter Notebook, you can add and delete cells to customize your document.
To add a cell:
• While in Command Mode (blue border), you can add a new cell either above or below the
current cell.
• To add a cell above the current cell, press A (for “above”).
• To add a cell below the current cell, press B (for “below”).
• A new cell of the same type as the current cell (Code or Markdown) will appear, and you can
start typing in it.
To delete a cell:
• While in Command Mode, select the cell you want to delete by clicking it (the selected cell
will have a blue border).
• To delete the selected cell, press the X key (similar to the “cut” command). The cell will be
removed from the Notebook.
• Rich Text: Jupyter Notebook supports Markdown, enabling the creation of rich documentation alongside code.
• Syntax Highlighting: It provides syntax highlighting for various programming languages.
• Data Visualization: Jupyter supports data visualization libraries, making it easy to create and
display plots and graphs.
• Extensions: You can install extensions and widgets to enhance its capabilities.
• Collaboration: Jupyter Notebooks can be shared with others, making it suitable for collaborative work.
Important Shortcuts
Command Mode Shortcuts (Press Esc to enter):
• H: Show all shortcuts.
• A: Insert a new cell above.
• B: Insert a new cell below.
• D, D (Press D twice): Delete the current cell.
• Y: Change the cell type to code.
• M: Change the cell type to Markdown.
• R: Change the cell type to raw.
• 1 to 6: Convert the cell to a heading with the corresponding level.
• Up Arrow or K: Select the cell above.
• Down Arrow or J: Select the cell below.
• Shift + Up Arrow or Shift + K: Extend the selection above.
• Shift + Down Arrow or Shift + J: Extend the selection below.
• Shift + M: Merge selected cells.
• Shift + Up/Down: Select multiple cells.
• Shift + L: Turn on/off line numbers.
Edit Mode Shortcuts (Press Enter to enter):
• Ctrl + Enter: Run the current cell.
• Shift + Enter: Run the current cell and move to the next one.
• Alt + Enter: Run the current cell and insert a new one below.
• Ctrl + S: Save the notebook.
• Ctrl + Z: Undo.
• Ctrl + Shift + Z or Ctrl + Y: Redo.
• Ctrl + /: Comment/uncomment lines.
• Tab: Code completion or indent.
• Shift + Tab: Tooltip with documentation.
• Ctrl + Shift + - (minus key): Split the cell at the cursor position.
To access all the shortcuts in Jupyter Notebook without using keyboard shortcuts
• Menu Bar:
• In the Jupyter Notebook, you can find the menu bar at the top.
• Click on different menu options to find the list of available shortcuts.
• For example, under “Help,” you will find a “Keyboard Shortcuts” option. Clicking on it will display a list of keyboard shortcuts.
In Jupyter Notebook, you can edit and format text within Markdown cells using various formatting options like style, size, bold, and color.
Markdown Cell:
• Ensure you are in a Markdown cell (not a code cell).
• To create a Markdown cell, select a cell and change its type to “Markdown” using the toolbar
or keyboard shortcuts (M for Markdown).
Styling Text:
• To style text, you can use Markdown formatting. For example:
• To italicize text, wrap it with asterisks or underscores (*italic* or _italic_).
• To make text bold, wrap it with double asterisks or underscores (**bold** or __bold__).
Adding Links:
• To add hyperlinks, use the [text](URL) format. For example: [OpenAI’s website](https://www.openai.com).
Lists and Bullet Points:
• To create lists, use *, -, or 1. for bullet points or numbered lists.
Math Formulas:
• For mathematical equations, you can use LaTeX notation within Markdown cells, enclosed in
dollar signs. For example, $$E=mc^2$$ will render the famous equation.
Preview:
• After entering your Markdown text and formatting, press Shift + Enter to render the cell and
see how it looks.
Variable:
• A variable is a name that can be used to store data values.
• Variables are created when you assign a value to them.
• Variable names are case-sensitive and can contain letters, numbers, and underscores.
• Variable names must start with a letter (a-z, A-Z) or an underscore (_).
Data Types:
• Python supports various data types that define the kind of data a variable can hold. Common data types include int, float, bool, and str, which are described in detail below.
Operators:
• Operators are used to perform operations on variables and values. Python supports arithmetic, comparison, logical, and assignment operators, covered in the Operators section below.
Variables:
• A variable in Python is a named storage location used to store data.
• It can hold various types of data, such as numbers, strings, lists, or custom objects.
• You can think of a variable as a label or a name that refers to a value.
Variable Rules:
Naming Rules:
• Variable names must start with a letter (a-z, A-Z) or an underscore (_).
• After the first character, variable names can contain letters, numbers (0-9), or underscores.
• Variable names are case-sensitive, meaning myVar and myvar are treated as different variables.
• Use descriptive names that convey the purpose of the variable (e.g., age instead of a).
Reserved Words:
• Avoid using Python’s reserved words or keywords as variable names (e.g., if, while, for, print). Here’s a list of Python reserved words: https://docs.python.org/3/reference/lexical_analysis.html#keywords.
Style Conventions:
• Follow the Python PEP 8 style guide for naming conventions (https://www.python.org/dev/peps/pep-0008/). It suggests using lowercase letters and underscores for variable names (e.g., my_variable_name).
Variable Syntax:
• Assigning a value to a variable is done using the = operator. For example:
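A short sketch (the names and values here are illustrative):

    age = 25            # an integer
    name = "Alice"      # a string
    pi = 3.14159        # a float
    is_valid = True     # a boolean
    print(age, name, pi, is_valid)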
What to Avoid:
• Avoid using single-letter variable names (like x, y) for anything other than loop counters.
• Avoid using ambiguous variable names that don’t clearly convey the purpose of the variable.
• Don’t reuse variable names for different types of data within the same scope (e.g., using total for
both numbers and strings).
• Be mindful of variable scoping. Variables declared inside functions have local scope, while those
declared outside functions have global scope. Avoid reusing global variable names within functions.
In Jupyter Notebook, cells are the building blocks of your interactive documents.
Two common types of cells you’ll work with are Input cells and Output cells:
Input Cell:
• An input cell is where you write and execute your code.
• You can type Python code, Markdown text, or other supported content in an input cell.
• To run the code within the input cell, you can press Shift + Enter (or Shift + Return), and the output
will appear below it.
• Input cells are typically marked with “In [ ]:” to indicate the order in which they were executed.
Output Cell:
• After running an input cell that contains code, the results or output of that code will be displayed in an output cell below the input cell.
• Output cells can contain text, tables, plots, error messages, or any other output generated by
your code.
• Output cells are typically marked with “Out [ ]:” to match them with the corresponding input
cell. The number inside the brackets corresponds to the execution order.
Functions
• A function is a block of reusable code that performs a specific task.
• Python provides built-in functions (like print(), len()) and allows you to create your functions using the def keyword.
• There are well-established naming conventions and rules in Python, and they are typically referred to as PEP 8 (Python Enhancement Proposal 8) guidelines. PEP 8 provides recommendations for naming identifiers (variables, functions, classes, etc.) in Python code; the naming rules above summarize the key guidelines.
Comments
• In Python, comments are used to provide explanatory notes or descriptions within your
code. Comments are not executed as part of the program and are meant for developers to
understand the code. Python supports both single-line and multi-line comments.
Single-Line Comments:
• Single-line comments are used to comment a single line of code.
• You can use the # symbol to start a single-line comment, and everything after # on that line
is considered a comment.
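For example:

    # This is a single-line comment
    x = 10  # a comment can also follow code on the same line
    print(x)  # prints: 10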
Data types
• In Python, data types are classifications that specify which type of value a variable can hold. They are important because they define how data can be manipulated, what operations can be performed on the data, and how data is stored in memory. Python is known for its simplicity, and one way it maintains this simplicity is by using flexible, dynamic data types.
Why Data Types are Needed:
• Data types are essential for several reasons:
• Memory Allocation: Data types determine how much memory is allocated for a variable. For example, an integer variable requires a different amount of memory compared to a floating-point variable.
• Operations: Different data types support different operations. For instance, you can perform
arithmetic operations on numeric data types, but not on text data.
• Data Validation: Data types help ensure that the data you store in a variable is valid for the
type. For example, you can’t store text in an integer variable.
• Data Interpretation: Data types help Python understand how to interpret and display the
data.
• Performance: Using appropriate data types can lead to more efficient code and better performance.
Int (Integer):
• An integer is a whole number, positive or negative, without any decimal point.
• Example: 5, -10, 0
• Use cases: Integers are used to represent whole numbers in Python. They are suitable for
counting items, indexing lists, and performing mathematical operations.
Float (Floating-Point):
• A float is a number that includes a decimal point, or it can be expressed using scientific notation.
• Example: 3.14, -0.005, 2.0e-3
• Use cases: Floats are used when you need to work with real numbers, including fractions
and numbers with decimal places. They are commonly used in scientific calculations and for
storing measurements.
Bool (Boolean):
• A boolean represents a binary value that can be either True or False.
• Example: True, False
• Use cases: Booleans are primarily used for decision-making and control flow in Python. They
help in writing conditions, loops, and defining the logic of a program.
Str (String):
• A string is a sequence of characters, enclosed in either single or double quotes.
• Example: "Hello, World!", 'Python is fun'
• Use cases: Strings are used to store and manipulate text data. They are fundamental for handling textual information, from simple messages to complex documents.
• In Python, True, False, and None are special constants representing Boolean values and a lack of
value (NoneType) respectively.
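A short sketch covering these types (values are illustrative):

    count = 5              # int
    price = 3.14           # float
    flag = True            # bool
    message = "Hello"      # str
    nothing = None         # NoneType
    print(type(count), type(price), type(flag), type(message), type(nothing))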
Operators
• In Python, operators are special symbols or keywords used to perform various operations on data, variables, or values. Python provides a wide range of operators for tasks such as arithmetic, comparison, and logical operations.
Arithmetic Operators:
• Arithmetic operators are used to perform basic mathematical operations.
• + # Addition
• - # Subtraction
• * # Multiplication
• / # Division
• % # Modulus (remainder)
• // # Floor Division (integer division)
• ** # Exponentiation
• Importance: Arithmetic operators are used for performing mathematical calculations in Python.
• Rules:
• Operate on numeric data types (integers and floating-point numbers).
• Use parentheses to control the order of operations (like in regular math).
• Division (/) results in a floating-point number, even if the operands are integers.
• Floor division (//) returns an integer and discards the fractional part.
• Modulus (%) returns the remainder after division.
• Exponentiation (**) raises the left operand to the power of the right operand.
• When to Use: Arithmetic operators are used for basic calculations such as addition, subtraction, multiplication, division, and more. They are essential for numeric computations in Python.
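For example:

    a, b = 7, 2
    print(a + b)   # 9    addition
    print(a - b)   # 5    subtraction
    print(a * b)   # 14   multiplication
    print(a / b)   # 3.5  division always yields a float
    print(a // b)  # 3    floor division discards the fraction
    print(a % b)   # 1    modulus (remainder)
    print(a ** b)  # 49   exponentiation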
Comparison Operators:
• Comparison operators are used to compare two values and return a Boolean result.
• == # Equal to
• != # Not equal to
• < # Less than
• > # Greater than
• <= # Less than or equal to
• >= # Greater than or equal to
• Importance: Comparison operators are used for comparing values and returning Boolean
results.
• Rules:
• Compare values of any data type.
• Result in True or False.
• When to Use: Comparison operators are vital for making decisions in conditional statements
and loops. They are used to test conditions and control the flow of your program.
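For example:

    x, y = 10, 20
    print(x == y)  # False
    print(x != y)  # True
    print(x < y)   # True
    print(x >= y)  # False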
Logical Operators:
• Logical operators are used to combine conditional statements.
• Rules:
• Operate on Boolean values.
• and returns True if both conditions are True.
• or returns True if at least one condition is True.
• not negates the condition.
• When to Use: Logical operators are essential for creating complex conditions and controlling
program flow based on multiple conditions.
• We discuss these in more detail later.
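For example:

    age = 25
    print(age > 18 and age < 30)  # True: both conditions hold
    print(age < 18 or age > 21)   # True: at least one condition holds
    print(not age > 18)           # False: negates the condition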
Assignment Operators:
• Assignment operators are used to assign values to variables.
• = # Assign a value
• += # Add and assign
• Rules:
• Used for assigning values to variables.
• Combining arithmetic operations and assignment (e.g., +=, -=) in a single statement.
• When to Use: Assignment operators are used extensively to store and manipulate data. They
are essential for variable assignment and updating.
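For example:

    total = 10    # assign a value
    total += 5    # shorthand for total = total + 5
    print(total)  # 15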
Syntax rules
• Control flow statements in Python allow you to control the order in which your code is executed. Here, I’ll explain the syntax rules for the main control flow statements in Python, including conditional statements (if, elif, else), loop statements (for and while), and function definitions.
If Statement:
• Syntax: if condition: followed by an indented block of code.
• Explanation: The if statement is used to execute a block of code if the specified condition is
True.
• Usage: The if statement is used to conditionally execute a block of code when the specified
condition is True.
• Working: The condition is evaluated. If it’s True, the code block under the if statement is
executed. If the condition is False, the code block is skipped.
• Explanation: The elif statement allows you to check multiple conditions in sequence. If the
first condition is not met, it moves on to the next elif condition.
• Usage: The elif statement is used to check multiple conditions sequentially when the previous condition(s) are False.
• Working: Conditions are evaluated in sequence. The first True condition’s block of code is
executed, and the rest are skipped.
• Limitations: You can use multiple elif statements, but it’s not always the most efficient way
to handle complex conditionals.
• Explanation: The else statement is used to specify code that is executed when the initial if condition
is False.
• Usage: The else statement complements the if statement and is used to specify code executed when
the initial condition is False.
• Working: If the if condition is True, the code block under if is executed. If it’s False, the code block
under else is executed.
• Limitations: It’s designed to handle binary conditions (True/False), and you cannot specify multiple conditions.
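A short sketch tying the three statements together (the thresholds are illustrative):

    marks = 75
    if marks >= 90:
        print("Grade A")
    elif marks >= 60:
        print("Grade B")  # this branch runs: the first True condition wins
    else:
        print("Grade C")  # catch-all when no condition is True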
2. Question: Is it possible to write code that does not contain an if but only elif and else?
Answer: No, it’s not possible. elif and else statements always follow an initial if statement. They provide alternative conditions to be executed if the initial if condition is False.
3. Question: How can you represent the switch-case construct (available in some other languages) using Python’s if-elif-else?
Answer: Python does not have a switch-case construct. You can achieve similar functionality using a
dictionary of functions or if-elif-else statements.
4. Question: Explain the purpose of using the else statement with no condition (e.g., else:) in
an if-else block.
Answer: An else block with no condition in Python acts as a catch-all. It is executed when none of
the preceding if or elif conditions are True. It provides a default action when no other conditions are
met.
5. Question: What happens if you have a condition that is always True in an if-elif-else chain?
Answer: If a condition is always True in an if-elif-else chain, the code block associated with that
condition will execute, and the rest of the conditions (if any) will be skipped. This is why the order
of conditions is important.
6. Question: How can you achieve the behavior of else if (common in some other languages)
in Python?
Answer: In Python, you use elif to achieve the same behavior as else if in other languages. It allows
you to check additional conditions after the initial if condition.
7. Question: Can you have nested if-elif-else statements? If so, what is the limit?
Answer: Yes, you can nest if-elif-else statements inside other if-elif-else statements. There is no strict
limit to the nesting depth, but it’s essential to maintain code readability and avoid excessive nesting.
8. Question: How do you prevent code redundancy when multiple conditions require the
same action?
Answer: To prevent redundancy, you can assign a variable or calculate the result once and use it in
multiple if or elif conditions. This promotes code reusability and improves maintainability.
10. Question: Can an if-elif-else block contain only elif conditions without an initial if condition?
Answer: No, an if-elif-else block must start with an initial if condition. elif and else are designed to
provide alternative conditions and actions based on the outcome of the initial if condition.
While Loop:
• A while loop in Python is used to repeatedly execute a block of code as long as a given condition is
True. It’s suitable for scenarios where you don’t know in advance how many times the loop should
run.
• In practice, the for loop is used more often in Python.
Syntax:
• The basic syntax of a while loop is while condition: followed by an indented block of code.
• condition: A Boolean expression that determines whether the loop should continue or terminate.
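A minimal sketch:

    count = 1                # initialize outside the loop
    while count <= 5:        # checked before every iteration
        print(count)
        count += 1           # update, or the loop never terminates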
How It Works:
• The condition is evaluated. If it’s True, the code block within the loop is executed.
• After the code block execution, the condition is evaluated again.
• If the condition remains True, the loop continues to execute.
• This process repeats until the condition becomes False.
• When the condition is False, the loop terminates, and the program continues with the code after the loop.
Use Cases:
• Unknown Number of Iterations: When you don’t know in advance how many times the loop
should run.
• Continuous Monitoring: To monitor a condition and respond when the condition becomes
False.
• User Interaction: To repeatedly ask for user input until a valid response is received.
Common Mistakes:
• Forgetting to Update the Condition: Ensure that the condition within the while loop is modified inside the loop to eventually become False. Forgetting this can lead to infinite loops.
• Incorrect Initialization: Initialize variables used in the condition outside the loop, or the loop
may not run at all.
• No Exit Condition: Be cautious not to create loops without a way to exit. Always have an exit
strategy, such as breaking the loop under certain conditions.
• Infinite Loops: Care must be taken to avoid creating infinite loops, as they can consume system resources and cause your program to become unresponsive.
• User Input: When using a while loop for user input, provide a clear way for users to exit the
loop or terminate the program.
• Condition Evaluation: Ensure the condition in the while loop is structured in a way that it
eventually becomes False. Otherwise, the loop will run indefinitely.
• Initialization: Initialize variables used in the loop outside the loop to avoid issues with reinitialization.
• Control flow statements in Python, including loops like for and while, allow you to execute code conditionally or repeatedly.
For Loop:
• A for loop in Python is used to iterate over a sequence (such as a list, tuple, string, or range)
and execute a block of code for each item in the sequence. The loop continues until all items
in the sequence have been processed.
• variable: A variable that takes the value of each item in the sequence during each iteration.
• sequence: A collection of items (e.g., a list, tuple, string) that the loop iterates through.
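The general form is for variable in sequence:, as in this short sketch:

    fruits = ["apple", "banana", "cherry"]
    for fruit in fruits:   # fruit takes each value in turn
        print(fruit)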
How It Works:
• The for loop starts by assigning the first item in the sequence to the variable.
• The code block within the loop is executed with the variable set to the first item.
• The loop repeats steps 1 and 2 for each item in the sequence.
• The loop terminates when all items in the sequence have been processed.
Use Cases:
• Processing Data: for loops are commonly used to process data stored in collections (lists,
tuples, dictionaries) or strings. For each element in the collection, you can perform specific
operations.
• Iterating Over Ranges: You can use a for loop with the range() function to generate a sequence of numbers to iterate through.
• File Handling: for loops are used to read files line by line.
Common Mistakes:
• Modifying the Sequence: Avoid modifying the sequence within the loop. It can lead to unexpected behavior.
• Infinite Loops: Ensure that the loop’s condition eventually becomes False to prevent infinite
loops.
• Indentation: Proper indentation is crucial in Python. Ensure that the code block within the
loop is indented correctly.
When to Use:
• Use a for loop when you have a collection of items, and you want to perform a set of operations on each item in the collection.
Limitations:
• You should not modify the sequence within the loop because it can lead to unexpected behavior.
• Be cautious when iterating over ranges, especially in a while loop, to prevent infinite loops.
Important Notes:
• For efficiency, consider using list comprehensions for simple operations on sequences.
• Ensure that your loop conditions eventually become False to prevent infinite loops.
• Pay attention to indentation and colon usage, as they are essential for the correct structure
of loops in Python.
• Always initialize the loop variables before using them in a loop.
• Be cautious when modifying the sequence within a for loop, as this can lead to unexpected
behavior.
Range Function:
• In Python, the range function is used to generate a sequence of numbers within a specified range. It’s
often used in for loops to iterate a specific number of times.
Syntax:
• The basic syntax of the range function is range(start, stop, step):
• start (optional): The first number in the sequence. The default is 0.
• stop (required): The sequence stops before this value.
• step (optional): The step size, indicating the interval between numbers in the sequence. The default is 1.
Usage:
• When using range(stop), it generates a sequence starting from 0 to stop - 1.
• When using range(start, stop), it generates a sequence from start to stop - 1.
• When using range(start, stop, step), it generates a sequence from start to stop - 1,
with a step size of step.
Important Notes:
• The range function generates a sequence of numbers efficiently without creating a list
in memory. This is beneficial for large ranges.
• The start, stop, and step values can be negative, which allows you to generate sequences in reverse or decremental order.
• The sequence generated by range is “half-open,” meaning it includes the start value
but excludes the stop value. For example, range(2, 6) generates numbers from 2 to 5.
• The range function is often used in for loops to iterate a specific number of times. For
example, for i in range(5) will iterate five times.
• To convert the range object into a list of numbers, you can use the list function. For
example, list(range(5)) will return [0, 1, 2, 3, 4].
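A few illustrative calls:

    print(list(range(5)))          # [0, 1, 2, 3, 4]
    print(list(range(2, 6)))       # [2, 3, 4, 5]
    print(list(range(10, 0, -2)))  # [10, 8, 6, 4, 2]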
Use Cases:
• Iterating over a sequence of numbers.
• Specifying the number of iterations in a loop.
• Creating custom sequences of numbers for various purposes.
Common Mistakes:
• Forgetting that the stop value is not included in the generated sequence. Make sure to
adjust your loop accordingly.
• Specifying a negative step value when you intend to generate a sequence in increasing
order.
Limitations:
• The range function generates sequences of integers only. It does not support floating-point numbers.
Data structures
• Data structures are a fundamental concept in computer science and programming. They enable you to efficiently store, organize, and manipulate data. In Python, there are several built-in data structures, including lists, tuples, dictionaries, and sets.
Properties of Data Structures:
• One-Dimensional: Lists, tuples, dictionaries, and sets are all one-dimensional, which means
they store data in a linear sequence. For multi-dimensional data structures, you can use lists
of lists or other nested structures.
• Heterogeneous: These data structures allow you to store elements of different data types.
For example, you can have a list that contains integers, strings, and lists, all within the same
list.
• No Broadcasting: Unlike some numerical libraries like NumPy, these built-in data structures
do not inherently support broadcasting. Broadcasting typically involves applying operations
element-wise or across arrays.
• Vectorization: While these data structures do not support vectorized operations like NumPy arrays, you can perform vectorized operations using list comprehensions or similar techniques. Vectorization in this context means efficiently applying an operation to all elements of a data structure.
• In this example, range(1, 11) generates numbers from 1 (inclusive) to 11 (exclusive). The list() con-
structor converts this range into a list.
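The code itself is not reproduced in this text; it likely looked like this:

    L1 = list(range(1, 11))
    print(L1)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]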
Tuples:
Creating a Tuple:
• Tuples are created by enclosing a comma-separated sequence of values within parentheses
( ).
Accessing Elements:
• Like lists, tuples are ordered, and you can access their elements using indices, starting from
0.
• In this example, range(2, 11, 2) generates even numbers from 2 (inclusive) to 11 (exclusive) with a step of 2. The tuple() constructor converts this range into a tuple.
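Again the code is not reproduced here; a likely reconstruction:

    T1 = tuple(range(2, 11, 2))
    print(T1)  # (2, 4, 6, 8, 10)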
Important Note:
• Keep in mind that the range() function generates numbers within the specified range, but it
does not create a list or tuple by itself. You need to wrap it with list() or tuple() to convert it
into the desired data structure.
Negative Indexing:
• You can also use negative indexing to access elements from the end of the list or tuple. -1
represents the last element, -2 the second-to-last, and so on.
Slicing:
• Slicing allows you to access multiple elements in a sequence. It uses the colon : to specify a range of
indices.
• Slicing is a powerful feature in Python for working with sequences like lists, tuples, and strings. It
allows you to create new sequences by extracting a portion of an existing sequence.
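A short sketch of negative indexing and slicing (values are illustrative):

    nums = [10, 20, 30, 40, 50]
    print(nums[-1])   # 50, the last element
    print(nums[1:4])  # [20, 30, 40], indices 1 through 3
    print(nums[:3])   # [10, 20, 30], from the start
    print(nums[::2])  # [10, 30, 50], every second element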
• In Python, the id() function is used to get the identity (unique identifier) of an object. It returns an integer that represents the memory address of the object. Each object in Python has
a unique id, which can be considered as a unique identifier for that object.
• In this example, a and b are two variables. We assign the integer value 10 to a and 5 to b.
When we use id(a) and id(b), Python returns the unique memory addresses for these two
variables.
How it works
• Identity of Objects: Each object in Python has a unique identity, which is determined
by its memory address. The id() function retrieves this unique identifier.
• Memory Addresses: Python stores objects in memory, and each object is assigned a
specific memory location. The id() function returns the memory address as an integer
value.
• Object Comparison: You can use id() to compare objects to see if they are the same.
If two variables have the same id, they refer to the same object in memory.
• Mutable vs. Immutable Objects: The behavior of id() can vary depending on whether the object is mutable or immutable. Mutable objects (e.g., lists, dictionaries) may have the same id after modifications, while immutable objects (e.g., integers, strings) will have different id values if modified.
• In this example, the id of x and y is the same initially. However, when we modify x, it gets a new memory address because integers are immutable.
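The example itself is missing from this text; a plausible reconstruction (the exact values are assumptions):

    x = 10
    y = 10
    print(id(x) == id(y))  # True in CPython, which caches small integers
    x = x + 1              # integers are immutable, so x now refers to a new object
    print(id(x) == id(y))  # False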
• Garbage Collection: Python manages memory using a mechanism called garbage collection. When an object is no longer referenced, Python reclaims the memory occupied by that object. The id() value is not guaranteed to remain the same once the object is garbage-collected.
• Caveats: While id() is useful for comparing object identity, it is not typically used for
common programming tasks. Python provides other methods for object comparison,
such as the == operator for comparing values and the is operator for checking identity.
• In Python, you can use the dir() function to get a list of all the attributes and
methods available for a particular object, including built-in objects like tuples
and lists.
• Syntax: dir(object)
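For example:

    print(dir(list))   # attributes and methods of the list type
    print(dir(tuple))  # attributes and methods of the tuple type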
2. insert()
• The insert() method is used to add an element at a specific position in the list. It takes two arguments: the index at which to insert the element and the value to be inserted. It also modifies the original list in place.
• Syntax: list.insert(index, value)
3. append()
• The append() method is used to add an element to the end of the list. It modifies the original list in place.
• Syntax: list.append(value)
• Important Note: append() adds only one element at a time to the end of the list.
Key Points:
• insert() allows you to specify the index where the element should be inserted.
• It modifies the original list in place.
• It can be used to insert a single element at a specific location within the list.
Differences:
• extend() is used for adding multiple elements from an iterable to the end of
the list.
• append() adds a single element to the end of the list.
• insert() inserts an element at a specified index within the list.
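A combined sketch of the three methods:

    nums = [1, 2, 4]
    nums.insert(2, 3)    # insert 3 at index 2
    nums.append(5)       # add a single element at the end
    nums.extend([6, 7])  # add every element of an iterable at the end
    print(nums)          # [1, 2, 3, 4, 5, 6, 7]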
Key Points:
• remove() deletes the first occurrence of the specified value from the list.
• If the value is not found in the list, it raises a ValueError.
• It modifies the original list in place.
2. pop()
• The pop() method is used to remove an element from the list at a specified index. It returns
the removed element. If the index is not provided, it removes and returns the last element.
Key Points:
• pop() removes and returns the element at the specified index.
• If no index is provided, it removes and returns the last element.
• It modifies the original list in place.
• If the index is out of range, it raises an IndexError.
3. clear()
• The clear() method is used to remove all elements from the list, effectively emptying it. After
calling clear(), the list becomes an empty list.
Key Points:
• clear() removes all elements from the list.
• After calling clear(), the list is empty (contains no elements).
• It modifies the original list in place.
Differences:
• remove() is used to delete the first occurrence of a specific value from the list.
• pop() removes and returns an element at a specified index, or the last element if no
index is provided.
• clear() removes all elements from the list, leaving it empty.
Important Note:
• When using remove() and pop(), ensure that the element or index you are removing
exists in the list, or appropriate error handling should be in place to avoid exceptions.
• clear() is a handy method for clearing all elements from a list when you want to reuse the list.
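A combined sketch of the three methods (values are illustrative):

    items = ["a", "b", "c", "b"]
    items.remove("b")          # removes only the first "b"
    last = items.pop()         # removes and returns "b", the last element
    first = items.pop(0)       # removes and returns "a"
    print(items, last, first)  # ['c'] b a
    items.clear()
    print(items)               # []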
Tuples
• Tuples are similar to lists, but unlike lists, they are immutable, meaning their elements cannot be modified once the tuple is created.
Copy()
• The copy() method is used to create a shallow copy of a list. A shallow copy is a new list that
contains references to the same elements as the original list. In other words, it creates a new
list, but the elements themselves are not duplicated; they still point to the same objects in
memory. This means that changes made to the elements in the original list will also affect the
elements in the copied list and vice versa.
Syntax: new_list = original_list.copy()
Count()
• The count() method is used to count the number of occurrences of a specific element in a list.
It allows you to find out how many times a particular value appears in the list.
Syntax: list.count(value)
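For example:

    original = [1, 2, 2, 3]
    copied = original.copy()   # a new list object with the same elements
    print(copied is original)  # False: two distinct lists
    print(original.count(2))   # 2: the value 2 appears twice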
Question 2
• In this code, you iterate through each element in L1 using a for loop. If the element is greater than
20, it is appended to the new list L2. The result will be a list containing elements greater than 20.
Question 3
• To create a new list L2 with elements greater than 20 from the list L1 after dividing each number in
the list by 100, you can use a for loop or list comprehension.
• In this code, you first create the list L1. Then, you iterate through each element in L1 using a for loop.
For each element, you divide it by 100 and check if the result is greater than 20. If it is, you append
it to the new list L2.
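The code for Questions 2 and 3 is not reproduced in this text; a sketch under the assumption that L1 holds numbers (the sample values are invented):

    L1 = [150, 2500, 900, 3000]
    # Question 2: keep elements greater than 20
    L2 = []
    for x in L1:
        if x > 20:
            L2.append(x)
    # Question 3: divide each element by 100, then keep results greater than 20
    L2 = []
    for x in L1:
        if x / 100 > 20:
            L2.append(x / 100)
    print(L2)  # [25.0, 30.0]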
List Comprehension
• List comprehension is a concise and powerful way to create lists in Python. It allows you to create
a new list by applying an expression to each item in an existing iterable (e.g., a list, tuple, or range)
and optionally filtering the items based on a condition.
• Basic Syntax: [expression for item in iterable if condition]
• In this example, we generate a list of squares from 1 to 5 using a list comprehension. For each
item x in the range, the expression x**2 is applied.
• Here, we create a list of even numbers by filtering the range of numbers to include only those
where x % 2 equals 0.
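The two comprehensions described above:

    squares = [x**2 for x in range(1, 6)]
    print(squares)  # [1, 4, 9, 16, 25]
    evens = [x for x in range(10) if x % 2 == 0]
    print(evens)    # [0, 2, 4, 6, 8]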
Important Notes:
• List comprehensions are efficient and often more readable than traditional for loops.
• Keep list comprehensions concise; they can become hard to read if too complex.
• You can use multiple for and if clauses for more complex operations.
Dictionaries in Python
• A dictionary is a versatile and widely used data structure in Python. It’s a collection of key-value
pairs that provides a way to store, access, and manipulate data efficiently.
Basic Syntax: {key1: value1, key2: value2}
Example:
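A minimal sketch (keys and values are illustrative):

    person = {"name": "Alice", "age": 30, "city": "Delhi"}
    print(person)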
Accessing Elements:
• Access by Key: To access a value in a dictionary, use the key enclosed in square brackets or the get()
method.
• If the key doesn’t exist in the dictionary, accessing it using square brackets will raise a KeyError,
while get() will return None.
Updating Elements:
• To update the value associated with a key, simply access the key and assign a new value to it.
Removing Elements:
• Del Statement: You can use the del statement to remove a key-value pair by specifying the
key.
• pop() Method: The pop() method removes a key-value pair by specifying the key and returns the corresponding value. If the key is not found, you can provide a default value.
• popitem() Method: The popitem() method removes and returns an arbitrary key-value pair
from the dictionary as a tuple. This method is useful for popping the last item in Python 3.7+.
Notes:
• When using del, if you try to delete a key that doesn’t exist, it will raise a KeyError. Using pop() or popitem() with a nonexistent key allows you to provide a default value or handle it gracefully.
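A combined sketch of access, update, and removal (keys and values are illustrative):

    person = {"name": "Alice", "age": 30, "city": "Delhi"}
    print(person["name"])          # Alice; a missing key would raise KeyError
    print(person.get("email"))     # None instead of an error
    person["age"] = 31             # update a value
    del person["city"]             # remove by key
    age = person.pop("age", None)  # remove and return, with a default
    last = person.popitem()        # remove and return the last-inserted pair
    print(age, last, person)       # 31 ('name', 'Alice') {}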
Set
• A set in Python is an unordered collection of unique elements. Unlike a list or tuple, a set does not
allow duplicate values. Sets are commonly used for tasks that involve storing and managing distinct
items.
Adding Elements:
• You can add elements to a set using the add() method:
Removing Elements:
• Use the remove() or discard() method to remove elements. The difference is that remove()
raises a KeyError if the element is not found, while discard() does not:
• Set Operations:
• Sets support various set operations, such as union, intersection, and difference:
In Python sets, there are several set operations that allow you to manipulate and com-
bine sets.
Intersection (&):
• The intersection of two sets contains elements that are common to both sets.
• It is represented using the & operator.
• For example, if you have two sets A and B, A & B will give you a new set containing the elements that are present in both A and B.
Union (|):
• The union of two sets contains all unique elements from both sets.
• It is represented using the | operator.
• For example, if you have two sets A and B, A | B will give you a new set containing all unique
elements from both sets.
Difference (-):
• The difference between two sets contains elements that are in the first set but not in the
second set.
• It is represented using the - operator.
• For example, if you have two sets A and B, A - B will give you a new set containing elements
that are in A but not in B.
Discard:
• The discard method is used to remove an element from a set, if it exists in the set. If the
element is not in the set, it doesn’t raise an error.
• This method is handy when you want to remove an element from a set, but you’re not sure
if it exists in the set.
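A combined sketch of these operations (values are illustrative):

    A = {1, 2, 3, 4}
    B = {3, 4, 5, 6}
    A.add(7)       # add an element
    A.discard(10)  # no error even though 10 is absent
    A.remove(7)    # would raise KeyError if 7 were absent
    print(A & B)   # {3, 4} intersection
    print(A | B)   # {1, 2, 3, 4, 5, 6} union
    print(A - B)   # {1, 2} difference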
User-Defined Functions (UDFs)
Function Body:
• The function body contains a block of code that defines the functionality of the function. It
may include statements, expressions, loops, conditionals, and other Python constructs.
Parameters (Arguments):
• Parameters are values passed to the function when it is called. You can specify parameters
inside the parentheses. Parameters are local variables to the function and can be used within
the function body.
Return Statement:
• A UDF can return a value using the return statement. This value can be of any data type. If the
return statement is omitted, the function returns None.
Use of UDFs:
• Code Reusability: UDFs allow you to reuse code for repetitive tasks.
• Modularity: They promote code organization by breaking down complex tasks into smaller,
manageable functions.
• Abstraction: UDFs provide a higher level of abstraction, making code more readable and
maintainable.
Limitations:
• Functions can be relatively slow, especially when used in tight loops.
• Nesting too many functions can lead to performance issues and reduced readability.
• Overuse of global variables within functions can make code harder to understand and maintain.
• Important Notes:
• UDFs should have a clear purpose and be named descriptively to improve code readability.
• Document your functions using docstrings to explain their purpose and usage.
• Functions should perform a single task or serve a single responsibility (following the
Single Responsibility Principle).
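A minimal UDF with a docstring and a return value (the name and body are illustrative):

    def greet(name):
        """Return a greeting for the given name."""
        return f"Hello, {name}!"

    print(greet("Alice"))  # Hello, Alice!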
• In Python, you can use the *args and **kwargs constructs in function definitions to work with variable-length argument lists. These are often referred to as “arbitrary argument lists” and can be very useful when you want to create functions that can accept a varying number of arguments.
• When you call this function, any extra arguments you pass will be collected into the args
tuple:
• When you call this function, any extra keyword arguments you pass will be collected into the
kwargs dictionary:
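A sketch of both constructs (the function and argument names are illustrative):

    def show_args(*args, **kwargs):
        print(args)    # extra positional arguments, collected into a tuple
        print(kwargs)  # extra keyword arguments, collected into a dictionary

    show_args(1, 2, 3, city="Delhi", year=2024)
    # (1, 2, 3)
    # {'city': 'Delhi', 'year': 2024}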
Usage
• The input() function reads a line of text from the user and returns it as a
string.
• Reading as String: input() always returns a string. If you want to use the input as a number,
you need to convert it explicitly.
• User Input: The function waits for the user to type something and press “Enter.” Once the
user enters a value and presses “Enter,” the input is read.
Important Note:
• Be cautious when using input(), especially if you plan to use the input for critical
or sensitive operations. Always validate and sanitize user input, especially if it’s
used for security-related tasks.
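For example:

    raw = input("Enter your age: ")  # always returns a string
    age = int(raw)                   # convert explicitly before arithmetic
    print(age + 1)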
2. Single Expression:
• Lambda functions are limited to a single expression, and the result of the expression
is returned.
• Lambda functions and the map() function are powerful tools that allow you to streamline your code by applying functions to iterable data.
• Python’s map() function is a built-in function that’s widely used for applying a specified function to each item in an iterable (such as a list) and returning a map object containing the results.
• map() is a fantastic choice for transforming data without needing a manual loop.
How map() Works:
• The map() function iterates through the items in the input iterable.
• For each item, it applies the specified function.
• The results are collected and returned as a map object, which can be converted to a list
or another iterable.
Rules and Limitations:
• The applied function should accept the same number of arguments as there are input
iterables.
• map() returns a map object, which can be converted to a list or another iterable for practical use.
• While efficient for straightforward transformations, map() may not replace loops for
more complex operations.
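A typical combination of map() and a lambda:

    nums = [1, 2, 3, 4]
    squared = list(map(lambda x: x**2, nums))  # apply the lambda to each item
    print(squared)  # [1, 4, 9, 16]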
String Operations
1. Creating Strings:
• Single Quotes: You can create a string using single quotes like this: 'Hello, World!'.
• Double Quotes: Strings can also be defined using double quotes: "Hello, Python!".
• Triple Quotes: Triple quotes (''' or """) are used for multi-line strings:
2. String Concatenation:
• Strings can be combined using the + operator:
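For example:

    first = "Hello, "
    second = "World!"
    print(first + second)  # Hello, World!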
4. String Methods:
• Python provides a multitude of string methods for various operations. Some commonly
used methods include:
• str.upper(): Converts a string to uppercase.
• str.lower(): Converts a string to lowercase.
• str.strip(): Removes leading and trailing whitespace.
• str.replace(): Replaces occurrences of a substring with another.
• str.split(): Splits a string into a list using a specified delimiter.
5. Escape Characters:
• You can use escape characters to include special characters within strings. For example:
• \n: Represents a newline.
• \t: Represents a tab.
• \\: Represents a backslash.
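A few of these methods and escape characters in action:

    s = "  Hello, Python!  "
    print(s.upper())                     # "  HELLO, PYTHON!  "
    print(s.strip())                     # "Hello, Python!"
    print(s.replace("Python", "World"))  # "  Hello, World!  "
    print("a,b,c".split(","))            # ['a', 'b', 'c']
    print("Line1\nLine2\tTabbed\\")      # newline, tab, and a literal backslash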
NumPy Basics
Using an Alias:
• Python allows you to use an alias for the package, which can make your code cleaner.
Common aliases are used to save time when typing. For NumPy, you can use the alias
np:
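For example:

    import numpy as np  # the conventional alias
    arr = np.array([1, 2, 3])
    print(arr)  # [1 2 3]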
NumPy ndarray:
• NumPy is a fundamental package for scientific computing in Python. Its ndarray (short for n-dimensional array) is a versatile data structure that underpins most numerical operations in Python.
n-dimensional:
• NumPy ndarrays can have any number of dimensions. They are often used for 1D (vectors), 2D
(matrices), and higher-dimensional arrays.
• You can create arrays with different shapes and dimensions, making it flexible for various applications.
Homogeneous:
• All elements within a NumPy ndarray must have the same data type (e.g., integers, floats).
• This homogeneity ensures efficient memory usage and optimized performance.
Allow Broadcasting:
• Broadcasting is a powerful feature of NumPy arrays.
• It allows for element-wise operations between arrays of different shapes and dimensions.
• NumPy automatically adjusts the smaller array to match the shape of the larger array during operations.
Allow Vectorization:
• Vectorization is the process of applying operations to entire arrays without using explicit loops.
• NumPy supports vectorized operations, making code concise and computationally efficient.
• For example, you can add two NumPy arrays together, element by element, without the need for
explicit loops.
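A short sketch of broadcasting and vectorized arithmetic:

    import numpy as np
    a = np.array([1, 2, 3])
    b = np.array([10, 20, 30])
    print(a + b)  # [11 22 33], element-wise with no explicit loop
    print(a * 2)  # [2 4 6], the scalar is broadcast across the array
    m = np.ones((2, 3))
    print(m + a)  # a is broadcast across each row of the 2x3 array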
Type Conversion:
np.array():
• NumPy’s np.array() function is used to create an array from a list, tuple, or any iterable
object.
• It automatically infers the data type, making it a versatile way to create NumPy arrays.
In-Built Methods:
np.zeros():
• np.zeros(shape) creates an array filled with zeros.
• The shape is defined as a tuple, specifying the number of elements along each dimension.
np.ones():
• np.ones(shape) creates an array filled with ones.
• Like np.zeros(), you specify the shape as a tuple.
np.full():
• np.full(shape, fill_value) creates an array filled with a specific value (fill_value).
• The shape is defined as a tuple.
np.arange():
• np.arange(start, stop, step) creates an array with evenly spaced values.
• It is similar to Python’s built-in range(), but it generates a NumPy array.
np.linspace():
• np.linspace(start, stop, num) generates an array of evenly spaced values over a specified
range.
• You define the number of values (num) you want, not the step size.
np.random.random():
• np.random.random(size) generates random numbers in the range [0.0, 1.0).
• You specify the shape of the output array using the size parameter.
np.random.randint():
• np.random.randint(low, high, size) generates random integers between low (inclusive)
and high (exclusive).
• The size parameter specifies the shape of the output.
Example:
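A minimal sketch using the creation functions above:

import numpy as np
print(np.zeros((2, 3)))             # 2x3 array of zeros
print(np.ones(4))                   # 1D array of ones
print(np.full((2, 2), 7))           # 2x2 array filled with 7
print(np.arange(0, 10, 2))          # [0 2 4 6 8]
print(np.linspace(0, 1, 5))         # 5 evenly spaced values from 0 to 1
print(np.random.random((2, 2)))     # random floats in [0.0, 1.0)
print(np.random.randint(1, 7, 10))  # 10 random dice rolls (1-6)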
Type Conversion:
np.array():
• Use it when you need to convert a Python list, tuple, or other iterable into a NumPy array.
• This is especially handy when working with numerical data, as NumPy arrays provide
efficient operations and calculations.
In-Built Methods:
np.zeros(shape):
• When you want to create an array filled with zeros for initialization.
• Useful when setting up arrays for later data insertion or calculation.
np.ones(shape):
• Great for initializing an array with ones.
• Often used in situations like creating a mask or a starting point for addition.
np.full(shape, fill_value):
• When you need to initialize an array with a specific constant value.
• Useful for tasks such as setting up a grid with predefined values.
np.random.random(size):
• When you need random floating-point values between 0 and 1.
• Useful for generating random data or introducing variability into simulations.
Reshape (array.reshape(new_shape)):
• When you need to change the shape of an array while keeping the total number of elements con-
stant.
• Useful for preparing data for specific algorithms or reshaping images for deep learning models.
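For instance:

import numpy as np
arr = np.arange(12)          # 12 elements: 0..11
matrix = arr.reshape(3, 4)   # reshaped to 3 rows x 4 columns
print(matrix.shape)          # (3, 4); the total element count is unchanged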
• In NumPy, np.full is a function used to create an array with a specified shape (dimensions) and fill it
with a constant value. It is a convenient way to initialize arrays with a predefined value.
• np.arange is a function provided by the NumPy library. It is used to create an array with regularly
spaced values within a specified range.
• The limitation of the built-in range function in Python is that it only supports generating sequences
of integer values. This means you can create sequences like range(1, 10) to get integers from 1 to 9,
but you can’t create sequences with floating-point numbers or specify a non-integer step size. For
tasks that involve non-integer values or customized step sizes, range falls short.
• This is where NumPy’s np.arange function comes to the rescue. np.arange provides a more versatile
solution for generating sequences of numbers. You can specify the start, stop, and step values as pa-
rameters, allowing you to create sequences of integers or floating-point numbers with a non-integer
step.
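A sketch contrasting range and np.arange:

import numpy as np
# range() only supports integers:
print(list(range(1, 10)))         # [1, 2, ..., 9]
# np.arange() also supports floats and non-integer steps:
print(np.arange(0.0, 1.0, 0.25))  # [0.   0.25 0.5  0.75]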
• np.linspace is a function in NumPy, a popular Python library for numerical and scientific comput-
ing.
Use Cases:
• Creating evenly spaced values: np.linspace is commonly used when you need to create a se-
quence of numbers that are evenly spaced, such as dividing a range into equal intervals.
• Generating time intervals: It’s useful in time series data or simulations where you want to
create evenly spaced time intervals.
• Visualization: It’s frequently used in data visualization to generate data points for plotting
graphs.
Limitations:
• Limited to linear sequences: np.linspace generates linear sequences where the difference between consecutive values is constant. It cannot be used to create sequences with non-linear spacing.
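For example:

import numpy as np
# Divide the range [0, 10] into 5 evenly spaced points (endpoints included)
points = np.linspace(0, 10, 5)
print(points)  # [ 0.   2.5  5.   7.5 10. ]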
Purpose:
• np.random.random is used to generate random float values from a uniform distribution
in the half-open interval [0.0, 1.0). In other words, it generates random floats between 0
(inclusive) and 1 (exclusive).
Use Cases:
• Simulation: It’s often used in simulations and modeling where you need random input
data.
• Random Sampling: When you need to randomly sample data, such as for bootstrapping in
statistics or creating randomized datasets.
Why Use np.random.random:
• Uniform distribution: It provides random numbers uniformly distributed between 0 and
1.
• Reproducibility: You can set the seed for the random number generator to make your
experiments reproducible.
Limitation:
• Limited range: np.random.random only generates random floats in the range [0.0, 1.0),
which is not suitable for all applications.
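A minimal sketch, including seeding for reproducibility:

import numpy as np
np.random.seed(42)               # fix the seed so results are reproducible
print(np.random.random(3))       # 3 random floats in [0.0, 1.0)
print(np.random.random((2, 2)))  # 2x2 array of random floats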
np.random.randint:
Purpose:
• np.random.randint generates random integers from a discrete uniform distribution.
Use Cases:
• Simulations: It’s widely used in simulations and games where you need random events.
• Random Sampling: When you need random samples for experimentation or generating
test data.
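For example:

import numpy as np
# Simulate 10 dice rolls: integers from 1 (inclusive) to 7 (exclusive)
rolls = np.random.randint(1, 7, size=10)
print(rolls)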
In Python, you can access data from an array using various techniques.
Indexing:
• Accessing a specific element at a given index.
Slicing:
• Extracting a portion of the array.
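A short sketch of both techniques:

import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr[0])    # indexing: first element -> 10
print(arr[-1])   # negative indexing: last element -> 50
print(arr[1:4])  # slicing: elements at positions 1, 2, 3 -> [20 30 40]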
How to Update Arrays:
• Updating NumPy arrays in a vectorized manner is a powerful and efficient way to
modify array elements or perform operations on them.
1. Scalar Operations:
• You can update all elements of an array by applying a scalar operation, for example
multiplying every element by a constant.
2. Element-Wise Operations:
• You can perform element-wise operations between arrays of the same
shape. For example, adding two arrays element-wise:
3. Conditional Updates:
• You can use boolean indexing to update specific elements in an array based on a
condition:
4. Broadcasting:
• NumPy allows operations between arrays of different shapes, as long as they are compat-
ible. Broadcasting can simplify vectorized updates. For example, adding a 1D array to a
2D array:
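A sketch covering all four update patterns above:

import numpy as np
arr = np.array([1, 2, 3, 4])
arr = arr * 2                       # scalar operation: [2 4 6 8]
arr = arr + np.array([1, 1, 1, 1])  # element-wise addition: [3 5 7 9]
arr[arr > 5] = 0                    # conditional update via boolean indexing: [3 5 0 0]
grid = np.zeros((2, 4))
grid = grid + arr                   # broadcasting a 1D array across a 2D array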
You can combine data using arrays and NumPy’s hstack and vstack functions to hori-
zontally and vertically stack arrays, respectively.
1. hstack (Horizontal Stack):
• hstack is used to horizontally stack (concatenate along columns) multiple arrays. This is of-
ten used when you want to combine data with the same number of rows but different attri-
butes or features.
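For example:

import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(np.hstack((a, b)))  # horizontal stack: shape (2, 4)
print(np.vstack((a, b)))  # vertical stack: shape (4, 2)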
Sequence of operation
• Operators are evaluated following a specific order known as the “order of operations.” This order
ensures that mathematical expressions are evaluated correctly. The order of operations can be re-
membered using the acronym PEMDAS (Parentheses, Exponents, Multiplication and Division, Ad-
dition and Subtraction), or BODMAS (Brackets, Orders, Division and Multiplication, Addition and
Subtraction).
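For example:

print(2 + 3 * 4)    # 14: multiplication before addition
print((2 + 3) * 4)  # 20: parentheses evaluated first
print(2 ** 3 * 2)   # 16: exponent before multiplication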
Questions - Assignment
Properties
With the Pandas library, you can work with two primary data structures: Series and Data-
Frame.
Series:
• A Series is a one-dimensional labeled array capable of holding data of any type.
• It’s similar to a column in a spreadsheet or a single-variable dataset.
• Series has both data and labels (index), making it easy to perform data manipulations.
• You can create a Series from a Python list, array, or dictionary.
• Series is homogeneous, meaning it can contain data of the same data type.
• It’s also size-immutable; you cannot change the size of a Series once it’s created.
• You can access elements using labels or integer-based indices.
• Common operations on Series include data selection, filtering, and aggregation.
• You can perform element-wise operations between Series, and they will align based on the
index.
• Series is the building block of a DataFrame, where each column is essentially a Series.
DataFrame:
• A DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure.
• It’s similar to a spreadsheet or a SQL table.
• DataFrames are commonly used for data manipulation, cleaning, exploration, and analysis.
• Each column in a DataFrame is a Series.
• DataFrames have both rows and columns, with each row representing a record and each
column a variable.
• You can create a DataFrame from various data sources, including dictionaries, lists, Series,
and external files (e.g., CSV or Excel).
• DataFrames are versatile, supporting various data types within the same structure.
• They allow for easy indexing, selection, and filtering of data.
• DataFrames can be transposed (rows become columns, and vice versa).
• You can merge, join, or concatenate DataFrames to combine data from different sources.
• DataFrames provide powerful functionality for data analysis and manipulation, such as
groupby operations, pivot tables, and time series analysis.
Properties of Series:
• Homogeneous: Series contains data of a single data type.
• Indexed: Each element in a Series has an associated label (index).
• Size-immutable: You cannot change the size of a Series once created.
• Element-wise operations: You can apply operations to each element in a Series.
Properties of DataFrames:
• Heterogeneous: DataFrames can hold data of different data types.
• Tabular structure: DataFrames have a two-dimensional structure.
• Indexed: Both rows and columns have labels (row and column indices).
• Size-mutable: You can add or remove rows and columns.
• Versatile: DataFrames support various data manipulation and analysis operations.
Creating a Series:
• You can create a Series in Python using the Pandas library.
• Common ways to create a Series:
• From a Python list: my_series = pd.Series([1, 2, 3, 4])
• From a NumPy array: my_series = pd.Series(np.array([1, 2, 3, 4]))
• From a dictionary: my_series = pd.Series({'A': 1, 'B': 2, 'C': 3})
• Series is homogeneous; it holds data of the same data type.
Accessing Data:
• You can access data in a Series using indexing.
• To update elements in a Pandas Series, you can assign new values to specific indices or labels.
• Access by label: my_series['A'] (if the index uses labels)
• Access by position: my_series[0] (if the index is positional)
• You can also use slicing to access a range of elements: my_series[1:3]
• Note: With the default index, element access starts from 0.
Updating Data:
• You can update the data in a Series using indexing.
• Change a value by label: my_series['A'] = 10
• Change a value by position: my_series[0] = 10
Python’s Pandas library, iloc and loc are used for accessing elements in DataFrames
and Series.
1. iloc (Integer Location):
• iloc is primarily used for integer-based indexing. You can access elements by their integer
position within the DataFrame or Series.
• The indexing starts at 0, similar to standard Python indexing.
• The syntax for iloc is data.iloc[row, column].
Properties of iloc:
• It uses integer-based index positions.
• You can use integer slices and lists for selection.
• It doesn’t include the endpoint of slices (similar to Python slicing).
• It allows you to select specific rows and columns by their numeric positions.
• Note: iloc works only with integer positions (such as the default index); use loc for label-based (text) indexing.
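A minimal sketch contrasting the two (using a small illustrative DataFrame):

import pandas as pd
df = pd.DataFrame({'age': [25, 30, 35]}, index=['Amy', 'Bob', 'Cara'])
print(df.iloc[0])       # first row, by integer position
print(df.loc['Bob'])    # row labeled 'Bob', by label
print(df.iloc[0:2, 0])  # rows 0-1, first column (endpoint excluded)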
Indexing:
• Series has two main components: data and an index.
• The index is like a label for each data point.
• You can customize the index while creating the Series.
In-Depth Methods:
• Series offers methods for various data operations:
• .head(): View the first few elements.
• .tail(): View the last few elements.
• .describe(): Get summary statistics.
• .max(): Find the maximum value.
• .min(): Find the minimum value.
• .mean(): Calculate the mean.
• .sum(): Calculate the sum.
• .value_counts(): Count unique values.
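A short sketch of these methods in action:

import pandas as pd
s = pd.Series([3, 1, 4, 1, 5, 9, 2, 6])
print(s.head(3))         # first three elements
print(s.describe())      # count, mean, std, quartiles, etc.
print(s.max(), s.min())  # 9 1
print(s.value_counts())  # frequency of each unique value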
Properties:
• Homogeneous: Series holds data of a single data type.
• Indexed: Each element in a Series has an associated label (index).
• Size-immutable: You cannot change the size of a Series once created.
• Element-wise operations: You can apply operations to each element in a Series.
Limitations:
• Size: Limited to available memory.
• Homogeneous: Data type must be the same across all elements.
• Immutability: You cannot change the size or data type of a Series after creation.
Data Exploration:
• Check the dimensions of the dataset using shape to see how many rows and columns
are present.
• Use info() to get information about the data types and missing values.
• Describe the data statistically with describe() to get summary statistics.
• Check for missing values using isna() or isnull() combined with sum().
Data Cleaning:
• Handle missing values by either removing rows or imputing values.
• Remove duplicates if they exist.
• Correct data types if necessary, like converting strings to numbers.
• Rename columns for clarity.
• Handle outliers if needed.
Data Analysis:
• Perform various analyses depending on the dataset and your objectives. This can in-
clude:
• Aggregations (grouping, summing, averaging)
• Filtering data based on criteria
• Applying statistical tests or machine learning models
Data Visualization:
• Create visualizations to gain insights. Use libraries like Matplotlib or Seaborn for this.
• Common plots include histograms, bar charts, scatter plots, and heatmaps.
Exporting Data:
• If you’ve made changes to the dataset, save it to a new file.
• Import data from a CSV (Comma-Separated Values) file using the Pandas library.
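A minimal end-to-end sketch (the file name data.csv is a placeholder):

import pandas as pd
df = pd.read_csv('data.csv')    # import from a CSV file (hypothetical path)
print(df.shape)                 # (rows, columns)
df.info()                       # data types and non-null counts
print(df.describe())            # summary statistics
print(df.isna().sum())          # missing values per column
df.to_csv('cleaned_data.csv', index=False)  # export the result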
Understand the process of EDA: understanding the data is the first step, and the practical implementation begins with importing it.
• Step 1: Data import
type():
• The type() function helps you determine the type of your data structure. For example, it can
tell you if you’re dealing with a Pandas DataFrame or Series.
dtype:
• The dtype attribute is used to check the data type of the elements in a Pandas Series or Data-
Frame. It’s essential to ensure your data is interpreted correctly.
• Limitations: It won’t show you the data type of multiple columns at once.
• Use Case: Vital for confirming data types, especially for numerical operations.
shape:
• The shape attribute returns a tuple representing the dimensions of your DataFrame. It’s use-
ful for understanding the size of your dataset in terms of rows and columns.
count():
• The count() method provides the count of non-null elements in each column. It’s crucial for
identifying missing data points.
columns:
• The columns attribute lists the column names of your DataFrame. It’s handy for selecting
specific columns or understanding the dataset’s structure.
info():
• The info() method offers a concise summary of your DataFrame, including column data
types and non-null counts. It’s a great initial overview of your dataset.
describe():
• The describe() method generates summary statistics of your data, including count, mean,
standard deviation, minimum, and maximum values. It provides valuable insights into the
distribution of your numerical data.
• count: This is the count of non-missing (non-null) values for each column. It tells you how
many data points are available for each numerical column.
• mean: The mean (average) of the data in each column. It gives you an idea of the central val-
ue around which the data is distributed.
• std: The standard deviation, which measures the spread or dispersion of the data. It tells you
how much individual data points typically deviate from the mean.
• min: The minimum value in each column, which is the smallest observed value.
• 25%: The 25th percentile (1st quartile) value, indicating that 25% of the data points are less
than or equal to this value. It’s a measure of data distribution.
• 50%: The 50th percentile (2nd quartile or median) value. It represents the middle value of
the data when arranged in ascending order.
• 75%: The 75th percentile (3rd quartile) value. It’s another measure of data distribution.
• max: The maximum value in each column, which is the largest observed value.
Step 3: Data methods and attributes
Methods:
• Methods are functions that you can call on an object (like a DataFrame or a Series).
• They are typically followed by parentheses, e.g., head(), describe(), info().
• Methods perform some action on the object, and they may accept arguments within the parenthe-
ses to modify their behavior.
• For example, head() is a method that returns the first few rows of a DataFrame. You can specify the
number of rows you want to see by providing an argument like head(10) to see the first 10 rows.
Attributes:
• Attributes are values or properties associated with an object. They provide information about the
object but don’t perform actions.
• Attributes are accessed without parentheses, e.g., shape, dtypes, columns.
• Attributes provide information or characteristics of the object they are attached to. For instance,
shape is an attribute that tells you the dimensions (rows and columns) of a DataFrame.
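For example:

import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(df.head(1))  # method: called with parentheses, performs an action
print(df.shape)    # attribute: no parentheses, returns information -> (2, 2)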
What is subsetting
• Data cleaning in Python often involves subsetting, which means selecting specific columns from a
DataFrame or rows based on certain conditions. There are several ways to achieve this in Pandas,
a popular Python library for data manipulation. Let’s explore the different methods for subsetting:
4 different ways
Using .loc[]:
• The .loc[] method allows you to select rows and columns by label (column names and row
indices).
• You can specify both row and column labels using .loc[].
Using .iloc[]:
• The .iloc[] method is used for integer-based indexing. It allows you to select rows and col-
umns by integer positions.
• This method is useful when you want to select data by its numerical position.
Selecting by Column Name:
• You can also select columns by directly referencing their names.
• Note: Remember to replace 'Column_Name', 'Column1', 'Column2', 0, 1, 2, and 1:3 in the
examples with the actual column names or integer positions you want to select. Sub-
setting allows you to focus on specific parts of your data for analysis, visualization, or
further processing, which is a crucial step in data cleaning and preparation.
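Sketches of the three approaches (column names are placeholders):

import pandas as pd
df = pd.DataFrame({'Column1': [1, 2, 3], 'Column2': ['x', 'y', 'z']})
print(df.loc[0:1, ['Column1']])  # .loc[]: by row/column labels (endpoint included)
print(df.iloc[0:2, 0])           # .iloc[]: by integer positions (endpoint excluded)
print(df['Column2'])             # direct selection by column name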
• In Python, particularly when working with Pandas DataFrames, the rename function is used
to change the labels of the rows or columns. It’s a powerful tool for data preprocessing and
cleaning.
• mapper: This parameter allows you to specify the mapping of the old labels to the new la-
bels. It can be a dictionary, a function, or None (default).
• index and columns: These parameters let you specify whether you want to rename the row
(index) labels or column labels. You should choose one or the other.
• axis: This parameter is an alternative to using index and columns. It can be set to 0 for index
labels or 1 for column labels.
• copy: By default, a new DataFrame with the updated labels is returned. If you set copy to
False, the original DataFrame is modified in place.
• inplace: If set to True, the original DataFrame is modified in place (overwrites the existing
one). If False (default), a new DataFrame is returned.
• level: When working with MultiIndex DataFrames, this parameter allows you to specify the
level you want to rename.
Rules:
• When using rename, you can either specify a dictionary to map old labels to new labels, or
you can provide a function that transforms the labels.
• If you want to rename both rows and columns, you can use mapper to specify the changes
for both. If you want to rename only rows or columns, you can use the index and columns
parameters.
• The function doesn’t directly affect the original DataFrame unless you set inplace to True or
assign the result back to the original DataFrame.
• The level parameter is used when working with MultiIndex DataFrames, allowing you to re-
name specific levels within the index.
Limitations:
• The rename function is generally limited to changing labels; it doesn’t perform complex data
transformations.
• It might not be the most efficient choice for very large DataFrames due to the need to create
a copy.
• If labels are not unique, renaming could lead to unexpected results.
When to Use:
• Use rename when you need to make your DataFrame’s row or column labels more meaning-
ful.
• It’s helpful for data preprocessing when the original labels are not descriptive or need to be
standardized.
• You might also use it when working with MultiIndex DataFrames to change level names.
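A minimal sketch (old/new names are placeholders):

import pandas as pd
df = pd.DataFrame({'old_name': [1, 2]})
df = df.rename(columns={'old_name': 'new_name'})  # map old label -> new label
df = df.rename(index={0: 'first'})                # rename a row label
print(df)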
• Particularly when working with Pandas DataFrames, the drop operation is used to
eliminate one or more columns from a DataFrame. Here’s a comprehensive expla-
nation of the drop operation:
• labels: This parameter specifies what to drop. It can be a single label or a list of labels, indi-
cating the columns to remove.
• axis: By default, it’s set to 0, which means dropping rows. If you want to drop columns, set it
to 1.
• index and columns: These parameters are alternatives to specifying the axis. Use index for
rows and columns for columns.
• level: When working with MultiIndex DataFrames, you can specify the level at which to drop
labels.
• inplace: If set to True, the original DataFrame is modified in place, and nothing is returned. If
False (default), a new DataFrame with the specified columns removed is returned.
• errors: This parameter defines how to handle labels that are not found. The default 'raise'
will raise an error; 'ignore' will suppress the error.
Explanation:
Dropping Columns:
• When you want to remove one or more columns from a DataFrame, set the axis or use the
columns parameter.
• Specify the names of the columns you want to drop using the labels parameter.
• If you set inplace to True, the original DataFrame is modified; otherwise, a new Data-
Frame with the specified columns removed is returned.
Dropping Rows:
• To eliminate rows, use the axis parameter or the index parameter. By default, axis is set
to 0 for rows.
• Specify the row labels you want to drop using the labels parameter.
• Similar to column dropping, setting inplace to True modifies the original DataFrame.
Rules:
• The drop method allows you to remove specific rows or columns by label, providing you
with flexibility in data cleaning and preprocessing.
• You can use the labels parameter to specify the labels you want to drop, and it can be a
single label or a list of labels.
• Be cautious when using inplace=True as it modifies the original DataFrame, which could
be irreversible.
Use Cases:
• Data Cleaning: Remove irrelevant or redundant columns to simplify data analysis.
• Data Preprocessing: Drop rows with missing or incorrect data.
• Selective Data Extraction: Create a new DataFrame by excluding specific columns.
• Preparing Data for Modeling: Eliminate target columns when preparing data for machine
learning tasks.
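For example (column/row labels are placeholders):

import pandas as pd
df = pd.DataFrame({'keep': [1, 2, 3], 'unwanted': [4, 5, 6]})
df = df.drop(columns=['unwanted'])  # drop a column
df = df.drop(index=[0])             # drop a row by its label
# equivalent forms: df.drop('unwanted', axis=1), df.drop(0, axis=0)
print(df)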
Filtering Rows:
• To filter rows based on specific conditions, you can use boolean indexing. Here’s a break-
down:
Boolean Indexing:
• Boolean indexing is the process of selecting rows from a DataFrame based on a condition.
• You create a boolean mask, a series of True/False values, where each value corresponds
to whether the condition is met for that row.
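For example:

import pandas as pd
df = pd.DataFrame({'age': [22, 35, 58], 'name': ['Amy', 'Bob', 'Cara']})
mask = df['age'] > 30  # boolean mask: [False, True, True]
print(df[mask])        # only the rows where the condition is True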
• In Python, particularly when working with Pandas, the in operator is used to check
for the presence of a value within a Series or DataFrame.
Use Cases:
• Filtering: You can use the in operator to filter rows based on the presence of specific values
in a column.
• Membership Checks: It’s helpful for checking if a value exists in a dataset before performing
operations on it.
Rules:
• The in operator is case-sensitive. Ensure the value’s case matches the case in your data.
• Use the any() or all() function along with the in operator for more complex conditions when
working with DataFrames.
Limitations:
• The in operator is primarily used for exact matches. For more advanced searches, you might
need other methods like regular expressions.
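A sketch of membership checks (note that on a Series, in tests the index rather than the values; for value-based filtering, Pandas' .isin() method is the usual tool):

import pandas as pd
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print('a' in s)          # True: 'in' checks the index labels
print(s.isin([20, 30]))  # element-wise value membership
df = pd.DataFrame({'city': ['Delhi', 'Mumbai', 'Pune']})
print(df[df['city'].isin(['Delhi', 'Pune'])])  # filter rows by a value list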
Sorting Data:
• s.sort_values(ascending=True) sorts the Series in ascending order, which is the default behavior.
• To sort in descending order, you can use s.sort_values(ascending=False).
Use Cases:
• Data Exploration: Sorting helps you explore data effectively by arranging it based on relevant
columns.
• Data Presentation: For presenting data in a readable manner, especially in tables and reports.
• Data Filtering: You can use sorting as a preliminary step to filter data based on specific con-
ditions.
Methods:
• .sort_values(): The primary method for sorting in Pandas, available for both Series and Data-
Frames.
• .sort_index(): Sorts by the index (row labels) instead of values.
Parameters:
• by: Specifies the column(s) by which to sort.
• ascending: Determines the sorting order (default is ascending).
• inplace: Modifies the original data if set to True.
• axis: For DataFrames, you can choose to sort rows (axis=0) or columns (axis=1).
Limitations:
• Sorting can be resource-intensive for large datasets. It’s essential to consider performance
when sorting extensive data.
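For example:

import pandas as pd
df = pd.DataFrame({'name': ['Cara', 'Amy', 'Bob'], 'score': [70, 90, 80]})
print(df.sort_values(by='score', ascending=False))  # highest score first
print(df.sort_index())                              # sort by row labels instead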
Handling Duplicates:
• Dealing with duplicate values is common when working with real-world data.
Detecting Duplicates:
• df.duplicated(subset=None, keep='first'): Returns a boolean Series indicating duplicate rows.
• subset specifies the columns to consider for duplicates.
• keep determines which duplicates to mark ('first', 'last', or False to mark all occurrences).
Removing Duplicates:
• df.drop_duplicates(subset=None, keep='first', inplace=False): Removes duplicate rows.
• subset specifies the columns to consider for duplicates.
• keep determines which duplicates to keep.
• inplace=True modifies the DataFrame in place.
Counting Duplicates:
• df.duplicated().sum(): Counts the total number of duplicates.
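A minimal sketch:

import pandas as pd
df = pd.DataFrame({'id': [1, 2, 2, 3], 'val': ['a', 'b', 'b', 'c']})
print(df.duplicated())                 # boolean Series marking duplicate rows
print(df.duplicated().sum())           # total number of duplicates -> 1
df = df.drop_duplicates(keep='first')  # keep the first occurrence of each row
print(df)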
Use Cases:
• Sorting is useful for organizing data for analysis and presentation.
• Handling duplicates ensures data accuracy and consistency.
Limitations:
• Sorting can be resource-intensive for large datasets.
• Removing duplicates can cause data loss, so consider the implications.
Best Practices:
• When handling duplicates, understand the data and business context to decide which dupli-
cates to keep or remove.
• Before sorting, make sure you know the data’s characteristics and choose the appropriate
columns for sorting.
Question
Outliers
• Detecting outliers using various methods like IQR (Interquartile Range), percentiles, and standard
deviation (z-scores) is a common data analysis task.
1. Outlier:
• An outlier is a data point that significantly deviates from the rest of the data in a dataset. It
can be either an unusually small or unusually large value.
2. IQR (Interquartile Range) Method:
• The IQR is a measure of statistical dispersion calculated as the difference between the third
quartile (Q3) and the first quartile (Q1) of the data.
• IQR = Q3 - Q1.
3. Percentile Method:
• Percentiles divide the data into equal parts. The median, for example, is the 50th percentile.
• You can use a specific percentile value (e.g., 90th percentile) to set a threshold for outliers.
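A sketch of the IQR method (the thresholds follow the 1.5 * IQR rule discussed below):

import pandas as pd
s = pd.Series([10, 12, 13, 12, 11, 14, 13, 95])  # 95 is an obvious outlier
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(s[(s < lower) | (s > upper)])  # flags 95 as an outlier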
Additional Note:
• Clipping is just one of many methods to handle outliers. Other methods include:
• Transformation (e.g., log transformation, square root transformation) to reduce the impact
of extreme values.
• Winsorizing, which sets the extreme values to a specified percentile value.
• Outlier removal, where extreme values are removed from the dataset.
• Robust statistical methods, which are less affected by outliers (e.g., the median instead of the
mean).
• The choice of method depends on the specific characteristics of your data and the goals of
your analysis.
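For instance, clipping to the 5th and 95th percentiles (a minimal sketch on the same kind of Series):

import pandas as pd
s = pd.Series([10, 12, 13, 12, 11, 14, 13, 95])
clipped = s.clip(lower=s.quantile(0.05), upper=s.quantile(0.95))  # cap extremes
print(clipped)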
• Box plots, also known as box-and-whisker plots, are a useful visualization for identifying and
understanding outliers in a dataset. They provide a visual summary of the distribution of a
dataset and help you identify extreme values.
3. Identifying Outliers:
• Outliers are data points that fall below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR. These are data
points that significantly deviate from the central 50% of the data.
• In the box plot, outliers are shown as individual points outside the whiskers. They can be above
(upper outliers) or below (lower outliers) the whiskers.
4. Reading a Box Plot:
• If the data is symmetrically distributed:
• The box is roughly centered within the whiskers.
• The median line is in the middle of the box.
• Outliers may be present, but they are evenly distributed on both sides of the whiskers.
If the data is skewed:
• The box may be shifted to one side of the whiskers.
• The median may be closer to the thicker part of the box.
• Outliers may be clustered on one side of the whiskers.
• The length of the box (IQR) and the spread of the whiskers indicate the data’s spread and
variability.
5. How to Define Outliers Using a Box Plot:
• As mentioned earlier, outliers are defined as data points below Q1 - 1.5 * IQR or above Q3 + 1.5
* IQR.
• It’s important to choose a specific threshold (e.g., 1.5 or 3) based on the context of your analysis.
6. Interpreting the Box Plot:
• A box plot helps you quickly visualize the central tendency, spread, and the presence of outliers
in your data.
• The length of the box and whiskers provides information about data variability.
• Outliers are clearly displayed, making it easy to identify extreme values.
• In Python, when working with pandas, finding and handling missing values is a crucial part of
data preprocessing. You can use various methods and attributes provided by pandas to locate
missing values in Series and DataFrames.
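For instance, a minimal sketch (s is a Series and df a DataFrame, each containing NaN values):

import pandas as pd
import numpy as np
s = pd.Series([1, np.nan, 3])
print(s.isna())                      # Boolean Series: [False, True, False]
df = pd.DataFrame({'a': [1, np.nan], 'b': [np.nan, 4]})
missing, present = df.isna(), df.notna()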
• s.isna() returns a Boolean Series with True values indicating the positions of missing values.
• df.isna() and df.notna() return two DataFrames: one indicating missing values and the other indicating non-missing values.
• Handling missing values is a critical part of data preprocessing in Python, as they can adversely af-
fect the results of your analysis or machine learning models. The choice of a missing value treatment
method depends on the nature of the data and your specific analysis goals.
6. Domain-Specific Strategies:
• Method: Depending on the domain and the data, specific strategies for handling missing
values may be required.
• When to Use: When domain knowledge suggests a particular method. For example, you
might have business rules for handling missing customer data.
• Rules: Follow domain-specific guidelines and best practices.
• Limitations: Specific strategies may not be applicable to all datasets.
7. Data Augmentation:
• Method: If you have a small dataset with missing values, you can augment it by generat-
ing synthetic data to replace missing values.
• When to Use: Use data augmentation when you have a limited amount of data and miss-
ing values are preventing meaningful analysis.
• Rules: Use appropriate data generation techniques to create plausible replacements for
missing data.
• Limitations: Generated data should be representative of the original dataset.
8. Indicator Variables:
• Method: Create binary indicator variables (0 or 1) to flag the presence or absence of
missing values.
• When to Use: This method helps preserve the information about the missingness pat-
tern.
• Rules: Consider creating indicator variables for specific columns with missing values.
• This is especially useful in predictive modeling.
• Limitations: It increases the dimensionality of the data.
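For example:

import pandas as pd
import numpy as np
df = pd.DataFrame({'income': [50000, np.nan, 62000]})
df['income_missing'] = df['income'].isna().astype(int)  # 1 = missing, 0 = present
print(df)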
In Python and data analysis, “group” and “bins” are terms used to catego-
rize and organize data, especially when dealing with numerical values.
1. Group and Bins:
• Grouping refers to the process of dividing a dataset into categories or groups based on cer-
tain criteria or attributes. These categories are created to facilitate the analysis of data, par-
ticularly when dealing with large datasets.
• Bins are specific intervals or ranges into which data is divided. Binning is a method of group-
ing continuous data into discrete categories or intervals. Each bin represents a subset of data
points falling within a particular range.
• For example, if you have a dataset of ages, you can group them into bins like “0-10,” “11-20,”
“21-30,” and so on. Each of these bins is a category that represents a range of ages.
2. Categorical Data:
• Categorical data is a type of data that represents categories or labels. It includes data that can
be divided into groups but does not have a natural order or ranking.
• Group
• Bins
• pd.cut()
• np.where()
• pd.qcut()
• The functions pd.cut() and pd.qcut() in Python are used for binning, which is a way to divide a con-
tinuous numerical variable into discrete intervals or bins. These functions are commonly used in
data analysis and machine learning for data preprocessing.
• Rules:
• Binning can be done based on fixed-width bins (specifying bin edges) or adaptive-width bins.
• You can label the resulting bins, and Pandas will create a new categorical column with the bin
labels.
• When to Use:
• Use pd.cut() when you want to divide a continuous variable into predefined or custom bins.
• It’s useful when you have prior knowledge of how the data should be divided, e.g., age groups
or income brackets.
• Rules:
• It is especially useful when you want to ensure that each bin has approximately the same
number of data points.
• When to Use:
• Use pd.qcut() when you want to distribute data evenly across bins.
• It’s suitable for when you want to handle data with a wide range or skewed distribution.
• Rules:
• The condition is a boolean array that has the same shape as the input arrays.
• It’s commonly used to replace values in an array based on a condition.
• When to Use:
• Use np.where() when you need to create a new array based on a condition without using ex-
plicit loops.
• It’s used for tasks like data cleaning or feature engineering.
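Sketches of all three functions:

import pandas as pd
import numpy as np
ages = pd.Series([5, 17, 25, 42, 67])
# pd.cut: fixed, predefined bin edges with labels
groups = pd.cut(ages, bins=[0, 18, 40, 100], labels=['child', 'adult', 'senior'])
# pd.qcut: quantile-based bins with roughly equal counts per bin
halves = pd.qcut(ages, q=2, labels=['lower', 'upper'])
# np.where: conditional replacement without an explicit loop
flags = np.where(ages >= 18, 'eligible', 'minor')
print(groups, halves, flags, sep='\n')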
• Rows: Place a field here to group your data along the rows of the pivot table. For example, if
you’re analyzing sales data, you can place the “Product” field here to see sales by product.
• Columns: Place a field here to create column headers that segment your data further. For in-
stance, you can put the “Year” field here to compare sales across different years.
• Values: Add a field to this section to calculate values such as sums, averages, counts, or other
summary statistics. You can, for example, place the “Sales” field here and choose to calculate the
sum of sales.
• Filters (Optional): If you want to apply filters to your data, you can place a field in this area to
limit the data shown in the pivot table.
Group By in Python
Introduction:
• Grouping is a fundamental operation in data analysis. In Python, the groupby operation is
commonly used with libraries like Pandas to group data based on one or more columns and
perform aggregate functions on the grouped data. This allows for data summarization, anal-
ysis, and visualization.
Rules:
• The column(s) used for grouping must be categorical or discrete data. Numeric data can be
grouped if it represents categories (e.g., integers representing categories).
• You can group by multiple columns to create a hierarchical structure for grouping.
• Aggregate functions should be chosen based on the type of data and the analysis goals.
Limitations:
• Grouping large datasets can consume significant memory and slow down processing.
• Grouping by multiple columns can lead to complex hierarchical structures, which may re-
quire careful handling.
• Some aggregate functions may not be applicable to all data types. For example, you can’t cal-
culate the median of non-numeric data.
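A minimal sketch:

import pandas as pd
df = pd.DataFrame({'dept': ['IT', 'HR', 'IT', 'HR'],
                   'salary': [70, 50, 80, 55]})
print(df.groupby('dept')['salary'].mean())                  # average per department
print(df.groupby('dept').agg({'salary': ['mean', 'max']}))  # multiple aggregates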
Introduction:
• A pivot table is a powerful data analysis tool used to summarize and transform data in Py-
thon. It’s commonly associated with Pandas, a popular data manipulation library. Pivot tables
help reorganize and aggregate data for better analysis and visualization.
Rules:
• Index and columns should be categorical variables or discrete data. Numeric variables can be
used if they represent categories (e.g., integers representing categories).
• Aggregation functions should be chosen based on the type of data and analysis goals.
• You can create multi-level pivot tables by specifying multiple index and column variables.
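For example:

import pandas as pd
df = pd.DataFrame({'product': ['A', 'A', 'B', 'B'],
                   'year': [2023, 2024, 2023, 2024],
                   'sales': [100, 120, 90, 110]})
table = pd.pivot_table(df, index='product', columns='year',
                       values='sales', aggfunc='sum')
print(table)  # sales summed by product (rows) and year (columns)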
• pd.crosstab() in Python:
What is pd.crosstab()?
• pd.crosstab() is a Pandas function for creating cross-tabulations. It is used to display
the frequency or count of observations that fall into various categories of two or more
categorical variables.
Use Cases:
• Exploring Relationships: You can use pd.crosstab() to explore the relationships
between different categorical variables in your dataset. For example, you can
examine the relationship between gender and product preference or the rela-
tionship between education level and employment status.
Syntax:
• index: The categorical variable to be used as the row index in the table.
• columns: The categorical variable(s) to be used as column headers.
• values: (optional) The variable to be aggregated within the cells.
• aggfunc: (optional) The aggregation function to be applied to values.
• rownames and colnames: (optional) Names for row and column indexes.
• margins: (optional) If True, adds a row and column for row and column mar-
gins.
• margins_name: (optional) Name for the row and column margins.
• dropna: (optional) If True, removes rows or columns containing only NaNs.
• normalize: (optional) If True, returns proportions instead of counts.
• View the Result: Examine the cross-tabulation result, which is a Pandas DataFrame.
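A minimal sketch:

import pandas as pd
df = pd.DataFrame({'gender': ['M', 'F', 'F', 'M', 'F'],
                   'preference': ['A', 'B', 'A', 'A', 'B']})
print(pd.crosstab(df['gender'], df['preference']))                   # counts
print(pd.crosstab(df['gender'], df['preference'], normalize=True))  # proportions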
Join
• “join” is an operation used to combine rows from two or more tables based on a related column
between them. Joins are fundamental for retrieving data from multiple tables in a relational data-
base and are a cornerstone of SQL query functionality. There are several types of joins, each serving
different purposes.
Types of Joins:
INNER JOIN:
• An inner join returns only the rows that have matching values in both tables.
• If there is no match for a row in one table, it will not appear in the result set.
• The result contains only the common data between the tables.
SELF JOIN:
• A self join is used to combine rows from a single table.
• It is particularly useful when dealing with hierarchical data or when you want to compare
rows within the same table.
Introduction:
• Merging data in Python involves combining multiple datasets or DataFrames based on
common columns or indices. It is a critical operation in data manipulation, particular-
ly when working with structured data using libraries like Pandas. This process helps in
bringing data from different sources together for analysis, reporting, and visualization.
How to Merge Data:
• To merge data in Python, you typically use the pd.merge() function in Pandas, which is
similar to SQL JOIN operations. Here’s an in-depth explanation:
OUTER JOIN:
• Retains all rows from both DataFrames and fills in missing values with NaN.
LEFT JOIN:
• Retains all rows from the left DataFrame and the matching rows from the right Data-
Frame. Fills in missing values with NaN.
RIGHT JOIN:
• Retains all rows from the right DataFrame and the matching rows from the left Data-
Frame. Fills in missing values with NaN.
Limitations:
• Merging large datasets can be memory-intensive, so it’s important to consider available
resources.
• Merging on non-unique or inconsistent keys can lead to unexpected results.
• Be cautious with column name conflicts when merging DataFrames.
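Sketches of the four join types with pd.merge():

import pandas as pd
left = pd.DataFrame({'id': [1, 2, 3], 'name': ['Amy', 'Bob', 'Cara']})
right = pd.DataFrame({'id': [2, 3, 4], 'score': [80, 90, 70]})
print(pd.merge(left, right, on='id', how='inner'))  # only matching ids
print(pd.merge(left, right, on='id', how='outer'))  # all rows, NaN for gaps
print(pd.merge(left, right, on='id', how='left'))   # all left rows
print(pd.merge(left, right, on='id', how='right'))  # all right rows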
Append
• The append() method in Python is a built-in method used to add an element or object to a list. It is
a common operation when working with lists, which are a fundamental data structure in Python.
Syntax:
Rules:
• append() can only be used with lists. It cannot be used with other data types like tuples or dictionaries.
• The element added with append() becomes the last element of the list.
Limitations:
• The append() method only adds elements to the end of the list. If you need to insert an
element at a specific position in the list, you should use the insert() method.
• append() is an in-place operation, so it modifies the original list. If you need to create
a new list with additional elements, you should use concatenation or list comprehen-
sion.
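For example:

my_list = [1, 2, 3]
my_list.append(4)  # modifies the list in place
print(my_list)     # [1, 2, 3, 4]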
• In this example, the append() method is used to add the element 4 to the end of the
my_list list.
Introduction:
• Appending data in Python is the process of adding rows or records from one dataset
to another. It is a common operation when working with structured data using librar-
ies like Pandas. Appending is useful when you have multiple datasets with the same
structure, and you want to combine them vertically to create a larger dataset.
• In Pandas, appending is typically done with pd.concat(), which takes the following parameters:
• objs: A list of DataFrames or Series to be concatenated.
• axis: Specifies the axis along which to concatenate (0 for rows, 1 for columns).
• join: Defines how to handle the overlapping indexes (e.g., 'outer' for union, 'inner' for intersection).
• ignore_index: If True, resets the index in the result.
• keys: Adds a hierarchical index to the result, creating a multi-level index.
• verify_integrity: Checks for duplicate index values and raises an exception if found.
• sort: Sorts the non-concatenation axis (e.g., column order) when it is not already aligned.
• Appending is often more efficient than merging or joining when the datasets have the same
structure.
Rules:
• The columns in the DataFrames to be appended should have the same names and data
types.
• The order of columns should match in the DataFrames.
• Make sure the index values are unique if you don’t want any issues with index duplication.
Limitations:
• Appending data can lead to duplicate index values if not handled properly.
• It may not be the best choice for combining datasets with different structures or when
you need to perform complex data integration operations.
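A minimal sketch of appending two DataFrames with pd.concat():

import pandas as pd
df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [5, 6], 'b': [7, 8]})
combined = pd.concat([df1, df2], axis=0, ignore_index=True)  # stack rows, reset index
print(combined)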
Question
• column_name: The name of the column to which you want to apply the function.
• lambda x: A lambda function that defines the operation to be applied to each element.
Rules:
• The lambda function should take an element as input and return the transformed ele-
ment.
• Ensure that the lambda function logic is defined clearly and concisely.
Limitations:
• Using .apply() with complex or slow lambda functions can be inefficient, especially for
large DataFrames.
• It may not be suitable for operations that require multiple columns or complex logic.
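For example (column_name is a placeholder):

import pandas as pd
df = pd.DataFrame({'column_name': [1, 2, 3]})
df['doubled'] = df['column_name'].apply(lambda x: x * 2)  # element-wise transform
print(df)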
.applymap() Function:
How to Use .applymap():
• The .applymap() function is used to apply a function element-wise to a DataFrame. Its
basic syntax is as follows:
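A minimal sketch of the pattern (note that in recent Pandas releases, DataFrame.map() supersedes .applymap()):

import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
result = df.applymap(lambda x: x ** 2)  # apply the function to every element
print(result)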
Rules:
• The function passed to .applymap() should operate on a single element (scalar) and re-
turn a scalar value.
Limitations:
• .applymap() can be slower than vectorized operations when dealing with large Data-
Frames.
• It may not be suitable for complex operations involving multiple columns or rows.
Rules:
• The user-defined function should take an element as input and return the transformed ele-
ment.
• Ensure that the function is well-documented and handles different edge cases.
Limitations:
• .applymap() is slower compared to vectorized operations for large DataFrames.
• It may not be suitable for extremely complex operations or tasks that require interaction
between multiple columns.
Question
• In this code, we define a lambda function sum_series that takes a Pandas Series as its argument
and returns the sum of the elements using the .sum() method of the Series.
• When you call the lambda function sum_series(sample_series), it calculates the sum of the sam-
ple_series and returns the result.
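The code being described might look like this (sample_series is the illustrative Series):

import pandas as pd
sample_series = pd.Series([10, 20, 30])
sum_series = lambda s: s.sum()    # lambda taking a Series, returning its sum
print(sum_series(sample_series))  # 60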
• In Python, data type conversion is the process of changing the data type of a variable or a Pandas Se-
ries to another data type. This can be necessary for various reasons, such as ensuring compatibility,
performing operations, or data cleaning.
1. pd.to_datetime():
• How to Use pd.to_datetime():
• The pd.to_datetime() function in Pandas is used to convert a Series of strings or
numbers to datetime objects.
Rules:
• The arg parameter should be a Pandas Series or a list-like object containing date or
time information.
• If errors is set to 'raise', parsing errors will raise an exception; 'coerce' will convert
errors to NaT (Not-a-Time), and 'ignore' will ignore errors.
Limitations:
• pd.to_datetime() may not handle extremely unusual date formats.
• The format parameter can be challenging to define for custom date formats.
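For example:

import pandas as pd
dates = pd.Series(['2024-01-15', '2024-02-20', 'not a date'])
parsed = pd.to_datetime(dates, errors='coerce')  # bad values become NaT
print(parsed)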
2. pd.to_numeric():
How to Use pd.to_numeric():
• The pd.to_numeric() function is used to convert a Series to numeric data types.
The basic syntax is as follows:
Limitations:
• pd.to_numeric() may not handle complex data types or mixed data types within a
single Series.
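For example:

import pandas as pd
raw = pd.Series(['1', '2.5', 'oops'])
nums = pd.to_numeric(raw, errors='coerce')  # 'oops' becomes NaN
print(nums)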
3. Series.astype():
• How to Use Series.astype():
• The astype() method is used to convert the data type of a Pandas Series to a speci-
fied data type. The basic syntax is as follows:
Rules:
• The dtype parameter should be a valid Pandas or NumPy data type.
• Data loss or unexpected results may occur if the data type conversion is not appro-
priate.
Limitations:
• astype() does not perform type inference; you need to specify the desired data type
explicitly.
• It does not handle complex data transformations or conversions outside the scope
of supported data types.
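For example:

import pandas as pd
s = pd.Series([1.7, 2.3, 3.9])
print(s.astype(int))  # truncates to [1, 2, 3]; note the potential data loss
print(s.astype(str))  # converts each value to a string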
• In Python, the datetime module provides functionality for working with dates and
times. The Pandas library, an essential tool for data manipulation, extends this func-
tionality with its Timestamp data type. To understand the methods available in Pan-
das for working with timestamps, you can explore the Timestamp object by printing
its attributes and methods using dir(pd.Timestamp).
Attributes:
• year: Returns the year of the timestamp.
• month: Returns the month of the timestamp (1-12).
• day: Returns the day of the timestamp (1-31).
• hour: Returns the hour of the timestamp (0-23).
• minute: Returns the minute of the timestamp (0-59).
• second: Returns the second of the timestamp (0-59).
• microsecond: Returns the microsecond of the timestamp (0-999999).
• nanosecond: Returns the nanosecond of the timestamp (0-999999999).
• day_of_week: Returns the day of the week as an integer (0=Monday, 6=Sunday).
• day_name(): Returns the name of the day of the week (e.g., “Monday”).
• month_name(): Returns the name of the month (e.g., “January”).
• quarter: Returns the quarter of the year (1-4).
Methods:
• strftime(): Format the timestamp as a string using format codes (e.g., %Y-%m-%d for "YYYY-MM-DD").
• date(): Extract the date part (year, month, and day) of the timestamp.
• time(): Extract the time part (hour, minute, second, microsecond) of the timestamp.
Arithmetic Operations:
• You can perform various arithmetic operations with timestamps, such as addition, subtraction, and
comparison. For example, you can calculate the time difference between two timestamps or add a
certain number of days to a timestamp.
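A short sketch of Timestamp attributes, methods, and arithmetic:

import pandas as pd
ts = pd.Timestamp('2024-03-15 10:30:00')
print(ts.year, ts.month, ts.day_name())  # 2024 3 Friday
print(ts.strftime('%Y-%m-%d'))           # '2024-03-15'
later = ts + pd.Timedelta(days=7)        # add 7 days
print(later - ts)                        # 7 days 00:00:00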
datetime.time():
• Returns the time part (hour, minute, second, microsecond) of a datetime object.
datetime.replace():
• Creates a new datetime object with specified parts (year, month, day, etc.) replaced.
datetime.year:
• Returns the year of the datetime object.
datetime.month:
• Returns the month of the datetime object (1-12).
datetime.day:
• Returns the day of the datetime object (1-31).
datetime.hour:
• Returns the hour of the datetime object (0-23).
datetime.minute:
• Returns the minute of the datetime object (0-59).
datetime.second:
• Returns the second of the datetime object (0-59).
datetime.now(tz):
• Returns the current datetime object, optionally in the specified time zone (tz).
datetime.min:
• The smallest representable datetime object.
datetime.max:
• The largest representable datetime object.
datetime.resolution:
• The smallest possible difference between two datetime objects.
• When you print dir(datetime.datetime), you’re inspecting the attributes and methods avail-
able for the datetime class within the datetime module in Python. This class is used to repre-
sent and manipulate date and time objects.
Properties:
• year: Represents the year of a datetime object.
• Properties’ Use: You can access and modify the year part of a datetime object using this property.
• month: Represents the month of a datetime object (1-12).
• Properties’ Use: Use it to access and modify the month part of a datetime object.
• day: Represents the day of the datetime object (1-31).
• Properties’ Use: Access and modify the day part of a datetime object with this property.
• hour: Represents the hour of the datetime object (0-23).
• Properties’ Use: Access and modify the hour part of a datetime object using this property.
• minute: Represents the minute of the datetime object (0-59).
• Properties’ Use: Access and modify the minute part of a datetime object.
• second: Represents the second of the datetime object (0-59).
• Properties’ Use: Use this property to access and modify the second part of a datetime object.
• microsecond: Represents the microsecond of the datetime object (0-999999).
• Properties’ Use: Access and modify the microsecond part of a datetime object.
• tzinfo: Represents the time zone information for the datetime object.
• Properties’ Use: You can access and modify the time zone information associated with a datetime
object.
Methods:
• replace(): Creates a new datetime object with specified parts replaced.
• Methods’ Use: You can change specific attributes (year, month, day, etc.) while keeping the rest the
same.
• strftime(): Formats the datetime object as a string using format codes.
• Methods’ Use: Use this to represent the datetime as a string with a custom format.
• timestamp(): Converts the datetime object to a Unix timestamp (seconds since January 1, 1970).
• Methods’ Use: Convert a datetime object to a timestamp for various calculations.
• date(): Extracts the date part (year, month, day) of the datetime object.
• Methods’ Use: Obtain the date component from a datetime object.
• time(): Extracts the time part (hour, minute, second, microsecond) of the datetime object.
• Methods’ Use: Retrieve the time component from a datetime object.
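For example:

from datetime import datetime
dt = datetime(2024, 3, 15, 10, 30)
print(dt.replace(year=2025))          # same moment, year changed
print(dt.strftime('%Y-%m-%d %H:%M'))  # '2024-03-15 10:30'
print(dt.timestamp())                 # seconds since 1970-01-01 (depends on local time zone)
print(dt.date(), dt.time())           # 2024-03-15 10:30:00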
Rules:
• Properties allow you to access and modify specific attributes of a datetime object.
• Methods provide functionality for creating new datetime objects, formatting, and extracting date
and time components.
Limitations:
• Datetime objects cover a fixed range (datetime.min is year 1, datetime.max is year 9999); conversions to Unix timestamps can be further limited by the underlying system (e.g., the year-2038 limit on some 32-bit systems).
• Time zone handling can be complex, and the tzinfo property might not be available for all datetime
objects.
Questions: