Python For Data Science

Subject: Python for Data Science (102045603)


By Professor: Mr. Aryan Ketankumar Kothambia
G H Patel College of Engineering & Technology
Date: 9/07/2024
Content
● What is data science? & Why do we need it?
● DS vs AI vs ML vs DL
● Data Science Forms
● Live example
● Programming languages in data science
● Python
● Running Python projects
● Python IDEs
● Python libraries for data science
What is data science? & Why do we need it?
What is Data Science?

Data science is an interdisciplinary field that combines:

1. Statistics
2. Scientific Methods
3. Artificial Intelligence (AI)
4. Data Analysis

Why Do We Need Data Science?

The goal of data science is to extract meaning and insights from data, whether it is structured or unstructured.

Structured vs. Unstructured Data

● Structured Data: Organized and easily searchable data, often in rows and columns (e.g., databases).
● Unstructured Data: Unorganized data that doesn't fit into traditional data models (e.g., text, images, videos).
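
The difference can be sketched in a few lines of Python. This is only an illustration using the standard library: the CSV table and the review sentence are made-up data, and the variable names are assumptions rather than anything from the slides. Structured data can be read column by column, while unstructured text needs some processing before it can be analysed.

```python
import csv
import io
from collections import Counter

# Structured data: rows and columns, here a tiny CSV table (made-up values).
csv_text = "name,age\nAsha,21\nRavi,23\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
ages = [int(row["age"]) for row in rows]
average_age = sum(ages) / len(ages)

# Unstructured data: free text with no fixed columns (made-up sentence).
review = "Great course. Great examples and clear slides."
words = [word.strip(".,").lower() for word in review.split()]
word_counts = Counter(words)

print(average_age)                 # average of the "age" column
print(word_counts.most_common(1))  # most frequent word in the text
```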
DS vs AI vs ML vs DL
Data Science Forms

1. Input (collect data)
2. Analyse
3. Output (useful insights, i.e. predictions)

This slide illustrates a basic flowchart of the data science process, starting from data collection,
followed by analysis, and resulting in useful insights or predictions.
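
The three steps can be sketched as a tiny Python program. This is a minimal illustration, not a real forecasting method: the sales figures are made up, and the function names `collect`, `analyse`, and `predict_next` are hypothetical.

```python
from statistics import mean


def collect():
    """Input: return hypothetical daily sales figures (made-up data)."""
    return [120, 135, 150, 165, 180]


def analyse(values):
    """Analyse: compute the average and the average day-to-day change."""
    changes = [later - earlier for earlier, later in zip(values, values[1:])]
    return {"average": mean(values), "trend": mean(changes)}


def predict_next(values, summary):
    """Output: a naive prediction, the last value plus the average change."""
    return values[-1] + summary["trend"]


sales = collect()
summary = analyse(sales)
forecast = predict_next(sales, summary)
print(summary, forecast)
```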
Data Science Examples
● Google Translate
● YouTube Copyright Tool
● Grammarly
● Voice Assistants like Alexa, Google Home
● Self-Driving Cars
● YouTube Suggested Videos
● Suggested Movies by Netflix, Amazon Prime, etc.
● Object Detection
● Smartwatch
● Facebook Image Tagging

The slide also includes logos of Netflix, Amazon Prime Video, and YouTube.
Programming Languages for Data Science

Python

● Open Source
● Interpreted
● Object-oriented

R

● Statistical computing and graphics
● Open Source
Python
Loops: for and while loops to iterate over data.

Decision Making Statements: if, elif, and else statements to make decisions based on conditions.
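
A short sketch of loops and decision-making statements working together. The scores and the grade boundaries are made-up values chosen only for illustration.

```python
scores = [35, 72, 90, 58]  # made-up exam scores

# for loop with if / elif / else: label every score.
grades = []
for score in scores:
    if score >= 75:
        grades.append("distinction")
    elif score >= 40:
        grades.append("pass")
    else:
        grades.append("fail")

# while loop: add scores from the start until the total reaches 100.
total = 0
count = 0
while total < 100 and count < len(scores):
    total += scores[count]
    count += 1

print(grades)
print(count, total)
```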

OOPS (Object-Oriented Programming System) Concepts: inheritance, abstraction, encapsulation, and polymorphism.
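
One possible way to see the four concepts in a single small example is shown here. The `Shape`, `Rectangle`, and `Circle` classes are hypothetical names invented for this sketch.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    """Abstraction: says what every shape offers, not how it is computed."""

    def __init__(self, name):
        self._name = name  # encapsulation: underscore marks it as internal

    @property
    def name(self):
        return self._name  # read-only access to the internal attribute

    @abstractmethod
    def area(self):
        """Each concrete shape must say how to compute its area."""


class Rectangle(Shape):  # inheritance: Rectangle reuses Shape's code
    def __init__(self, width, height):
        super().__init__("rectangle")
        self._width = width
        self._height = height

    def area(self):
        return self._width * self._height


class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self._radius = radius

    def area(self):
        return math.pi * self._radius ** 2


# Polymorphism: the same call, area(), works on different kinds of objects.
shapes = [Rectangle(2, 3), Circle(1)]
areas = {shape.name: shape.area() for shape in shapes}
print(areas)
```

Because `Shape` has an abstract method, Python refuses to create a plain `Shape` object; only the concrete subclasses can be instantiated.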

Strings: Managing and manipulating text data.
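
A few common string operations, as a quick sketch. The sample text and variable names are chosen only for illustration.

```python
raw = "  Python for Data Science  "

clean = raw.strip()         # remove leading and trailing spaces
words = clean.split()       # split into a list of words
slug = "-".join(word.lower() for word in words)  # lowercase and join
has_data = "Data" in clean  # membership test

# f-strings insert values into text.
message = f"{len(words)} words, first word: {words[0]}"

print(clean)
print(slug)
print(has_data)
print(message)
```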

Tuples: Immutable sequences of elements.

Lists: Mutable sequences of elements.
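
The difference between tuples and lists shows up as soon as a value is changed. A minimal sketch with made-up values:

```python
point = (3, 4)       # tuple: fixed once created
readings = [10, 20]  # list: can grow and change

x, y = point         # tuple unpacking

readings.append(30)  # add an element at the end
readings[0] = 5      # replace an existing element

# Trying to change a tuple raises TypeError.
try:
    point[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(x, y)
print(readings)
print(tuple_changed)
```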

Dictionary: Key-value pairs for storing data.
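
A brief sketch of storing and reading key-value pairs. The student record is made-up data.

```python
student = {"name": "Asha", "branch": "IT"}  # made-up record

student["semester"] = 5                # add a new key-value pair
branch = student["branch"]             # look up a value by its key
city = student.get("city", "unknown")  # default when the key is missing

# Iterate over all key-value pairs.
lines = [f"{key}: {value}" for key, value in student.items()]

print(branch, city)
print(lines)
```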

Functions: Blocks of reusable code defined using the def keyword.
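
For instance, a small reusable function might look like this. The name `average` and its `ndigits` parameter are assumptions made for this sketch.

```python
def average(values, ndigits=2):
    """Return the mean of values, rounded to ndigits decimal places."""
    if not values:
        raise ValueError("values must not be empty")
    return round(sum(values) / len(values), ndigits)


# The same function is reused with different arguments.
result = average([1, 2, 4])
precise = average([1, 2, 4], ndigits=4)

print(result, precise)
```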


Running Python projects

Visual Studio Code (Code Editor)

Jupyter Notebook (Web Application)

PyCharm (IDE)

Spyder (IDE)

IDLE (Integrated Development and Learning Environment)

Replit (Online IDE)

Google Colab (Cloud-based Jupyter Notebooks)


Introduction to Various Python IDEs

1. IDLE (Integrated Development and Learning Environment)

● Overview: IDLE is a simple and lightweight IDE that comes bundled with Python. It is designed to
be easy to use, making it ideal for beginners.
● Key Features:
○ Python Shell: Interactive interpreter for quick testing of code snippets.
○ Code Editor: Basic editor with syntax highlighting, auto-completion, and indentation.
○ Debugger: Integrated debugger with stepping and breakpoints.
○ Cross-Platform: Available on Windows, macOS, and Linux.
● Best For: Beginners learning Python and simple script development.

2. Jupyter Notebook

● Overview: Jupyter Notebook is a web-based interactive environment that allows you to create and
share documents containing live code, equations, visualizations, and narrative text.
● Key Features:
○ Interactive Code Execution: Write and execute code in cells, making it easy to test and
debug.
○ Rich Media Support: Embed visualizations, images, videos, and LaTeX equations.
○ Integration with Data Science Libraries: Works well with libraries for data analysis and
machine learning, like Pandas, NumPy, and Matplotlib (installed separately or bundled with distributions such as Anaconda).
○ Collaborative: Share notebooks and collaborate with others via platforms like GitHub.
● Best For: Data science, machine learning, and academic research.

3. PyCharm

● Overview: PyCharm is a powerful IDE developed by JetBrains specifically for Python development.
It comes in two editions: the free Community edition and the paid Professional edition.
● Key Features:
○ Intelligent Code Editor: Advanced code completion, refactoring, and error detection.
○ Integrated Tools: Built-in support for version control, database tools, and testing frameworks.
○ Debugging and Profiling: Advanced debugger and profiler to optimize performance.
○ Web Development: Support for web frameworks like Django and Flask.
● Best For: Professional software development and complex projects.

4. Spyder (Scientific Python Development Environment)

● Overview: Spyder is an open-source IDE tailored for scientific programming and data analysis
with Python. It integrates well with popular scientific libraries.
● Key Features:
○ Integrated IPython Console: Enhanced interactive Python shell.
○ Variable Explorer: Inspect variables, data frames, and arrays in a user-friendly manner.
○ Code Analysis: Real-time code analysis and linting.
○ Visualization Tools: Seamless integration with Matplotlib for inline plotting.
● Best For: Scientific computing, data analysis, and engineering.
Python libraries for data science

NumPy - Used for handling numerical data.
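
A minimal NumPy sketch, assuming the library is installed (for example with `pip install numpy`). The height values are made up.

```python
import numpy as np

heights_cm = np.array([150.0, 160.0, 170.0, 180.0])  # made-up values

heights_m = heights_cm / 100         # arithmetic applies to every element
mean_height = heights_cm.mean()      # built-in statistics
tall = heights_cm[heights_cm > 165]  # boolean filtering

matrix = np.arange(6).reshape(2, 3)  # 2 rows, 3 columns: 0..5
column_sums = matrix.sum(axis=0)     # sum down each column

print(heights_m)
print(mean_height)
print(tall)
print(column_sums)
```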

Pandas - Used for managing and manipulating tabular data.
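
A small Pandas sketch, assuming the library is installed (for example with `pip install pandas`). The city and sales values are made-up data.

```python
import pandas as pd

# A small table (DataFrame) built from made-up data.
df = pd.DataFrame(
    {
        "city": ["Anand", "Surat", "Anand", "Surat"],
        "sales": [100, 150, 200, 50],
    }
)

high = df[df["sales"] > 90]                         # filter rows
totals = df.groupby("city")["sales"].sum()          # total sales per city
df["sales_share"] = df["sales"] / df["sales"].sum() # add a new column

print(high)
print(totals)
print(df)
```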

Matplotlib - Used for creating visualizations.
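
A minimal Matplotlib sketch, assuming the library is installed. The monthly sales values are made up. The "Agg" backend is selected here only so the script also runs on a machine without a display; in a Jupyter notebook that line is usually left out and the plot is shown inline.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 150, 165]  # made-up values

fig, ax = plt.subplots()
ax.plot(months, sales, marker="o")
ax.set_title("Monthly sales")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")

# Save the figure as PNG into memory; a file name would work as well.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()

print(len(png_bytes) > 0)
```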

Seaborn - Used for creating visualizations, typically offering a higher-level interface than Matplotlib.
Python libraries for data science
TensorFlow - A library for machine learning and deep learning.

Keras - An API for building and training neural networks, often used with TensorFlow.

NumPy - Handles numerical data and array operations.

SciPy - Provides functions for scientific and technical computing.

Scikit-Learn - A machine learning library for data mining and data analysis.

PyTorch - A deep learning framework.

Pandas - Manages and manipulates tabular data.

Scrapy - A web scraping framework.

Matplotlib - Creates visualizations.

BeautifulSoup - Parses HTML and XML documents, used for web scraping.
Congratulations

You now know the basic concepts of Python for data science!
