7 Practicals With Python Practice With Data Science Cookbook

The document outlines a practical work guide for a Python Data Science Cookbook, emphasizing the importance of understanding data science algorithms and programming them effectively. It covers various aspects of data science, including Python basics, data analysis, data mining, and machine learning, with structured recipes for students to practice. Each section provides detailed topics and objectives, focusing on hands-on experience with Python libraries like NumPy and scikit-learn.

Uploaded by

bikaydaniel33
Copyright
© © All Rights Reserved

Practical work in the Python Data Science Cookbook

1. Introduction
Data science sits at the intersection of several fields, including data mining, machine learning, and
statistics. It has penetrated deeply into our connected world, and there is a growing demand in the market
for people who not only understand data science algorithms thoroughly but can also program them. Treating
these algorithms as black boxes in decision-making systems leads to counterproductive results. With so many
algorithms and problems to choose from, a good grasp of the underlying algorithms is needed to pick the best
one for a given problem. Python has evolved over the years and is today a first-choice language for many
data scientists.

Several factors explain this popularity:

• It works as a scripting language for quick prototype building.
• Its sophisticated language constructs support full-fledged software development.
• Its strong library support for numeric computation has made it popular among data scientists and
the general scientific programming community.
• It is also popular among web developers, thanks to frameworks such as Django and Flask.

The cookbook offers carefully crafted recipes that touch upon different aspects of data science, including
data exploration, data analysis and mining, machine learning, and large-scale machine learning.
The first chapter introduces Python data structures and functional programming concepts. The early chapters
cover the basics of data science, and the later chapters are dedicated to advanced data science algorithms.

State-of-the-art algorithms currently used in practice by leading data scientists across industries,
including ensemble methods, random forest, and regression with regularization, are covered in detail.
Some algorithms that are popular in academia but not yet widely adopted in the mainstream, such as
rotation forest, are also covered in detail.

The recipes balance the mathematical ideas behind the data science algorithms with their implementation
details. Each recipe provides just enough mathematical introduction to understand how the algorithm
works, so that these methods can be used to full benefit in applications.

2. Content of the practical work


Part 1 Python Basics and Environment for data science
• Introduces Python’s built-in data structures and functions, which are very handy for data science
programming.
• Introduces Python’s scientific programming and plotting libraries, including NumPy, matplotlib, and
scikit-learn.

Part 2 Data Analysis: exploration and more (wrangle and deep dive)
• Covers data preprocessing and transformation routines used in exploratory data analysis, in order to
build data science algorithms efficiently.
• Introduces the concept of dimensionality reduction (from simple methods to advanced state-of-the-art
techniques) in order to tackle the curse of dimensionality.
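
As one illustration of dimensionality reduction, the sketch below projects a small data set onto its first two principal components using NumPy's singular value decomposition. The data values and variable names are hypothetical, and this is only one possible way to compute the projection, not necessarily the approach taken in the cookbook.

```python
import numpy as np

# Toy data: 6 samples, 3 features (hypothetical values for illustration).
X = np.array([
    [2.5, 2.4, 1.0],
    [0.5, 0.7, 0.9],
    [2.2, 2.9, 1.1],
    [1.9, 2.2, 1.0],
    [3.1, 3.0, 0.8],
    [2.3, 2.7, 1.2],
])

# Center each feature so the components describe variance around the mean.
X_centered = X - X.mean(axis=0)

# SVD of the centered data; the rows of Vt are the principal directions,
# ordered by decreasing singular value.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Keep the first two directions and project the data onto them.
n_components = 2
X_reduced = X_centered @ Vt[:n_components].T

# Fraction of the total variance captured by each component.
explained_ratio = (S ** 2) / np.sum(S ** 2)

print(X_reduced.shape)
print(explained_ratio[:n_components])
```

The reduced matrix keeps one row per sample but only two columns, which is the point of the technique: fewer features while retaining most of the variance.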

Part 3 Data Mining: Needle in a haystack
• Discusses unsupervised data mining techniques, starting with detailed discussions of distance
methods and kernel methods, followed by clustering and outlier detection techniques.
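
To make the idea of a distance method concrete, here is a minimal pure-Python sketch of three commonly used measures (Euclidean, Manhattan, and cosine distance). The function names are illustrative assumptions and are not taken from the cookbook.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q))   # 5.0
print(manhattan(p, q))   # 7.0
```

Clustering and outlier detection techniques typically rely on such a measure to decide which points are "close", so the choice of distance affects the result.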

Part 4 Machine learning: supervised, regression, ensemble, tree-based and perceptron/stochastic gradient
descent
• Covers supervised data mining techniques, including nearest neighbors, Naïve Bayes, and
classification trees. It begins with a heavy emphasis on data preparation for supervised learning.
• Introduces regression problems, followed by regularization topics including LASSO and ridge.
Finally, it discusses cross-validation techniques as a way to choose hyperparameters for these methods.
• Introduces various ensemble techniques, including bagging, boosting, and gradient boosting. It shows
how to build a powerful state-of-the-art method in which, instead of a single model for a given problem,
an ensemble or a bag of models is built.
• Introduces some more bagging methods based on tree-based algorithms. Due to their robustness to
noise and their applicability to a wide variety of problems, they are very popular in the data science
community.
• Online learning: covers large-scale machine learning and algorithms suited to such problems. This
includes algorithms that work with streaming data and with data that cannot fit into memory completely
(perceptron and stochastic gradient descent). Several types of linear algorithms, including logistic
regression, linear regression, and linear SVM, can be accommodated using this framework.
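
The online learning idea in the last item can be sketched with a perceptron that updates its weights one example at a time, so the full data set never has to sit in memory. The function names, the learning rate, and the tiny data stream below are hypothetical; this is a sketch of the classic update rule, not the cookbook's code.

```python
def perceptron_update(weights, bias, x, y, lr=1.0):
    """Apply one online perceptron step; the label y must be +1 or -1."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    if y * activation <= 0:  # example is misclassified (or on the boundary)
        weights = [w + lr * y * xi for w, xi in zip(weights, x)]
        bias = bias + lr * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else -1


# A small, linearly separable "stream" of (features, label) pairs.
stream = [
    ((2.0, 1.0), 1),
    ((1.0, 3.0), 1),
    ((-1.0, -2.0), -1),
    ((-2.0, -1.0), -1),
]

weights, bias = [0.0, 0.0], 0.0
for _ in range(5):            # a few passes over the stream
    for x, y in stream:       # one example at a time
        weights, bias = perceptron_update(weights, bias, x, y)

print([predict(weights, bias, x) for x, _ in stream])
```

Because each update only needs the current example, the same loop could read examples from a file or a network source instead of a list.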

3. Structure of the 5 recipes expected from students


a. Description of the topic/recipe: Using NumPy libraries
NumPy provides an efficient way of handling very large arrays in Python. Most of the Python scientific
libraries use NumPy internally for array and matrix operations. In this book, we will be using NumPy
extensively. We will introduce NumPy in this recipe.

b. Objective to achieve in the recipe: Getting ready
We will write a series of Python statements manipulating arrays and matrices, and learn how to use NumPy
on the way. Our intent is to get you used to working with NumPy arrays, as NumPy will serve as the basis
for most of the recipes in this book.

c. How you did it
Let’s start by creating some simple matrices and arrays:
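
The recipe's original listing is not reproduced in this guide, so the following is only a minimal sketch of what creating simple arrays and matrices with NumPy could look like. All variable names are illustrative assumptions.

```python
import numpy as np

# A one-dimensional array built from a Python list.
a_list = [1, 2, 3]
an_array = np.array(a_list)

# A 2 x 3 matrix built from a list of lists.
a_matrix = np.array([[1, 2, 3], [4, 5, 6]])

# Arrays pre-filled with zeros or ones; the tuple gives the shape.
zeros = np.zeros((2, 3))
ones = np.ones((2, 3))

# Evenly spaced values: start at 1, stop before 10, step 2.
a_range = np.arange(1, 10, 2)

# Six consecutive integers rearranged into 2 rows and 3 columns.
reshaped = np.arange(6).reshape(2, 3)

# A 3 x 3 identity matrix.
identity = np.eye(3)

print(an_array.shape, a_matrix.shape)
print(a_range)
print(reshaped)
```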

d. How it works
Let’s start by including the NumPy library:
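
A minimal sketch of importing NumPy and of how its array operations behave. The `np` alias is the common convention; the sample matrix `m` is a hypothetical example rather than the recipe's own data.

```python
import numpy as np  # conventional alias for the NumPy library

m = np.array([[1, 2], [3, 4]])

# Basic attributes: number of dimensions, shape, and element type.
print(m.ndim, m.shape, m.dtype)

# The * operator multiplies element by element ...
elementwise = m * m
# ... while @ performs the matrix product.
matrix_product = m @ m

# Reductions take an axis: 0 collapses rows, 1 collapses columns.
col_sums = m.sum(axis=0)
row_means = m.mean(axis=1)

# Broadcasting: the 1-D array is added to every row of m.
shifted = m + np.array([10, 20])

print(elementwise)
print(matrix_product)
print(col_sums, row_means)
print(shifted)
```

The distinction between `*` and `@` is a frequent source of mistakes for newcomers, which is why both are shown side by side.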

e. Additional information
You can refer to the following link for some excellent NumPy documentation:
http://www.numpy.org

f. Connections to other recipes: See also
Plotting with matplotlib recipe in Chapter 3, Analyzing Data - Explore & Wrangle
Machine Learning with Scikit Learn recipe in Chapter 3, Analyzing Data - Explore & Wrangle

4. Themes of the 5 recipes/topics expected from students


Each student in a practical work group will practice only one recipe/topic in each of the following themes,
according to their number on the student list.

1. Recipes for Python basics:

1. Using dictionary objects
2. Working with a dictionary of dictionaries
3. Working with tuples
4. Using sets
5. Writing a list
6. Creating a list from another list - list comprehension
7. Using iterators
8. Generating an iterator and a generator
9. Using iterables
10. Passing a function as a variable
11. Embedding functions in another function
12. Passing a function as a parameter
13. Returning a function
14. Altering the function behavior with decorators
15. Creating anonymous functions with lambda
16. Using the map function
17. Working with filters
18. Using zip and izip
19. Processing arrays from the tabular data
20. Preprocessing the columns
21. Sorting lists
22. Sorting with a key
23. Working with itertools
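
A few of the themes above (list comprehension, generators, decorators, sorting with a key, and zip) can be sketched in a handful of lines. The names below are hypothetical, and the code assumes Python 3, where the built-in `zip` is already lazy and `itertools.izip` (Python 2 only) no longer exists.

```python
from functools import wraps

# List comprehension: squares of the even numbers in a list.
numbers = [1, 2, 3, 4, 5, 6]
even_squares = [n * n for n in numbers if n % 2 == 0]


# Generator: yields a running total without building a full list first.
def running_total(values):
    total = 0
    for value in values:
        total += value
        yield total


# Decorator: counts how many times the wrapped function is called.
def count_calls(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def add(a, b):
    return a + b


# Sorting with a key: order (name, age) records by age using a lambda.
records = [("alice", 31), ("bob", 25), ("carol", 28)]
by_age = sorted(records, key=lambda record: record[1])

# zip with unpacking: split the records into two parallel tuples.
names, ages = zip(*records)

print(even_squares)                      # [4, 16, 36]
print(list(running_total([1, 2, 3])))    # [1, 3, 6]
print(add(2, 3), add.calls)              # 5 1
print(by_age)
print(names, ages)
```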

2. Recipes for Python environment:

1. Using NumPy libraries
2. Plotting with matplotlib
3. Machine learning with scikit-learn

3. Recipes for data analysis:

1. Analyzing univariate data graphically
2. Grouping the data and using dot plots
3. Using scatter plots for multivariate data
4. Using heat maps
5. Performing summary statistics and plots
6. Using a box-and-whisker plot
7. Imputing the data
8. Performing random sampling
9. Scaling the data
10. Standardizing the data
11. Performing tokenization
12. Removing stop words
13. Stemming the words
14. Performing word lemmatization
15. Representing the text as a bag of words
16. Calculating term frequencies and inverse document frequencies

17. Extracting the principal components
18. Using Kernel PCA
19. Extracting features using Singular Value Decomposition
20. Reducing the data dimension with Random Projection
21. Decomposing Feature matrices using NMF (Non-negative Matrix Factorization)
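
Two of the preprocessing themes above, scaling and standardizing, can be illustrated with the standard library alone. The sample values are hypothetical, and the sketch uses the population standard deviation; a library implementation may make different choices.

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Min-max scaling: map the smallest value to 0 and the largest to 1.
lowest, highest = min(values), max(values)
scaled = [(v - lowest) / (highest - lowest) for v in values]

# Standardization: subtract the mean and divide by the standard deviation,
# giving a result with mean 0 and (population) standard deviation 1.
mean = statistics.mean(values)
std = statistics.pstdev(values)
standardized = [(v - mean) / std for v in values]

print(scaled)
print(mean, std)        # 5.0 2.0
print(standardized)
```

Scaling preserves the shape of the distribution inside a fixed range, while standardization expresses each value as a number of standard deviations from the mean.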

4. Topics for data mining:

1. Working with distance measures
2. Learning and using kernel methods
3. Clustering data using the k-means method
4. Learning vector quantization
5. Finding outliers in univariate data
6. Discovering outliers using the local outlier factor method
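
For the univariate outlier theme, one common approach is to flag points whose distance from the median is large relative to the median absolute deviation (MAD). The sketch below assumes that approach; the function name, the threshold of 3, and the sample data are illustrative assumptions.

```python
import statistics


def mad_outliers(values, threshold=3.0):
    """Return the values whose scaled distance from the median exceeds threshold."""
    median = statistics.median(values)
    abs_deviations = [abs(v - median) for v in values]
    mad = statistics.median(abs_deviations)
    if mad == 0:
        return []  # no spread around the median: nothing can be ranked
    # 1.4826 makes the MAD comparable to a standard deviation
    # when the data are roughly normally distributed.
    scale = 1.4826 * mad
    return [v for v, d in zip(values, abs_deviations) if d / scale > threshold]


data = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
print(mad_outliers(data))   # [25.0]
```

The median and the MAD are used instead of the mean and standard deviation because a single extreme value barely moves them.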

5. Topics for machine learning:

1. Preparing data for model building
2. Finding the nearest neighbors
3. Classifying documents using Naïve Bayes
4. Building decision trees to solve multiclass problems

5. Predicting real-valued numbers using regression
6. Learning regression with L2 shrinkage – ridge
7. Learning regression with L1 shrinkage – LASSO
8. Using cross-validation iterators with L1 and L2 shrinkage

9. The bagging method
10. The boosting method, AdaBoost
11. The gradient boosting method

12. Going from trees to forest – Random Forest
13. Growing extremely randomized trees
14. Growing a rotation forest

15. Using perceptron as an online linear algorithm
16. Using stochastic gradient descent for regression
17. Using stochastic gradient descent for classification
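
As a rough illustration of the stochastic gradient descent theme for regression, the sketch below fits a line y = w*x + b by updating the parameters after every single example. The function name, learning rate, epoch count, and noiseless toy data are assumptions chosen so that the example converges; real data would need tuning.

```python
import random


def sgd_linear_regression(samples, lr=0.05, epochs=500, seed=0):
    """Fit y = w * x + b with per-example squared-error gradient steps."""
    rng = random.Random(seed)
    data = list(samples)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)              # visit the examples in random order
        for x, y in data:
            error = (w * x + b) - y    # prediction error on this example
            w -= lr * error * x        # gradient of 0.5 * error**2 w.r.t. w
            b -= lr * error            # gradient of 0.5 * error**2 w.r.t. b
    return w, b


# Noiseless toy data generated from y = 2x + 1.
samples = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
w, b = sgd_linear_regression(samples)
print(round(w, 3), round(b, 3))
```

Replacing the squared-error gradient with the gradient of a classification loss gives the corresponding classifier, which is why one framework covers several linear models.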
