Programming For Data Analytics Introduction
Python for Data Science/Analytics
Spring 2019
Week 1 ~ An Introduction
Objectives for Week 1
Summer 2019
Comparing GitHub stars and contributors for different open source tools
In November 2016, scikit-learn became the No. 1 open source machine learning project for Python, according to KDnuggets.
scikit-learn is a high-level library that implements supervised and unsupervised machine learning algorithms.
• Run a Python script with python hello.py from the command prompt (cmd), or with import hello from the Python prompt
• Use help() whenever needed
• Use quit() to quit
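The slides do not show the contents of hello.py, so the file below is a hypothetical example. It is a small sketch of how the two ways of running a script differ: top-level statements run in both cases, while the guarded block runs only with python hello.py.

```python
# hello.py: hypothetical contents; the slides do not show this file


def greet(name="world"):
    """Build a greeting string."""
    return f"Hello, {name}!"


# Top-level statements run both with `python hello.py` and with `import hello`.
print(greet())

# This block runs only when the file is executed as a script.
if __name__ == "__main__":
    print("Running as a script")
```

After import hello at the Python prompt, help(hello.greet) shows the docstring of the function.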
Python script (.py), IPython, !type, %run
• Use Notepad (or IDLE) to save these two lines
import sys
print('version is', sys.version)
into a file named sys-version.py
• This is a Python script
• It is editable with a text editor such as Notepad
• Start IPython (type ipython at the cmd prompt)
• Use !type sys-version.py to display the file's contents (type is the Windows command; on macOS/Linux use !cat)
• Run the script with %run sys-version.py
• Use quit() to quit
• Again, outside the Python prompt, use python sys-version.py to run the script
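For comparison, the same script can also be launched from Python code. The sketch below is not part of the slides: it writes the two-line script to a temporary folder and runs it in a new interpreter, which is roughly what typing python sys-version.py at the command prompt does. The helper name run_script is made up for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# The two-line script from the slide.
SCRIPT_SOURCE = "import sys\nprint('version is', sys.version)\n"


def run_script(path):
    """Run a Python script in a new interpreter and return what it printed."""
    result = subprocess.run(
        [sys.executable, str(path)],  # same interpreter that runs this code
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Write the script to a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as folder:
    script = Path(folder) / "sys-version.py"
    script.write_text(SCRIPT_SOURCE)
    output = run_script(script)

print(output)
```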
Testing Jupyter Notebook
• The Jupyter Notebook is an interactive environment for
running code in the browser.
• It is a great tool for exploratory data analysis and is
widely used by data scientists.
• A foundational tool for learning, research, computing,
and data-powered communications.
• The primary tool for INFS 772 ~ Python Programming for
Data Analytics.
• Download the introduction_to_notebook file from the D2L course site under Python Files.
• We are going to test the Jupyter Notebook file.
• Do not open Jupyter Notebook files with Notepad! Make sure your Jupyter Notebook server is running, and then open your notebook files from the browser.
Work with a Notebook (browser-based)
• Start a Notebook
• Open a terminal/command prompt and type jupyter notebook
• Or you can open it via the menu of Anaconda programs
• Notebook will open at http://127.0.0.1:8888
• Exit by closing the browser, then pressing Ctrl+C in the terminal window
1. Quantitative,
2. Utilizing open-source/high-level libraries, APIs, and pre-trained models,
3. Communicating verbally and visually,
4. Learning the latest advances.
VGG-16: a deep convolutional neural network with 16 weight layers
Your job as a data scientist:
1. Understand the problem and your
data (business processes)
2. Understand the nature of the model
you use (how it works)
3. Utilize pre-trained models and APIs
4. Fine-tune the learning process with
a state-of-the-art optimizer
5. Evaluate the model
6. Test the model with your new data
7. Predict! Or deploy the model
8. Communicate the results!
Instructions on Assignment 1
1. Download the data files from the UCI Machine Learning Repository before you use them in your Python program.
Make sure they are in your working directory (for me, it is C:\Users\dzeng\772 Programming for Data Analytics). This is where your Jupyter notebook file is.
If you are not sure where that is, use
import os
os.getcwd()
to display it.
They are plain-text files, but make sure they have the correct extensions (.data and .attributes) when you save them. More specifically, rename the .names file to .attributes when you save it on your machine. The files are available at:
https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/
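The check described in step 1 can be sketched as follows. The slides do not name the files, so the file names below are assumptions; change them to match what you actually downloaded. The helper prepare_data_files is made up for illustration. It renames the .names file to .attributes if needed and reports which of the two expected files are present.

```python
import os
from pathlib import Path

# Assumed file names (not stated in the slides); adjust to what you saved.
DATA_FILE = "agaricus-lepiota.data"
NAMES_FILE = "agaricus-lepiota.names"
ATTRIBUTES_FILE = "agaricus-lepiota.attributes"


def prepare_data_files(directory):
    """Rename .names to .attributes if needed; report which files exist."""
    directory = Path(directory)
    names_path = directory / NAMES_FILE
    attributes_path = directory / ATTRIBUTES_FILE
    # Only rename when the .attributes file is not already there.
    if names_path.exists() and not attributes_path.exists():
        names_path.rename(attributes_path)
    return {
        name: (directory / name).exists()
        for name in (DATA_FILE, ATTRIBUTES_FILE)
    }


if __name__ == "__main__":
    print("Working directory:", os.getcwd())
    print(prepare_data_files(os.getcwd()))
```

If either value in the printed dictionary is False, the corresponding file is not in the working directory yet.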
2. Download and install a separate module called “simple_ml”.
It is on D2L under Course Files. Download it into one of the following folders (the paths shown are for my machine):
C:\Users\dzeng\AppData\Local\Programs\Python\Python36\Lib
if you installed Python separately, or
C:\Users\dzeng\AppData\Local\Continuum\Anaconda3\Lib\site-packages\
if you use Anaconda3.
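One way to confirm that the module landed in the right place is to ask Python whether it can find it. This is a minimal sketch using only the standard library; the helper name module_is_installed is made up for illustration, and the printed folder will differ from the paths above on your machine.

```python
import importlib.util
import sysconfig


def module_is_installed(module_name):
    """Return True if Python can find the module on its import path."""
    return importlib.util.find_spec(module_name) is not None


# Folder where third-party modules are normally installed for this interpreter.
print("site-packages:", sysconfig.get_paths()["purelib"])
print("simple_ml found:", module_is_installed("simple_ml"))
```

If the second line prints False, the file is probably in a folder that this interpreter does not search.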
3. The code cells depend heavily on one another. You (almost) have to run them one by one from the beginning in order to run the cells related to Assignment 1 later in the file.
4. Remove the two exception-handling examples in the file
Outline for Week 3
• Review: basic data structures: list, tuple, dict
• for loop
• Functions (lambda function/operator)
• Classes
• Assignment 1
• Defining and calling functions
• Starting point:
single_instance_list : a list of attribute values of a single instance
attribute_names : a list of attribute names
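As a sketch of defining and calling a function with these two lists, the hypothetical helper below pairs each attribute name with the matching value. The function name and the sample values are made up for illustration; they are not the assignment's required solution.

```python
def describe_instance(single_instance_list, attribute_names):
    """Pair each attribute name with the matching value of one instance."""
    if len(single_instance_list) != len(attribute_names):
        raise ValueError("expected one value per attribute name")
    return dict(zip(attribute_names, single_instance_list))


# Illustrative values only; the real lists come from the data files.
attribute_names = ["class", "cap-shape", "cap-color"]
single_instance_list = ["p", "x", "n"]

instance = describe_instance(single_instance_list, attribute_names)
print(instance)
```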
File I/O
Week | Date | Topics | Reading/Working Document | Assignments
1 | 1/10 | Introduction, Python/Jupyter Notebook Setup, System Testing | PyData: Chapter 2; Introduction to Python and Jupyter Notebook |
2 | 1/17 | Data Structures, Functions, and Files | PyData: Chapter 3 | Assignment 1 out
3 | 1/24 | NumPy Basics | PyData: Chapter 4 |
4 | 1/31 | Getting Started with pandas | PyData: Chapter 5; 10 minutes to pandas | Assignment 2 out
5 | 2/7 | Input/Output Tools | PyData: Chapter 6 | Assignment 1 due
6 | 2/14 | Data Cleaning and Preparation | PyData: Chapter 7 |
7 | 2/21 | Data Wrangling: Join, Combine, and Reshape | PyData: Chapter 8 |
8 | 2/28 | Plotting and Visualization | PyData: Chapter 9 | Assignment 2 due; Assignment 3 out
9 | 3/7 | Spring Break | |
10 | 3/14 | Introduction to scikit-learn I: A First Application: Classifying Iris Species | MLPy: Chapter 1 |
11 | 3/21 | Introduction to scikit-learn II: k-Nearest Neighbors | MLPy: Chapter 2: p37 - 42 |
12 | 3/28 | Introduction to scikit-learn III: Linear Models for Classification | MLPy: Chapter 2: p58 - 69 | Assignment 3 due; Assignment 4 out; Project out
13 | 4/4 | Introduction to scikit-learn IV: Neural Networks | MLPy: Chapter 2: p106 - 119 |
14 | 4/11 | Introduction to keras I | Introducing Keras: deep learning with Python; DLPy: Chapter 1 |
15 | 4/18 | Introduction to keras II | Classifying movie reviews: a binary classification example; DLPy: Chapter 3 | Assignment 4 due
16 | 4/25 | Review/Future Trends of Machine/Deep Learning/AI | | Project due
Review
List
Lists are very similar to strings, except that each element can be of any type.
The syntax for creating lists in Python is [...]
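A short illustration of the points above; the values are arbitrary.

```python
# A list can mix element types, including another list.
mixed = [1, "two", 3.0, [4, 5]]

# Indexing and slicing work as they do for strings.
second = mixed[1]
first_two = mixed[:2]

# Unlike strings, lists are mutable: elements can be replaced or added.
mixed[0] = "one"
mixed.append(None)

print(second, first_two, mixed)
```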
Dictionaries
Dictionaries are also like lists, except that each element is a key-value pair. The syntax for
dictionaries is {key1 : value1, ...}
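A short illustration of the dictionary syntax; the keys and values are arbitrary.

```python
# Each element is a key-value pair.
params = {"parameter1": 1.0, "parameter2": 2.0}

# Values are looked up by key rather than by position.
first = params["parameter1"]

# Assigning to a new key adds a pair; assigning to an existing key replaces its value.
params["parameter3"] = 3.0
params["parameter1"] = "A"

# Iterate over the key-value pairs.
for key, value in params.items():
    print(key, "=", value)
```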
Classes
• Classes provide a means of bundling data and functionality together.
• Creating a new class creates a new type of object, allowing new
instances of that type to be made.
• Each class instance can have attributes attached to it for maintaining
its state.
• Class instances can also have methods (defined by its class) for
modifying its state.
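The four points above can be seen in a small sketch. The Point class is a made-up example, not taken from the course files.

```python
class Point:
    """A simple class bundling two coordinates with methods that use them."""

    def __init__(self, x, y):
        # Attributes attached to the instance hold its state.
        self.x = x
        self.y = y

    def translate(self, dx, dy):
        """Modify the instance's state by shifting both coordinates."""
        self.x += dx
        self.y += dy

    def __str__(self):
        return f"Point at [{self.x}, {self.y}]"


# Creating instances of the new type; each keeps its own state.
p1 = Point(0, 0)
p2 = Point(1, 1)
p1.translate(0.25, 1.5)

print(p1)
print(p2)
```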