Applied Data Science Module
world problems through data analysis. Our hands-on approach ensures the skills students acquire
translate seamlessly into the workplace.
Across both units in the Module, students gain a comprehensive introduction to scientific
computing, Python, and the related tools data scientists use to succeed in their work. Students
will develop machine learning and statistical analysis skills through hands-on practice with
open-ended investigations of real-world data.
All students receive complimentary access to a ready-to-use Python environment for the entire
Module. This gives students first-hand experience with Python, pandas, and Jupyter Notebooks and
lets them immerse themselves immediately in new data science problems.
Learn from the Best
The Applied Data Science Module is built by WorldQuant University’s partner, The Data
Incubator, a fellowship program that trains data scientists. Graduates earn a Credly badge upon
completion of each unit to share and celebrate their professional development.
Next Deadline
September 5, 2021
Cost
Free
Length
8 or 16 weeks
Applicant Requirements
Experience with algebraic concepts and basic Python
Commitment
10-12 hours a week
Award
Credly Badge
The Applied Data Science Module is delivered online to enable students to participate in a flexible yet
rigorous continuing education program to amplify their skills and knowledge. To apply, applicants fill out
a profile on their educational history and technical skillset, which takes about 20 minutes to complete.
The Module
Across two units and sixteen weeks, students learn to source data relevant to a business problem
or task, to summarize data in aggregate statistics and visualizations, and to model trends to
showcase insights and make practical business decisions.
Students who successfully complete Unit I are eligible to enroll in Unit II. Students who
complete either unit earn a badge from Credly, the recognized leader in skills credentialing.
Skills Used:
Data Wrangling
Basic + Advanced Data Analysis
Python Basic Syntax + Data Structures
Object Oriented Programming
CSV
For Loops
While Loops
JSON
NumPy
pandas
SQL
Unit I Projects
Project 1
In this project students use Python to compute Mersenne numbers and apply the Lucas-Lehmer test
to identify which of them are prime. They implement their solution with Python data structures and
core programming constructs such as loops. Students also implement the Sieve of Eratosthenes as a
faster way to find prime numbers, learning why an algorithm's time complexity matters.
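To make the two algorithms concrete, here is a minimal sketch in plain Python. It is not the course's reference solution: the function names are hypothetical, and it assumes one plausible way of combining the pieces, using the sieve to generate candidate exponents p (2^p - 1 can only be prime when p itself is prime) and the Lucas-Lehmer test to check each Mersenne number.

```python
def sieve_of_eratosthenes(limit):
    """Return all primes <= limit (hypothetical helper name)."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Smaller multiples of p were already crossed off by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n, prime in enumerate(is_prime) if prime]


def lucas_lehmer(p):
    """Return True if the Mersenne number 2**p - 1 is prime, for an odd prime p."""
    m = 2 ** p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def mersenne_primes(max_exponent):
    """Return the Mersenne primes 2**p - 1 with p <= max_exponent."""
    found = []
    for p in sieve_of_eratosthenes(max_exponent):
        if p == 2:
            found.append(3)  # the Lucas-Lehmer test only covers odd p; 2**2 - 1 = 3 is prime
        elif lucas_lehmer(p):
            found.append(2 ** p - 1)
    return found


print(mersenne_primes(31))  # [3, 7, 31, 127, 8191, 131071, 524287, 2147483647]
```

The sieve does roughly n log log n work to find every prime up to n, compared with testing each candidate separately by trial division, which is the kind of time-complexity trade-off the project highlights.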
Project 2
In this project students use Object Oriented Programming to create a class that represents a
geometric point. They define methods for common operations on points, such as adding two points
together and finding the distance between two points. Finally, they write a K-means clustering
algorithm that uses the previously defined point class.
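A minimal sketch of how a point class can support K-means follows. The class layout, method names, random initialisation, and stopping rule are illustrative assumptions rather than the project's required design.

```python
import math
import random


class Point:
    """A 2-D point supporting addition, division by a scalar, and distance."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return Point(self.x + other.x, self.y + other.y)

    def __truediv__(self, k):
        return Point(self.x / k, self.y / k)

    def distance(self, other):
        # Euclidean distance between the two points.
        return math.hypot(self.x - other.x, self.y - other.y)

    def __repr__(self):
        return f"Point({self.x}, {self.y})"


def k_means(points, k, max_iterations=100, seed=0):
    """Cluster points into k groups; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: p.distance(centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = sum(members, Point(0, 0)) / len(members)
    return centroids, assignments


pts = [Point(0, 0), Point(0, 1), Point(1, 0), Point(10, 10), Point(10, 11), Point(11, 10)]
centroids, labels = k_means(pts, 2)
print(centroids, labels)
```

Defining `__add__` and `__truediv__` is what lets the centroid update be written as `sum(members, Point(0, 0)) / len(members)`, reusing the point operations instead of handling coordinates by hand.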
Project 3
In this project students use basic Python data structures, functions, and control flow to answer
questions about prescription drug data from the British NHS. They also work with fundamental data
wrangling techniques such as joining data sets together, splitting data into groups, and
aggregating data into summary statistics.
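The three wrangling steps can be sketched with only dictionaries and loops. The records, field names (`practice`, `items`, `cost`, `code`, `name`), and practice names below are hypothetical stand-ins, not the real NHS schema.

```python
from collections import defaultdict

# Hypothetical rows standing in for the prescription and practice tables.
prescriptions = [
    {"practice": "P1", "drug": "drug_a", "items": 10, "cost": 25.0},
    {"practice": "P1", "drug": "drug_b", "items": 5, "cost": 40.0},
    {"practice": "P2", "drug": "drug_a", "items": 3, "cost": 7.5},
]
practices = [
    {"code": "P1", "name": "Practice One"},
    {"code": "P2", "name": "Practice Two"},
]

# Join: index practices by code so each prescription can look up its practice.
practice_names = {row["code"]: row["name"] for row in practices}

# Split into groups and aggregate: total items and total cost per practice.
totals = defaultdict(lambda: {"items": 0, "cost": 0.0})
for row in prescriptions:
    group = totals[row["practice"]]
    group["items"] += row["items"]
    group["cost"] += row["cost"]

# Attach practice names to the aggregated statistics.
summary = {practice_names[code]: stats for code, stats in totals.items()}
print(summary)
```

Building a lookup dictionary first makes the join a constant-time lookup per row rather than a scan of the practice list for every prescription.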
Project 4
In this project students use the Python package pandas to perform data analysis on a prescription
drug data set from the British NHS. They answer questions such as which medical practices prescribe
opioids at an unusually high rate and which practices prescribe substantially more rare drugs than
the rest of the medical practices. They also use statistical concepts such as the z-score to help
identify these practices.
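As an illustration of the z-score idea in pandas, the sketch below uses a small made-up table; the column names and numbers are hypothetical, and the project's exact formula (for example, which standard deviation it divides by) may differ.

```python
import pandas as pd

# Hypothetical per-practice totals; column names are illustrative only.
df = pd.DataFrame({
    "practice": ["P1", "P2", "P3", "P4", "P5"],
    "opioid_items": [12, 30, 15, 90, 20],
    "total_items": [400, 600, 500, 900, 450],
})

# Share of each practice's prescriptions that are opioids.
df["opioid_rate"] = df["opioid_items"] / df["total_items"]

# z-score: how many standard deviations each rate lies from the mean rate.
# Series.std() uses the sample standard deviation (ddof=1) by default.
mean_rate = df["opioid_rate"].mean()
std_rate = df["opioid_rate"].std()
df["z_score"] = (df["opioid_rate"] - mean_rate) / std_rate

print(df.sort_values("z_score", ascending=False).head(3))
```

Practices with large positive z-scores, here `P4` with a rate of 0.10 against rates of roughly 0.03 to 0.05 elsewhere, are the ones prescribing at an unusually high rate relative to their peers.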
Unit II Projects
Project 1
In this project students work with nursing home inspection data from the United States,
predicting which providers may be fined and for how much. They use the scikit-learn Python
package to construct progressively more complicated machine learning models. They also
impute missing values, apply feature engineering, and encode categorical data.
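The preprocessing steps named above map naturally onto a scikit-learn pipeline. This is a hedged sketch on invented data: the feature names, the engineered `deficiencies_per_bed` column, and the choice of `Ridge` for predicting fine amounts are all assumptions, not the course's actual model.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical inspection features and fine amounts (not the real schema).
X = pd.DataFrame({
    "num_deficiencies": [3, 10, np.nan, 7, 1, 12],
    "num_beds": [120, 80, 60, np.nan, 150, 90],
    "ownership": ["For profit", "Non profit", "Government", "For profit", np.nan, "For profit"],
})
y = np.array([0.0, 15000.0, 2000.0, 8000.0, 0.0, 20000.0])

# Feature engineering: a hypothetical ratio feature (NaN wherever an input is missing).
X["deficiencies_per_bed"] = X["num_deficiencies"] / X["num_beds"]

numeric = ["num_deficiencies", "num_beds", "deficiencies_per_bed"]
categorical = ["ownership"]

preprocess = ColumnTransformer([
    # Impute missing numeric values with the column median.
    ("num", SimpleImputer(strategy="median"), numeric),
    # Impute missing categories with the most frequent one, then one-hot encode.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

model = Pipeline([("preprocess", preprocess), ("regressor", Ridge(alpha=1.0))])
model.fit(X, y)
print(model.predict(X.head(2)))
```

Keeping imputation and encoding inside the pipeline means the same transformations learned on training data are applied automatically at prediction time, and a more complex estimator can be swapped in for `Ridge` without touching the preprocessing.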
Project 2
In this project students use natural language processing to train various machine learning models
to predict an Amazon review rating based on the text of the review. Further, they use one of the
trained models to gain insight into the reviews by identifying highly polar words, the words that
most strongly influence the model's prediction.