CME538 Lecture 1 Slide 1
Normal
▪ Each year at SickKids 700
critically ill children suffer
arrhythmias.
Not Normal
▪ In 2020, arrhythmia was
directly implicated as the
cause of morbidity and
mortality in 114 children.
▪ Time to diagnosis is the
most important factor in
determining patient
outcomes.
The Critical Care Unit
Reality
19 Rooms
42 Beds
2 Staff Physicians Working
The Critical Care Unit
AI
19 Rooms
42 Beds
Mjaye Trains Expert AI system for Arrhythmia Detection & Diagnosis
How is this task done today?
Electrophysiologist
Staff Physician
Fellow
Nurse
Nurse
JET Onset → 30 minutes later → 1 hour later → 12 hours later → 1 hour later → Treatment begins
How is this task done with AI?
AI Diagnosis: JET 98%
Staff Physician
AI Detects JET
Nurse Pages Mjaye
Confirm 12-Lead
What is the impact?
Onset Hours
Treatment
Understand the world (Science)
Solve problems (Engineering)
Machine
Learning
Drew Conway’s Venn Diagram of Data Science.
▪ Climate Scientist.
▪ Investor.
▪ Medical Doctor.
Machine Learning
results in misuse.
X → f → y
Inputs (X):
▪ Blood Pressure
▪ Heart Rate
▪ Age
▪ Diagnosis
▪ Lab Tests (white blood cell count)
Model (f): 98% Accuracy
Output (y):
▪ Risk of Sepsis
Skills of Data Science
▪ Machine Learning.
▪ Danger Zone
Missing knowledge of the data.
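One way missing knowledge of the data can make a headline figure such as 98% accuracy misleading is class imbalance. The sketch below is an illustration under assumptions that are not on the slide: a hypothetical cohort of 1,000 patients in which 2% develop sepsis, and a trivial "model" f that always predicts no sepsis.

```python
# Hypothetical cohort: 1,000 patients, 20 of whom (2%) develop sepsis.
# These numbers are illustrative assumptions, not data from the lecture.
y_true = [1] * 20 + [0] * 980


def f(x):
    """A trivial 'model' that ignores its input and always predicts no sepsis."""
    return 0


# The inputs X are irrelevant to this model, so placeholders are enough.
X = [None] * len(y_true)
y_pred = [f(x) for x in X]

# Accuracy: fraction of patients labelled correctly.
accuracy = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)

# Recall (sensitivity): fraction of true sepsis cases the model catches.
true_positives = sum(p == 1 and t == 1 for p, t in zip(y_pred, y_true))
recall = true_positives / sum(y_true)

print(f"accuracy = {accuracy:.0%}, recall = {recall:.0%}")
# prints: accuracy = 98%, recall = 0%
```

Someone who knows how rare the outcome is would ask for recall before trusting the accuracy figure.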
Skills of Data Science
▪ Machine Learning.
▪ Danger Zone
Missing knowledge of how the model would be implemented.
Buzz-Word Overload
▪ Data Science
▪ An inter-disciplinary field that uses scientific methods, processes,
algorithms and systems to extract knowledge and insights from data.
Diagram labels: Domain Expertise, Danger Zone!, Data Analyst, Data Science, Machine Learning
Buzz-Word Overload
▪ Data Analytics
▪ Data Analytics is often
conducted with a specific goal
in mind.
▪ With Data Analytics,
information is often split into
two groups: what companies
know and what they are
aware that they do not know.
▪ Employing Data Analytics, a
company can sort through
data to find specific insights
targeted to its needs and
goals.
Buzz-Word Overload
▪ Big Data
▪ Big data is a term that describes
the large volume of data that
inundates a business on a day-
to-day basis.
▪ Four V’s of Big Data:
▪ Volume
▪ Velocity
▪ Variety
▪ Veracity
Buzz-Word Overload
▪ Data Engineering
▪ The development,
construction, testing
and maintenance of
architectures, such as
databases and large-
scale processing
systems.
▪ Data Engineers
transform data into a
useful format for
analysis.
Buzz-Word Overload
▪ Artificial Intelligence (AI)
▪ Any technique that enables
computers to execute tasks in an
intelligent manner:
▪ Robotics
▪ Expert Systems
▪ Natural Language Processing
▪ Machine Learning
“Just as electricity transformed almost everything 100 years ago, today
I actually have a hard time thinking of an industry that I don’t think AI
will transform in the next several years.”
- Andrew Ng
Buzz-Word Overload
▪ Machine Learning (ML)
▪ The science of getting computers to
act intelligently by learning from
examples and without being explicitly
programmed.
▪ Applications:
▪ Fraud Detection
▪ Spam Filtering
▪ Netflix and Amazon recommendations
▪ Models:
▪ Linear Regression
▪ Decision Trees
▪ Random Forests
▪ Neural Networks
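To make "learning from examples" concrete, here is a minimal sketch of the first model in the list, linear regression, fitted by ordinary least squares in plain Python. The example points are hypothetical. The rule behind them (y = 2x + 1) is never written into the learning code, which recovers it from the data alone.

```python
# Hypothetical examples of (input x, observed output y); the underlying
# rule y = 2x + 1 is never written into the learning code below.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope = covariance(x, y) / variance(x).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(examples)


def predict(x):
    """Apply the learned line to a new, unseen input."""
    return slope * x + intercept


print(slope, intercept, predict(10.0))
# prints: 2.0 1.0 21.0
```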
Buzz-Word Overload
▪ Deep Learning (DL)
▪ A sub-domain of machine
learning that uses deep neural
networks.
▪ Applications:
▪ Speech Recognition (Siri, Cortana,
Alexa)
▪ Natural Language Processing
(Google Translate)
▪ Face Recognition (iPhone X)
What does it mean to be a
Data Scientist today?
What does it mean to be a Data Scientist
▪ Asked people
involved with Data
Science to complete
an online survey.
▪ Self-reported →
selection bias.
First survey:
▪ 983 Respondents.
▪ 2016.
▪ Survey Bias: More ML focused.
Second survey:
▪ 19,717 Respondents.
▪ 2019 (more recent).
▪ Survey Bias: More ML focused.
▪ Charts focus on 21% with Data Scientist title.
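The effect of self-reporting on survey results can be sketched with a small simulation. Everything here is an assumption made for illustration: the synthetic salary distribution, and a response model in which higher earners are more likely to answer. Neither comes from the surveys above.

```python
import random

rng = random.Random(538)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 salaries (in $K); purely synthetic.
population = [rng.gauss(80, 20) for _ in range(100_000)]


def responds(salary):
    """Assumed response model: higher earners are more likely to answer."""
    probability = min(max((salary - 40) / 100, 0.0), 1.0)
    return rng.random() < probability


# Only people who choose to respond end up in the survey data.
respondents = [s for s in population if responds(s)]

population_mean = sum(population) / len(population)
survey_mean = sum(respondents) / len(respondents)

print(f"population mean: {population_mean:.1f}")
print(f"survey mean:     {survey_mean:.1f}")
```

Under this assumed response model the survey mean comes out higher than the population mean, even though every individual answer is truthful.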
Country
▪ The largest number of
responses to the
survey were from the
United States and
India. Brazil and
Russia were the next-
most common
locations. Countries
not shown (such as
many in Central
Africa) had no
responses.
Remember, self-reported → selection bias.
Country
▪ California has the
highest median salary
of any state or
country, even though
its per capita GDP
($62K) is not ranked
so high.
▪ The anomaly is likely
due to the San
Francisco Bay Area,
where per capita GDP
is $80K–$90K.
Remember, self-reported → selection bias.
Education
▪ The data scientist
community is highly
educated.
▪ Looking at only
employed data
scientists, over 70% of
respondents have a
degree above a
bachelor’s degree,
with a majority
(~52%) having a
master’s degree.
Remember, self-reported → selection bias.
Age
▪ Millennials dominate
data science, with
25-29-year-olds being
the most common age
bracket.
Salary
▪ United States Data
Scientists average
higher wages than
others surveyed,
followed by Germany
and Japan.
2012: Udacity
2012: Udemy
2013: Year of the anti-MOOC
2015+: MOOCs evolution
Applications in Civil & Mineral Engineering
▪ Mineral Engineering:
▪ Mineral exploration.
▪ Generative mine design.
▪ Equipment health monitoring.
▪ Mine site safety.
▪ Autonomous vehicles.
▪ Deep mine hazard assessment.
▪ Assisted core logging.
Prospectivity
Map
Applications in Civil & Mineral Engineering
▪ Mineral Engineering:
Figure labels: Operator, Oil Pressure & Temp, RPM
"A single missing
tooth can result in
productivity losses of
USD $430k per
incident"
ML Applications
Opportunities in Civil & Mineral Engineering
▪ Domain knowledge is essential for successful AI.
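# Import 3rd party dependencies
from keras import applications
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
# Load dataset (extract_transform_load_dataset is a placeholder defined elsewhere)
x_train, y_train, x_test, y_test, classes = extract_transform_load_dataset()
num_classes = len(classes)
# Load a pre-trained base model without its classification head
# (assumption: the slide does not show which base model is used; ResNet50 is an example)
base_model = applications.ResNet50(weights='imagenet', include_top=False)
# Pool the base model's feature maps into a single feature vector
out = GlobalAveragePooling2D()(base_model.output)
# Add softmax layer corresponding to the number of class labels in our dataset
out = Dense(num_classes, activation='softmax')(out)
# Define model
model = Model(inputs=base_model.input, outputs=out)
# Compile model (assumption: labels are one-hot encoded)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train model
model.fit(x_train, y_train, epochs=100, batch_size=64)
# Evaluate model (returns the loss and the compiled metrics, not predictions)
loss, accuracy = model.evaluate(x_test, y_test)
# Save model weights (recent Keras versions require the .weights.h5 suffix)
model.save_weights('model.weights.h5')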
Opportunities in CivMin
▪ Meetups
▪ Hackathons
▪ Startup Incubators
▪ Competitions #BIGMONEY
Opportunities in Civil & Mineral Engineering
▪ You’re early.
You’re Here
Data Science = Essential CivMin Skill
▪ You will need to be able to access, clean, explore, and interpret
data.
▪ Even if you’re not going to be the one building ML models, you
NEED to be able to speak the language.
- lucky you! ;)
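As a rough illustration of what "access, clean, explore" can look like in practice, here is a minimal sketch using only the Python standard library. The sensor export, its column names and its values are hypothetical.

```python
import csv
import io
import statistics

# Hypothetical sensor export; the column names and values are made up.
raw = """timestamp,oil_pressure_kpa
2022-09-12 08:00,412
2022-09-12 09:00,
2022-09-12 10:00,405
2022-09-12 11:00,n/a
2022-09-12 12:00,398
"""

# Access: parse the CSV text into a list of dictionaries.
rows = list(csv.DictReader(io.StringIO(raw)))


def to_float(text):
    """Return the value as a float, or None when it is missing or invalid."""
    try:
        return float(text)
    except ValueError:
        return None


# Clean: drop readings that are missing or not numeric.
values = (to_float(row["oil_pressure_kpa"]) for row in rows)
readings = [v for v in values if v is not None]

# Explore: summary statistics of the cleaned readings.
summary = {
    "count": len(readings),
    "mean": statistics.mean(readings),
    "min": min(readings),
    "max": max(readings),
}
print(summary)
```

Interpretation is the step that code cannot do for you. Here two of the five readings were unusable, and deciding whether that points to a sensor fault or a logging gap takes domain knowledge.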
CME538, about this course.
Teaching Team
▪ Sebastian Goodfellow (sebastian.goodfellow@utoronto.ca)
▪ Navid Kayhani (navid.kayhani@mail.utoronto.ca)
▪ Marc Saleh (marc.saleh@mail.utoronto.ca)
▪ Soroush Sobhkhiz (s.sobhkhiz@mail.utoronto.ca)
Live Lectures
Mondays
12-3 pm
Content Release
Mondays 1 am
Lecture Demo (link)
Tutorials
Content Release
Mondays 1 am
Tutorial Demo (link)
Assignments (35%)
Code Quality
Assignment Demo (link)
Project 1 (25%)
▪ In this project, you'll
compete against your
classmates in a private
CME538 Kaggle machine
learning competition.
▪ We will be releasing more
information on this project
over the coming weeks.
▪ The competition will start
on November 14th and
close on November 22nd.
▪ All code will be checked for
plagiarism.
Project 1 (25%)
Assignment 8 (Optional)
▪ Option 1
▪ Project 1 is worth 25%.
▪ Option 2
▪ Project 1 is worth 20%.
▪ Assignment 8 is worth 5%.
Project 2 (40%)
▪ In this project, you'll take everything you've learned in lectures,
tutorials, assignments and project 1, and venture out on your own.
▪ Teams of 4 (Teams must be assembled by Friday, October 7th).
▪ You have a lot of freedom with how you choose to approach the
project and with what libraries, visualizations, and models you
choose to use.
▪ Start Date: Now.
▪ Due Date: December 7th.
Project 2 (40%)
▪ Project Topic
▪ Option 1: Choose your own dataset,
problem, question.
▪ Option 2: Use the bikeshare dataset.
Project 2 (40%)
▪ Deliverables
▪ Project Proposal (Due November 4th, 10 marks)
▪ Medium Article (Due December 7th, 15 marks)
▪ Good: "The bar chart below shows a significant drop in rides on February 17.
Further investigation revealed this was the date of the famed 2006 blizzard, which
shut down the city. A newspaper article linked here documents over 60 cm of
snowfall and widespread power outages which resulted in the military being
called in."
▪ Bad: "The bar chart below shows a drop in rides in February."
▪ Live Presentation (Week 12, 10 marks)
▪ Publish GitHub Repository (Due December 7th, 5 marks)
▪ Your code and project structure will be evaluated.
▪ Ensure you're writing #cleancode and that anyone can easily reproduce your
results.
Industry Spotlights
Structure
▪ Lectures
▪ High-level overview of concepts and methods.
▪ Tutorials
▪ Step-by-step code-along.
▪ Assignments
▪ Individual work where students are guided through a
problem and must fill in answers.
Diagram label: Independence, Confidence & Context
▪ Project 1
▪ Test your ML skills.
▪ Project 2
▪ Completely open end-to-end Data Science project.
▪ Industry Spotlights
▪ Provide context (Why are you learning this stuff?).
This is going to be a lot of fun.
CME538 Introduction to Data Science
Week 1 | Lecture 1 (1.1)
Introduction to the Data Science landscape and its application in engineering.