DSRT - 734 - Residency Week - Second - Presentation
Information Sciences
UC Residency Schedule Feb 14 – 16, 2020
2020 Summer_MAIN_DSRT 734-Inferential Statistics in Decision Making
Residency Rules
• Residency session is 60% of the course learning experience
• The assignment includes research/case analysis/industry project
– Students should complete it collaboratively in groups
– 4 students per group (max)
• The deliverable of the project includes
– the group presentation
– the research paper by each individual
• The research paper is 10-15 pages, to be submitted by Week 14.
• Final Presentations are:
– 20 minutes for each group
– Delivered in the final session 8am - 1:00pm on Sunday.
Statistical Methods Review
• Some well-known statistical tests and procedures are:
– Analysis of variance (ANOVA)
– Chi-squared test
– Correlation
– Factor analysis
– Mann–Whitney U
– Mean square weighted deviation (MSWD)
– Pearson product-moment correlation coefficient
– Regression analysis
– Etc.
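Two of the procedures listed above can be sketched in a few lines. This is a minimal illustration, assuming plain Python lists as input; the helper names (pearson_r, chi_squared_statistic) and the toy numbers are hypothetical and not part of the course material. It computes only the test statistics, not p-values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def chi_squared_statistic(table):
    """Chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical toy data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson_r(hours, scores)

# Hypothetical 2x2 contingency table of observed counts
table = [[30, 10], [20, 40]]
chi2 = chi_squared_statistic(table)

print(f"Pearson r = {r:.3f}, chi-squared = {chi2:.3f}")
```

In practice a statistics library would normally be used for these tests, since it also reports p-values and handles edge cases such as zero variance.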
Residency Grading
• Pre-Residency Assessment (10 points)
– Due: 6:00 pm Friday June 5th
– Requirements: Complete pre-residency assessment; Quiz #2
– Output: Fill out questionnaire in iLearn
• Identify a data set for Analysis and Define the Statistical methods (20 points)
– Due: 11:00 am Saturday June 6th; each Group presents 7-8 mins
– Output: Each team explains the data set they choose, analyzes the data attributes, visualizes, correlates variables, proposes ML approach (~5 slides per team)
• Design Solution (40 points)
– Due: 9:00 am – 12:00 noon Sunday June 7th; 20 minutes per Group
– Output: Each team presents full analysis with comparison of ML algorithms
• Post-Residency Assessment (10 points)
– Due: 1:00 pm Sunday June 7th
– Requirements: Complete post-residency assessment
– Output: Fill out questionnaire in iLearn
Agenda - Overall
Friday June 5th
5:00 pm Introductions
5:30 pm Pre-Residency assessment
6:00 pm Machine Learning Review
6:45 pm Group Planning & Q&A
7:25 pm Q&A Session – Begin Research
10:00 pm Session End
Saturday June 6th
8:00 am Effective Presentations
8:30 am Continue research, presentation
11:00 am Group presentations – explain, analyze, and visualize your data set, and describe how you will analyze the data
12:30 pm Feedback
12:45 pm Lunch
1:45 pm Continue Group work
7:30 pm Session End
Sunday June 7th
8:00 am Group Presentations
• Each Group presents 20 mins – Each group is graded on how they present their results
• Q&A
12:30 pm Post-Residency Assessment
1:00 pm Wrap Up
Friday June 5th
5:00 pm Introductions / Pre-Residency assessment
Introductions
• Everyone introduces themselves
– Name
– Professional background
– Why are you taking this course?
– What have you learnt so far?
– Skills & preferences: Lead? Present? Process data? Design charts and slides?
Pre-Residency assessment
• Go into iLearn and fill out the Pre-assessment
• How does this course help you professionally?
• What are your course expectations?
1. Heart Disease Dataset
https://www.kaggle.com/ronitf/heart-disease-uci
International Football Results Dataset
• This dataset includes 39,669 results of international football matches, starting from the very first official match in 1872 up to 2018.
• The matches range from FIFA World Cup to FIFI Wild Cup to regular friendly matches.
• The matches are strictly men's full internationals, and the data does not include Olympic Games or matches where at least one of the teams was the nation's B-team, U-23 or a league select team.
• results.csv includes the following columns:
– date - date of the match
– home_team - the name of the home team
– away_team - the name of the away team
– home_score - full-time home team score including extra time, not including penalty-shootouts
– away_score - full-time away team score including extra time, not including penalty-shootouts
– tournament - the name of the tournament
– city - the name of the city/town/administrative unit where the match was played
– country - the name of the country where the match was played
– neutral - TRUE/FALSE column indicating whether the match was played at a neutral venue
• Goal: Soccer predictions for the FIFA World Cup 2018
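As a starting point for the prediction goal, a match outcome label can be derived from the score columns listed above. This is a minimal sketch: the sample rows, team names, and the function names (load_matches, outcome) are hypothetical, and a real analysis would read the downloaded results.csv instead of the inline text.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows in the results.csv column layout described above
SAMPLE_CSV = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
2018-01-01,Team A,Team B,2,1,Friendly,City X,Country A,FALSE
2018-01-02,Team C,Team D,0,0,Friendly,City Y,Country E,TRUE
2018-01-03,Team B,Team C,1,3,Friendly,City Z,Country B,FALSE
"""


def load_matches(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def outcome(match):
    """Label a match as home_win, away_win, or draw from the full-time scores."""
    home, away = int(match["home_score"]), int(match["away_score"])
    if home > away:
        return "home_win"
    if home < away:
        return "away_win"
    return "draw"


matches = load_matches(SAMPLE_CSV)
labels = [outcome(m) for m in matches]
label_counts = Counter(labels)

# The neutral column is stored as the text TRUE/FALSE
neutral_matches = [m for m in matches if m["neutral"] == "TRUE"]

print(label_counts)
```

The resulting label can serve as the target variable for a classifier, with features such as the tournament or the neutral-venue flag.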
4. Students Performance in Exams
https://www.kaggle.com/spscientist/students-performance-in-exams/home
• Context
– Marks secured by the students
• Content
– This data set consists of the marks secured by the students in various subjects.
• Acknowledgements
– http://roycekimmons.com/tools/generated_data/exams
• Inspiration
– Understand the influence of the parents' background, test preparation, etc. on students' performance
– Using the student test results, create a fictitious pass/fail variable.
– Predict whether a student passes or fails using any classification method
https://www.kaggle.com/katiej277/classification-in-r-logistic-regression-and-lda
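The fictitious pass/fail variable could be created along these lines. This is a sketch under stated assumptions: the column names ("math score", "reading score", "writing score") are assumed to match the Kaggle file, the pass mark of 50 on the average score is a hypothetical choice, and the sample students are made up.

```python
SCORE_COLUMNS = ["math score", "reading score", "writing score"]  # assumed column names
PASS_MARK = 50  # hypothetical threshold on the average of the three scores

# Made-up sample records for illustration
students = [
    {"math score": 72, "reading score": 72, "writing score": 74},
    {"math score": 40, "reading score": 45, "writing score": 38},
    {"math score": 49, "reading score": 55, "writing score": 52},
]


def add_pass_fail(student, pass_mark=PASS_MARK):
    """Return a copy of the record with the average score and a pass/fail label."""
    scores = [student[column] for column in SCORE_COLUMNS]
    average = sum(scores) / len(scores)
    result = "pass" if average >= pass_mark else "fail"
    return {**student, "average": average, "result": result}


labelled = [add_pass_fail(s) for s in students]
results = [s["result"] for s in labelled]

print(results)
```

The result column then becomes the target for a classification method such as logistic regression.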
5. Bank Marketing
https://www.kaggle.com/henriqueyamahata/bank-marketing
Bank client data:
• Age (numeric)
• Job : type of job (categorical: 'admin.', 'blue-collar', 'entrepreneur', 'housemaid', 'management',
'retired', 'self-employed', 'services', 'student', 'technician', 'unemployed', 'unknown')
• Marital: marital status (categorical: 'divorced', 'married', 'single', 'unknown'; note: 'divorced' means divorced or widowed)
• Education (categorical: 'basic.4y', 'basic.6y', 'basic.9y', 'high.school', 'illiterate',
'professional.course', 'university.degree', 'unknown')
• Default: has credit in default? (categorical: 'no', 'yes', 'unknown')
• Housing: has housing loan? (categorical: 'no', 'yes', 'unknown')
• Loan: has personal loan? (categorical: 'no', 'yes', 'unknown')
• y (target): has the client subscribed to a term deposit? (binary: 'yes', 'no')
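Most classification methods need the categorical attributes above converted to numbers. One common option is one-hot encoding, sketched below; the helper names (one_hot, encode_target) and the sample client are hypothetical, and only two of the categorical attributes are encoded to keep the example short.

```python
# Category levels copied from the attribute descriptions above
MARITAL_LEVELS = ["divorced", "married", "single", "unknown"]
HOUSING_LEVELS = ["no", "yes", "unknown"]


def one_hot(value, levels):
    """Encode a categorical value as a 0/1 list with one position per level."""
    if value not in levels:
        raise ValueError(f"unexpected category: {value!r}")
    return [1 if value == level else 0 for level in levels]


def encode_target(y):
    """Map the binary target 'yes'/'no' to 1/0."""
    return {"yes": 1, "no": 0}[y]


# Made-up client record for illustration
client = {"age": 41, "marital": "married", "housing": "yes", "y": "no"}

features = (
    [client["age"]]
    + one_hot(client["marital"], MARITAL_LEVELS)
    + one_hot(client["housing"], HOUSING_LEVELS)
)
target = encode_target(client["y"])

print(features, target)
```

Whether 'unknown' should stay as its own level or be treated as missing data is a modelling decision for each team.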
6. Breast Cancer Wisconsin (Diagnostic) Data
https://www.kaggle.com/uciml/breast-cancer-wisconsin-data
7. IBM HR Analytics Employee Attrition & Performance
https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset
• Uncover the factors that lead to employee attrition and explore important questions
such as ‘show me a breakdown of distance from home by job role and attrition’ or
‘compare average monthly income by education and attrition’.
This is a fictional data set created by IBM data scientists.
• Education: 1 'Below College', 2 'College', 3 'Bachelor', 4 'Master', 5 'Doctor'
• Environment Satisfaction: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'
• Job Involvement: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'
• Job Satisfaction: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'
• Performance Rating: 1 'Low', 2 'Good', 3 'Excellent', 4 'Outstanding'
• Relationship Satisfaction: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'
• Work Life Balance: 1 'Bad', 2 'Good', 3 'Better', 4 'Best'
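A question such as "compare average monthly income by education and attrition" can be answered by decoding the numeric levels and grouping. The sketch below assumes the column names Education, Attrition, and MonthlyIncome; the employee rows and the function name are hypothetical.

```python
from collections import defaultdict

# Code-to-label mapping taken from the Education levels listed above
EDUCATION = {1: "Below College", 2: "College", 3: "Bachelor", 4: "Master", 5: "Doctor"}

# Made-up employee rows; column names are assumed to match the dataset
employees = [
    {"Education": 3, "Attrition": "Yes", "MonthlyIncome": 3000},
    {"Education": 3, "Attrition": "No", "MonthlyIncome": 5000},
    {"Education": 3, "Attrition": "No", "MonthlyIncome": 7000},
    {"Education": 4, "Attrition": "No", "MonthlyIncome": 8000},
]


def average_income_by_education_and_attrition(rows):
    """Average MonthlyIncome for each (education label, attrition) group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [income sum, row count]
    for row in rows:
        key = (EDUCATION[row["Education"]], row["Attrition"])
        totals[key][0] += row["MonthlyIncome"]
        totals[key][1] += 1
    return {key: income_sum / count for key, (income_sum, count) in totals.items()}


averages = average_income_by_education_and_attrition(employees)

for (education, attrition), income in sorted(averages.items()):
    print(f"{education:<14} attrition={attrition:<3} average income={income:.0f}")
```

The same grouping pattern applies to the other coded attributes, for example Job Satisfaction by attrition.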
8. House Sales in King County, USA
https://www.kaggle.com/harlfoxem/housesalesprediction
Questions?