Project Requirement
Xiao Lei
Project deliverables
(1) 15-minute (maximum) presentation in the last week.
(2) Five-page (maximum) project report, excluding code, tables, figures, etc.
Data Resources
Collecting your own data will be viewed as a bonus. In general, your data should have a
reasonable size (>100 rows).
Kaggle
https://github.com/awesomedata/awesome-public-datasets
Taxi Data: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
CMU statlib
Tianchi, KDD Cup, AI Challenger
UCI Machine Learning Repository
Google’s dataset search engine: https://datasetsearch.research.google.com/
Government datasets: Singapore: https://data.gov.sg/, Australia:
https://data.gov.au/, EU: https://data.europa.eu/data/datasets, New Zealand:
https://data.govt.nz/
https://machinelearningmastery.com/machine-learning-datasets-in-r/
The R dataset package:
https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html
Presentation
Presentations should summarize the contents of your project report. Be prepared to
clearly describe the key aspects of your project. What are the most important
real-world issues that your project addresses? What are the key strengths and weaknesses
of the approaches proposed for addressing your topic (if relevant)? This is a group
presentation, and all members of the team should appear in it. There is no
limit on the number of slides, but the total duration is 15 minutes. Use
the first slide to describe the duties and contributions of all group members, including
each member's percentage of the work; no two members should have the same percentage.
Every member of the team should begin their part by stating their name.
Project report
1 One paragraph describing the duties of all group members (use percentages
to denote the contribution of each team member)
2 Section 1: Executive Summary
Written to your boss's boss, who does not know statistics.
Describe your problem, why it is interesting and important, what you have
done (data collection, statistical analysis, etc.), and what conclusions you have
reached. Explain how your findings can be used to benefit the company (you may
define what kind of company you are in, to make your work relevant).
3 Section 2: Introduction
Describe your problem, why it is interesting, etc. State your approach.
The section should end with a paragraph starting with: "The rest of this report
is organized as follows. In Section 3, we ...; in Section 4, we ..."
4 Section 3: Data
Describe your data and how you collected them.
Provide some summary statistics, with tables and figures (exploratory data analysis).
5 Section 4: Analysis
Detailed analysis: show tables and figures. Do not show code.
Pay attention to interpretation, and explain why each procedure is needed,
what you have learned from it, etc.
6 Section 5: Conclusion
State what you have learned and how it will benefit your company.
If you were given six more months to work on the project, what would you do? Address
this or similar questions. Make the case that the company needs you.
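For the summary statistics asked for in Section 3 and the row-count guideline under Data Resources, one possible starting point is sketched below in Python, using only the standard library. The helper names (summarize_column, meets_size_guideline), the column names, and the tiny in-memory sample are all hypothetical and stand in for your own data. The code is only a way to produce the numbers; the report itself should show the resulting tables and figures, not the code.

```python
import csv
import io
import statistics


def summarize_column(rows, column):
    """Return basic summary statistics for one numeric column.

    Rows whose value in `column` is an empty string are skipped.
    """
    values = [float(row[column]) for row in rows if row[column] != ""]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


def meets_size_guideline(rows, minimum_rows=100):
    """Check the 'more than 100 rows' guideline (threshold is adjustable)."""
    return len(rows) > minimum_rows


# Hypothetical sample data; in practice, read your own CSV file instead.
sample_csv = "trip_id,fare\n1,10.0\n2,12.5\n3,\n4,20.0\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))

summary = summarize_column(rows, "fare")
print(summary)
print("Meets size guideline:", meets_size_guideline(rows))
```

To use it on a real file, replace the sample with rows read through csv.DictReader on an opened file, and call summarize_column once per numeric column you want to report.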
Rubrics