Assignment Yassir 1
Murdoch University
Mid-term assignment & Data science application project
In this unit, you will complete two consecutive assignments that focus on a specific topic in
real-world data science applications. These assignments are designed to help you develop a
good understanding of the latest data-driven modeling techniques used in real-world
applications, and to guide you through the implementation of the entire data science pipeline
on a real dataset using R. By completing these assignments, you will gain hands-on experience
and knowledge that will prepare you for new real-world data science projects.
Topic background: Dementia is a debilitating disease that affects millions of people worldwide.
Early detection and risk prediction are crucial for effective treatment and care. Data-driven
models are increasingly important in the field of dementia research, as they can identify
patterns and relationships in complex datasets that can be used to predict an individual's risk
of developing the disease.
For this assignment, your group will conduct a concise literature review of the latest data-
driven models for dementia risk analysis and prediction. The purpose of this review is to
help you gain knowledge of, and ideas about, the most up-to-date data-driven approaches used
for dementia risk analysis and prediction, so that you can develop your own models to analyze
the dementia data provided in Assignment 2.
Useful links:
Where to find literature for the review
https://scholar.google.com.au/
https://librarysearch.murdoch.edu.au/discovery/search?vid=61MUN_INST:61MU&lang=en
Literature search guide
https://libguides.murdoch.edu.au/LitReview/search
IEEE Referencing Guide
https://libguides.murdoch.edu.au/IEEE
https://medium.com/academicianhelp/ieee-referencing-using-microsoft-word-66c855181d64
Assignment 2: Data science application project (individual assignment)
Students will work independently to perform the entire data science pipeline on a given real-
world dementia dataset using R. You will be required to describe the entire project in a
detailed report and submit the code.
The data set used in this study was obtained from a mobile health care service offered in
collaboration with non-governmental organizations that run elderly care centers. This service
was provided to elderly people residing in various districts of Hong Kong for free from 2008 to
2018. The data set consists of 2299 cases, each of which includes eleven variables. These
variables include age, body height, body weight, education level, financial support, geriatric
depression scale score, out-of-pocket financial source (whether they were independent or
dependent on family), marital status, Mini Nutritional Assessment part A score, and Mini
Nutritional Assessment part B score. The outcome labels were based on the categories of the
Mini Mental State Exam.
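As a starting point for exploring a data set of this shape, a minimal sketch of a first inspection in R is given below. The provided file is not reproduced here, so the code builds a small synthetic data frame: the column names (`age`, `height_cm`, `weight_kg`, `gds_score`, `marital_status`, `mmse_class`), the class labels, and all values are assumptions for illustration only and will differ from the real data.

```r
# Synthetic stand-in for the provided file; names and values are assumptions.
set.seed(1)
n <- 50
dementia <- data.frame(
  age            = sample(65:95, n, replace = TRUE),
  height_cm      = round(rnorm(n, mean = 155, sd = 8), 1),
  weight_kg      = round(rnorm(n, mean = 55, sd = 9), 1),
  gds_score      = sample(0:15, n, replace = TRUE),
  marital_status = factor(sample(c("married", "single", "widowed"), n, replace = TRUE)),
  mmse_class     = factor(sample(c("normal", "impaired"), n, replace = TRUE))
)
# With the real file you would instead call something like:
# dementia <- read.csv("path/to/provided_file.csv", stringsAsFactors = TRUE)

str(dementia)             # variable names and types
print(summary(dementia))  # per-variable summary statistics

missing_per_column <- colSums(is.na(dementia))  # missing values per variable
class_counts <- table(dementia$mmse_class)      # balance of the outcome classes
print(missing_per_column)
print(class_counts)
```

Checking missing values and class balance early helps decide which pre-processing steps and performance metrics are appropriate later.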
Assignment guidelines:
Each student is required to submit one project report as a Word document, together with the
R files that reproduce all the results in the report.
R is the only accepted programming language for this assignment. You must use R to
complete all tasks and analyses.
Introduction: Introduce the topic of the data science project, including the problem
statement and the goals that the project aims to achieve.
Dataset description: Provide background information on the dataset used in the project,
including its source and any relevant characteristics. Include summary statistics to give
readers an overview of the data.
Data pre-processing: Explain any pre-processing steps that were necessary for the dataset
and justify why they were performed. This section should consider steps such as cleaning,
transforming or encoding the data.
Prediction modelling: Select two prediction models and apply them to the given dataset.
This section should include brief information on the selected models and explain
why they are appropriate for the dataset. Evaluate the performance of the two models
and compare their results using appropriate performance metrics.
Results and discussion: Analyze the results and discuss the findings in a clear and engaging
manner. This section should include visualizations and any insights gleaned from the data.
Conclusion: Summarize the project, giving a concise overview and highlighting the most
useful insights and conclusions.
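The pre-processing, prediction modelling, and evaluation steps described above can be sketched roughly as follows. This is only an illustration on synthetic data: the column names, the binary outcome `impaired`, the median imputation, the 70/30 hold-out split, and the choice of logistic regression (`glm`) and a classification tree (`rpart`) are all assumptions, not required choices.

```r
library(rpart)  # recommended package distributed with R

# Synthetic data with an assumed binary outcome; replace with the provided file.
set.seed(42)
n <- 300
dat <- data.frame(
  age       = sample(65:95, n, replace = TRUE),
  gds_score = sample(0:15, n, replace = TRUE),
  education = sample(c("none", "primary", "secondary"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
risk <- plogis(-8 + 0.09 * dat$age + 0.1 * dat$gds_score)
dat$impaired <- ifelse(runif(n) < risk, "yes", "no")
dat$age[sample(n, 10)] <- NA  # inject missing values to demonstrate cleaning

# Pre-processing: impute missing age with the median, encode categoricals.
dat$age[is.na(dat$age)] <- median(dat$age, na.rm = TRUE)
dat$education <- factor(dat$education)
dat$impaired  <- factor(dat$impaired, levels = c("no", "yes"))

# Hold-out split (70% train, 30% test).
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Model 1: logistic regression; predicted probability of "yes".
fit_glm  <- glm(impaired ~ age + gds_score + education, data = train, family = binomial)
prob_glm <- predict(fit_glm, newdata = test, type = "response")
pred_glm <- factor(ifelse(prob_glm > 0.5, "yes", "no"), levels = c("no", "yes"))

# Model 2: classification tree.
fit_tree  <- rpart(impaired ~ age + gds_score + education, data = train, method = "class")
pred_tree <- predict(fit_tree, newdata = test, type = "class")

# Evaluation: confusion matrix and accuracy on the test set.
evaluate <- function(pred, truth) {
  cm <- table(predicted = pred, actual = truth)
  list(confusion = cm, accuracy = sum(diag(cm)) / sum(cm))
}
res_glm  <- evaluate(pred_glm, test$impaired)
res_tree <- evaluate(pred_tree, test$impaired)
print(res_glm)
print(res_tree)
```

Accuracy alone can be misleading when the outcome classes are imbalanced, so metrics derived from the confusion matrix (such as sensitivity and specificity) may also be worth reporting.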
In addition to the project report, you must submit an R file containing the complete
code, from data loading to prediction modeling. The code should be well-
organized, easy to follow, and produce the same outcomes as presented in the project report.
R file guidelines:
In your submitted code file, include comments to explain the purpose and functionality of
each section of code.
Organize the code into clear sections, such as data cleaning, exploratory data analysis and
prediction model implementation.
Use white space and indentation to enhance readability.
Avoid using overly complicated code, and instead focus on writing clear, concise code.
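One possible layout that follows these guidelines is sketched below. The section names, the helper `load_data()`, and the use of the built-in `iris` data as a placeholder are hypothetical; substitute the provided dataset and your own steps.

```r
# ---- 1. Setup ---------------------------------------------------------------
set.seed(123)  # fix the random seed so results are reproducible

# ---- 2. Data loading --------------------------------------------------------
# Hypothetical helper: reads a CSV file and converts text columns to factors.
load_data <- function(path) {
  read.csv(path, stringsAsFactors = TRUE)
}

# Placeholder: write a built-in dataset to a temporary file and read it back,
# standing in for the provided dementia file.
tmp_path <- tempfile(fileext = ".csv")
write.csv(iris, tmp_path, row.names = FALSE)
raw_data <- load_data(tmp_path)

# ---- 3. Data cleaning -------------------------------------------------------
# Keep only rows without missing values.
clean_data <- raw_data[complete.cases(raw_data), ]

# ---- 4. Exploratory data analysis -------------------------------------------
print(summary(clean_data))

# ---- 5. Prediction model implementation -------------------------------------
# Models are fitted and evaluated here (omitted in this layout sketch).
```

In RStudio, comment lines that end in four or more dashes are treated as foldable code sections, which makes a long script easier to navigate.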
Bonus task:
Create an R Shiny app that allows users to interact with the data science pipeline you
developed in the project.
Note that:
1) This task is a ‘bonus’, which means you will not lose any marks if it is not completed.
However, if you complete it, you can earn extra marks (up to 15 additional points on the
total mark of the assignment, capped at 100).
2) The bonus task will not be supervised by the teaching staff. Some useful online links are
provided as a guide to creating the R Shiny app, but interested students need to
rely on their own self-learning and exploration to complete the task.
Specification: The R Shiny app should 1) be user-friendly, with clear instructions and intuitive
navigation, and 2) allow users to upload the dataset, perform exploratory data analysis
by generating different visualizations, select prediction models, and view performance
metrics. To develop the app, the student will need to integrate the code used in the previous
tasks into the Shiny framework. Additional features, such as interactive visualizations, can also
be added to enhance the user experience.
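A minimal skeleton of such an app is sketched below, assuming the `shiny` package is installed. The input and output IDs (`file`, `xvar`, `hist`, `stats`) and the app title are hypothetical, and model selection and performance metrics are left out for brevity.

```r
library(shiny)

# UI: file upload, variable selector, and two outputs.
ui <- fluidPage(
  titlePanel("Dementia data explorer (sketch)"),
  sidebarLayout(
    sidebarPanel(
      fileInput("file", "Upload a CSV file", accept = ".csv"),
      selectInput("xvar", "Numeric variable to plot", choices = character(0))
    ),
    mainPanel(
      plotOutput("hist"),
      verbatimTextOutput("stats")
    )
  )
)

server <- function(input, output, session) {
  # Read the uploaded file; req() pauses until a file has been provided.
  dataset <- reactive({
    req(input$file)
    read.csv(input$file$datapath, stringsAsFactors = TRUE)
  })

  # Fill the selector with the numeric columns of the uploaded data.
  observeEvent(dataset(), {
    numeric_cols <- names(dataset())[sapply(dataset(), is.numeric)]
    updateSelectInput(session, "xvar", choices = numeric_cols)
  })

  # Histogram of the selected variable.
  output$hist <- renderPlot({
    req(input$xvar %in% names(dataset()))
    hist(dataset()[[input$xvar]], main = input$xvar, xlab = input$xvar)
  })

  # Summary statistics of the uploaded data.
  output$stats <- renderPrint({
    summary(dataset())
  })
}

app <- shinyApp(ui, server)
# Launch with runApp(app) in an interactive R session.
```

Keeping the data in a single reactive expression means every output updates automatically when a new file is uploaded; model fitting could be added as further reactive expressions that depend on it.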
Submission for the bonus task requires the Shiny app R scripts and a separate, simple user
guide as a Word document (1-2 pages) that explains the app's functionality and provides
instructions on how to use it. Students can include screenshots and code snippets to showcase
the app's features and functionality.