uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data and apply knowledge and actionable insights from data across a broad range of application domains….” - Wikipedia’s Definition of Data Science
Image thanks to Serap Baysal
What do I need to learn?
Image thanks to Benjamin Obi Tayo, Ph.D
Our Focus:
● Coding with Python
● Data Wrangling
● Data Visualization
Data Analytics
• We want to analyze data and draw conclusions from the information.
• This information is critical in making key business decisions.
• The common saying that "data is the new oil" is not an exaggeration.
• Data analytics has become pervasive and is widely used in many industries, e.g. health, finance, education, automotive, …
Data Analytics (Types)
• Descriptive – understand what happened, e.g. the number of accidents increased year-on-year.
• Diagnostic – why has it happened? e.g. why has the number of accidents increased?
• Predictive – what will happen? e.g. will the number of accidents increase?
• Prescriptive – what action should be taken? e.g. what should be done to reduce the number of accidents?
We shall focus primarily on the descriptive and diagnostic aspects in this module.
Data Analytics (Tools)
• There are many tools that can be used, depending on the task and the environment.
• For example, businesses typically use enterprise tools like Microsoft Excel, Tableau and Power BI.
• The need for automation and fine-tuning also means that programming tools such as R and Python have become very popular.
• For this module we shall focus on using Python as the main tool.
• NOTE: You can use whatever tools you prefer for the mini-project.
Data Analytics (What we shall do)
• Pandas for handling data: data loading, data wrangling, exploration and data engineering (introduction)
• Visualization with Matplotlib and Seaborn
Types of Data
• Data can mainly be grouped as categorical or numerical.
• Categorical data takes on specific values, e.g. rating a boda rider's safety can only be done using predefined star categories.
• Numerical data represents a numeric quantity which can take on any value (sometimes within a given range), e.g. the age of students in a class.
Categorical Data
• Can take one of two forms:
• Nominal - the categories have no underlying order, e.g. assigning male or female sex to a subject, or selecting a direction on a compass.
• Ordinal - the categories have an underlying order, e.g. rating a product as bad, average or good. The inherent ordering implies that good > average > bad.
Numerical Data
• Also takes one of two forms, based on the range of values it can take:
• Discrete - takes a countable number of distinct values that cannot be subdivided, e.g. a person's age in whole years.
• Continuous - has an infinite number of possible values and is usually measured, e.g. the weight of a person or the money earned by a business.
Questions on types of data
• What type of data is this?
• Rainfall in millimeters recorded for the past 5 years.
• Force in newtons required to bend different springs.
• Birth month of students in a class.
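The categorical/numerical distinction above maps fairly directly onto pandas dtypes. The sketch below uses a small made-up DataFrame (the column names and values are hypothetical, chosen only for illustration). It stores a nominal column as an unordered `category`, declares the bad < average < good order for an ordinal column with `pd.CategoricalDtype(..., ordered=True)`, and leaves counts as integers and measurements as floats.

```python
import pandas as pd

# Hypothetical data for illustration only (not taken from the slides).
df = pd.DataFrame({
    "direction": ["North", "East", "North", "West"],  # nominal: no order
    "rating": ["good", "bad", "average", "good"],     # ordinal: has an order
    "accidents": [3, 0, 5, 2],                        # discrete: whole-number counts
    "weight_kg": [61.5, 72.25, 58.0, 80.75],          # continuous: measured values
})

# Nominal: store as an unordered categorical.
df["direction"] = df["direction"].astype("category")

# Ordinal: declare the category order explicitly so that bad < average < good.
rating_type = pd.CategoricalDtype(categories=["bad", "average", "good"], ordered=True)
df["rating"] = df["rating"].astype(rating_type)

print(df.dtypes)
print(df["rating"].max())            # highest rating under the declared order
print((df["rating"] > "bad").sum())  # number of ratings better than "bad"
```

Because only `rating` is declared ordered, comparisons such as `> "bad"` and `max()` are meaningful for it, whereas the same operations on the unordered `direction` column raise a `TypeError`.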
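As a rough illustration of the loading, wrangling and exploration steps listed under "Data Analytics (What we shall do)", the sketch below reads a tiny inline CSV with `pd.read_csv` (the text is hypothetical data standing in for a real file). It then tidies inconsistent capitalization, fills a missing age with the column median, and summarizes the result with `describe()`. Filling with the median is only one possible way of handling missing values, assumed here for brevity.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real data file.
csv_text = """student,age,rating
Amina,21,good
Brian,,average
Carol,23,GOOD
"""

# Data loading: read_csv accepts any file-like object, including StringIO.
df = pd.read_csv(io.StringIO(csv_text))

# Data wrangling: make the category labels consistent ("GOOD" -> "good").
df["rating"] = df["rating"].str.lower()

# Data wrangling: fill the missing age with the median of the known ages.
df["age"] = df["age"].fillna(df["age"].median())

# Exploration: summary statistics for the numeric columns.
print(df.describe())
```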
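To make the descriptive and diagnostic questions from "Data Analytics (Types)" concrete, here is a minimal pandas sketch on made-up accident counts (the years, causes and numbers are all hypothetical). The descriptive part totals accidents per year and computes the year-on-year change with `diff()`. The diagnostic part breaks the totals down by cause with `pivot_table` to see which cause drives the increase.

```python
import pandas as pd

# Hypothetical accident records (made-up numbers for illustration).
records = pd.DataFrame({
    "year": [2020, 2020, 2021, 2021, 2022, 2022],
    "cause": ["speeding", "drunk driving"] * 3,
    "accidents": [120, 80, 150, 82, 190, 85],
})

# Descriptive: what happened? Total accidents per year and the change from the previous year.
per_year = records.groupby("year")["accidents"].sum()
yoy_change = per_year.diff()  # first year has no previous year, so it is NaN

# Diagnostic (a first step): which cause accounts for most of the increase?
by_cause = records.pivot_table(index="year", columns="cause", values="accidents", aggfunc="sum")
increase_by_cause = by_cause.loc[2022] - by_cause.loc[2020]

print(per_year)
print(yoy_change)
print(increase_by_cause.sort_values(ascending=False))
```

A real diagnostic analysis would usually examine more factors than this single breakdown by cause; the point here is only that "why" questions start by splitting the descriptive totals along candidate explanations.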