From the course: Complete Guide to AI and Data Science for SQL: From Beginner to Advanced
Checking the distribution of the variables - SQL Tutorial
- [Instructor] Now that you've explored the summary statistics of your dataset in the last step, you are going to visualize your data to gain a deeper understanding of its distribution. Data visualization is like putting on special glasses that allow you to see patterns and insights in your data. To do this, you'll be using the Python libraries Matplotlib and Seaborn. Let's dive right in. You see this code? When you run it, you create a histogram for each of your columns, and each histogram represents the distribution of a specific attribute. Here's an example. Take a look at the crime rate column. The distribution of crime rate appears to be highly skewed to the right, with a mean of 3.61 and a maximum value of 88.98. When you look at the histogram, you'll see that most houses have a crime rate below 20 and that far fewer houses have higher crime rates. This means that the majority of houses in the…
Contents
- Importing necessary libraries and dataset overview (3m 18s)
- Loading the data (7m 36s)
- Checking the data info (2m 13s)
- Summary statistics of the dataset (5m 49s)
- Checking the distribution of the variables (5m 42s)
- Applying log transformation and re-checking distribution (3m 6s)
- Challenge: Preparation (1m 5s)
- Solution: Preparation (1m 19s)