# [ Exploratory Data Analysis (EDA) on Power BI ] [ cheatsheet ]

1. Data Import

● Import data from CSV: Source = Csv.Document(File.Contents("file.csv"),
[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None])
● Import data from Excel: Source =
Excel.Workbook(File.Contents("file.xlsx"), null, true)
● Import data from SQL Server: Source = Sql.Database("server", "database",
[Query="SELECT * FROM table"])
● Import data from Web: Source =
Web.Page(Web.Contents("https://example.com"))

2. Data Transformation

● Remove columns: Table.RemoveColumns(Source, {"Column1", "Column2"})
● Rename columns: Table.RenameColumns(Source, {{"OldName", "NewName"}})
● Filter rows: Table.SelectRows(Source, each [Column] > 10)
● Sort rows: Table.Sort(Source, {{"Column", Order.Ascending}})
● Group rows: Table.Group(Source, {"Column"}, {{"Count", each
Table.RowCount(_), type number}})
● Merge queries: Table.NestedJoin(Source1, {"Key"}, Source2,
{"ForeignKey"}, "NewColumn", JoinKind.Inner)
● Append queries: Table.Combine({Source1, Source2})
● Pivot data: Table.Pivot(Source, List.Distinct(Source[ColumnToPivot]),
"ColumnToPivot", "ValueColumn", List.Sum)

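Table.NestedJoin returns the matching rows as a nested table column, so a merge is usually followed by an expand step. A minimal sketch, assuming two small hypothetical tables (Orders and Customers are illustration names, not from the cheatsheet):

```powerquery
let
    // Hypothetical sample tables; replace with your own queries.
    Orders = #table(
        type table [OrderID = Int64.Type, CustomerID = Int64.Type],
        {{1, 10}, {2, 11}, {3, 10}}
    ),
    Customers = #table(
        type table [CustomerID = Int64.Type, Name = text],
        {{10, "Ann"}, {11, "Bob"}}
    ),
    // The join adds a "Customer" column holding a nested table per order row.
    Joined = Table.NestedJoin(
        Orders, {"CustomerID"},
        Customers, {"CustomerID"},
        "Customer", JoinKind.LeftOuter
    ),
    // Expand only the fields needed, renaming Name to CustomerName.
    Expanded = Table.ExpandTableColumn(Joined, "Customer", {"Name"}, {"CustomerName"})
in
    Expanded
```

JoinKind.LeftOuter keeps every order even when no customer matches; switch to JoinKind.Inner to drop unmatched rows.
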
3. Data Cleaning

● Remove duplicates: Table.Distinct(Source)
● Replace values: Table.ReplaceValue(Source, "OldValue", "NewValue",
Replacer.ReplaceText, {"Column"})
● Fill null values: Table.FillDown(Source, {"Column"})
● Handle errors: Table.ReplaceErrorValues(Source, {{"Column",
"DefaultValue"}})
● Trim whitespace: Table.TransformColumns(Source, {{"Column", Text.Trim,
type text}})
● Remove non-printable characters: Table.TransformColumns(Source,
{{"Column", Text.Clean, type text}})
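
The cleaning steps above are normally chained as named steps in one query rather than applied in isolation. A minimal sketch, assuming a small hypothetical table with Name and City columns:

```powerquery
let
    // Hypothetical sample data; replace with your own Source step.
    Source = #table(
        type table [Name = text, City = text],
        {
            {"  Ann ", "Oslo"},
            {"Bob", null},
            {"Bob", null},
            {"Cy#(tab)", "Rome"}
        }
    ),
    // Trim surrounding whitespace, then strip control characters such as tabs.
    Cleaned = Table.TransformColumns(
        Source,
        {{"Name", each Text.Clean(Text.Trim(_)), type text}}
    ),
    // Fill null cities from the nearest non-null value above.
    Filled = Table.FillDown(Cleaned, {"City"}),
    // Drop rows that are now exact duplicates.
    Deduplicated = Table.Distinct(Filled)
in
    Deduplicated
```

Order matters here: removing duplicates last means rows that only differed by whitespace or missing values collapse into one.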

By: Waleed Mousa


4. Data Validation

● Check for null values: Table.AddColumn(Source, "IsNull", each if [Column]
= null then "Yes" else "No")
● Check for empty values: Table.AddColumn(Source, "IsEmpty", each if
[Column] = "" then "Yes" else "No")
● Check for duplicate values: Table.AddColumn(Source, "IsDuplicate", each
if List.Count(List.Select(Source[Column], (v) => v = [Column])) > 1 then
"Yes" else "No")
● Check for valid data types: Table.AddColumn(Source, "IsValid", each if
Value.Is([Column], type text) then "Yes" else "No")
● Check for valid ranges: Table.AddColumn(Source, "IsInRange", each if
[Column] >= 0 and [Column] <= 100 then "Yes" else "No")

5. Data Exploration

● View column data types: Table.Schema(Source)
● Set column data types: Table.TransformColumnTypes(Source, {{"Column1",
type text}, {"Column2", type number}})
● View column statistics: Table.Profile(Table.SelectColumns(Source,
{"Column"}))
● View unique values: Table.Distinct(Table.SelectColumns(Source,
{"Column"}))
● View top N rows: Table.FirstN(Source, 10)
● View bottom N rows: Table.LastN(Source, 10)
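● View sample rows (M has no Table.Sample; sort on a seeded random key
instead): Table.RemoveColumns(Table.FirstN(Table.Sort(Table.FromColumns(
Table.ToColumns(Source) & {List.Random(Table.RowCount(Source), 1234)},
Table.ColumnNames(Source) & {"Rand"}), {{"Rand", Order.Ascending}}), 100),
{"Rand"})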
● View missing values: Table.AddColumn(Source, "IsMissing", each if
[Column] = null then 1 else 0)
● View data distribution (value counts): Table.Group(Source, {"Column"},
{{"Count", each Table.RowCount(_), Int64.Type}})
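
For a quick overview of missing values across every column at once, the per-column check can be generalized. A minimal sketch, assuming a small hypothetical table (the Product and Price columns are illustration names):

```powerquery
let
    // Hypothetical sample data; replace with your own Source step.
    Source = #table(
        type table [Product = text, Price = number],
        {{"A", 10}, {null, 12}, {"C", null}, {"D", null}}
    ),
    // Build one record per column: its name and the number of null cells.
    NullCounts = Table.FromRecords(
        List.Transform(
            Table.ColumnNames(Source),
            (name) => [
                Column = name,
                Nulls = List.Count(
                    List.Select(Table.Column(Source, name), each _ = null)
                )
            ]
        )
    )
in
    NullCounts
```

Table.Profile(Source) reports a NullCount column as well; the hand-rolled version is shown only because it is easy to extend with other per-column checks.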

6. Data Visualization

Power Query only shapes the data; the visuals themselves are built in the
report view from the tables these queries return.

● Prepare data for a bar chart: BarChart = Table.Group(Source, {"Category"},
{{"Value", each List.Sum([Value]), type number}})
● Prepare data for a line chart: LineChart = Table.Group(Source, {"Date"},
{{"Value", each List.Sum([Value]), type number}})
● Prepare data for a pie chart: PieChart = Table.Group(Source, {"Category"},
{{"Value", each List.Sum([Value]), type number}})
● Prepare data for a scatter plot: ScatterPlot = Table.Group(Source, {"X",
"Y"}, {{"Value", each List.Sum([Value]), type number}})

● Prepare data for a treemap: Treemap = Table.Group(Source, {"Category",
"Subcategory"}, {{"Value", each List.Sum([Value]), type number}})
● Prepare data for a heatmap (the values of [Column] must be text):
Heatmap = Table.Pivot(Table.SelectColumns(Source, {"Row", "Column",
"Value"}), List.Distinct(Source[Column]), "Column", "Value", List.Sum)
● Prepare data for a funnel chart: FunnelChart = Table.Group(Source,
{"Stage"}, {{"Value", each List.Sum([Value]), type number}})
● Prepare data for a gauge chart: GaugeChart = Table.Group(Source,
{"Category"}, {{"Value", each List.Sum([Value]), type number}})

7. Statistical Analysis

● Calculate mean: Mean = List.Average(Source[Column])
● Calculate median: Median = List.Median(Source[Column])
● Calculate mode: Mode = List.Mode(Source[Column])
● Calculate standard deviation: StandardDeviation =
List.StandardDeviation(Source[Column])
● Calculate variance (sample variance; M has no List.Variance): Variance =
Number.Power(List.StandardDeviation(Source[Column]), 2)
● Calculate minimum value: Minimum = List.Min(Source[Column])
● Calculate maximum value: Maximum = List.Max(Source[Column])
● Calculate quartiles: Quartiles = {List.Percentile(Source[Column], 0.25),
List.Percentile(Source[Column], 0.5), List.Percentile(Source[Column],
0.75)}
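● Calculate correlation (Pearson; M has no List.Correlation and no ?:
operator): Correlation = if Table.RowCount(Source) > 1 then
List.Covariance(Source[Column1], Source[Column2]) /
Number.Sqrt(List.Covariance(Source[Column1], Source[Column1]) *
List.Covariance(Source[Column2], Source[Column2])) else null
● Perform t-test (Welch's t statistic; M has no built-in t-test):
TStatistic = (List.Average(Source[Column1]) - List.Average(Source[Column2]))
/ Number.Sqrt(Number.Power(List.StandardDeviation(Source[Column1]), 2) /
List.NonNullCount(Source[Column1]) +
Number.Power(List.StandardDeviation(Source[Column2]), 2) /
List.NonNullCount(Source[Column2]))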

8. Time Series Analysis

● Convert to date type: Table.TransformColumnTypes(Source, {{"Date", type
date}})
● Extract year from date: Table.AddColumn(Source, "Year", each
Date.Year([Date]), Int64.Type)
● Extract month from date: Table.AddColumn(Source, "Month", each
Date.Month([Date]), Int64.Type)
● Extract day from date: Table.AddColumn(Source, "Day", each
Date.Day([Date]), Int64.Type)

● Extract day of week: Table.AddColumn(Source, "DayOfWeek", each
Date.DayOfWeek([Date]), Int64.Type)
● Extract day of year: Table.AddColumn(Source, "DayOfYear", each
Date.DayOfYear([Date]), Int64.Type)
● Extract quarter from date: Table.AddColumn(Source, "Quarter", each
Date.QuarterOfYear([Date]), Int64.Type)
● Calculate moving average (3-row window; requires a 0-based Index column,
e.g. from Table.AddIndexColumn(Source, "Index", 0, 1)):
Table.AddColumn(Source, "MovingAverage", each
List.Average(List.Range(Source[Value], List.Max({0, [Index] - 2}),
List.Min({3, [Index] + 1}))))
● Calculate yearly totals (the basis for year-over-year growth):
Table.Group(Source, {"Year"}, {{"Value", each List.Sum([Value])}})
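
Grouping by year only produces the yearly totals; the growth rate still has to be derived by comparing each year with the one before it. A minimal sketch, assuming hypothetical sample data with Year and Value columns and no gaps between years:

```powerquery
let
    // Hypothetical sample data; replace with your own Source step.
    Source = #table(
        type table [Year = Int64.Type, Value = number],
        {{2021, 100}, {2021, 50}, {2022, 180}, {2023, 171}}
    ),
    Yearly = Table.Group(
        Source, {"Year"},
        {{"Value", each List.Sum([Value]), type number}}
    ),
    Sorted = Table.Sort(Yearly, {{"Year", Order.Ascending}}),
    // A 0-based index lets each row look up the previous row's total.
    Indexed = Table.AddIndexColumn(Sorted, "Index", 0, 1),
    // Growth = (this year - previous year) / previous year; null for the first year.
    WithGrowth = Table.AddColumn(
        Indexed,
        "YoYGrowth",
        each if [Index] = 0 then null
             else ([Value] - Indexed[Value]{[Index] - 1}) / Indexed[Value]{[Index] - 1},
        type nullable number
    ),
    Result = Table.RemoveColumns(WithGrowth, {"Index"})
in
    Result
```

With the sample rows the totals are 150, 180 and 171, so the growth column holds null, 0.2 and -0.05. If years can be missing, join on Year - 1 instead of relying on row position.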

9. Geographic Analysis

● Create a map visualization: Map = Table.AddColumn(Source, "Location",
each Text.Combine({Text.From([Latitude], "en-US"), ",",
Text.From([Longitude], "en-US")}))
● Calculate distance between points (km, spherical law of cosines; M has no
Number.Radians, so degrees are converted with Number.PI / 180): Distance =
(Latitude1, Longitude1, Latitude2, Longitude2) => 6371 *
Number.Acos(Number.Sin(Latitude1 * Number.PI / 180) *
Number.Sin(Latitude2 * Number.PI / 180) +
Number.Cos(Latitude1 * Number.PI / 180) *
Number.Cos(Latitude2 * Number.PI / 180) *
Number.Cos((Longitude1 - Longitude2) * Number.PI / 180))
● Identify nearest location (distance to the closest row of a second table
of candidate points, here called Locations): NearestLocation =
Table.AddColumn(Source, "NearestDistanceKm", (r) =>
List.Min(List.Transform(Table.ToRecords(Locations), (l) =>
Distance(r[Latitude], r[Longitude], l[Latitude], l[Longitude]))))
● Create a choropleth map: ChoroplethMap = Table.Group(Source, {"Region"},
{{"Value", each List.Sum([Value])}})
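
For points that are close together, the arccosine form of the distance calculation can lose precision. The haversine formula is a common alternative (a swapped-in technique, not the formula used in the bullet above). A minimal sketch as a reusable function, assuming coordinates in decimal degrees and an Earth radius of 6371 km; the sample coordinates are approximate values for London and Paris:

```powerquery
let
    // Haversine great-circle distance in kilometres; inputs in decimal degrees.
    Distance = (lat1 as number, lon1 as number, lat2 as number, lon2 as number) as number =>
        let
            ToRad = Number.PI / 180,
            dLat = (lat2 - lat1) * ToRad,
            dLon = (lon2 - lon1) * ToRad,
            a = Number.Power(Number.Sin(dLat / 2), 2)
                + Number.Cos(lat1 * ToRad) * Number.Cos(lat2 * ToRad)
                  * Number.Power(Number.Sin(dLon / 2), 2)
        in
            2 * 6371 * Number.Asin(Number.Sqrt(a)),
    // Example call: London to Paris, roughly 340 km apart.
    Example = Distance(51.5074, -0.1278, 48.8566, 2.3522)
in
    Example
```

Saved as its own query, the function can be called from Table.AddColumn on any table that has two latitude/longitude pairs per row.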

10. Data Insights

● Identify top N categories: TopCategories =
Table.FirstN(Table.Sort(Table.Group(Source, {"Category"}, {{"Value", each
List.Sum([Value])}}), {{"Value", Order.Descending}}), 5)
● Identify bottom N categories: BottomCategories =
Table.FirstN(Table.Sort(Table.Group(Source, {"Category"}, {{"Value", each
List.Sum([Value])}}), {{"Value", Order.Ascending}}), 5)
● Identify trending categories (adds the most recent earlier value per
category): TrendingCategories = let Grouped = Table.Group(Source,
{"Category", "Date"}, {{"Value", each List.Sum([Value]), type number}}) in
Table.AddColumn(Grouped, "PreviousValue", (r) => let Prior =
Table.SelectRows(Grouped, (p) => p[Category] = r[Category] and p[Date] <
r[Date]) in if Table.IsEmpty(Prior) then null else
Table.Last(Table.Sort(Prior, {{"Date", Order.Ascending}}))[Value])
● Identify anomalies: Anomalies = Table.AddColumn(Source, "IsAnomaly", each
if [Value] < List.Average(Source[Value]) - 2 *
List.StandardDeviation(Source[Value]) or [Value] >
List.Average(Source[Value]) + 2 * List.StandardDeviation(Source[Value])
then "Yes" else "No")
● Identify patterns: Patterns = Table.AddColumn(Source, "Pattern", each if
[Value] > 1000 then "High" else if [Value] < 500 then "Low" else
"Medium")

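The anomaly rule in this section recomputes the mean and standard deviation inside the row function. Computing them once as separate steps is usually easier to read and avoids repeated work. A minimal sketch, assuming a hypothetical single-column table of values:

```powerquery
let
    // Hypothetical sample data; replace with your own Source step.
    Source = #table(
        type table [Value = number],
        {{10}, {11}, {9}, {10}, {12}, {10}, {11}, {9}, {10}, {60}}
    ),
    // Buffer the column once so the statistics are not re-evaluated per row.
    Values = List.Buffer(Source[Value]),
    Mean = List.Average(Values),
    StdDev = List.StandardDeviation(Values),
    // Flag rows more than two standard deviations away from the mean.
    Flagged = Table.AddColumn(
        Source,
        "IsAnomaly",
        each if Number.Abs([Value] - Mean) > 2 * StdDev then "Yes" else "No",
        type text
    )
in
    Flagged
```

With the sample values only the row holding 60 is flagged. The two-standard-deviation threshold is the same one the cheatsheet uses; adjust it to suit the data.
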
11. Data Quality

● Calculate missing value percentage (fraction of rows with at least one
null; multiply by 100 for a percentage): MissingValuePercentage =
Table.RowCount(Table.SelectRows(Source, each
List.Contains(Record.FieldValues(_), null))) / Table.RowCount(Source)
● Calculate duplicate value percentage: DuplicateValuePercentage =
(Table.RowCount(Source) - Table.RowCount(Table.Distinct(Source))) /
Table.RowCount(Source)
● Calculate outlier percentage: OutlierPercentage =
Table.RowCount(Table.SelectRows(Source, each [Value] <
List.Average(Source[Value]) - 2 * List.StandardDeviation(Source[Value])
or [Value] > List.Average(Source[Value]) + 2 *
List.StandardDeviation(Source[Value]))) / Table.RowCount(Source)
● Identify data type mismatches (values that cannot be converted to a
number): DataTypeMismatches = Table.AddColumn(Source, "DataTypeMismatch",
each if (try Number.From([Column]))[HasError] then "Yes" else "No")
● Identify inconsistent formats: InconsistentFormats =
Table.AddColumn(Source, "InconsistentFormat", each if
Text.Contains([Column], "/") and Text.Contains([Column], "-") then "Yes"
else "No")

12. Data Transformation

● Pivot data: PivotedData = Table.Pivot(Source,
List.Distinct(Source[ColumnToPivot]), "ColumnToPivot", "ValueColumn")
● Unpivot data: UnpivotedData = Table.UnpivotOtherColumns(Source,
{"KeyColumn"}, "Attribute", "Value")
● Transpose data: TransposedData = Table.Transpose(Source)

● Split column by delimiter: SplitColumn = Table.SplitColumn(Source,
"ColumnToSplit", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),
{"Column1", "Column2"})
● Merge columns: MergedColumn = Table.AddColumn(Source, "MergedColumn",
each Text.Combine({[Column1], [Column2]}, " "))
● Create conditional column: ConditionalColumn = Table.AddColumn(Source,
"ConditionalColumn", each if [Column1] > 10 then "High" else "Low")
● Group and aggregate data: GroupedData = Table.Group(Source,
{"GroupColumn"}, {{"AggregatedValue", each List.Sum([Value]), type
number}})

13. Data Modeling

● Create a calendar table (2020 is a leap year, so 366 days): CalendarTable =
Table.AddColumn(Table.TransformColumnTypes(Table.FromList(List.Dates(
#date(2020, 1, 1), 366, #duration(1, 0, 0, 0)), Splitter.SplitByNothing(),
{"Date"}), {{"Date", type date}}), "Year", each Date.Year([Date]),
Int64.Type)
● Create a date dimension: DateDimension = let Dates =
Table.Distinct(Table.TransformColumns(Table.SelectColumns(Source,
{"Date"}), {{"Date", Date.From, type date}})), WithYear =
Table.AddColumn(Dates, "Year", each Date.Year([Date]), Int64.Type),
WithMonth = Table.AddColumn(WithYear, "Month", each Date.Month([Date]),
Int64.Type), WithDay = Table.AddColumn(WithMonth, "Day", each
Date.Day([Date]), Int64.Type) in Table.AddColumn(WithDay, "DayOfWeek", each
Date.DayOfWeek([Date]), Int64.Type)
● Create a slowly changing dimension (SCD; numbers each distinct attribute
combination per product as a version): SCDimension =
Table.Combine(Table.Group(Table.Distinct(Table.SelectColumns(Source,
{"ProductID", "ProductName", "Category"})), {"ProductID"}, {{"Versions",
each Table.AddIndexColumn(_, "Version", 1, 1), type table}})[Versions])
● Create a fact table: FactTable = Table.NestedJoin(Source, {"OrderID"},
Table.AddIndexColumn(Table.Distinct(Table.SelectColumns(Source,
{"OrderID", "CustomerID", "ProductID", "OrderDate", "Quantity",
"TotalAmount"})), "FactID", 1, 1), {"OrderID"}, "Fact", JoinKind.Inner)
● Create a star schema: StarSchema = Table.NestedJoin(FactTable,
{"CustomerID"},
Table.AddIndexColumn(Table.Distinct(Table.SelectColumns(Source,
{"CustomerID", "CustomerName", "Country"})), "CustomerKey", 1, 1),
{"CustomerID"}, "Customer", JoinKind.Inner)

14. Advanced Analytics

● Perform market basket analysis: MarketBasket =
Table.AddColumn(Table.Group(Table.Distinct(Table.SelectColumns(Source,
{"OrderID", "ProductID"})), {"OrderID"}, {{"Products", each
Text.Combine(List.Transform([ProductID], Text.From), ","), type text}}),
"SupportCount", (basket) => Table.RowCount(Table.SelectRows(Source, (row) =>
List.Contains(Text.Split(basket[Products], ","),
Text.From(row[ProductID])))))
● Perform customer segmentation: CustomerSegmentation =
Table.AddColumn(Table.Group(Source, {"CustomerID"}, {{"TotalSpend", each
List.Sum([TotalAmount]), type number}, {"VisitFrequency", each
Table.RowCount(_), type number}, {"Recency", each
Date.From(List.Max([OrderDate])), type date}}), "Segment", each if
[TotalSpend] > 1000 and [VisitFrequency] > 10 and
Date.IsInPreviousNMonths([Recency], 3) then "High Value" else if
[TotalSpend] > 500 and [VisitFrequency] > 5 and
Date.IsInPreviousNMonths([Recency], 6) then "Mid Value" else "Low Value")
● Perform cohort analysis: CohortAnalysis =
Table.Group(Table.AddColumn(Source, "CohortMonth", each
Date.StartOfMonth([OrderDate])), {"CohortMonth", "CustomerID"},
{{"TotalSpend", each List.Sum([TotalAmount]), type number},
{"VisitFrequency", each Table.RowCount(_), type number}})
● Perform RFM analysis: RFMAnalysis = Table.AddColumn(Table.Group(Source,
{"CustomerID"}, {{"Recency", each Date.From(List.Max([OrderDate])), type
date}, {"Frequency", each Table.RowCount(_), type number}, {"Monetary",
each List.Sum([TotalAmount]), type number}}), "RFMScore", each
Text.Combine({Text.Range(Text.From(Date.DayOfYear([Recency])), 0, 1),
Text.Range(Text.From([Frequency]), 0, 1),
Text.Range(Text.From([Monetary]), 0, 1)}))
● Perform customer lifetime value analysis: CustomerLifetimeValue =
Table.AddColumn(Table.Group(Source, {"CustomerID"}, {{"TotalSpend", each
List.Sum([TotalAmount]), type number}, {"VisitFrequency", each
Table.RowCount(_), type number}, {"AverageOrderValue", each
List.Average([TotalAmount]), type number}, {"CustomerLifetime", each
Duration.Days(Date.From(DateTime.LocalNow()) -
Date.From(List.Min([OrderDate]))), type number}}), "CLV", each
[AverageOrderValue] * [VisitFrequency] * [CustomerLifetime] / 365)

15. Data Storytelling

● Create a KPI visual: KPIVisual = Table.AddColumn(Table.Group(Source,
{"Category"}, {{"TotalSales", each List.Sum([Sales]), type number},
{"TargetSales", each List.Sum([Target]), type number}}), "Status", each
if [TotalSales] >= [TargetSales] then "Meeting Target" else
"Below Target")

● Create a trend visual: TrendVisual = let Daily =
Table.AddIndexColumn(Table.Sort(Table.Group(Source, {"Date"}, {{"Sales",
each List.Sum([Sales]), type number}}), {{"Date", Order.Ascending}}),
"Index", 0, 1) in Table.AddColumn(Daily, "PreviousSales", each if [Index] =
0 then null else Daily[Sales]{[Index] - 1})
● Create a comparison visual: ComparisonVisual = Table.Group(Source,
{"Category", "Date"}, {{"ThisYearSales", each
List.Sum([This Year Sales]), type number}, {"LastYearSales", each
List.Sum([Last Year Sales]), type number}})
● Create a distribution visual: DistributionVisual = Table.Group(Source,
{"AgeGroup"}, {{"Sales", each List.Sum([Sales]), type number}})
● Create a relationship visual: RelationshipVisual =
Table.NestedJoin(Table.Group(Source, {"CustomerID"}, {{"TotalSales", each
List.Sum([Sales]), type number}}), {"CustomerID"}, Table.Group(Source,
{"CustomerID", "ProductCategory"}, {{"CategorySales", each
List.Sum([Sales]), type number}}), {"CustomerID"}, "CustomerProduct",
JoinKind.Inner)

16. Reporting and Dashboard Design

● Create a dynamic title: DynamicTitle = "Sales Analysis - " &
Date.ToText(Date.From(List.Min(Source[OrderDate])), "MMMM yyyy") & " to " &
Date.ToText(Date.From(List.Max(Source[OrderDate])), "MMMM yyyy")
● Create a drill-through report: DrillThroughReport =
Table.SelectRows(Source, each [OrderID] = OrderIDParameter)
● Create a conditional formatting rule: ConditionalFormatting = if [Sales]
< 1000 then "Red" else if [Sales] < 5000 then "Yellow" else "Green"
● Create a tooltip: Tooltip = "Sales: " & Number.ToText([Sales], "$#,0.00")
& " | Quantity: " & Number.ToText([Quantity], "#,0")
● Create a custom visual: CustomVisual =
Table.ToColumns(Table.Group(Source, {"Category"}, {{"Sales", each
List.Sum([Sales]), type number}}))
● Create a responsive layout: ResponsiveLayout =
Table.Combine({Table.SelectColumns(Source, {"Category"}),
Table.SelectColumns(Source, {"Sales"})})

17. Data Refresh and Scheduling

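● Refresh data manually: refresh is not an M function (there is no
Table.Refresh); in Power BI Desktop use Home > Refresh
● Schedule data refresh: configured in the Power BI service, in the
semantic model (dataset) settings under Scheduled refresh; Table.Buffer only
caches a table in memory during evaluation and does not schedule anything
● Incremental refresh: filter on the reserved RangeStart and RangeEnd
datetime parameters: IncrementalRefresh = Table.SelectRows(Source, each
[OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
● Refresh a specific table: in Power BI Desktop, right-click the table in
the Data pane and choose Refresh data
● Refresh multiple data sources: a refresh re-evaluates every query the
refreshed tables depend on; to stack two sources into one table use
MultiSource = Table.Combine({Source1, Source2})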
