
Ronak Nitin Sheth

3 BBA FMA A
2324351

Data Mining
CIA-1

Topic: Data Warehousing and OLAP Operations

Submitted to:
Dr Shashidhar Yadav J
School of Business and Management
CHRIST (Deemed To Be University)
Yeshwanthpur Campus

Date of submission: 29/7/24


INDEX
1. Introduction to dataset
2. OLAP operations
• Roll Up
• Drill Down
• Slicing
• Dicing
• Pivot
3. ETL (Extract, Transform, Load)
4. Star Schema
5. Snowflake Schema

ABOUT THE DATASET

The dataset contains sales information for various products across different states and
markets. Each row represents a product, with details such as:

• Customer ID: Unique identifier for the customer
• Customer Name: Name of the customer
• Product Type: The category of the product (Electronics, Apparel, Grocery,
Automotive, Furniture)
• Product Line: Specific product within the type (Laptops, Men's Wear, Snacks, Cars,
Home Office)
• State: The state where the product is sold
• Market Size: The size of the market (Small, Medium, Large)
• Total Expense: Total expenses incurred for the product
• Sales: Total sales generated by the product
• Profit: Profit made from the product
• Cost of Goods: Cost of producing or acquiring the product

The full dataset extends to 55 columns.
We assume the data has been preprocessed, uploaded to the software, and connected.

OLAP OPERATIONS
Drop the “dataset1.xlsx” file into RapidMiner and connect it to the process.

Step 1: Upload the data by clicking on the Import Data section and selecting “dataset1.xlsx”.

Step 2: Drag and drop the uploaded data into the process.
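The load step above can be sketched in Python with pandas. Since the real workbook is not available here, this minimal sketch builds a small sample frame with a few of the dataset's columns and hypothetical values; the commented-out line shows the actual load:

```python
import pandas as pd

# Hypothetical sample rows standing in for dataset1.xlsx.
df = pd.DataFrame({
    "Customer ID": [101, 102, 103],
    "Product Type": ["Electronics", "Apparel", "Grocery"],
    "State": ["FL", "NY", "FL"],
    "Sales": [1200.0, 450.0, 300.0],
})
# df = pd.read_excel("dataset1.xlsx")  # the real import step

print(df.shape)
```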


ROLL UP Operation
Question: What is the average sales amount for each state?
Step 1. Add the Aggregate operator and select “SALES” with the average function.

Step 2. Select the group-by attribute as “STATE”.

Step 3. Run the process.

The screenshot above shows the average sales of all products in each state.
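The same roll-up can be sketched with pandas: a group-by on State with the average function, using hypothetical sample rows in place of the real dataset:

```python
import pandas as pd

# Hypothetical sample data standing in for dataset1.xlsx.
df = pd.DataFrame({
    "State": ["FL", "FL", "NY", "NY"],
    "Sales": [100.0, 300.0, 200.0, 400.0],
})

# Roll up: aggregate Sales to the State level with the average function,
# mirroring the Aggregate operator configured above.
avg_sales = df.groupby("State")["Sales"].mean()
print(avg_sales)
```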
DRILL DOWN Operation
Question: What are the sales details for product type “Electronics” in the state “FL”?
Step 1. Add the Aggregate operator, choose the sum of “SALES”, and set the group-by attributes to Product_type and State.

Step 2. Add the “Filter Examples” operator and set the attributes as shown in the image.
Step 3. Run the process.

The screenshot above shows the total sales of product type Electronics in the state FL.
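A pandas sketch of the same drill-down, with hypothetical sample rows: aggregate at the finer (Product Type, State) level, then filter to Electronics in FL as the Filter Examples step does:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Product Type": ["Electronics", "Electronics", "Apparel"],
    "State": ["FL", "FL", "FL"],
    "Sales": [500.0, 700.0, 100.0],
})

# Drill down: sum Sales at the finer grain, then filter.
detail = df.groupby(["Product Type", "State"])["Sales"].sum().reset_index()
electronics_fl = detail[(detail["Product Type"] == "Electronics")
                        & (detail["State"] == "FL")]
print(electronics_fl)
```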
SLICING Operation
Question: What is the average profit by market size for the product line “Cars”?
Step 1. Add the Filter Examples operator and filter the product line to Cars.

Step 2. Add the Aggregate operator, choose the average of profit, and set the group-by attributes as shown in the image.
Step 3. Run the process.

The screenshot above shows that for product line “Cars” with market size Large, the average profit is 50613.500.
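The slice can be sketched in pandas as fixing one dimension value (Product Line = “Cars”) before aggregating, again with hypothetical sample values:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Product Line": ["Cars", "Cars", "Snacks"],
    "Market Size": ["Large", "Small", "Large"],
    "Profit": [60000.0, 41227.0, 900.0],
})

# Slice: fix the Product Line dimension, then average Profit by Market Size.
cars = df[df["Product Line"] == "Cars"]
avg_profit = cars.groupby("Market Size")["Profit"].mean()
print(avg_profit)
```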
DICING Operation
Question: What are the average profit, sales, and total expenditure by market size for the product line “Cars”?
Step 1. Add the Filter Examples operator and filter the product line to Cars.

Step 2. Add the Aggregate operator, choose the averages of profit, sales, and expenditure, and set the attributes as shown in the image.
Step 3. Run the process.

The screenshot above shows, for each market size of Cars, the averages of profit, sales, and total expense.
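The dice differs from the slice in that several measures are aggregated over the selected sub-cube at once. A minimal pandas sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Product Line": ["Cars", "Cars"],
    "Market Size": ["Large", "Small"],
    "Profit": [50000.0, 30000.0],
    "Sales": [200000.0, 120000.0],
    "Total Expense": [15000.0, 9000.0],
})

# Dice: keep the Cars sub-cube and average several measures together.
cube = (df[df["Product Line"] == "Cars"]
        .groupby("Market Size")[["Profit", "Sales", "Total Expense"]]
        .mean())
print(cube)
```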
PIVOT Operation
Question: What are the total sales amounts for each state and product type?
Step 1. Add the Pivot operator.

Step 2. In the Pivot operator, set the group-by attributes to State and Product Type, and set the aggregation attribute to “Sales” with the sum function.
Step 3. Run the process.
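The pivot can be sketched in pandas with `pivot_table`: states become rows, product types become columns, and summed sales fill the cells (hypothetical sample data):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "State": ["FL", "FL", "NY"],
    "Product Type": ["Electronics", "Apparel", "Electronics"],
    "Sales": [500.0, 200.0, 300.0],
})

# Pivot: rotate Product Type from rows into columns, summing Sales.
pivoted = df.pivot_table(index="State", columns="Product Type",
                         values="Sales", aggfunc="sum", fill_value=0)
print(pivoted)
```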
ETL Process (Extract, Transform, Load)

The following steps are included in the ETL (Extract, Transform, Load) process in
RapidMiner Studio:

1. Extract: This stage involves obtaining data from multiple sources, including web services, spreadsheets, and databases. RapidMiner offers multiple operators for connecting to these data sources and retrieving the required data.

2. Transform: After extraction, the data may need to be cleaned, normalized, and converted to match the intended format. This stage includes data filtering, aggregation, sorting, dataset combining, handling missing values, and data type conversion. RapidMiner provides a large selection of transformation operators to carry out these tasks.

3. Load: The last phase loads the transformed data into the target system, which could be a database, data warehouse, or other storage system. RapidMiner provides operators for writing data to several destinations.

Question: What would the data look like when filtered to sales from the state “NY”?

Step 1: Drag and drop the data and select the attributes.

Step 2: Add the Filter Examples operator.

Step 3: Add the Write CSV operator, click the folder option, and type the name of the file you want to create.
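The three ETL stages above map onto a short pandas sketch: extract is the load, transform is the filter to State = “NY”, and load is the CSV write. The file name `ny_sales.csv` and the sample rows are hypothetical:

```python
import pandas as pd

# Extract: hypothetical sample rows standing in for the imported dataset.
df = pd.DataFrame({
    "State": ["NY", "FL", "NY"],
    "Sales": [250.0, 400.0, 150.0],
})

# Transform: keep only rows from the state NY.
ny_sales = df[df["State"] == "NY"]

# Load: write the result out, as the Write CSV operator does.
ny_sales.to_csv("ny_sales.csv", index=False)
print(len(ny_sales))
```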
STAR SCHEMA

Star Schema is a data warehouse design characterized by a central fact table surrounded by
multiple dimension tables. The fact table stores quantitative data (metrics) like sales, costs, or
profits. Dimension tables provide context for the fact table, containing descriptive attributes
like time, product, customer, location, etc. This structure is optimized for query performance,
allowing for efficient data analysis and aggregation. Dimension tables typically have
hierarchies (e.g., year, quarter, month) for drill-down analysis. Foreign keys in the fact table
link to primary keys in dimension tables, creating a star-like pattern. Star schemas are widely
used in business intelligence applications due to their simplicity and effectiveness in
supporting complex queries and reporting.

A fact table is required.


Step 1: Add the dataset and the Select Attributes operator from the dropdown box.

Step 2: In the attribute filter type, choose “a subset” instead of all attributes, then add all the required attributes.

Step 3: Add the Aggregate operator and select the attributes.

Step 4: Add the Write CSV operator and save the output as the fact table.

Step 5: Open the fact table in Excel.

Step 6: Click the first row, go to Data, and click Text to Columns.

Step 7: Transpose the data and copy it.

Step 8: Make a new table and paste it into another sheet.

Step 9: Build a dimension table for each column manually: use the Select Attributes and Aggregate operators, group by the attribute for that column, and put the result in Excel.

Step 10: Repeat the same steps, but in the operator select only the State attribute. After formatting, this gives the final State dimension table in Excel.

Step 11: Repeat the same steps, selecting only the Market Size attribute. After formatting, this gives the final Market Size dimension table.

Step 12: Repeat the same steps, selecting only the Product Type attribute. After formatting, this gives the final Product Type dimension table.

(Star Schema Table)
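The fact-and-dimension construction walked through above can be sketched in pandas: build one dimension table per descriptive attribute with a surrogate key, then replace those attributes in the fact table with foreign keys. The sample rows are hypothetical; the real input is dataset1.xlsx:

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "State": ["FL", "NY"],
    "Product Type": ["Electronics", "Apparel"],
    "Sales": [500.0, 200.0],
    "Profit": [50.0, 20.0],
})

# Dimension tables: distinct values plus a surrogate key.
dim_state = df[["State"]].drop_duplicates().reset_index(drop=True)
dim_state["State Key"] = dim_state.index + 1

dim_product = df[["Product Type"]].drop_duplicates().reset_index(drop=True)
dim_product["Product Key"] = dim_product.index + 1

# Fact table: measures plus foreign keys into the dimensions.
fact = (df.merge(dim_state, on="State")
          .merge(dim_product, on="Product Type")
          [["State Key", "Product Key", "Sales", "Profit"]])
print(fact)
```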
SNOWFLAKE SCHEMA

A snowflake schema is a data warehouse design that extends the star schema concept. It
normalizes dimension tables into multiple related tables, forming a hierarchical structure
resembling a snowflake. A central fact table contains numerical data, linked to these
dimension tables. This design reduces data redundancy and improves data integrity.
However, it can increase query complexity due to multiple joins. Snowflake schemas are
suitable for large data warehouses where storage efficiency and data quality are paramount.
While it can be complex to implement, it offers advantages in terms of space optimization
and data consistency.
Repeating this normalization for each dimension table yields the snowflake schema directly.
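The normalization step that snowflaking adds can be sketched in pandas: the repeated Product Type values are pulled out of a (hypothetical) product dimension into their own table, leaving a foreign key behind, which is exactly the extra join the schema introduces:

```python
import pandas as pd

# Hypothetical product dimension from a star schema.
dim_product = pd.DataFrame({
    "Product Key": [1, 2, 3],
    "Product Line": ["Laptops", "Men's Wear", "Snacks"],
    "Product Type": ["Electronics", "Apparel", "Grocery"],
})

# Snowflake: normalize Product Type into its own table with a surrogate key.
dim_type = (dim_product[["Product Type"]].drop_duplicates()
            .reset_index(drop=True))
dim_type["Type Key"] = dim_type.index + 1

# Replace the descriptive attribute with the foreign key.
dim_product = (dim_product.merge(dim_type, on="Product Type")
               .drop(columns="Product Type"))
print(dim_product)
```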

(SNOWFLAKE SCHEMA TABLE)
