
Ronak Nitin Sheth

3 BBA FMA A
2324351

Data Mining
CIA-1

Topic: Data Warehousing and OLAP Operations

Submitted to:
Dr Shashidhar Yadav J
School of Business and Management
CHRIST (Deemed To Be University)
Yeshwanthpur Campus

Date of submission: 29/7/24


INDEX
1. Introduction to dataset
2. OLAP operations
• Roll Up
• Drill Down
• Slicing
• Dicing
• Pivot
3. ETL (Extract, Transform, Load)
4. Star Schema
5. Snowflake Schema

ABOUT THE DATASET

The dataset contains sales information for various products across different states and
markets. Each row represents a product, with details such as:

• Customer ID: Unique identifier for the customer
• Customer Name: Name of the customer
• Product Type: The category of the product (Electronics, Apparel, Grocery,
Automotive, Furniture)
• Product Line: Specific product within the type (Laptops, Men's Wear, Snacks, Cars,
Home Office)
• State: The state where the product is sold
• Market Size: The size of the market (Small, Medium, Large)
• Total Expense: Total expenses incurred for the product
• Sales: Total sales generated by the product
• Profit: Profit made from the product
• Cost of Goods: Cost of producing or acquiring the product

The full dataset extends to 55 columns.
We assume the data has been preprocessed, uploaded to the software, and connected.

OLAP OPERATIONS
Drop the “dataset1.xlsx” file into RapidMiner and connect it to the process.

Step 1: Upload the data by clicking on the Import Data section and selecting “dataset1.xlsx”.

Step 2: Drag and drop the uploaded data into the process.
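The load step above can be sketched in Python with pandas. Since the real workbook is not available here, this minimal sketch builds a small sample frame with a few of the dataset's columns and hypothetical values; the commented-out line shows the actual load:

```python
import pandas as pd

# Hypothetical sample rows standing in for dataset1.xlsx.
df = pd.DataFrame({
    "Customer ID": [101, 102, 103],
    "Product Type": ["Electronics", "Apparel", "Grocery"],
    "State": ["FL", "NY", "FL"],
    "Sales": [1200.0, 450.0, 300.0],
})
# df = pd.read_excel("dataset1.xlsx")  # the real import step

print(df.shape)
```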


ROLL UP Operation
Question: What is the average sales amount for each state?
Step 1. Add the Aggregate operator and select “SALES” with the average function.

Step 2. Select the group-by attribute as “STATE”.

Step 3. Run the process.

The screenshot above shows the average sales of all products in each state.
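The same roll-up can be sketched with pandas: a group-by on State with the average function, using hypothetical sample rows in place of the real dataset:

```python
import pandas as pd

# Hypothetical sample data standing in for dataset1.xlsx.
df = pd.DataFrame({
    "State": ["FL", "FL", "NY", "NY"],
    "Sales": [100.0, 300.0, 200.0, 400.0],
})

# Roll up: aggregate Sales to the State level with the average function,
# mirroring the Aggregate operator configured above.
avg_sales = df.groupby("State")["Sales"].mean()
print(avg_sales)
```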
DRILL DOWN Operation
Question: What are the sales details for product type “Electronics” in the state “FL”?
Step 1. Add the Aggregate operator, choose the sum of “SALES”, and set the group-by attributes to Product_type and State.

Step 2. Add the “Filter Examples” operator and set the attributes as shown in the image.
Step 3. Run the process.

The screenshot above shows the total sales of product type Electronics in the state FL.
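A pandas sketch of the same drill-down, with hypothetical sample rows: aggregate at the finer (Product Type, State) level, then filter to Electronics in FL as the Filter Examples step does:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Product Type": ["Electronics", "Electronics", "Apparel"],
    "State": ["FL", "FL", "FL"],
    "Sales": [500.0, 700.0, 100.0],
})

# Drill down: sum Sales at the finer grain, then filter.
detail = df.groupby(["Product Type", "State"])["Sales"].sum().reset_index()
electronics_fl = detail[(detail["Product Type"] == "Electronics")
                        & (detail["State"] == "FL")]
print(electronics_fl)
```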
SLICING Operation
Question: What is the average profit by market size for the product line “Cars”?
Step 1. Add the Filter Examples operator and filter the product line to Cars.

Step 2. Add the Aggregate operator, choose the average of profit, and set the group-by attributes as shown in the image.
Step 3. Run the process.

The screenshot above shows that for product line “Cars” with market size Large, the average profit is 50613.500.
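The slice can be sketched in pandas as fixing one dimension value (Product Line = “Cars”) before aggregating, again with hypothetical sample values:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Product Line": ["Cars", "Cars", "Snacks"],
    "Market Size": ["Large", "Small", "Large"],
    "Profit": [60000.0, 41227.0, 900.0],
})

# Slice: fix the Product Line dimension, then average Profit by Market Size.
cars = df[df["Product Line"] == "Cars"]
avg_profit = cars.groupby("Market Size")["Profit"].mean()
print(avg_profit)
```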
DICING Operation
Question: What are the average profit, sales, and total expenditure by market size for the product line “Cars”?
Step 1. Add the Filter Examples operator and filter the product line to Cars.

Step 2. Add the Aggregate operator, choose the averages of profit, sales, and expenditure, and set the attributes as shown in the image.
Step 3. Run the process.

The screenshot above shows, for each market size of Cars, the averages of profit, sales, and total expense.
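The dice differs from the slice in that several measures are aggregated over the selected sub-cube at once. A minimal pandas sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Product Line": ["Cars", "Cars"],
    "Market Size": ["Large", "Small"],
    "Profit": [50000.0, 30000.0],
    "Sales": [200000.0, 120000.0],
    "Total Expense": [15000.0, 9000.0],
})

# Dice: keep the Cars sub-cube and average several measures together.
cube = (df[df["Product Line"] == "Cars"]
        .groupby("Market Size")[["Profit", "Sales", "Total Expense"]]
        .mean())
print(cube)
```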
PIVOT Operation
Question: What are the total sales amounts for each state and product type?
Step 1. Add the Pivot operator.

Step 2. In the Pivot operator, set the group-by attributes to State and Product Type, and set the aggregation attribute to “Sales” with the sum function.
Step 3. Run the process.
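The pivot can be sketched in pandas with `pivot_table`: states become rows, product types become columns, and summed sales fill the cells (hypothetical sample data):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "State": ["FL", "FL", "NY"],
    "Product Type": ["Electronics", "Apparel", "Electronics"],
    "Sales": [500.0, 200.0, 300.0],
})

# Pivot: rotate Product Type from rows into columns, summing Sales.
pivoted = df.pivot_table(index="State", columns="Product Type",
                         values="Sales", aggfunc="sum", fill_value=0)
print(pivoted)
```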
ETL Process (Extract, Transform, Load)

The following steps are included in the ETL (Extract, Transform, Load) process in
RapidMiner Studio:

1. Extract: This stage involves obtaining data from multiple sources, including web services, spreadsheets, and databases. RapidMiner offers multiple operators for connecting to these data sources and retrieving the required data.

2. Transform: After extraction, the data may need to be cleaned, normalized, and converted to match the intended format. This stage includes data filtering, aggregation, sorting, dataset combining, handling missing values, and data type conversion. RapidMiner provides a large selection of transformation operators to carry out these tasks.

3. Load: The last phase loads the transformed data into the target system, which could be a database, data warehouse, or other storage system. RapidMiner provides operators for writing data to several destinations.

Question: What would the data look like when filtered to sales from the state “NY”?

Step 1: Drag and drop the data and select the attributes.

Step 2: Add the Filter Examples operator.

Step 3: Add the Write CSV operator, click the folder option, and type the name of the file you want to create.
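The three ETL stages above map onto a short pandas sketch: extract is the load, transform is the filter to State = “NY”, and load is the CSV write. The file name `ny_sales.csv` and the sample rows are hypothetical:

```python
import pandas as pd

# Extract: hypothetical sample rows standing in for the imported dataset.
df = pd.DataFrame({
    "State": ["NY", "FL", "NY"],
    "Sales": [250.0, 400.0, 150.0],
})

# Transform: keep only rows from the state NY.
ny_sales = df[df["State"] == "NY"]

# Load: write the result out, as the Write CSV operator does.
ny_sales.to_csv("ny_sales.csv", index=False)
print(len(ny_sales))
```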
STAR SCHEMA

Star Schema is a data warehouse design characterized by a central fact table surrounded by
multiple dimension tables. The fact table stores quantitative data (metrics) like sales, costs, or
profits. Dimension tables provide context for the fact table, containing descriptive attributes
like time, product, customer, location, etc. This structure is optimized for query performance,
allowing for efficient data analysis and aggregation. Dimension tables typically have
hierarchies (e.g., year, quarter, month) for drill-down analysis. Foreign keys in the fact table
link to primary keys in dimension tables, creating a star-like pattern. Star schemas are widely
used in business intelligence applications due to their simplicity and effectiveness in
supporting complex queries and reporting.

A fact table is required.


Step 1: Add the dataset and the Select Attributes operator from the dropdown box.

Step 2: In the attribute filter type, choose “a subset” instead of all attributes, then add all the required attributes.

Step 3: Add the Aggregate operator and select the attributes.

Step 4: Add the Write CSV operator and save the output as the fact table.

Step 5: Open the fact table in Excel.

Step 6: Click the first row, go to Data, and click Text to Columns.

Step 7: Transpose the data and copy it.

Step 8: Make a new table and paste it into another sheet.

Step 9: Build a dimension table for each column manually: use the Select Attributes and Aggregate operators, group by the attribute for that column, and put the result in Excel.

Step 10: Repeat the same steps, but in the operator select only the State attribute. After formatting, this gives the final State dimension table in Excel.

Step 11: Repeat the same steps, selecting only the Market Size attribute. After formatting, this gives the final Market Size dimension table.

Step 12: Repeat the same steps, selecting only the Product Type attribute. After formatting, this gives the final Product Type dimension table.

(Star Schema Table)
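The fact-and-dimension construction walked through above can be sketched in pandas: build one dimension table per descriptive attribute with a surrogate key, then replace those attributes in the fact table with foreign keys. The sample rows are hypothetical; the real input is dataset1.xlsx:

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "State": ["FL", "NY"],
    "Product Type": ["Electronics", "Apparel"],
    "Sales": [500.0, 200.0],
    "Profit": [50.0, 20.0],
})

# Dimension tables: distinct values plus a surrogate key.
dim_state = df[["State"]].drop_duplicates().reset_index(drop=True)
dim_state["State Key"] = dim_state.index + 1

dim_product = df[["Product Type"]].drop_duplicates().reset_index(drop=True)
dim_product["Product Key"] = dim_product.index + 1

# Fact table: measures plus foreign keys into the dimensions.
fact = (df.merge(dim_state, on="State")
          .merge(dim_product, on="Product Type")
          [["State Key", "Product Key", "Sales", "Profit"]])
print(fact)
```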
SNOWFLAKE SCHEMA

A snowflake schema is a data warehouse design that extends the star schema concept. It
normalizes dimension tables into multiple related tables, forming a hierarchical structure
resembling a snowflake. A central fact table contains numerical data, linked to these
dimension tables. This design reduces data redundancy and improves data integrity.
However, it can increase query complexity due to multiple joins. Snowflake schemas are
suitable for large data warehouses where storage efficiency and data quality are paramount.
While it can be complex to implement, it offers advantages in terms of space optimization
and data consistency.
Repeating this normalization for each dimension table yields the snowflake schema directly.
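The normalization step that snowflaking adds can be sketched in pandas: the repeated Product Type values are pulled out of a (hypothetical) product dimension into their own table, leaving a foreign key behind, which is exactly the extra join the schema introduces:

```python
import pandas as pd

# Hypothetical product dimension from a star schema.
dim_product = pd.DataFrame({
    "Product Key": [1, 2, 3],
    "Product Line": ["Laptops", "Men's Wear", "Snacks"],
    "Product Type": ["Electronics", "Apparel", "Grocery"],
})

# Snowflake: normalize Product Type into its own table with a surrogate key.
dim_type = (dim_product[["Product Type"]].drop_duplicates()
            .reset_index(drop=True))
dim_type["Type Key"] = dim_type.index + 1

# Replace the descriptive attribute with the foreign key.
dim_product = (dim_product.merge(dim_type, on="Product Type")
               .drop(columns="Product Type"))
print(dim_product)
```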

(SNOWFLAKE SCHEMA TABLE)
