3 BBA FMA A
2324351
Data Mining
CIA-1
Submitted to:
Dr Shashidhar Yadav J
School of Business and Management
CHRIST (Deemed to be University)
Yeshwanthpur Campus
The dataset contains sales information for various products across different states and
markets. Each row represents a product, with details such as product type, product line,
state, market, market size, sales, profit, and expenses (up to 55 columns in all).
We assume that the data has already been preprocessed, uploaded to the software, and connected within the process.
OLAP OPERATIONS
Drag the selected “dataset1.xlsx” file into RapidMiner and connect it to the result port.
Step 1: Upload the data by clicking on the Import Data section and then selecting “dataset1.xlsx”.
The above screenshot shows the average sales of all products across all the states.
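For reference, the same view can be reproduced in code. The sketch below uses Python with pandas and assumes the columns are named Product_type, State, and Sales (names taken from the attributes used later in this report; adjust them to the actual headers in dataset1.xlsx).

```python
import pandas as pd

# Load the workbook; the column names used below are assumptions based on
# the attributes referenced in this report, not confirmed headers.
df = pd.read_excel("dataset1.xlsx")

# Average sales of all products in all the states, mirroring the
# aggregated view shown in the screenshot.
avg_sales = df.groupby(["Product_type", "State"], as_index=False)["Sales"].mean()
print(avg_sales)
```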
DRILL DOWN Operation
Question: What are the sales details for the product type “Electronics” in the state “FL”?
Step 1. Add the Aggregate operator, choose the sum of “Sales”, and set the group-by attributes to Product_type and State.
Step 2. Add the “Filter Examples” operator and set the filter attributes as shown in the image.
Step 3. Run
The above screenshot shows the sales of product type Electronics in the state FL.
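A minimal pandas sketch of this drill-down, under the same column-name assumptions as above:

```python
# Aggregate: sum of Sales grouped by Product_type and State (Step 1).
agg = df.groupby(["Product_type", "State"], as_index=False)["Sales"].sum()

# Filter Examples: keep only Electronics in FL (Step 2).
drill = agg[(agg["Product_type"] == "Electronics") & (agg["State"] == "FL")]
print(drill)
```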
SLICING Operation
Question: What is the average profit by market size for the product line “Cars”?
Step 1. Add the “Filter Examples” operator and filter the product line attribute to “Cars”.
Step 2. Add the Aggregate operator, then choose the average of Profit and set the group-by attributes as shown in the image.
Step 3. Run
The above screenshot shows that, for the product line “Cars”, the large market size has an average profit of 50613.500.
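The slice can be sketched the same way; “Product_line” and “Market_size” are assumed column names:

```python
# Slice: restrict the cube to the Cars product line (Step 1).
cars = df[df["Product_line"] == "Cars"]

# Average profit per market size (Step 2).
profit_by_size = cars.groupby("Market_size", as_index=False)["Profit"].mean()
print(profit_by_size)
```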
DICING Operation
Question: What are the average profit, average sales, and total expenditure by market size for the product line “Cars”?
Step 1. Add the “Filter Examples” operator and filter the product line attribute to “Cars”.
Step 2. Add the Aggregate operator, then choose the average of Profit and Sales and the sum of Expenses, and set the group-by attributes as shown in the image.
Step 3. Run
The above screenshot shows, for each market size of the product line “Cars”, the average profit, average sales, and total expense.
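A sketch of the dice, extending the slice above with the two additional measures (the “Expenses” column name is an assumption):

```python
# Dice: the same Cars subcube, aggregating several measures at once.
dice = cars.groupby("Market_size", as_index=False).agg(
    avg_profit=("Profit", "mean"),
    avg_sales=("Sales", "mean"),
    total_expense=("Expenses", "sum"),
)
print(dice)
```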
PIVOT Operation
Question: What are the total sales amounts for each state and product type?
Step 1. Add the Pivot operator.
Step 2. In the Pivot operator, set the group-by attribute to State, set the column grouping attribute to Product type, and choose the sum of the “Sales” attribute.
Step 3. Run
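RapidMiner’s Pivot operator corresponds closely to pandas’ pivot_table; a sketch under the same column-name assumptions:

```python
# Pivot: one row per State, one column per Product_type,
# each cell holding the summed Sales for that pair.
pivot = df.pivot_table(index="State", columns="Product_type",
                       values="Sales", aggfunc="sum", fill_value=0)
print(pivot)
```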
ETL Process (Extract, Transform, Load)
The following steps are included in the ETL (Extract, Transform, Load) process in
RapidMiner Studio:
1. Extract: This stage entails obtaining data from multiple sources, including web services,
spreadsheets, and databases. RapidMiner offers multiple operators to connect to these data
sources and retrieve the required data.
2. Transform: After extraction, the data may need to be cleaned, normalized, and converted
to match the intended format. This stage includes data filtering, aggregation, sorting,
dataset combining, handling of missing values, and data type conversion. RapidMiner
provides a large selection of transformation operators to carry out these operations.
3. Load: The last phase is to load the transformed data into the target system, which could be a
database, data warehouse, or another type of storage system. RapidMiner provides operators
that can write data to several destinations.
Question: What would the data look like with the sales amounts from the state “NY”?
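A compact sketch of the three ETL stages for this question, again in pandas with assumed column names and a hypothetical output file:

```python
import pandas as pd

# Extract: read the source spreadsheet.
df = pd.read_excel("dataset1.xlsx")

# Transform: keep only New York rows and drop records missing a Sales value.
ny = df[df["State"] == "NY"].dropna(subset=["Sales"])

# Load: write the result to the target store (a CSV file here).
ny.to_csv("ny_sales.csv", index=False)
```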
STAR SCHEMA
Star Schema is a data warehouse design characterized by a central fact table surrounded by
multiple dimension tables. The fact table stores quantitative data (metrics) like sales, costs, or
profits. Dimension tables provide context for the fact table, containing descriptive attributes
like time, product, customer, location, etc. This structure is optimized for query performance,
allowing for efficient data analysis and aggregation. Dimension tables typically have
hierarchies (e.g., year, quarter, month) for drill-down analysis. Foreign keys in the fact table
link to primary keys in dimension tables, creating a star-like pattern. Star schemas are widely
used in business intelligence applications due to their simplicity and effectiveness in
supporting complex queries and reporting.
Step 2. In the attribute filter type, click on “subset” instead of “all attributes” and then add
the required attributes.
Step 3. Add the Aggregate operator and select the attributes.
Step 4. Add the Write CSV operator and save the output as the fact table.
Step 5. Open the fact table in Excel.
Step 6. Select the first row, go to the Data tab, and click Text to Columns.
Step 7. Transpose the data and copy it.
Step 8. Create a new table and paste it into another sheet.
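The fact-and-dimension split produced by the steps above can also be sketched in pandas; the surrogate-key columns and file name here are hypothetical:

```python
# Dimension tables: one row per distinct value, with a surrogate key.
dim_product = df[["Product_type"]].drop_duplicates().reset_index(drop=True)
dim_product["product_id"] = dim_product.index
dim_state = df[["State"]].drop_duplicates().reset_index(drop=True)
dim_state["state_id"] = dim_state.index

# Fact table: foreign keys into the dimensions plus the numeric measures.
fact = (df.merge(dim_product, on="Product_type")
          .merge(dim_state, on="State")
          [["product_id", "state_id", "Sales", "Profit"]])
fact.to_csv("fact_table.csv", index=False)
```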
SNOWFLAKE SCHEMA
A snowflake schema is a data warehouse design that extends the star schema concept. It
normalizes dimension tables into multiple related tables, forming a hierarchical structure
resembling a snowflake. A central fact table contains numerical data, linked to these
dimension tables. This design reduces data redundancy and improves data integrity.
However, it can increase query complexity due to multiple joins. Snowflake schemas are
suitable for large data warehouses where storage efficiency and data quality are paramount.
While it can be complex to implement, it offers advantages in terms of space optimization
and data consistency.
By repeating this normalization for each dimension (as in the sketch below), we arrive at the snowflake schema.
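Continuing the sketch above, snowflaking normalizes a dimension one level further; the “Product_line” column and key names are assumptions:

```python
# Snowflake: split the product dimension into product -> product line.
dim_line = df[["Product_line"]].drop_duplicates().reset_index(drop=True)
dim_line["line_id"] = dim_line.index

# The product dimension now references the product-line table by key
# instead of repeating the line name on every row.
dim_product = (df[["Product_type", "Product_line"]].drop_duplicates()
                 .merge(dim_line, on="Product_line")
                 [["Product_type", "line_id"]]
                 .reset_index(drop=True))
dim_product["product_id"] = dim_product.index
```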