
Informatica Hands-On Challenge: Super_Store Analysis

• Introduction: You are provided with a sample dataset from a retail store,
Super_Store. This dataset contains information about orders, customers, products,
and sales. Your task involves cleaning the data, analyzing sales, customer orders,
customer geography, and order processing time using Informatica PowerCenter.

Data Preparation:

• Oracle SQL Setup:

• Log in to Oracle SQL Developer using the Admin connection with the following credentials:

• Username: system

• Password: Admin

Create a table named Super_Store with the following structure:

Row_ID          INT
Order_Date      DATE
Ship_Date       DATE
Ship_Mode       VARCHAR(50)
Customer_ID     VARCHAR(50)
Customer_Name   VARCHAR(50)
Segment         VARCHAR(50)
Country         VARCHAR(50)
City            VARCHAR(50)
State           VARCHAR(50)
Postal_Code     VARCHAR(50)
Region          VARCHAR(50)
Product_ID      VARCHAR(250)
Category        VARCHAR(250)
Sub_Category    VARCHAR(250)
Product_Name    VARCHAR(250)
Sales           INT
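
For reference, a minimal CREATE TABLE sketch matching the structure listed above (types copied as given; in Oracle, INT and VARCHAR map internally to NUMBER and VARCHAR2):

-- Sketch of the Super_Store table using the column list above.
CREATE TABLE Super_Store (
    Row_ID        INT,
    Order_Date    DATE,
    Ship_Date     DATE,
    Ship_Mode     VARCHAR(50),
    Customer_ID   VARCHAR(50),
    Customer_Name VARCHAR(50),
    Segment       VARCHAR(50),
    Country       VARCHAR(50),
    City          VARCHAR(50),
    State         VARCHAR(50),
    Postal_Code   VARCHAR(50),
    Region        VARCHAR(50),
    Product_ID    VARCHAR(250),
    Category      VARCHAR(250),
    Sub_Category  VARCHAR(250),
    Product_Name  VARCHAR(250),
    Sales         INT
);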

• NOTE: While loading data into the table, set the Order_Date and Ship_Date date format to DD/MM/YYYY.
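
If you load the data through SQL statements, one possible way to apply this format for the current session is shown below (this is only one option; the SQL Developer import wizard also lets you set the date format directly):

-- Make DD/MM/YYYY the default date format for this session
-- (affects implicit string-to-DATE conversions during the load).
ALTER SESSION SET NLS_DATE_FORMAT = 'DD/MM/YYYY';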

Load superstore_data.csv into the Super_Store table.

The given dataset is located in "~\Desktop\Project\miniproject-informatica-super_store\".

Informatica Repository Setup:

Connect to the Informatica repository manager using the following credentials:

• Username: Administrator

• Password: Administrator

Create a folder named Super_Store in the repository manager.

How to Import Source Table in Source Analyzer

Following are the steps to import a source table in the Informatica Source Analyzer:

Step 1) Go to the "Sources" option


In the Source Analyzer:
1. Click on the "Sources" menu in the main menu bar.
2. Select the "Import from Database" option; the ODBC Connection dialog box will then open.
Step 2) Create ODBC connection

• We will now create an ODBC connection


1. Click on the button next to the ODBC data source (...).
2. On the next page, select the User DSN tab and click the Add button.
3. Select Oracle Wire Protocol.
4. On the next page, select the General tab, enter the database details, and then click Connect.
• Data Source Name : oracle
• Host : localhost
• Port : 1521
• SID : xe

Create Connections for Workflow Manager

To Create a Relational Connection

Step 1: In the Workflow Manager


• Click on the Connections menu
• Select the Relational option
Step 2: In the pop-up window
• Select Oracle as the type
• Click on the New button
Step 3: In the new connection object definition window
• Enter the connection name (system)
• Enter the username - system
• Enter the password - Admin
• Enter the connection string - xe
• Leave the other settings as default and click the OK button
Note: For additional credentials, such as those for the Designer, kindly check the Readme file.

Note: Please follow the naming conventions given in the problem statement.

Data Cleaning:

• Mapping Name: Map_Cleaned_Data

• Workflow Name: Workflow_Cleaned_Data


• Session Name: Session_Cleaned_Data

• Target Table: Super_Store_Cleaned_Data

Operations:

• Remove duplicates from the dataset to ensure data integrity.

• Filter records where Country is 'United States' to focus on domestic orders.

• Extract the numeric part from Customer_ID to standardize customer identification (e.g., from CH-1234, extract 1234).

• Concatenate the extracted Customer_ID and Customer_Name with '-' to create a unique identifier for each customer (e.g., 1234-Charlies, i.e., Extracted_ID-Customer_Name) and store it in the Customer_ID_Name column.

• Drop the Customer_ID and Customer_Name columns.

• After cleaning, load the data into the Super_Store_Cleaned_Data target table (for the columns, check the sample output; a hedged SQL sketch of this logic appears after the sample output below).

• After completing the mapping, go to the Workflow Manager.

• We are going to connect to the repository "Repo_etlclass", so double-click on the folder to connect.

• Enter the username and password, then click the Connect button.

Username : Administrator

Password : Administrator

• Create the session Session_Cleaned_Data.

• Step 1 – Double-click on the session object in the Workflow Manager. It will open a task window where you can modify the task properties.

• Step 2 – In the edit task window:

1. Select the Mapping tab.
2. Select the Connections property.
3. Assign the connection created in the earlier steps to the source and the target. (In the target properties, set the load type to Normal and tick the Insert checkbox.)
4. Click the OK button.
Creating the workflow
• Step 1 – In the Workflow Designer:
1. Select the Workflows menu.
2. Select the Create option.
• Step 2 – In the Create Workflow window:

1. Enter the workflow name - Workflow_Cleaned_Data
2. Click the OK button (leave the other options as default).
-> When you create a workflow, it does not contain any tasks, so to execute a task in the workflow you have to add the task to it.
• Step 1 – To add the session task that we created in the Task Developer to the Workflow Designer:
1. In the navigator tree, expand the Tasks folder.
2. Drag and drop the session task into the Workflow Designer.
• Then link the task into the workflow and start the workflow.
-> Check the Workflow Monitor to verify whether the run succeeded.
Sample Output: the remaining columns, plus an additional CUSTOMER_ID_NAME column.

CUSTOMER_ID_NAME
21925-Zushuss Donatelli
16585-Ken Black
21520-Tracy Blumstein
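
For reference only, a hedged SQL sketch of the cleaning logic described above, run against the Super_Store source table. This is an illustrative query, not the Informatica mapping itself: the output column list is assumed from the "remaining columns plus CUSTOMER_ID_NAME" description, and REGEXP_SUBSTR is just one way to extract the numeric part of the ID. In PowerCenter this would typically be a Sorter (with Distinct) or Aggregator for de-duplication, a Filter transformation for the Country condition, and an Expression transformation for the derived Customer_ID_Name port.

-- Illustrative sketch: rows the cleaning mapping is expected to produce.
SELECT DISTINCT                                   -- remove duplicate rows
       Row_ID, Order_Date, Ship_Date, Ship_Mode, Segment, Country, City,
       State, Postal_Code, Region, Product_ID, Category, Sub_Category,
       Product_Name, Sales,
       REGEXP_SUBSTR(Customer_ID, '[0-9]+')       -- numeric part of Customer_ID
           || '-' || Customer_Name AS Customer_ID_Name
FROM   Super_Store
WHERE  Country = 'United States';                 -- keep only domestic orders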

NOTE: The Super_Store_Cleaned_Data table is used as the source data for every task below.
Analysis Tasks:

Task 1: Sales Summary

• Mapping Name: Map_Sales_Summary

• Workflow Name: Workflow_Sales_Summary

• Session Name: Session_Sales_Summary

• Target Table: Sales_Summary

Problem Statement: Summarize total sales and average sales for each customer. Identify customers with a significant contribution to overall sales.

Operations:

• Summarize the sales data for each customer (grouped by 'Customer_ID_Name'): calculate the sum of sales and store it in a Total_Sales column, and calculate the average sales and store it in an Avg_Sales column.

• Order the summarized data in descending order based on the total sales ('Total_Sales').

• Filter customers with total sales greater than 3000 and average sales greater than 300 to focus on significant contributors.

• Drop the unnecessary columns; kindly check the sample output.

• Load the data into the Sales_Summary target table (for the columns, check the sample output; a hedged SQL sketch of this task appears after the sample output below).

• After completing the mapping, go to the Workflow Manager.

• We are going to connect to the repository "Repo_etlclass", so double-click on the folder to connect.

• Enter the username and password, then click the Connect button.

Username : Administrator

Password : Administrator

• Create the session Session_Sales_Summary.

• Step 1 – Double-click on the session object in the Workflow Manager. It will open a task window where you can modify the task properties.

• Step 2 – In the edit task window:

1. Select the Mapping tab.
2. Select the Connections property.
3. Assign the connection created in the earlier steps to the source and the target. (In the target properties, set the load type to Normal and tick the Insert checkbox.)
4. Click the OK button.
Creating the workflow
• Step 1 – In the Workflow Designer:
1. Select the Workflows menu.
2. Select the Create option.
• Step 2 – In the Create Workflow window:

1. Enter the workflow name - Workflow_Sales_Summary
2. Click the OK button (leave the other options as default).
-> When you create a workflow, it does not contain any tasks, so to execute a task in the workflow you have to add the task to it.
• Step 1 – To add the session task that we created in the Task Developer to the Workflow Designer:
1. In the navigator tree, expand the Tasks folder.
2. Drag and drop the session task into the Workflow Designer.
• Then link the task into the workflow and start the workflow.
-> Check the Workflow Monitor to verify whether the run succeeded.

Sample Output:

CUSTOMER_ID_NAME       TOTAL_SALES   AVG_SALES
11140-Becky Martin     10540         1506
14635-Grant Thornton   8167          4084
20290-Sean Braxton     5580          2790
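
A hedged SQL sketch of the aggregation this task describes (illustrative only; in the mapping this would typically be an Aggregator, a Filter, and a Sorter transformation):

-- Illustrative sketch of the Sales_Summary result.
SELECT Customer_ID_Name,
       SUM(Sales) AS Total_Sales,    -- total sales per customer
       AVG(Sales) AS Avg_Sales       -- average sales per customer
FROM   Super_Store_Cleaned_Data
GROUP  BY Customer_ID_Name
HAVING SUM(Sales) > 3000             -- significant contributors only
   AND AVG(Sales) > 300
ORDER  BY Total_Sales DESC;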

Task 2: Customer Order Analysis

• Mapping Name: Map_Order_Analysis

• Workflow Name: Workflow_Order_Analysis

• Session Name: Session_Order_Analysis

• Target Table: Order_Analysis

Problem Statement: Analyze customer orders to determine the most frequent buyers
and their order patterns.

Operations:

• Filter records where Category is 'Office Supplies' and City is 'San Francisco' to analyze local customer behavior.

• Create a new column Orders_Count and calculate the count of orders for each customer to determine their order frequency.

• Sort the results by order count in descending order to identify the most frequent buyers, and keep only the top 10 records.

• Drop the unnecessary columns; kindly check the sample output.

• Load the data into the Order_Analysis target table (for the columns, check the sample output; a hedged SQL sketch of this task appears after the sample output below).

• After completing the mapping, go to the Workflow Manager.

• We are going to connect to the repository "Repo_etlclass", so double-click on the folder to connect.

• Enter the username and password, then click the Connect button.

Username : Administrator

Password : Administrator

• Create the session Session_Order_Analysis.

• Step 1 – Double-click on the session object in the Workflow Manager. It will open a task window where you can modify the task properties.

• Step 2 – In the edit task window:

1. Select the Mapping tab.
2. Select the Connections property.
3. Assign the connection created in the earlier steps to the source and the target. (In the target properties, set the load type to Normal and tick the Insert checkbox.)
4. Click the OK button.
Creating the workflow
• Step 1 – In the Workflow Designer:
1. Select the Workflows menu.
2. Select the Create option.
• Step 2 – In the Create Workflow window:

1. Enter the workflow name - Workflow_Order_Analysis
2. Click the OK button (leave the other options as default).
-> When you create a workflow, it does not contain any tasks, so to execute a task in the workflow you have to add the task to it.
• Step 1 – To add the session task that we created in the Task Developer to the Workflow Designer:
1. In the navigator tree, expand the Tasks folder.
2. Drag and drop the session task into the Workflow Designer.
• Then link the task into the workflow and start the workflow.
-> Check the Workflow Monitor to verify whether the run succeeded.
Sample output:

CUSTOMER_ID_NAME      ORDERS_COUNT
15520-Jeremy Pistek   4
16510-Keith Herrera   4
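
A hedged SQL sketch of this task (illustrative only; it counts rows per customer, since the table structure has no separate Order_ID column, and uses a ROWNUM subquery to keep the top 10):

-- Illustrative sketch of the Order_Analysis result (top 10 frequent buyers).
SELECT *
FROM  (SELECT Customer_ID_Name,
              COUNT(*) AS Orders_Count           -- order frequency per customer
       FROM   Super_Store_Cleaned_Data
       WHERE  Category = 'Office Supplies'
         AND  City     = 'San Francisco'
       GROUP  BY Customer_ID_Name
       ORDER  BY Orders_Count DESC)
WHERE ROWNUM <= 10;                               -- keep only the top 10 records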

Task 3: Customer Geography Analysis

• Mapping Name: Map_Geography_Analysis

• Workflow Name: Workflow_Geography_Analysis

• Session Name: Session_Geography_Analysis

• Target Table: Geography_Analysis

Problem Statement: Analyze customer distribution across different regions to identify potential market segments.

Operations:

• Filter records for customers in the state 'California' to focus on a specific geographical area.

• Group customers by region (North, South, East, West) based on their location data.

• Calculate the count of customers in each region to understand the geographical distribution.

• Drop the unnecessary columns; kindly check the sample output.

• Load the data into the Geography_Analysis target table (for the columns, check the sample output; a hedged SQL sketch of this task appears after the sample output below).

• After completing the mapping, go to the Workflow Manager.

• We are going to connect to the repository "Repo_etlclass", so double-click on the folder to connect.

• Enter the username and password, then click the Connect button.

Username : Administrator

Password : Administrator

• Create the session Session_Geography_Analysis.

• Step 1 – Double-click on the session object in the Workflow Manager. It will open a task window where you can modify the task properties.

• Step 2 – In the edit task window:

1. Select the Mapping tab.
2. Select the Connections property.
3. Assign the connection created in the earlier steps to the source and the target. (In the target properties, set the load type to Normal and tick the Insert checkbox.)
4. Click the OK button.
Creating the workflow
• Step 1 – In the Workflow Designer:
1. Select the Workflows menu.
2. Select the Create option.
• Step 2 – In the Create Workflow window:

1. Enter the workflow name - Workflow_Geography_Analysis
2. Click the OK button (leave the other options as default).
-> When you create a workflow, it does not contain any tasks, so to execute a task in the workflow you have to add the task to it.
• Step 1 – To add the session task that we created in the Task Developer to the Workflow Designer:
1. In the navigator tree, expand the Tasks folder.
2. Drag and drop the session task into the Workflow Designer.
• Then link the task into the workflow and start the workflow.
-> Check the Workflow Monitor to verify whether the run succeeded.
Sample output:

CUSTOMER_ID_NAME       REGION   STATE        REGION_ORDERS_COUNT
14095-Eudokia Martin   West     California   182
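
A hedged SQL sketch of what the sample output suggests (illustrative only; attaching the region-level count to each customer row via an analytic count is an assumption, based on REGION_ORDERS_COUNT appearing alongside CUSTOMER_ID_NAME in the sample):

-- Illustrative sketch of the Geography_Analysis result.
SELECT Customer_ID_Name,
       Region,
       State,
       COUNT(*) OVER (PARTITION BY Region) AS Region_Orders_Count  -- rows per region
FROM   Super_Store_Cleaned_Data
WHERE  State = 'California';                                        -- California only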

Task 4: Order Processing Time Analysis

• Mapping Name: Map_Order_Processing

• Workflow Name: Workflow_Order_Processing

• Session Name: Session_Order_Processing

• Target Table: Order_Processing

Problem Statement: Evaluate order processing efficiency by analyzing the time taken between order placement and shipment.

Operations:

• Calculate the processing days for each order by subtracting the order date from the ship date, and store the result in a new column, Processing_Days.

• Categorize the processing days (e.g., less than 1 day: Immediate delivery; 1 to 3 days: Moderate delivery; 3 or more days: Long term delivery).

• Count the number of orders falling within each processing-days category to analyze the distribution of processing times.

• Drop the unnecessary columns; kindly check the sample output.

• Load the data into the Order_Processing target table (for the columns, check the sample output; a hedged SQL sketch of this task appears after the sample output below).

• After completing the mapping, go to the Workflow Manager.

• We are going to connect to the repository "Repo_etlclass", so double-click on the folder to connect.

• Enter the username and password, then click the Connect button.

Username : Administrator

Password : Administrator

• Create the session Session_Order_Processing.

• Step 1 – Double-click on the session object in the Workflow Manager. It will open a task window where you can modify the task properties.

• Step 2 – In the edit task window:

1. Select the Mapping tab.
2. Select the Connections property.
3. Assign the connection created in the earlier steps to the source and the target. (In the target properties, set the load type to Normal and tick the Insert checkbox.)
4. Click the OK button.
Creating the workflow
• Step 1 – In the Workflow Designer:
1. Select the Workflows menu.
2. Select the Create option.
• Step 2 – In the Create Workflow window:

1. Enter the workflow name - Workflow_Order_Processing
2. Click the OK button (leave the other options as default).
-> When you create a workflow, it does not contain any tasks, so to execute a task in the workflow you have to add the task to it.
• Step 1 – To add the session task that we created in the Task Developer to the Workflow Designer:
1. In the navigator tree, expand the Tasks folder.
2. Drag and drop the session task into the Workflow Designer.
• Then link the task into the workflow and start the workflow.
-> Check the Workflow Monitor to verify whether the run succeeded.
Sample Output:

CATEGORISE_PROCESSING_DAYS   ORDERS_COUNT
Immediate delivery           32
Long Term delivery           641
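
A hedged SQL sketch of this task (illustrative only; subtracting two DATE values in Oracle yields the number of days, and the handling of exactly 3 days is an assumption, since the categories above overlap at that boundary):

-- Illustrative sketch of the Order_Processing result.
SELECT Categorise_Processing_Days,
       COUNT(*) AS Orders_Count
FROM  (SELECT CASE
                WHEN Ship_Date - Order_Date < 1  THEN 'Immediate delivery'
                WHEN Ship_Date - Order_Date <= 3 THEN 'Moderate delivery'
                ELSE 'Long term delivery'
              END AS Categorise_Processing_Days   -- Processing_Days bucketed
       FROM   Super_Store_Cleaned_Data)
GROUP BY Categorise_Processing_Days;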

After completing the challenge, double-click on the SAMPLE_TEST.EXE file to get the sample score.

Click on the Submit button to validate your solution.
