SAS Assignment For Practice

This assignment involves performing various data analysis tasks on multiple datasets including: 1. Importing and summarizing sales data, creating summary tables by brand, item, and month. 2. Splitting datasets into equal parts by salary and randomly. 3. Filling in missing data and creating cumulative series. 4. Creating analytic plan variables and standardizing variable names. 5. Aggregating TV viewership data to the national level using population weights. 6. Summarizing and transforming daily sales data by store, state, and week. 7. Explaining the difference between sorting with Nodupkey vs Noduprecs.

Uploaded by Amit Anand

ASSIGNMENT 1

1. Import the Sales file and map brands using the brand mapping given in the
Mapping tab of the attached Excel file:
a. using the SAS Import Wizard
b. using PROC IMPORT
c. using custom code with the format below:

Variable      | Data Type | Length | Format
Brand         | Character | 50     |
Item          | Numeric   | 50     |
Store         | Numeric   | 10     |
Month         | Character | 5      |
Monthly_QTY   | Numeric   | 8      | Comma Format
Monthly_SALES | Numeric   | 8      | Dollar Format

Raw File: Sales.zipx

2. Prepare the summary tables in the formats given below:

a.
Brand Desc | Item | # of Stores | Average QTY | Sales Volume | CLM QTY | Average SALES | Sales Value | CLM SALES
XXX | XXX | XXX | XXX | XXX (Sum of Monthly Qty) | XXX:XXX | XXX | XXX (Sum of Monthly Sales) | XXX:XXX

b.
Brand Desc | Month | Average QTY | Max:Min Qty | Average SALES | Max:Min Sales
XXX | XXX | XXX | XXX:XXX | XXX | XXX:XXX

c. Based on Monthly QTY, prepare a summary table of the top 10 items for each
Brand, provided that all of the top 10 items are sold in an equal number of stores.

3. How will you split a dataset into four equal parts such that:
a. Observations are picked on the basis of descending salary? For example,
people with the top 25% of salaries should be output to the first dataset,
the top 26-50% to the second dataset, and so on.
b. Records are picked randomly from the input dataset?
Example: create a table with some sample data in the format below to perform
the above exercise.

EmpNo | Name  | Address | Mobile         | Salary
ID1   | XXXX1 | Add 1   | +91-7892123456 | 56,065.00
ID2   | XXXX2 | Add 2   | +91-9728737466 | 34,013.00
ID3   | XXXX3 | Add 3   | +91-8285405233 | 40,138.00
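
One possible approach to question 3 is sketched below, under stated assumptions: the dataset name `emp`, the macro `%split4`, the output prefixes, and the extra row ID4 are all hypothetical and not part of the assignment. The idea is to put the rows in the desired order first (descending salary, or a random order) and then cut the ordered rows into four consecutive blocks.

```sas
/* Hypothetical sample rows; ID4 is invented so that each part gets one row */
data emp;
    infile datalines dlm=',' truncover;
    length EmpNo $5 Name $10;
    input EmpNo $ Name $ Salary;
    format Salary comma12.2;
    datalines;
ID1,XXXX1,56065
ID2,XXXX2,34013
ID3,XXXX3,40138
ID4,XXXX4,61200
;
run;

/* Hypothetical helper: split a dataset into four parts in its current order */
%macro split4(in=, prefix=);
data &prefix.1 &prefix.2 &prefix.3 &prefix.4;
    set &in nobs=n;
    /* rows 1..n map to groups 1..4 of (nearly) equal size */
    select (ceil(_n_ * 4 / n));
        when (1) output &prefix.1;
        when (2) output &prefix.2;
        when (3) output &prefix.3;
        otherwise output &prefix.4;
    end;
run;
%mend split4;

/* a. highest salaries first, then split */
proc sort data=emp out=emp_by_salary;
    by descending Salary;
run;
%split4(in=emp_by_salary, prefix=sal_part)

/* b. shuffle with a random key, then split */
data emp_shuffled;
    set emp;
    call streaminit(2013);      /* fixed seed so the shuffle is repeatable */
    rand_key = rand('uniform');
run;
proc sort data=emp_shuffled;
    by rand_key;
run;
%split4(in=emp_shuffled, prefix=rnd_part)
```

When the row count is not a multiple of four, the parts differ in size by at most one row. PROC RANK with GROUPS=4 is a common alternative for part a, but tied salaries can make its groups unequal.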

4. You have the data set shown in the example below as Data Set A. Prepare the
new data sets Data Set B and Data Set C, also shown below:

Data Set A: Raw Dataset

ID  | Month | Unit  | ID  | Month | Unit     | ID  | Month | Unit
101 | Jan   | 87.89 | 102 | Jan   | 23.70513 | 103 | Jan   | 98.96
101 | Feb   | 20.95 | 102 | Feb   | 27.56596 | 103 | Feb   | 53.11
101 | Mar   | 24.14 | 102 | Mar   | 45.22387 | 103 | Mar   | 17.17
101 | Apr   | 29.13 |     |       |          | 103 | Apr   | 21.52
    |       |       |     |       |          | 103 | May   | 63.26
    |       |       |     |       |          | 103 | Jun   | 24.32
    |       |       |     |       |          | 103 | Jul   | 42.93

Data Set B: Missing months are filled with the last value of the available months.

ID  | Month | Unit  | ID  | Month | Unit  | ID  | Month | Unit
101 | Jan   | 87.89 | 102 | Jan   | 23.70 | 103 | Jan   | 98.96
101 | Feb   | 20.95 | 102 | Feb   | 27.56 | 103 | Feb   | 53.11
101 | Mar   | 24.14 | 102 | Mar   | 45.22 | 103 | Mar   | 17.17
101 | Apr   | 29.13 | 102 | Apr   | 45.22 | 103 | Apr   | 21.52
101 | May   | 29.13 | 102 | May   | 45.22 | 103 | May   | 63.26
101 | Jun   | 29.13 | 102 | Jun   | 45.22 | 103 | Jun   | 24.32
101 | Jul   | 29.13 | 102 | Jul   | 45.22 | 103 | Jul   | 42.93
101 | Aug   | 29.13 | 102 | Aug   | 45.22 | 103 | Aug   | 42.93
101 | Sep   | 29.13 | 102 | Sep   | 45.22 | 103 | Sep   | 42.93
101 | Oct   | 29.13 | 102 | Oct   | 45.22 | 103 | Oct   | 42.93
101 | Nov   | 29.13 | 102 | Nov   | 45.22 | 103 | Nov   | 42.93
101 | Dec   | 29.13 | 102 | Dec   | 45.22 | 103 | Dec   | 42.93
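
The fill rule for Data Set B can be sketched as follows. This is an illustration only: the dataset names `A` and `B`, the helper variables, and the reading of ID 102's values as 23.70513, 27.56596 and 45.22387 are assumptions. The step loads each ID's available months into a 12-slot buffer, then writes January to December while carrying the last available value forward.

```sas
/* Data Set A in long form (assumed already sorted by ID) */
data A;
    input ID Month $ Unit;
    datalines;
101 Jan 87.89
101 Feb 20.95
101 Mar 24.14
101 Apr 29.13
102 Jan 23.70513
102 Feb 27.56596
102 Mar 45.22387
103 Jan 98.96
103 Feb 53.11
103 Mar 17.17
103 Apr 21.52
103 May 63.26
103 Jun 24.32
103 Jul 42.93
;
run;

data B(keep=ID Month Unit);
    length Month $3;
    array val[12] _temporary_;
    do i = 1 to 12;                /* reset the buffer for each new ID */
        val[i] = .;
    end;
    do until (last.ID);            /* read every row of the current ID */
        set A(rename=(Month=mon_in Unit=unit_in));
        by ID;
        /* 'Jan' -> 1, 'Feb' -> 2, ... via a dummy date */
        val[month(input(cats('01', mon_in, '2000'), date9.))] = unit_in;
    end;
    do i = 1 to 12;                /* write Jan..Dec for this ID */
        if val[i] ne . then Unit = val[i];   /* otherwise keep the last value */
        Month = put(mdy(i, 1, 2000), monname3.);
        output;
    end;
run;
```

For Data Set C, the same buffer could be averaged over its non-missing slots and that mean written wherever a month is missing, instead of carrying the last value forward.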

Data Set C: Missing months are filled with the average of the available months.

ID  | Month | Unit  | ID  | Month | Unit  | ID  | Month | Unit
101 | Jan   | 87.89 | 102 | Jan   | 23.70 | 103 | Jan   | 98.96
101 | Feb   | 20.95 | 102 | Feb   | 27.56 | 103 | Feb   | 53.11
101 | Mar   | 24.14 | 102 | Mar   | 45.22 | 103 | Mar   | 17.17
101 | Apr   | 29.13 | 102 | Apr   | 32.16 | 103 | Apr   | 21.52
101 | May   | 40.53 | 102 | May   | 32.16 | 103 | May   | 63.26
101 | Jun   | 40.53 | 102 | Jun   | 32.16 | 103 | Jun   | 24.32
101 | Jul   | 40.53 | 102 | Jul   | 32.16 | 103 | Jul   | 42.93
101 | Aug   | 40.53 | 102 | Aug   | 32.16 | 103 | Aug   | 45.90
101 | Sep   | 40.53 | 102 | Sep   | 32.16 | 103 | Sep   | 45.90
101 | Oct   | 40.53 | 102 | Oct   | 32.16 | 103 | Oct   | 45.90
101 | Nov   | 40.53 | 102 | Nov   | 32.16 | 103 | Nov   | 45.90
101 | Dec   | 40.53 | 102 | Dec   | 32.16 | 103 | Dec   | 45.90

5. The Dummy_2 data set is Item x Date level data, but the items do not all
have the same start and end dates. The time series is also not continuous for
the items, i.e. between the Start Date and End Date, WKLY QTY and WKLY SALES
are missing for some dates.
Prepare the data set as a continuous time series for each Item. The missing
WKLY QTY and WKLY SALES values will be replaced by:
a. Zeros
b. Average of the available time series for each item
c. Average of the top 5 WKLY QTY and WKLY SALES for each item
d. Save all 3 of the above data sets in different sheets of an Excel file
using SAS.

Dummy2.zipx

6. The attached Dummy_3 data set is Item x Store level monthly SALES data.
a. Prepare a summary table showing the distribution of sales among the
stores.
b. Prepare a data set with duplicate observations as per the following criteria:
i. If the Total Sales of an Item is LESS than 15% of Total Sales for a
particular Store, then the Item x Store combination will be
repeated 3 times.
ii. If the Total Sales of an Item is MORE than 15% but LESS than 50%
of Total Sales for a particular Store, then the Item x Store
combination will be repeated 5 times.
iii. If the Total Sales of an Item is MORE than 50% of Total Sales for a
particular Store, then the Item x Store combination will be
repeated 10 times.
c. Prepare a summary table for the above data set in the format given below:

Item | Store | Total SALES (Item) | Total SALES (Store) | %Sales Item by Store | # of Repetition
XXX  | XXX   | XXX                | XXX                 | XXX                  | XXX
XXX  | XXX   | XXX                | XXX                 | XXX                  | XXX

Dummy3.zipx

7. Create a cumulative series for each of the variables in the attached raw file:
a. using the RETAIN statement
b. without using the RETAIN statement

Dummy4.xlsx
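
A minimal sketch of both variants for question 7 follows. The contents of Dummy4.xlsx are not shown here, so the dataset `dummy4` and the single numeric variable `x` are hypothetical stand-ins for the imported file.

```sas
/* Hypothetical stand-in for one numeric column of Dummy4.xlsx */
data dummy4;
    input x;
    datalines;
5
3
.
4
;
run;

/* a. with RETAIN: cum_x keeps its value from one row to the next */
data cum_retain;
    set dummy4;
    retain cum_x 0;
    cum_x = sum(cum_x, x);   /* SUM() treats a missing x as no change */
run;

/* b. without RETAIN: the sum statement retains cum_x implicitly,
      starts it at 0 and skips missing values */
data cum_sum;
    set dummy4;
    cum_x + x;
run;
```

Both steps give the running totals 5, 8, 8, 12 for this sample. To cover every variable in the file, the same logic could be looped over an array of the numeric variables.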

8. Use Dummy4.xlsx to create the APLs below.
(In the _AAPPPL format of an APL, AA stands for Ad-stock, PPP stands for Power
and L stands for Lag value.)
TV_REG_GRP_100401
TV_REG_DOL_000000
DSP_NAT_IMP_200600
DSP_NAT_CLK_400402
9. The input data set has the following values:

Variable Name APL
WK_END_DT
Region
TV_REG_GRP__Variable_100401
Variable_TV_REG_DOL_000000
DSP_NAT_IMPRESSIONS_200600
DSP_NATIONAL_CICKS_400402

Write code so that the output dataset has the following values:

Variable Name APL           | Variable Name
WK_END_DT                   | WK_END_DT
Region                      | Region
TV_REG_GRP__Variable_100401 | TV_REG_GRP__Variable
Variable_TV_REG_DOL_000000  | Variable_TV_REG_DOL
DSP_NAT_IMPRESSIONS_200600  | DSP_NAT_IMPRESSIONS
DSP_NATIONAL_CICKS_400402   | DSP_NATIONAL_CICKS_400402

10. TV GRP data is provided at region, week level. Create national level TV data.
Population data is provided for all regions.

TV GRP data
Region | Week        | TV_REG_GRP
101    | 01-Jul-2013 | 10
101    | 08-Jul-2013 | 15
102    | 01-Jul-2013 | 30
102    | 08-Jul-2013 | 25
103    | 01-Jul-2013 | 8
103    | 08-Jul-2013 | 10

Population
Region | Population
101    | 10500
102    | 22300
103    | 12800
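
One way to read question 10 is as a population-weighted average of the regional GRPs for each week. The sketch below assumes that interpretation; the dataset names `tv_grp`, `population`, `tv_nat` and the output variable `TV_NAT_GRP` are hypothetical.

```sas
data tv_grp;
    input Region Week :date11. TV_REG_GRP;
    format Week date11.;
    datalines;
101 01-Jul-2013 10
101 08-Jul-2013 15
102 01-Jul-2013 30
102 08-Jul-2013 25
103 01-Jul-2013 8
103 08-Jul-2013 10
;
run;

data population;
    input Region Population;
    datalines;
101 10500
102 22300
103 12800
;
run;

proc sql;
    /* national GRP = sum(GRP x population) / sum(population), per week */
    create table tv_nat as
    select g.Week,
           sum(g.TV_REG_GRP * p.Population) / sum(p.Population) as TV_NAT_GRP
    from tv_grp as g
         inner join population as p
         on g.Region = p.Region
    group by g.Week
    order by g.Week;
quit;
```

For the week of 01-Jul-2013 this works out to (10 x 10500 + 30 x 22300 + 8 x 12800) / 45600 = 876400 / 45600, about 19.22; for 08-Jul-2013 it is 843000 / 45600, about 18.49.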

11. Import the following daily sales data for two products (Shoes and Bags) in
different stores. Sales data is provided at State code, Store number level (NY
12 means the 12th store of New York).

Sales data
Store_ID | Date        | Sales_Shoes | Sales_Bags
NY 10    | 01-Jul-2013 | 10          | 20
NY 10    | 03-Jul-2013 | 10          |
NY 10    | 08-Jul-2013 | 5           | 8
NY 203   | 02-Jul-2013 | 8           | 9
NY 203   | 03-Jul-2013 | 10          |
NY 203   | 10-Jul-2013 | 15          | 20
NJ 20    | 01-Jul-2013 | 3           | 7
NJ 20    | 03-Jul-2013 | 4           | 1
NJ 20    | 08-Jul-2013 | 5           | 8
NJ 123   | 02-Jul-2013 | 6           | 5
NJ 123   | 03-Jul-2013 | 4           |
NJ 123   | 10-Jul-2013 | 3           |

a. Create a dataset with a distinct list of all the stores (the resulting
dataset will have only one variable).
b. Create a dataset with the total sales of both products at state, week
level (week ending Saturday), e.g. total sales in NY during the week
30th June to 6th July. The new dataset will have 3 columns: State,
Week_end_date, and Total_Sales.
c. Merge with the predefined list of stores below and report the store IDs for
which no sales information is provided.

Store_ID
NY 10
NJ 10
NJ 123
NY 203
NJ 20
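
The three parts of question 11 could be sketched as below. The dataset names are hypothetical, and where a row of the source lists only one sales value it is placed in Sales_Shoes as an assumption (this does not affect the totals in part b).

```sas
data sales;
    infile datalines dlm=',' dsd truncover;
    length Store_ID $10;
    input Store_ID $ Date :date11. Sales_Shoes Sales_Bags;
    format Date date11.;
    datalines;
NY 10,01-Jul-2013,10,20
NY 10,03-Jul-2013,10,
NY 10,08-Jul-2013,5,8
NY 203,02-Jul-2013,8,9
NY 203,03-Jul-2013,10,
NY 203,10-Jul-2013,15,20
NJ 20,01-Jul-2013,3,7
NJ 20,03-Jul-2013,4,1
NJ 20,08-Jul-2013,5,8
NJ 123,02-Jul-2013,6,5
NJ 123,03-Jul-2013,4,
NJ 123,10-Jul-2013,3,
;
run;

data store_list;                 /* the predefined list of stores */
    infile datalines dlm=',' truncover;
    length Store_ID $10;
    input Store_ID $;
    datalines;
NY 10
NJ 10
NJ 123
NY 203
NJ 20
;
run;

proc sql;
    /* a. distinct list of stores */
    create table stores as
    select distinct Store_ID
    from sales;

    /* b. state x week totals; SAS weeks run Sunday to Saturday by default,
          so the 'end' alignment returns the Saturday */
    create table state_week as
    select scan(Store_ID, 1, ' ') as State length=2,
           intnx('week', Date, 0, 'end') as Week_end_date format=date11.,
           sum(sum(Sales_Shoes, Sales_Bags)) as Total_Sales
    from sales
    group by calculated State, calculated Week_end_date;

    /* c. stores in the predefined list with no sales rows */
    create table no_sales as
    select l.Store_ID
    from store_list as l
    where l.Store_ID not in (select Store_ID from sales);
quit;
```

With the rows above, `no_sales` contains only NJ 10, since every other listed store appears in the sales data.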
12. A dataset contains both numeric and character variables. Write code that:
a. attaches the text End to all the character variables.
b. adds 10 to all the numeric values.
c. converts negative values to positive values, leaving positive, zero, and
missing values unchanged.
13. Provide the output when a dataset is sorted using NODUPKEY and using
NODUPRECS. Explain the difference between the two outputs.
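
A small illustration of question 13, using a hypothetical dataset `have` with invented values: NODUPKEY keeps the first observation for each combination of BY values, while NODUPRECS drops an observation only when it matches the previously written observation on every variable.

```sas
/* Hypothetical input: rows 1 and 2 are identical, row 3 shares only the key */
data have;
    input id value;
    datalines;
1 10
1 10
1 20
2 30
;
run;

/* NODUPKEY: one row per BY value -> (1,10), (2,30) */
proc sort data=have out=by_key nodupkey;
    by id;
run;

/* NODUPRECS: only fully identical adjacent rows go -> (1,10), (1,20), (2,30) */
proc sort data=have out=by_rec noduprecs;
    by id;
run;
```

Because NODUPRECS compares each row only with the one before it, identical rows that do not end up adjacent after the sort can survive; sorting with BY _ALL_ avoids that.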

SOLUTION Q12:
Variable1 = strip(Variable1) || 'End';
New_Number = abs(negative_value);
or
New_Number = negative_value * -1;
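
The statements above act on one variable at a time. A sketch of how all three parts of question 12 might be applied to every variable at once, using the `_character_` and `_numeric_` variable lists, is given below; the dataset and variable names are hypothetical.

```sas
/* Hypothetical input with two character and two numeric variables */
data have;
    length c1 $10 c2 $10;        /* leave room for the appended text */
    input c1 $ c2 $ n1 n2;
    datalines;
abc xyz -5 10
def uvw . 0
;
run;

/* a. append 'End' to every character variable */
data want_a;
    set have;
    array chars[*] _character_;
    do i = 1 to dim(chars);
        chars[i] = cats(chars[i], 'End');
    end;
    drop i;
run;

/* b. add 10 to every non-missing numeric value */
data want_b;
    set have;
    array nums[*] _numeric_;     /* declared before i exists, so i is excluded */
    do i = 1 to dim(nums);
        if not missing(nums[i]) then nums[i] = nums[i] + 10;
    end;
    drop i;
run;

/* c. negative values become positive; positive, 0 and missing are unchanged */
data want_c;
    set have;
    array nums[*] _numeric_;
    do i = 1 to dim(nums);
        nums[i] = abs(nums[i]);
    end;
    drop i;
run;
```

If a character variable is already filled to its declared length, the appended text is truncated, so the variable may need a longer LENGTH first.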
-------------------- All the best --------------------
