SAS Assignment For Practice
1. Import the Sales file and map the brands using the brand mapping given in
the Mapping tab of the attached Excel file:
a. using SAS IMPORT WIZARD
b. using PROC IMPORT
c. using CUSTOM CODE with below format:
Variable      | Data Type | Length | Format
Brand         | Character | 50     |
Item          | Numeric   | 50     |
Store         | Numeric   | 10     |
Month         | Character | 5      |
Monthly_QTY   | Numeric   | 8      | Comma Format
Monthly_SALES | Numeric   | 8      | Dollar Format
Raw File:
Sales.zipx
2.
Brand Desc | Item | # of Stores | Sales Volume: Average QTY | Sales Volume: CLM QTY | Sales Value: Average SALES | Sales Value: CLM SALES
XXX        | XXX  | XXX         | XXX                       | XXX:XXX               | XXX                        | XXX:XXX
b.
Brand Desc | Month | Average QTY | Max:Min Qty | Average SALES | Max:Min Sales
XXX        | XXX   | XXX         | XXX:XXX     | XXX           | XXX:XXX
c. Based on Monthly QTY, prepare a summary table of the top 10 items for
each Brand, provided that all of the top 10 items are sold in an equal
number of stores.
56,065.00
ID2 | XXXX2 | Add 2 | +91-9728737466 | 34,013.00
ID3 | XXXX3 | Add 3 | +91-8285405233 | 40,138.00
4. You have the data set shown below as Data Set A. Prepare the new data sets
Data Set B and Data Set C, which are also shown below:
Data Set A: Raw Dataset
ID  | Month | Unit
101 | Jan   | 87.89
101 | Feb   | 20.95
101 | Mar   | 24.14
101 | Apr   | 29.13
102 | Jan   | 23.70513
102 | Feb   | 27.56596
102 | Mar   | 45.22387
103 | Jan   | 98.96
103 | Feb   | 53.11
103 | Mar   | 17.17
103 | Apr   | 21.52
103 | May   | 63.26
103 | Jun   | 24.32
103 | Jul   | 42.93
Data Set B: Missing months are filled with the last value of the available months.
ID  | Month | Unit  | ID  | Month | Unit  | ID  | Month | Unit
101 | Jan   | 87.89 | 102 | Jan   | 23.70 | 103 | Jan   | 98.96
101 | Feb   | 20.95 | 102 | Feb   | 27.56 | 103 | Feb   | 53.11
101 | Mar   | 24.14 | 102 | Mar   | 45.22 | 103 | Mar   | 17.17
101 | Apr   | 29.13 | 102 | Apr   | 45.22 | 103 | Apr   | 21.52
101 | May   | 29.13 | 102 | May   | 45.22 | 103 | May   | 63.26
101 | Jun   | 29.13 | 102 | Jun   | 45.22 | 103 | Jun   | 24.32
101 | Jul   | 29.13 | 102 | Jul   | 45.22 | 103 | Jul   | 42.93
101 | Aug   | 29.13 | 102 | Aug   | 45.22 | 103 | Aug   | 42.93
101 | Sep   | 29.13 | 102 | Sep   | 45.22 | 103 | Sep   | 42.93
101 | Oct   | 29.13 | 102 | Oct   | 45.22 | 103 | Oct   | 42.93
101 | Nov   | 29.13 | 102 | Nov   | 45.22 | 103 | Nov   | 42.93
101 | Dec   | 29.13 | 102 | Dec   | 45.22 | 103 | Dec   | 42.93
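One possible way to build Data Set B, and the mean-filled variant that Data Set C (described next) asks for, is sketched here. It is only an illustration: the helper column Month_No, the intermediate data sets ids and skeleton, and the work variable Last_Unit are assumptions introduced for this sketch and are not part of the assignment. The Data Set A values for ID 102 are read as 23.70513, 27.56596 and 45.22387.

```sas
/* Data Set A; Month_No is a helper column (assumption) used for ordering */
data A;
   input ID Month $ Unit;
   Month_No = month(input(cats('01', Month, '2000'), date9.));
   datalines;
101 Jan 87.89
101 Feb 20.95
101 Mar 24.14
101 Apr 29.13
102 Jan 23.70513
102 Feb 27.56596
102 Mar 45.22387
103 Jan 98.96
103 Feb 53.11
103 Mar 17.17
103 Apr 21.52
103 May 63.26
103 Jun 24.32
103 Jul 42.93
;
run;

/* Skeleton with one row per ID and calendar month */
proc sql;
   create table ids as
   select distinct ID from A order by ID;
quit;

data skeleton;
   set ids;
   do Month_No = 1 to 12;
      output;
   end;
run;

proc sort data=A;
   by ID Month_No;
run;

/* Data Set B: carry the last available value forward within each ID */
data B;
   merge skeleton A;
   by ID Month_No;
   retain Last_Unit;
   if first.ID then Last_Unit = .;
   if missing(Unit) then Unit = Last_Unit;
   else Last_Unit = Unit;
   Month = put(mdy(Month_No, 1, 2000), monname3.);
   drop Last_Unit;
run;

/* Data Set C: fill missing months with the mean of the available months */
proc sql;
   create table C as
   select s.ID,
          s.Month_No,
          put(mdy(s.Month_No, 1, 2000), monname3.) as Month,
          coalesce(x.Unit, m.Avg_Unit) as Unit
   from skeleton as s
   left join A as x
      on s.ID = x.ID and s.Month_No = x.Month_No
   left join (select ID, mean(Unit) as Avg_Unit from A group by ID) as m
      on s.ID = m.ID
   order by s.ID, s.Month_No;
quit;
```

Building the 12-month skeleton first keeps the two fill rules separate from the question of which months exist, so the same skeleton serves both outputs.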
Data Set C: Missing months are filled with the average of the available months.
ID  | Month | Unit  | ID  | Month | Unit  | ID  | Month | Unit
101 | Jan   | 87.89 | 102 | Jan   | 23.70 | 103 | Jan   | 98.96
101 | Feb   | 20.95 | 102 | Feb   | 27.56 | 103 | Feb   | 53.11
101 | Mar   | 24.14 | 102 | Mar   | 45.22 | 103 | Mar   | 17.17
101 | Apr   | 29.13 | 102 | Apr   | 32.16 | 103 | Apr   | 21.52
101 | May   | 40.53 | 102 | May   | 32.16 | 103 | May   | 63.26
101 | Jun   | 40.53 | 102 | Jun   | 32.16 | 103 | Jun   | 24.32
101 | Jul   | 40.53 | 102 | Jul   | 32.16 | 103 | Jul   | 42.93
101 | Aug   | 40.53 | 102 | Aug   | 32.16 | 103 | Aug   | 45.90
101 | Sep   | 40.53 | 102 | Sep   | 32.16 | 103 | Sep   | 45.90
101 | Oct   | 40.53 | 102 | Oct   | 32.16 | 103 | Oct   | 45.90
101 | Nov   | 40.53 | 102 | Nov   | 32.16 | 103 | Nov   | 45.90
101 | Dec   | 40.53 | 102 | Dec   | 32.16 | 103 | Dec   | 45.90
5. The Dummy_2 data set is Item x Date level data, but the items do not all
have the same start and end dates. The time series is also not continuous for
every item, i.e. WKLY QTY and WKLY SALES are missing for some dates between
the Start Date and End Date.
Prepare the data set as a continuous time series for each Item. The missing
WKLY QTY and WKLY SALES will be replaced by:
a. Zeros
b. Average of the available time series for each item
c. Average of the top 5 WKLY QTY and WKLY SALES for each item
d. Save all three data sets above in different sheets of an Excel file using
SAS.
Dummy2.zipx
6. The attached Dummy_3 data set contains Item x Store level monthly SALES data.
a. Prepare a summary table showing the distribution of sales among the
stores.
b. Prepare a data set with duplicate observations according to the following
criteria:
i. If the Total Sales of an Item is LESS than 15% of the Total Sales for a
particular Store, the Item x Store combination is repeated 3 times.
ii. If the Total Sales of an Item is MORE than 15% but LESS than 50% of the
Total Sales for a particular Store, the Item x Store combination is
repeated 5 times.
iii. If the Total Sales of an Item is MORE than 50% of the Total Sales for a
particular Store, the Item x Store combination is repeated 10 times.
c. Prepare a summary table for the above data set in the format shown
below:
Item | Store | Total SALES (Item) | Total SALES (Store) | %Sales Item by Store | # of Repetition
XXX  | XXX   | XXX                | XXX                 | XXX                  | XXX
XXX  | XXX   | XXX                | XXX                 | XXX                  | XXX
Dummy3.zipx
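The repetition rule in part b can be sketched as follows. This is a hedged illustration, not the expected answer: the data set dummy3 and its values are a made-up stand-in for Dummy_3, the column names Item_Sales, Store_Sales, Pct_Sales and Repetition are assumptions, "Total Sales of an Item" is read as the item's sales within that store, and shares of exactly 15% or 50% (which the question leaves open) are assigned to the higher band.

```sas
/* Hypothetical stand-in for Dummy_3: Item x Store monthly sales */
data dummy3;
   input Item Store Month $ Sales;
   datalines;
1 10 Jan 10
1 10 Feb 10
2 10 Jan 30
2 10 Feb 50
3 10 Jan 120
;
run;

proc sql;
   /* Total sales per Item x Store */
   create table item_store as
   select Item, Store, sum(Sales) as Item_Sales
   from dummy3
   group by Item, Store;

   /* Attach the store total and the item's share of it */
   create table shares as
   select a.Item, a.Store, a.Item_Sales, b.Store_Sales,
          a.Item_Sales / b.Store_Sales as Pct_Sales format=percent8.1
   from item_store as a
   inner join (select Store, sum(Sales) as Store_Sales
               from dummy3
               group by Store) as b
      on a.Store = b.Store;
quit;

/* Write each Item x Store row 3, 5 or 10 times depending on its share */
data repeated;
   set shares;
   if Pct_Sales < 0.15 then Repetition = 3;
   else if Pct_Sales < 0.50 then Repetition = 5;
   else Repetition = 10;
   do i = 1 to Repetition;
      output;
   end;
   drop i;
run;
```

With the sample rows above, the store total is 220, so item 1 (20) falls below 15%, item 2 (80) falls between 15% and 50%, and item 3 (120) exceeds 50%.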
7. Create a cumulative series for each of the variables in the attached raw file:
a. using the RETAIN statement
b. without using the RETAIN statement
Dummy4.xlsx
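A minimal sketch of both approaches is given here for a single variable. The data set have, its values, and the name Cum_TV_GRP are hypothetical, and restarting the running total for each Region is an assumption, since the question does not say whether the series should be cumulated within groups.

```sas
/* Hypothetical sample: one numeric variable by Region and week */
data have;
   input Region WK_END_DT :date9. TV_GRP;
   format WK_END_DT date9.;
   datalines;
101 06JUL2013 10
101 13JUL2013 15
102 06JUL2013 30
102 13JUL2013 25
;
run;

proc sort data=have;
   by Region WK_END_DT;
run;

/* a. Running total with an explicit RETAIN statement */
data cum_retain;
   set have;
   by Region;
   retain Cum_TV_GRP;
   if first.Region then Cum_TV_GRP = 0;
   Cum_TV_GRP = sum(Cum_TV_GRP, TV_GRP);
run;

/* b. Running total with a sum statement, which retains its
      accumulator implicitly and ignores missing values */
data cum_sum;
   set have;
   by Region;
   if first.Region then Cum_TV_GRP = 0;
   Cum_TV_GRP + TV_GRP;
run;
```

To cover every numeric variable in the file, the same logic can be wrapped in a pair of arrays, one for the inputs and one for the cumulative outputs.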
Variable Name
WK_END_DT
Region
TV_REG_GRP__Variable_100401
Variable_TV_REG_DOL_000000
DSP_NAT_IMPRESSIONS_200600
DSP_NATIONAL_CICKS_400402
WK_END_DT
Region
TV_REG_GRP__Variable
Variable_TV_REG_DOL
DSP_NAT_IMPRESSIONS
DSP_NATIONAL_CICKS_400402
10. TV GRP data is provided at region, week level. Create national level TV data.
Population data is provided for all regions.
TV GRP data
Region | Week        | TV_REG_GRP
101    | 01-Jul-2013 | 10
101    | 08-Jul-2013 | 15
102    | 01-Jul-2013 | 30
102    | 08-Jul-2013 | 25
103    | 01-Jul-2013 | 8
103    | 08-Jul-2013 | 10
Population
Region | Population
101    | 10500
102    | 22300
103    | 12800
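One common way to roll regional GRPs up to a national figure is a population-weighted average per week. The question does not name the method, so the weighting below is an assumption, as are the data set names tv_grp, population and tv_national and the output column TV_NAT_GRP.

```sas
data tv_grp;
   input Region Week :date11. TV_REG_GRP;
   format Week date11.;
   datalines;
101 01-Jul-2013 10
101 08-Jul-2013 15
102 01-Jul-2013 30
102 08-Jul-2013 25
103 01-Jul-2013 8
103 08-Jul-2013 10
;
run;

data population;
   input Region Population;
   datalines;
101 10500
102 22300
103 12800
;
run;

/* National GRP = sum(regional GRP x population) / sum(population), by week */
proc sql;
   create table tv_national as
   select g.Week,
          sum(g.TV_REG_GRP * p.Population) / sum(p.Population) as TV_NAT_GRP
   from tv_grp as g
   inner join population as p
      on g.Region = p.Region
   group by g.Week
   order by g.Week;
quit;
```

Under this weighting, the week of 01-Jul-2013 works out to 876400 / 45600, about 19.22, and the week of 08-Jul-2013 to 843000 / 45600, about 18.49.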
11. Import the following daily sales data for two products (Shoes and Bags) in
different stores. Sales data is provided at State code, Store number level (NY
12 means the 12th store in New York).
Sales data
Store_ID | Date        | Sales_Shoes | Sales_Bags
NY 10    | 01-Jul-2013 | 10          | 20
NY 10    | 03-Jul-2013 | 10          |
NY 10    | 08-Jul-2013 | 5           | 8
NY 203   | 02-Jul-2013 | 8           | 9
NY 203   | 03-Jul-2013 | 10          |
NY 203   | 10-Jul-2013 | 15          | 20
NJ 20    | 01-Jul-2013 | 3           | 7
NJ 20    | 03-Jul-2013 | 4           | 1
NJ 20    | 08-Jul-2013 | 5           | 8
NJ 123   | 02-Jul-2013 | 6           | 5
NJ 123   | 03-Jul-2013 | 4           |
NJ 123   | 10-Jul-2013 | 3           |
a. Create a dataset containing the distinct list of all stores (the resulting
dataset will have only one variable).
b. Create a dataset with the total sales of both products at state, week
level (week ending Saturday), e.g. Total Sales in NY during the week of
30th June to 6th July. The new dataset will have 3 columns: State,
Week_end_date, and Total_Sales.
c. Merge with a predefined list of stores and report the store IDs for
which no sales information is provided.
d.
Store_ID
NY 10
NJ 10
NJ 123
NY 203
NJ 20
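Parts a to c can be sketched as below. This is an illustration under stated assumptions: only the rows with both sales values present are keyed in, comma-delimited input is used so that the blank inside Store_ID survives, the Store_ID list above is taken to be the predefined list for part c, and the data set names sales, stores, state_week, store_list and no_sales are made up for the sketch.

```sas
/* Sample of the sales rows; DSD keeps the blank inside Store_ID */
data sales;
   infile datalines dsd truncover;
   input Store_ID :$8. Date :date11. Sales_Shoes Sales_Bags;
   format Date date11.;
   datalines;
NY 10,01-Jul-2013,10,20
NY 10,08-Jul-2013,5,8
NY 203,02-Jul-2013,8,9
NY 203,10-Jul-2013,15,20
NJ 20,01-Jul-2013,3,7
NJ 20,03-Jul-2013,4,1
NJ 20,08-Jul-2013,5,8
NJ 123,02-Jul-2013,6,5
;
run;

/* Assumed predefined store list for part c */
data store_list;
   infile datalines dsd truncover;
   input Store_ID :$8.;
   datalines;
NY 10
NJ 10
NJ 123
NY 203
NJ 20
;
run;

proc sql;
   /* a. Distinct list of stores */
   create table stores as
   select distinct Store_ID
   from sales;

   /* b. State x week totals; INTNX with 'end' returns the Saturday
         that closes the default Sunday-to-Saturday week */
   create table state_week as
   select scan(Store_ID, 1, ' ') as State length=2,
          intnx('week', Date, 0, 'end') as Week_end_date format=date11.,
          sum(sum(Sales_Shoes, Sales_Bags)) as Total_Sales
   from sales
   group by calculated State, calculated Week_end_date;

   /* c. Stores in the predefined list with no sales rows */
   create table no_sales as
   select l.Store_ID
   from store_list as l
   left join stores as s
      on l.Store_ID = s.Store_ID
   where s.Store_ID is missing;
quit;
```

The inner SUM with two arguments adds the two products row by row and treats a missing value as zero, while the outer SUM aggregates across rows in each State and week.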
12. A dataset contains both numeric and character variables. Write code that:
a. attaches the text "End" to all the character variables.
b. adds 10 to all the numeric values.
c. converts negative values to positive values, leaving positive, zero, and
missing values unchanged.
13. Show the output when a dataset is sorted using NODUPKEY and
NODUPRECS.
Explain the difference between the two outputs.
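The difference can be seen on a small example. The data set have and its values are hypothetical and chosen only to make the two options diverge.

```sas
/* Hypothetical sample: ID 1 has one exact duplicate row and one
   row that shares the key but differs in Value */
data have;
   input ID Value;
   datalines;
1 10
1 10
1 20
2 30
;
run;

/* NODUPKEY keeps the first row for each BY value:
   (1,10) and (2,30) remain */
proc sort data=have out=by_key nodupkey;
   by ID;
run;

/* NODUPRECS drops a row only when every variable matches the
   previous row: (1,10), (1,20) and (2,30) remain */
proc sort data=have out=by_rec noduprecs;
   by ID;
run;
```

NODUPRECS compares each row only with the row written just before it, so identical rows that are not adjacent after sorting can survive; sorting by all variables avoids that.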
SOLUTION Q12:
Variable1 = trim(Variable1) || 'End';
New_Number = abs(Negative_Value); OR New_Number = Negative_Value * -1; (the second form applies only when the value is negative)
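To apply these operations to every variable without naming each one, the _CHARACTER_ and _NUMERIC_ variable lists can be loaded into arrays. The sketch below is one possible approach; the data set have, its columns and the output data set names are hypothetical.

```sas
/* Hypothetical sample with one character and two numeric variables */
data have;
   input Name $ Score Delta;
   datalines;
Ann 5 -3
Bob . 0
;
run;

/* a. and b.: append 'End' to character values, add 10 to numeric values */
data suffix_plus10;
   length Name $ 12;            /* widen so the suffix is not truncated */
   set have;
   array chars{*} _character_;
   array nums{*} _numeric_;     /* declared before i exists, so i is excluded */
   do i = 1 to dim(chars);
      chars{i} = cats(chars{i}, 'End');
   end;
   do i = 1 to dim(nums);
      if not missing(nums{i}) then nums{i} = nums{i} + 10;
   end;
   drop i;
run;

/* c.: flip negatives only; positive, zero and missing stay unchanged */
data abs_values;
   set have;
   array nums{*} _numeric_;
   do i = 1 to dim(nums);
      if not missing(nums{i}) and nums{i} < 0 then nums{i} = abs(nums{i});
   end;
   drop i;
run;
```

The explicit missing check matters because SAS treats a missing numeric value as smaller than any number, so a bare `< 0` test would also select missing values.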
-------------------- All the best --------------------