Pandas DataFrame Notes

The document provides information for a job application test for SessionM. It contains a self-scoring sheet where the applicant rates their experience with various technologies on a scale of 1 to 10. It also contains a problem set with 4 questions for CoolBrand, an online retailer. The questions involve forecasting future product profits, identifying trends and anomalies in historical profit data, comparing future profit estimates for products, and steps for deploying code to production and automating jobs.

Uploaded by

scribd_sandeep

Welcome!

Thank you for applying to SessionM! Please note the following:

● This document contains two parts, a self-scoring sheet and a problem set.
● We will not consider how much time it took you to prepare your submission.
● Our intention is that you spend no more than 2 hours on this test.
● We value partial responses and even pseudo-code on any problem in the problem set.
● Once you are ready, please submit your response to avazquezreina@sessionm.com (Dropbox
link, zip file, etc.). If you email us a zip file, please also send a separate email to let us
know, in case the original one gets blocked by our spam/anti-virus filter.

1. Self-scoring sheet
How would you grade your own experience and proficiency in the following areas and technologies?
Please use a scale of 1 to 10, where 10 = high proficiency/expertise, and add any clarification notes you
consider relevant.

● Deploying SW to a production environment
● Doing customer-facing work
● PySpark
● Spark in Scala
● Python
● Pandas
● Scikit-learn
● Statsmodels
● PyMC
● TensorFlow, Keras, or PyTorch
● Hive
● MySQL or PostgreSQL
● ElasticSearch
● Unix/Linux command line
● Shell scripting (Bash, Zsh, etc.)
● Scala
● git (on the command line)
● Luigi, Airflow or Pinball
● Jenkins
● AWS
  ○ EMR
  ○ EC2
  ○ Redshift
  ○ S3
● Google Cloud
● Microsoft Azure

2. Problem set
Each problem (on the next page) has a number of points that we'll use to score your submission. The total
maximum score you can obtain in this problem set is 100 points. We don't expect our candidates to reach
this score, or even to attempt every problem. We encourage you to allocate your effort wisely to
maximize your score.

We are looking for solutions written in Python and ideally in Spark, and encourage you to use open
source packages and libraries whenever possible. Certain questions are meant to be open ended, but
feel free to contact us with questions at any time. Good luck! :)
Preliminaries
Our client CoolBrand sells a number of products online. We help them track their profit per day per product.
We have captured this number, in thousands of dollars, in this dataset, with rows and columns representing
dates and products, respectively.
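
As a rough illustration of the layout described above, the sketch below builds a small pandas DataFrame with one row per date and one column per product, then ranks products by total profit. The values, the product names (`product_a`, etc.) and the date range are made up for illustration; they are not taken from the actual dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (all values are invented).
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-01", periods=90, freq="D")
profit = pd.DataFrame(
    rng.normal(loc=[5.0, 2.0, 8.0], scale=1.0, size=(90, 3)),
    index=dates,                                         # rows: dates
    columns=["product_a", "product_b", "product_c"],     # columns: products
)

# Rank products by total profit over the window and keep the top two.
top_products = profit.sum().sort_values(ascending=False).head(2).index.tolist()
print(top_products)
```

Working on a handful of top products, selected this way, is one possible reading of "top products" in the problems below.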

1) Forecasting (60 points)

Please help CoolBrand forecast how much profit they will have per product on each day, from the day
immediately following the dataset's time window until the end of the year. Note that they need a forecast
of multiple data points (one per day) per product, not just a single total per product for the rest of
the year. They will value results even on a handful of their top products!
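
One possible baseline for a daily, per-product forecast is sketched below: a linear trend fitted with `numpy.polyfit` plus an average day-of-week offset. This is only a simple starting point under the assumption that the series has a roughly linear trend and weekly seasonality, not the expected solution. The function name `forecast_to_year_end` and the synthetic `history` series are hypothetical.

```python
import numpy as np
import pandas as pd

def forecast_to_year_end(series: pd.Series) -> pd.Series:
    """Baseline forecast: linear trend plus mean day-of-week offset."""
    t = np.arange(len(series))
    slope, intercept = np.polyfit(t, series.to_numpy(), deg=1)
    trend = pd.Series(intercept + slope * t, index=series.index)
    # Average residual per weekday (0 = Monday) captures weekly seasonality.
    weekday_offset = (series - trend).groupby(series.index.dayofweek).mean()

    # One forecast point per day, from the day after the window to Dec 31.
    last_day = series.index[-1]
    future = pd.date_range(
        last_day + pd.Timedelta(days=1),
        pd.Timestamp(year=last_day.year, month=12, day=31),
        freq="D",
    )
    t_future = np.arange(len(series), len(series) + len(future))
    offsets = weekday_offset.reindex(future.dayofweek).to_numpy()
    return pd.Series(intercept + slope * t_future + offsets, index=future)

# Synthetic single-product history (invented): Jan 1 to Sep 30, 2017.
history = pd.Series(
    2.0 + 0.1 * np.arange(273),
    index=pd.date_range("2017-01-01", periods=273, freq="D"),
)
forecast = forecast_to_year_end(history)
print(forecast.head())
```

Applying the function column by column (for example with `DataFrame.apply`) would give one forecast series per product.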

2) Trends and anomalies (60 points)

CoolBrand sometimes notices some trends and anomalies in profitability. They have a hard time
separating them from “natural” random variation. Assuming that they didn’t change anything on their end
(pricing, advertising, promotions, etc) during the dataset’s time window, can you help them identify these
trends and/or anomalies in the data?
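
To make the idea of separating anomalies from "natural" variation concrete, here is a minimal sketch that flags points lying far from a centered rolling median, with distance measured in robust units based on the median absolute deviation (MAD). The window size, the threshold of 3.5 and the function name `flag_anomalies` are assumptions chosen for illustration; other detectors would be equally valid.

```python
import numpy as np
import pandas as pd

def flag_anomalies(series: pd.Series, window: int = 15, threshold: float = 3.5) -> pd.Series:
    """Return a boolean Series marking points far from a centered rolling median."""
    median = series.rolling(window, center=True, min_periods=1).median()
    residual = series - median
    mad = residual.abs().median()
    if mad == 0:
        # Degenerate case: no spread at all, so any deviation is unusual.
        return residual != 0
    # 1.4826 rescales the MAD to match the standard deviation of Gaussian noise.
    robust_z = residual / (1.4826 * mad)
    return robust_z.abs() > threshold

# Synthetic daily profit (invented) with one injected spike on day 50.
rng = np.random.default_rng(1)
values = rng.normal(0.0, 1.0, size=100)
values[50] += 10.0
profit = pd.Series(values, index=pd.date_range("2017-01-01", periods=100, freq="D"))

flags = flag_anomalies(profit)
print(profit[flags])
```

The rolling median follows slow trends, so a point is flagged relative to its local level rather than to the global mean.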

3) Comparisons (60 points)

CoolBrand executives are interested in comparing the total profitability of their products during the rest of
2017. They are looking to get two matrices. The first one, P, would contain elements P(i,j) with your
estimate of the difference in profit between products i and j (e.g. in thousands of $ or %) during the time left
in the year. The second one, C, would contain elements C(i,j) with your confidence in the corresponding
P(i,j) estimate. They would like you to compute C(i,j) and P(i,j) on at least a handful of their top products.
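
The sketch below shows one way the two matrices could be computed, assuming you already have a set of simulated rest-of-year profit totals per product (for instance from bootstrapped forecast residuals or posterior draws; how they are produced is left open). Here P(i,j) is the mean difference across samples, and C(i,j) is taken to be the share of samples whose difference has the same sign as P(i,j). That definition of confidence, the function name `comparison_matrices` and the sample values are all assumptions.

```python
import numpy as np

def comparison_matrices(total_samples: np.ndarray):
    """total_samples has shape (n_samples, n_products); each row holds one
    simulated rest-of-year total profit per product."""
    # diffs[s, i, j] = total of product i minus total of product j in sample s
    diffs = total_samples[:, :, None] - total_samples[:, None, :]
    P = diffs.mean(axis=0)
    # Confidence: share of samples whose difference has the same sign as P(i,j).
    C = (np.sign(diffs) == np.sign(P)).mean(axis=0)
    return P, C

# Three invented samples for two products (thousands of dollars).
samples = np.array([[10.0, 4.0],
                    [12.0, 5.0],
                    [8.0, 9.0]])
P, C = comparison_matrices(samples)
print(P)
print(C)
```

By construction P is antisymmetric (P(i,j) = -P(j,i)) and C is symmetric, which can serve as a quick sanity check.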

4) Deploying code to production (60 points)

Can you write unit tests for one of the problems above? What steps would you need to take before, during and
after deploying your code to a production system? Finally, if you had to automate one of these jobs to run
on a daily basis, what technologies would you use, and how exactly would you use them?
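
As an illustration of what a unit test for the forecasting problem might look like, the sketch below tests a small hypothetical helper, `remaining_days`, that produces the dates a forecast must cover. Both the helper and the test cases are assumptions made for this example; it uses Python's standard `unittest` module.

```python
import unittest
import pandas as pd

def remaining_days(last_day: pd.Timestamp) -> pd.DatetimeIndex:
    """Dates from the day after `last_day` through December 31 of the same year."""
    end = pd.Timestamp(year=last_day.year, month=12, day=31)
    return pd.date_range(last_day + pd.Timedelta(days=1), end, freq="D")

class RemainingDaysTest(unittest.TestCase):
    def test_covers_october_to_december(self):
        days = remaining_days(pd.Timestamp("2017-09-30"))
        self.assertEqual(len(days), 92)  # 31 + 30 + 31 days
        self.assertEqual(days[0], pd.Timestamp("2017-10-01"))
        self.assertEqual(days[-1], pd.Timestamp("2017-12-31"))

    def test_empty_when_window_ends_on_dec_31(self):
        days = remaining_days(pd.Timestamp("2017-12-31"))
        self.assertEqual(len(days), 0)

# Run the tests explicitly so the script does not exit after the test run.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RemainingDaysTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Edge cases such as a window ending on the last day of the year are the kind of behavior worth pinning down before a job runs unattended every day.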
