Pandas DataFrame Notes
Pandas DataFrame Notes
● This document contains two parts, a self-scoring sheet and a problem set.
● We will not consider how much time it took you to prepare your submission.
● Our intention is you don’t spend more than 2 hours on this test.
● We value partial responses and even pseudo-code on any problem in the problem set.
● Once you are ready, please submit your response to avazquezreina@sessionm.com (DropBox
link, zip file, etc). If you email us a zip file, please send us a separate email just in case to let us
know that you did so in case the original one gets blocked by our spam/anti-virus filter.
1. Self-scoring sheet
How would you grade your own experience and proficiency in the following areas and technologies?
Please use a scale of 1 to 10, where 10 = high proficiency/expertise, and add any clarification notes you
consider relevant.
2. Problem set
Each problem (in the next page) has a number of points that we’ll use to score your submission. The total
maximum score you can obtain in this problem set is 100 points. We don’t expect our candidates to reach
this score, or even attempt to solve every problem. We encourage you to allocate your effort wisely to
maximize your score.
We are looking for solutions written in Python and ideally in Spark, and encourage you to use open
source packages and libraries whenever possible. Certain questions are meant to be open ended, but
feel free to contact us with questions at any time. Good luck! :)
Preliminaries
Our client CoolBrand sells a number of products online. We help them track its profit per day per product.
We have captured this number in thousands of dollars in this dataset with rows and columns representing
dates and products respectively.
3) Comparisons. 60 points
CoolBrand executives are interested in comparing the total profitability of their products during the rest of
2017. They are looking to get two matrices. The first one P would contain elements P(i,j) with your
estimate of the difference in profit between product i and j (e.g. in thousands of $ or %) during the time left
in the year. The second one C would contain elements C(i,j) with your confidence in the corresponding
P(i,j) estimation. They would like you to compute C(i,j) and P(i,j) on at least a handful of their top products.