Implementing Data Science Projects PDF
The implicit promise is that all your data, which had been lying idle for so long, can now
start contributing to better business decisions. All you need to do is gather data from
these multiple sources, analyze it, and generate business-relevant insights. Yet you
stumble upon new challenges every day in managing these projects, defining the
outputs, and trying to link the outputs to the business goals. So how exactly is a ‘data
science’ project implemented? What are the key skills you or your team need for creating
such data science solutions?
In this post, I highlight some of the key aspects that, in my opinion, are essential for
driving successful data analytics solutions and thinking. I identify five key
skills/personalities that are central to the success of any data science
endeavor, and I explain why a combination of these skills is essential for deriving
business-relevant insights and creating scalable solutions.
Yet the models developed in this team can quickly become very difficult to implement
(imagine a combination of natural language processing, machine learning, and social
network analysis in a single module) and impossible to scale and deploy unless supported
by some other key skill sets.
The person who can handle loads of (big) data without batting an eyelid. Typically able to
find any needle in any haystack, the big data person can build castles and
databases in the cloud. She is proficient in skills like ETL, big data, and cloud
computing platforms. These people form the backbone of data science projects and
are key in making scalable and deployable solutions. It is often said that almost 80% of
the time in any analytics project is spent on gathering, cleaning, and massaging the data.
Linking this to the business objectives is the obvious next step, and this is where the
domain expert makes her entrance.
While adequate expertise in the first two skill sets ensures data models that work,
you will need a domain expert or a business person to actually put them to (your clients’)
use. Typically, this person is well versed in industry-specific analytics and
measures, and has excellent communication and presentation skills. This is the person
who typically has an MBA background and/or years of industry experience.
Another increasingly important aspect of a data science project is the design and
visualization of the results and analysis. This is essential since you are trying to present
sophisticated analysis to people who may not have experience/training/interest in
statistical and data science methods. Add to this the fact that all your outputs now need
to be responsive, i.e., display equally well on a laptop, tablet, or mobile. The outputs and
insights you generate must be a natural part of the workday of the end user. Thus,
understanding the user journey, the personas, and the user interactions becomes crucial.
You may not be building the next Apple, but a reasonably intuitive interface is still
essential, especially if you are building guided analytics projects.
The product manager needs to manage and make the diverse group work together and
agree on key points. The product manager for data science projects needs to be an all-
rounder with program management, client interfacing, data science, and team
management skills. Experience in herding cats is a bonus. These are the people who
need to understand the data models as well as the end user, and to guide the outcomes of
the cross-functional team towards measurable business goals.
Very often, organizations form teams that consist of experts with only a subset of
these skills. The cycle of data gathering, data cleaning, model building, model
optimizing, and generating reports is not just interesting but also very addictive.
Getting lost in it is also one of the most often repeated mistakes in the data science
world. To avoid this, make sure you do not lose sight of the ‘whys’ by concentrating too
much on the ‘hows’. A
correctly balanced team is one of the basic prerequisites on your journey towards solving
ever more sophisticated and challenging data science problems.