Build a Career in Data Science
By Emily Robinson and Jacqueline Nolis
About this ebook
You are going to need more than technical knowledge to succeed as a data scientist. Build a Career in Data Science teaches you what school leaves out, from how to land your first job to the lifecycle of a data science project, and even how to become a manager.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
What are the keys to a data scientist’s long-term success? Blending your technical know-how with the right “soft skills” turns out to be a central ingredient of a rewarding career.
About the book
Build a Career in Data Science is your guide to landing your first data science job and developing into a valued senior employee. By following clear and simple instructions, you’ll learn to craft an amazing resume and ace your interviews. In this demanding, rapidly changing field, it can be challenging to keep projects on track, adapt to company needs, and manage tricky stakeholders. You’ll love the insights on how to handle expectations, deal with failures, and plan your career path in the stories from seasoned data scientists included in the book.
What's inside
Creating a portfolio of data science projects
Assessing and negotiating an offer
Leaving gracefully and moving up the ladder
Interviews with professional data scientists
About the reader
For readers who want to begin or advance a data science career.
About the author
Emily Robinson is a data scientist at Warby Parker. Jacqueline Nolis is a data science consultant and mentor.
Table of Contents:
PART 1 - GETTING STARTED WITH DATA SCIENCE
1. What is data science?
2. Data science companies
3. Getting the skills
4. Building a portfolio
PART 2 - FINDING YOUR DATA SCIENCE JOB
5. The search: Identifying the right job for you
6. The application: Résumés and cover letters
7. The interview: What to expect and how to handle it
8. The offer: Knowing what to accept
PART 3 - SETTLING INTO DATA SCIENCE
9. The first months on the job
10. Making an effective analysis
11. Deploying a model into production
12. Working with stakeholders
PART 4 - GROWING IN YOUR DATA SCIENCE ROLE
13. When your data science project fails
14. Joining the data science community
15. Leaving your job gracefully
16. Moving up the ladder
Emily Robinson
Emily Robinson is a senior data scientist at Warby Parker, and holds a Master's in Management. Emily's academic background includes the study of leadership, negotiation, and experiences of underrepresented groups in STEM.
Build a Career in Data Science
Emily Robinson and Jacqueline Nolis
Copyright
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity.
For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com
©2020 by Emily Robinson and Jacqueline Nolis. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.
Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
Development editor: Karen Miller
Review editor: Ivan Martinović
Production editor: Lori Weidert
Copy editor: Kathy Simpson
Proofreader: Melody Dolab
Typesetter: Dennis Dalinnik
Cover designer: Leslie Haimes
ISBN: 9781617296246
Printed in the United States of America
Dedication
From Emily, to Michael, and from Jacqueline, to Heather, Amber, and Laura, for the love and support you provided us throughout this journey.
Brief Table of Contents
Copyright
Brief Table of Contents
Table of Contents
Preface
Acknowledgments
About This Book
About the Authors
About the Cover Illustration
1. Getting started with data science
Chapter 1. What is data science?
Chapter 2. Data science companies
Chapter 3. Getting the skills
Chapter 4. Building a portfolio
2. Finding your data science job
Chapter 5. The search: Identifying the right job for you
Chapter 6. The application: Résumés and cover letters
Chapter 7. The interview: What to expect and how to handle it
Chapter 8. The offer: Knowing what to accept
3. Settling into data science
Chapter 9. The first months on the job
Chapter 10. Making an effective analysis
Chapter 11. Deploying a model into production
Chapter 12. Working with stakeholders
4. Growing in your data science role
Chapter 13. When your data science project fails
Chapter 14. Joining the data science community
Chapter 15. Leaving your job gracefully
Chapter 16. Moving up the ladder
Epilogue
Appendix. Interview questions
Index
List of Figures
List of Tables
Table of Contents
Copyright
Brief Table of Contents
Table of Contents
Preface
Acknowledgments
About This Book
About the Authors
About the Cover Illustration
1. Getting started with data science
Chapter 1. What is data science?
1.1. What is data science?
1.1.1. Mathematics/statistics
1.1.2. Databases/programming
1.1.3. Business understanding
1.2. Different types of data science jobs
1.2.1. Analytics
1.2.2. Machine learning
1.2.3. Decision science
1.2.4. Related jobs
1.3. Choosing your path
1.4. Interview with Robert Chang, data scientist at Airbnb
What was your first data science journey?
What should people look for in a data science job?
What skills do you need to be a data scientist?
Summary
Chapter 2. Data science companies
2.1. MTC: Massive Tech Company
2.1.1. Your team: One of many in MTC
2.1.2. The tech: Advanced, but siloed across the company
2.1.3. The pros and cons of MTC
2.2. HandbagLOVE: The established retailer
2.2.1. Your team: A small group struggling to grow
2.2.2. Your tech: A legacy stack that’s starting to change
2.2.3. The pros and cons of HandbagLOVE
2.3. Seg-Metra: The early-stage startup
2.3.1. Your team (what team?)
2.3.2. The tech: Cutting-edge technology that’s taped together
2.3.3. Pros and cons of Seg-Metra
2.4. Videory: The late-stage, successful tech startup
2.4.1. The team: Specialized but with room to move around
2.4.2. The tech: Trying to avoid getting bogged down by legacy code
2.4.3. The pros and cons of Videory
2.5. Global Aerospace Dynamics: The giant government contractor
2.5.1. The team: A data scientist in a sea of engineers
2.5.2. The tech: Old, hardened, and on security lockdown
2.5.3. The pros and cons of GAD
2.6. Putting it all together
2.7. Interview with Randy Au, quantitative user experience researcher at Google
Are there big differences between large and small companies?
Are there differences based on the industry of the company?
What’s your final piece of advice for beginning data scientists?
Summary
Chapter 3. Getting the skills
3.1. Earning a data science degree
3.1.1. Choosing the school
3.1.2. Getting into an academic program
3.1.3. Summarizing academic degrees
3.2. Going through a bootcamp
3.2.1. What you learn
3.2.2. Cost
3.2.3. Choosing a program
3.2.4. Summarizing data science bootcamps
3.3. Getting data science work within your company
3.3.1. Summarizing learning on the job
3.4. Teaching yourself
3.4.1. Summarizing self-teaching
3.5. Making the choice
3.6. Interview with Julia Silge, data scientist and software engineer at RStudio
Before becoming a data scientist, you worked in academia; how have the skills learned there helped you as a data scientist?
When deciding to become a data scientist, what did you use to pick up new skills?
Did you know going into data science what kind of work you wanted to be doing?
What would you recommend to people looking to get the skills to be a data scientist?
Summary
Chapter 4. Building a portfolio
4.1. Creating a project
4.1.1. Finding the data and asking a question
4.1.2. Choosing a direction
4.1.3. Filling out a GitHub README
4.2. Starting a blog
4.2.1. Potential topics
4.2.2. Logistics
4.3. Working on example projects
4.3.1. Data science freelancers
4.3.2. Training a neural network on offensive license plates
4.4. Interview with David Robinson, data scientist
How did you start blogging?
Are there any specific opportunities you have gotten from public work?
Are there people you think would especially benefit from doing public work?
How has your view on the value of public work changed over time?
How do you come up with ideas for your data analysis posts?
What’s your final piece of advice for aspiring and junior data scientists?
Summary
Chapters 1–4 resources
Books
Blog posts
2. Finding your data science job
Chapter 5. The search: Identifying the right job for you
5.1. Finding jobs
5.1.1. Decoding descriptions
5.1.2. Watching for red flags
5.1.3. Setting your expectations
5.1.4. Attending meetups
5.1.5. Using social media
5.2. Deciding which jobs to apply for
5.3. Interview with Jesse Mostipak, developer advocate at Kaggle
What recommendations do you have for starting a job search?
How can you build your network?
What do you do if you don’t feel confident applying to data science jobs?
What would you say to someone who thinks “I don’t meet the full list of any job’s required qualifications”?
What’s your final piece of advice to aspiring data scientists?
Summary
Chapter 6. The application: Résumés and cover letters
6.1. Résumé: The basics
6.1.1. Structure
6.1.2. Deeper into the experience section: generating content
6.2. Cover letters: The basics
6.2.1. Structure
6.3. Tailoring
6.4. Referrals
6.5. Interview with Kristen Kehrer, data science instructor and course creator
How many times would you estimate you’ve edited your résumé?
What are common mistakes you see people make?
Do you tailor your résumé to the position you’re applying to?
What strategies do you recommend for describing jobs on a résumé?
What’s your final piece of advice for aspiring data scientists?
Summary
Chapter 7. The interview: What to expect and how to handle it
7.1. What do companies want?
7.1.1. The interview process
7.2. Step 1: The initial phone screen interview
7.3. Step 2: The on-site interview
7.3.1. The technical interview
7.3.2. The behavioral interview
7.4. Step 3: The case study
7.5. Step 4: The final interview
7.6. The offer
7.7. Interview with Ryan Williams, senior decision scientist at Starbucks
What are the things you need to do to knock an interview out of the park?
How do you handle the times where you don’t know the answer?
What should you do if you get a negative response to your answer?
What has running interviews taught you about being an interviewee?
Summary
Chapter 8. The offer: Knowing what to accept
8.1. The process
8.2. Receiving the offer
8.3. Negotiation
8.3.1. What is negotiable?
8.3.2. How much you can negotiate
8.4. Negotiation tactics
8.5. How to choose between two good job offers
8.6. Interview with Brooke Watson Madubuonwu, senior data scientist at the ACLU
What should you consider besides salary when you’re considering an offer?
What are some ways you prepare to negotiate?
What do you do if you have one offer but are still waiting on another one?
What’s your final piece of advice for aspiring and junior data scientists?
Summary
Chapters 5–8 resources
Books
Blog posts and courses
3. Settling into data science
Chapter 9. The first months on the job
9.1. The first month
9.1.1. Onboarding at a large organization: A well-oiled machine
9.1.2. Onboarding at a small company: What onboarding?
9.1.3. Understanding and setting expectations
9.1.4. Knowing your data
9.2. Becoming productive
9.2.1. Asking questions
9.2.2. Building relationships
9.3. If you’re the first data scientist
9.4. When the job isn’t what was promised
9.4.1. The work is terrible
9.4.2. The work environment is toxic
9.4.3. Deciding to leave
9.5. Interview with Jarvis Miller, data scientist at Spotify
What were some things that surprised you in your first data science job?
What are some issues you faced?
Can you tell us about one of your first projects?
What would be your biggest piece of advice for the first few months?
Summary
Chapter 10. Making an effective analysis
10.1. The request
10.2. The analysis plan
10.3. Doing the analysis
10.3.1. Importing and cleaning data
10.3.2. Data exploration and modeling
10.3.3. Important points for exploring and modeling
10.4. Wrapping it up
10.4.1. Final presentation
10.4.2. Mothballing your work
10.5. Interview with Hilary Parker, data scientist at Stitch Fix
How does thinking about other people help your analysis?
How do you structure your analyses?
What kind of polish do you do in the final version?
How do you handle people asking for adjustments to an analysis?
Summary
Chapter 11. Deploying a model into production
11.1. What is deploying to production, anyway?
11.2. Making the production system
11.2.1. Collecting data
11.2.2. Building the model
11.2.3. Serving models with APIs
11.2.4. Building an API
11.2.5. Documentation
11.2.6. Testing
11.2.7. Deploying an API
11.2.8. Load testing
11.3. Keeping the system running
11.3.1. Monitoring the system
11.3.2. Retraining the model
11.3.3. Making changes
11.4. Wrapping up
11.5. Interview with Heather Nolis, machine learning engineer at T-Mobile
What does machine learning engineer mean on your team?
What was it like to deploy your first piece of code?
If you have things go wrong in production, what happens?
What’s your final piece of advice for data scientists working with engineers?
Summary
Chapter 12. Working with stakeholders
12.1. Types of stakeholders
12.1.1. Business stakeholders
12.1.2. Engineering stakeholders
12.1.3. Corporate leadership
12.1.4. Your manager
12.2. Working with stakeholders
12.2.1. Understanding the stakeholder’s goals
12.2.2. Communicating constantly
12.2.3. Being consistent
12.3. Prioritizing work
12.3.1. Both innovative and impactful work
12.3.2. Not innovative but still impactful work
12.3.3. Innovative but not impactful work
12.3.4. Neither innovative nor impactful work
12.4. Concluding remarks
12.5. Interview with Sade Snowden-Akintunde, data scientist at Etsy
Why is managing stakeholders important?
How did you learn to manage stakeholders?
Was there a time where you had difficulty with a stakeholder?
What do junior data scientists frequently get wrong?
Do you always try to explain the technical part of the data science?
What’s your final piece of advice for junior or aspiring data scientists?
Summary
Chapters 9–12 resources
Books
Blogs
4. Growing in your data science role
Chapter 13. When your data science project fails
13.1. Why data science projects fail
13.1.1. The data isn’t what you wanted
13.1.2. The data doesn’t have a signal
13.1.3. The customer didn’t end up wanting it
13.2. Managing risk
13.3. What you can do when your projects fail
13.3.1. What to do with the project
13.3.2. Handling negative emotions
13.4. Interview with Michelle Keim, head of data science and machine learning at Pluralsight
When was a time you experienced a failure in your career?
Are there red flags you can see before a project starts?
How does the way a failure is handled differ between companies?
How can you tell if a project you’re on is failing?
How can you get over a fear of failing?
Summary
Chapter 14. Joining the data science community
14.1. Growing your portfolio
14.1.1. More blog posts
14.1.2. More projects
14.2. Attending conferences
14.2.1. Dealing with social anxiety
14.3. Giving talks
14.3.1. Getting an opportunity
14.3.2. Preparing
14.4. Contributing to open source
14.4.1. Contributing to other people’s work
14.4.2. Making your own package or library
14.5. Recognizing and avoiding burnout
14.6. Interview with Renee Teate, director of data science at HelioCampus
What are the main benefits of being on social media?
What would you say to people who say they don’t have the time to engage with the community?
Is there value in producing only a small amount of content?
Were you worried the first time you published a blog post or gave a talk?
Summary
Chapter 15. Leaving your job gracefully
15.1. Deciding to leave
15.1.1. Take stock of your learning progress
15.1.2. Check your alignment with your manager
15.2. How the job search differs after your first job
15.2.1. Deciding what you want
15.2.2. Interviewing
15.3. Finding a new job while employed
15.4. Giving notice
15.4.1. Considering a counteroffer
15.4.2. Telling your team
15.4.3. Making the transition easier
15.5. Interview with Amanda Casari, engineering manager at Google
How do you know it’s time to start looking for a new job?
Have you ever started a job search and decided to stay instead?
Do you see people staying in the same job for too long?
Can you change jobs too quickly?
What’s your final piece of advice for aspiring and new data scientists?
Summary
Chapter 16. Moving up the ladder
16.1. The management track
16.1.1. Benefits of being a manager
16.1.2. Drawbacks of being a manager
16.1.3. How to become a manager
16.2. Principal data scientist track
16.2.1. Benefits of being a principal data scientist
16.2.2. Drawbacks of being a principal data scientist
16.2.3. How to become a principal data scientist
16.3. Switching to independent consulting
16.3.1. Benefits of independent consulting
16.3.2. Drawbacks of independent consulting
16.3.3. How to become an independent consultant
16.4. Choosing your path
16.5. Interview with Angela Bassa, head of data science, data engineering, and machine learning at iRobot
What’s the day-to-day life as a manager like?
What are the signs you should move on from being an independent contributor?
Do you have to eventually transition out of being an independent contributor?
What advice do you have for someone who wants to be a technical lead but isn’t quite ready for it?
What’s your final piece of advice to aspiring and junior data scientists?
Summary
Chapters 13–16 resources
Books
Blogs
Epilogue
Appendix. Interview questions
A.1. Coding and software development
A.1.1. FizzBuzz
A.1.2. Tell whether a number is prime
A.1.3. Working with Git
A.1.4. Technology decisions
A.1.5. Frequently used package/library
A.1.6. R Markdown or Jupyter Notebooks
A.1.7. When should you write functions or packages/libraries?
A.1.8. Example manipulating data in R/Python
A.2. SQL and databases
A.2.1. Types of joins
A.2.2. Loading data into SQL
A.2.3. Example SQL query
A.2.4. Example SQL query continued
A.2.5. Data types
A.3. Statistics and machine learning
A.3.1. Statistics terms
A.3.2. Explain p-value
A.3.3. Explain a confusion matrix
A.3.4. Interpreting regression models
A.3.5. What is boosting?
A.3.6. Favorite algorithm
A.3.7. Training vs. test data
A.3.8. Feature selection
A.3.9. Deploying a new model
A.3.10. Model behavior
A.3.11. Experimental design
A.3.12. Flaws in experimental design
A.3.13. Bias in sampled data
A.4. Behavioral
A.4.1. Project that had the most impact
A.4.2. Data surprises
A.4.3. Previous job reflections
A.4.4. Senior person making a mistake based on data
A.4.5. Disagreements with teammates
A.4.6. Difficult problems
A.5. Brain teasers
A.5.1. Estimation
A.5.2. Combinatorics
Index
List of Figures
List of Tables
Preface
How do I get your job?
As veteran data scientists, we’re constantly being asked this question. Sometimes, we’re asked directly; at other times, people ask indirectly through questions about the decisions we’ve made in our careers to get where we are. Under the surface, the people asking the questions seem to have a constant struggle, because so few resources are available for finding out how to become or grow as a data scientist. Lots of data scientists are looking for help with their careers and often not finding clear answers.
Although we’ve written blog posts with tactical advice on how to handle specific moments in a data science job, we’ve struggled with the lack of a definitive text covering the end-to-end of starting and growing a data science career. This book was written to help these people—the thousands of people who hear about data science and machine learning but don’t know where to start, as well as those who are already in the field and want to understand how to move up.
We were happy to get this chance to collaborate in creating this book. We both felt that our respective backgrounds and viewpoints complemented each other and created a better book for you. We are:
Jacqueline Nolis: I received a BS and MS in mathematics and a PhD in operations research. When I started working, the term data science didn’t yet exist, and I had to figure out my career path at the same time that the field was defining itself. Now I’m a consultant, helping companies grow data science teams.
Emily Robinson: I got my undergraduate degree in decision sciences and my master’s in management. After attending a three-month data science bootcamp in 2016, I started working in data science, specializing in A/B testing. Now I work as a senior data scientist at Warby Parker, tackling some of the company’s biggest projects.
Throughout our careers, we’ve both built project portfolios and experienced the stress of adjusting to a new job. We’ve felt the sting of being rejected for jobs we wanted and the triumph of seeing our analyses positively affect the business. We’ve faced issues with a difficult business partner and benefited from a supportive mentor. Although these experiences taught us so much in our careers, to us the true value comes from sharing them with others.
This book is meant to be a guide to career questions in data science, following the path that a person will take in the career. We start with the beginning of the journey: how to get basic data science skills and understand what jobs are actually like. Then we go through getting a job and how to get settled in. We cover how to grow in the role and eventually how to transition up to management—or out to a new company. Our intention is for this book to be a resource that data scientists continue to go back to as they hit new milestones in their careers.
Because the focus on career is very important for this book, we chose to not focus deeply on the technical components of data science; we don’t cover topics such as how to choose the hyperparameters of a model or the minute details of Python packages. In fact, this book doesn’t include a single equation or line of code. We know that plenty of great books out there cover these topics; we wanted instead to discuss the often-overlooked but equally important nontechnical knowledge needed to succeed in data science.
We included many personal experiences from respected data scientists in this book. At the end of each chapter, you’ll find an interview describing how a real, human data scientist personally handled dealing with the concepts that the chapter covers. We’re extremely happy with the amazing, detailed, and vulnerable responses we got from all the data scientists we talked to. We feel that the examples they provide from their lives can teach much more than any broad statement we might write.
Another decision we made in writing this book was to make it opinionated. By that, we mean we intentionally chose to focus on the lessons we’ve learned as professional data scientists and by talking to others in the community. At times, we make statements not everyone might agree with, such as suggesting that you should always write a cover letter when applying for jobs. We felt that the benefit of providing viewpoints that we strongly believe are helpful to data scientists was more important than trying to write something that contained only objective truths.
We hope that you find this book to be a helpful guide as you progress in your data science career. We’ve written it to be the document we wish we had when we were aspiring and junior data scientists; we hope that you’ll be glad to have it now.
Acknowledgments
First and foremost, we’d like to thank our spouses, Michael Berkowitz and Heather Nolis. Without them, this book would not have been possible (and not just because Michael wrote the first draft of some of the sections despite being a bridge professional and not a data scientist, or because Heather evangelized half of the machine learning engineering content).
Next, we want to acknowledge the staff at Manning who guided us through this process, improved the book, and made it possible in the first place. Thank you especially to our editor, Karen Miller, who kept us on track and coordinated all the various moving parts.
Thank you to all the reviewers who read the manuscript at various points and provided invaluable detailed feedback: Brynjar Smári Bjarnason, Christian Thoudahl, Daniel Berecz, Domenico Nappo, Geoff Barto, Gustavo Gomes, Hagai Luger, James Ritter, Jeff Neumann, Jonathan Twaddell, Krzysztof Jędrzejewski, Malgorzata Rodacka, Mario Giesel, Narayana Lalitanand Surampudi, Ping Zhao, Riccardo Marotti, Richard Tobias, Sebastian Palma Mardones, Steve Sussman, Tony M. Dubitsky, and Yul Williams. Thank you as well to our friends and family members who read the book and offered their own suggestions: Elin Farnell, Amanda Liston, Christian Roy, Jonathan Goodman, and Eric Robinson. Your contributions helped shape this book and made it as helpful to our readers as possible.
Finally, we want to thank all of our end-of-chapter interviewees: Robert Chang, Randy Au, Julia Silge, David Robinson, Jesse Mostipak, Kristen Kehrer, Ryan Williams, Brooke Watson Madubuonwu, Jarvis Miller, Hilary Parker, Heather Nolis, Sade Snowden-Akintunde, Michelle Keim, Renee Teate, Amanda Casari, and Angela Bassa. Additionally, we’re grateful for those who contributed to sidebars throughout the book and suggested interview questions for the appendix: Vicki Boykis, Rodrigo Fuentealba Cartes, Gustavo Coelho, Emily Bartha, Trey Causey, Elin Farnell, Jeff Allen, Elizabeth Hunter, Sam Barrows, Reshama Shaikh, Gabriela de Queiroz, Rob Stamm, Alex Hayes, Ludamila Janda, Ayanthi G., Allan Butler, Heather Nolis, Jeroen Janssens, Emily Spahn, Tereza Iofciu, Bertil Hatt, Ryan Williams, Peter Baldridge, and Hlynur Hallgrímsson. All these people provided valuable perspectives, and together, they know much more than we ever could.
About This Book
Build a Career in Data Science was written to help you enter the field of data science and grow your career in it. It walks you through the role of a data scientist, how to get the skills you need, and the steps to getting a data science job. After you have a job, this book helps you understand how to mature in the role and eventually become a larger part of the data science community, as well as a senior data scientist. After reading this book, you should be confident about how to advance your career.
Who should read this book
This book is for people who have not yet entered the field of data science but are considering it, as well as people who are in the first few years of the role. Aspiring data scientists will learn the skills they need to become data scientists, and junior data scientists will learn how to become more senior. Many of the topics in the book, such as interviewing and negotiating an offer, are worthwhile resources to come back to throughout any data science career.
How this book is organized: a roadmap
This book is broken into four parts, arranged in the chronological order of a data science career. Part 1 of the book, Getting started with data science, covers what data science is and what skills it requires:
Chapter 1 introduces the role of a data scientist and the different types of jobs that share that title.
Chapter 2 presents five example companies that have data scientists and shows how the culture and type of each company affects the data science positions.
Chapter 3 lays out the different paths a person can take to get the skills needed to be a data scientist.
Chapter 4 describes how to create and share projects to build a data science portfolio.
Part 2 of the book, Finding your data science job, explains the entire job search process for data science positions:
Chapter 5 walks through the search for open positions and how to find the ones worth investing in.
Chapter 6 explains how to create a cover letter and résumé and then adjust them for each job you apply for.
Chapter 7 provides details on the interview process and what to expect from it.
Chapter 8 is about what to do after you receive an offer, focusing on how to negotiate it.
Part 3 of the book, Settling into data science, covers the basics of the early months of a data science job:
Chapter 9 lays out what to expect in the first few months of a data science job and shows you how to make the most of them.
Chapter 10 walks through the process of making analyses, which are core components of most data science roles.
Chapter 11 focuses on putting machine learning models into production, which is necessary in more engineering-based positions.
Chapter 12 explains how to communicate with stakeholders—a task that data scientists have to do more than most other technical roles.
Part 4 of the book, Growing in your data science role, covers topics for more seasoned data scientists who are looking to continue to advance their careers:
Chapter 13 describes how to handle failed data science projects.
Chapter 14 shows you how to become part of the larger data science community through activities such as speaking and contributing to open source.
Chapter 15 is a guide to the difficult task of leaving a data science position.
Chapter 16 ends the book with the roles data scientists can get as they move up the corporate ladder.
Finally, we have an appendix of more than 30 interview questions, example answers, and notes on what the question is trying to assess and what makes a good answer.
People who haven’t been data scientists before should start at the beginning of the book, whereas people who already are in the field may begin with a later chapter to guide them in a challenge they’re currently facing. Although the chapters are ordered to flow like a data science career, they can be read out of order according to readers’ needs.
The chapters end with interviews of data scientists in various industries who discuss how the topic of the chapter has shown up in their career. The interviewees were selected due to their contributions to the field of data science and the interesting journeys they followed as they became data scientists.
liveBook discussion forum
Purchase of Build a Career in Data Science includes free access to a private web forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the authors and from other users. To access the forum, go to https://livebook.manning.com/#!/book/build-a-career-in-data-science/discussion. You can also learn more about Manning's forums and the rules of conduct at https://livebook.manning.com/#!/discussion.
Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the authors can take place. It is not a commitment to any specific amount of participation on the part of the authors, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the authors some challenging questions lest their interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.
About the Authors
Emily Robinson
WRITTEN BY JACQUELINE NOLIS
Emily Robinson is a brilliant senior data scientist at Warby Parker and previously worked at DataCamp and Etsy.
I first met Emily at Data Day Texas 2018, when she was one of the few people who attended my talk on data science in industry. At the end of my speech, she shot her hand up and asked a great question. To my surprise, an hour later we had swapped; I was watching her calmly and casually give a great presentation while I was eagerly waiting to raise my hand and ask her a question. That day, I knew she was a hard-working and clever data scientist. A few months later, when it came time for me to find someone to co-author a book, she was at the top of my list. When I sent her the email asking whether she would be interested, I figured that there was a good chance she would say no; she was probably out of my league.
Working with Emily on this book has been a joy. She is deeply thoughtful about the struggles of junior data scientists and has the ability to clearly understand what is important. She is constantly getting her work done and somehow also is able to squeeze out extra blog posts while doing it. Now having seen her at more conferences and social events, I’ve watched as she’s talked to many data scientists and made all of them feel comfortable and welcome. She’s also an expert in A/B testing and experimentation, but it’s clear that this just happens to be the area she’s working in at the moment; she could pick up any other part of data science and be an expert in that if she wanted to.
My only disappointment is that I’m writing these words about her at the end of creating the book, and with us finishing, someone besides me will have the next opportunity to collaborate with her.
Jacqueline Nolis
WRITTEN BY EMILY ROBINSON
Whenever someone asks me whether I would recommend writing a book, I always say, “Only if you do it with a co-author.” But that’s not actually the full picture. It should be “Only if you do it with a co-author who is as fun, warm, generous, smart, experienced, and caring as Jacqueline.” I’m not sure what it’s like working with a “normal” co-author, because Jacqueline has always been amazing, and I feel incredibly lucky to have gotten to work with her on this project.
It would be easy for someone as accomplished as Jacqueline to be intimidating. She has a PhD in industrial engineering, got $100,000 for winning the third season of the reality television show King of the Nerds, was a director of analytics, and started her own successful consulting firm. She’s spoken at conferences across the country and is regularly asked back by her alma mater to advise math undergraduates (her major) on careers. When she spoke at an online conference, the compliments about her presentation flooded the chat, such as “the best so far,” “excellent presentation,” “really helpful,” and “great, dynamic presentation.” But Jacqueline never makes anyone feel inferior or bad for not knowing something; rather, she loves making difficult concepts accessible, such as in her great presentation called “Deep learning isn’t hard, I promise.”
Her personal life is equally impressive: she has a wonderfully vibrant house in Seattle with her wife, son, two dogs, and three cats. I’m hoping that she might also one day adopt a certain co-author to fill out the very few empty spaces. She and her wife, Heather, have even given a presentation to a packed audience of 1,000 people eager to hear about how they used R to deploy machine learning models to production at T-Mobile. They also possibly have the best meet-cute story of all time: they met on the aforementioned show King of the Nerds, where Heather was also a competitor.
I’m very thankful to Jacqueline, who could have earned much more money for much less aggravation by doing anything other than writing this book with me. It is my hope that our work encourages aspiring and junior data scientists to become contributors to our community who are as great as Jacqueline is.
About the Cover Illustration
The figure on the cover of Build a Career in Data Science is captioned “Femme de l'Aragon,” or “Aragon Woman.” The illustration is taken from a collection of dress costumes from various countries by Jacques Grasset de Saint-Sauveur (1757–1810), titled Costumes de Différents Pays, published in France in 1797. Each illustration is finely drawn and colored by hand. The rich variety of Grasset de Saint-Sauveur’s collection reminds us vividly of how culturally apart the world’s towns and regions were just 200 years ago. Isolated from each other, people spoke different dialects and languages. In the streets or in the countryside, it was easy to identify where they lived and what their trade or station in life was just by their dress.
The way we dress has changed since then and the diversity by region, so rich at the time, has faded away. It is now hard to tell apart the inhabitants of different continents, let alone different towns, regions, or countries. Perhaps we have traded cultural diversity for a more varied personal life—certainly for a more varied and fast-paced technological life.
At a time when it is hard to tell one computer book from another, Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional life of two centuries ago, brought back to life by Grasset de Saint-Sauveur’s pictures.
Part 1. Getting started with data science
If you do a Google search for how to become a data scientist, you’ll likely be confronted with a laundry list of skills, from statistical modeling to programming in Python through communicating effectively and making presentations. One job description might describe a role that’s close to a statistician’s, whereas another employer is looking for someone who has a master’s degree in computer science. When you look for ways to gain those skills, you’ll find options ranging from going back to school for a master’s degree to doing a bootcamp to starting to do data analysis in your current job. Put together, all these combinations of paths can feel insurmountable, especially to people who aren’t yet certain that they even want to be data scientists.
The good news is that there isn’t a single data scientist who has all these skills. Data scientists share a foundation of knowledge, but they each have their own specialties, to the point that many couldn’t swap jobs. The first part of this book is designed to help you understand what all these types of data scientists are and how to make the best decisions to start your career. By the end of this part, you should be prepared with the skills and understanding to start your job search.
Chapter 1 covers the basics of data science, including the skills you need for the job and the different types of data scientists. Chapter 2 goes into detail about the role of a data scientist at five types of companies to help you better understand what the job will be like. Chapter 3 covers the paths to getting the skills required for being a data scientist and the advantages and disadvantages of each. Finally, chapter 4 covers how to build data science projects that give you hands-on experience and how to assemble them into a portfolio to show to potential employers.
1 What is data science?
This chapter covers
· The three main areas of data science
· The different types of data science jobs
“The sexiest job of the 21st century.” “The best job in America.” Data scientist, a title that didn’t even exist before 2008, is now the position employers can’t hire enough of and job seekers strive to become. There’s good reason for the hype: data science is a rapidly growing field, with a median base salary of more than $100,000 in the United States in 2019 (http://mng.bz/XpMp). At a good company, data scientists enjoy a lot of autonomy and are constantly learning new things. They use their skills to solve significant problems, such as working with doctors to analyze drug trials, helping a sports team pick its new draftees, or redesigning the pricing model for a widget business. Finally, as we discuss in chapter 3, there’s no one way to become a data scientist. People come from all backgrounds, so you’re not limited based on what you chose to study as an undergraduate.
But not all data science jobs are perfect. Both companies and job seekers can have unrealistic expectations. Companies new to data science may think that one person can solve all their problems with data, for example. When a data scientist is finally hired, they can be faced with a never-ending to-do list of requests. They might be tasked with immediately implementing a machine learning system when no work has been done to prepare or clean the data. There may be no one to mentor or guide them, or even empathize with the problems they face. We’ll discuss these issues in more depth in chapters 5 and 7, where we’ll help you avoid joining companies that are likely to be a bad fit for a new data scientist, and in chapter 9, where we’ll advise you on what to do if you end up in a negative situation.
On the other side, job seekers may think that there will never be a dull moment in their new career. They may expect that stakeholders will follow their recommendations routinely, that data engineers can fix any data quality issues immediately, and that they’ll get the fastest computing resources available to implement their models. In reality, data scientists spend a lot of time cleaning and preparing data, as well as managing the expectations and priorities of other teams. Projects won’t always work out. Senior management may make unrealistic promises to clients about what your data science models can deliver. A person’s main job may be to work with an archaic data system that’s impossible to automate and requires hours of mind-numbing work each week just to clean up the data. Data scientists may notice lots of statistical or technical mistakes in legacy analyses that have real consequences, but no one is interested, and they’re so overloaded with work that they have no time to try to fix them. Or a data scientist may be asked to prepare reports that support what senior management has already decided, so they may worry about being fired if they give an independent answer.
This book is here to guide you through the process of becoming a data scientist and developing your career. We want to ensure that you, the reader, get all the great parts of being a data scientist and avoid most of the pitfalls. Maybe you’re working in an adjacent field, such as marketing analytics, and wondering how to make the switch. Or maybe you’re already a data scientist, but you’re looking for a new job and don’t think you approached your first job search well. Or you want to further your career by speaking at conferences, contributing to open source, or becoming an independent consultant. Whatever your level, we’re confident that you’ll find this book helpful.
In the first four chapters, we cover the main opportunities for gaining data science skills and building a portfolio to get around the paradox of needing experience to get experience. Part 2 shows how to write a cover letter and résumé that will get you an interview and how to build your network to get a referral. We cover negotiation strategies that research has shown will get you the best offer possible.
When you’re in a data science job, you’ll be writing analyses, working with stakeholders, and maybe even putting a model into production. Part 3 helps you understand what all those processes look like and how to set yourself up for success. In part 4, you’ll find strategies for picking yourself back up when a project inevitably fails. And when you’re ready, we’re here to guide you through the decision of where to take your career: advancing to management, continuing to be an individual contributor, or even striking out as an independent consultant.
Before you begin that journey, though, you need to be clear on what data scientists are and what work they do. Data science is a broad field that covers many types of work, and the better you understand the differences between those areas, the better you can grow in them.
1.1. What is data science?
Data science is the practice of using data to try to understand and solve real-world problems. This concept isn’t exactly new; people have been analyzing sales figures and trends since the invention of the zero. In the past decade, however, we have gained access to exponentially more data than existed before. The advent of computers has assisted in the generation of all that data, but computing is also our only way to process the mounds of information. With computer code, a data scientist can transform or aggregate data, run statistical analyses, or train machine learning models. The output of this code may be a report or dashboard for human consumption, or it could be a machine learning model that will be deployed to run continuously.
If a retail company is having trouble deciding where to put a new store, for example, it may call in a data scientist to do an analysis. The data scientist could look at the historical data of locations where online orders are shipped to understand where customer demand is. They may also combine that customer location data with demographic and income information for those localities from census records. With these datasets, they could find the optimal place for the new store and create a Microsoft PowerPoint presentation to present their recommendation to the company’s vice president of retail operations.
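To make the store-location example a little more concrete, here is a minimal sketch in Python of what a first pass might look like. Everything in it is an assumption made for illustration: the region names, the order history, the census-style figures, and the choice of "orders per 10,000 residents" as the ranking signal are all invented, and a real analysis would weigh far more factors.

```python
from collections import Counter

# Hypothetical shipping regions for past online orders (made-up data).
order_regions = ["north", "north", "east", "north", "south", "east", "north"]

# Hypothetical census-style figures per region (made-up data).
census = {
    "north": {"population": 50_000, "median_income": 62_000},
    "east": {"population": 80_000, "median_income": 48_000},
    "south": {"population": 30_000, "median_income": 55_000},
}


def rank_regions(order_regions, census):
    """Rank regions by online orders per 10,000 residents, highest first."""
    orders = Counter(order_regions)
    rows = []
    for region, stats in census.items():
        per_10k = orders[region] / stats["population"] * 10_000
        rows.append(
            (region, orders[region], round(per_10k, 2), stats["median_income"])
        )
    # Sort on the per-capita demand column, largest value first.
    return sorted(rows, key=lambda row: row[2], reverse=True)


ranking = rank_regions(order_regions, census)
for region, n_orders, per_10k, income in ranking:
    print(f"{region}: {n_orders} orders, {per_10k} per 10k residents, "
          f"median income ${income:,}")
```

Dividing by population is one way to keep a large region from ranking highly on raw order count alone. Whether that is the right normalization depends on the business question.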
In another situation, that same retail company may want to increase online order sizes by recommending items to customers while they shop. A data scientist could load the historical web order data and create a machine learning model that, given a set of items currently in the cart, predicts the best item to recommend to the shopper. After creating that model, the data scientist would work with the company’s engineering team so that every time a customer is shopping, the new machine learning model serves up the recommended items.
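As a rough illustration of the recommendation idea, the sketch below swaps the machine learning model for a much simpler stand-in: it counts how often items were bought together in past orders and suggests the item that most often accompanies what is already in the cart. The order history, item names, and function names are all hypothetical, and a production system would likely use a trained model rather than raw counts.

```python
from collections import Counter
from itertools import permutations

# Hypothetical historical orders, each a set of item names (made-up data).
past_orders = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"tent", "lantern"},
    {"sleeping bag", "pillow"},
    {"tent", "sleeping bag", "pillow"},
]


def build_cooccurrence(orders):
    """Count how often each ordered pair of items appears in the same order."""
    counts = Counter()
    for order in orders:
        for a, b in permutations(sorted(order), 2):
            counts[(a, b)] += 1
    return counts


def recommend(cart, counts):
    """Return the item outside the cart that co-occurs most with cart items."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a in cart and b not in cart:
            scores[b] += n
    if not scores:
        return None  # Nothing in the cart has been seen in past orders.
    # Highest score wins; ties are broken alphabetically for determinism.
    return min(scores, key=lambda item: (-scores[item], item))


counts = build_cooccurrence(past_orders)
print(recommend({"tent"}, counts))
```

In the scenario described above, the engineering team would call something like `recommend` each time the cart changes, which is why the pair counts are built once up front rather than on every request.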
When many people start looking into data science, one challenge they face is being overwhelmed by the number of things they need to learn, such as coding (but which language?), statistics (but which methods are most important in practice, and which are largely academic?), machine learning (but how is machine learning different from statistics or AI?), and the domain knowledge of whatever industry they want to work in (but what if you don't know where you want to work?). In addition, they need to learn business skills such as effectively communicating results to audiences ranging from other data scientists to the CEO. This anxiety can be exacerbated by job postings that ask for a PhD, multiple years of data science experience, and expertise in a laundry list of statistical and programming methods. How can you possibly learn all these skills? Which ones should you start with? What are the basics?
If you’ve looked into the different areas of data science, you may be familiar with Drew Conway’s popular data science Venn diagram. In Conway’s opinion (at the time of the diagram’s creation), data science fell into the intersection of math and statistical knowledge, expertise in a domain, and hacking skills (that is, coding). This image