
Preview of Learn to Code with Basketball

v0.1.2
Copyright Notice

Copyright © 2023 by Nathan Braun. All rights reserved.

By viewing this digital book, you agree that under no circumstances shall you use this book or any
portion of it for anything but your own personal use and reference. To be clear: you shall not copy,
re‑sell, sublicense, rent out, share or otherwise distribute this book, or any other Learn to Code with
Basketball digital product, whether modified or not, to any third party.

Contents

Note About This Preview

What People Are Saying

1. Introduction
The Purpose of Data Analysis
What is Data?
More in the full version
What is Analysis?
Types of Data Analysis
More in the full version
High Level Data Analysis Process
1. Collecting Data
2. Storing Data
3. Loading Data
4. Manipulating Data
5. Analyzing Data for Insights
Connecting the High Level Analysis Process to the Rest of the Book

2. Python
Introduction to Python Programming
How to Read This Chapter
Important Parts of the Python Standard Library
Comments
Variables
Types
Interlude: How to Figure Things Out in Python
Bools
if statements
Container Types
More basic Python in the full version

7. Modeling
Introduction to Modeling
The Simplest Model
Linear regression
More modeling in the full version

Appendix A: Prerequisites: Tooling
Files Included with this Book
Python
Editor
Console (REPL)
Using Spyder and Keyboard Shortcuts

Note About This Preview

The purpose of this preview is to help you decide if Learn to Code with Basketball is right for you. There
are a lot of options for learning data science and how to code, many of them free.

So, the decision about whether or not to learn from this book will come down to:

1. How interesting and motivating basketball and the NBA are to you as a topic.
2. How the book compares to alternatives, specifically whether it can save you enough time
(e.g. by explaining things clearly, putting everything in one place, focusing on the right stuff) to
justify paying for it vs working through some free tutorial or other book.

Don’t underestimate 1. Sports are a major gateway to coding, especially for the self taught crowd (like
me). There’s a reason Nate Silver started out by programming baseball models on the side at his day
job. The concepts you’ll learn here are applicable to any analysis problem, not just basketball.

But this preview is to help you get a sense of 2. Unless you value it very cheaply, your time is by far
the biggest cost when learning anything. It’s important to make sure this book is a fit. The preview
includes selections from chapters 2 (basic Python) and 7 (modeling) to help you do that.

Though I’d recommend trying out Python, it does require some setup (see the appendix for more)
and I understand people may want to get a sense of the book’s teaching style at a glance. If that’s you,
check out the modeling chapter (no code required) and learn linear regression while you’re at it!

Other highlights from the preview:

• The High Level Data Analysis Process gives an overview of data science generally.
• More String Methods and How to Figure Things Out in Python explains why I’m not teaching
you all 44 of Python’s options for working with text (spoiler: to not bog you down) and what you
should do instead.

I sincerely hope it’s useful. Happy coding!

Nate

What People Are Saying

“The book here was really, really well done…” ‑ Bill Connelly, ESPN

“This is amazingly awesome. I’ve recently slowly crept into data science driven by a pet passion
for fantasy sport analytics. …I’m roughly 40 pages into this book and the way the learning is
framed here is 10x what you’ll get someplace else.” ‑ u/Nick58

“I’m like ~40 pages in and the simple intro to python chapter is much more engaging for me
personally because it’s info I’m interested in. I’ve taken automate the boring stuff, python for
finance, etc and while those courses are great.. I seem to be understanding it better because
its about a subject I like.” ‑ u/financenstuff

“Incredible work! Bought it right away. Only 3 chapters in and this book is already better than
expected. Worth every penny. Thank you!” ‑ u/TheMotizzle

“…probably the best / most complete Pandas walk through I’ve seen.” ‑ Bill S

“I’ve probably picked up more, and at a better pace, using this than a lot of the free online tools
I’d been trying the past few months.” ‑ Ryan P

“Love the book, bought it a before last season of the nfl. My python has come a very long way
thanks to you.” ‑ u/TomatoHead7

“…it helped me tremendously … I wouldn’t be where I’m at with the Python language today
without this book to kick start things.” ‑ u/F1rstxLas7

“I can’t tell you how many times I’ve tried to get into programming and gave up because it was
so dry. This has been such a nice change of pace and I’m loving it.” ‑ Paval M


“Just picked up LTCWFF last week and I am really enjoying it so far. This is exactly what I needed
to finally get past tutorial hell and apply Python to something I love.” ‑ Philip D

“I have always wanted to learn a language but always seemed to get discouraged by the ‘Hello
World’ chapters that were never ending. I like that your book cuts out the riff raff and teaches the
important things! I’m flying through the book and feel like I’m learning a ton! Best wishes
from a satisfied customer” ‑ Jason K

“I recently purchased LTCWFF and could not be more satisfied with the content. …it has been
great to work through your in‑depth examples learning new skills. I had a previous interest in
this sort of analysis and have had intermediate programming experience, but never could tie the
two together.” ‑ Owen B

“I purchased this the other day and thus far it’s been great refreshers for basic Python… I
appreciate the Anki cards … they’re helping cement the terminology and such.” ‑ u/michaelmanieri

“…very informative and good intro to coding. Additionally, [Nate] would answer any questions
I emailed him within 24 hours. Excellent customer service and pushed new editions to everyone
who had already paid. I really appreciate [Nate]’s commitment to his product.” ‑ u/ledsdeadbaby

“I was amazed by how you broke down complicated concepts and made them easier to
understand.” ‑ Ryan C

1. Introduction

The Purpose of Data Analysis

The purpose of data analysis is to get interesting or useful insights.

• I’m a sports bettor, will the Orlando Magic win more or fewer than 22.5 games this season?
• I’m an NBA GM, who should I draft with my first pick?
• I’m a mad scientist, how many more championships would Michael Jordan have won had he not
retired to play baseball?

Data analysis is one (hopefully) accurate and consistent way to get these insights.

Of course, that requires data.

What is Data?

At a very high level, data is a collection of structured information.

You might have data about anything, but let’s take a basketball game, say Lakers vs Clippers, October
22, 2019. What would a collection of structured information about it look like?

Let’s start with collection, or “a bunch of stuff.” What is a basketball game a collection of? How about
shots? This isn’t the only acceptable answer — a collection of players, teams, possessions, or quarters
would fit — but it’ll work. A basketball game is a collection of shots. OK.

Now information — what information might we have about each shot in this collection? Maybe: the
player shooting, where it is on the court, how much time is left, what the score was, whether it went
in, or whether it was a dunk.

Finally, it’s structured as a big rectangle with columns and rows. A row is a single item in our collection
(a shot here). A column is one piece of information (shooter, distance, etc).

This is an efficient, organized way of presenting information. When we want to know, “who took the
first shot in the second quarter, how far was it and did it go in?”, we can find the right row and columns,
and say “Oh, Patrick Beverley from 25 feet, and no.”


shot  name              period  min_left  sec_left  dist  made   value
31    L. Williams       1       2         26        16    True   2
32    Q. Cook           1       2         10        1     True   2
33    A. Davis          1       1         49        1     True   2
34    M. Harrell        1       1         35        10    False  2
35    K. Leonard        1       0         59        8     True   2
36    Q. Cook           1       0         44        18    False  2
37    J. Green          1       0         20        24    True   3
38    P. Beverley       2       11        43        25    False  3
39    K. Leonard        2       11        25        15    True   2
40    K. Caldwell-Pope  2       11        13        26    False  3
41    K. Leonard        2       11        2         23    True   3
42    L. James          2       10        48        29    True   3
43    J. Dudley         2       10        19        24    True   3
44    T. Daniels        2       10        1         26    False  3
45    M. Harkless       2       9         50        1     True   2
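
One way to picture the table above in code (a rough sketch, not how the book's data files are actually stored) is to treat each row as a Python dict and the table as a list of rows. Only three rows are copied here, and the names shots, second_quarter and first are made up for illustration.

```python
# three rows copied from the shot table; each dict is one row (one shot)
shots = [
    {'shot': 37, 'name': 'J. Green', 'period': 1, 'min_left': 0,
     'sec_left': 20, 'dist': 24, 'made': True, 'value': 3},
    {'shot': 38, 'name': 'P. Beverley', 'period': 2, 'min_left': 11,
     'sec_left': 43, 'dist': 25, 'made': False, 'value': 3},
    {'shot': 39, 'name': 'K. Leonard', 'period': 2, 'min_left': 11,
     'sec_left': 25, 'dist': 15, 'made': True, 'value': 2},
]

# find the right rows: shots taken in the second quarter
second_quarter = [row for row in shots if row['period'] == 2]

# the first shot of the quarter is the one with the most time left
first = max(second_quarter, key=lambda row: (row['min_left'], row['sec_left']))

# then read off the columns we care about
print(first['name'], first['dist'], first['made'])
```

Finding the row and then reading the columns is the same thing you do by eye with the table, just written down step by step.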

More in the full version —

Excel. CSVs. Delimited data. More example datasets. Granularity.


What is Analysis?

How many basketballs are in the following picture?

Figure 0.1: Few basketballs

Pretty easy question right? What about this one?

Figure 0.2: Many basketballs

Researchers have found that humans automatically know how many objects they’re seeing, as long
as there are no more than three or four. Any more than that, and counting is required.


If you open up the player‑game data this book comes with, you’ll notice it’s 2114 rows and 31
columns.

From that, do you think you would be able to glance at it and immediately tell me who the “best”
player was? Worst? Most consistent or clutch? Of course not.

Raw data is the numerical equivalent of a pile of basketballs. It’s a collection of facts, way more than
the human brain can reliably and accurately make sense of and meaningless without some work.

Data analysis is the process of transforming this raw data to something smaller and more useful you
can fit in your head.

Types of Data Analysis

Broadly, it is useful to think of two types of analysis, both of which involve reducing a pile of data into
a smaller, more manageable number of insights.

1. Single number type summary statistics.


2. Models that help us understand relationships between data.
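
As a rough illustration of the two types, here is a sketch using the 15 shots from the table earlier in this chapter. The variable names are made up, and splitting shots into close and far is only a crude stand-in for a real model (the cutoff of 10 feet is arbitrary).

```python
# (distance in feet, made) for the 15 shots in the earlier table
shots = [(16, True), (1, True), (1, True), (10, False), (8, True),
         (18, False), (24, True), (25, False), (15, True), (26, False),
         (23, True), (29, True), (24, True), (26, False), (1, True)]

# 1. a single-number summary statistic: share of shots made
made_rate = sum(made for _, made in shots) / len(shots)

# 2. a very crude look at a relationship: does distance matter?
close = [made for dist, made in shots if dist < 10]
far = [made for dist, made in shots if dist >= 10]
close_rate = sum(close) / len(close)
far_rate = sum(far) / len(far)

print(round(made_rate, 3), round(close_rate, 3), round(far_rate, 3))
```

Fifteen shots is far too few to conclude anything, but either way a pile of rows has been boiled down to a few numbers you can hold in your head.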

More in the full version —

Key to using summary statistics. Moneyball. Relationship between data and models. Modeling in practice.
Modeling the future.


High Level Data Analysis Process

Now that we’ve covered both the inputs (data) and final outputs (analytical insights), let’s take a very
high level look at what’s in between.

Everything in this book will fall somewhere in one of the following steps:

1. Collecting Data

Whether you scrape a website, connect to a public API, download some spreadsheets, or enter it
yourself, you can’t do data analysis without data. The first step is getting ahold of some.

This book covers how to scrape a website and get data by connecting to an API. It also suggests a few
ready‑made datasets.

2. Storing Data

Once you have data, you have to put it somewhere. This could be in several spreadsheet or text files
in a folder on your desktop, sheets in an Excel file, or a database.

This book covers the basics and benefits of storing data in a SQL database.

3. Loading Data

Once you have your data stored, you need to be able to retrieve the parts you want. This can be easy if
it’s in a spreadsheet, but if it’s in a database then you need to know some SQL — pronounced “sequel”
and short for Structured Query Language — to get it out.

This book covers basic SQL and loading data with Python.
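
To make steps 2 and 3 concrete, here is a minimal sketch using sqlite3, the SQLite module that ships with Python. The table name shot and its columns are assumptions made up for this example, not the database the book actually uses.

```python
import sqlite3

# an in-memory database, so nothing is written to disk
conn = sqlite3.connect(':memory:')

# 2. storing: create a table and put a few rows in it
conn.execute(
    'CREATE TABLE shot (name TEXT, period INTEGER, dist INTEGER, made INTEGER)')
conn.executemany(
    'INSERT INTO shot VALUES (?, ?, ?, ?)',
    [('J. Green', 1, 24, 1),
     ('P. Beverley', 2, 25, 0),
     ('K. Leonard', 2, 15, 1)])

# 3. loading: use SQL to pull out just the part we want
rows = conn.execute(
    'SELECT name, dist FROM shot WHERE period = 2 AND made = 1').fetchall()
conn.close()

print(rows)
```

The SELECT statement is the part that retrieves only the rows and columns you asked for: made shots from the second period.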

4. Manipulating Data

Talk to any data scientist, and they’ll tell you they spend most of their time preparing and manipulating
their data. Basketball data is no exception. Sometimes called munging, this means getting your
raw data in the right format for analysis.

There are many tools available for this step. Examples include Excel, R, Python, Stata, SPSS, Tableau,
SQL, and Hadoop. In this book you’ll learn how to do it in Python, particularly using the library Pandas.
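
Here is a small taste of what munging can look like, in plain Python rather than Pandas. It takes one row from the shot data and combines its two time columns into one. The new column names and the 12-minute period length are assumptions for this sketch.

```python
# raw data: time left in the period is split across two columns
shot = {'name': 'P. Beverley', 'period': 2, 'min_left': 11, 'sec_left': 43}

# assumption for this sketch: a regulation period is 12 minutes long
PERIOD_LENGTH = 12 * 60

# combine the two columns into ones that are easier to analyze
secs_left_in_period = shot['min_left'] * 60 + shot['sec_left']
secs_elapsed_in_period = PERIOD_LENGTH - secs_left_in_period

print(secs_left_in_period, secs_elapsed_in_period)
```

Nothing new was learned about the shot. The same information is just in a more convenient shape for whatever analysis comes next.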


The boundaries between this step and the ones before and after it can be a little fuzzy. For example,
though we won’t do it in this book, it is possible to do some basic manipulation in SQL. In other words,
loading (3) and manipulating (4) data can be done with the same tools. Similarly, Pandas, the primary
tool we’ll use for data manipulation (4), also includes basic functionality for analysis (5) and
input-output capabilities (3).

Don’t get too hung up on this. The point isn’t to say, “this technology is always associated with this
part of the analysis process”. Instead, it’s a way to keep the big picture in mind as you are working
through the book and your own analysis.

5. Analyzing Data for Insights

This step is the model, summary stat or plot that takes you from formatted data to insight.

This book covers a few different analysis methods, including summary stats, a few modeling techniques,
and data visualization.

We will do these in Python using the scikit‑learn, statsmodels, and matplotlib libraries, which cover
machine learning, statistical modeling and data visualization respectively.

Connecting the High Level Analysis Process to the Rest of the Book

Again, everything in this book falls into one of the five sections above. Throughout, I will tie back what
you are learning to this section so you can keep sight of the big picture.

This is the forest. If you ever find yourself banging your head against a tree (either confused or
wondering why we’re talking about something), refer back here and think about where it fits in.

Some sections above may be more applicable to you than others. Perhaps you are comfortable
analyzing data in Excel, and just want to learn how to get data via scraping a website or connecting to an
API. Feel free to focus on whatever sections are most useful to you.

2. Python

Introduction to Python Programming

This section is an introduction to basic Python programming.

Much of the functionality in Python comes from third party libraries (or packages), specially designed
for specific tasks.

For example: the Pandas library lets us manipulate tabular data. And the library BeautifulSoup is
the Python standard for scraping data from websites.

We’ll write code that makes heavy use of both later in the book. But, even when using third party
packages, you will also be using a core set of Python features and functionality. These features —
called the standard library — are built‑in to Python.

This section of the book covers the parts of the standard library that are most important. All the Python
code we write in this book is built upon the concepts covered in this chapter. Since we’ll be using
Python for nearly everything, this section touches all parts of the high level, five‑step data analysis
process.

How to Read This Chapter

This chapter, like the rest of the book, is heavy on examples. All the examples in this chapter are
included in the Python file 02_python.py. Ideally, you would have this file open in your Spyder editor
and be running the examples (highlight the line(s) you want and press F9 to send them to the REPL/console)
as we go through them in the book.

If you do that, I’ve included what you’ll see in the REPL here. That is:

In [1]: 1 + 1
Out[1]: 2

Where the line starting with In[1] is what you send, and Out[1] is what the REPL prints out. These
are lines [1] for me because this was the first thing I entered in a new REPL session. Don’t worry if
the numbers you see in In[ ] and Out[ ] don’t match exactly what’s in this chapter. In fact, they
probably won’t, because as you run the examples you should be exploring and experimenting. That’s
what the REPL is for.
Nor should you worry about messing anything up: if you need a fresh start, you can type reset into
the REPL and it will clear out everything you’ve run previously. You can also type clear to clear all
the printed output.
Sometimes, examples build on each other (remember, the REPL keeps track of what you’ve run previously),
so if something isn’t working, it might be relying on code you haven’t run yet.
Let’s get started.

Important Parts of the Python Standard Library

Comments

As you look at 02_python.py you might notice a lot of lines beginning with #. These are comments.
When reading your code, the computer will ignore everything from # to the end of the line.
Comments exist in all programming languages. They are a way to explain to anyone reading your
code (including your future self) more about what’s going on and what you were trying to do when
you wrote it.
The problem with comments is it’s easy for them to become out of date. This often happens when you
change your code and forget to update the comment.
An incorrect or misleading comment is worse than no comment. For that reason, most beginning
programmers probably comment too often, especially because Python’s syntax (the language related
rules for writing programs) is usually pretty clear.
For example, this would be an unnecessary comment:
# print the result of 1 + 1
print(1 + 1)

It’s unnecessary because it adds nothing that isn’t obvious from looking at the code. It’s better to use
descriptive names, let your code speak for itself, and save comments for particularly tricky portions of
code.

Variables

Variables are a fundamental concept in any programming language.


At their core, variables[1] are just named pieces of information. This information can be anything from
a single number to an entire dataset — the point is that they let you store and recall things easily.

The rules for naming variables differ by programming language. In Python, they can be any upper or
lowercase letter, number or _ (underscore), but they can’t start with a number.

While you can name your variables whatever you want (provided it follows the rules), the convention
in Python for most variables is all lowercase letters, with words separated by underscores.

Conventions are things that, while not strictly required, programmers include to make it easier
to read each other’s code. They vary by language. So, while in Python I might have a variable
assists_per_game, a JavaScript programmer would write assistsPerGame instead.
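
If you're ever unsure whether a name follows the rules, you can ask Python directly. This sketch uses the built-in string method isidentifier. The candidate names are made up for the example.

```python
# hypothetical candidate names, made up for this example
candidates = ['assists_per_game', 'three_pt_made2', '3pt_made', 'three-pt-made']

# str.isidentifier() is True when the string follows the naming rules
is_valid = {name: name.isidentifier() for name in candidates}

for name, ok in is_valid.items():
    print(name, ok)
```

One caveat: reserved words like if and for pass this check but still can't be used as variable names.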

Assigning data to variables

You assign a piece of data to a variable with an equals sign, like this:

In [1]: three_pt_made = 4

Another, less common, word for assignment is binding, as in three_pt_made is bound to the number
4.

Now, whenever you use three_pt_made in your code, the program automatically substitutes it with
4 instead.

In [2]: three_pt_made
Out[2]: 4

In [3]: 3*three_pt_made
Out[3]: 12

One of the benefits of developing with a REPL is that you can type in a variable, and the REPL will
evaluate (i.e. determine what it is) and print it. That’s what the code above is doing. But note while
three_pt_made is 4, the assignment statement itself, three_pt_made = 4, doesn’t evaluate to
anything, so the REPL doesn’t print anything out.

You can update and override variables too. Going into the code below, three_pt_made has a value
of 4 (from the code we just ran above). So the right hand side, three_pt_made + 1, is evaluated
first (4 + 1 = 5), and then the result gets (re)assigned to three_pt_made, overwriting the 4 it held
previously.

In [4]: three_pt_made = three_pt_made + 1

In [5]: three_pt_made
Out[5]: 5

[1] Note: previously we talked about how, in the language of modeling and tabular data, variable is another word for column.
That’s different than what we’re talking about here. A variable in a dataset or model is a column; a variable in your
code is a named piece of information. You should usually be able to tell by the context which one you’re dealing with.
Unfortunately, imprecise language comes with the territory when learning new subjects, but I’ll do my best to warn you
about any similar pitfalls.

Types

Like Excel, Python includes concepts for both numbers and text. Technically, Python distinguishes
between two types of numbers: integers (whole numbers) and floats (numbers that may have decimal
points), but the difference isn’t important for us right now.

In [6]: over_under = 216 # int


In [7]: fg_percentage = 0.48 # float
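
When the difference does come up, it usually looks something like the sketch below. The shot counts are made-up numbers for illustration.

```python
made = 12        # int
attempts = 25    # int

# dividing two ints with / always gives a float in Python 3
fg_percentage = made / attempts

# mixing an int and a float gives a float too
over_under = 216 + 0.5

print(type(made), type(fg_percentage), type(over_under))
print(fg_percentage, over_under)
```

So you rarely have to pick a number type yourself. Python produces a float whenever the math calls for one.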

Text, called a string in Python, is wrapped in either single (') or double (") quotes. I usually just use
single quotes, unless the text I want to write has a single quote in it (like in D’Angelo), in which case
trying to use 'D'Angelo Russel' would give an error.

In [8]: starting_c = 'Karl-Anthony Towns'


In [9]: starting_pg = "D'Angelo Russel"
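
Switching to double quotes isn't the only way around this. As a quick sketch, you can also keep the single quotes and put a backslash in front of the inner quote, which tells Python it's part of the text.

```python
# double quotes on the outside let you use a single quote inside
name_double = "D'Angelo Russel"

# a backslash "escapes" the inner quote, so single quotes work too
name_escaped = 'D\'Angelo Russel'

# both spellings produce exactly the same string
print(name_double == name_escaped)
```

Most people find the double quote version easier to read, which is why it's the one used in this chapter.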

You can check the type of any variable with the type function.

In [10]: type(starting_c)
Out[10]: str

In [11]: type(over_under)
Out[11]: int

Keep in mind the difference between strings (quotes) and variables (no quotes). A variable is the name
of a piece of information. A string (or a number) is the information.

One common thing to do with strings is to insert variables inside of them. The easiest way to do that
is via f‑strings.

In [12]: starters = f'{starting_c}, {starting_pg}, etc.'

In [13]: starters
Out[13]: "Karl-Anthony Towns, D'Angelo Russel, etc."

Note the f immediately preceding the quotation mark. Adding that tells Python you want to use variables
inside your string, which you wrap in curly brackets.


f-strings were added in Python 3.6, so if they’re not working for you make sure that’s at least the version
you’re using.
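
f-strings can do a bit more than drop in a variable. The sketch below reuses variable names from earlier in the chapter. The message text itself is made up.

```python
starting_c = 'Karl-Anthony Towns'
fg_percentage = 0.48
three_pt_made = 4

# any expression can go inside the curly brackets, not just a variable name
line = f'{starting_c} made {three_pt_made} threes ({3 * three_pt_made} points)'

# a format spec after a colon controls how a number is displayed
pct = f'{fg_percentage:.1%}'

print(line)
print(pct)
```

The .1% spec multiplies by 100, keeps one decimal place, and adds the percent sign.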

Strings also have useful methods you can use to do things to them. You invoke methods with a . and
parentheses. For example, to make a string uppercase you can do:

In [14]: 'from downtown!'.upper()


Out[14]: 'FROM DOWNTOWN!'

Note the parentheses. Methods need them because they sometimes take additional data. For example, the
replace method takes two strings: the one you want to replace, and what you want to replace it
with:
In [15]: 'Ron Artest'.replace('Artest', 'World Peace')
Out[15]: 'Ron World Peace'

There are a bunch of these string methods, most of which you won’t use that often. Going through
them all right now would bog down progress on more important things. But occasionally you will
need one of these string methods. How should we handle this?

The problem is we’re dealing with a comprehensiveness‑clarity trade off. And, since anything short of
Python in a Nutshell: A Desktop Quick Reference (which is 772 pages) is going to necessarily fall short
on comprehensiveness, we’ll do something better.

Rather than teaching you all 44 of Python’s string methods, I am going to teach you how to quickly see
which are available, what they do, and how to use them.

Though we’re nominally talking about string methods here, this advice applies to any of the programming
topics we’ll cover in this book.

Interlude: How to Figure Things Out in Python

“A simple rule I taught my nine year‑old today: if you can’t figure something out, figure out how
to figure it out.” — Paul Graham

The first tool you can use to figure out your options is the REPL. In particular, the REPL’s tab
completion functionality. Type in a string like 'lebron james' then . and hit tab. You’ll see all the options
available to you (this is only the first page, you’ll see more if you keep pressing tab).

'lebron james'.
capitalize() encode() format()
isalpha() isidentifier() isspace()
ljust() casefold() endswith()
format_map() isascii() islower()


Note: tab completion on a string directly like this doesn’t always work in Spyder. If it’s not working for
you, assign 'lebron james' to a variable and tab complete on that. Like this[2]:
In [16]: foo = 'lebron james'

foo.
capitalize() encode() format()
isalpha() isidentifier() isspace()
ljust() casefold() endswith()
format_map() isascii() islower()

Then, when you find something you’re interested in, enter it in the REPL with a question mark after it,
like 'lebron james'.capitalize? (or foo.capitalize? if you’re doing it that way).
You’ll see:
Signature: str.capitalize(self, /)
Docstring:
Return a capitalized version of the string.

More specifically, make the first character have upper case and
the rest lower case.

So, in this case, it sounds like capitalize will make the first letter uppercase and the rest of the
string lowercase. Let’s try it:
In [17]: 'lebron james'.capitalize()
Out[17]: 'Lebron james'

Great. Many of the items you’ll be working with in the REPL have methods, and tab completion is a
great way to explore what’s available.
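
If tab completion or the question mark trick isn't available (say you're running a plain Python script), the built-in dir function and a method's __doc__ attribute give you roughly the same information. This is a sketch of the idea. The variable name methods is made up.

```python
# dir() lists everything attached to an object; names starting with _
# are internal, so filter those out
methods = [name for name in dir(str) if not name.startswith('_')]
print(methods)

# every method carries its own documentation
doc = str.capitalize.__doc__
print(doc)
```

The exact list of methods depends on your Python version, so don't worry if your count differs from someone else's.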
The second strategy is more general. Maybe you want to do something that you know is string related
but aren’t necessarily sure where to begin or what it’d be called.
For example, maybe you’ve scraped some data that looks like:
In [18]: ' lebron james'

But you want it to be like this, i.e. without the spaces before “lebron”:
In [19]: 'lebron james'

Here’s what you should do — and I’m not trying to be glib here — Google: “python string get rid of
leading white space”.
[2] The upside of this Spyder autocomplete issue is you can learn about the programming convention “foo”. When dealing
with a throwaway variable that doesn’t matter, many programmers will name it foo. Second and third variables that
don’t matter are bar and baz. Apparently this dates back to the 1950s.


When you do that, you’ll see the first result is from stackoverflow and says:
“The lstrip() method will remove leading whitespaces, newline and tab characters on a string
beginning.”

A quick test confirms that’s what we want.


In [20]: ' lebron james'.lstrip()
Out[20]: 'lebron james'
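
Once you've found lstrip, it's worth a quick look at its siblings. The sketch below uses a made-up raw string with two spaces on each side.

```python
raw = '  lebron james  '

left = raw.lstrip()    # leading whitespace removed
right = raw.rstrip()   # trailing whitespace removed
both = raw.strip()     # whitespace removed from both ends

# repr() shows the quotes, which makes leftover spaces visible
print(repr(left))
print(repr(right))
print(repr(both))
```

With scraped data, strip is often the one you want, since stray spaces can show up on either end.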

Stackoverflow

Python — particularly the data libraries we’ll be using — became popular during the golden age of
stackoverflow.com, a programming question and answer site that specializes in answers to small, self‑
contained technical problems.
How it works: people ask questions related to programming, and other, more experienced program‑
mers answer. The rest of the community votes, both on questions (“that’s a very good question, I was
wondering how to do that too”) as well as answers (“this solved my problem perfectly”). In that way,
common problems and the best solutions rise to the top over time. Add in Google’s search algorithm,
and you usually have a way to figure out exactly how to do most anything you’ll want to do in a few
minutes.
You don’t have to ask questions yourself or vote or even make a stackoverflow account to get the
benefits. In fact, most people probably don’t. But enough people do, especially when it comes to
Python, that it’s a great resource.
If you’re used to working like this, this advice may seem obvious. Like I said, I don’t mean to be glib.
Instead, it’s intended for anyone who might mistakenly believe “real” coders don’t Google things.
As programmer‑blogger Umer Mansoor writes,

Software developers, especially those who are new to the field, often ask this question… Do
experienced programmers use Google frequently?

The resounding answer is YES, experienced (and good) programmers use Google… a lot. In fact,
one might argue they use it more than the beginners. [that] doesn’t make them bad programmers
or imply that they cannot code without Google. In fact, truth is quite the opposite: Google
is an essential part of their software development toolkit and they know when and how to use it.

A big reason to use Google is that it is hard to remember all those minor details and nuances
especially when you are programming in multiple languages… As Einstein said: ‘Never memorize
something that you can look up.’

Now you know how to figure things out in Python. Back to the basics.


Bools

There are other data types besides strings and numbers. One of the most important ones is bool (for
boolean). Booleans, which exist in nearly every language, are for binary, yes or no, true or false data.
While a string can have almost an unlimited number of different values, and an integer can be any
whole number, bools in Python only have two possible values: True or False.

Similar to variable names, bool values lack quotes. So "True" is a string, not a bool.
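
A quick sketch of the difference, with made-up variable names:

```python
is_bool = True       # no quotes: a bool
is_string = "True"   # quotes: a string that happens to spell True

print(type(is_bool), type(is_string))

# the two are not equal, even though they look alike on the page
same = (is_bool == is_string)
print(same)
```

This comes up with scraped or loaded data, where a column of yes/no values can arrive as text instead of real bools.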

A Python expression evaluates to a bool when it answers a yes or no question, like a comparison. For example:

# some numbers to use in our examples


In [21]: team1_pts = 110
In [22]: team2_pts = 120

# these are all bools:


In [23]: team1_won = team1_pts > team2_pts

In [24]: team2_won = team1_pts < team2_pts

In [25]: teams_tied = team1_pts == team2_pts

In [26]: teams_did_not_tie = team1_pts != team2_pts

In [27]: type(team1_won)
Out[27]: bool

In [28]: teams_did_not_tie
Out[28]: True

Notice the == by teams_tied. That tests for equality. It’s the double equals sign because — as we
learned above — Python uses the single = to assign to a variable. This would give an error:

In [29]: teams_tied = (team1_pts = team2_pts)


...
SyntaxError: invalid syntax

So team1_pts == team2_pts will be True if those numbers are the same, False if not.

The reverse is !=, which means not equal. The expression team1_pts != team2_pts is True if the
values are different, False if they’re the same.

You can manipulate bools (i.e. chain them together or negate them) using the keywords and, or,
not and parentheses.


In [30]: shootout = (team1_pts > 130) and (team2_pts > 130)

In [31]: at_least_one_good_team = (team1_pts > 120) or (team2_pts > 120)

In [32]: you_guys_are_bad = not ((team1_pts > 100) or (team2_pts > 100))

In [33]: meh = not (shootout or
                    at_least_one_good_team or
                    you_guys_are_bad)
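The listing above never prints the combined values, so here is a sketch that evaluates them with the same scores (110 and 120) used earlier. The expressions are copied from the text; only the print calls are added.

```python
team1_pts = 110
team2_pts = 120

shootout = (team1_pts > 130) and (team2_pts > 130)
at_least_one_good_team = (team1_pts > 120) or (team2_pts > 120)
you_guys_are_bad = not ((team1_pts > 100) or (team2_pts > 100))
meh = not (shootout or at_least_one_good_team or you_guys_are_bad)

print(shootout)                # False: neither team scored more than 130
print(at_least_one_good_team)  # False: 120 is not greater than 120
print(you_guys_are_bad)        # False: both teams scored more than 100
print(meh)                     # True: none of the three labels applied
```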

if statements

Bools are used frequently; one place is with if statements. The following code assigns a string to a
variable message depending on what happened.

In [34]:
if team1_won:
    message = "Nice job team 1!"
elif team2_won:
    message = "Way to go team 2!!"
else:
    message = "must have tied!"

In [35]: message
Out[35]: 'Way to go team 2!!'

Notice how in the code I’m saying if team1_won, not if team1_won == True. While the latter
would technically work, it’s a good way to show anyone looking at your code that you don’t really
understand bools. team1_won is already a bool, and team1_won == True evaluates to exactly the same
bool, so the comparison adds nothing. Similarly, don’t write team1_won == False, write not team1_won.
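To see why the comparison is redundant, here is a short sketch using the same scores as before. The final message string is made up for illustration.

```python
team1_pts = 110
team2_pts = 120
team1_won = team1_pts > team2_pts  # False with these scores

# comparing a bool to True hands back the very same bool
print((team1_won == True) == team1_won)         # True

# comparing to False is the same as negating with not
print((team1_won == False) == (not team1_won))  # True

# so the plain forms are all you need
if not team1_won:
    message = "Team 1 did not win"

print(message)
```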

Container Types

Strings, integers, floats, and bools are called primitives; they’re the basic building block types.

There are also container types, which hold other values. Two important container types are lists
and dicts. Sometimes containers are also called collections.
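As a small taste before the full treatment, here is roughly what a list and a dict look like. The variable names and values are made up for this sketch.

```python
# a list holds values in order
team_pts = [110, 120]

# a dict maps keys (here team names) to values (here points)
pts_by_team = {"team1": 110, "team2": 120}

print(team_pts[0])           # 110 (list positions start at 0)
print(pts_by_team["team2"])  # 120
print(len(team_pts))         # 2
```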

More basic Python in the full version —

Lists. Dicts. Loops. Comprehensions. Functions. Defining your own functions. Side effects and why
they’re bad. Libraries. The os library.

7. Modeling

Introduction to Modeling

The Simplest Model

Let’s say we want a model that takes in distance to the hoop and predicts whether a basket from there
will go in or not. So we might have something like:

basket or not = model(feet to basket)

Terminology

First some terminology: the variable “basket or not” is our output variable¹. There’s always exactly
one output variable.

The variable “feet to basket” is our input variable². In this case we just have one, but we could have
as many as we want. For example:

basket or not = model(feet to basket, time left in game)

OK. Back to:

basket or not = model(feet to basket)

Here’s a question: what is the simplest implementation for model(...) we might come up with?

How about:

model(...) = No

So give it any distance from the hoop, and our model spits out: “no, it will not be a basket”. Since the
majority of shots don’t go in (average field goal percentage in our data is 46%), this model will be more
accurate than not! But since it never says anything besides no, it’s not that interesting or useful.

¹ Other terms for this variable include: left hand side variable (it’s to the left of the equals sign); dependent variable (its
value depends on the value of distance to the basket), or y variable (traditionally output variables are denoted with y,
inputs with x’s).
² Other words for input variables include: right hand side, independent, explanatory, or x variables.
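The always-no model is simple enough to write out. The sketch below scores it on five sample shots (the distances and outcomes are illustrative values, not the full dataset). On data where 46% of shots go in, the same model would be right 54% of the time.

```python
def always_no_model(feet_to_basket):
    """Predict a miss (False) no matter what the distance is."""
    return False

# (feet to basket, made basket or not) for five sample shots
shots = [(2, True), (26, False), (25, True), (26, False), (18, False)]

# count how often the prediction matches what actually happened
n_correct = sum(always_no_model(feet) == made for feet, made in shots)
accuracy = n_correct / len(shots)

print(accuracy)  # 0.6, since 3 of these 5 shots were misses
```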


What about:

prob basket = 1 + -0.01*distance in ft + -0.0000001*distance in ft ^ 2

So from 1 foot out we’d get a probability of 0.99, 3 feet 0.97, 10 feet 0.90 and 99 feet about 0.009. This
is more interesting. I made the numbers up, so it isn’t a good model (for 50 feet it gives about a 0.50
probability of a shot going in, which is way too high). But it shows how a model transforms inputs to
an output using some mathematical function.
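Here is the made-up formula as a Python function, so you can plug in distances yourself. The function name is hypothetical. Note that Python writes exponents as **, so the ^ in the formula becomes ** in code.

```python
def made_up_prob_basket(dist_ft):
    """The made-up formula from the text; not fitted to any data."""
    return 1 + -0.01 * dist_ft + -0.0000001 * dist_ft ** 2

for dist_ft in [1, 3, 10, 50]:
    print(dist_ft, round(made_up_prob_basket(dist_ft), 2))
```

Rounded to two decimals this gives 0.99, 0.97, 0.9 and 0.5, matching the numbers in the text.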

Linear regression

This type of model format:

output variable =
some number + another number*data + yet another number*other data

is called linear regression. It’s linear because when you have one piece of data (input variable), the
equation is a line on a set of x, y coordinates, like this:

y = m*x + b

If you recall math class, m is the slope, b the intercept, and x and y the horizontal and vertical axes.

Notice instead of saying some number, another number and input and output data we use b, m, x and y.
This shortens things and gives you an easier way to refer back to parts of the equation. The particular
letters don’t matter (though people have settled on conventions). The point is to provide an abstract
way of thinking about and referring to parts of our model.

A linear equation can have more than one data term in it, which is why statisticians use b0 and b1
instead of b and m. So we can have:

y = b0 + b1*x1 + b2*x2 + ... + bn*xn

Up to any number n you can think of. As long as it’s a bunch of x*b terms added together it’s a linear
equation. Don’t get tripped up by the notation: b0, b1, and b2 are different numbers, and x1 and
x2 are different columns of data. The notation just ensures you can include as many variables as you
need to (just add another number).
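One way to see that the notation is just “multiply each input by its number and add everything up” is to write it as a function. This is a sketch; linear_model and its argument names are made up for illustration.

```python
def linear_model(b0, bs, xs):
    """Compute b0 + b1*x1 + ... + bn*xn for one row of data."""
    return b0 + sum(b * x for b, x in zip(bs, xs))

# one input: the familiar y = m*x + b, with b = 1 and m = 2
print(linear_model(1, [2], [3]))          # 7

# two inputs: y = b0 + b1*x1 + b2*x2
print(linear_model(1, [2, 0.5], [3, 4]))  # 9.0
```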

In our probability‑of‑basket model that I made up, x1 was feet from the basket, and x2 was feet from
the basket squared. We had:

prob basket = b0 + b1*(distance in feet) + b2*(distance in feet ^ 2)

Let’s try running this model in Python and see if we can get better values for b0, b1, and b2.


Remember: the first step in modeling is making a dataset where the columns are your input variables
and one output variable. So we need a three column DataFrame with distance, distance squared, and
made basket or not. We need it at the shot level. Let’s do it.
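Building the distance squared column is a one-liner in Pandas. The sketch below uses a tiny hand-typed DataFrame as a stand-in (an assumption for illustration); the real df comes from the shot data.

```python
import pandas as pd

# five sample shots; made is 1 for a basket, 0 for a miss
df = pd.DataFrame({
    "made": [1, 0, 1, 0, 0],
    "dist": [2, 26, 25, 26, 18],
})

# distance squared is just another column computed from dist
df["dist_sq"] = df["dist"] ** 2

print(df[["made", "dist", "dist_sq"]])
```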

In many ways, getting everything to this point is the whole reason we’ve learned Pandas, SQL, scraping
data and everything else. All for this:

In [1]: df[['made', 'dist', 'dist_sq']].head()
Out[1]:
   made  dist  dist_sq
0     1     2        4
1     0    26      676
2     1    25      625
3     0    26      676
4     0    18      324

Now we just need to pass it to our modeling function, which we get from the third party library
statsmodels. We’re using the ols function. OLS stands for Ordinary Least Squares, and is another
term for basic, standard linear regression.

We have to tell smf.ols which column is the output variable and which are the inputs, then run it.
Once we’ve done that, we can look at the results:
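The fitting step is not shown in this preview, so here is a minimal sketch of what it might look like using the statsmodels formula interface. The five-row DataFrame is a stand-in typed in by hand, so its coefficients will not match results from the full shot data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# tiny stand-in dataset, NOT the full shot-level data
df = pd.DataFrame({
    "made": [1, 0, 1, 0, 0],
    "dist": [2, 26, 25, 26, 18],
})
df["dist_sq"] = df["dist"] ** 2

# output variable goes left of the ~, input variables go on the right
model = smf.ols(formula="made ~ dist + dist_sq", data=df)
results = model.fit()

# one fitted number per term: Intercept (b0), dist (b1), dist_sq (b2)
print(results.params)
```

On the book’s full shot data, calling results.summary2() gives: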


In [4]: results.summary2()
Out[4]:
"""
                  Results: Ordinary least squares
===================================================================
Model:              OLS              Adj. R-squared:     0.050
Dependent Variable: made             AIC:                23533.8158
Date:               2022-06-08 09:59 BIC:                23557.0168
No. Observations:   16876            Log-Likelihood:     -11764.
Df Model:           2                F-statistic:        441.0
Df Residuals:       16873            Prob (F-statistic): 2.15e-187
R-squared:          0.050            Scale:              0.23609
-------------------------------------------------------------------
             Coef.   Std.Err.      t      P>|t|   [0.025   0.975]
-------------------------------------------------------------------
Intercept    0.6197    0.0067   91.8279   0.0000   0.6065   0.6330
dist        -0.0177    0.0010  -17.7935   0.0000  -0.0197  -0.0158
dist_sq      0.0003    0.0000    8.2325   0.0000   0.0002   0.0003
-------------------------------------------------------------------
Omnibus:            68294.390        Durbin-Watson:      2.012
Prob(Omnibus):      0.000            Jarque-Bera (JB):   2269.755
Skew:               0.163            Prob(JB):           0.000
Kurtosis:           1.233            Condition No.:      797
===================================================================
"""

We get back a lot of information from this regression. The part we’re interested in — the values for b0,
b1, b2 — are under Coef (for coefficients). They’re also available in results.params.

Remember the intercept is another word for b0. It’s the value of y when all the data is 0. In this case,
we can interpret it as the probability of making a basket when you’re right next to (0 feet away from)
the basket. The other coefficients are next to dist and dist_sq.

So instead of my made up formula from earlier, the formula that best fits this data is:

0.6197 + -0.0177*dist + 0.0003*(dist ^ 2).
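To get a feel for the fitted formula, you can plug in a few distances. This sketch copies the coefficients from the Coef. column of the regression output; they are rounded to four decimals there, so results.params would give slightly more precise answers. The function name is made up.

```python
# coefficients copied (rounded) from the regression output
b0, b1, b2 = 0.6197, -0.0177, 0.0003

def fitted_prob_basket(dist):
    """Predicted probability of a made basket from dist feet away."""
    return b0 + b1 * dist + b2 * dist ** 2

for dist in [0, 5, 15, 25]:
    print(dist, round(fitted_prob_basket(dist), 3))
```

At 0 feet the prediction is just the intercept, and it falls as distance grows over this range.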

More modeling in the full version —

Prediction. Statistical significance. P Values. The replication crisis. Monte Carlo. “Holding things con‑
stant”. Fixed effects. Linear transformations. Interactions. Logistic Regression. CART Trees. Random
Forest.

Appendix A: Prerequisites: Tooling

Files Included with this Book

This book is heavy on examples, most of which use small, “toy” datasets. You should be running and
exploring the examples as you work through the book.

The first step is grabbing these files. They’re available at:

https://github.com/nathanbraun/code-basketball-files/releases

Figure 0.1: LTCWBB Files on GitHub


If you’re not familiar with Git or GitHub, no problem. Just click the Source code link under the latest
release to download the files. This will download a file called code-basketball-files-vX.X.X.zip,
where X.X.X is the latest version number (v0.8.0 in the screenshot above).

When you unzip these (note in the book I’ve dropped the version number and renamed the directory
just code-basketball-files, which you can do too) you’ll see four sub‑directories: code, data,
anki, solutions-to-excercises.

You don’t have to do anything with these right now except know where you put them. For example,
on my Mac, I have them in my home directory:

/Users/nathanbraun/code-basketball-files

If I were using Windows, it might look like this:

C:\Users\nathanbraun\code-basketball-files

Set these aside for now and we’ll pick them up in chapter 2.
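Once you have Python running (see the next section), a sketch like the following can confirm where the files live. It assumes you unzipped them into your home directory and renamed the folder code-basketball-files; adjust the path if you put them somewhere else.

```python
from pathlib import Path

# assumption: the unzipped folder sits directly in your home directory
BB_DIR = Path.home() / "code-basketball-files"

print(BB_DIR)
print(BB_DIR.exists())  # True only if the folder is really there

if BB_DIR.exists():
    # list the sub-directories inside it
    for sub_dir in sorted(p.name for p in BB_DIR.iterdir() if p.is_dir()):
        print(sub_dir)
```

Using Path.home() means the same code works on both Mac and Windows, since Python fills in the right home directory and path separators.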

Python

In this book, we will be working with Python, a free, open source programming language.

This book is hands on, and you’ll need the ability to run Python 3 code and install packages. If you can
do that and have a setup that works for you, great. If you do not, the easiest way to get one is from
Anaconda.

1. Go to: https://www.anaconda.com/products/individual

2. Scroll (way) down and click on the button under Anaconda Installers to download the 3.x version
(3.8 at time of this writing) for your operating system.


Figure 0.2: Python 3.x on the Anaconda site

3. Then install it¹. It might ask whether you want to install it for everyone on your computer or just
you. Installing it for just yourself is fine.

4. Once you have Anaconda installed, open up Anaconda Navigator and launch Spyder.

5. Then, in Spyder, go to View ‑> Window layouts and click on Horizontal split. Make sure the pane
selected on the right side is ‘IPython console’.

Now you should be ready to code. Your editor is on the left, and your Python console is on the right. Let’s
touch on each of these briefly.

¹ One thing about Anaconda is that it takes up a lot of disk space. This shouldn’t be a big deal. Most computers have much
more hard disk space than they need and using it will not slow down your computer. Once you are more familiar with
Python, you may want to explore other, more minimalistic ways of installing it.


Figure 0.3: Editor and REPL in Spyder


Editor

This book assumes you have some familiarity working in a spreadsheet program like Excel, but not
necessarily any familiarity with code.

What are the differences?

A spreadsheet lets you manipulate a table of data as you look at it. You can point, click, resize columns,
change cells, etc. The coder term for this style of interaction is “what you see is what you get”
(WYSIWYG).

In contrast, Python code is a set of instructions for working with data. You tell your program what to
do, and Python does (aka executes or runs) it.

It is possible to tell Python what to do one instruction at a time, but usually programmers write
multiple instructions out at once. These instructions are called “programs” or “code”, and are just
plain text files. Python files have the extension .py (each language has its own file extension).

When you tell Python to run some program, it will look at the file and run each line, starting at the
top.

Your editor is the text editing program you use to write and edit these files. If you wanted, you could
write all your Python programs in Notepad, but most people don’t. An editor like Spyder will do nice
things like highlight special Python related keywords and alert you if something doesn’t look like
proper code.

Console (REPL)

Your editor is the place to type code. The place where you actually run code is in what Spyder calls
the IPython console. The IPython console is an example of what programmers call a
read-eval(uate)-print-loop, or REPL.

A REPL does exactly what the name says: it takes in (“reads”) some code, evaluates it, and prints the
result. Then it automatically “loops” back to the beginning and is ready for new code.

Try typing 1+1 into it. You should see:

In [1]: 1 + 1
Out[1]: 2

The REPL “reads” 1 + 1, evaluates it (it equals 2), and prints it. The REPL is then ready for new input.

A REPL keeps track of what you have done previously. For example if you type:


In [2]: x = 1

And then later:


In [3]: x + 1
Out[3]: 2

the REPL prints out 2. But if you quit and restart Spyder and try typing x + 1 again it will complain
that it doesn’t know what x is.

In [1]: x + 1
...
NameError: name 'x' is not defined

By Spyder “complaining” I mean that Python gives you an error. An error — also sometimes called an
exception — means something is wrong with your code. In this case, you tried to use x without telling
Python what x was.

Get used to exceptions, because you’ll run into them a lot. If you are working interactively in a REPL
and do something against the rules of Python it will alert you (in red) that something went wrong,
ignore whatever you were trying to do, and loop back to await further instructions like normal.

Try:

In [2]: x = 1

In [3]: x = 9/0
...

ZeroDivisionError: division by zero

Since dividing by 0 is against the laws of math², Python won’t let you do it and will throw (or raise) an
error. No big deal — your computer didn’t crash and your data is still there. If you type x in the REPL
again you will see it’s still 1.
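The same thing can be written as a short script. The try and except keywords are used here only so the example keeps running after the error (in the REPL you don’t need them); treat this as an illustrative sketch.

```python
x = 1

try:
    x = 9 / 0  # raises ZeroDivisionError before anything is assigned
except ZeroDivisionError as err:
    print("Python raised:", err)

# the failed assignment never happened, so x is unchanged
print(x)  # 1
```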

We’ll mostly be using Python interactively like this, but know Python behaves a bit differently if you
have an error in a file you are trying to run all at once. In that case Python will stop and quit, but —
because Python executes code from top to bottom — everything above the line with your error will
have run like normal.
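To make that concrete, the sketch below runs a four-line “file” held in a string (so the example is self-contained) using Python’s built-in exec. The names program and steps are made up for illustration; normally you would just run a .py file.

```python
# a four-line "file"; the third line contains the error
program = """
steps.append("line 1 ran")
steps.append("line 2 ran")
steps.append(9 / 0)
steps.append("line 4 ran")
"""

steps = []
try:
    # run the lines from top to bottom, sharing the steps list
    exec(program, {"steps": steps})
except ZeroDivisionError:
    print("stopped at the line with the error")

# everything above the error ran; nothing after it did
print(steps)  # ['line 1 ran', 'line 2 ran']
```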

² See https://www.math.toronto.edu/mathnet/questionCorner/nineoverzero.html


Using Spyder and Keyboard Shortcuts

When writing programs (or following along with the examples in this book) you will spend a lot of your
time in the editor. You will also often want to send (run) code — sometimes the entire file, usually just
certain sections — to the REPL. You also should go over to the REPL to examine certain variables or try
out certain code.

At a minimum, I recommend getting comfortable with the following keyboard shortcuts in Spyder:

Pressing F9 in the editor will send whatever code you have highlighted to the REPL. If you don’t have
anything highlighted, it will send the current line.

F5 will send the entire file to the REPL.

You should get good at navigating back and forth between the editor and the REPL. On Windows:

• control + shift + e moves you to the editor (e.g. if you’re in the REPL).
• control + shift + i moves you to the REPL (e.g. if you’re in the editor).

On a Mac, it’s command instead of control:

• command + shift + e (move to editor).
• command + shift + i (move to REPL).
