Forging Python PDF
Miki Tebeka
This book is for sale at http://leanpub.com/forging-python
Foreword
Dedication
Introduction
Which Python?
Project Structure
Managing Dependencies
Storage
Testing
Configuration
Debugging
Deployment
Security
Going Faster
Process
Thanks
Contributors
Foreword
There is a saying: “I’ve learned a lot from my teachers, more
from my peers and most from my students.”
I’ve been learning from teachers and peers for a long time, and
love the informal round table talks as a form of education.
I try to implement this method of teaching at my company
353solutions¹ and people really like it. As a bonus I’m learning
so much from my students.
Terry Pratchett said “Writing is the most fun you can have by
yourself.” I did have fun writing this book, not only from the
writing process itself but also from the discussions with the
people who helped. I am grateful to anyone who contributed
and taught me along the way.
I hope this book will inspire you to come out and talk to people
as a way of learning. You will get many different perspectives
on the problems you’re facing, and as Alan Kay said: “A change
in perspective is worth 80 IQ points.”
This book is open source. Feel free to head over to
https://github.com/tebeka/forging-python and submit bugs, offer
ideas and ask questions. I will do my best to improve this book
according to your suggestions.
Happy Hacking,
Miki Tebeka, April 2018
¹https://www.353solutions.com
Dedication
To Adi & Shira.
TL;DR
• Have a good mental model
• Aim for readability
• Don’t stop writing the first time the code
works
• Read other people’s code
• Find a mentor
• Learn how to speak the language
Which Python?
Gentlemen, choose your weapons.
- A Night in Casablanca
Graybeard: Trust me, there are days I regret it. But most days
I’m very happy - it fits my preferences. Which is exactly what
the Python you choose should do for you. So let me ask you -
what are your speed requirements?
Youngstar: The faster the better?
Graybeard: Then why not pick assembly as your programming
language? Even better - manufacture your own hardware.
Youngstar: I see what you mean. I need to write some business
requirements and then see if Python fits them. I have a hunch
it will.
Graybeard: In God we trust; all others must bring data.
Youngstar: Good one. Yours?
Graybeard: Not mine - W. Edwards Deming.
Youngstar: I’ll spec and measure. Now let’s talk about Python 2
vs Python 3.
Graybeard: OK. Python 3 is the future, choose it.
Youngstar: That was easy! Should I tell that to all the people
who still use Python 2?
Graybeard: There are many good reasons to keep using
Python 2.
Youngstar: Because you’re an old fossil who can’t change?
Graybeard: Get off my lawn!
Youngstar: Sure, can I finish my beer first?
Graybeard: I’d say dependencies are the main reason. However,
the situation has improved significantly in the last couple
Youngstar: I don’t have plans for that now, and as you said
earlier switching is not that painful.
Graybeard: Just make sure you have a good test suite.
Youngstar: Will do, but testing is a big subject and we’re
getting to the point where my boyfriend gets jealous of you.
Final recommendation?
Graybeard: Don’t be lazy, do your homework and find the
right Python, or other programming language, for you. Note
that switching from one Python to another shouldn’t be that
difficult. At one place we had to switch from Python 3 to 2 due
to a dependency issue; it took us about half a day to do that.
Youngstar: So the decision is not that crucial?
Graybeard: It is; don’t take it lightly. We were lucky the
switch was easy; you might not be.
TL;DR
• Choose CPython 3.x if you have a new
project with few dependencies
– Python 3 is the future
• Choose CPython 2.x if you have an older code
base or dependencies that do not support
Python 3
• Choose Jython³⁹ if you need interaction
with Java
– Or if you’re in a Java shop and want
to sneak Python in the back door ;)
• Similarly, choose IronPython⁴⁰ if you need
interaction with .NET
• Choose PyPy⁴¹ if you need some speed and
love living on the edge
• Use the Anaconda⁴² distribution if you use a lot
of scientific Python packages
³⁹http://www.jython.org/
⁴⁰http://ironpython.net/
⁴¹http://pypy.org/
⁴²https://store.continuum.io/cshop/anaconda/
IDEs and Editors
All mail clients suck. This one just sucks less.
- Michael R. Elkins (mutt website⁴³)
Youngstar: OK. Let’s start with what you’re using. Why are
you using Vim?
Graybeard: As I said - it takes time to master Vim and get
used to its dual editing mode. However once you’ve mastered
Vim you’ll be super productive with it not just in Python but
with almost any other language. Vim itself is a pretty bare-bones
editor, but it has a rich plugin ecosystem which can
transform it into a powerful IDE. One of the main advantages, at
least for backend developers like me, is that on most Unix-like
systems - it’s already there. Vim can work in “terminal mode”
which does not require a windowing system. This means you
can SSH to a box and start editing. Oh - and you can write
Vim scripts in Python.
Youngstar: Isn’t Vim old?
Graybeard: In tech old usually means working - take me for
example.
Youngstar: Ha! What’s the other editor old developers use?
The lispy one?
Graybeard: Emacs⁴⁷?
Youngstar: That’s the one.
Graybeard: Emacs is a text editor that does everything⁴⁸. It
has excellent Python support with python-mode⁴⁹ and many
core Python developers use it.
Youngstar: Then why don’t you use it?
Graybeard: Since I picked the dark side of the editor war.
Youngstar: And something more modern?
⁴⁷http://www.gnu.org/software/emacs/
⁴⁸http://xkcd.com/378/
⁴⁹https://launchpad.net/python-mode
TL;DR
• Give Vim or Emacs a try, they will rock
your world
– See here⁶⁴ on how to turn Vim into a
Python IDE
• PyCharm is a good choice
– Make sure you have plenty of RAM
– Also if you’re in a Java shop - there’s
probably a lot of knowledge on IntelliJ
(which PyCharm is based on)
• Visual Studio Code is great
• If you’re in a Windows shop, give Visual
Studio a try
• If you’re doing a lot of scientific Python -
take a look at Spyder
⁶⁴http://unlogic.co.uk/2013/02/08/vim-as-a-python-ide/
Project Structure
organizations which design systems … are con-
strained to produce designs which are copies of the
communication structures of these organizations.
- Conway’s Law
archer
├── README.md
├── Makefile
├── run_tests.py
├── requirements.txt
├── archer
│   └── __init__.py
├── docs
└── tests
    └── test_archer.py
TL;DR
• Start with an established project structure
(like Graybeard’s example above)
• Separate code from tests
• Have a README with an elevator pitch and
development instructions
• Use a Makefile or other tool to automate
common tasks
• Have one script to run the tests
• Look into Sphinx⁷⁰ for generating docu-
mentation
– But only if you need to
⁷⁰http://www.sphinx-doc.org/en/stable/
Managing
Dependencies
Only the paranoid survive.
- Andy Grove
does basically the same work. And there is also a newer tool
called pipenv⁷³.
Youngstar: That’s nice, one less dependency. What are the
differences between virtualenv, venv and pipenv?
Graybeard: With virtualenv you can specify a different
Python interpreter, for example even if your default Python is
3 you can still create a virtual environment with the Python
2 interpreter.
Also, since venv is in the Python standard library, it’ll be updated
only when a new version of Python is released. virtualenv
will probably have a faster release cycle.
pipenv combines pip and virtualenv into one tool.
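
Since venv ships with Python 3, an environment can be created without installing anything. From the shell that is `python -m venv env`; the sketch below does the same from Python. The directory name env and the helper make_env are made up for illustration.

```python
import os
import tempfile
import venv


def make_env(parent_dir, name="env"):
    """Create a virtual environment under parent_dir and return its path."""
    env_dir = os.path.join(parent_dir, name)
    # with_pip=False keeps this sketch quick; a real project would usually
    # pass with_pip=True so packages can be installed into the environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


env_dir = make_env(tempfile.mkdtemp())
# Each environment has a pyvenv.cfg describing the interpreter it came from.
has_cfg = os.path.exists(os.path.join(env_dir, "pyvenv.cfg"))
```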
TL;DR
• Depending on the cost of error - pick a
strategy for versioning
• Version your dependencies, write them
down and place them in source control
• Use wheels when possible
• conda is a good alternative to pip
• docker will give you even more control but
it comes with a cost
• You might want to invest in your own
internal package repository
• Have a process for evaluating new pack-
ages. Lean toward old and stable ones
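
One way to enforce "version your dependencies" is a small check that every line in requirements.txt carries an exact pin. The sketch below is an illustration, not a complete parser: the function name unpinned is made up, and it ignores corner cases such as URLs that contain a # fragment.

```python
def unpinned(requirements_text):
    """Return requirement lines that lack an exact '==' version pin."""
    loose = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options like -r
            continue
        if "==" not in line:
            loose.append(line)
    return loose


sample = "requests==2.18.4\nflask\n# web stuff\nnumpy>=1.14\n"
loose = unpinned(sample)  # the two lines without an exact pin
```

Run from CI, a non-empty result can fail the build before an unpinned package surprises you in production.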
Storage
Two rules of database systems
- Luca Candela
Youngstar: I don’t?
Graybeard: No - you need recovery. You’ll be surprised
how many companies had backups of their data but couldn’t
restore from them when the time came.
Youngstar: So backup is part of recovery. How often should
I do it?
Graybeard: Again, depending on your audit and recovery
needs - this question can have very different answers. Another
thing is that backups tend to grow in size and accumulate,
so have a good retention policy.
If you use a hosted database - that might take care of backup
and recovery for you.
Youngstar: Hosted?
Graybeard: Yup. And considering that they take all the
operations headache from you it might be a good solution.
Google has BigQuery⁹⁷, Amazon has Athena⁹⁸… and many
others.
An extra benefit for BigQuery and Athena is that they scale.
Both claim they can process billions of records in seconds.
Youngstar: Don’t they cost money?
Graybeard: TANSTAAFL⁹⁹. Don’t make the common mistake
of underestimating the cost of running your own servers.
Deployment, monitoring, alerting, backup and more - all take
time and effort. And developer time is expensive. In The
Art of Unix Programming¹⁰⁰ Eric Raymond says the rule
of Economy is: “Programmer time is expensive; conserve it
⁹⁷https://cloud.google.com/bigquery/what-is-bigquery
⁹⁸https://aws.amazon.com/athena/
⁹⁹There ain’t no such thing as a free lunch
¹⁰⁰http://www.catb.org/esr/writings/taoup/html
TL;DR
• Start simple, shelve¹⁰⁵ is a great option
• Know your data and queries before select-
ing a database
– Think of things like embedded vs
client-server, SQL vs NoSQL vs
Key/Value vs Graph …
• Consider a hosted database - let someone else
wake up at 3am
• Pick a mature database
• Make sure you can recover from backup
• Have a policy to trim your backups
¹⁰⁵https://docs.python.org/3/library/shelve.html
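
To show how little it takes to "start simple", here is a minimal sketch of shelve used as a persistent dictionary. The file location and the stored record are made up for illustration.

```python
import os
import shelve
import tempfile

# A shelf is a dict-like object backed by a file on disk.
path = os.path.join(tempfile.mkdtemp(), "users")

with shelve.open(path) as db:
    db["youngstar"] = {"language": "Python", "beers": 2}

# Re-opening the shelf later (even from another process) sees the data.
with shelve.open(path) as db:
    user = db["youngstar"]
```

Keys must be strings and values are pickled, so this suits small, single-writer data sets. Once queries or concurrency matter, it is time to look at a real database.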
Testing
A computer lets you make more mistakes faster
than any invention in human history, with the
possible exceptions of handguns and tequila.
- Mitch Radcliffe
personally write tests after the first or second draft of the code
is working.
Youngstar: How do you know it’s working?
Graybeard: I try it out in the REPL.
Youngstar: The what?
Graybeard: REPL stands for “read eval print loop”, you might
also know it as “the interactive prompt”. You write little pieces
of code and test them as you go. After I’m done and happy
with the code, I write some tests.
People underestimate how much the REPL helps during
development; give it a try next time.
Youngstar: OK, I will. Which testing framework do you use?
Graybeard: I personally prefer pytest¹⁰⁸, though I’ve used unittest¹⁰⁹
with discover mode as well.
Youngstar: Why do you prefer pytest?
Graybeard: I find pytest simpler, and I always go for simple.
I also love their parametrize fixtures¹¹⁰, which let you run the
same test with different input (AKA table driven testing).
Their xunit output¹¹¹ is great for Jenkins¹¹² integration as well.
Oh, and I also use tox¹¹³ for testing the same code on multiple
versions/implementations of Python.
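
Table driven testing is not tied to one framework. As a sketch that runs with nothing but the standard library, the example below uses unittest and subTest instead of pytest. With pytest, the same table would be handed to the pytest.mark.parametrize decorator. The slugify function is a made-up function under test.

```python
import unittest


def slugify(title):
    """Hypothetical function under test: lower-case and join words with '-'."""
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    # Each row is (input, expected output); a new case is just a new row.
    cases = [
        ("Forging Python", "forging-python"),
        ("  Which   Python?  ", "which-python?"),
        ("", ""),
    ]

    def test_slugify(self):
        for title, expected in self.cases:
            # subTest reports each failing row separately.
            with self.subTest(title=title):
                self.assertEqual(slugify(title), expected)
```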
Youngstar: I’ll start with pytest then; I don’t need multi-version
testing currently. How do I run the tests?
¹⁰⁸http://pytest.org/
¹⁰⁹https://docs.python.org/3/library/unittest.html
¹¹⁰https://docs.pytest.org/en/latest/parametrize.html
¹¹¹https://docs.pytest.org/en/latest/usage.html#creating-junitxml-format-files
¹¹²http://jenkins-ci.org/
¹¹³https://tox.readthedocs.org/
TL;DR
• Find the “gain vs pain” balance for your
tests
• Have one script to run tests
• Have a CI system, Jenkins is a good bet
• Separate tests into ones developers run and
ones Jenkins runs
• Clean up on setup
• Make it impossible for tests to get into
production
• Avoid mocking as much as you can
• No matter how hard you test, some bugs
will slip through - be ready for this
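
One reading of "clean up on setup" is sketched below: the test wipes leftovers before it starts instead of relying on teardown, so a crashed run cannot poison the next one and a failed run leaves its files behind for inspection. The scratch directory name and the test itself are made up.

```python
import os
import shutil
import tempfile
import unittest

# Hypothetical scratch directory shared by the tests.
WORK_DIR = os.path.join(tempfile.gettempdir(), "archer-test-work")


class ExportTest(unittest.TestCase):
    def setUp(self):
        # Remove whatever a previous (possibly crashed) run left behind.
        shutil.rmtree(WORK_DIR, ignore_errors=True)
        os.makedirs(WORK_DIR)

    def test_export_writes_one_file(self):
        with open(os.path.join(WORK_DIR, "out.csv"), "w") as out:
            out.write("a,b\n")
        self.assertEqual(os.listdir(WORK_DIR), ["out.csv"])
```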
Configuration
Amateurs think about tactics, but professionals
think about logistics.
- General Robert H. Barrow
TL;DR
• Start simple. A Python based configuration
system with overrides will get you a long
way
• Know that most times you move configu-
ration complexity to another system.
• Learn about the various solutions out there
and what people do, then adopt what works
for your system.
• Give more than one way to specify con-
figuration. Usually we have default < con-
figuration file < environment variables <
command line switches
• Make sure “secrets” are protected in your
configuration system and not checked into
source control
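
The precedence chain default < configuration file < environment variables < command line switches can be sketched in a few lines. Everything specific here is an assumption made for illustration: the setting names, the JSON file format, the ARCHER_ prefix and the load_config helper.

```python
import argparse
import json
import os

DEFAULTS = {"host": "localhost", "port": 8080}  # hypothetical settings


def load_config(argv=None, environ=None, config_file=None):
    """Merge settings: defaults < config file < environment < command line."""
    environ = os.environ if environ is None else environ
    config = dict(DEFAULTS)

    # 1. Configuration file (JSON is an arbitrary choice for this sketch).
    if config_file and os.path.exists(config_file):
        with open(config_file) as fp:
            config.update(json.load(fp))

    # 2. Environment variables such as ARCHER_PORT (the prefix is made up).
    for key, default in DEFAULTS.items():
        value = environ.get("ARCHER_" + key.upper())
        if value is not None:
            config[key] = type(default)(value)  # convert "9000" to 9000

    # 3. Command line switches override everything else.
    parser = argparse.ArgumentParser()
    parser.add_argument("--host")
    parser.add_argument("--port", type=int)
    args = parser.parse_args(argv)
    config.update({k: v for k, v in vars(args).items() if v is not None})
    return config
```

Each layer only overrides the keys it actually sets, so a deployment can change one value without restating the rest.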
Debugging
If debugging is the process of removing bugs, then
programming must be the process of putting them
in.
- Edsger Dijkstra
if some_complex_condition():
    import pdb; pdb.set_trace()
TL;DR
• Writing simple code will make debugging
easier
• Understand the bug before you fix it
• Know how to work a debugger. Both from
IDE and command line
• When fixing a bug try to make sure this
kind of bug won’t happen again
• Use logs, err on the TMI side
• Use automation to make sure debugging
code doesn’t get to the source tree
• Give your subconscious time to work
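
The automation that keeps debugging code out of the source tree can be as small as a scan for debugger calls, run from a commit hook or the CI job. The sketch below is illustrative only: the pattern list is an assumption and a real check would likely look for more.

```python
import re

# Calls that usually mean leftover debugging code (illustrative list).
DEBUG_PATTERN = re.compile(r"\bpdb\.set_trace\(|\bbreakpoint\(")


def find_debug_lines(source):
    """Return (line number, text) pairs for lines that call a debugger."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if DEBUG_PATTERN.search(line)
    ]


code = "x = 1\nif x:\n    import pdb; pdb.set_trace()\n"
hits = find_debug_lines(code)  # one hit, on line 3
```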
Deployment
May the queries flow, and the pagers remain silent.
- SRE¹⁴¹ Benediction
Graybeard: We’ll fix that later. However today it’s more com-
mon for companies to host data outside. And even companies
that say “we host data ourselves” usually mean “on our hosted
servers”. Sometimes you can’t host data outside due to legal
reasons or some regulation.
Youngstar: IANAL, but I think I’m OK with hosting data
outside.
Graybeard: What most companies underestimate is the cost
of having your own servers. Scaling up becomes much more
painful, and you need people doing rotation who can drive
at 3AM to some Colo, have the right keys and know how to
reboot the servers.
Youngstar: Colo?
Graybeard: Short for “co-location centre”. It’s usually a secure
place for your servers with good networking, security and
other goodies.
Youngstar: So not from the office network?
Graybeard: Sadly I’ve seen that too.
Youngstar: OK, I’ll start with the cloud then. Which one?
Graybeard: There are many options and many variables you
need to consider. As usual - some research is required.
Youngstar: Such as pricing?
Graybeard: Pricing is one aspect. However most companies
don’t fathom how time-consuming operations can be.
Youngstar: And by time you mean money.
Graybeard: Exactly. I’d do my best to limit my operational
involvement.
Youngstar: OK, less ops is better. What else?
TL;DR
• Don’t underestimate how much operations
will cost you in time and money
• Pick a solution that will reduce the opera-
tions burden
– Automate everything you can
• Do your homework. Learn about deploy-
ment methods, tools and procedures
• Be ready to roll back releases
• Mark releases in your monitoring tools
Monitoring & Alerting
On a long enough timeline, the survival rate for
everyone drops to zero.
- “Fight Club” movie
TL;DR
• Identify your KPIs and monitor them
• Start with simple thresholds and move to
more sophisticated systems later
• Have a pager duty rotation, everyone
should pitch in
• Automate recovery as much as you can
• Update a “red book” for solving problems
• Do a postmortem for every outage
• Have ops drills
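
"Simple thresholds" really can be simple. The sketch below compares the latest metric values against fixed limits and returns the alerts to send. The metric names and limits are invented for illustration.

```python
def check_thresholds(metrics, thresholds):
    """Return an alert message for every metric above its threshold."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"{name}={value} exceeds threshold {limit}")
    return alerts


# Hypothetical KPIs and limits.
thresholds = {"error_rate": 0.05, "latency_ms": 500}
alerts = check_thresholds({"error_rate": 0.07, "latency_ms": 120}, thresholds)
```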
Security
First rule of computer security: don’t buy a com-
puter. Second rule: if you buy one, don’t turn it on.
- Dark Avenger
Youngstar: I was going over our HTTP logs and found some
weird stuff there.
Graybeard: “Little Bobby Tables”¹⁶⁴?
Youngstar: There were some SQL injection attempts, some attempts
to run scripts and other fishy requests. How do I protect myself
against such things?
Graybeard: One thing you need to keep in mind is that if
someone is really targeting you - you will get hacked. Hackers
managed to get into NASA, banks and many other secure
places.
Youngstar: So I should just give up?
Graybeard: Why do you lock your door when you leave the
house?
Youngstar: So bad people won’t be able to get in?
Graybeard: And you think that people who rob banks can’t
get in your house?
Youngstar: They’ll be able to. But I do it to deter most casual
thieves. Oh, I see where you’re going with this. I shouldn’t
make myself an easy target.
¹⁶⁴https://xkcd.com/327/
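
One lock on the door against SQL injection is to never build SQL text from user input and to pass values as query parameters instead. The sketch below uses the standard library's sqlite3 with a made-up students table. The hostile input is in the spirit of "Little Bobby Tables".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
conn.execute("INSERT INTO students VALUES ('Alice')")


def find_student(conn, name):
    """Look up a student; the ? placeholder keeps name out of the SQL text."""
    cursor = conn.execute("SELECT name FROM students WHERE name = ?", (name,))
    return cursor.fetchall()


# The database treats the whole string as a value, not as SQL.
malicious = "Robert'); DROP TABLE students;--"
rows = find_student(conn, malicious)
```

The lookup simply finds no student with that odd name, and the table is still there afterwards.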
TL;DR
• Get in a security mindset
• Appoint someone to be in charge of secu-
rity
• Make security part of your process. Do
security audits and look for violations in
code review
• Keep software up to date, make sure
patches happen
• Secure in layers. Invest in the ones that give
you the best benefit
• If you have money, hire a pentesting team
• Have a process to keep secrets
Going Faster
Write clear, precise code. Every ten years it will run
1,000 times faster.
- Joe Armstrong
doubling of speed for core Python that has occurred over the
last decade has occurred one little step at a time, none of
them being individually “dramatic”²⁰⁸.
Youngstar: OK, I’ll remember that - baby steps.
Graybeard: And now, let’s continue with our baby steps
toward Ballmer Peak²⁰⁹.
Youngstar: Two beers coming up.
TL;DR
• Know the numbers you need to hit
• Don’t optimize before you measure
• Know the tools and algorithms available to
you
• Cheat whenever you can
• Think pain vs gain
• Hardware is cheap compared to developer
time
• Moving to a multi-machine setup is painful,
try to stay with one
• Do whatever you can in Python before
dropping to C
• Don’t expect miracles, small steps will get
you there
²⁰⁸http://bugs.python.org/issue25823
²⁰⁹https://xkcd.com/323/
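
"Don't optimize before you measure" is cheap to follow with the standard library's timeit. The sketch below times two ways of building the same string. The functions are toy examples, and the numbers you get will depend on your machine and Python version, so none are quoted here.

```python
import timeit


def concat_loop(n):
    """Build the string with repeated +=."""
    out = ""
    for i in range(n):
        out += str(i)
    return out


def concat_join(n):
    """Build the same string with str.join."""
    return "".join(str(i) for i in range(n))


def measure(func, n=1000, repeat=5, number=200):
    """Best-of-repeat time in seconds for `number` calls of func(n)."""
    return min(timeit.repeat(lambda: func(n), repeat=repeat, number=number))


timings = {f.__name__: measure(f) for f in (concat_loop, concat_join)}
```

Taking the minimum over several repeats reduces noise from whatever else the machine is doing. For whole programs, cProfile is the next tool to reach for.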
Process
The only ‘best practice’ you should be using all the
time is ‘Use Your Brain’.
- Steven Robbins
Process Adoption
TL;DR
• Start with a light process.
– Scrum is a good starting point
• Create a culture
• Fix the process, do retrospectives
• Use a chat room, add a nice bot to it
• Automate whatever you can
• Measure your process (metrics)
• Example process:
– Sprint start: Pick issues for sprint
– During sprint: Feature branches, code
review, communicate in chat room, CI
with Jenkins. No new issues during
sprint, morning standups
– After sprint: retrospective which
yields new issues
Time Management
I love deadlines. I like the whooshing sound they
make as they fly by.
- Douglas Adams
²³¹https://en.wikipedia.org/wiki/Time_management#The_Eisenhower_Method
    ^
    |
    +---------------+-------------+
    |               |             |
 u  | urgent        | urgent      |
 r  | not important | important   |
 g  |               |             |
 e  +---------------+-------------+
 n  |               |             |
 t  | not urgent    | not urgent  |
    | not important | important   |
    |               |             |
    +---------------+-------------+-->
                 important
Youngstar: Google.
Graybeard: Google is a good start. How are your search
skills?
Youngstar: Hmm, never thought of that. I’d guess I’m an
average googler.
Graybeard: Searching is important, invest time in getting
better. Google has some excellent free online courses on
searching called Power Searching²³⁹. As a rule of thumb add
python to every query.
Graybeard: Edward Hodnett said “If you do not ask the right
questions, you do not get the right answers. A question asked
in the right way often points to its own answer.” Do you ask
the right questions?
Youngstar: I think so, not sure. Any guidelines?
Graybeard: I’ll give you two reading assignments. One is
“How To Ask Questions The Smart Way”²⁴² by Eric Raymond
where he goes into great detail on how to formulate questions.
The second is a study by the Takipi people called
“The anatomy of a Great Stack Overflow Question”²⁴³.
Youngstar: Sure, I’ll add them to my pile of things to read.
Can you give me the TL;DR?
Graybeard: (sighs) Sure, but it doesn’t mean you’re off the
hook for reading these.
Youngstar: Agreed.
Graybeard: The two main points are respecting people’s time
and giving enough context. If people feel that you’re asking
them a trivial question instead of RTFM - they won’t answer
you. However if they see that you tried something before
asking them - they’ll be more inclined to help. The other thing
is giving context. The worst question you can get is “Help, it
doesn’t work”. Give enough information about what you were
trying to do, what you expected to happen and what
actually happened.
Youngstar: This sounds a lot like bug reports.
Graybeard: Well, we’re in a technical world - so a lot of
²⁴²http://www.catb.org/esr/faqs/smart-questions.html
²⁴³http://www.takipiblog.com/2014/02/03/the-anatomy-of-a-great-stack-overflow-question-after-analyzing-10000/
things that don’t work seem like bugs. Check out the PBKAC²⁴⁴
acronym sometime.
Also be polite and make sure to thank people. Oh, and don’t
feed the trolls.
Youngstar: Trolls are the ones who argue just for the sake
of arguing?
Graybeard: Yeah, and they usually do it in a foul manner.
Youngstar: Any other advice?
Graybeard: Get a rubber duck.
Youngstar: What? Like the ones kids use in a bath?
Graybeard: What do you mean by “kids”? I have one like that.
Youngstar: For real?
Graybeard: You’re never too old to have a bath with your
rubber duck. Has it ever happened to you that you were stuck on
something, went to a co-worker for help, started to describe
the problem and somewhere in the middle said - “never mind,
I found the solution.”?
Youngstar: Sure, several times. How does this relate to a
rubber duck?
Graybeard: There’s something about formulating our ques-
tions verbally or in writing that helps us solve the problem.
The idea here is that instead of wasting a co-worker’s time,
you talk to the rubber duck. This is known as Rubber Duck-
ing²⁴⁵. Place a rubber duck next to your monitor and describe
your problems to it when you’re stuck.
²⁴⁴http://www.acronymfinder.com/Problem-Between-Keyboard-And-Chair-(PBKAC).html
²⁴⁵http://c2.com/cgi/wiki?RubberDucking
TL;DR
• Learn how to ask questions - it’s one of the
most valuable skills you’ll ever acquire
• Start at StackOverflow, they probably al-
ready have an answer
• Learn how to search properly
• Formulate your question before asking
– Talk to the rubber duck
• Know the difference between synchronous
and asynchronous mediums of communi-
cation
• #python IRC channel on freenode is
crowded but you can get help right now -
as long as you follow the rules
• Ask yourself questions to stay in focus and
to improve
Thanks
This book is the combined effort of many people.
I’m thankful to my family who were patient with me staring
at nothing or typing frantically on the keyboard at many
odd hours. I’m also thankful to many people who contributed
ideas, proofing and general help without asking anything in
return.
I’d like to say special thanks to my cousin Dorit Lewin who
encouraged me to change direction when the initial version of
this book bored me and indirectly shaped the conversational
format of this book.
Contributors
Here’s a list of people who contributed to the book. I apologize
to anyone who I forgot.
• Adi Tebeka
• Dean Fenster
• Dorit Lewin
• Eliran Bivas
• Iddo Berger
• Lior Rudnik
• Ran Tavory
• Raymond Hettinger
• Ro’ee Orland
• Shay Priel
• Shmuel Amar
• Udi Oron
• Venkata Krishnan. V
• Yaki Tebeka