Understanding Databases:
Deploy High-Performance Database
Clusters in Modern Applications
by
Justin Mitchel
Akamai Technologies
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted in any form by any means, electronic, mechanical, photocopying,
recording or otherwise without the prior permission of the publisher, or in accordance
with the provisions of the Copyright, Designs and Patents Act 1988, or under the terms
of any license permitting limited copying issued by the Copyright Licensing Agency.
Published by:
Akamai Technologies
249 Arch Street
Philadelphia, PA 19106
I love you all and look forward to what our future brings!
Introduction
Every web application requires a database. In your tech stack, the database enables your application to be truly
interactive, from storing contact form submissions to powering advanced personalization features. The term
“stack” reflects that each technology used in your application is an individual component with its own layer,
even if they are all installed together right after a server is deployed or as part of an image.
Modern applications constantly change how we think about data and how relationships are created between
datasets. This rapidly evolving data landscape requires careful selection of a database management system
(DBMS)—or possibly multiple databases—that meets your users’ performance expectations. These are just some
of the factors that will shape your choice of a primary or auxiliary database.
In Part 1 of this ebook, you’ll develop a high-level understanding of industry-standard databases, the design
of database architectures, and different deployment methods in the cloud. Part 2 contains a practical project
application designed by Justin Mitchel of Coding for Entrepreneurs. The project includes step-by-step instructions
on how to use Django, Docker, and Linode Managed Databases for MySQL to create a production-ready
application from scratch.
Section 1
Evaluating the Strengths and
Limitations of Database Operations
Databases are anything but “one size fits all.” From privacy and compliance to supporting specific data types,
databases only get more advanced as an application scales. Even if you aren’t a database administrator (DBA),
you need to factor in database capabilities and limitations when adding new features to your applications
or optimizing existing ones.
Most-wanted database skills, by share of respondents:
PostgreSQL: 17.99%
MongoDB: 17.89%
Redis: 12.58%
Elasticsearch: 10.52%
MySQL: 9.76%
Firebase: 8.09%
SQLite: 7.74%
Cassandra: 5.14%
DynamoDB: 5.03%
Microsoft SQL Server: 3.11%
MariaDB: 2.93%
Oracle: 2.8%
Couchbase: 1.88%
IBM DB2: 0.57%
According to the survey, just under 18% of respondents identified PostgreSQL as one of their most-wanted database
skills, meaning they are not developing with it yet but want to. MongoDB ranked almost the same.
Source: Statista
· Relational Databases (also referred to as SQL Databases). Data is organized in tables, where the columns
of a table correspond to the data’s attributes, including the type of an attribute. For example, an employee
table for an HR software suite could contain a column that stores an integer value representing
an employee’s salary, and another column could store a text value representing their name. The rows
of the table represent instances of the data; for example, each row in an employee table would correspond
to a different employee. Information between multiple tables is linked via keys. Relational databases use
the Structured Query Language (SQL).
· Non-Relational Databases (also referred to as NoSQL (“Not Only SQL”) Databases). Non-relational
databases have more flexible query methods. These query methods vary significantly by DBMS type.
There are five types of non-relational databases:
· Columnar Data Stores: Data is organized in a row-and-column table structure similar to that of
a relational database. However, the data is stored by column, and data from specific columns
can be fetched. In comparison, data in a relational database is stored by row, which allows you
to fetch specific rows from the table.
· Key-Value Stores: Data is stored as a collection of key-value pairs, where data can be retrieved
by specifying a key that is associated with the data.
· Document Stores: Data is stored in documents, like those on your computer’s filesystem.
· Document Data Stores: A document store where the files’ type is JSON, XML, or some other
data-encoding format. The structure of the encoded data is flexible and can allow for complicated
queries.
· Graph Databases: Data is stored using a graph structure, where entities correspond to nodes in the
graph and the edges between nodes represent relationships between entities. An example
of a graph structure would be the connections between users on a social network. Graph databases
offer efficient querying for highly-interconnected data.
[Diagram: the same record (ID: 249, dept: Linode) shown as a row in a relational table and as a non-relational document.]
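To make the relational model concrete, here is a minimal sketch using Python's built-in sqlite3 module; the table, column names, and values are illustrative rather than taken from any particular product. A document store would instead hold each record as a self-describing document, such as {"id": 249, "dept": "Linode"}.

import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Columns define the attributes of the data, including each attribute's type.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")

# Each row is one instance of the data: one employee per row.
conn.execute("INSERT INTO employee (name, salary) VALUES (?, ?)", ("Ada", 95000))
conn.execute("INSERT INTO employee (name, salary) VALUES (?, ?)", ("Grace", 105000))

# SQL fetches rows by their attributes.
for row in conn.execute("SELECT name, salary FROM employee WHERE salary > ?", (100000,)):
    print(row)  # ('Grace', 105000)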
ACID Compliance:
· Atomic: Transactions are made up of components, and all components in a transaction must succeed.
Otherwise, the transaction fails and the database remains unchanged. A transaction is treated
as a single unit of work with defined success and failure outcomes.
· Consistent: Successful transactions follow the database’s pre-defined parameters and restrictions.
· Isolated: Concurrent transactions are isolated from one another. The result of executing several
concurrent transactions will be the same as if they were executed in a sequence.
· Durable: Once a transaction is written to the database, it will persist, even in the event of a system failure
for the server that the database is running on.
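To see atomicity in action, here is a minimal sketch using Python's built-in sqlite3 module; the account table, balances, and CHECK constraint are made up for the example. Because the second UPDATE violates the constraint, the whole transaction is rolled back and the first UPDATE is undone.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account (balance) VALUES (?)", [(100,), (50,)])
conn.commit()

try:
    with conn:  # one transaction: commit on success, roll back on any error
        conn.execute("UPDATE account SET balance = balance + 80 WHERE id = 1")  # succeeds
        conn.execute("UPDATE account SET balance = balance - 80 WHERE id = 2")  # fails the CHECK
except sqlite3.IntegrityError:
    print("Transfer failed; the database is unchanged.")

print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())  # [(1, 100), (2, 50)]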
Out of the box, most NoSQL databases are not ACID compliant, because they are designed to prioritize flexibility
and scalability instead of pure consistency.
BASE Model:
· Basically Available: Data is replicated and dispersed across clusters so that failures of part of a cluster
should still leave the data available in other locations.
· Soft State: Consistency is not immediately enforced across the database cluster after an item in the
database is updated. During this time, fetching an updated record from the database may result
in different values for different read operations.
· Eventually Consistent: Data will eventually reach consistency as updated records are replicated across
the nodes in the database cluster.
The clear difference between the ACID and BASE transaction models is seen in the practical use of SQL vs NoSQL
databases. NoSQL databases adhering to the BASE model are built for the flexibility and scalability required
by massive high-availability deployments. SQL databases adhering to the ACID model value consistency and
data integrity above all else.
MySQL (SQL)
MySQL is one of the most widely used database engines. MySQL is a component of the LAMP stack,
which serves as the foundation for many content management systems, including WordPress.
MySQL is known for its high performance (especially for processing read operations), ease of use,
and scalability. Originally developed as an open source project, MySQL was acquired by Oracle in 2010.
A free community edition of MySQL is still available, but some features and scalability require a paid
enterprise license.
Strengths
· Speed: Query limits and default read-only permissions make MySQL extremely performant,
and there are more options available for memory-optimized tables.
· Integrations: Due to its popularity, MySQL has a larger variety of third-party tools and integrations
than other database types.
Weaknesses
· Write Performance: Without optimizations, performance will decrease for write-heavy applications.
· Tiers: Features are split between the free Community edition and the paid Enterprise edition,
which limits access to features and potential scalability.
PostgreSQL (SQL)
PostgreSQL, also known as Postgres, is considered to be the most advanced open source database and
a more robust SQL database alternative to MySQL. Postgres is also the stronger choice for workloads that
are dependent on database write operations and the ability to add custom data types. Postgres is open
source and all features are available without a paid license for any size deployment, in contrast to MySQL.
Strengths
· True Open Source: Postgres is maintained by a global community, and a single version of the
software is available that contains all features and has no fees.
· Extensibility: Postgres supports a wider range of data types and indices than MySQL, and additional
feature extensions are provided by the open source community.
Weaknesses
· Server Memory Demands: Postgres forks a new process for each new client connection.
Connections allocate a non-trivial amount of memory (about 10 MB).
· Increased Complexity over MySQL: Postgres adheres more strictly to SQL standards, so it can
be a difficult starter database for standard web applications. For example, string comparisons
in Postgres are case-sensitive by default, whereas MySQL’s default collations are not.
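Here is a quick sketch of that case-sensitivity, using the psycopg2 driver; the connection settings and the employee table are placeholders, and a reachable Postgres instance is assumed.

import psycopg2

# Placeholder connection details.
conn = psycopg2.connect(host="localhost", dbname="appdb", user="appuser", password="app-password")
cur = conn.cursor()

# LIKE is case-sensitive in Postgres: 'ada%' will not match 'Ada'.
cur.execute("SELECT name FROM employee WHERE name LIKE %s", ("ada%",))
print(cur.fetchall())

# ILIKE (a Postgres extension) performs a case-insensitive match instead.
cur.execute("SELECT name FROM employee WHERE name ILIKE %s", ("ada%",))
print(cur.fetchall())

cur.close()
conn.close()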
MongoDB (NoSQL)
MongoDB is a document database that stores data as JSON documents to provide more flexibility for
scaling and querying data as an application evolves. MongoDB is a solution for database
requirements that stray away from relational data schemas. This approach allows users to start creating
documents without needing to first establish a document structure, which provides developers with
more flexibility when updating their applications.
Strengths
· Scalability: It’s in the name: Mongo, short for “humongous,” is built to store large volumes of data.
· Search Flexibility: Supports graph search, geo-search, map-reduce queries, and text search.
· Flexible Data Options: More functionality for temporary tables to support complex processes or test
new applications/features without needing to switch databases.
Weaknesses
· Concurrent Multi-Location Data Write Operations: Updating the same field in several locations
(e.g. a commonly used location) can cause lags in performance compared to a relational database.
· Query Formatting: MongoDB can have a steeper learning curve for developers who are familiar
with other databases because queries are written with a JSON syntax, instead of SQL.
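Here is what that JSON-style querying looks like in practice, sketched with the pymongo driver; it assumes a MongoDB instance on localhost, and the database, collection, and field names are made up. The equivalent SQL appears in the comment.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["hr"]

# No schema has to be declared up front; just insert a document.
db.employees.insert_one({"name": "Ada", "dept": "Linode", "salary": 95000})

# SQL equivalent: SELECT * FROM employees WHERE dept = 'Linode' AND salary > 90000;
for doc in db.employees.find({"dept": "Linode", "salary": {"$gt": 90000}}):
    print(doc)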
Redis (NoSQL)
Redis (short for “REmote DIctionary Server”) is an open source in-memory database that is useful for
applications that require rapid data updates in real time. Redis stands out from relational databases
by using key-value pairs to store data, resulting in faster response times when fetching data. Redis is ideal
for projects with a clearly-defined data structure and when speed is more important than scalability.
Strengths
· Speed: Since Redis only stores data in memory, it’s a highly-performant database that rapidly
returns results.
· Benchmarking: The built-in benchmark tool, redis-benchmark, provides insights on the average
number of requests your Redis server is able to handle per second.
· Ease of Use: Supports a variety of extensions, including RediSearch, which provides full text search
for a Redis database.
Weaknesses
· Memory Restrictions: Since Redis is an in-memory store, all data must fit within your available
memory.
· Rigidity: Redis doesn’t have a query language, and entering functions via the available commands
can limit your ability to customize results.
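Here is the key-value model from Python, as a minimal sketch using the redis-py client; it assumes a Redis server on localhost, and the key names are illustrative. The bundled redis-benchmark tool mentioned above can then report how many operations like these your server sustains per second.

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Store and fetch a value by its key.
r.set("session:249", "user-justin")
print(r.get("session:249"))  # user-justin

# Keys can expire automatically, which suits caches and one-time codes.
r.set("otp:249", "381204", ex=60)  # expires after 60 seconds
print(r.ttl("otp:249"))  # seconds remaining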
Cassandra (NoSQL)
Cassandra is an open source columnar database that supports very large amounts of structured data.
Cassandra does not declare a primary server; instead, data is distributed in a cluster of nodes, each
of which can process client requests. This provides an “always-on” architecture that is extremely
appealing for enterprise applications that cannot experience database downtime. Cassandra was originally
an open source project developed by Facebook in 2008 to optimize message inbox search. It was later
declared an Apache top-level project in 2010.
Strengths
· “Ring” Architecture: Multiple nodes are organized logically in a “ring.” Each node can accept read
and write requests so there is no single point of failure.
· Ease of Use: Cassandra’s query language, CQL, has a syntax similar to traditional SQL and has
a reduced learning curve for developers switching from a SQL database.
· Write Speed: Cassandra is efficient for writing very large amounts of data. When performing a write
operation, replicas of a new record are stored across multiple cluster nodes and these replicas are
created in parallel. Only a subset of those nodes need to complete a replica update for the write
operation to be considered successful, which means that the write operation can finish sooner.
Weaknesses
· Read Time: Records in a Cassandra database are assigned a primary key attribute. The value of the
primary key determines which cluster nodes a record is stored on. When querying data by primary
key, read performance is fast, because the node that stores the data can be found quickly. However,
querying data using attributes other than the primary key is slower.
· Not ACID-Compliant Out of the Box: Instead, Cassandra offers tunable consistency levels that let
developers trade off data consistency against availability.
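Here is a small sketch of that primary-key trade-off, using the DataStax cassandra-driver package; it assumes a reachable Cassandra node, an illustrative hr keyspace, and an employees table whose primary key is id.

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])   # contact point into the ring
session = cluster.connect("hr")    # keyspace name is illustrative

# Fast: the primary key tells Cassandra exactly which nodes hold this record.
row = session.execute("SELECT name, dept FROM employees WHERE id = %s", (249,)).one()
print(row)

# Slower: filtering on a non-key column forces a scan, so CQL makes you opt in explicitly.
rows = session.execute("SELECT name FROM employees WHERE dept = %s ALLOW FILTERING", ("Linode",))
for r in rows:
    print(r.name)

cluster.shutdown()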
Section 2
Designing Database Architecture
Many websites and web applications can be operated with a single-server configuration, where your database
exists on the same server as all of the application’s other software components. This can be appropriate when
your server (possibly when paired with external storage, like block storage) has enough compute and storage
resources to accommodate your application’s data and traffic. This approach is fine for smaller, general use
websites that might host some forms and media files. For example, a WordPress blog can run well on
a single-server setup.
As your application’s features and data requirements grow, your database infrastructure needs additional
flexibility and scalability to meet the performance expectations of your users or customers. Necessary changes
could include moving the database to its own compute instance(s), or designing an advanced architecture with
replicas, read-only instances, sharding, and/or other components. These changes can enhance both the
performance and security of your database.
Application performance is a shared responsibility. Even if you or your team doesn’t currently prioritize conversion
rates, poor application performance will come back to haunt you in other ways, including: decreased user trust,
violation of your application’s SLA (the guaranteed availability of your services), and an increase in support tickets.
Lee Atchison
Caching at Scale with Redis
To help get you started, here are five examples of database-specific reference architectures that show how these
setups differ and what benefits they provide.
A WordPress site commonly leverages a software stack called LAMP, which is composed of the Linux operating
system, Apache HTTP web server, MySQL database, and PHP programming language. LAMP is one of the most
popular web application software stacks in use. There are also alternatives to the LAMP stack. For example,
you can use Python and an associated web application framework (like Django) instead of PHP. Or, you can use
the NGINX web server instead of Apache; this is referred to as the LEMP stack (the “E” comes from the
pronunciation of NGINX, which is “Engine X”). There are many other variations of this concept, but the following
scenario focuses on a traditional LAMP stack.
2. You need to consider the network security and availability of your server. Many cloud providers offer
network filtering to prevent or reduce the impact of denial-of-service and other kinds of network attacks.
However, it’s still important to fine-tune your network security configuration with a firewall.
In this section’s example architecture, the firewall is a process that runs on your server. The firewall intercepts
network packets when they are first processed by the operating system and only allows traffic on certain
network ports to proceed. For a web application, your firewall would allow HTTP (port 80) and HTTPS (port 443)
traffic. Linux and other operating systems have built-in firewall options, like iptables, ufw, and FirewallD.
The resulting setup has one server running the LAMP stack where network traffic is filtered by a firewall also
running on the server. There are limitations for this setup:
· Growth and scaling: With one server, you’re confined to that server’s memory, compute, and disk space
resources. If your website starts seeing more traffic, you might need to increase the memory, compute,
and/or disk space of the server to handle the load. This is referred to as vertical scaling. This process
can vary in difficulty, depending on your cloud provider’s tooling, or on hardware availability for
on-premise deployments. Vertical scaling can only offer a limited solution for capacity, as there are
upper limits on the resources for a single server.
· Availability: Operating a single server means that there is a single point of failure for your web
application. If the server needs hardware maintenance, or if one of the software components in the
stack halts unexpectedly, then your web application will not be available.
To some extent, the issue of traffic growth can be addressed by moving the database to its own server, as in the
following diagram:
By doing this, the compute, memory, and disk space needs of your web server and database can be adjusted
independently. This is still an example of vertical scaling, and it does not resolve availability issues.
In the previous section, a two-server solution was presented that separated the web server from the database
server. While it did not solve vertical scaling’s problems, the separation of web servers and database servers
is an important step towards creating a horizontal scaling solution.
In a horizontal scaling solution, the single web server is replaced with a high availability cluster of web servers,
and the single database server is replaced with another, separate high availability cluster of database servers.
By having two separate clusters in your architecture for these services, you can horizontally scale them
independently of each other and choose server instances with different hardware that is better suited for each
workload.
There are several different ways to architect your database cluster, each with varying degrees of complexity
and different trade-offs. This section and the diagram below illustrate how to use MySQL replicas to facilitate
high availability.
In the diagram above, a load balancer receives traffic from the Internet. A load balancer is a service that forwards
and distributes traffic among another set of servers. Cloud providers generally offer load balancing as a service
(Linode’s NodeBalancer solution is pictured). This service can also be implemented with open source software,
including HAProxy, NGINX, and the Apache HTTP Server.
Before the forwarded traffic arrives at your web servers, it is intercepted by a cloud firewall. The cloud firewall
lifts the firewalling burden from the operating system to the network level. Many cloud providers offer managed
cloud firewalls, but this service can also be implemented with open source software on commodity hardware.
After being filtered, the traffic is distributed among a cluster of web servers. In many cases, these web servers
can all be identical clones of each other. You can add or remove web servers in this cluster as needed to handle
scaling traffic demand, which is an example of horizontal scalability. If a server fails, the load balancer will
re-route traffic to other healthy servers, so the service remains available. Server creation can be automated with
configuration management and Infrastructure as Code (IaC) tools like Ansible and Terraform.
The web servers in turn request data from a database cluster. Unlike the web servers, the database servers are not
all identical in function. In particular, only one database is set to receive write operations from the web
application. This server is designated as the primary database. For example, if you operated a news website,
the new articles that your writers published would be added to the database on this server.
The other databases in this cluster act as replicas of the primary database. These servers receive all data added
to the primary database through an asynchronous process. Because this process is asynchronous, write
operations on the primary database are still fast to execute. These servers then receive all read operations from
the web server cluster.
MySQL offers tools to help add new replicas to the cluster when needed, so your applications’ read operations
can be horizontally scaled. However, there is only one primary database in this architecture, so write operations
cannot also be horizontally scaled. Still, this architecture can offer significant benefits to certain kinds
of applications. In the news organization website example, the source of high-traffic demand is from readers
of the website, not from writers adding articles. This traffic pattern aligns with the trade-offs in this setup.
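In a Django project like the one built in Part 2 of this ebook, this read/write split can be expressed with the DATABASES setting and a database router. The sketch below is illustrative: the hostnames, credentials, and routers module path are placeholders, and a production router would typically choose among several replicas rather than a single one.

# settings.py (excerpt): one primary that receives writes, one replica that serves reads.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "appdb",
        "HOST": "primary.example.internal",    # placeholder hostname
        "USER": "appuser",
        "PASSWORD": "app-password",
    },
    "replica": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "appdb",
        "HOST": "replica-1.example.internal",  # placeholder hostname
        "USER": "appuser",
        "PASSWORD": "app-password",
    },
}
DATABASE_ROUTERS = ["routers.PrimaryReplicaRouter"]

# routers.py: send reads to the replica and writes to the primary.
class PrimaryReplicaRouter:
    def db_for_read(self, model, **hints):
        return "replica"

    def db_for_write(self, model, **hints):
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        return True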
There are a few other notable issues with this architecture. First, when a new write operation is received by the
primary database, it immediately persists the updated records. The database does not wait for the replication
process to complete, because that is asynchronous. This means that in the event of a primary database failure,
the replicas might not have the most recently updated records. The primary database is also a single point
of failure for write operations. Still, read operations will continue on the replicas in this scenario,
so a website can continue to display existing content. If the primary database fails, MySQL provides tools
to manually promote one of the replicas to the position of the new primary database, after which write
operations can resume. Several solutions in the MySQL ecosystem automate parts of this replication and
failover work:
· MySQL Group Replication is a replication solution in which the database servers automatically coordinate
with each other. For example, in the event of a primary database outage, a replica server is automatically
promoted to the new primary position. MySQL Group Replication can be configured with a single primary
database, or it can be configured to have multiple primary databases that receive write operations.
However, maintaining multiple primary databases involves other trade-offs.
· InnoDB Cluster bundles a MySQL Group Replication cluster with MySQL Router, which facilitates routing
traffic from the web servers to the database cluster, and MySQL Shell, which is an advanced administration
client for the cluster.
· Galera is a multi-primary database solution where all databases in the cluster can receive write and read
operations using completely synchronous replication between them. It is also compatible with forks
of MySQL like MariaDB and Percona.
To address the limits on scaling write operations and dataset size, an architecture that implements database sharding can be used. With sharding, a single,
large table is split into multiple smaller tables. The process of splitting a table is referred to as partitioning.
When stored across multiple servers, these smaller tables are referred to as shards. The databases in your cluster
each store one of the shards. Together, the databases in the cluster constitute your full data set. For example,
if a human resources application stored employee records for 1,000 companies, but found that the dataset size
was too big for a single table, then the records could be split into two shards representing 500 companies each.
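At the application level, partitioning usually comes down to a deterministic function from a shard key to a shard. Here is a toy sketch for that HR example; the shard hostnames are placeholders, and MySQL NDB Cluster (described next) performs this routing for you automatically.

import hashlib

SHARDS = ["shard-0.example.internal", "shard-1.example.internal"]  # placeholder hosts

def shard_for(company_id: int) -> str:
    """Map a company to one shard deterministically, so its records always live in the same place."""
    digest = hashlib.sha256(str(company_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every employee record for company 42 is written to, and read from, the same shard.
print(shard_for(42))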
MySQL NDB Cluster is a solution that provides automatic sharding for MySQL. The diagram below shows
an example of how a MySQL NDB Cluster would fit into the example web application. This diagram omits the
load balancer, cloud firewall, and web server cluster as those remain the same:
The MySQL NDB Cluster in this diagram provides three MySQL servers that accept read and write SQL commands
from the web servers. The number of these servers can be horizontally scaled to meet demand and provide
failover. These servers do not store data. Instead, they update and retrieve records from a separate set of data
nodes.
The data nodes store shards of the dataset and run the ndbd data node daemon process. Each shard can have
multiple replica nodes, and you can configure how many replicas per shard there should be. The total number
of data nodes is equal to the number of shards multiplied by the number of replicas per shard. In the diagram
above, there are two shards, and two replicas per shard (a primary replica and a secondary replica), for a total
of four data nodes. Having multiple replicas per shard allows for recovery for failing nodes, and this recovery
process is automatic (similar to MySQL Group Replication).
Using sharding can offer very high scaling of dataset size, but it can also make your application logic more
complex. In particular, you need to carefully configure how your data is partitioned into multiple shards, because
this decision impacts overall database performance. Sharding requires a more in-depth understanding of your
database and the underlying infrastructure.
Incorporating Monitoring
Database monitoring comes in a variety of formats, ranging from high-level reporting of system health
to assessing granular operations that could impact application performance.
Monitoring Options
· DBMS monitoring extensions: Extensions or add-ons within the database layer that provide insights
on query efficiency, database connections, and more.
· Cloud provider monitoring tools: Most cloud providers include free monitoring for your database
infrastructure to show metrics like CPU usage and network transfer. Some cloud providers also offer
database monitoring as part of their managed database service.
· External monitoring tools: Database-specific monitoring tools are designed to provide more insight
beyond metrics like your underlying infrastructure’s CPU usage or network traffic. There is a wide variety
of these tools available that fit different workloads. Here are a few examples of free open source database
monitoring tools.
· Prometheus & MySQL Exporter: Built on top of the popular open source infrastructure monitoring
tool, Prometheus, MySQL Server Exporter allows you to create a series of collector flags.
· VictoriaMetrics: A time series database and monitoring solution to help process real-time metrics
using the PromQL, MetricsQL, and Graphite query languages. Best suited for small to medium size
database environments. Deploy via the Linode Marketplace.
· Percona Monitoring & Management: Optimize database performance and track behavior patterns
for MySQL, PostgreSQL, and MongoDB. Deploy via the Linode Marketplace.
If you just want to keep an eye on CPU usage, relying on more standard infrastructure monitoring solutions will
suffice. If you want to be able to observe performance at the query or table level, or check database logs during
very specific points in time, an external monitoring tool will give you additional insight that can benefit your
application.
Section 3
Deploying a Database with
a Managed Database Service
As shown in the previous section, there are many different ways you can deploy databases in the cloud and choose
between automated or manual maintenance. Users and customers have come to expect high-performance
applications as the norm, and a managed solution helps make this faster and simpler to achieve. As a result,
many cloud providers include a managed database service as part of their core offerings, also referred
to as Database as a Service (DBaaS).
Easily deploying a highly-available database via a managed service is a significant benefit for all kinds
of workloads and applications. A highly-available database is typically a cluster of three nodes, which is
composed of a primary database node and two replicas. While undergoing maintenance or in case of node
failure, there is another node available to make sure your application doesn’t experience downtime.
Depending on the cloud provider, a managed database service could include additional tools, including private
networking and custom database monitoring alerts.
In general, managed database services allow you to rapidly scale your database with minimal time spent
on administration. As a result, you can spend more time tuning and developing your application.
If you are planning a migration to a managed database service, keep the following considerations in mind:
· Consider dependencies: Databases are tied to applications, which, in turn, can be tied to other
applications in your environment. Consider dependency mapping if you’re not moving your entire
IT ecosystem in unison.
· Compute resource compatibility: Check to make sure the provider’s compute plans and additional
products (like firewalls or load balancers) are compatible with the managed database service.
If they are not compatible, establish alternatives for those services before migrating.
· DBMS version support: In addition to making sure your preferred DBMS is supported by your cloud
provider’s DBaaS, check which versions it supports. You might need to upgrade your existing database
to a provider-supported version before performing the migration. When upgrading a database to a new
version, perform testing to make sure that your existing database features are compatible with the
new version.
· Network transfer allotment: Depending on the size of your current database and the network transfer
provided by the new cloud provider, migrating a database can be a hefty expense. Carefully examine how
network transfer is calculated, how much is included, and if the cloud provider has any special migration
assistance to reduce (or even eliminate) this upfront cost. Make sure to also estimate any outbound
network transfer costs from your original on premise deployment or cloud provider.
· Steps to connect your new database to your application: If your current database configuration was
set up in the past and not documented, make a list of the necessary tasks to point your application’s
DB connection to your new database deployment. Having this information documented will help mitigate
user impact and avoid downtime. These steps will vary depending on the rest of your application’s
architecture.
Choosing between a cloud provider’s managed database service and a self-managed database deployment
comes down to application requirements and personal preference. If you don’t need to maintain a specific
database version or use a specific operating system, paying for a managed service can save you time and
effort in the long run. As we reviewed in Section 2, there are many ways to set up your database architecture
depending on your application requirements and how you want to prioritize your development time.
Section 4
Managed Databases and the
Alternative Cloud
Alternative cloud providers have become credible competition for Amazon Web Services (AWS), Google Cloud
Platform (GCP), and Microsoft Azure. These cloud providers have historically maintained an oligopoly over the
cloud market. The “Big 3” are now being challenged by smaller alternative cloud providers including Linode
(now part of Akamai), DigitalOcean, and Vultr. Alternative cloud providers now command one-third of all
spending on cloud services, according to Techstrong Research (April 2022).
Alternative providers are comparable in terms of services offered and global availability, and they offer more
competitive pricing. The overall market is responding positively to the viability of alternative providers. This is especially true
for small businesses and independent developers who don’t want or need the mass amounts of auxiliary and
proprietary services offered by the Big 3.
What makes alternative cloud providers a growing threat to the Big 3? They focus on platform simplicity with
core cloud offerings, human-centric support, and competitive pricing. According to a recent SlashData survey,
usage of alternative cloud providers has nearly doubled over the past four years, while usage of the Big 3 (AWS,
Azure, GCP) only grew 18%.
Recent high-profile outages from hyperscale cloud providers have created devastating impacts for production
applications. These outages have driven developers and businesses to consider alternative cloud providers
and to adopt a multicloud strategy to better protect their applications.
The Big 3 cloud providers are excellent at finding pain points in database deployment and management,
then turning those pain points into additional paid services or irresistible features for their proprietary
(and usually expensive) managed database services. Some workloads definitely benefit from this; for example,
a MySQL database deployed using Amazon RDS and paired with Amazon Redshift can provide faster access
to business insights. Also, while you can easily find another cloud provider that provides managed MySQL
database clusters, AWS provides solutions for more wide-ranging storage architectures, like data warehouses.
If you’re looking for a reliable cloud provider to host an industry-standard database as a managed or unmanaged
deployment, and your use case doesn’t require an enterprise-level tool (like the AWS Redshift example), it’s likely
that choosing one of the Big 3 won’t be worth the premium cost.
You can access the same (or better) hardware and network infrastructure with alternative cloud providers for
a fraction of the cost. Independent cloud benchmarking firm Cloud Spectator consistently finds that Linode
and other alternative cloud providers outperform the Big 3 in price-performance and, in many cases, overall
performance.
Knowledge is power. Knowing exactly what to shop for as a cloud consumer and how alternative cloud providers
compare is the key to building more performant and cost-efficient applications.
· Powerful Hardware: A reputable alternative provider will have comparable or better CPUs, storage,
and GPUs when lined up against hyperscale offerings.
· High Service Level Agreement: Look for at least 99.99% guaranteed uptime SLA and a public statement
about data rights that’s easy to find.
· Global Footprint: Look at the alternative provider’s current data center locations and expansion
roadmap, how their pricing differs for each data center, and how network traffic is routed between data
centers.
· 24/7/365 No-Tier Support: A lack of reliable and accessible technical support is a huge factor for why
developers with smaller workloads leave the hyperscale providers. Before making the switch or starting
new services, check the provider’s support services and search for loopholes.
· Extensive Documentation: A provider should have thorough and easy-to-follow documentation on their
platform, as well as more general tutorials on your database of choice, Infrastructure as Code tools,
and the other tech in your stack.
· Security and Compliance Information: Strict security requirements for data centers, networking,
and security-focused products are critical checkboxes when considering alternative cloud providers.
A managed database service is a great fit for use cases that require a relatively hands-off approach to database
maintenance and uptime. If your application doesn’t require a proprietary database service or other tool from
a specific cloud provider, looking to the alternative cloud for a managed service is a cost-effective and risk-averse
approach to your database deployment.
Section 5
Our Take
Performant and secure applications are critical to business growth and continuity and must run in tandem with
highly-tuned databases. To keep databases running fast and smoothly, every level of your infrastructure must
be considered, from hardware specs to consistent patching and maintenance. Fortunately, you no longer need
to be a DBA to manage a performant and highly available database.
The advancement of cloud services and open source tools makes it possible for any developer to streamline the
deployment and maintenance of a database. By eliminating database management overhead, you have more
time to develop innovative applications.
When considering different cloud databases, developers can “choose their own adventure” from a range
of options tailored to their needs and preferences. Alternative cloud providers make this endeavor more
accessible and cost-effective, and Linode empowers developers to focus on their code and build feature-rich
applications, instead of server and database administration. Actively expanding your knowledge of databases
will help you make the right decisions for your application, ensuring that you meet your users’ expectations
for performance, uptime, and security.
Extended eBook
Deploy Django to Linode using
Managed Databases for MySQL
About Linode
Akamai’s Linode cloud is one of the easiest-to-use and most trusted infrastructure-as-a-service
platforms. Built by developers for developers, Linode accelerates innovation by making cloud
computing simple, affordable and accessible to all. Learn more at linode.com or follow Linode on
Twitter and LinkedIn.
Welcome
This book explores how to sustainably and efficiently deploy Django into production on Linode. Each chapter
provides step-by-step instructions and comes with production-ready code available on our GitHub.
Since this book is about deploying Django into production, we’ll limit the amount of manual work and opt for
as much automation as possible. To do this, we’ll focus on these core areas:
• CI/CD with Git, GitHub, and GitHub Actions
• Django on Docker & DockerHub (as well as using WatchTower)
• Load Balancing with NGINX
• Production Databases with Managed MySQL by Linode
• Local/Development use of production-like databases
• Terraform to provision infrastructure on Linode
• Ansible to configure infrastructure on Linode in tandem with Terraform
• Django-based file uploads & Django static files on Linode Object Storage
This book focuses on getting Django into production, but most of what we cover can be adapted for many other
technologies as well. The reason? Our Django project will run through Docker with Docker containers. Docker is
a massive key to putting many, many different applications in production. It’s a great tool, and we’ll use it as the
backbone that Django runs on.
Before all the branded names for the tech stack start to scare you off, I want you to remember this whole thing
is just a bunch of documents and nothing more. Just because it’s called Docker or Python or Terraform,
they are still all just documents that describe... something.
With that in mind, I want you to think of the above list like this:
☐ GitHub: A place to store our code, almost like a folder on your computer
☐ GitHub Actions: An app that automatically starts a chain of events related to testing (verifying the
document does what the writer intended), building (let’s put a stamp on this code and say that this
version is done), and running our code in production (find the version of the code that is ready and should
run best, and run that)
☐ Django: A bunch of Python documents to handle the logic of an HTML website, store data, and handle
user data
☐ Docker: Essentially a document that we write to describe, on the operating system level, how our Django
app needs to run. Docker itself does more than this, but we mostly work on the document part
(a Dockerfile )
☐ Load Balancing with NGINX: A document that we write to tell nginx to send incoming website traffic
to the different servers running our Docker-based Django app
☐ Managed databases: a powerful spreadsheet that you never need to open or touch, thanks to Linode
and Django
☐ Terraform: A document describing all of the products and services we need from Linode (like virtual
machines) to run our Django project
☐ Ansible: A few documents we use to tell all of the products and services what software our Django
project needs to run
☐ Linode Object Storage: A file system (sort of) that allows us to store files that don’t change very often
like images and videos, as well as other documents like CSS and some kinds of JavaScript
If you read the above and thought to yourself, “well that’s overly simplistic” -- you’re right!
If you read the above and thought to yourself, “oh no, what am I doing with my life” -- you’re right!
The nice thing about what we’re doing here is taking the guesswork out of making this list work, work correctly,
and work reliably. Most of the technology we will cover here has been well established and will likely remain
unchanged for the foreseeable future.
Requirements
Software
Are you new to Python? Watch any of the video series on https://cfe.sh/topics/try-python
Are you new to Django? Watch any of the video series on https://cfe.sh/topics/try-django
That’s it. You just need some Python experience and some Django experience to do this book. If you’re new
at both and you do this book anyways, let me know how it goes: https://twitter.com/justinmitchel
I’m a big fan of Just In Time learning. That means learning a concept right before you need to use it. My parents
literally named me after this principle. That’s obviously not true, or is it?
Hardware
It’s important to note that almost all of this can be done just using GitHub and Linode, although I recommend
a computer where you have root access (admin access).
If you have access to the internet, you can use the in-browser coding experience on GitHub. What’s that,
you say? Just press period ( . ) on any GitHub repo, and it will give you the ability to write code inside of an
in-browser code editor.
1. https://github.com/codingforentrepreneurs/cfe-django-blog
2. https://github.com/codingforentrepreneurs/deploy-django-linode-mysql
The first repo is what we will be using as our base Django project. This Django project will evolve outside the
context of this book.
The second repo will be the final state as it relates to this book. This repo will only change if there are errors
or updates needed with the book and/or videos series.
We have all of our projects open-sourced on codingforentrepreneurs GitHub so you can use that as a resource
to learn from as well!
Asking Questions
The best way to get help is to formulate the question you have to be as clear and non-specific as possible.
Let’s say you get this error:
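For example, something like this (the exact traceback will vary, but the last line is the part that matters):

$(venv) python manage.py runserver
...
ModuleNotFoundError: No module named 'django'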
Before you ask for help, consider asking yourself the following:
☐ What did I run to get here?
☐ Did it work before? If so, did I skip a step to make it work?
☐ If someone knew nothing about my project, could they answer what is going on with this error?
☐ Did I quit my current session? Restart my computer?
☐ Should I just google the error I see ModuleNotFoundError: No module named 'django' ?
☐ What are 3 or 4 ways that I can ask these questions?
I have found that asking myself the above questions often solves the problem for me, but most of the time,
I find solutions in two places: Google and StackOverflow.
Aside from trying to self-diagnose the solution, asking for help often forces you to reframe the question and
thus find an answer during that process. I encourage you to ask questions whenever you get stuck. I don’t love
answering the same question over and over, but when I do, I realize that it’s an opportunity for me to explain
a given concept or problem better.
This book is the result of me getting a lot of questions over the years: questions asked by students
I’ve taught, questions I’ve asked myself, and questions that other people are asking each other.
Questions are the foundation for finding more questions. Sure, we get answers along the way, but more and
better questions are ultimately about how we find excellence and perhaps meaning, in our lives.
Let’s not forget that the answer to the question in the previous section is simple: either Django is not installed,
or you didn’t activate the virtual environment, or both! Pat yourself on the back if you knew this, or pat yourself
on the back because you now know this. Either way, give yourself a pat on the back.
Appendix
Appendix A: GitHub Actions Secrets Reference
Appendix B: Final Project Structure
Appendix C: .gitignore and .dockerignore Reference Examples
Appendix D: Self-Hosted Runners with GitHub Actions
Appendix E: Using the GitHub CLI for Setting Action Secrets
Chapter 1
Getting Started
As with most web applications, I like starting with the application itself. This is certainly the most fun as well
as the part you’ll modify the most over the lifetime of the project. Applications are what you get to change
constantly and evolve to customers’ and users’ requests. Infrastructure has a key word in it, structure,
which implies it’s stable. Or at least stable as in not changing often.
Soon enough, we’ll be spending a lot of time ensuring our CI/CD pipeline is set up so that our infrastructure is, too.
For now, let’s get our Django application in play.
• Option 1: Clone a pre-existing Django project
• Option 2: Create a fresh Django project
Taking this route assumes that you already know how to build a Django project and use a virtual environment. If you
don’t, skip to Option 1 further in this chapter.
1. Clone Repo
Create
Don’t use built-in venv? Know how to create your own virtual environment? Do it. I can’t see what you’re doing.
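If you are sticking with the built-in venv module, creating the environment from inside the cloned project folder typically looks like this:

python3 -m venv venv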
Activate on macOS/Linux
source venv/bin/activate
Activate on Windows
.\venv\Scripts\activate
Install
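Assuming the cloned project’s requirements.txt, installing its dependencies inside the activated environment looks like this:

$(venv) python -m pip install -r requirements.txt

Then create a .env file in the project root with the following keys: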
# required keys
DJANGO_DEBUG="1"
DJANGO_SECRET_KEY="gy_1$n9zsaacs^a4a1&-i%e95fe&d3pa+e^@5s*tke*r1b%*cu"
DATABASE_BACKEND="mysql"
# mysql db setup
MYSQL_DATABASE="cfeblog-m-db"
MYSQL_USER="cfeblog-m-user"
MYSQL_PASSWORD="RaSNF5H3ElCbDrGUGpdRSEx-IuDzkeHFL_S_QBuH5tk"
MYSQL_ROOT_PASSWORD="2mLTcmdPzU2LOa0TpAlLPoNf1XtIKsKvNn5WBiszczs"
MYSQL_TCP_PORT=3307
MYSQL_HOST="127.0.0.1"
You should change these values as needed. DJANGO_DEBUG must not be 1 in production.
3. Test variables are working
$(venv) python
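Inside that Python shell, you can spot-check that the values load as expected; a minimal check, assuming the python-dotenv package from our requirements, looks like this:

>>> import os
>>> from dotenv import load_dotenv
>>> load_dotenv()  # reads .env from the current working directory
True
>>> os.environ.get("MYSQL_DATABASE")
'cfeblog-m-db'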
This is for all you newbies out there that are looking to cross the chasm and get your Python skills to the next
level. Let's create a blank Django project so the rest of this book will build on top of that blank project.
If you are new to Django and you're doing this chapter, I applaud you. To me, getting your project into production
is just an amazing feeling and one worth celebrating. Give yourself a high five.
If Python 3.10 is not available on your machine, you can use Python 3.7 or later although some packages may not
be supported in older versions of Python.
mkdir -p ~/dev/cfeproj
Python virtual environments exist to isolate your python projects from one another so code versions do not
cause conflicting issues. For this book, I will be using the built-in venv module. If you prefer pipenv , poetry ,
virtualenv , virtualenvwrapper , or any other virtual environment manager, you can use those. This book
just assumes you have a virtual environment running and installed for your project.
macOS/Linux and Windows differ slightly here. Also note that C:\Python310 is not always going to be the default
location of Python 3.10 if you install from python.org.
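Assuming the built-in venv module, the creation commands generally look like this; adjust the path to match where Python 3.10 lives on your machine:

macOS/Linux
python3.10 -m venv venv

Windows
C:\Python310\python.exe -m venv venv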
Every time you want to work on your project, you must activate the virtual environment:
macOS/Linux
source venv/bin/activate
Windows
.\venv\Scripts\activate
macOS/Linux
$(venv)
Windows
(venv)>
From here on out, you can use just python as your command:
macOS/Linux
$(venv) python -V
Windows
(venv)> python -V
Do you see a pattern? Activating your virtual environment removes the need to write venv/bin every time you
need to run a virtual-environment-only command.
5. Initialize Git
Version control is an absolute must and git is the best way to do it. This book was written using Git.
What is Git? Basically, a ~~nerdy~~ effective way to keep a record of file changes. Here’s a rough example of
a workflow:
1. Open a file.
2. Type something.
3. Save the file.
4. Tell Git about it.
5. Delete the file.
6. Ask Git for it, the file is back.
7. Open the file. Add stuff.
8. Tell Git about it.
9. Share the file with a friend.
10. Friend ruins file and tells Git about it.
11. You un-friend friend.
12. Ask Git to revert the file back to before you gave it to the friend. Git does.
The key is that Git tracks changes to files (if you tell it to) and allows you to look at those files at any state in the
past. It’s neat.
Version control can get very complex, especially when you’re on a large team and/or a large project. In this
book, I will not be touching on how complex Git can get. I’ll mostly follow the lame example I laid out above.
git init creates what's called a "repository" in the folder you call it in. This is often referred to as a repo .
This repo is what tracks all of our changes over time. If you are super curious, take a look in the .git folder
inside your project right now.
Now we can track our project pretty simply with the following commands:
1. git add .
2. git commit -m "your message about the changes"
3. git push --all
1. I made changes to my current project’s code ( git add . or git add --all or git add path/to/some/file
or git add path/to/some/dir/ )
2. I am sure I want them tracked by Git and here’s what I did: "your message about the changes"
( git commit -m "some msg" )
3. I want this code pushed to a remote repo like GitHub. This push often means pushing this code into
production. ( git push --all or git push origin master or git push origin main )
If you know git well, you’ll know that what I wrote above is pretty simplistic and skips a lot about the power
of using git . The point of this section is to help beginners get the job done.
If you are a beginner in git I recommend using any of the following tools:
• VS Code has built-in support for managing git and file changes. It’s a lot more visual than the git
CLI tool.
• GitHub Desktop: I don’t use this one myself, but many, many developers do and rely on it to manage their
git repos.
Create .gitignore
It’s important for git to ignore some files and some file changes. A .gitignore will do that for us.
For example, we do not want to track changes within venv or track files that contain passwords (like .env )
so we’ll do the following:
cd ~/dev/cfeproj
echo "venv" >> .gitignore
echo ".env" >> .gitignore
echo "*.py[cod]" >> .gitignore
echo "__pycache__/" >> .gitignore
echo ".DS_Store" >> .gitignore
This is a cross-platform friendly way to add items to a file. Here's the resulting file:
.gitignore
venv
.env
*.py[cod]
__pycache__/
.DS_Store
There's a lot more we can add to a .gitignore file. Consider reviewing GitHub's Python .gitignore template for more ideas.
6. Install Requirements:
Okay, now that we have version control set up, we'll need to start adding project dependencies, which Python calls
requirements (since they're often stored in requirements.txt ).
• django>=3.2,<4.0 : This gives us the latest long-term support version of Django (known as the most
recent LTS)
• gunicorn : our production-grade web server
• python-dotenv : for loading environment variables in Django
• black (optional): for auto-formatting code
• pillow : a friendly fork of the Python Imaging Library (PIL), mainly needed for the ImageField in Django
• boto3 : for using Linode Object Storage with Python
• django-storages for using Linode Object Storage with Django (depends on boto3 )
• mysqlclient : for connecting Django to a production-grade MySQL database.
To install these packages, we'll create requirements.txt in the root of our project with the following
contents:
django>=3.2,<4.0
## our production-grade webserver
gunicorn
## loading in environment variables
python-dotenv
# for formatting code
black
## for image file uploads in Django
pillow
## for leveraging 3rd party static/media file servers
boto3
django-storages
## for mysql databases
mysqlclient
Upgrade pip :
Pip likes to be upgraded from time to time. Let's do that now to start on a good note.
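Assuming the standard pip workflow inside the activated environment, the upgrade and install steps look something like this:

$(venv) python -m pip install --upgrade pip
$(venv) python -m pip install -r requirements.txt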
Verify the installed packages (for example, with python -m pip freeze ); the output should look something like this:
asgiref==3.5.0
black==22.3.0
boto3==1.21.28
botocore==1.24.28
click==8.1.0
Django==3.2.12
django-dotenv==1.4.2
django-storages==1.12.3
gunicorn==20.1.0
jmespath==1.0.0
mypy-extensions==0.4.3
mysqlclient==2.1.0
pathspec==0.9.0
Pillow==9.0.1
platformdirs==2.5.1
python-dateutil==2.8.2
pytz==2022.1
s3transfer==0.5.2
six==1.16.0
sqlparse==0.4.2
tomli==2.0.1
urllib3==1.26.9
Please note that some versions and packages will be different on your machine. What matters is that the packages
listed in requirements.txt (aside from version numbers) are the same.
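If you are creating the project fresh, this is where Django's startproject command comes in. The project name cfeproj matches the layout below, and the trailing dot keeps manage.py at the root of the folder we made earlier:

$(venv) django-admin startproject cfeproj .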
Let's take a look at what our current directory looks like (without venv ):
.
cfeproj
__init__.py
asgi.py
settings.py
urls.py
wsgi.py
manage.py
requirements.txt
The above file layout was created using tree --gitignore ( brew install tree )
Boot it up
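To confirm everything is wired up, start Django's development server and open the URL it prints (by default http://127.0.0.1:8000 ):

$(venv) python manage.py runserver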
8. Clone Project
Now that you can create a Django project from scratch, it’s time to copy one that I have already created for you.
This book isn’t going to explore everything about Django, but it will give you a hands-on practical example
of bringing a real Django project into production.
If you feel like you could use a bit more time learning the basics of Django (which I highly recommend if you’re
lost at all at this point) please consider watching any version of my Try Django series on
https://cfe.sh/topics/try-django or my youtube channel https://cfe.sh/youtube.
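One straightforward way to copy the pre-built project is to clone it into a fresh folder and then repeat the virtual environment and requirements steps from earlier in this chapter:

git clone https://github.com/codingforentrepreneurs/cfe-django-blog
cd cfe-django-blog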
Chapter 2
Git & GitHub Actions
As we write code, it’s important to leverage version control, also known as git . If you don’t have Git installed
and set up on your local machine with your Django code, go back to Getting Started and locate the initialize Git
section.
Using version control allows us to move our code around and use services like GitHub, GitLab, Bitbucket,
and many others. In this book, we’re going to be working around GitHub and GitHub Actions.
GitHub is a place to store code. GitHub Actions is a place to run workflows & pipelines mostly based on this code.
GitHub Actions enable us to do what’s called Continuous Integration and Continuous Delivery (also known
as CI/CD).
CI/CD is the practice of constantly improving your app by automating the stages of development and
deployment.
For us, CI/CD means that we’ll use GitHub Actions to run the following pipeline (in this order too):
1. Update/Store/Retrieve our Production Secrets & Environment Variables
2. Test our Django code against an Ephemeral Test MySQL Database ( python manage.py test )
3. Add our Django Static Files to Linode Object Storage ( python manage.py collectstatic )
4. Build our Docker container for our Django Application ( docker build -t myuser/myapp -f Dockerfile . )
5. Push our Docker container into DockerHub ( docker push myuser/myapp --all-tags )
6. Provision any infrastructure changes via Terraform ( terraform apply -auto-approve )
7. Configure our infrastructure via Ansible ( ansible-playbook main.yaml )
8. Promote our Docker-based Django app into Production via Ansible ( ansible-playbook main.yaml )
In this chapter we’re going to focus on the following pieces of the above CI/CD pipeline:
• Creating a GitHub repo (repository)
• Adding/updating GitHub Secrets
• Creating our first GitHub Actions workflow for testing Django
The remainder of the book will be dedicated to building out our CI/CD pipeline in GitHub Actions.
Assumptions
• You have Django code locally as we did in this section
• You have Git installed and initialized as we did in this section
• You already have a .gitignore as we did in this section
• You have a GitHub account (if not, sign up on https://github.com)
After we have code, it’s important to start working with version control and GitHub. If you’re not going to use
GitHub my next recommendation would be GitLab. Doing this process in GitLab is similar, but we won’t cover
it in this book.
This chapter will set up an automated way to test our Django code after we push it to GitHub.
$ git remote -v
We do not need this repo attached any longer, let’s remove it:
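Assuming the original remote is named origin (the default when cloning), that looks like:

git remote remove origin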
Options:
• Repository name: Pick a name or use Deploy Django
• Public or Private (I like to pick private unless I need to share)
• Leave everything else blank, such as README , .gitignore , and license , because our cloned project
in Getting Started already has these things.
You can make this new repository private or public. I often stick to private for my production workflows that
don’t need to be exposed to the public.
• https://github.com/your-username/your-private-repo
• https://github.com/your-username/your-private-repo.git
Typically when you leave README , .gitignore , and license blank, you should see something like this:
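That quick-setup screen boils down to commands along these lines; substitute your own repository URL:

git remote add origin https://github.com/your-username/your-private-repo.git
git branch -M main
git push --set-upstream origin main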
Using git push --set-upstream origin main is nice because then you can just run git push instead of
git push origin main
If you go to your repo and then the Actions tab (under https://github.com/your-username/your-private-repo/actions )
you should see a workflow already running.
1. Test our Django code with a MySQL Database ( python manage.py test )
2. Add our static files to Linode Object Storage ( python manage.py collectstatic )
3. Build a Docker container for our Django app ( docker build )
4. Login & push our container to Docker Hub ( docker push )
5. Terraform to apply changes (if any) to infrastructure via our GitHub repo Secrets ( terraform apply )
6. We’ll use Ansible to configure (or reconfigure) our infrastructure via our GitHub repo Secrets
( ansible-playbook main.yaml )
Each step above will be discussed in detail in future chapters so we’ll skip a lot of the discussion and get right
down to action, GitHub Action (Dad jokes for the win!).
My approach is to create 7 workflows to handle the 6 steps. Each step will have its own workflow with some
overlaps as needed.
A workflow is simply a series of steps GitHub Actions needs to take. The format is in yaml so it might be a bit
strange to you but I promise it will make sense after you use it for some time.
Every GitHub Action workflow uses the yaml format. It is stored in .github/workflows right next to your
code and your .git folder. The workflow file looks essentially like this:
yaml
name: Give me a Name, Any Name
on:
  workflow_call:
  # Allows you to run this workflow manually from the GitHub Actions tab
  # and the GitHub API
  workflow_dispatch:
  push:
    branches: [main]
  pull_request:
    branches: [main]
# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  my_first_workflow_job:
    runs-on: ubuntu-latest
    # This allows you to change the directory you want this job to run in
    defaults:
      run:
        working-directory: ./src
    env:
      HELLO_WORLD: true
    # Add in a service (like MySQL, Postgres, Redis, etc) you need running during
    # your "my_first_workflow_job" job
    services:
      mysql:
        image: mysql:8.0.28
    steps:
      # ... your job's steps go here (see the full test workflow below)
Like with everything, I believe in Just-In-Time learning. GitHub Actions is just a tool that follows a yaml
document’s configuration. VS Code has GitHub workflow support. (GitHub workflow is the name of the
document that defines a GitHub Action.)
As far as I can tell, you can have an unlimited number of workflow files in your GitHub repo. There is a limit to the
number of jobs you can run concurrently, but the number of files seems to be uncapped.
Understanding every nuance of the GitHub Action workflow file is not a trivial task. The GitHub Actions
documentation is fantastic and worth a look.
yaml
on:
  workflow_call:
  workflow_dispatch:
  push:
    branches: [main]
  pull_request:
    branches: [main]
These are all GitHub Actions triggers. They signal to GitHub Actions when this workflow should run. There are many more trigger options but these, I would argue, are by far the most commonly used.
These are the only triggers we'll end up using throughout this book.
The key with this workflow is that we run python manage.py test . If the tests fail, the workflow fails.
Once we have everything configured, this means the rest of our CI/CD pipeline will not run.
cd path/to/your/project
mkdir -p .github/workflows/
Using -p with mkdir means it will create both folders here. This command should also work in PowerShell
on Windows
If you cloned our project in [Getting Started](./02-getting-started.md), then you may already have:
• .github/workflows/test-django-mysql.yaml
• .github/workflows/test-django-postgres.yaml
yaml
# A workflow run is made up of one or more jobs that can run sequentially
# or in parallel
jobs:
  django_mysql:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./
    env:
      MYSQL_DATABASE: cfeblog_db
      MYSQL_USER: cfe_blog_user # do *not* use root; cannot re-create the root user
      MYSQL_ROOT_PASSWORD: 2mLTcmdPzU2LOa0TpAlLPoNf1XtIKsKvNn5WBiszczs
      MYSQL_TCP_PORT: 3306
      MYSQL_HOST: 127.0.0.1
      GITHUB_ACTIONS: true
      DJANGO_SECRET_KEY: test-key-not-good
      DATABASE_BACKEND: mysql
    services:
      mysql:
        image: mysql:8.0.28
        env:
          # credentials for the MySQL service container; they mirror the job-level env above
          MYSQL_DATABASE: cfeblog_db
          MYSQL_ROOT_PASSWORD: 2mLTcmdPzU2LOa0TpAlLPoNf1XtIKsKvNn5WBiszczs
        ports:
          - 3306:3306
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Setup Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.10"
      - name: Install requirements
        run: |
          pip install -r requirements.txt
      - name: Run Tests
        env:
          DEBUG: "0"
          # must use the root user on MySQL to run tests
          MYSQL_USER: "root"
        run: |
          python manage.py test
If any of the above steps fail, the workflow will fail and alert us (this includes emailing our GitHub account).
These failures are great because they halt a push into production with code that doesn't pass tests.
How amazing is that?
In this workflow, we saw some Django-template-like syntax: ${{ env.DATABASE_BACKEND }} . This is how we can do string substitution within a workflow. In this case, ${{ env.DATABASE_BACKEND }} refers to the jobs block, the django_mysql job, the env block, and finally DATABASE_BACKEND .
Another way I look at this path is:
• jobs:django_mysql:env:DATABASE_BACKEND
or
• jobs["django_mysql"]["env"]["DATABASE_BACKEND"]
Now, everything in jobs:django_mysql:env: is hard-coded to this workflow. In the Action Secrets section,
we'll discuss how to abstract these variables away into secrets.
It's important to keep in mind that what you can do with GitHub Actions is nearly unlimited. What I have in this
book is simply an approach that has served me very well on dozens of projects and I hope it does the same for
you. If you find an approach that works better, please use it and tell me about it!
https://github.com/your-username/your-private-repo/actions
Replace your-username and your-private-repo with the correct values from when you created a new repo
above.
At this point, you should see the 1- Test Django & MySQL workflow listed. With this workflow listed you can
do the following:
• Run this workflow automatically when you do git push origin main (because of the push: branches: [main] configuration)
• Run this workflow manually (one-off) when you click "Run Workflow" in the GitHub Actions tab
(because of the workflow_dispatch: configuration)
• Run this workflow automatically from another workflow inside your repo (more on this later.)
(because of the workflow_call: configuration)
If you don't see this workflow listed, here are some possible reasons:
• You didn't save it in the correct location (it should be under .github/workflows )
• You didn't save it with the correct file extension (it should be .yaml or .yml )
• You didn't change the git remote origin to your repo correctly
• You didn't commit and/or push your code to GitHub correctly
The last workflow we'll create in this chapter is the .github/workflows/all.yaml workflow. This one will
be the primary entry point to all other workflows. The reason is simple: I want to ensure the workflows run in the
order I need them to run in and if any workflow fails, the entire list stops in its tracks.
1. Run this
2. Then this
3. Then this
4. If this fails, stop for good
5. (Won't be run, 4 failed)
6. (Won't be run, 4 failed)
7. (Won't be run, 4 failed)
8. (Won't be run, 4 failed)
9. (Won't be run, 4 failed)
10. (Won't be run, 4 failed)
11. (Won't be run, 4 failed)
12. (Won't be run, 4 failed)
yaml
name: 0 - Run Everything
on:
  workflow_dispatch:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  test_django:
    uses: ./.github/workflows/test-django-mysql.yaml
Now if you go to your GitHub Actions tab on your repo, you should see that all.yaml is running and if you click
on the running job, you should see the jobs it triggered.
If you see multiple jobs listed (one per Python version, such as 3.8, 3.9, and 3.10), this is because we have multiple versions of Python related to these blocks in ./.github/workflows/test-django-mysql.yaml :
yaml
strategy:
  matrix:
    python-version: ["3.8", "3.9", "3.10"]
# Steps represent a sequence of tasks that will be executed as part of the job
steps:
  # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
  - name: Checkout code
    uses: actions/checkout@v2
  - name: Setup Python ${{ matrix.python-version }}
    uses: actions/setup-python@v2
    with:
      python-version: ${{ matrix.python-version }}
In the Replace the test-django-mysql.yaml file section, we do not have this code. That’s because there is no
need to test multiple versions of Python if we have no intention of using them in production.
In future chapters, we’re going to be adding new workflows to this project. Whenever we need that new
workflow to run automatically, we need to update all.yaml much like this:
yaml
name: 0 - Run Everything
on:
  workflow_dispatch:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  test_django:
    uses: ./.github/workflows/test-django-mysql.yaml
  test_django_again:
    needs: test_django
    uses: ./.github/workflows/test-django-mysql.yaml
  test_django_again_again:
    needs: test_django_again
    uses: ./.github/workflows/test-django-mysql.yaml
This gives us three jobs:
• test_django
• test_django_again
• test_django_again_again
Each job will run the workflow I designate in the uses key. In this case, each job runs the same workflow
( ./.github/workflows/test-django-mysql.yaml ) which is fine because this example is here to illustrate
the other key: needs . Let's define how needs will work in this one:
• test_django does not have a needs key, so it will run whenever all.yaml runs.
• test_django_again needs test_django to succeed, otherwise this job will not run. That's because needs: test_django is declared in this job.
• test_django_again_again needs both test_django and test_django_again to succeed, otherwise this job will not run. That's because needs: test_django_again is declared in this job AND because needs: test_django is declared in the test_django_again job.
Over the course of this book, we'll add the following GitHub Action Secrets to our repo:
• DJANGO_SECRET_KEY
• MYSQL_USER
• MYSQL_PASSWORD
• MYSQL_ROOT_PASSWORD
• MYSQL_HOST
• LINODE_BUCKET_ACCESS_KEY
• LINODE_BUCKET_SECRET_KEY
• DOCKERHUB_USERNAME
• DOCKERHUB_TOKEN
• LINODE_OBJECT_STORAGE_DEVOPS_ACCESS_KEY
• LINODE_OBJECT_STORAGE_DEVOPS_SECRET_KEY
• ROOT_USER_PW
• SSH_DEVOPS_KEY_PUBLIC
• SSH_DEVOPS_KEY_PRIVATE
• And the many others as outlined in Appendix A
If done correctly, storing secure data in GitHub Action Secrets is generally considered safe. Just remember that using non-official third-party actions in your workflows can leave you open to potential security threats. Security is a never-ending battle, so be sure to stay vigilant and learn from as many sources as possible about how to better secure your secrets with GitHub Actions.
Aside from what is in this book, further security efforts are encouraged.
Once you have a key added, you can reference it in a GitHub Action using:
${{ secrets.ALL_CAPS_WITH_UNDERSCORES_FOR_SPACES }}
In ./.github/workflows/test-django-mysql.yaml change:
yaml
DJANGO_SECRET_KEY: test-key-not-good
to:
yaml
DJANGO_SECRET_KEY: ${{ secrets.DJANGO_SECRET_KEY }}
You can also change ${{ env.DJANGO_SECRET_KEY }} to ${{ secrets.DJANGO_SECRET_KEY }} but you
don't have to.
First off, secrets will not be passed to forked repos that submit pull requests. That means people who are not
collaborators can’t just run your workflows and get your secrets.
So this:
yaml
on:
  workflow_call:
Becomes this:
yaml
on:
  workflow_call:
    secrets:
      DJANGO_SECRET_KEY:
        required: true
Then any workflow that calls this one must explicitly pass the secret in:
yaml
jobs:
  test_django_with_secret:
    uses: ./.github/workflows/test-django-mysql.yaml
    secrets:
      DJANGO_SECRET_KEY: ${{ secrets.DJANGO_SECRET_KEY }}
We'll repeat this process several times going forward. This is a super easy step to forget but, luckily for us, GitHub
Actions will let us know when we do.
In .github/workflows/all.yaml , change:
yaml
jobs:
  test_django:
    uses: ./.github/workflows/test-django-mysql.yaml
to:
yaml
jobs:
  test_django:
    uses: ./.github/workflows/test-django-mysql.yaml
    secrets:
      DJANGO_SECRET_KEY: ${{ secrets.DJANGO_SECRET_KEY }}
You can also run workflows locally using nektos/act , for example:
act -j test_django
And many other commands like it. Learning act is outside the scope of this book, but it's important to know
about it to maintain as much portability as possible!
Chapter 3
Containerize Django with Docker
Let's assume I know nothing about setting up Python and Python virtual environments, but I am decent with using the command line. With this in mind, I ask you to send me your Django project so I can test it out on my local computer. What do you do?
With Docker, your instructions are short:
• Install Docker (such as Docker Desktop), as it is free and works on macOS, Linux, and Windows (you might need to use Windows Subsystem for Linux).
• Open the command line and run docker run my-dockerhub-repo/my-django-container
Without Docker, the instructions look more like this:
• Install Python (Wait, which version? From where?)
• Clone the code with Git (Is Git installed? Do you know how to use Git?)
• Create a virtual environment (What's that?)
• Activate the virtual environment
• Install requirements (Wait, is it pipenv , poetry , or what?)
• Run Django
As you can see, running Django from zero is a lot more complex than with Docker. The above scenario is true for almost every application written in just about any language. Running a Node.js/Express.js app? Same challenge. Ruby/Ruby on Rails? Same thing. Java/Spring? Same.
Setting up an environment to run your code is challenging enough locally, but even more so in production, since it must happen every time you deploy a new version and all dependencies must be installed correctly, otherwise catastrophic errors can (and will) occur. As you may know, not all versions of Python (or Node, or Ruby, or Java) will even be readily available on your servers. Generally speaking, if Docker is on the machine, your Docker apps can run.
Docker apps (or Docker containers) are isolated environments at the OS level instead of just the Python level like
Virtual Environments. This means that Docker can run nearly any version of Python (yes, probably even Python
2.1) right next to any other version of Python (like Python 3.11) on the exact same server while the server is none
the wiser.
Is Docker Easy?
At first, Docker seems incredibly hard because there’s so much that Docker can do. Then you start hearing about
Docker Compose, Docker Swarm, Kubernetes, Serverless, and other new terms. You hear about them because
Docker is fundamental to them. You unlock a world of opportunity once you start running your apps through
Docker.
I think of Docker as just another document with a list of logical instructions. Here’s a simple Dockerfile
(the Docker instructions document) that will work with your Django app in your local development environment:
dockerfile
FROM python:3.10.4-slim
COPY . /app
WORKDIR /app
RUN python3 -m venv /opt/venv
RUN /opt/venv/bin/pip install pip --upgrade && \
/opt/venv/bin/pip install -r requirements.txt
CMD /opt/venv/bin/gunicorn cfeblog.wsgi:application --bind "0.0.0.0:8000"
This example is a non-production Dockerfile , but it's not far off. What I did here is almost exactly what
I described before about how sharing a Django app without Docker works:
• Install Python: FROM python:3.10.4-slim
• Clone code: COPY . /app
• Create a Virtual Environment: RUN python3 -m venv /opt/venv
• Activate Virtual Environment: not needed because I just use /opt/venv/bin instead
  • Review this section if this part is unclear
• Install Requirements: RUN /opt/venv/bin/pip install pip --upgrade && /opt/venv/bin/pip install -r requirements.txt
• Run Django: /opt/venv/bin/gunicorn cfeblog.wsgi:application --bind "0.0.0.0:8000"
  • gunicorn is a production-ready way to run Django; python manage.py runserver is not.
Before I used Docker regularly, I would look at a Dockerfile like this and think it adds yet another layer of complexity on top of a potentially already complex Django project. Now, I have concluded that it might feel very complex initially, but in the long run, it makes our lives much less complex, especially when you need portability and repeatability.
There are many reasons to want the portability Docker provides. Here are just a few good ones:
• Switch your hosting provider within hours instead of days, weeks, or months.
• Share with team members and/or other businesses without touching production or requiring specific
knowledge other than docker run .
• Testing production code on non-production machines/hosts.
• Horizontally scale your virtual machines to meet demand with ease.
• Move to a managed serverless architecture that runs on Docker containers & Kubernetes.
• Create a custom Kubernetes cluster to host all of your Django / Docker projects.
• Faster recovery when servers and/or data center running your code go down and cannot recover.
• No more using Dave's custom bash spaghetti scripts when Dave left the team 4 years ago.
• Your Platform as a Service (PaaS) leaks all credentials in their service and you need to migrate real quick.
Those are just a few examples of when you might want to use Docker. I'm sure there are more.
If you need more convincing (or if you're already convinced), let's write some actual code to see how simple
it is and why using Docker is the right call for production.
1. Go to Docker Desktop
2. Download for your operating system (macOS, Windows, Windows Subsystem for Linux (WSL), or Linux)
3. Complete the installation process for your machine
Think of Docker containers much like you think of smartphone apps -- you can easily delete them, install again,
delete again, install again, delete again, as often as you like. You can repeat this for nearly any app and for nearly
any number of times all while your data for the app/service remains intact (thanks to managed Databases and
Object Storage -- these are covered later in this book).
How this works with Django, FastAPI, Flask, Express.js, and Ruby on Rails, is essentially the same:
The programming language we use does not matter to Docker at all, as long as it runs without causing major
errors on the system. This is exactly like how your macOS or Windows systems can always run the exact same
apps. (Can you read the sarcasm?)
What about PostgreSQL, MySQL, Redis, MongoDB, and Cassandra -- should we use Docker for those? Even with
Docker containers being ephemeral, you can still run databases with Docker containers by attaching persistent
volumes (non-ephemeral data storage volumes) but we’re going to skip that layer of complexity in this book.
For our persistent data, we'll use managed Linode services for databases (Chapter 5) and object storage (Chapter 7).
If you cloned our project in the Getting Started Chapter, you will already have the following docker files:
• Dockerfile
• docker-compose.yaml
You should now remove these files. We are going to replace these files with the contents of this book. If, for some
reason, you’re having issues with the code within the book use this repo as your primary reference for the
absolute final code from this book.
dockerfile
FROM python:3.10.3-slim
# install OS-level dependencies (the locales package is needed for the locale setup below)
RUN apt-get update && \
    apt-get install -y \
    locales \
    libmemcached-dev \
    default-libmysqlclient-dev \
    libjpeg-dev \
    zlib1g-dev \
    build-essential \
    python3-dev \
    python3-setuptools \
    gcc \
    make && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
ENV PYTHON_VERSION=3.10
ENV DEBIAN_FRONTEND noninteractive
# Locale Setup
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
RUN sed -i -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen \
    && locale-gen
RUN dpkg-reconfigure locales
This file is a bit more complex than our earlier example, but that's only because we added a few OS-level configuration items to ensure the stability of this app and Python 3.10.
If this dockerfile is failing you, please review the code that we have hosted on our GitHub repo for this book
at Deploy Django on Linode with MySQL and not the repo for the code we cloned in this section (the book code
and cloned repo will differ slightly).
gunicorn (docs) is the de facto standard for running Python web applications in production. You can use it with Django, FastAPI (with a uvicorn worker), Flask, and many others.
The python manage.py runserver command, as you likely know, is not suitable for production. Instead, we want to use gunicorn since it's
a production-ready Python Web Server Gateway Interface (WSGI). gunicorn makes it easy for Django ,
Flask , FastAPI , or any other Python web application to understand web requests. In other words, nearly
every Python web application can use gunicorn in production; gunicorn can even run with a single Python
file that has no web framework and still be able to handle HTTP requests!
To replace our development command with our production-ready command, it will be:
gunicorn mydjangopro.wsgi:application
Windows users: At the time of this writing, gunicorn does not work natively on Windows but will work
in Docker containers.
The mydjangopro.wsgi:application is going to be custom per project, per framework. Let’s take a look
at the wsgi.py file in our Django project:
python
"""
WSGI config for mydjangopro project.
"""
import os

from django.core.wsgi import get_wsgi_application

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mydjangopro.settings')

application = get_wsgi_application()
If you go into your Django project, next to settings.py you'll see both urls.py and wsgi.py .
Your wsgi.py will resemble the above code replacing mydjangopro with your project's name.
For some environments, we'll need to bind gunicorn to a specific port which we can do with:
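In our case (with the cfeblog project from our cloned repo), that looks like:
gunicorn cfeblog.wsgi:application --bind "0.0.0.0:8000"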
The key to CMD is that it's centered around runtime, not build time. This is important because build time will likely not have access to our database and related services.
Now we have:
CMD [ "/app/config/entrypoint.sh" ]
You can select either method that works for you but I prefer using a custom bash script so I can provide
additional details as needed.
Here’s my entrypoint.sh :
config/entrypoint.sh
sh
#!/bin/bash
APP_PORT=${PORT:-8000}
cd /app/
/opt/venv/bin/gunicorn --worker-tmp-dir /dev/shm cfeblog.wsgi:application --bind "0.0.0.0:${APP_PORT}"
APP_PORT=${PORT:-8000} : if PORT is set in the environment variables, APP_PORT will be that value. If it's not set, APP_PORT falls back to 8000 , ensuring we always have a port for gunicorn to bind to.
But what if we wanted to make sure we ran python manage.py migrate every time we ran this container? We cannot effectively run this command during the container build -- even if it's technically possible. python manage.py migrate relies on a connection to a database, which likely relies on environment variables, so python manage.py migrate should fail if we are building this container correctly.
Running migrations at build time (for example, as a RUN instruction in the Dockerfile ) might seem like a great idea, but there are two key problems:
• We don't have a production database attached to this container yet (i.e., it's build time, not runtime)
• We don't have environment variables (secrets) attached to this container yet
• python manage.py migrate sometimes seems to work during build time, but only because you never removed the SQLite configuration in Django
We have to remember that the Dockerfile builds the container. CMD and config/entrypoint.sh are for
when we run the Docker container app.
config/entrypoint.sh
sh
#!/bin/bash
APP_PORT=${PORT:-8000}
cd /app/
/opt/venv/bin/python manage.py migrate
/opt/venv/bin/gunicorn --worker-tmp-dir /dev/shm cfeblog.wsgi:application --bind "0.0.0.0:${APP_PORT}"
• /opt/venv/bin/python manage.py migrate running this has essentially zero effect if migrations
are already done, but has a major effect if they are not done.
• python manage.py collectstatic will be saved for another process
• python manage.py createsuperuser could be run in config/entrypoint.sh as well
5. .dockerignore
Docker containers need as little of your code as possible, because the build process will:
• Install all dependencies
• Become much larger if unnecessary files are added
Using a .dockerignore is the best way to ensure we do not accidentally add too many files to our docker
containers. It works exactly like a .gitignore file but is used when you do stuff like:
dockerfile
COPY . /app
Here is the minimum .dockerignore I recommend you use for this project:
staticfiles/
mediafiles/
.DS_Store
venv
.env
*.py[cod]
__pycache__/
You should also add the contents of GitHub's standard Python .gitignore file to the bottom of this .dockerignore .
docker build
• At its core, this command tells docker we're going to build a new Docker container image
• -t is the flag to "tag" this container. This is a name you get to pick and if you plan to use Docker Hub
(like we do later in this chapter), you'll have to include your username so it's
-t codingforentrepreneurs/myproject instead of just -t myproject
• -f Dockerfile this is telling Docker which Dockerfile to use to build this container. You can have
as many Dockerfiles as you'd like.
• . The final period ( . ) tells docker where to build this container. As you may recall, our Dockerfile
does COPY . /app . The COPY . /app will correspond directly to this final period.
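Put together, the build command for this project looks something like this (using the cfe-django-blog tag we reuse with Docker Compose below):
docker build -f Dockerfile -t cfe-django-blog .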
docker run
• At its core, this command tells docker that we want to run an image
• -e is a flag for passing an environment variable to this run command. You can have many of these.
You can also use --env-file like: docker run --env-file .env ... . The syntax for -e
is KEY=VALUE so -e KEYA=value_a -e KEYB=value_b and so on.
• -p This allows your Docker container to run on a port that the rest of your system can access. You know
how when you run python manage.py runserver and you can access Django at port 8000 (by default)
in your browser? This is the same concept, but for Docker. Without -p , your browser (and the rest
of your computer) has no way to connect to what's running within the container. Remember how I said
Docker containers are isolated? This is one way to "expose" it to your computer (or external world)
• The cfe-django-blog in docker run ... cfe-django-blog is the tag for this container.
It is identical to what you used in the docker build -t call. What's more, you can run official Docker
images (like Nginx, MySQL, Python, and others) using this same run command. Basically docker run
nginx -- this will even download the image for you if you don't have it. Neat!
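And a matching run command, as a sketch (the port and .env file are the ones used throughout this chapter):
docker run --env-file .env -p 8000:8000 cfe-django-blog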
Before we go any further, let's take a look at the GitHub Action to just build our Docker container. It's as simple
as:
.github/workflows/container.yaml
yaml
on:
  workflow_call:
  workflow_dispatch:
jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      - name: Build container image
        run: |
          docker build -f Dockerfile -t myproject .
The goal of seeing this workflow is to show you how simple it's going to be to build our container in GitHub
Actions. It's also important to know that this workflow is not passing any environment variables to the build
command. The only time our Docker Container Image should need environment variables is during runtime.
And yes, in a few sections we'll change docker build -f Dockerfile -t myproject . to include a tag
of a container we need to use.
So instead of running docker build and docker run by hand each time, we can simply do:
docker compose up
During development, I rarely have my Django application running through Docker. You can run it through Docker, but I really don't want to rebuild my Django container every time I make a change.
In addition to running Docker containers when we need them, Docker Compose can also build them, load environment variables, and restart them automatically, as you can see in our docker-compose.yaml :
yaml
version: "3.9"
services:
  app:
    image: cfe-django-blog
    build:
      context: .
      dockerfile: Dockerfile
    restart: unless-stopped
    env_file: .env
    ports:
      - "8000:8000"
If docker works on your machine but the command docker compose doesn't, then you'll need to install the
pip package docker-compose with python3 -m pip install docker-compose . After you do that, you can
run docker-compose instead of docker compose . We'll see this again in the Ansible chapter when we use
Docker compose in production.
This will:
• Build our app based on app:build:context and app:build:dockerfile
• Run our app with local port 8000 mapped to the container port 8000
• Automatically restart our app unless we manually stop it ( restart: unless-stopped )
• Use our local .env file to run this container
Now, you can either try to remember how to run those build and run commands every time, or just opt for Docker Compose. You probably already know what I recommend.
yaml
version: "3.9"
services:
  app:
    image: cfe-django-blog
    build:
      context: .
      dockerfile: Dockerfile
    restart: unless-stopped
    env_file: .env
    ports:
      - "8000:8000"
    profiles:
      - app_too
    depends_on:
      - mysql_db
  mysql_db:
    image: mysql:8.0.26
    restart: always
    env_file: .env
    ports:
      - "3307:3307"
    expose:
      - 3307
    volumes:
      - mysql-volume:/var/lib/mysql
    profiles:
      - main
      - app_too
volumes:
  mysql-volume:
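With profiles in place, you choose which services start. A sketch of the commands (profile names from the file above):
# start only the MySQL service (the "main" profile)
docker compose --profile main up -d
# start the Django app and MySQL together (the "app_too" profile)
docker compose --profile app_too up --build -d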
Did it fail? If you see Bind for 0.0.0.0:3307 failed: port is already allocated , this means that port 3307 is in use elsewhere. You can either stop that service or choose a new port. If you choose a new port, be sure to update it in .env and wherever else it's declared.
We do not need to build the image ( image: mysql:8.0.26 ) for the mysql_db service because it's already
built and stored on Docker Hub (as mentioned above).
• If you change service names or add/remove services, you might need to run docker compose --profile main down --remove-orphans
• To remove all volumes and delete all database data, run docker compose --profile main down -v
Let's say we wanted to add Redis to our local environment, here's what you need to update your
docker-compose.yaml to:
yaml
version: "3.9"
services:
  app:
    image: cfe-django-blog
    build:
      context: .
      dockerfile: Dockerfile
    restart: unless-stopped
    env_file: .env
    ports:
      - "8000:8000"
    profiles:
      - app_too
    depends_on:
      - mysql_db
  mysql_db:
    image: mysql:8.0.26
    restart: always
    env_file: .env
    ports:
      - "3307:3307"
    expose:
      - 3307
    volumes:
      - mysql-volume:/var/lib/mysql
    profiles:
      - main
      - app_too
  redis_db:
    image: redis
    restart: always
    expose:
      - 6388
    ports:
      - "6388:6388"
    volumes:
      - redis_data:/data
    entrypoint: redis-server --appendonly yes --port 6388
    profiles:
      - main
      - app_too
volumes:
  mysql-volume:
  redis_data:
In the Ansible Chapter, we will use a version of Docker compose. Here's a preview of what it will be:
docker-compose.prod.yaml
yaml
version: "3.9"
services:
  watchtower:
    image: index.docker.io/containrrr/watchtower:latest
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /root/.docker/config.json:/config.json
    command: --interval 30
    profiles:
      - app
  app:
    image: index.docker.io/codingforentrepreneurs/cfe-django-blog.com:latest
    restart: always
    env_file: ./.env
    container_name: prod_django_app
    environment:
      - PORT=8080
    ports:
      - "80:8080"
    expose:
      - 80
    volumes:
      - ./certs:/app/certs
    profiles:
      - app
  redis:
    image: redis
    restart: always
    ports:
      - "6379:6379"
    expose:
      - 6379
    volumes:
      - redis_data:/data
    entrypoint: redis-server --appendonly yes
    profiles:
      - redis
volumes:
  redis_data:
I'll explain in more detail how this works later but for now, a high-level overview is:
• watchtower : this small container will run next to our App container. It will enable graceful updates
to our running App container. This means that when we push our container image into Docker Hub,
watchtower will automatically fetch and restart it for us! How it's configured above also works with
private repositories! (Assuming the host machine is authenticated with Docker Hub)
• services:app:expose -- this app is now going to run on port 80. This is intentional so our IP address
is what connects directly to the services running within our Docker container.
• services:redis -- this is a quick and easy way to get a Redis instance running in production.
A managed instance is preferred but that often comes at a higher cost. Heavy workloads should not use
this method in my opinion.
• GitHub is for code
• Docker Hub is for containers
They each have their own kind of version control: GitHub with Git and Docker Hub with container tags.
Containers are often huge (5-20 GB), whereas code is often small (under 10 MB; if not under 2 MB).
Containers are not stored as one big file but rather broken up into smaller chunks called layers. Docker Hub
is a registry designed to manage these containers and their respective layers. Docker has open-sourced the
Registry process as you can read about here. This means you can deploy your own Docker Hub if you want
(you'd say that you're deploying your own Docker Registry).
The nice thing about both GitHub and Docker Hub is that they provide:
• Public repos
• Private repos
• Easily discovered repos
• Ways to automatically build containers
• Ways to share containers with hosting platforms
To that end, let's push our built container image to Docker Hub.
Before we create a repo, I want to remind you that we use GitHub Actions to build our containers. Once we have
a Docker Hub account and an API Token, we will never have to manually create a repo again; GitHub Actions will
create the repo for us except it defaults to a Public repo. You'll just need to log in to Docker Hub and change it to
be private.
Docker Hub allows for 1 Private Repository for every free account and Unlimited Public Repositories. Given the
sheer size of Docker containers, the limitations of the free account are justified. I gladly pay $60/year for Docker
Pro just for unlimited private repositories.
In Build Settings, you might be tempted to connect to GitHub but _don't do it_. GitHub Actions will do all of the
container building we need. It seems that free accounts might not have this ability any longer.
There are several advantages to using GitHub Actions (as opposed to Docker Hub) to automate our container
building including:
• Flexibility to change our Docker Registry (for example, choosing not to use Docker Hub)
• Build and or Push errors are in one place (GitHub Actions)
• Secrets are all in one place (GitHub Actions)
• Speed; I've found that GitHub Actions is dramatically faster than Docker Hub when building containers
(especially on the Docker Hub free account)
• Ease; Building containers is easy no matter where you do it; automating the build with GitHub Actions was
super easy remember?
• Building vs CI/CD; A free account on Docker Hub just builds containers, it doesn't perform CI/CD
• Automated Notifications; do you want to send a Slack message when your container is built? How about an SMS? A tweet? Within a GitHub workflow, you can do that with a bit of extra code. Docker Hub has webhooks, but who's building an ingestion server just to relay build notifications?
I want to make it clear that if you use GitLab CI/CD instead of GitHub Actions, everything above is still likely to be true.
KEY: DOCKERHUB_USERNAME
Value: Your username
KEY: DOCKERHUB_TOKEN
Value: e90485e4-9477-4b55-8b9f-3c54a65a522a (the Access Token you just created)
KEY: DOCKERHUB_APP_NAME
Value: add your own unique name or cfe-django-blog
yaml
on:
  workflow_call:
    secrets:
      DOCKERHUB_USERNAME:
        required: true
      DOCKERHUB_TOKEN:
        required: true
      DOCKERHUB_APP_NAME:
        required: true
  workflow_dispatch:
jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build container image
        run: |
          docker build -f Dockerfile \
            -t ${{ secrets.DOCKERHUB_USERNAME }}/${{ secrets.DOCKERHUB_APP_NAME }}:latest \
            -t ${{ secrets.DOCKERHUB_USERNAME }}/${{ secrets.DOCKERHUB_APP_NAME }}:${GITHUB_SHA::7}-${GITHUB_RUN_ID::5} \
            .
      - name: Push image
        run: |
          docker push ${{ secrets.DOCKERHUB_USERNAME }}/${{ secrets.DOCKERHUB_APP_NAME }} --all-tags
For every build, I add the :latest tag so I know for sure that's the most recent build. This line will be
something like -t codingforentrepreneurs/cfe-django-blog:latest .
As a command-line command:
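Most likely a pull (or run) of that same tag, for example:
docker pull codingforentrepreneurs/cfe-django-blog:latest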
or in a Dockerfile:
dockerfile
FROM codingforentrepreneurs/cfe-django-blog:latest
Since this repository is private, the only way the above two commands work is if we log in to Docker on the
machine attempting to use this Container image. We'll revisit this again when we implement Ansible.
You can have multiple tags per Docker container image. It's nice because then it allows us to have the latest
version like what we did with :latest as well as a historical record of versions especially if we need to roll
back our deployments.
Here's a tagging approach I like using on GitHub Actions: combine the commit SHA and the workflow run ID, as we did in the build step above. GITHUB_SHA and GITHUB_RUN_ID are both examples of GitHub Actions environment variables; there are many more.
GITHUB_SHA is the specific git commit cut down to the first 7 characters ::7 . GITHUB_RUN_ID is the
GitHub workflow run id cut down to the first 5 characters ::5 .
yaml
on:
  workflow_dispatch:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  test_django:
    uses: ./.github/workflows/test-django-mysql.yaml
  build_container:
    needs: test_django
    uses: ./.github/workflows/container.yaml
    secrets:
      DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
      DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
      DOCKERHUB_APP_NAME: ${{ secrets.DOCKERHUB_APP_NAME }}
Chapter 4
Linode Managed Databases: Using
MySQL for Django
The goal of this chapter is to set up a managed database cluster on Linode, prepare it for your project,
and integrate it with Django.
We’re going to be doing this exact same process for both development and production. We do this to ensure
our development environment matches our production environment as much as possible.
I use docker compose to run my development database (like we did in the previous chapter) with the same
database version as my production environment. I’ll leave it up to you to choose.
Navigate to Databases and create a new MySQL cluster. For Access Controls, you can use:
0.0.0.0/0
IMPORTANT: this value allows connections from anywhere. The database we're creating in this step is only for development.
It’s also important to note that this step can take up to 20 minutes to complete.
For production, I recommend using at least a 2 node cluster using at minimum the shared Linode 2 GB
instance. More on this in the next chapter.
1. Go to Databases
2. Select your database cfe-db-dev
3. Under Connection Details , click Download CA Certificate
4. At the root of your project, create a directory called certs
5. Update .gitignore with certs/ : echo 'certs/' >> .gitignore
6. Move your downloaded certificate to certs/ and rename it to db.crt , like:
mv ~/downloads/django-db-dev-ca-certificate.crt ~/dev/django-project/certs/db.crt
Once we create a production cluster, the db.crt contents will be stored within GitHub Action Secrets. (More on
this in the next chapter). Renaming our CA Certificate makes switching database clusters convenient.
A MySQL client is typically much smaller in size than a MySQL server installation since all we're doing is
connecting to an already running MySQL server.
Using homebrew
• MySQL Shell (can use JavaScript, Python, or SQL commands): brew install mysql-shell or via the docs
• MySQL Server Community Edition: brew install mysql or via the docs
Using chocolatey
• MySQL Shell via the docs
• MySQL Server Community Edition: choco install mysql or via the docs
There are many different Linux distributions so I recommend you check the official MySQL docs for this. If you’re
using Ubuntu (like we did with our Python-based Dockerfile), you can:
Alternatives:
• MySQL client libraries (as used in our Dockerfile): sudo apt-get update && sudo apt-get install -y default-libmysqlclient-dev
• MySQL Shell: sudo apt-get update && sudo apt-get install -y mysql-shell or via the docs
• MySQL Server Community Edition: sudo apt-get update && sudo apt-get install -y mysql-server
It’s important that you do not use the root user on your database for security reasons.
• host: lin-cfe-mysql-primary.servers.linodedb.net
• user: linroot
• password: not-a-real-one
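Connecting with the mysql client then looks something like this (add --ssl-ca=certs/db.crt if your cluster requires the CA certificate):
mysql -h lin-cfe-mysql-primary.servers.linodedb.net -P 3306 -u linroot -p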
Enter password:
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
For brevity in the book, I will just show the following for each MySQL command:
mysql
mysql>
Now we're going to create our database and database user for Django. I'll use the following settings:
• name: cfeblog_dev_db
• user: cfeblog_dev_db_user
• user password: J-10TDQ8hky-vdKDcksKgsDfrTgjOyOkbeKUfGwgX8w (generated with python -c "import secrets;print(secrets.token_urlsafe(32))" )
• timezone: +00:00 -- this is the UTC timezone and the default timezone in Django.
mysql
mysql> CREATE DATABASE IF NOT EXISTS cfeblog_dev_db CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
mysql> CREATE USER IF NOT EXISTS cfeblog_dev_db_user IDENTIFIED BY 'J-10TDQ8hky-vdKDcksKgsDfrTgjOyOkbeKUfGwgX8w';
mysql> SET GLOBAL time_zone = '+00:00';
mysql> GRANT ALL PRIVILEGES ON cfeblog_dev_db.* TO cfeblog_dev_db_user;
At this point, if everything above was successful, I would normally update my .env to match these values.
In this guide, I will do that at a later step to provide additional context.
Enter password:
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
SQL
We should see:
SQL
SHOW TABLES;
At this point, we have no tables in this database and we should receive no errors with the last two commands.
You could experiment with my SQL now and create tables and insert data. We’re not going to do that. Instead,
we’ll opt to let Django manage creating our tables (via python manage.py migrate ) and insert our data when
we save model instances.
sql
Notice that I used test_cfeblog\_% . Why? This syntax will match any database name that starts with test_cfeblog . Django, by default, will prepend test_ to our database name when we run python manage.py test , so test_cfeblog\_% covers that.
We could use test\_% instead of test_cfeblog\_% , but that would give this user permission over the test databases of other Django projects we might use with this cluster, and I don't want that.
sql
In general, granting unlimited permissions is something we want to avoid, even at the development level.
sql
If you revoke all permissions, you will have to re-grant permissions for our primary database ( cfeblog_dev_db
from the previous step):
sql
You can store this data as a .sql file. To do so, use the folder db-init/ and the file init.sql and execute
it with mysql at any time with:
[client]
user=linroot
password=your-database-password
host=lin-cfe-mysql-primary.servers.linodedb.net
port=3306
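A sketch of running that init script with a defaults file (assuming the config above is saved as db-init/db-config ; remember that --defaults-extra-file must be the first argument):
mysql --defaults-extra-file=db-init/db-config < db-init/init.sql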
Be sure to add db-init/ to .gitignore if you decide to use this method. We’ll be building off this idea in
the next chapter.
DATABASE_BACKEND=mysql
MYSQL_DATABASE=cfeblog_dev_db
MYSQL_USER=cfeblog_dev_db_user
MYSQL_PASSWORD=J-10TDQ8hky-vdKDcksKgsDfrTgjOyOkbeKUfGwgX8w
MYSQL_ROOT_PASSWORD=J-10TDQ8hky-vdKDcksKgsDfrTgjOyOkbeKUfGwgX8w
MYSQL_TCP_PORT=3306
MYSQL_HOST=lin-cfe-mysql-primary.servers.linodedb.net
MYSQL_DB_REQUIRE_SSL=true
.env is loaded into our Django project using a third-party package called python-dotenv. It's a simple way to
manage your environment variables on your local machine through Django.
These values correspond to the following Django database settings (found in settings.py ). We have our
mysql database settings for MySQL in cfeblog/settings/db/mysql.py based on what we did in Getting
Started and after we cloned the cfe-django-blog repo.
python
import os

# NOTE: `settings` below is assumed to come from your project's base settings
# (it is exposed by the settings package in the cfe-django-blog repo)
BASE_DIR = settings.BASE_DIR
DEBUG = settings.DEBUG

MYSQL_USER = os.environ.get("MYSQL_USER")
MYSQL_PASSWORD = os.environ.get(
    "MYSQL_ROOT_PASSWORD"
)  # using the ROOT User Password for Local Tests
MYSQL_DATABASE = os.environ.get("MYSQL_DATABASE")
MYSQL_HOST = os.environ.get("MYSQL_HOST")
MYSQL_TCP_PORT = os.environ.get("MYSQL_TCP_PORT")

MYSQL_DB_IS_AVAIL = all(
    [MYSQL_USER, MYSQL_PASSWORD, MYSQL_DATABASE, MYSQL_HOST, MYSQL_TCP_PORT]
)

MYSQL_DB_CERT = BASE_DIR / "certs" / "db.crt"

if MYSQL_DB_IS_AVAIL:
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.mysql",
            "NAME": MYSQL_DATABASE,
            "USER": MYSQL_USER,
            "PASSWORD": MYSQL_PASSWORD,
            "HOST": MYSQL_HOST,
            "PORT": MYSQL_TCP_PORT,
        }
    }
    if MYSQL_DB_CERT.exists() and not DEBUG:
        DATABASES["default"]["OPTIONS"] = {"ssl": {"ca": f"{MYSQL_DB_CERT}"}}
7. Django Commands
In our cloned Django project repo, we have backup data under the directory fixtures/ . (In Django, this
backup data is called fixtures.) In this portion, we're going to load these fixtures (backup data) into our database
cluster:
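First, create the tables by running Django's migrations:
python manage.py migrate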
This will create all the tables needed in our database ( cfeblog_dev_db ) assuming we granted the correct
permissions in the previous steps.
Load Fixtures
Run Tests
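A sketch of those two steps (the fixture path is an assumption; point loaddata at whatever JSON files live in your fixtures/ directory):
python manage.py loaddata fixtures/*.json
python manage.py test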
Was it successful?
• username: admin
• password: thisisdjango
These credentials come from the fixtures that we loaded a few steps ago. If they fail, just create a new user with:
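That is Django's built-in superuser command:
python manage.py createsuperuser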
8. Fresh Start
Every once in a while you need to destroy your database and start again. When I am testing things out, I do this constantly.
Create a Backup:
First and foremost, Linode automatically backs up your database clusters for you which, of course, is one of the
many benefits of using a managed service.
Every once in a while you might want a one-off backup of your own -- even if it’s just for learning purposes.
First, let’s create a backups folder that we will omit from git:
mkdir -p backups
echo "\nbackups/" >> .gitignore
Option 1: Inline
mkdir -p backups
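The inline dump command looks something like this (host, port, user, and database from earlier in this chapter; you'll be prompted for the password):
mysqldump -h lin-cfe-mysql-primary.servers.linodedb.net -P 3306 -u cfeblog_dev_db_user -p cfeblog_dev_db > backups/mybackupdb.sql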
This should prompt you for the password that we set above. After this command completes, you will have a new
mybackupdb.sql file containing your Django database backup.
Either of these options will only back up the database this user has access to and/or the one we specified: cfeblog_dev_db .
To use this backup (reload or restore), you can review the GitHub Action for migrating a database in the next
chapter.
We should log in as the root user to drop our database (depending on the permissions you gave the non-root user above):
SQL
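-- likely the command at this step: drop the development database from this chapter
mysql> DROP DATABASE cfeblog_dev_db;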
That's it. The database is gone. Django can no longer log in and store data.
2. Review MySQL
Review Databases
SQL
Do you see:
SQL
Do you see:
SQL
We only deleted the database, we did not delete the permission to manage the database. Pretty neat right?
3. Re-create Database
SQL
mysql> CREATE DATABASE IF NOT EXISTS cfeblog_dev_db CHARACTER SET utf8mb4 COLLATE
utf8mb4_0900_ai_ci;
Query OK, 1 row affected (0.04 sec)
We should not have to change the timezone since we did a global change in previous sections. Just verify the
UTC time with:
sql
sql
mysql> exit;
Bye
Chapter 5
Managed MySQL & Django
in Production
In the last chapter, we set up a managed MySQL database cluster to work with our Local Development
environment. In this chapter, we’ll set it up to work in our production environment.
1. MySQL can be complicated, but configuring a database with one user is not
2. We did a lot of heavy lifting for MySQL & our Django project in the last chapter
3. GitHub Actions helps ensure what we do in this chapter works well every time
Now that we’re preparing our project for production, we’ll use Linode’s recommended 3 Node cluster.
1. Log in to Linode
2. Create a MySQL cluster, with Access Controls set to:
0.0.0.0/0
To configure this database cluster using the GitHub Action below, we'll use 0.0.0.0/0 . After you complete the MySQL configuration, you should remove 0.0.0.0/0 , as it allows any IP address to attempt to connect to this cluster. And yes, you then need to update the Access Control list to include the IP addresses of your instances.
We'll delete this file shortly because we do not need it to remain on our local machine. If you decide that you
need it again, you can always re-download it.
Just remember that if you do decide to keep this file locally, never commit it with Git.
Key: MYSQL_DB_CERT
Value: Enter something that resembles:
-----BEGIN CERTIFICATE-----
MIIDQzCCAiugAwIBAgIJALBU6zR0XJzzMA0GCSqGSIb3DQEBCwUAMDcxEzARBgNV
BAMMCmxpbm9kZS5jb20xEzARBgNVBAoMA0GCSqGSIb3DQEBCwUAMDcxEzARBgNVD
MCAXDTIyMA0GCSqGSIb3DQEBCwUAMDcxEzARBgNVDEQ1WjA3MRMwEQYDVQQDDAps
aW5vZGUuY29tMRMwEQYDVMA0GCSqGSIb3DQEBCwUAMDcxEzARBgNVEDSDzCCASIw
DQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBALjvCV+UqcXFQqpCr5vRBqMksXbc
AayBjgiS+CLQUVPLzAkn8O5vTHfZmeE3Y7iO7wRrGVS22DUB1/Y66jJkMaX6dxmE
B7zp/BCTKEtmd3+J2TS645N23htI6s+3zYg4m8Z6SuRZyWZUlfoOTlD4/9puNeVZ
Ccq2p3+ZdLanMA0GCSqGSIb3DQEBCwUAMDcxEzARBgNVDASDFkMA0GCSqGSIb3DQ
FMrSAdVEixh/rxOUdumHJ8aYITZBMAwGA1UdEwQFMAMBAf8wDQYJKoZIhvcNAQEL
BQADggEBABWfqZ+teQdjnf99mXyHMxGJf5kvPwyp29dxArOL1HUxpqPjY96OdIRg
f+lGO+BkwCMA0GCSqGSIb3DQEBCwUAMDcxEzARBgNVDEmoiFemheb0eSZp19jp5U
hEKahLBJMA0GCSqGSIb3DQEBCwUAMDcxEzARBgNV323MA0GCSqGSIb3DQEBCwUAM
Wh2mgycTtvSxrMiSFC8JDey+Hu7pzH0RDiU1X5/0Wx3lXFaUEpLhaA1UO8/QzU3a
uPb8M+FHkaSzXUNt9b7hiaehlg0XAzBcSbynLfGIkNdMsRGWqUCsWS2wfU17sYim
J1PR3bF5QMfxiDfOSiGVOimPLVN074c=
-----END CERTIFICATE-----
First off, we need to add all of the other credentials to GitHub Actions for our automated database configuration
to work.
Key: MYSQL_DB_ROOT_USER
Value: linroot
You’ll find the root user in the Linode console.
Key: MYSQL_DB_ROOT_PASSWORD
Value some-val-in-console
You’ll find the root user password in the Linode console.
Key: MYSQL_DB_HOST
Value (from linode console)
This will resemble something like: lin-3032-3038-mysql-primary-private.servers.linodedb.net
Key: MYSQL_DB_PORT
Value 3306 (but verify in linode console)
Port 3306 is the standard MySQL port.
Key: MYSQL_DATABASE
Value cfe_django_blog_db (you pick a valid SQL database name)
The easiest way to write a valid SQL database name is to use basic Latin letters, digits 0-9, the dollar sign, and underscores ( [0-9a-zA-Z$_] ). To read more about this topic, visit here.
Key: MYSQL_USER
Value cfe_django_blog_user you pick a valid SQL user name
A valid SQL user name consists of alphanumeric characters and underscores. You can read more about it here.
Key: MYSQL_PASSWORD
Value create one
Create a strong password. When in doubt, use the SQL functions here to validate your password within MySQL.
Here’s how I create a strong password:
macOS/Linux: call python3 directly from your terminal. Windows: assuming your Python 3.10 is stored under C:\ , use the full path to python.exe .
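A sketch of both variants (the Windows path below is an assumption; adjust it to your install):
# macOS/Linux
python3 -c "import secrets;print(secrets.token_urlsafe(32))"
# Windows (hypothetical install path)
C:\Python310\python.exe -c "import secrets;print(secrets.token_urlsafe(32))"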
This will yield a new secret over and over. Something like:
• s8W2w_NLg0K_46BWr4gXlBr9NvR52NbFBU8uT_MSwaA
• 3Fnju743wDCbOgsp2FQD-57Fobknnlz9bdrKptsKBwk
• sDu9Nd8be14hlKQ_f7l-BXTJYmB4S1iSunIiJhjSUEU
Now we'll implement an automated version of this section from the last chapter. We're essentially going to run
a .sql script to create our Django database.
We should only have to run this one time, so we do not need to integrate it into all.yaml . This is also why
it only responds to a workflow_dispatch .
Also notice that we have zero code being checked out here; that's because this is an easily portable workflow
that you can use again and again.
And now the workflow, let's create the workflow to put the previous keys into action.
.github/workflows/mysql-init.yaml
yaml
on:
  workflow_dispatch:
jobs:
  mysql_client:
    runs-on: ubuntu-latest
    steps:
      - name: Update packages
        run: sudo apt-get update
      - name: Install mysql client
        run: sudo apt-get install mysql-client -y
      - name: Mysql Version
        run: echo $(mysql --version)
      - name: Config Create
        run: |
          cat << EOF > db-config
          [client]
          user=${{ secrets.MYSQL_DB_ROOT_USER }}
          password=${{ secrets.MYSQL_DB_ROOT_PASSWORD }}
          host=${{ secrets.MYSQL_DB_HOST }}
          port=${{ secrets.MYSQL_DB_PORT }}
          EOF
      - name: Add SSL Cert
        run: |
          cat << EOF > db.crt
          ${{ secrets.MYSQL_DB_CERT }}
          EOF
As mentioned in the last chapter, the --defaults-extra-file must be the first argument.
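Putting the pieces together, the step that actually runs the SQL would look roughly like this (a sketch; init.sql stands in for whatever SQL the step supplies, since this workflow checks out no code):
mysql --defaults-extra-file=db-config --ssl-ca=db.crt < init.sql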
After you create this file locally, perform the following steps
Future Runs
Solving Errors
If everything is configured correctly, the above workflow should work every time. That said, errors can almost certainly still occur. Here are a few ideas on how to solve errors with the mysql-init.yaml workflow:
• If you're getting a connection error, check that you're using the correct MySQL cluster host and
not the private network host from the Linode console.
• If you see Unable to get private key in the errors, be sure to check you're using the flag
--ssl-ca and not --ssl-cert with your mysql commands.
• Manually configure the database and database user for Django with what we did in the previous
chapter.
• Use nektos/act to test workflow locally.
• Use the MySQL Docker container's bash (command line) to perform everything manually.
If you continue to have errors, please submit an issue to the official final project GitHub repo at:
https://github.com/codingforentrepreneurs/deploy-django-linode-mysql
Databases (and a database dump), can be pretty large. Keep in mind that GitHub Actions is currently restricted
to 14GB of SSD disk space, read more about it here. If you want to use more than 14GB, consider using
a self-hosted runner which we discuss in Appendix D.
yaml
on:
  workflow_dispatch:
jobs:
  mysql_client:
    runs-on: ubuntu-latest
    steps:
      - name: Update packages
        run: sudo apt-get update
      - name: Install mysql client
        run: sudo apt-get install mysql-client -y
      - name: Mysql Version
        run: echo $(mysql --version)
      - name: Current Host Config
        run: |
          cat << EOF > db-config-current
          [client]
          user=${{ secrets.MYSQL_DB_ROOT_USER }}
          password=${{ secrets.MYSQL_DB_ROOT_PASSWORD }}
          host=${{ secrets.MYSQL_DB_HOST }}
          port=${{ secrets.MYSQL_DB_PORT }}
          EOF
      - name: New Host Config
        run: |
          cat << EOF > db-config-new
          [client]
          user=${{ secrets.MYSQL_DB_ROOT_USER }}
          password=${{ secrets.MYSQL_DB_ROOT_PASSWORD }}
          host=${{ secrets.MYSQL_DB_HOST }}
          port=${{ secrets.MYSQL_DB_PORT }}
          EOF
      - name: Current Host SSL Cert
        run: |
          cat << EOF > db-current.crt
          ${{ secrets.MYSQL_DB_CERT }}
          EOF
You can use this workflow as a reference whenever you need to upgrade your database cluster or migrate it to
a Linode Managed Database.
In some migration cases, you will not need a certificate to perform this migration.
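At its core, the migration is a dump from the current cluster and a restore into the new one; a rough sketch using the config files written by the workflow above and our production database name:
mysqldump --defaults-extra-file=db-config-current --ssl-ca=db-current.crt cfe_django_blog_db > backup.sql
mysql --defaults-extra-file=db-config-new cfe_django_blog_db < backup.sql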
You absolutely can use Terraform to provision your MySQL database, but then we have another challenge:
updating our GitHub Action secrets with the contents of our database certificate. While this challenge is pretty
easy to solve with the GitHub CLI and Appendix E, it is outside the scope of this book at this time due to
unnecessarily added complexity.
Chapter 6
Linode Object Storage for Django
In this section, we’re going to implement Linode Object Storage as our Static File server for any Django project.
Related Resources
• How to manage static files (Django Docs)
• How to deploy static files (Django Docs)
• Django Storages Docs
Django uses an MVT Architecture, which consists of simple Python data models (how Django syncs with
a Database), views (how Django responds to URL requests), and templates (how Django renders out the
response to a URL request).
Any file that is not required for Django’s MVT to work but is necessary for the Web App to work, we’ll consider
a static file. These files include:
• Images
• JavaScript
• Cascading Style Sheets (CSS)
• CSV files (for download)
• PDF files (for download)
• Videos
Make no mistake, Django can produce these files with the right package but serving these files is something
Django should not do. As the docs state:
[Using Django static views] is grossly inefficient and probably insecure, so it is unsuitable for production.
Instead of using Django, you’ll want to use a dedicated server and/or an external service for serving these types
of files. In this section, we’ll explore setting up Linode Object Storage to serve these files on our behalf.
Object Storage is an S3-compatible storage solution.
Managing a dedicated server for our static files is possible and can be very interesting to implement, but it adds
unnecessary complications and costs since storage solutions like Linode Object Storage exist.
We'll handle static files in two environments:
• Development
• Production
During development, Django can serve the static files using Django static views.
cfeproj/urls.py
python
from django.conf import settings
from django.conf.urls.static import static

urlpatterns = [
    # ... the rest of your URLconf goes here ...
]

if settings.DEBUG:
    urlpatterns += static(settings.STATIC_URL, document_root=settings.STATIC_ROOT)
    urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
As the Django docs state, this method is grossly inefficient and probably insecure, so it is unsuitable for production. The only time I use this method in development is when my static files change constantly (i.e., I'm using a React.js frontend with my Django backend). I typically opt for the production method (next section) even while in development. In some cases, I have a production bucket on Linode Object Storage as well as a development bucket to help isolate the two environments.
Install django-storages
• boto3 (docs): This allows any Python project to connect to Linode Object Storage.
• django-storages (docs): This configures Django specifically to use Object Storage and boto3 .
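Installing both is a single pip command (or add them to your requirements.txt ):
python -m pip install boto3 django-storages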
• If this command is unfamiliar/new to you, please review 1 - Getting Started
• Unlike many Django packages, you should not need to add django-storages or boto3 to INSTALLED_APPS (in settings.py ).
DJANGO_STORAGE_SERVICE=linode
LINODE_BUCKET=
LINODE_BUCKET_REGION=
LINODE_BUCKET_ACCESS_KEY=
LINODE_BUCKET_SECRET_KEY=
• DJANGO_STORAGE_SERVICE is a simple flag I use so I can change Object Storage service(s) in any project;
it gives me the flexibility to choose.
• LINODE_BUCKET is the name of the Linode Object Storage bucket we'll create later in this chapter.
• LINODE_BUCKET_REGION is the region of our Linode Object Storage bucket (the physical location of the
servers that hold our static files).
• LINODE_BUCKET_ACCESS_KEY is a public API key for accessing our bucket (we'll create this after
we create our bucket).
• LINODE_BUCKET_SECRET_KEY is a private API key for accessing our bucket (we'll create this after
we create our bucket).
• Security: we don't want to expose these values to the world -- just to our current server/runtime.
• Flexibility: we want to be able to turn these on/off at any time as well as change these values at any time.
• Reusability: creating Python code that leverages environment variables over hard-coded values
improves reusability.
At the end of this project, we'll have 2 separate buckets in Linode Object Storage. In this section, we'll set up
a bucket specifically for our Django project. In the next chapter, we will set up yet another bucket for Terraform.
1. Login to Linode
2. Navigate to Object Storage
3. Click Create Bucket
a. Add a label, such as: lets-deploy-django (change this, as the value must be unique)
b. Add a region, such as Atlanta, GA -- ideally the location is nearest to you, or your end-users
c. Click Create Bucket
4. Create an API Key
Now that we have lets-deploy-django in the us-southeast-1 region ( Atlanta, GA ), let’s create an API
Key.
1. Go to Object Storage
2. Select the tab for Access Keys
3. Click Create Access Key
4. Add a label, such as My Production Key for Django Deployment
5. Select limited access (To be more secure, always limit your access keys if you can)
6. Locate your bucket and check Read/Write access
7. Click Create Access Key under the list of permissions
8. After a moment, copy the two keys provided, the Access Key and Secret Key . They will look something like:
1.
Update your Development .env file and add the following
DJANGO_STORAGE_SERVICE=linode
LINODE_BUCKET=Bucket from Linode
LINODE_BUCKET_REGION=Region from Linode
LINODE_BUCKET_ACCESS_KEY=Access Key From Linode
LINODE_BUCKET_SECRET_KEY=Secret Key from Linode
DJANGO_STORAGE_SERVICE=linode
LINODE_BUCKET=lets-deploy-django
LINODE_BUCKET_REGION=us-southeast-1
LINODE_BUCKET_ACCESS_KEY=Q58FBTEEOUSNJRWWBJWS
LINODE_BUCKET_SECRET_KEY=5bXTyL0D75dMNqewYHjwp19MTX6AbaohZdniPYte
2.
Navigate to your GitHub repo
3.
Navigate to Action Secrets
( https://github.com/your-username/your-repo/settings/secrets/actions )
4.
Add the keys from above one by one
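Before wiring these values into Django, you can optionally sanity-check the keys with a short boto3 script. This is
only a sketch -- swap in your own bucket region, endpoint, and keys:
python
import boto3

# Example values from above -- replace with your own region and keys
session = boto3.session.Session()
client = session.client(
    "s3",
    region_name="us-southeast-1",
    endpoint_url="https://us-southeast-1.linodeobjects.com",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)
# If the credentials are valid, this prints the names of your buckets
print([b["Name"] for b in client.list_buckets()["Buckets"]])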
cfeproj/settings.py
python
import os
DJANGO_STORAGE_SERVICE = os.environ.get("DJANGO_STORAGE_SERVICE")
LINODE_BUCKET = os.environ.get("LINODE_BUCKET")
LINODE_BUCKET_REGION = os.environ.get("LINODE_BUCKET_REGION")
LINODE_BUCKET_ACCESS_KEY = os.environ.get("LINODE_BUCKET_ACCESS_KEY")
LINODE_BUCKET_SECRET_KEY = os.environ.get("LINODE_BUCKET_SECRET_KEY")
LINODE_OBJECT_STORAGE_READY = all([LINODE_BUCKET, LINODE_BUCKET_REGION,
                                   LINODE_BUCKET_ACCESS_KEY, LINODE_BUCKET_SECRET_KEY])
• You see AWS_ prefixed settings (like AWS_S3_ENDPOINT_URL and AWS_STORAGE_BUCKET_NAME ) because Linode Object
Storage is S3 compatible. boto3 (and boto ) is maintained by Amazon Web Services (AWS), and since
django-storages leverages boto3 (in this case), we must use the boto3 configuration names.
• AWS_DEFAULT_ACL="public-read" means that all uploaded files (both STATIC_ROOT
and MEDIA_ROOT ) will be public. If you want to see ACL options, check the How I do it section below.
• storages.backends.s3boto3.S3Boto3Storage comes directly from django-storages
mkdir -p cfeproj/storages/services/
echo "" > cfeproj/storages/__init__.py
echo "" > cfeproj/storages/services/__init__.py
• DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
• STATICFILES_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
Both of these settings will upload files to the root of our Object Storage bucket. This is not ideal for a few reasons:
•
You may accidentally overwrite files
•
You may confuse user-generated content with developer-generated content
•
Archiving or moving content is much more complex
•
Implementing a proper Content Delivery Network (CDN) can be more complex
Instead, we want:
•
All Django static files in 1 location ( static/ )
•
All default media files (uploaded via a FileField/ImageField) in 1 location ( media/ )
•
One-off media file storage (such as private/ or reports/ etc)
cfeproj/storages/backends.py
python
from storages.backends.s3boto3 import S3Boto3Storage

class PublicS3Boto3Storage(S3Boto3Storage):
    location = 'static'

class MediaS3BotoStorage(S3Boto3Storage):
    location = 'media'

class PrivateS3BotoStorage(S3Boto3Storage):
    location = 'private'
• static/ → https://{LINODE_BUCKET_REGION}.linodeobjects.com/static/
→ because of PublicS3Boto3Storage
• media/ → https://{LINODE_BUCKET_REGION}.linodeobjects.com/media/
→ because of MediaS3BotoStorage
• private/ → https://{LINODE_BUCKET_REGION}.linodeobjects.com/private/
→ because of PrivateS3BotoStorage
Before we change the Django project defaults, we can use each of these storage options in a FileField/ImageField
like the following example:
python
# exampleapp/models.py
from django.conf import settings
from django.db import models

from cfeproj.storages.backends import PrivateS3BotoStorage, PublicS3Boto3Storage

User = settings.AUTH_USER_MODEL

class ExampleModel(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    private = models.FileField(upload_to=handle_private_file,
                               storage=PrivateS3BotoStorage())
    public_image = models.ImageField(storage=PublicS3Boto3Storage())
    media_image = models.ImageField(storage=PrivateS3BotoStorage())
• AWS_DEFAULT_ACL="public-read"
This means that _all_ of our storage backends are public-read (not private), which means we have:
• PublicS3Boto3Storage = public-read
• MediaS3BotoStorage = public-read
• PrivateS3BotoStorage = public-read
• storages.backends.s3boto3.S3Boto3Storage = public-read
When we want:
• PublicS3Boto3Storage = public-read
• MediaS3BotoStorage = private
• PrivateS3BotoStorage = private
• storages.backends.s3boto3.S3Boto3Storage = public-read
Access Control List ( ACL ) is how we grant permission to various objects in Object Storage. The available options
are listed here but the ones I use most are:
python
class PublicS3Boto3Storage(S3Boto3Storage):
    location = 'static'
    default_acl = 'public-read'

class MediaS3BotoStorage(S3Boto3Storage):
    location = 'media'
    default_acl = 'private'

class PrivateS3BotoStorage(S3Boto3Storage):
    location = 'private'
    default_acl = 'private'
At the time of this writing, django-storages does not support this feature natively.
Below is the mixin that we can use for the above settings:
cfeproj/storages/mixins.py
python
class DefaultACLMixin():
    """
    Adds the ability to change the default ACL for objects
    within a `S3Boto3Storage` class.
    """
    CANNED_ACL_OPTIONS = [
        'private',
        'public-read',
        'public-read-write',
        'aws-exec-read',
        'authenticated-read',
        'bucket-owner-read',
        'bucket-owner-full-control',
    ]

    def get_default_settings(self):
        _settings = super().get_default_settings()
        _settings['default_acl'] = self.get_default_acl()
        return _settings

    def get_default_acl(self):
        _acl = self.default_acl or None
        if _acl is not None:
            if _acl not in self.CANNED_ACL_OPTIONS:
                acl_options = "\n\t".join(self.CANNED_ACL_OPTIONS)
                raise Exception(f"The default_acl of \"{_acl}\" is invalid. Please use one "
                                f"of the following:\n{acl_options}")
        return _acl
cfeproj/storages/backends.py
python
You can now have a per-storage acl setting that overrides the AWS_DEFAULT_ACL="public-read"
configuration.
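The updated backends.py isn't reproduced above, but with the mixin in place it would look roughly like this sketch
(assuming the DefaultACLMixin from mixins.py is mixed into each storage class):
python
from storages.backends.s3boto3 import S3Boto3Storage

from .mixins import DefaultACLMixin

class PublicS3Boto3Storage(DefaultACLMixin, S3Boto3Storage):
    location = 'static'
    default_acl = 'public-read'

class MediaS3BotoStorage(DefaultACLMixin, S3Boto3Storage):
    location = 'media'
    default_acl = 'private'

class PrivateS3BotoStorage(DefaultACLMixin, S3Boto3Storage):
    location = 'private'
    default_acl = 'private'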
cfeproj/storages/services/linode.py
python
import os
LINODE_BUCKET = os.environ.get("LINODE_BUCKET")
LINODE_BUCKET_REGION = os.environ.get("LINODE_BUCKET_REGION")
LINODE_BUCKET_ACCESS_KEY = os.environ.get("LINODE_BUCKET_ACCESS_KEY")
LINODE_BUCKET_SECRET_KEY = os.environ.get("LINODE_BUCKET_SECRET_KEY")
LINODE_OBJECT_STORAGE_READY = all([LINODE_BUCKET, LINODE_BUCKET_REGION,
                                   LINODE_BUCKET_ACCESS_KEY, LINODE_BUCKET_SECRET_KEY])

if LINODE_OBJECT_STORAGE_READY:
    AWS_S3_ENDPOINT_URL = f"https://{LINODE_BUCKET_REGION}.linodeobjects.com"
    AWS_ACCESS_KEY_ID = LINODE_BUCKET_ACCESS_KEY
    AWS_SECRET_ACCESS_KEY = LINODE_BUCKET_SECRET_KEY
    AWS_S3_REGION_NAME = LINODE_BUCKET_REGION
    AWS_S3_USE_SSL = True
    AWS_STORAGE_BUCKET_NAME = LINODE_BUCKET
    AWS_DEFAULT_ACL = "private"  # private is the default
    DEFAULT_FILE_STORAGE = "cfeproj.storages.backends.MediaS3BotoStorage"
    STATICFILES_STORAGE = "cfeproj.storages.backends.PublicS3Boto3Storage"
cfeproj/storages/conf.py
python
import os
DJANGO_STORAGE_SERVICE = os.environ.get("DJANGO_STORAGE_SERVICE")

if DJANGO_STORAGE_SERVICE == 'linode':
    from .services.linode import *  # noqa
python
STATIC_URL = "/static/"
STATICFILES_DIRS = [BASE_DIR / "staticfiles"]
STATIC_ROOT = BASE_DIR / "staticroot"
MEDIA_URL = "/media/"
MEDIA_ROOT = BASE_DIR / "mediafiles"
Be sure to keep STATIC_ROOT and MEDIA_ROOT above the from .storages.conf import * line so we have
a fallback for python manage.py collectstatic .
In ./.github/workflows/staticfiles.yaml add:
yaml
# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  django_staticfiles:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest
    # Add in environment variables for the entire "build" job
    env:
      GITHUB_ACTIONS: true
      DATABASE_BACKEND: mysql
      DJANGO_SECRET_KEY: just-a-test-database
      DJANGO_STORAGE_SERVICE: linode
      LINODE_BUCKET: ${{ secrets.LINODE_BUCKET }}
      LINODE_BUCKET_REGION: ${{ secrets.LINODE_BUCKET_REGION }}
      LINODE_BUCKET_ACCESS_KEY: ${{ secrets.LINODE_BUCKET_ACCESS_KEY }}
      LINODE_BUCKET_SECRET_KEY: ${{ secrets.LINODE_BUCKET_SECRET_KEY }}
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Setup Python 3.10
        uses: actions/setup-python@v2
        with:
          python-version: "3.10"
      - name: Install requirements
        run: |
          pip install -r requirements.txt
      - name: Collect Static
        run: |
          python manage.py collectstatic --noinput
yaml
collectstatic:
needs: test_django
uses: ./.github/workflows/staticfiles.yaml
secrets:
LINODE_BUCKET: ${{ secrets.LINODE_BUCKET }}
LINODE_BUCKET_REGION: ${{ secrets.LINODE_BUCKET_REGION }}
LINODE_BUCKET_ACCESS_KEY: ${{ secrets.LINODE_BUCKET_ACCESS_KEY }}
LINODE_BUCKET_SECRET_KEY: ${{ secrets.LINODE_BUCKET_SECRET_KEY }}
Chapter 7
Using Terraform to Provision
Infrastructure
Now we need the actual hardware to run our Django project and we’ll use Linode Instances/Virtual Machines.
Turning on a Linode instance is simple:
1. Log in to Linode
2. Click Create Linode
3. Add your configuration
4. You’re done
You probably know there's a bit more to it than just these 4 steps, but that's the gist of it. The nuance of this gist
is why tools like Terraform exist -- no more guessing what exactly was spun up, when, and by whom. If we need
to scale resources up or down, it can be as simple as changing a single variable in our GitHub Actions secrets.
• Setting the kind of virtual machine to provision (Ubuntu 18.04, Ubuntu 20.04, CentOS, or other Linux
operating systems)
• The number of instances to run
• What region to run in
• How much disk space each instance needs
• Adding in the ssh keys and root user password
All of the above steps are important to also track over time through version control ( git ) and update
automatically when changes happen (through GitHub Actions ).
In this chapter, we're going to set up our minimal infrastructure -- mostly just our virtual machines -- using
Terraform so that we're nearly ready to start running our Django project. After we use Terraform to spin things
up, we'll use Ansible to configure the Linode Instances to run the software we need. We'll cover Ansible in the
next chapter, and I cover both in more detail in my series Try Infrastructure as Code.
1. Install Terraform
All official installation options are here.
bash
sudo apt-get update && sudo apt-get install -y gnupg software-properties-common curl
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install terraform
• devops/tf
• devops/ansible
We'll use these folders over the next two chapters. The devops/tf folder needs to exist in this way so we can
initialize our Terraform project. Now, let's talk about storing Terraform state.
• keys/
• devops/tf/terraform.tfvars
• devops/tf/backend
You can either add these now or when we do it later in this chapter.
Terraform monitors the state of the infrastructure that we have spun up through Terraform. This means that
Terraform keeps a record of what should be running based on what our Terraform files ( .tf ) declare. You can
read more about Terraform state here.
We’ll use Linode Object Storage as our cloud-based storage location for the Terraform State -- this is important
because it allows us to leverage GitHub Actions to handle all things we need Terraform to do. Without a cloud-
based storage location for the Terraform state, automating our entire CI/CD pipeline would be incredibly difficult.
Creating a Linode bucket for Terraform follows the same process we used in the chapter Object Storage for Django
for setting up Linode Object Storage.
1. Log in to Linode
2. Navigate to Object Storage
3. Click Create Bucket
4. Add a label, such as: lets-store-infra -- change this, as the value must be unique, and it must be
different from the Django bucket
5. Add a region, such as Atlanta, GA -- ideally the location is near you or your end-users; this location
can be different from the Django bucket
6. Click Create Bucket
7. Create an API Key
Now that we have lets-store-infra in the us-southeast-1 region ( Atlanta, GA ), let’s create an API
Key:
1. Go to Object Storage
2. Select the tab for Access Keys
3. Click Create Access Key
4. Add a label, such as My Production Key for Terraform and Django CI/CD
5. Select limited access (To be more secure, always limit your access keys if you can)
6. Locate your bucket, and check Read/Write access.
7. Click Create Access Key (under the list of permissions)
8. After a moment, copy the two keys provided, the Access Key and Secret Key . They will look something like:
As a side note, we can use Terraform to manage our Linode Buckets but I prefer not to, primarily so I do not
accidentally delete a lot of stored content by running 1 Terraform command.
LINODE_OBJECT_STORAGE_DEVOPS_BUCKET=lets-store-infra
LINODE_OBJECT_STORAGE_DEVOPS_BUCKET_ENDPOINT=us-southeast-1.linodeobjects.com
LINODE_OBJECT_STORAGE_DEVOPS_ACCESS_KEY=35HOB2YT9FN5EHMTPQAS
LINODE_OBJECT_STORAGE_DEVOPS_SECRET_KEY=ZVlNdIO89CBEF3BATS6bMmTisH54iGrHz3bB1Dqx
Key: LINODE_OBJECT_STORAGE_DEVOPS_TF_KEY
Value: your_project_name.tfstate (or what we set below as django-infra.tfstate )
skip_credentials_validation = true
skip_region_validation = true
bucket="your_bucket"
key="your_project_name.tfstate"
region="us-southeast-1"
endpoint="us-southeast-1.linodeobjects.com"
access_key="your_object_storage_public_key"
secret_key="your_object_storage_secret_key"

skip_credentials_validation = true
skip_region_validation = true
bucket="lets-store-infra"
key="django-infra.tfstate"
region="us-southeast-1"
endpoint="us-southeast-1.linodeobjects.com"
access_key="35HOB2YT9FN5EHMTPQAS"
secret_key="ZVlNdIO89CBEF3BATS6bMmTisH54iGrHz3bB1Dqx"
Right now, update .gitignore to include devops/tf/backend because we have several secrets that we
must keep secret.
I hope you see this step and wonder "How am I going to add this to GitHub Actions?" If you're starting to think
in this way, that's great! If you're not, that's okay, I think you just need more practice. A core theme behind this
book is to constantly think "how do I automate this step" which often translates to "how do I put this in GitHub
Actions" -- it took me a while to truly start thinking this way but once I did, I never looked back. It's also why this
book exists.
5. Initialize Terraform
The standard command is terraform init but, since we're keeping our Terraform code alongside our Django code
for simplicity, we'll use a command like terraform -chdir=devops/tf init -backend-config=backend (the same
command our GitHub Actions workflow runs later in this chapter).
•
(2) 2GB or (3) 2GB Linode Instances running our Docker-based Django App
•
(1) 4GB or (1) 8GB Linode Instance running NGINX for Load Balancing (horizontal scale as needed)
•
(1) 2GB Linode Instance to run our Docker-based Redis Instance
•
(1) 2GB Linode Instance to run our Docker-based Django & Celery Instance
•
Group A: (2) 2GB or (3) 2GB Linode Instances
•
Group B: (1) 4GB or (1) 8GB Linode Instance
•
Group C: (1) 2GB Linode Instance
•
Group D: (1) 2GB Linode Instance
•
Group A: Docker-based Django App
•
Group B: NGINX for Load Balancing
•
Group C: Docker-based Redis
•
Group D: Docker-based Django & Celery
I think this illustrates how Ansible and Terraform work in harmony together -- Terraform turns things on/off and
Ansible configures them to work in a specific way.
These groups (names, what resources they need, and what software) have been decided specifically for this
book but they can be virtually anything and in virtually any configuration you may need. You can also easily add
or remove services as needed.
Create terraform.tfvars
Now, let’s create the devops/tf/terraform.tfvars file and get all of the related content. This is a resource
file that contains all of the variable values your Terraform project may need (including secrets). I tend to think
of these as input arguments to all of the .tf files.
Location: devops/tf/terraform.tfvars
linode_pa_token="your_linode_personal_access_token"
authorized_key="your_public_ssh_key"
root_user_pw="your_root_user_password"
app_instance_vm_count=1
linode_image="linode/ubuntu20.04"
Right now, update .gitignore to include devops/tf/terraform.tfvars because we have several secrets
that we must keep secret.
Key: LINODE_IMAGE
Value: linode/ubuntu20.04
1. Login to Linode
2. Navigate to API Tokens
3. Click Create a Personal Access Token
4. Under Label add Deploy Django Terraform PAT
5. Expiry select In 1 month (yeah, do the shortest possible; these are the keys to your kingdom)
6. For the services,
• Select All: choose None so everything is turned off. If a service isn't listed below, we don't need it.
• Domains: Select Read/Write
• Images: Select Read
• IPs: Select Read/Write
• Linodes: Select Read/Write
• NodeBalancers: Select Read/Write
• Object Storage: Select Read
• Volumes: Select Read/Write
Key: LINODE_PA_TOKEN
Value: c10451d868642d1b72eeaa598a2b3c3548bf9250ad8bed9609ae314986258aa9
1.
In your local devops/tf/terraform.tfvars add your token like:
linode_pa_token="c10451d868642d1b72eeaa598a2b3c3548bf9250ad8bed9609ae314986258aa9"
...
Creating SSH keys is easy on macOS, Linux, and Windows. It's just ssh-keygen . Now, if you just run
ssh-keygen you will be prompted for all kinds of options. Instead, let's do something more predictable
for us all:
1.
Generate Public / Private SSH Keys
mkdir -p keys/
ssh-keygen -f keys/tf-github -t rsa -N ''
echo “keys/” >> .gitignore
•
Create a directory in our project called keys
•
Generate a new SSH key pair, tf-github (private) and tf-github.pub (public), in the folder keys/
•
Add keys/ to your .gitignore file
2.
On GitHub Action Secrets, add the following:
Key: SSH_DEVOPS_KEY_PUBLIC
Value: (enter the contents of tf-github.pub ) -- be sure to remove any newlines/line breaks at the end.
Key: SSH_DEVOPS_KEY_PRIVATE
Value: (enter the contents of tf-github ) -- newlines/line breaks are okay.
3.
In your local devops/tf/terraform.tfvars add your public key only like:
...
authorized_key=(enter the contents of `tf-github.pub`)
...
4.
What’s the private key for? Ansible. It’s for the next step.
5.
Can I delete tf-github.pub and tf-github after they are on GitHub Action secrets?
You can, but I suggest that you do it after you’ve run this process a few times.
•
Python: python3 -c "import secrets;print(secrets.token_urlsafe(32))"
•
Bash: openssl rand -base64 32
•
Django: python -c 'from django.core.management.utils import get_random_secret_key;
print(get_random_secret_key())'
There’s probably a lot more out there. Just pick one and generate a key.
...
root_user_pw="kbZAjeicfQLXDzasYIr8pUd2--ceEv3XdoU4OJTyfXM"
authorized_key=(enter the contents of `tf-github.pub`)
...
After you have this password, update your GitHub Action secrets to:
Key: ROOT_USER_PW
Value: kbZAjeicfQLXDzasYIr8pUd2--ceEv3XdoU4OJTyfXM
For app_instance_vm_count , we can decide exactly how many Linodes we want running our Docker-based
Django App. The more instances we run, the more expensive it will be (um, duh right?). To start, just use 1 .
Key: DJANGO_VM_COUNT
Value: 1
In this part, we’ll declare a bunch of things in Terraform files describing what we want to happen. That’s
essentially what IaC tools are: declare what you want and if done right, IaC tools will deliver that exact thing.
• main.tf
• variables.tf
• terraform.tfvars
• locals.tf
• linodes.tf
• outputs.tf
• backend
• templates/sample.tpl
• templates/ansible-inventory.tpl
•
other auto-generated Terraform files that we don’t need to worry about.
Terraform treats all of the .tf files as one big file. I named these files this way to follow convention, but as long
as the file ends in .tf , you can name it whatever you want -- you could even put all of the code in literally
one big file.
Again, think of these as 1 big file, the order doesn’t matter but the namespace of each resource does.
resourceType "custom_name" {
  resourceArg = yourValue
  resourceArgAgain = yourOtherValue
  resourceArgVar = var.my_variable
  resourceArgDir = local.root_dir
}
The above example merely shows the format Terraform uses, called HCL (HashiCorp Configuration
Language) -- it looks almost like a Python dictionary or JavaScript Object Notation (JSON), but it isn't exactly
either.
Whenever I get stuck trying to diagnose something not working with HCL or Terraform here's what I try:
main.tf
Location: devops/tf/main.tf
terraform {
  required_version = ">= 0.15"
  required_providers {
    linode = {
      source = "linode/linode"
      version = "1.27.2"
    }
  }
  backend "s3" {
    skip_credentials_validation = true
    skip_region_validation = true
  }
}

provider "linode" {
  token = var.linode_pa_token
}
• terraform : this block is required in every Terraform project so we can declare all providers we'll
be using (there are many)
• required_providers > linode : This block declares the version of Linode's provider we want to use.
Check the official Linode Terraform provider docs here for the latest.
• backend "s3" : this is how we leverage Object Storage as our Terraform state backend.
• provider "linode" : when adding providers, in this case the linode one, we need to provide
the personal access token. var.linode_pa_token will be discussed in the next section.
variables.tf
Location: devops/tf/variables.tf
variable "linode_pa_token" {
  sensitive = true
}

variable "authorized_key" {
  sensitive = true
}

variable "root_user_pw" {
  sensitive = true
}

variable "app_instance_vm_count" {
  default = 0
}

variable "linode_image" {
  default = "linode/ubuntu20.04"
}
Do any of these variables look familiar? They should -- they match exactly what we put in
terraform.tfvars . This, of course, was intentional.
What if variables.tf declares a variable without a default? Terraform will prompt for user input to provide
the value. Neat! Using sensitive = true means Terraform will not output these values in the logs.
Using default = something means we have a fallback value for items missing from terraform.tfvars .
On top of this, terraform.tfvars will never be added to your Git repo. We will generate this file in GitHub
Actions right along with the backend file.
locals.tf
Location: devops/tf/locals.tf
hcl
locals {
  tf_dir = "${abspath(path.root)}"
  templates_dir = "${local.tf_dir}/templates"
  devops_dir = "${dirname(abspath(local.tf_dir))}"
  ansible_dir = "${local.devops_dir}/ansible"
}
I use locals.tf as a place to initialize constants or hard-coded values. Similar to const myVar = 'SomeValue'
in JavaScript, locals are very often used for things that won't change based on terraform.tfvars .
The docs state that "[Locals] can be helpful to avoid repeating the same values or expressions" -- so that's how
we'll use them.
In this project, I just store the various filesystem paths that I reference throughout the project. That is what's
happening with: ${dirname(abspath(path.root))} .
The easiest way to understand expressions like this is by using terraform console :
• "${}" : this how you can do String Substitution (it works a lot like JavaScript). Docs
• path.root docs points to the main.tf since that's where we declared terraform {} .
This is known as the root module.
• dirname() and abspath() are both filesystem functions. dirname gets the directory name
of a path. abspath() gets the absolute path of a module on the local file system.
So dirname(abspath(path.root)) should be /path/to/your/proj/devops where
• abspath(path.root) should be /path/to/your/proj/devops/tf
• local.tf_dir references another variable ( tf_dir ) within the locals block.
Neat!
Here's how you can play around with this data; enter the console with terraform console and try:
> path.root
"."
> abspath(path.root)
"/Users/cfe/Dev/django-prod/devops/tf"
> dirname(path.root)
"."
> dirname(abspath(path.root))
"/Users/cfe/Dev/django-prod/devops"
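If Terraform's filesystem functions feel unfamiliar, Python's os.path behaves the same way. Here is a rough analogy
you could run from devops/tf/ (the paths are illustrative):
python
import os

tf_dir = os.path.abspath(".")           # e.g. /Users/cfe/Dev/django-prod/devops/tf
devops_dir = os.path.dirname(tf_dir)    # e.g. /Users/cfe/Dev/django-prod/devops
ansible_dir = os.path.join(devops_dir, "ansible")
print(tf_dir, devops_dir, ansible_dir)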
Having locals compute these values for us makes it easier to re-use them in other modules. For example,
we’ll use this in the next part like this:
• "${local.templates_dir}/ansible-inventory.tpl"
and
• "${local.ansible_dir}/inventory.ini"
linodes.tf
In this section, we'll build out the requirements discussed earlier in this chapter:
•
Group A: (2) 2GB Linode Instances (Django App)
•
Group B: (1) 4GB Linode Instance (Load Balancer)
•
Group C: (1) 2GB Linode Instance (Redis)
•
Group D: (1) 2GB Linode Instance (Celery Worker)
Let's convert each of these into their actual instance types (docs):
•
Group A: 2x g6-standard-1 (Django App)
•
Group B: 1x g6-standard-2 (Load Balancer)
•
Group C: 1x g6-standard-1 (Redis)
•
Group D: 1x g6-standard-1 (Celery Worker)
Now, let’s start writing the requirements for the Linode provider in: devops/tf/linodes.tf .
We’ll start by adding just Group A from above. Keep in mind that everything in this resource block contains
a lot of items we have already discussed and fleshed out. This is merely the implementation. Let’s take a look:
Group A
• "linode_instance" "app_vm" This part is telling Terraform that this is using the provider Linode 's
resource known as linode_instance docs. app_vm , here is just naming this resource for Terraform
and Terraform-only. If you need to name the instances themselves on Linode, check out the label
portion below.
• count= Count refers, of course, to the number of instances we want to create with these exact
specifications. var.app_instance_vm_count references the variable in both variables.tf and
terraform.tfvars . If this number is 0, then Terraform will ensure that 0 instances are running.
If the number is larger than 0, then Terraform will ensure exactly that number is running.
• image = var.linode_image Linode has many Image types that can be used here. In this case we
reference linode_image in variables.tf , which has a default value of linode/ubuntu20.04 .
This is great because if we omit linode_image in terraform.tfvars it will use this default value.
We can use custom images here as well but we'll leave that for another time.
• region = "us-central" I hard-coded the region here, but this can easily be put in locals since
we re-use it over and over.
• authorized_keys = [ var.authorized_key ] this is the public SSH key(s) we want this machine to
have
• root_pass = var.root_user_pw this is the password we want the default root user to have. In the
case of linode/ubuntu20.04 this root user will have the username: root but that's not always true
for all image types.
• private_ip = true gives us a private IP address so our instances can communicate within the
region's network.
• tags = ["app", "app-node"] these tags are optional but nice to have when reviewing the Linode
Console.
• (optional); You can also use a label= here. If you omit it, Linode will create one for you. I omit it here
since Linode requires these labels to be unique. You can make them semi-unique by using something
like label = "app_vm-${count.index + 1}"
Group B
lifecycle {
prevent_destroy = true
}
}
This group is almost the same as Group A with some minor changes. The notable ones are:
Group C
lifecycle {
prevent_destroy = true
}
The only key thing here is that we do not want to accidentally destroy our Redis data server via Terraform.
Group D
depends_on = [linode_instance.app_redis]
}
• count As we have seen, this argument is optional when we only want 1 instance. The reason I have
this here is so I can easily update my code to horizontally scale the worker as needed by changing this
value to a different one and/or by abstracting it to a variable as well. In other words, this is here to
future-proof this resource.
• label = "app_worker-${count.index + 1}" this, of course, corresponds with the fact that I have the
count parameter at all.
• depends_on = [linode_instance.app_redis] here's something that's new to us.
linode_instance.app_redis refers directly to the resource block for Group C. This depends_on list ensures that
Terraform will not create this resource until what it depends on is created. The
linode_instance.app_redis reference will be discussed in a future section. If you're curious how it works
right now, spin up terraform console .
lifecycle {
prevent_destroy = true
}
}
lifecycle {
prevent_destroy = true
}
depends_on = [linode_instance.app_redis]
}
outputs.tf
Location: devops/tf/outputs.tf
output "instances" {
  value = [for host in linode_instance.app_vm.*: "${host.label} : ${host.ip_address}"]
}

output "loadbalancer" {
  value = "${linode_instance.app_loadbalancer.ip_address}"
}

output "redisdb" {
  value = "${linode_instance.app_redis.ip_address}"
}
Templates
Terraform can render content based on a template. First, let's make a sample template:
Hi ${ varA },
This syntax is a lot like Jinja or Django templates, just slightly different. You can find more details here. Let's use
this file so we can see how it works.
Assuming you have templates_dir in your locals.tf and your current working directory in the command
line is devops/tf/ , you can run:
terraform console
then
Hi Something cool,
- what
- is
- this
Let’s use this template so we can prepare our instances to be managed directly by Ansible (which we’ll discuss in
the next chapter).
[webapps]
%{ for host in webapps ~}
${host}
%{ endfor ~}
[loadbalancer]
${loadbalancer}
[redis]
${redis}
[workers]
%{ for host in workers ~}
${host}
%{ endfor ~}
Now that we have this, we can use the following templatefile() declaration:
templatefile("${local.templates_dir}/ansible-inventory.tpl", {
  webapps=[for host in linode_instance.app_vm.*: "${host.ip_address}"]
  loadbalancer="${linode_instance.app_loadbalancer.ip_address}"
  redis="${linode_instance.app_redis.ip_address}"
  workers=[for host in linode_instance.app_worker.*: "${host.ip_address}"]
})
First, I highly recommend you test this in the terraform console to get your mind around each argument.
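Conceptually, the templatefile() call just loops over our host IP addresses and fills in the template. Here is a rough
Python analogy of the inventory text it produces (the IPs are examples, not yours):
python
# A rough analogy of what templatefile() renders -- not how Terraform works internally
webapps = ["65.228.51.122", "192.53.185.185"]
loadbalancer = "172.104.194.157"
redis = "198.58.106.22"
workers = ["198.66.106.38"]

inventory = "\n".join([
    "[webapps]", *webapps,
    "[loadbalancer]", loadbalancer,
    "[redis]", redis,
    "[workers]", *workers,
])
print(inventory)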
Now that we understand how to use templatefile with our resources, let's implement it in a new resource
called local_file in devops/tf/linodes.tf :
• local_file allows Terraform to manage local files on your local machine. This file can be almost
anything, which is pretty cool.
• "local_file" "ansible_inventory" is merely to let me know this resource is to manage our
Ansible inventory ( inventory.ini ) file (more on this file in the next chapter.
• filename = "${local.ansible_dir}/inventory.ini" This is the destination of the local_file
resource that Terraform will manage 100%. If you change this file outside Terraform, Terraform will
change it back after you run the correct commands.
lifecycle {
prevent_destroy = true
}
}
lifecycle {
prevent_destroy = true
}
}
depends_on = [linode_instance.app_redis]
}
8. Run Commands
terraform validate will check if your Terraform files are technically valid. If the files are invalid, Terraform will not
attempt to make changes. This command can also help you diagnose errors.
validate will not check for errors that may/may not occur with any given resource API. For example, if you
give a Linode Instance the same name twice, Terraform will not recognize this as an error but the Linode API will
raise an error.
terraform apply will prompt you to approve the changes (requiring command line input) if they are valid changes
(as in valid Terraform-based changes).
terraform apply -auto-approve will not prompt you to approve the changes; it will make them (if they are valid).
terraform destroy and terraform apply -destroy are two commands doing the same thing. Both will prompt you
(requiring command line input) to approve the destruction.
Adding -auto-approve to terraform destroy will not prompt you; it will destroy all resources provisioned by Terraform.
terraform console will allow you to test/review your Terraform project and various features of Terraform. It's a good
idea to use -chdir=devops/tf/ to ensure the console is working at the root of your Terraform code.
Working through this book I have assumed you do not own a custom domain. The nice thing is you can add it
whenever you need it. Here’s the resource block you would add:
KEY: ALLOWED_HOST
VALUE: .yourcustom-domain.com
For my ALLOWED_HOST , I would add .tryiac.com ; notice the leading period ( . ) on this domain name;
that's intentional.
yaml
on:
workflow_call:
secrets:
DJANGO_VM_COUNT:
required: true
LINODE_IMAGE:
required: true
LINODE_OBJECT_STORAGE_DEVOPS_BUCKET:
required: true
LINODE_OBJECT_STORAGE_DEVOPS_TF_KEY:
required: true
LINODE_OBJECT_STORAGE_DEVOPS_ACCESS_KEY:
required: true
LINODE_OBJECT_STORAGE_DEVOPS_SECRET_KEY:
required: true
LINODE_PA_TOKEN:
required: true
ROOT_USER_PW:
required: true
SSH_DEVOPS_KEY_PUBLIC:
required: true
SSH_DEVOPS_KEY_PRIVATE:
required: true
workflow_dispatch:
jobs:
terraform:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Setup Terraform
uses: hashicorp/setup-terraform@v1
with:
terraform_version: 1.1.9
- name: Add Terraform Backend for S3
run: |
cat << EOF > devops/tf/backend
skip_credentials_validation = true
skip_region_validation = true
bucket="${{ secrets.LINODE_OBJECT_STORAGE_DEVOPS_BUCKET }}"
key="${{ secrets.LINODE_OBJECT_STORAGE_DEVOPS_TF_KEY }}"
region="us-southeast-1"
endpoint="us-southeast-1.linodeobjects.com"
access_key="${{ secrets.LINODE_OBJECT_STORAGE_DEVOPS_ACCESS_KEY }}"
secret_key="${{ secrets.LINODE_OBJECT_STORAGE_DEVOPS_SECRET_KEY }}"
EOF
- name: Add Terraform TFVars
run: |
cat << EOF > devops/tf/terraform.tfvars
linode_pa_token="${{ secrets.LINODE_PA_TOKEN }}"
authorized_key="${{ secrets.SSH_DEVOPS_KEY_PUBLIC }}"
root_user_pw="${{ secrets.ROOT_USER_PW }}"
app_instance_vm_count="${{ secrets.DJANGO_VM_COUNT }}"
linode_image="${{ secrets.LINODE_IMAGE }}"
EOF
- name: Terraform Init
run: terraform -chdir=./devops/tf init -backend-config=backend
- name: Terraform Validate
run: terraform -chdir=./devops/tf validate -no-color
- name: Terraform Apply Changes
run: terraform -chdir=./devops/tf apply -auto-approve
In the next chapter, in Part 2: GitHub Actions for Infrastructure, we'll finish off this workflow and update our
all.yaml workflow to use it as well.
Chapter 8
Using Ansible to Configure Infrastructure
In this section, we’ll learn how to use Ansible to configure our infrastructure to run our application.
Wait, didn’t we just configure our infrastructure with Terraform? Yes and no. To reiterate what we discussed in the
last chapter, Terraform turns things on, Ansible makes them work the correct way.
It's true that Terraform can configure our infrastructure to a degree (using provisioners), but it's not nearly as
powerful as Ansible.
Ansible verifies the state of all of the Linode instances at runtime. Here are a few examples of what I mean:
•
Is NGINX installed? Ansible says yes.
•
Is Apache installed? Ansible says yes. Can we remove it? Ansible removes it.
•
Is Docker installed? Ansible says no. Can we install it? Ansible installs it.
•
Is our system up to date? Ansible says no. Can we update it? Ansible updates it.
This same concept applies across all of the machines that Ansible has access to (by way of inventory.ini and
connection credentials: ssh keys , root password , etc.).
Before we dive in, I will start with: Ansible does not need or care about Terraform in the least. Ansible is perfectly
capable of running all the above (and much more) without aid.
•
Install Python 3.6 and up
•
Install Ansible via Python’s Package Installer (pip)
•
Create an Inventory File ( inventory.ini )
•
Create Playbooks
•
Run Ansible on the playbooks
•
Linode
•
Amazon Web Services
•
Google Cloud
•
Azure
•
Raspberry Pi
•
A Windows laptop that you made the wise decision to turn into a Linux server
ini
[my_linodes]
127.0.0.1
127.0.0.2
127.0.0.3
127.0.0.4
127.0.0.5
127.0.0.6
127.0.0.7
[my_pis]
127.0.0.8
127.0.0.9
127.0.0.10
127.0.0.11
The inventory file syntax, called INI , comes directly from Microsoft Windows INI files, as described in the
Python configparser docs.
ini
[webapps]
65.228.51.122
192.53.185.185
65.228.51.129
[loadbalancer]
172.104.194.157
[redis]
198.58.106.22
[workers]
198.66.106.38
As you may recall, we created this inventory.ini file with Terraform in the Templates section.
I should point out that you can also use domain names in place of IP Addresses. The reason we use IP Addresses
here is for simplicity.
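If the Python configparser reference above is new to you, here is a toy illustration of the INI shape (Ansible uses its
own parser; this is only to build intuition):
python
import configparser

# allow_no_value=True lets us read "keys" that have no value, like bare IP addresses
parser = configparser.ConfigParser(allow_no_value=True)
parser.read_string("""
[webapps]
65.228.51.122
192.53.185.185
""")
print(list(parser["webapps"]))  # ['65.228.51.122', '192.53.185.185']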
As long as you have an inventory file (and keys added to those instances), you can skip Terraform entirely and use Ansible on its own.
1. Log in to Linode
2. Provision at least two instances: one for Django to run on (web apps), and one for NGINX load balancer
to run on
3. Add your local SSH keys
4. Create a root user password (and remember it)
1. Ansible Basics
Have you ever used ssh to log in to a machine? I hope the answer is yes. The idea is that you use Secure
Shell ( ssh ) like this:
ssh root@some_ip
After you log in, you can start using this virtual machine ( remote host ) to do whatever you'd like.
Install things, run Django, do web scraping, and text your friends. Whatever. It's your virtual machine, have fun.
Using ssh to control a couple of machines is not a big deal. Using ssh to automate a couple of machines is
a bit trickier but still doable. Using ssh to configure 100 machines is downright nuts.
Instead, we need a tool like Ansible (or SaltStack, Puppet Bolt, or Chef like I cover in my book Try Infrastructure
as Code) to automate how everything works. Ansible, in a sense, will "ssh" into a remote machine (or group of
remote machines) and tell it what we need it to do.
1. Install Ansible
2. Add an inventory file (done)
3. Create a playbook
4. Run Ansible ( ansible-playbook main.yaml )
Playbooks are yaml files that declare the state we want any given host (or host group) in. The host / host
group can be declared inline with the Playbook or when we run Ansible or both!
A host is simply a virtual machine (or remote host ) that we have control of. In our case, it's a Linode
instance. A host group is merely a collection of hosts . All of our host entries in inventory.ini
are what we'll use Ansible to control.
2. Install Ansible
In Getting Started, we set up a Python virtual environment. I recommend you stick with that (venv reference).
cd path/to/your/cfe-django-blog
•
macOS/Linux: source venv/bin/activate
•
Windows: .\venv\Scripts\activate
Install Ansible with pip (e.g., python -m pip install ansible ).
You should not add Ansible to your requirements.txt because Ansible will not be used while running your
Django project. Ansible will only be used via GitHub Actions.
3. Create a Playbook
Let's just install nginx on all of our machines.
yaml
- hosts: all
  become: yes
  tasks:
    - name: Install Nginx
      apt:
        name: nginx
        state: present
        update_cache: yes
[defaults]
ansible_python_interpreter='/usr/bin/python3'
deprecation_warnings=False
inventory=./inventory.ini
remote_user="root"
retries=2
• devops/ansible/ansible.cfg
• devops/ansible/inventory.ini (with at least 1 host)
• devops/ansible/main.yaml
Now run:
cd devops/ansible/
ansible-playbook main.yaml
This will install NGINX on all machines. Now let’s get rid of it:
yaml
---
- hosts: all
  become: yes
  tasks:
    - name: Remove NGINX
      apt:
        name: nginx
        state: absent
Now run:
ansible-playbook main.yaml
That’s Ansible in a nutshell. Now we just add more complex playbooks to ensure our hosts are configured
exactly as we need them to be configured.
3. Ansible Variables
In devops/ansible/vars/main.yaml add the following:
yaml
---
docker_username: "codingforentrepreneurs"
docker_token: "your-dockerhub-token"
docker_appname: "cfe-django-blog"
All of these variables were created in Chapter 3. They should also already be within your GitHub Action Secrets.
These variables can now be used throughout our Ansible project just like:
yaml
---
- hosts: all
  become: yes
  vars_files:
    - vars/main.yaml
  tasks:
    - name: Print the Docker username
      debug:
        msg: "Hello there from {{ docker_username }}"
Let’s update our .gitignore to ensure this file is not added to Git (we’ll auto-generate it in GitHub Actions):
cd path/to/my/project/root/
echo "devops/ansible/vars/main.yaml" >> .gitignore
devops/ansible/ansible.cfg
[defaults]
ansible_python_interpreter='/usr/bin/python3'
deprecation_warnings=False
inventory=./inventory.ini
remote_user="root"
host_key_checking=False
private_key_file = ../../keys/tf-github
retries=2
Using this file allows us to keep our ansible-playbook commands very minimal by providing four key values:
• inventory
• remote_user
• host_key_checking
• private_key_file
inventory this is a reference to the file that lists all the instances of Linode Virtual Machines we provisioned
on Linode using Terraform in the previous chapter (or manually if you did it that way).
host_key_checking if this is set to True your ssh connection via Ansible may fail.
private_key_file When we provisioned our instances in the Terraform chapter, we added a public key.
This is a reference to the private key's path relative to this file itself. Update this as needed.
Add this file to .gitignore as we will recreate it with GitHub Actions as well.
At this point, we're ready to create our first Ansible role: docker-install . First, we'll create the folders
for this role's default tasks and handlers. Here's what you need to do:
mkdir -p devops/ansible/roles/
mkdir -p devops/ansible/roles/docker-install
mkdir -p devops/ansible/roles/docker-install/tasks
mkdir -p devops/ansible/roles/docker-install/handlers
Now let's fill in the role we'll call docker-install so that we can use Docker whenever we need it:
In devops/ansible/roles/docker-install/tasks/main.yaml
yaml
- git
- build-essential
- python3-dev
- python3-pip
- python3-venv
In devops/ansible/roles/docker-install/handlers/main.yaml
yaml
•
Define standard tasks that the role will take in tasks/main.yaml
•
Define standard handlers (or notification handlers) that the role will make available in handlers/main.yaml
Tasks run whenever the role is used (more on this later). In other words, tasks/main.yaml will run
in written order whenever this role is added. Handlers, on the other hand, run only when the handler itself
is notified by name.
Using Ansible Roles greatly improves the reusability and readability of our Ansible playbooks.
Before we can use these roles, let's uncover a few important setup items we need.
Local Testing
Yes, I do recommend you test this code locally. To do so, you'll need the following:
If that's true, we're going to install Ansible locally to give this role and playbook a try.
Install Ansible
cp .env .env.prod
After you’re done with local testing, it’s a good idea to remove .env.prod and keep all of those variables
stored on GitHub Action Secrets.
Create devops/ansible/main.yaml
yaml
---
- hosts: webapps
  become: yes
  roles:
    - roles/docker-install
Notice that we have referenced roles/docker-install . This means that our recently created role will be
added to this main.yaml playbook. In other words:
cd devops/ansible/
Run playbook
ansible-playbook main.yaml
•
Docker Community Edition installed
•
Docker Compose installed via Python Pip (the ideal supported version for Linux)
yaml
All this does is run apt-get update on each machine. (And yes, it's sudo when you have become: yes
on your playbook)
yaml
This block installs several system-wide dependencies we need for Docker & Docker compose to work.
How?
yaml
On https://get.docker.com there's a super convenient script to install the latest Docker Community edition.
This block will download that script for us:
• get_url will automatically open a URL for us and download the contents
• url this, of course, is the URL we want to open
• dest is the location on our remote (our Linode instance) we want to store this file. When in doubt,
add it to tmp/ or opt/
• mode: 0755 gives our user permission to simply execute this file
• notify: ... , This will notify a handler of our choosing (more on this later) but only if this file changes.
yaml
This block installs several system-wide dependencies we need for Docker & Docker compose to work.
How?
yaml
The command command -v docker >/dev/null 2>&1 will check if the command docker is available on our
remote host. If it does not exist, docker_exists will be a variable we can use elsewhere in Ansible.
yaml
This block exists solely to trigger the exec docker script handler (as we’ll see soon).
yaml
This block reiterates what we did with docker but now with docker-compose . Keep in mind that the version
of Docker Compose we’re using is via Python Pip so the command is docker-compose and not docker
compose as you may or may not be used to.
docker-install handlers:
First and foremost, the ordering of the handlers matters. If you order them differently, they will execute
differently and potentially cause major issues for your automation pipeline.
yaml
As you may notice, these handler blocks look identical to the tasks/main.yaml blocks.
That's because they are! The only difference is that the names we give these handlers, such as exec docker
script and install docker compose , need to be globally unique. Further, to trigger these handlers,
we must use the name exactly as it's written and within a notify: . Remember: Each handler should have
a globally unique name. (lifted directly from the Ansible Handler docs)
• notify: install docker compose will notify the handler install docker compose
• notify: another unknown one will notify the handler another unknown one and definitely
won’t notify install docker compose
Handlers are great for repeatable events you need to be able to trigger at some point in time.
6. Ansible Templates
Now we’re going to leverage Ansible Templates to create the following:
•
NGINX Load Balancer configuration
•
Docker Compose production .yaml file
•
Ansible Playbooks
•
Ansible Roles
• inventory.ini
• vars/main.yaml
templates/docker-compose.yaml.jinja2
yaml
version: "3.9"
services:
watchtower:
image: index.docker.io/containrrr/watchtower:latest
restart: always
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- /root/.docker/config.json:/config.json
command: --interval 30
profiles:
- app
app:
image: index.docker.io/{{ docker_username }}/{{ docker_appname }}:latest
restart: always
env_file: ./.env
container_name: {{ docker_appname }}
environment:
- PORT=8080
ports:
- "80:8080"
expose:
- 80
volumes:
- ./certs:/app/certs
profiles:
- app
redis:
image: redis
restart: always
ports:
- "6379:6379"
expose:
- 6379
volumes:
- redis_data:/data
entrypoint: redis-server --appendonly yes
profiles:
- redis
volumes:
redis_data:
templates/nginx-lb.conf.jinja
conf
{% if groups['webapps'] %}
upstream myproxy {
{% for host in groups['webapps'] %}
server {{ host }};
{% endfor %}
}
{% endif %}
server {
listen 80;
server_name localhost;
root /var/www/html;
{% if groups['webapps'] %}
location / {
proxy_pass http://myproxy;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $host;
proxy_redirect off;
}
{% endif %}
}
Notice that I never copy my Django code here; this is done on purpose. It’s also important to understand that
Ansible can configure our servers to run our Django apps (or nearly any app for that matter) without the need
for Docker.
In some cases, it is perfectly reasonable to skip using Docker and just opt for Ansible to configure your
environment; this is done a lot in the IT world already.
In most cases, however, I believe that leveraging Docker is far more advantageous. Here are a few reasons
in addition to what we discussed in the Docker Chapter:
mkdir -p devops/ansible/roles/django-app
mkdir -p devops/ansible/roles/django-app/handlers
mkdir -p devops/ansible/roles/django-app/tasks
In devops/ansible/roles/django-app/tasks/main.yaml
yaml
• WEBAPP_NODE_HOST={{ inventory_hostname }}
• LOAD_BALANCER_HOST={{ groups['loadbalancer'][0] }}
These two settings are for our Django application. For Django to run correctly in production, we must update
ALLOWED_HOSTS in settings.py .
python
ALLOWED_HOSTS = []
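The exact ALLOWED_HOSTS update isn't shown here, but a minimal sketch could read the environment variables
above (plus the optional ALLOWED_HOST custom domain from the previous chapter); the variable names are
assumptions based on this section:
python
import os

ALLOWED_HOSTS = []
# WEBAPP_NODE_HOST / LOAD_BALANCER_HOST are written into .env by the django-app role;
# ALLOWED_HOST is the optional ".yourcustom-domain.com" value from GitHub Action Secrets
for env_key in ("WEBAPP_NODE_HOST", "LOAD_BALANCER_HOST", "ALLOWED_HOST"):
    value = os.environ.get(env_key)
    if value:
        ALLOWED_HOSTS.append(value)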
In devops/ansible/roles/django-app/handlers/main.yaml
yaml
The commands above are standard docker-compose commands. We use --profile app or --profile
redis to target specific profiles from our docker-compose.prod.yaml template that we created above.
If we needed to add additional services, we would need to update this template.
Update main.yaml
In devops/ansible/main.yaml :
yaml
---
- hosts: webapps
  become: yes
  roles:
    - docker-install
    - django-app
  vars_files:
    - vars/main.yaml
  tasks:
    - name: Login to Docker via vars/main.yaml
      shell: "echo \"{{ docker_token }}\" | docker login -u {{ docker_username }} --password-stdin"
    - name: Run our Django app in the Background
      shell: echo "Running Docker Compose for Django App"
      notify: docker-compose start django app
•
First off, remember all handlers will run after all tasks are complete. So that means that
docker-compose start django app will only run after we log in to Docker.
•
Here we use hosts: webapps to target the group of hosts ( [webapps] ) directly from inventory.ini .
•
We use become: yes so we have root privilege to execute commands
•
We need both docker-install and django-app roles for this playbook to run
•
The tasks here merely exist to log in to Docker hub and to run the docker-compose handler from
the django-app role.
mkdir -p devops/ansible/playbooks
In devops/ansible/playbooks/force-rebuild-django-app.yaml :
yaml
---
- hosts: webapps
  become: yes
  roles:
    - docker-install
    - django-app
  vars_files:
    - vars/main.yaml
  tasks:
    - name: Rebuild Django App on Web App Hosts
      shell: echo "Forcing Django App Rebuild"
      notify: docker-compose force rebuild django app
Once again, keep in mind that we're using just the webapps group of hosts from the inventory.ini .
cd devops/ansible/
ansible-playbook playbooks/force-rebuild-django-app.yaml
In our docker-compose.yaml.jinja template, we have a service called watchtower that references the
image containrrr/watchtower . This image exists precisely to update any/all of our Docker containers when
they are updated. Let's take a look at the declaration in docker-compose.yaml.jinja :
yaml
watchtower:
image: index.docker.io/containrrr/watchtower:latest
restart: always
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- /root/.docker/config.json:/config.json
command: --interval 30
Well, every time we build and push our Docker container for our Django app, this watchtower service will
automatically update our Linode instance's running container. It does this pretty gracefully and, honestly,
much better than running a force rebuild, as we did in the last section ( ansible-playbook
playbooks/force-rebuild-django-app.yaml ).
mkdir -p devops/ansible/roles/nginx-lb
mkdir -p devops/ansible/roles/nginx-lb/tasks
mkdir -p devops/ansible/roles/nginx-lb/handlers
In devops/ansible/roles/nginx-lb/tasks/main.yaml :
yaml
All this does is install nginx , start the nginx service, and add a new configuration that combines the
recently created ./templates/nginx-lb.conf.jinja with our inventory.ini .
So, why are we not running nginx through Docker? After everything I laid out before, this is a valid question
and a worthy one at that. The answer: nginx is incredibly efficient and requires very little setup to run
incredibly well. The only change that will likely occur in this app is if we add additional web app nodes,
which would just require us to run the nginx-lb role once again.
In devops/ansible/roles/nginx-lb/handlers/main.yaml :
yaml
Update main.yaml
In devops/ansible/main.yaml :
yaml
---
- hosts: webapps
  become: yes
  roles:
    - docker-install
    - django-app
  vars_files:
    - vars/main.yaml
  tasks:
    - name: Login to Docker via vars/main.yaml
      shell: "echo \"{{ docker_token }}\" | docker login -u {{ docker_username }} --password-stdin"
    - name: Run our Django app in the Background
      shell: echo "Running Docker Compose for Django App"
      notify: docker-compose start django app

- hosts: loadbalancer
  become: yes
  roles:
    - nginx-lb
cd devops/ansible
ansible-playbook main.yaml
yaml
on:
workflow_call:
secrets:
ALLOWED_HOST:
required: false
DJANGO_SECRET_KEY:
required: true
DJANGO_VM_COUNT:
required: true
DOCKERHUB_APP_NAME:
required: true
DOCKERHUB_TOKEN:
required: true
DOCKERHUB_USERNAME:
required: true
LINODE_BUCKET_REGION:
required: true
LINODE_BUCKET_ACCESS_KEY:
required: true
LINODE_BUCKET_SECRET_KEY:
required: true
LINODE_IMAGE:
required: true
LINODE_OBJECT_STORAGE_DEVOPS_BUCKET:
required: true
LINODE_OBJECT_STORAGE_DEVOPS_TF_KEY:
required: true
LINODE_OBJECT_STORAGE_DEVOPS_ACCESS_KEY:
required: true
LINODE_OBJECT_STORAGE_DEVOPS_SECRET_KEY:
required: true
LINODE_BUCKET:
required: true
LINODE_PA_TOKEN:
required: true
MYSQL_DB_CERT:
required: true
MYSQL_DATABASE:
required: true
MYSQL_HOST:
required: true
MYSQL_ROOT_PASSWORD:
required: true
MYSQL_TCP_PORT:
required: true
MYSQL_USER:
required: true
ROOT_USER_PW:
required: true
SSH_PUB_KEY:
required: true
SSH_DEVOPS_KEY_PUBLIC:
required: true
SSH_DEVOPS_KEY_PRIVATE:
required: true
workflow_dispatch:
jobs:
terraform_ansible:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Setup Terraform
uses: hashicorp/setup-terraform@v1
with:
terraform_version: 1.1.9
- name: Add Terraform Backend for S3
run: |
cat << EOF > devops/tf/backend
skip_credentials_validation = true
skip_region_validation = true
bucket="${{ secrets.LINODE_OBJECT_STORAGE_DEVOPS_BUCKET }}"
key="${{ secrets.LINODE_OBJECT_STORAGE_DEVOPS_TF_KEY }}"
region="us-southeast-1"
endpoint="us-southeast-1.linodeobjects.com"
access_key="${{ secrets.LINODE_OBJECT_STORAGE_DEVOPS_ACCESS_KEY }}"
secret_key="${{ secrets.LINODE_OBJECT_STORAGE_DEVOPS_SECRET_KEY }}"
EOF
MYSQL_PASSWORD=${{ secrets.MYSQL_PASSWORD }}
MYSQL_ROOT_PASSWORD=${{ secrets.MYSQL_ROOT_PASSWORD }}
MYSQL_TCP_PORT=${{ secrets.MYSQL_TCP_PORT }}
MYSQL_HOST=${{ secrets.MYSQL_HOST }}
# static files connection
LINODE_BUCKET=${{ secrets.LINODE_BUCKET }}
LINODE_BUCKET_REGION=${{ secrets.LINODE_BUCKET_REGION }}
LINODE_BUCKET_ACCESS_KEY=${{ secrets.LINODE_BUCKET_ACCESS_KEY }}
LINODE_BUCKET_SECRET_KEY=${{ secrets.LINODE_BUCKET_SECRET_KEY }}
EOF
- name: Add or Override Ansible Config File
run: |
cat << EOF > devops/ansible/ansible.cfg
[defaults]
ansible_python_interpreter='/usr/bin/python3'
deprecation_warnings=False
inventory=./inventory.ini
remote_user="root"
host_key_checking=False
private_key_file = ./devops-key
retries=2
EOF
- name: Adding Ansible Variables
run: |
mkdir -p devops/ansible/vars/
cat << EOF > devops/ansible/vars/main.yaml
---
docker_appname: "${{ secrets.DOCKERHUB_APP_NAME }}"
docker_token: "${{ secrets.DOCKERHUB_TOKEN }}"
docker_username: "${{ secrets.DOCKERHUB_USERNAME }}"
EOF
- name: Run main playbook
run: |
ANSIBLE_CONFIG=devops/ansible/ansible.cfg ansible-playbook devops/ansible/main.yaml
At this point, I encourage you to go through this workflow line-by-line to see if you understand exactly what’s
going on. Here’s a hint - it runs everything that we did in the last two chapters.
Using ANSIBLE_CONFIG along with a path to our ansible.cfg file allows us to run this command anywhere
on our machine just as long as our paths are correct (to the config file as well as the initial playbook).
on:
workflow_dispatch:
jobs:
test_django:
uses: ./.github/workflows/test-django-mysql.yaml
build_container:
needs: test_django
uses: ./.github/workflows/container.yaml
secrets:
DOCKERHUB_APP_NAME: ${{ secrets.DOCKERHUB_APP_NAME }}
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
update_infra:
needs: build_container
uses: ./.github/workflows/infra.yaml
secrets:
ALLOWED_HOST: ${{ secrets.ALLOWED_HOST }}
DJANGO_SECRET_KEY: ${{ secrets.DJANGO_SECRET_KEY }}
DJANGO_VM_COUNT: ${{ secrets.DJANGO_VM_COUNT }}
DOCKERHUB_APP_NAME: ${{ secrets.DOCKERHUB_APP_NAME }}
DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
LINODE_BUCKET_REGION: ${{ secrets.LINODE_BUCKET_REGION }}
collectstatic:
needs: test_django
uses: ./.github/workflows/staticfiles.yaml
secrets:
LINODE_BUCKET: ${{ secrets.LINODE_BUCKET }}
LINODE_BUCKET_REGION: ${{ secrets.LINODE_BUCKET_REGION }}
LINODE_BUCKET_ACCESS_KEY: ${{ secrets.LINODE_BUCKET_ACCESS_KEY }}
LINODE_BUCKET_SECRET_KEY: ${{ secrets.LINODE_BUCKET_SECRET_KEY }}
Do you remember all the commands to run Terraform and Ansible? I rarely do, which is why I almost always
create a Makefile (like we do in Appendix F).
There’s a good chance that your environment variables will change as your project grows.
•
GitHub Action Secrets
• .github/workflows/infra.yaml
• .github/workflows/all.yaml
•
Local .env and .env.prod
If your Django/Python code changes, you must run python manage.py makemigrations -- I suggest doing this
during development, and in a development environment. If you run makemigrations , your entrypoint.sh
will automatically run the migrations needed.
This can cause issues in production, which is why it’s critical to ensure your development and production
environments are as similar as possible.
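For context, here is a minimal sketch of what that kind of entrypoint can look like; the gunicorn command, module path, and port are assumptions, so compare it against your own config/entrypoint.sh:
bash
#!/bin/bash
set -e

# Apply any committed migration files before the app starts serving traffic
python manage.py migrate --noinput

# Hand off to the application server (module path and port are assumptions)
exec gunicorn cfeblog.wsgi:application --bind 0.0.0.0:8000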
In the last chapter, we showed you an optional way to add a custom domain to your load balancer instance.
Once you have this custom domain mapped, you can use Certbot (Let’s Encrypt) to add a free HTTPS setup.
This process may require manual setup initially but it's well worth it. This is a great guide from Linode on how to implement Let's Encrypt on your load balancer: https://www.linode.com/docs/guides/install-lets-encrypt-to-create-ssl-certificates/
Keep in mind that using NGINX without Docker makes setting up Certbot/Let’s Encrypt much easier.
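For reference, when NGINX runs directly on the load balancer VM (not in a container), the Certbot flow is usually just a couple of commands. A sketch assuming Ubuntu and the NGINX plugin, with your own domain substituted:
bash
# Install Certbot and its NGINX plugin (Ubuntu)
sudo apt install -y certbot python3-certbot-nginx

# Obtain a certificate and let Certbot rewrite the relevant NGINX server blocks
sudo certbot --nginx -d tryiac.com -d www.tryiac.com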
Once you set up Certbot, your NGINX configuration will look something like:
Location: /etc/nginx/sites-enabled/default
Contents:
conf
upstream myproxy {
    server 123.23.123.23;
    server 123.23.123.23;
    server 123.23.123.23;
}

server {
    server_name tryiac.com www.tryiac.com *.tryiac.com;
    root /var/www/html;

    location / {
        proxy_pass http://myproxy;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_redirect off;
    }

    # Certbot also inserts listen 443 ssl; and the ssl_certificate directives
    # here, each tagged "managed by Certbot" (omitted for brevity).
}

server {
    if ($host = www.tryiac.com) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    if ($host = tryiac.com) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    listen 80;
    server_name tryiac.com www.tryiac.com *.tryiac.com;
    return 404; # managed by Certbot
}
Naturally, you would want to replace all instances of tryiac.com with whatever domain you're using.
Now, let's update your templates/nginx-lb.conf.jinja with Certbot's (Let's Encrypt) additions:
conf
{% if groups['webapps'] %}
upstream myproxy {
{% for host in groups['webapps'] %}
    server {{ host }};
{% endfor %}
}
{% endif %}

server {
    server_name tryiac.com www.tryiac.com *.tryiac.com;
    root /var/www/html;

    {% if groups['webapps'] %}
    location / {
        proxy_pass http://myproxy;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_redirect off;
    }
    {% else %}
    index index.html;
    location / {
        try_files $uri $uri/ =404;
    }
    {% endif %}
}
Then, append the HTTP-to-HTTPS redirect server block (also managed by Certbot) to the end of the same template:
conf
server {
    if ($host = www.tryiac.com) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    if ($host = tryiac.com) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    listen 80;
    server_name tryiac.com www.tryiac.com *.tryiac.com;
    return 404; # managed by Certbot
}
Once again replace tryiac.com where needed and default to Certbot's (Let's Encrypt) paths for the SSL
certificates.
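Whenever you change the rendered config on the load balancer by hand (outside of the Ansible playbook), it's worth validating and reloading NGINX; the standard commands are:
bash
# Check the configuration for syntax errors, then reload without downtime
sudo nginx -t
sudo systemctl reload nginx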
Chapter 9
Final Thoughts
This book has been about sustainably and reliably deploying Django in production using Docker containers, NGINX for load balancing, virtual machines on Linode, a managed database on Linode, managed object storage on Linode, CI/CD with GitHub, and IaC automation with Terraform and Ansible. What I love about each piece of technology we discussed here is that there's so much more we could dive into. Most of these chapters could have entire books dedicated to them on their own.
I hope this book will be the beginning or continuation of your journey into the fascinating world of production
workloads and applications. Production may feel like the end of a journey, but I believe it’s really where the
learning begins.
The technical piece of how to get into production is important for sure, but what might be more important is
answering:
The reason I pose these questions now is to challenge a common trap we engineers and entrepreneurs often fool
ourselves into thinking:
This statement may even be true, but it's important to consistently test our assumption that it is. Learning while doing work that matters has much more significance to one's well-being than learning while working on something that doesn't. What can make matters worse is working on something that never mattered in the first place, especially if you wasted hours/days/months/years/decades on it. Be a scientist and constantly test assumptions, observe outcomes, form new assumptions, and repeat.
I hope you take this book as a guide for taking action: continuously testing and deploying your projects and your ideas.
I hope you enjoyed it. If you did, please shoot me a message on Twitter: @justinmitchel.
Thank you!
Justin Mitchel
Appendix A
GitHub Actions Secrets Reference
The final GitHub Action secrets for this book are as follows:
1. `ALLOWED_HOST`
2. `DJANGO_DEBUG`
3. `DJANGO_SECRET_KEY`
4. `DJANGO_STORAGE_SERVICE`
5. `DJANGO_VM_COUNT`
6. `DOCKERHUB_APP_NAME`
7. `DOCKERHUB_TOKEN`
8. `DOCKERHUB_USERNAME`
9. `LINODE_BUCKET`
10. `LINODE_BUCKET_REGION`
11. `LINODE_BUCKET_ACCESS_KEY`
12. `LINODE_BUCKET_SECRET_KEY`
13. `LINODE_IMAGE`
14. `LINODE_OBJECT_STORAGE_DEVOPS_BUCKET`
15. `LINODE_OBJECT_STORAGE_DEVOPS_BUCKET_ENDPOINT`
16. `LINODE_OBJECT_STORAGE_DEVOPS_TF_KEY`
17. `LINODE_OBJECT_STORAGE_DEVOPS_ACCESS_KEY`
18. `LINODE_OBJECT_STORAGE_DEVOPS_SECRET_KEY`
19. `LINODE_PA_TOKEN`
20. `MYSQL_DATABASE`
21. `MYSQL_DB_CERT`
22. `MYSQL_DB_HOST`
23. `MYSQL_DB_ROOT_PASSWORD`
24. `MYSQL_DB_PORT`
25. `MYSQL_PASSWORD`
26. `MYSQL_USER`
27. `ROOT_USER_PW`
28. `SSH_DEVOPS_KEY_PUBLIC`
29. `SSH_DEVOPS_KEY_PRIVATE`
If you do not have these secrets, there’s a good chance your project will not work as intended.
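To double-check what is already configured on your repository, the GitHub CLI (Appendix E) can list the secret names (never the values):
bash
# Print the names and update times of the repo's Action secrets
gh secret list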
Appendix B
Final Project Structure
We have a lot of code in this book, so I created this appendix for your reference. Use this as a guide for the final
project structure. You can find the final project code at:
https://github.com/codingforentrepreneurs/deploy-django-linode-mysql
.
Dockerfile
LICENSE
Makefile
README.md
articles
    __init__.py
    __pycache__
    admin.py
    apps.py
    management
        __init__.py
        __pycache__
        commands
            __init__.py
            __pycache__
            backup_articles.py
    migrations
        0001_initial.py
        0002_article_updated_by.py
        __init__.py
        __pycache__
    models.py
    templates
        articles
            article_detail.html
            article_list.html
    tests.py
    urls.py
    utils.py
    views.py
certs
    db-test.crt
    db.crt
cfe-django-blog.code-workspace
cfeblog
    __init__.py
    __pycache__
    asgi.py
    dbs
        __init__.py
        __pycache__
        mysql.py
        postgres.py
    settings.py
    storages
        __init__.py
        __pycache__
        backends.py
        client.py
        conf.py
        mixins.py
        services
            __init__.py
            __pycache__
            linode.py
    urls.py
    wsgi.py
config
    entrypoint.sh
db-init
    init.sql
    mysql-config.cfg
devops
    ansible
        ansible.cfg
        django-app
            handlers
                main.yaml
            tasks
                main.yaml
        docker-install
            handlers
                main.yaml
            tasks
                main.yaml
        inventory.ini
        main.yaml
        nginx-lb
            handlers
                main.yaml
            tasks
                main.yaml
        templates
            docker-compose.yaml.jinja2
            nginx-lb.conf.jinja
        vars
            main.yaml
    tf
        backend
        linodes.tf
        locals.tf
        main.tf
        outputs.tf
        templates
            ansible-inventory.tpl
        variables.tf
docker-compose.prod.yaml
docker-compose.yaml
fixtures
    articles.json
    auth.json
keys
    tf-github
    tf-github.pub
manage.py
mediafiles
    articles
        hello-world
            07472194-a6d5-11ec-887f-acde48001122.jpg
requirements.txt
staticfiles
    empty.txt
staticroot
    empty.txt
templates
    base.html
    navbar.html

44 directories, 73 files
Appendix C
.gitignore and .dockerignore
Reference Examples
.gitignore / .dockerignore :
# mysql related
db-init/
certs/db.crt
# environment variables
.env.prod
# devops related
devops/tf/backend
devops/ansible/vars/main.yaml
devops/ansible/inventory.ini
devops/ansible/ansible.cfg
keys/tf-github
keys/tf-github.pub
# django related
staticroot/admin/
scripts/
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version
# control. However, in case of collaboration, if having platform-specific dependencies
# or dependencies having no cross-platform support, pipenv may install dependencies
# that don't work, or not install all needed dependencies.
#Pipfile.lock
# Celery stuff
celerybeat-schedule
celerybeat.pid
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# .tfstate files
*.tfstate
*.tfstate.*
# Exclude all .tfvars files, which are likely to contain sensitive data, such as
# password, private keys, and other secrets. These should not be part of version
# control as they are data points which are potentially sensitive and subject
# to change depending on the environment.
*.tfvars
*.tfvars.json
# Ignore override files as they are usually used to override resources locally and
# are not checked in.
override.tf
override.tf.json
*_override.tf
*_override.tf.json
# Include override files you do wish to add to version control using negated pattern
# !example_override.tf
# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*
Appendix D
Self-Hosted Runners with
GitHub Actions
It should be no surprise that GitHub-hosted Actions runners have limited resources available for you to use. Not to worry: we can bring our own compute (a self-hosted runner) to execute the GitHub Actions workflows in any given repository.
Here are some of the resources available to us by default on GitHub-hosted runners:
• 14GB of SSD Disk Space
• 7GB of RAM
• 2-Core CPU
Amazingly, we can run much of what we need for free with these specs. If you need more than this, you can easily use a self-hosted runner for GitHub Actions, which means that instead of using GitHub's virtual machines, you bring your own.
Inside a workflow, the switch is typically a one-line change to each job's runs-on value.
From this:
yaml
runs-on: ubuntu-latest
To this:
yaml
runs-on: self-hosted
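Registering a runner on one of your own Linode VMs follows the steps GitHub generates for you under Settings > Actions > Runners. A rough sketch of the final registration commands (the download step and the token come from that page, so treat OWNER/REPO and the token as placeholders):
bash
# From inside the unpacked actions-runner directory on your VM:
# register the runner against your repository...
./config.sh --url https://github.com/OWNER/REPO --token YOUR_RUNNER_TOKEN

# ...then start listening for jobs
./run.sh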
Appendix E
Using the GitHub CLI for
Setting Action Secrets
The GitHub CLI can do nearly anything you need to do for your applications and is a great tool for automation.
One of the core tenets of this book is portability, which is why I opted to leave this section as an appendix item instead of making it a core requirement for the book.
You can probably use the GitHub CLI without using GitHub, but that’s a customization rabbit hole I do not want
to go down in this book.
Yes, the .github/workflows folder was designed for GitHub, but as we discussed in Chapter 2, this setup allows us to execute these workflows anywhere, thus making them portable.
Either way, using the GitHub CLI to set our GitHub Action Secrets automatically is a good idea. Here’s how you’ll
do that:
Reference Docs
Install the GitHub CLI for your platform:
macOS (Homebrew): brew install gh
Windows (Chocolatey): choco install gh
Linux (Ubuntu): use the official install snippet from the GitHub CLI repo's documentation.
1. Create a GitHub Personal Access Token with the following scopes:
   • workflow
   • read:org
   • read:user
2. Add the Personal Access Token to a file called my-gh-persona-token.txt
3. Authenticate with gh
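A sketch of the authentication step, assuming the token file name from step 2:
bash
# Log in to GitHub using the personal access token stored in the file
gh auth login --with-token < my-gh-persona-token.txt

# Confirm the CLI is authenticated
gh auth status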
Assuming you have your database certificate available (via db.crt ), you can run:
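For example, a sketch for setting the database certificate secret used in this book (the secret name matches Appendix A; the file path assumes you are in the project root):
bash
# Store the contents of the certificate file as a repository Action secret
gh secret set MYSQL_DB_CERT < certs/db.crt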
There are many other commands and options for setting secrets in the GitHub CLI documentation.
Appendix F
Makefile
In many of my projects, I use a Makefile so that I can write commands like:
make infra_up
make infra_down
make migrate
For me, using make is just an easy way to remember which commands I use for my local development.
Installation
macOS / Linux
If you are running macOS or Linux, you should not have to install anything.
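To confirm make is available (and to install it on a fresh Ubuntu VM, for example), something like this should do; the package name is the usual one but may vary by distro:
bash
# Check whether make is already installed
make --version

# Ubuntu/Debian
sudo apt install -y make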
Example Makefile
path/to/your/project/Makefile
makefile
tf_console:
	terraform -chdir=devops/tf/ console

tf_plan:
	terraform -chdir=devops/tf/ plan

tf_apply:
	terraform -chdir=devops/tf/ apply

tf_upgrade:
	terraform -chdir=devops/tf/ init -upgrade

ansible:
	ANSIBLE_CONFIG=devops/ansible/ansible.cfg venv/bin/ansible-playbook devops/ansible/main.yaml

infra_up:
	terraform -chdir=devops/tf/ apply
	ANSIBLE_CONFIG=devops/ansible/ansible.cfg venv/bin/ansible-playbook devops/ansible/main.yaml

infra_down:
	terraform -chdir=devops/tf/ apply -destroy

infra_init:
	terraform -chdir=devops/tf/ init -backend-config=backend
make tf_console
This will open the Terraform console for the Terraform files in devops/tf/.
make ansible
This will use the virtual environment's copy of Ansible (we don't need to activate the virtual environment for this to run correctly) and then apply the main.yaml playbook.
make infra_up
This will apply all infrastructure changes via Terraform and then all infrastructure configuration changes via Ansible.