
Simulating and Visualizing Real-Life Events in Python With SimPy - by Kevin Brown - Towards Data Science

The document discusses using the SimPy library in Python to build discrete event simulations. It provides an example of simulating visitor arrival and processing at an event entrance. The simulation accounts for visitors arriving by bus and either purchasing tickets or having pre-purchased tickets. It also discusses visualizing the simulation results through different approaches like Matplotlib, HTML canvas, and AR/VR.

Uploaded by

Thi Kim
Copyright
© © All Rights Reserved

Simulating and Visualizing Real-Life Events in Python with SimPy


We walk through the development of a complete model from the
events industry, and show three different ways to visualize the results
(including AR/VR)

Kevin Brown

Published in Towards Data Science · 11 min read · Jun 11, 2021

Discrete Event Simulation (DES) has tended to be the domain of specialized
products such as SIMUL8 [1] and MATLAB/Simulink [2]. However, while
performing an analysis in Python for which I would have used MATLAB in the
past, I had the itch to test whether Python has an answer for DES as well.

DES is a way to model real-life events using statistical functions, typically for
queues and resource usage with applications in health care, manufacturing,
logistics and others [3]. The end goal is to arrive at key operational metrics
such as resource usage and average wait times in order to evaluate and
optimize various real-life configurations. SIMUL8 has a video depicting how
emergency room wait times can be modelled [4], and MathWorks has a
number of educational videos to provide an overview of the topic [5], in
addition to a case study on automotive manufacturing [6]. The SimPy [7]
library provides support for describing and running DES models in Python.
Unlike a package such as SIMUL8, SimPy is not a complete graphical
environment for building, executing and reporting upon simulations;
however, it does provide the fundamental components to perform
simulations and output data for visualization and analysis.

This article will first walk through a scenario and show how it can be
implemented in SimPy. It will then look at three different approaches to
visualizing the results: a Python-native solution (with Matplotlib [8] and
Tkinter [9]), an HTML5 canvas-based approach, and an interactive AR/VR
visualization. We will conclude by using our SimPy model to evaluate
alternative configurations.

The Scenario
For our demonstration, I will use an example from some of my previous
work: the entrance queue at an event. However, other examples that follow a
similar pattern could be a queue at a grocery store or a restaurant that takes
online orders, a movie theatre, a pharmacy, or a train station.

We will simulate an entrance that is served entirely by public transit: on a
regular basis a bus will drop off several patrons who will then need to
have their tickets scanned before entering the event. Some visitors will have
badges or pre-purchased tickets, while others will need to
approach seller booths first to purchase their tickets. Adding to the
complexity, when visitors approach the seller booths, they will do so in
groups (simulating a family/group ticket purchase); however, each person
will need to have their ticket scanned separately.

The following depicts the high-level layout of this scenario.

Image by Author and Matheus Ximenes

In order to simulate this, we will need to decide how to represent these
different events using probability distributions. The assumptions made in
our implementation include:

On average, one bus will arrive every 3 minutes. We will use an exponential
distribution with a λ of 1/3 to represent this

Each bus will contain 100 +/- 30 visitors determined using a normal
distribution (μ = 100, σ = 30)

Visitors will form groups of 2.25 +/- 0.5 people using a normal
distribution (μ = 2.25, σ = 0.5). We will round this to the closest whole
number

We’ll assume that a fixed ratio of 40% of visitors will need to purchase
tickets at the seller booths, another 40% will arrive with a ticket already
purchased online, and 20% will arrive with staff credentials

Visitors will take one minute on average to exit the bus and walk to the
seller booth (normal, μ = 1, σ = 0.25), and another half minute to walk
from the sellers to the scanners (normal, μ = 0.5, σ = 0.1). For those
skipping the sellers (tickets pre-purchased or staff with badges), we’ll
assume an average walk of 1.5 minutes (normal, μ = 1.5, σ = 0.35)

Visitors will select the shortest line when they arrive, where each line has
one seller or scanner

A sale requires 1 +/- 0.2 minutes to complete (normal, μ = 1, σ = 0.2)

A scan requires 0.05 +/- 0.01 minutes to complete (normal, μ = 0.05, σ = 0.01)
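To make the assumptions above concrete, the following is a minimal sketch of how a few of them could be sampled with Python's standard random module. The helper names (sample_bus, sample_group_size) and the clamping of negative draws are assumptions of this sketch and are not taken from the repository.

```python
import random

# Values taken from the assumptions listed above.
BUS_ARRIVAL_MEAN = 3                      # minutes between buses, on average
BUS_OCCUPANCY_MEAN, BUS_OCCUPANCY_STD = 100, 30
GROUP_SIZE_MEAN, GROUP_SIZE_STD = 2.25, 0.5


def sample_bus(rng):
    """Draw one bus: minutes until it arrives and how many visitors it carries."""
    gap = rng.expovariate(1 / BUS_ARRIVAL_MEAN)  # exponential with rate 1/3
    # Clamping at zero is an assumption of this sketch: a normal draw can be negative.
    visitors = max(0, int(rng.gauss(BUS_OCCUPANCY_MEAN, BUS_OCCUPANCY_STD)))
    return gap, visitors


def sample_group_size(rng):
    """Draw a group size rounded to the closest whole number (at least 1 here)."""
    return max(1, round(rng.gauss(GROUP_SIZE_MEAN, GROUP_SIZE_STD)))


rng = random.Random(42)  # seeded so that runs are repeatable
gaps = [sample_bus(rng)[0] for _ in range(10_000)]
mean_gap = sum(gaps) / len(gaps)  # should land close to 3 minutes
```

Averaging many sampled gaps is a quick sanity check that a rate of 1/3 really does correspond to one bus every 3 minutes on average.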

With that in mind, let’s start with the output and work backwards from there.

Image by Author

The graph on the left-hand side represents the number of visitors arriving per
minute, and the graphs on the right-hand side represent the average time that
visitors exiting the queue at that moment needed to wait before being
served.

SimPy Simulation Set Up


The repository with the complete runnable source can be found at
https://github.com/dattivo/gate-simulation with the following snippets lifted
from the simpy example.py file. In this section we will step through the
SimPy-specific set up; however, note that the parts that connect to Tkinter
for visualization are omitted to focus on the DES features of SimPy.

To begin, let’s start with the parameters of the simulation. The variables that
will be most interesting to analyze are the number of seller lines
(SELLER_LINES) and the number of sellers per line (SELLERS_PER_LINE) as
well as their equivalents for the scanners (SCANNER_LINES and
SCANNERS_PER_LINE). Also, note the distinction between the two possible
queue/seller configurations: although the most prevalent configuration is to
have multiple distinct queues that a visitor will select and stay at until they’re
served, it has also become more mainstream in retail to see multiple sellers
for one single line (e.g., quick checkout lines at general merchandise big box
retailers).

BUS_ARRIVAL_MEAN = 3
BUS_OCCUPANCY_MEAN = 100
BUS_OCCUPANCY_STD = 30

PURCHASE_RATIO_MEAN = 0.4
PURCHASE_GROUP_SIZE_MEAN = 2.25
PURCHASE_GROUP_SIZE_STD = 0.50

TIME_TO_WALK_TO_SELLERS_MEAN = 1
TIME_TO_WALK_TO_SELLERS_STD = 0.25
TIME_TO_WALK_TO_SCANNERS_MEAN = 0.5
TIME_TO_WALK_TO_SCANNERS_STD = 0.1

SELLER_LINES = 6
SELLERS_PER_LINE = 1
SELLER_MEAN = 1
SELLER_STD = 0.2

SCANNER_LINES = 4
SCANNERS_PER_LINE = 1
SCANNER_MEAN = 1 / 20
SCANNER_STD = 0.01


With the configuration complete, let’s start the SimPy process by first
creating an “environment” and all the queues (Resources), and then running the
simulation (in this case, until the 60-minute mark).
import simpy

# Real-time environment so that the visualization can follow the simulation as it runs
env = simpy.rt.RealtimeEnvironment(factor = 0.1, strict = False)

# One Resource per line; the capacity is the number of servers working that line
seller_lines = [ simpy.Resource(env, capacity = SELLERS_PER_LINE) for _ in range(SELLER_LINES) ]
scanner_lines = [ simpy.Resource(env, capacity = SCANNERS_PER_LINE) for _ in range(SCANNER_LINES) ]

env.process(bus_arrival(env, seller_lines, scanner_lines))

env.run(until = 60)
 


Note that we are creating a RealtimeEnvironment, which is intended for
running a simulation in near real-time, particularly for our intention of
visualizing it as it runs. With the environment set up, we generate our
seller and scanner line resources (queues), which we then pass to
our “master event” of the bus arriving. The env.process() command will
begin the process described in the bus_arrival() function depicted below.
This function is the top-level event from which all other events are
dispatched. It simulates a bus arriving every BUS_ARRIVAL_MEAN minutes on
average with roughly BUS_OCCUPANCY_MEAN people on board, and then triggers
the selling and scanning processes accordingly.

import random

def bus_arrival(env, seller_lines, scanner_lines):
    # (The following comment is truncated in the source)
    # Note that these unique IDs for busses and people are not required, but are included for eve
    next_bus_id = 0
    next_person_id = 0
    while True:
        next_bus = random.expovariate(1 / BUS_ARRIVAL_MEAN)
        on_board = int(random.gauss(BUS_OCCUPANCY_MEAN, BUS_OCCUPANCY_STD))

        # Wait for the bus
        yield env.timeout(next_bus)

        people_ids = list(range(next_person_id, next_person_id + on_board))
        next_person_id += on_board
        next_bus_id += 1

        while len(people_ids) > 0:
            remaining = len(people_ids)
            # (The next line is truncated in the source)
            group_size = min(round(random.gauss(PURCHASE_GROUP_SIZE_MEAN, PURCHASE_GROUP_SIZE_STD
            people_processed = people_ids[-group_size:] # Grab the last `group_size` elements
            people_ids = people_ids[:-group_size] # Reset people_ids to only those remaining

            # Randomly determine if this group is going to the sellers or straight to the scanner
            if random.random() > PURCHASE_RATIO_MEAN:
                # (The next line is truncated in the source)
                env.process(scanning_customer(env, people_processed, scanner_lines, TIME_TO_WALK_
            else:
                # (The next line is truncated in the source)
                env.process(purchasing_customer(env, people_processed, seller_lines, scanner_line
 


Since this is the top-level event function, we see that all the work in this
function is taking place within an endless while loop. Within the loop, we are
“yielding” our wait time with env.timeout(). SimPy makes extensive use of
generator functions, which return an iterator of the yielded values. More
information on Python generators can be found in [10].

At the end of the loop, we are dispatching one of two events depending on
whether we’re going directly to the scanners or if we’ve randomly decided
that this group needs to purchase tickets first. Note that we are not yielding
to these processes, as that would instruct SimPy to complete each of these
operations in sequence; instead, all those visitors exiting the bus will be
proceeding to the queues concurrently.

Note that the people_ids list is used so that each person is assigned a
unique ID for visualization purposes. We also use the people_ids list as a
queue of people remaining to be processed; as visitors are dispatched to
their destinations, they are removed from the people_ids queue.

The purchasing_customer() function simulates three key events: walking to
the line, waiting in line, and then passing control to the
scanning_customer() event (the same function that is called by bus_arrival()
for those bypassing the sellers and going straight to the scanners). This
function picks its line based on what is shortest at the time of selection.

def purchasing_customer(env, people_processed, seller_lines, scanner_lines):
    # Walk to the seller
    yield env.timeout(random.gauss(TIME_TO_WALK_TO_SELLERS_MEAN, TIME_TO_WALK_TO_SELLERS_STD))

    seller_line = pick_shortest(seller_lines)
    with seller_line[0].request() as req:
        yield req # Wait in line

        yield env.timeout(random.gauss(SELLER_MEAN, SELLER_STD)) # Buy their tickets

        # (The next line is truncated in the source)
        env.process(scanning_customer(env, people_processed, scanner_lines, TIME_TO_WALK_TO_SCANN
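The pick_shortest() helper is not shown in the snippets here. Since its result is indexed with [0] before calling request(), it presumably returns the chosen line together with some extra information. The following is a hypothetical sketch under that assumption, returning a (line, index) pair and breaking ties at random; the implementation in the repository may differ.

```python
import random
from types import SimpleNamespace


def pick_shortest(lines):
    """Return (line, index) for the line with the fewest people waiting or being served.

    Hypothetical helper: it relies only on the `queue` and `users` lists
    that a simpy.Resource exposes. Ties are broken at random.
    """
    loads = [len(line.queue) + len(line.users) for line in lines]
    shortest = min(loads)
    candidates = [i for i, load in enumerate(loads) if load == shortest]
    index = random.choice(candidates)
    return lines[index], index


# Stand-ins for simpy.Resource objects, used only to exercise the helper.
demo_lines = [
    SimpleNamespace(queue=[1, 2], users=[0]),
    SimpleNamespace(queue=[], users=[0]),
    SimpleNamespace(queue=[1], users=[0]),
]
chosen_line, chosen_index = pick_shortest(demo_lines)
```

Counting the people being served as well as those waiting is a design choice of this sketch; counting only the waiting queue would also be reasonable.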
 


Finally, we need to implement the behaviour for the scanning_customer().
This is very similar to the purchasing_customer() function with one key
difference: although visitors may arrive and walk together in groups, each
person must have their ticket scanned individually. Consequently, you will
see the scan timeout repeated for each scanned customer.

def scanning_customer(env, people_processed, scanner_lines, walk_duration, walk_std):
    # Walk to the scanners
    yield env.timeout(random.gauss(walk_duration, walk_std))

    # We assume that the visitor will always pick the shortest line
    scanner_line = pick_shortest(scanner_lines)
    with scanner_line[0].request() as req:
        yield req # Wait in line

        # Scan each person's tickets
        for person in people_processed:
            yield env.timeout(random.gauss(SCANNER_MEAN, SCANNER_STD)) # Scan their ticket



We pass the walk duration and standard deviation to the
scanning_customer() function since those values will vary depending on
whether the visitors walked directly to the scanners or if they stopped at the
sellers first.

Visualizing the Data using Tkinter (Native Python UI)


In order to visualize the data, we added a few global lists and dictionaries to
track key metrics. For example, the arrivals dictionary tracks the number of
arrivals by minute, and the seller_waits and scan_waits dictionaries map the
minute of the simulation to a list of wait times for those exiting the queues
in that minute. There is also an event_log list that we will use in the
HTML5 Canvas animation in the next section. As key events take place (e.g.,
a visitor exiting a queue), the functions under the ANALYTICAL_GLOBALS
heading in the simpy example.py file are called to keep these dictionaries and
lists up to date.
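The bookkeeping described above could look roughly like the sketch below. The dictionary names (arrivals, seller_waits, scan_waits) come from the article, while the helper functions and their signatures are assumptions made for illustration rather than the ones in the repository.

```python
from collections import defaultdict

arrivals = defaultdict(int)        # minute -> number of visitors who arrived
seller_waits = defaultdict(list)   # minute -> waits of those leaving a seller queue
scan_waits = defaultdict(list)     # minute -> waits of those leaving a scanner queue


def register_arrivals(time, count):
    """Hypothetical helper: count visitors under the minute they arrived in."""
    arrivals[int(time)] += count


def register_wait(waits, queue_begin, queue_end):
    """Hypothetical helper: file a wait under the minute the visitor left the queue."""
    waits[int(queue_end)].append(queue_end - queue_begin)


def average_wait(waits, minute):
    """Average wait for visitors who left the queue during `minute` (0 if none)."""
    samples = waits.get(minute, [])
    return sum(samples) / len(samples) if samples else 0.0


register_arrivals(3.2, 98)
register_wait(seller_waits, 4.1, 5.6)
register_wait(seller_waits, 4.5, 5.9)
```

Keying the waits by the minute a visitor leaves the queue matches the way the charts report the average wait of those exiting at that moment.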

We used an ancillary SimPy event to send a tick event to the UI in order to
update a clock, update the current wait averages and redraw the Matplotlib
charts. The complete code can be found in the GitHub repository
(https://github.com/dattivo/gate-simulation/blob/master/simpy%20example.py);
however, the following snippet provides a skeleton view of how these updates
are dispatched from SimPy.

class ClockAndData:
    def __init__(self, canvas, x1, y1, x2, y2, time):
        self.canvas = canvas
        # Draw the initial state of the clock and data on the canvas
        self.canvas.update()

    def tick(self, time):
        # Re-draw the clock and data fields on the canvas. Also update the Matplotlib charts
        pass

# ...

clock = ClockAndData(canvas, 1100, 320, 1290, 400, 0)

# ...

def create_clock(env):
    while True:
        yield env.timeout(0.1)
        clock.tick(env.now)

# ...

env.process(create_clock(env))
 


The visualization of the users moving to and from seller and scanner queues
is represented using standard Tkinter logic. We created the QueueGraphics
class to abstract the common parts of the seller and scanner queues.
Methods from this class are coded into the SimPy event functions described
in the previous section to update the canvas (e.g., sellers.add_to_line(1)
where 1 is the seller number, and sellers.remove_from_line(1)). As future
work, we could use an event handler at key points in the process so the
SimPy simulation logic is not tightly coupled to the UI logic specific to this
analysis.
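One way to approach the decoupling mentioned above is a small publish/subscribe layer between the simulation and the UI. The sketch below is purely illustrative: the EventBus class, the event names and the handlers are hypothetical and are not part of SimPy or of the repository.

```python
class EventBus:
    """Tiny publish/subscribe helper (hypothetical, not part of SimPy)."""

    def __init__(self):
        self._handlers = {}

    def subscribe(self, event_name, handler):
        self._handlers.setdefault(event_name, []).append(handler)

    def publish(self, event_name, **details):
        # Events with no subscribers are silently ignored.
        for handler in self._handlers.get(event_name, []):
            handler(**details)


bus = EventBus()
line_lengths = {1: 0}


# A UI layer would subscribe its drawing code here; these stand-ins just count.
def on_joined(line):
    line_lengths[line] += 1


def on_left(line):
    line_lengths[line] -= 1


bus.subscribe("joined_line", on_joined)
bus.subscribe("left_line", on_left)

# The simulation only publishes events and knows nothing about the UI.
bus.publish("joined_line", line=1)
bus.publish("joined_line", line=1)
bus.publish("left_line", line=1)
```

With this arrangement, the SimPy event functions would publish events rather than call canvas methods directly, so a different front end could be attached without touching the simulation logic.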

Animating the Data Using HTML5 Canvas


As an alternate visualization, we wanted to export the events from the SimPy
simulation and pull them into a simple HTML5 web application to visualize
the scenario on a 2D canvas. We accomplished this by appending to an
event_log list as SimPy events take place. In particular, the bus arrival, walk
to seller, wait in seller line, buy tickets, walk to scanner, wait in scanner line,
and scan tickets events are each logged as individual dictionaries that are
then exported to JSON at the end of the simulation. You can see some sample
outputs of this here: https://github.com/dattivo/gate-simulation/tree/master/output
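As a minimal sketch of this logging approach, each event can be appended to the list as a plain dictionary and serialized with the standard json module. The field names and event labels below are assumptions chosen for illustration; the actual schema is the one shown in the sample outputs linked above.

```python
import json

event_log = []


def log_event(person_id, event, begin, end, **extra):
    """Hypothetical helper: append one event to the log as a plain dictionary."""
    event_log.append(
        {"person": person_id, "event": event, "begin": begin, "end": end, **extra}
    )


log_event(0, "WALK_TO_SELLER", 3.0, 4.1)
log_event(0, "WAIT_IN_SELLER_LINE", 4.1, 5.6, line=2)

# Serialize once at the end of the simulation; a web front end can load this text.
exported = json.dumps(event_log, indent=2)
restored = json.loads(exported)
```

Sticking to dictionaries of numbers and strings keeps the log directly serializable, so no custom encoder is needed.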

We developed a quick proof-of-concept to show how these events can be
translated into a 2D animation which you can experiment with at
https://dattivo.github.io/gate-simulation/visualize.html. You can see the
source code for the animation logic in
https://github.com/dattivo/gate-simulation/blob/master/visualize.ts.

Image by Author

This visualization benefits from being animated; however, for practical
purposes the Python-based Tkinter interface was quicker to assemble, and
the Matplotlib graphs (which are arguably the most important part of this
simulation) were also smoother and more familiar to set up in Python. That
being said, there is value in seeing the behaviour animated, particularly
when looking to communicate results to non-technical stakeholders.

Animating the Data Using Virtual Reality


Taking the canvas animation one step further, Matheus Ximenes and I
worked together to build the following AR/VR 3-D visualization using
the same JSON simulation data that the HTML5 canvas uses. We
implemented this using React [11], which we were already familiar with, and
A-FRAME [12], which was surprisingly accessible and easy to learn.

You can test the simulation yourself at: https://www.dattivo.com/3d-visualization/

3D Visualization of a Simulated Real-Life Event in Virtual Reality

Analyzing the Seller/Scanner Queue Configuration Alternatives


Although this example has been put together to demonstrate how a SimPy
simulation can be created and visualized, we can still use a few examples
to show how the average wait times depend on the configuration of the
queues.

Let’s begin with the case demonstrated in the animations above: six sellers
and four scanners with one seller and scanner per line (6/4). After 60
minutes, we see the average seller wait was 1.8 minutes and the average
scanner wait was 0.1 minutes. From the chart below, we see that the seller
time peaks at almost a 6-minute wait.

We can see that the sellers are consistently backed up (although 3.3 minutes
may not be too unreasonable); so, let’s see what happens if we add an extra
four sellers bumping the total up to 10.

As expected, the average seller wait is reduced to 0.7 minutes and the
maximum wait is reduced to be just over three minutes.

Now, let’s say that by reducing the price of online tickets, we’re able to boost
the number of people arriving with a ticket by 35%. Initially, we assumed
that 40% of all visitors need to buy a ticket, 40% have pre-purchased online,
and 20% are staff and vendors entering with credentials. Therefore, with
35% more people arriving with tickets, we reduce the number of people
needing to purchase down to 26%. Let’s simulate this with our initial 6/4
configuration.

In this scenario, the average seller wait is reduced to 1.0 minutes with a
maximum wait of just over 4 minutes. In this circumstance, increasing
online sales by 35% had a similar effect on the average wait to adding more
seller queues; if waiting time is the metric that we were most interested in
reducing, then at that point we could consider which of these two options
would have a stronger business case.
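The 26% figure used in this scenario can be reproduced with a short calculation. The sketch assumes that the 35% boost applies relative to the 40% online share and that the 20% staff share is unchanged, which is the reading that yields 26%; the variable names are illustrative.

```python
online_share = 0.40   # visitors arriving with a ticket purchased online
staff_share = 0.20    # staff and vendors entering with credentials

boosted_online = online_share * 1.35               # 35% more online tickets -> 0.54
purchase_share = 1 - boosted_online - staff_share  # whoever is left must buy at the gate
```

In the simulation, this corresponds to lowering PURCHASE_RATIO_MEAN from 0.4 to 0.26.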

Conclusions and Future Work


The breadth of mathematical and analytical tools available for Python is
formidable, and SimPy rounds out these capabilities to include discrete
event simulations as well. Compared to commercially packaged tools such as
SIMUL8, the Python approach does leave more to programming. Assembling
the simulation logic and building a UI and measurement support from
scratch may be clumsy for quick analyses; however, it does provide a lot of
flexibility and should be relatively straightforward for anyone already
familiar with Python. As demonstrated above, the DES logic provided by
SimPy results in clean, easy-to-read code.

As mentioned, the Tkinter visualization is the most straightforward of the
three demonstrated methods to work with, in particular with Matplotlib
support included. The HTML5 canvas and AR/VR approaches have been
handy for putting together a sharable and interactive visualization; however,
their development was non-trivial.

One improvement that would be important to consider when comparing
queue configurations is the seller/scanner utilization. Reducing the time in
the queues is only one component of the analysis, as the percentage of the
time that the sellers and scanners are sitting idle should also be considered
in arriving at the optimal solution. Additionally, it would be
interesting to add a probability that accounts for someone choosing not to
enter if they see a queue that is too long.
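As a sketch of how utilization might be measured, the simulation could record the start and end time of every sale or scan and divide the total busy time by the server time available. The utilization function and the sample intervals below are made up for illustration and are not part of the repository.

```python
def utilization(service_intervals, servers, elapsed):
    """Fraction of the available server time that was spent serving visitors.

    service_intervals: (start, end) pairs in minutes, one per completed service.
    servers: number of sellers (or scanners) on duty.
    elapsed: length of the observation window in minutes.
    """
    busy = sum(end - start for start, end in service_intervals)
    return busy / (servers * elapsed)


# Made-up service intervals for two sellers over a 10-minute window.
intervals = [(0.0, 1.0), (0.5, 1.5), (2.0, 3.0), (4.0, 5.5), (6.0, 7.5)]
seller_utilization = utilization(intervals, servers=2, elapsed=10)
```

A value near 1 would indicate that servers are almost never idle, while a low value alongside short waits would suggest the configuration is overstaffed.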

References
[1] https://www.simul8.com/

[2] https://www.mathworks.com/solutions/discrete-event-simulation.html

[3] https://en.wikipedia.org/wiki/Discrete-event_simulation

[4] https://www.simul8.com/videos/

[5] https://www.mathworks.com/videos/series/understanding-discrete-event-simulation.html

[6] https://www.mathworks.com/company/newsletters/articles/optimizing-automotive-manufacturing-processes-with-discrete-event-simulation.html

[7] https://simpy.readthedocs.io/en/latest/

[8] https://matplotlib.org/

[9] https://docs.python.org/3/library/tkinter.html

[10] https://wiki.python.org/moin/Generators

[11] https://reactjs.org/

[12] https://aframe.io/

(This post has been adapted from two previously published articles at:
https://dattivo.com/simulating-real-life-events-in-python-with-simpy/ and
https://dattivo.com/3d-visualization-of-a-simulated-real-life-event-in-virtual-reality/)
