Colloquium 2018IMG066
Colloquium
A project report,
submitted in complete fulfilment of the requirements for B. Tech project
Submitted By:
INTERNSHIP CERTIFICATE
ABSTRACT
Monitoring how a system functions, on the basis of multiple metrics, is crucial for
any system that provides a service or product to its customers. Monitoring enables
the system maintainer to predict failures, identify bottlenecks in the system and
resolve them. Various monitoring tools are already available in the market.
BenchRoutes focuses on monitoring API endpoints and provides multiple features for
doing so, which sets it apart from other monitoring software.
BenchRoutes has already released its version 1 and has proved to be a useful tool
for monitoring endpoints, reporting API jitter, ping and response delay along with
system metrics such as RAM and GPU usage. A few challenges remained, of which
scalability was the most crucial. Scaling the current architecture required
redesigning it to reduce resource consumption, adding different scrape intervals
for different endpoints, and updating the querier and API integration. During this
project I worked with my mentor on these issues and resolved each of them.
ACKNOWLEDGEMENT
I am highly indebted to my mentor, Mr. Harkishen Singh, for giving me the autonomy
to work and experiment with ideas. I would like to take this opportunity to express
my profound gratitude to him for his guidance, his personal interest in my project
and his constant support, coupled with confidence-boosting and motivating sessions
that proved very fruitful and were instrumental in building my self-assurance and
trust. The nurturing and blossoming of the present work is mainly due to his
valuable guidance, suggestions, astute judgement, constructive criticism and eye
for perfection. My mentor always answered my many doubts with smiling graciousness
and prodigious patience. He never let me feel like a novice, always lending an ear
to my views, appreciating and improving them, and giving me a free hand in my
project. It is only because of his overwhelming interest and helpful attitude that
the present work has reached this stage. Finally, I am grateful to my Institution
and colleagues, whose constant encouragement served to renew my spirit, refocus my
attention and energy, and carry out this work.
TABLE OF CONTENTS
1. INTRODUCTION
2. LITERATURE REVIEW
2.1. Monitoring
2.2. Concurrency
3. PROJECT OBJECTIVES
3.1. Revamping Configurations Of BenchRoutes
3.2. Scheduling Jobs for scraping endpoint metrics
3.3. Adding querier and API integration
4. METHODOLOGY
4.1. Configuration
4.2. Job
4.3. Module
4.4. Architectural Diagram
5. RESULT
6. CONCLUSION
7. REFERENCES
1. INTRODUCTION
Modern web applications can have anywhere from a few routes to millions of them.
This makes it hard to determine the condition and state of such an application
at any given point. Bench-routes monitors the routes of a web application and
tells you the current state of each route, along with various related
performance metrics.
2. LITERATURE REVIEW
BenchRoutes belongs to the field of monitoring. The importance of this field is
growing rapidly with the ever-increasing complexity of the systems behind the
products we use on a daily basis, e.g. YouTube, Google, Zomato, Netflix, Amazon and
Flipkart, to name just a handful. In this world of digitalisation, a system powers
each product that we use. With a system come multiple possibilities of failure, and
keeping watch over the working of a system is the domain of monitoring. Let us
learn more about monitoring.
2.1. Monitoring
Monitoring entails overseeing the entire development process, from planning,
development, integration and testing to deployment and operations. It involves a
complete and real-time view of the status of applications, services, and infrastructure
in the production environment. Features such as real-time streaming, historical
replay, and visualisations are critical components of application and service
monitoring.
Moving to the operations side of the life cycle, the site reliability engineer needs to
understand the services that can be measured and monitored, so if there's a problem,
it can be fixed. If you don’t have a DevOps toolchain that ties all these processes
together, you have a messy, uncorrelated, chaotic environment. If you have a
well-integrated toolchain, you can get better context into what is going on.
Key Capabilities
1. Shift-left testing : Testing that is performed earlier in the life cycle
helps to increase quality, shorten test cycles, and reduce errors. For DevOps
teams, it is important to extend shift-left testing practices to monitoring the
health of pre-production environments. This ensures that monitoring is
implemented early and often, so that continuity is maintained through
production and the quality of monitoring alerts is preserved. Testing and
monitoring should work together, with early monitoring helping to assess the
behaviour of the application through key user journeys and transactions. This
also helps to identify performance and availability deviations before
production deployment.
2.2. Concurrency
This concept is used heavily in designing the revamped architecture of the system.
Concurrency is the execution of multiple instruction sequences at the same time. It
occurs in an operating system when several process threads run in parallel. The
running threads communicate with each other through shared memory or message
passing. Because concurrency involves sharing resources, it can lead to problems
such as deadlocks and resource starvation.
Principles of Concurrency :
Both interleaved and overlapped processes can be viewed as examples of concurrent
processes, and both present the same problems.
The relative speed of execution cannot be predicted. It depends on the following:
● The activities of other processes
● The way the operating system handles interrupts
● The scheduling policies of the operating system
Problems in Concurrency :
● Sharing global resources –
Sharing of global resources safely is difficult. If two processes both make use
of a global variable and both perform read and write on that variable, then the
order in which various reads and writes are executed is critical.
● Optimal allocation of resources –
It is difficult for the operating system to manage the allocation of resources
optimally.
● Locating programming errors –
It is very difficult to locate a programming error because reports are usually
not reproducible.
● Locking the channel –
It may be inefficient for the operating system to simply lock the channel and
prevent its use by other processes.
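The first problem in the list, sharing a global resource, can be illustrated with a minimal Go sketch. It is not taken from the BenchRoutes code base; the counter type and the countConcurrently helper are hypothetical names used only for illustration. Several goroutines update one shared value, and a mutex serialises each read-modify-write so that the order of reads and writes can no longer corrupt the result.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a shared value guarded by a mutex so that concurrent
// read-modify-write sequences cannot interleave.
type counter struct {
	mu sync.Mutex
	n  int
}

// inc reads and writes n while holding the lock.
func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// value returns the current count.
func (c *counter) value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countConcurrently starts `workers` goroutines that each increment the
// shared counter `perWorker` times, and returns the final value.
func countConcurrently(workers, perWorker int) int {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				c.inc()
			}
		}()
	}
	wg.Wait()
	return c.value()
}

func main() {
	fmt.Println(countConcurrently(8, 1000))
}
```

Without the lock, two goroutines could read the same old value and one increment would be lost, which is exactly the ordering problem described above.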
Advantages of Concurrency :
● Running of multiple applications –
It enables you to run multiple applications at the same time.
● Better resource utilisation –
It enables resources that are unused by one application to be used by other
applications.
● Better average response time –
Without concurrency, each application has to be run to completion before the
next one can be run.
● Better performance –
It enables better performance by the operating system. When one application
uses only the processor and another application uses only the disk drive then
the time to run both applications concurrently to completion will be shorter
than the time to run each application consecutively.
Drawbacks of Concurrency :
● It is required to protect multiple applications from one another.
● It is required to coordinate multiple applications through additional
mechanisms.
● Additional performance overheads and complexities in operating systems are
required for switching among applications.
● Sometimes running too many applications concurrently leads to severely
degraded performance.
3. PROJECT OBJECTIVES
The project had three major objectives which were to be completed over a span of 3
months. The objectives are as follows:
3.3. Adding querier and API integration
Since the base architecture of BenchRoutes has changed due to the previous two
objectives, the earlier APIs need to be replaced with new APIs that meet the client
requirements. Also, a new TSDB has been integrated with the new design, so the way
data is queried from the TSDB must change as well. Once all these tasks are
completed, the different modules should be integrated to form the final product.
4. METHODOLOGY
4.1. Configuration
Bench-routes is configured using local-config.yml. This will be generalised to accept the file
name from a flag.
apis:
  - name: route-name-one
    every: 3s
    protocol: https
    domain_or_ip: www.some-url-for-one.com
    route: /api/v1/test
    method: get
    headers:
      key_1: value_1
    params:
      key_1: value_1
      key_2: value_2
    body:
      key_1: value_1
      key_2: value_2
  - name: route-name-two
    every: 5s
    protocol: https
    domain_or_ip: www.some-url-for-two.com
    route: /api/v1/test
    method: post
    params:
      key_1: value_1
      key_2: value_2
    body:
      key_1: value_1
      key_2: value_2
To frame the above YAML in words, the root key is apis. apis contains an array of
routes, where each route has name (string), every (duration), protocol (string),
domain_or_ip (string), route (string), method (string), headers (map[string]string),
params (map[string]string) and body (map[string]string).
The ping interval for a domain is the lowest of the every values (scrape intervals) of all
routes corresponding to that domain. This is to keep things simple for now.
4.2. Job
A job is a basic unit (low-level type) in bench-routes that contains information about a route.
There will be two types of jobs:
1. monitoringJob
2. machineJob
monitoringJob monitors API endpoints and performs calculations such as response time and
response length. machineJob performs the ping and extracts the ping and jitter
information from the response.
Monitoring job
Let's discuss monitoringJob first.
Each monitoringJob represents exactly one route from apis, so there is a one-to-one
relation between a job and a route.
The monitoringJob type (and other types related to jobs) will live in a package job, inside
the lib package. The package will contain two types: jobInfo and monitoringJob.
monitoringJob embeds the jobInfo type (Go's form of inheritance).
The monitoringJob holds an *http.Request and an *http.Client, which are created when the
job is created (in the constructor). The request is stored in the request field of the job.
Storing the request and client in the struct saves repeated allocations and CPU time, since
the job directly calls client.Do(request) to get the response. Earlier, URLs were parsed in
every interval; this is no longer the case.
Machine Job
Along similar lines, there will be another job, machineJob, that also embeds the jobInfo
type. It mirrors monitoringJob, has the same functions and implements the
Executable interface. Its constructor newMachineJob(app appendable, name,
domainOrUrl string, every time.Duration) returns a &machineJob{}. The only
difference lies in the struct of machineJob.
There is exactly one machineJob for each unique domain_or_ip in the configuration
file.
In the Execute() function of machineJob, it will call the Run() function of *ping.Pinger type
instead of the client.Do(request) in the monitoringJob’s Execute. For more info on how
go-ping can be used, see https://github.com/go-ping/ping.
Finally, there will be a NewJob(typ, app, name, url) function that returns an Executable.
This function will be called by other modules and based on the typ’s value, the function will
decide if it’s a machineJob or a monitoringJob and accordingly call their constructors.
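The factory described above might look roughly like the sketch below. It is a simplified illustration under several assumptions: the jobType constants are invented names, the app (appendable) argument is omitted, an error return is added for unknown types, and the Execute bodies are placeholders that only drain the signal channel instead of pinging or sending HTTP requests.

```go
package main

import (
	"errors"
	"fmt"
)

// jobType selects which kind of job the factory builds (assumed constants).
type jobType int

const (
	typeMachine jobType = iota
	typeMonitor
)

// jobInfo identifies a job (assumed fields).
type jobInfo struct {
	name   string
	target string
}

// Executable is the interface implemented by both job kinds.
type Executable interface {
	Execute(sig <-chan struct{})
	Info() jobInfo
}

type machineJob struct{ jobInfo }
type monitoringJob struct{ jobInfo }

// Execute waits for signals; the real job would ping on each signal.
func (j *machineJob) Execute(sig <-chan struct{}) {
	for range sig {
	}
}
func (j *machineJob) Info() jobInfo { return j.jobInfo }

// Execute waits for signals; the real job would send an HTTP request on each signal.
func (j *monitoringJob) Execute(sig <-chan struct{}) {
	for range sig {
	}
}
func (j *monitoringJob) Info() jobInfo { return j.jobInfo }

// NewJob dispatches on typ and calls the matching constructor.
func NewJob(typ jobType, name, url string) (Executable, error) {
	switch typ {
	case typeMachine:
		return &machineJob{jobInfo{name: name, target: url}}, nil
	case typeMonitor:
		return &monitoringJob{jobInfo{name: name, target: url}}, nil
	default:
		return nil, errors.New("unknown job type")
	}
}

func main() {
	for _, typ := range []jobType{typeMachine, typeMonitor} {
		job, err := NewJob(typ, "route-name-one", "www.some-url-for-one.com")
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%T %s\n", job, job.Info().name)
	}
}
```

Callers only ever see the Executable interface, so modules can launch either kind of job through the same code path.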
4.3. Module
A module is a high-level unit in bench-routes that calls or manages multiple low-level units
(like jobs).
The Monitor module is responsible for managing the monitorJob goroutines. It handles
api/route monitoring and manages the low-level units that calculate response time and
length.
Machine
The Machine type launches the machineJob goroutines. A job is created for each route
received from the configuration. Once you get a route, create a job using the factory,
passing the type as machine along with the relevant details, and create a channel for it.
Then call the Execute(<- signal) function of the job (it implements the Executable
interface), passing the created channel to it.
Store the created channel in a map[jobInfo]chan<- struct{}, since this channel is used to
signal the goroutine running in Execute() to do the ping operation. The jobInfo can be
obtained from job.Info(), as the job implements Info().
The Run function of Machine starts a goroutine that listens to the reload channel. If the
reload channel is closed, the goroutine exits, which marks the shutdown of the Machine.
Functions:
1. Run(): runs the Machine goroutine and starts listening to the reload channel. It calls
the run of the scheduler and passes a context after creating it.
2. Reload(configuration): receives the configuration and based on that, prepares a map
of map[jobInfo]chan <- struct{} and takes the mux.Lock(). After taking the lock, we
replace the jobs of Machine type with this new map and then release the lock via
mux.Unlock() and pass a signal to the reload channel, signalling the Run to restart the
process.
3. Stop(): Stop closes the reload channel which signals the Run() to stop its operations,
return and exit.
Now let's see what a scheduler is.
The scheduler is responsible for sending signals to the sig channel of each job. A signal
triggers the job to do its work, which can be a ping operation or an HTTP request.
The scheduler goes through every jobInfo in the timeline (jobInfo is the key of the timeline
map) and checks whether the difference between the current time and the last_execute of
that jobInfo is greater than or equal to the every of that jobInfo. If so, it sends a signal to
the corresponding channel (the value stored under that jobInfo key in the timeline). This
scan happens once every scanFrequency, which will always be set to 1 second.
Monitor
The Monitor module is exactly the same as the Machine. The difference is just that it will
handle the monitorJob instead of machineJob in the Machine module case. Everything else
remains the same, including the scheduler.
5. RESULT
I wrote a new parser package that parses the data from config.yml according to the
new design described in the Configuration section. The parser also validates the
API config against that design, and its tests cover most of the edge cases.
After discussion with my mentor, we arrived at a new design based on the composite
pattern. Each endpoint is now mapped to an individual job. The jobs are goroutines
that run iteratively and are launched concurrently once the server starts. A
scheduler keeps track of each job's last execution time and triggers the jobs
accordingly. A high-level module abstraction controls the low-level units. Jitter
metrics are calculated from the ping metric.
For this implementation, I built multiple packages: job, module, scheduler and
evaluate. Each package has unit tests as well as integration tests.
I also wrote a new querier package that queries the TSDB. It locates the endpoints
of the queried range in O(log n) time using binary search. The response carries
details about the query such as evaluation-time, query-type and values, an array of
ping, jitter or monitor values depending on the query type. The package has its own
tests. Apart from this, I added a new api.go file that is integrated with the
querier and contains all the APIs required by the client side, with proper error
handling. Refer to the commit links for the actual code.
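The range lookup can be illustrated with a short sketch. It assumes the samples are held in a slice sorted by ascending timestamp; the sample type and the rangeQuery function are hypothetical names, not the querier's actual API. Two binary searches from the standard sort package find the first sample at or after the start and the first sample after the end, each in O(log n).

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one stored data point (assumed layout).
type sample struct {
	timestamp int64
	value     float64
}

// rangeQuery returns the samples with start <= timestamp <= end.
// The input slice must be sorted by ascending timestamp.
func rangeQuery(samples []sample, start, end int64) []sample {
	if start > end {
		return nil
	}
	// First index whose timestamp is >= start.
	lo := sort.Search(len(samples), func(i int) bool { return samples[i].timestamp >= start })
	// First index whose timestamp is > end.
	hi := sort.Search(len(samples), func(i int) bool { return samples[i].timestamp > end })
	return samples[lo:hi]
}

func main() {
	samples := []sample{{10, 1.2}, {20, 1.4}, {30, 0.9}, {40, 1.1}, {50, 1.3}}
	for _, s := range rangeQuery(samples, 20, 40) {
		fmt.Println(s.timestamp, s.value)
	}
}
```

The returned slice shares memory with the input, so no samples are copied; only the two boundary positions are computed.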
Commit Links
1. https://github.com/bench-routes/bench-routes/commit/2c4998c174527ff39fc2362746354c42a9111439
2. https://github.com/bench-routes/bench-routes/commit/43e68363f8c9e2ff134e9ef4a4c44d2df82cc8ab
3. https://github.com/bench-routes/bench-routes/commit/f8ed0155ca38228ce57d112dac636e1eaa130c1a
6. CONCLUSION
I gained a great deal of confidence and knowledge over the 12-week internship. The
objectives set before the internship period were completed, and the revamped
architecture solved the issues that arose in the earlier version. The concurrency
of the system improved, with a scheduler triggering independent goroutines, which
saved a lot of system resources. The new architecture also provides independent
scrape intervals for each API endpoint. In addition, the updated TSDB is served by
a new Querier that queries efficiently using binary search.
This experience also built secondary skills such as adaptability, since I had to
get up to speed with the existing code quickly and find the bottlenecks in the
system. Agile methodology was followed during the internship: work was done in
sprints, with three meetings per week scheduled with my mentor. Planning tasks
beforehand and executing them within the deadline taught me to work within a
limited time frame. After all the meetings and hours of learning and writing code,
the objectives were completed.
7. REFERENCES
[1] Lion, J. D. (2010-10-12). "A Tour of Go". https://go.dev/tour/list
[2] Gamma, E.; Helm, R.; Johnson, R.;.(1994-10-31). Design Patterns: Elements of
[3] Kumar, G. and Bhatia, P.: 2012, Impact of agile methodology on software
https://morsmachine.dk/go-scheduler.
[5] K. Lemons, "Go: A New Language for a New Year," 06 01 2012. [Online].
Available:
http://kylelemons.net/blog/2012/01/06-go-new-languagenew-year.article#TOC_1.1..
https://www.oreilly.com/library/view/rest-api-design/9781449317904/