Node.js High Performance - Sample Chapter
Diogo Resende
he works on. He loves everything about the Internet of Things, which is the ability to
connect everything together and always be connected to the world.
He studied computer science and graduated in engineering. At that time, he deepened
his knowledge of computer networking and security, software development, and cloud
computing. Over the past 10 years, Diogo has embraced different challenges to develop
applications and services to connect people with embedded devices around the world,
building a bridge between old and uncommon protocols and the Internet of today.
ThinkDigital has been his employer and a major part of his life for the last few years.
It offers services and expertise in areas such as computer networking and security,
automation, smart metering, and fleet management and intelligence. Diogo has also
published many open source projects. You can find them all, with an MIT license
style, on his personal GitHub page under the username dresende.
Preface
High performance on a platform such as Node.js means knowing how to take
advantage of every aspect of your hardware, helping memory management act
at its best, and correctly deciding how to architect a complex application. Do not
panic if your application starts consuming a lot of memory. Instead, spot the leak
and solve it fast. Better yet, monitor and stop it before it becomes an issue.
Chapter 4, CPU Profiling, is about profiling the processor and understanding when
and why your application hogs your host. In this chapter, you understand the
limits of the language and how to develop applications that can be divided into
several components running across different hosts, allowing better performance
and scalability.
Chapter 5, Data and Cache, explains externally stored application data and how it can
affect your application's performance. It's about data stored locally in the application,
the disk, a local service, a local network service or even the client host. In this chapter,
you get to know that different types of data storage methods have different penalties,
and these must be considered when choosing the best one. You learn that data can
be stored locally or remotely, and that access to the data can be, and should be, cached
sometimes, depending on the importance of the data.
Chapter 6, Test, Benchmark, and Analyze, is about testing and benchmarking applications.
It's also about enforcing code coverage to avoid unknown application test zones. Then
we cover benchmarks and benchmark analytics. You get to understand how good tests
can pinpoint where to benchmark and analyze specific parts of the application to allow
performance improvements.
Chapter 7, Bottlenecks, covers limits outside the application. This chapter is about the
situations when you realize that the performance limit is not because of the application
programming but external factors, such as the host hardware, network, or client. You'll
become aware of the limits that external components can impose on the application,
locally or remotely. Moreover, the chapter explains that sometimes, the limits are on
the client side and nothing can be done to improve the current performance.
Performance analysis
Performance is the amount of work completed in a defined period of time and with a
set of defined resources. It can be analyzed using one or more metrics that depend on
the performance goal. The goal can be low latency, low memory footprint, reduced
processor usage, or even reduced power consumption.
The act of performance analysis is also called profiling. Profiling is very important
for making optimized applications and is achieved by instrumenting either the
source or the instance of the application. By instrumenting the source, developers
can spot common performance weak spots. By instrumenting an application
instance, they can test the application in different environments. This type of
instrumentation is also known as benchmarking.
Node.js is known for being fast. Actually, it's not that fast; it's just as fast as your
resources allow it. What Node.js is best at is not blocking your application because
of an I/O task. The perception of performance can be misleading in Node.js
applications. In some other languages, when an application task gets blocked (for
example, by a disk operation), all other tasks can be affected. In the case of Node.js,
this doesn't happen, usually.
Some people look at the platform as being single threaded, which isn't true.
Your code runs on a thread, but there are a few more threads responsible for I/O
operations. Since these operations are extremely slow compared to the processor's
performance, they run on a separate thread and signal the platform when they have
information for your application. Applications that block on I/O operations perform
poorly. Since Node.js doesn't block I/O unless you want it to, other operations can
be performed while waiting for I/O. This greatly improves performance.
Chapter 1
V8 is an open source Google project and is the JavaScript engine behind Node.js.
It's responsible for compiling and executing JavaScript, as well as managing your
application's memory needs. It is designed with performance in mind. V8 follows
several design principles to improve language performance. The engine has a
profiler and one of the best and fastest garbage collectors in existence, which is one of
the keys to its performance. It also does not compile the language into bytecode;
it compiles it directly into machine code on the first execution.
A good background in the development environment will greatly increase the chances
of success in developing high-performance applications. It's very important to know
how dereferencing works, or why your variables should avoid switching types. There
are other useful tips you should follow, and you can use a style guide like JSCS
and a linter like JSHint to enforce them for yourself and your team. Here are some
of them:
Avoid cloning objects, because copying big objects will slow down operations
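The type-switching advice can be illustrated with a small, hypothetical example: V8 can keep optimized code for a variable only while it holds a single type, so a variable that flips between a number and a string forces a deoptimization.

```javascript
// Bad: `value` switches from number to string, so the engine
// cannot keep an optimized, number-only representation for it.
let value = 0;
value = 'zero';

// Good: `total` stays a number for its whole lifetime, so the
// loop below stays eligible for optimization.
function sumUpTo(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) {
    total += i; // always number + number
  }
  return total;
}

console.log(sumUpTo(10)); // 55
```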
Monitoring
After an application is put into production mode, performance analysis becomes
even more important, as users will be more demanding than you were. Users don't
accept anything that takes more than a second, and monitoring the application's
behavior over time and over some specific loads will be extremely important, as it
will point you to where your platform is failing or will fail next.
Yes, your application may fail, and the best you can do is be prepared. Create a backup
plan, have fallback hardware, and create service probes. Essentially, anticipate all the
scenarios you can think of, and remember that your application will still fail. Here are
some of those scenarios and aspects that you should monitor:
Don't forget the rest of the infrastructure. If your application must perform
at high standards, your infrastructure should too. Your server power supply
should be uninterruptible and stable, as instability will degrade your
hardware faster than it should.
Choose your disks wisely, as faster disks are more expensive and usually
come in smaller storage sizes. Sometimes, however, this is actually not a bad
decision when your application doesn't need that much storage and speed
is considered more important. But don't just look at the gigabytes per dollar.
Sometimes, it's more important to look at the gigabits per second per dollar.
Also, your server temperature and server room should be monitored. High
temperatures degrade performance, and your hardware has an operating
temperature limit. Security, both physical and virtual, is also very important.
Everything counts for the standards of high performance, as an application
that stops serving its users is not performing at all.
Most common applications will start performing worse over time, not because of
a deficit of processing power but because of increasing data size in databases and
disks. You'll notice that the importance of memory increases and fallback disks
become critical to avoiding downtime. It's very important that an application be
able to scale horizontally, whether to shard data across servers or across regions.
A distributed architecture also increases performance. Geographically distributed
servers can be closer to clients and improve the perceived performance. Also,
databases distributed by more servers will handle more traffic as a whole and allow
DevOps to accomplish zero downtime goals. This is also very useful for maintenance,
as nodes can be brought down for support without affecting the application.
Spike testing is used when a load is increased very fast to see how the
application reacts and performs. This test is very useful and important in
applications that can have spike usages, and operators need to know how the
application will react. Twitter is a good example of an application environment
that can be affected by usage spikes (during world events such as sports matches or
religious dates), and whose operators need to know how the infrastructure will handle them.
All of these tests can become harder as your application grows. As your user base
gets bigger, your application scales and you lose the ability to load test
with the resources you have. It's good to be prepared for this moment, especially
to monitor performance and keep track of soaks and spikes once your
users become the ones responsible for continuously testing the load.
Composition in applications
Because of this continuous demand for performant applications, composition
becomes very important. Composition is a practice where you split the application
into several smaller and simpler parts, making them easier to understand, develop,
and maintain. It also makes them easier to test and improve.
Avoid creating big, monolithic code bases. They don't work well when you need to
make a change, and they also don't work well if you need to test and analyze any
part of the code to improve it and make it perform better.
The Node.js platform helps you, and in some ways forces you, to compose your
code. Node.js Package Manager (NPM) is a great module publishing service. You
can download other people's modules and publish your own as well. There are tens
of thousands of modules published, which means that you don't have to reinvent
the wheel in most cases. This is good since you can avoid wasting time on creating
a module and use a module that is already in production and used by many people,
which normally means that bugs will be tracked faster and improvements will be
delivered even faster.
The Node.js platform allows developers to easily separate code. You don't have to
do this, as the platform doesn't force you to, but you should try and follow some
good practices, such as the ones described in the following sections.
Using NPM
Don't rewrite code unless you need to. Take your time to try some available modules,
and choose the one that is right for you. This reduces the probability of writing faulty
code and gives published modules a bigger user base. Bugs will be spotted
earlier, and more people in different environments will test fixes. Moreover, you will
be using a more resilient module.
One important and neglected task after starting to use some modules is to track
changes and, whenever possible, keep using recent stable versions. If a dependency
module has not been updated for a year, you may spot a problem later but have
a hard time figuring out what changed between two versions that are a year
apart. Node.js modules tend to improve over time, and API changes are not rare.
Always upgrade with caution and don't forget to test.
Another good rule is to check whether you have a file bigger than it should be; that is,
it should be easy to read and understand in less than 5 minutes by someone new to
the application. If not, it means that it's too complex and it will be harder to track and
fix bugs later on.
Remember that later on, when your application becomes huge, you
will be like a new developer when opening a file to fix something.
You can't remember all of the application's code, and you need
to absorb a file's behavior fast.
This module has a lot of methods similar to the ones you find in the array object,
such as map, reduce, filter, and each, but for iterating asynchronously. This is
extremely useful when your application gets more complex and some user actions
require some serialized tasks. Error handling is also done correctly, and execution
stops as expected. The module helps run serial or parallel tasks.
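As a sketch of the idea behind these asynchronous iteration helpers, here is a hand-rolled map. The name `asyncMap` and its signature are our own; the real module offers much more, but the error-first callback shape is the same:

```javascript
// Run an asynchronous iteratee over every item, collect results
// in order, and stop at the first error (error-first callbacks).
function asyncMap(items, iteratee, done) {
  const results = new Array(items.length);
  let pending = items.length;
  let failed = false;
  if (pending === 0) return done(null, results);
  items.forEach((item, i) => {
    iteratee(item, (err, result) => {
      if (failed) return;       // a previous item already failed
      if (err) {
        failed = true;
        return done(err);       // report the first error only
      }
      results[i] = result;      // keep the original order
      if (--pending === 0) done(null, results);
    });
  });
}

// Usage: double every number "asynchronously".
asyncMap([1, 2, 3],
  (n, cb) => setImmediate(() => cb(null, n * 2)),
  (err, doubled) => console.log(doubled)); // logs [ 2, 4, 6 ]
```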
Also, serial tasks that would usually force a developer to nest calls and enter
callback hell can simply be avoided. This is especially useful when, for example,
you need to perform a transaction on a database with several queries involved.
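A sketch of how flattening such serial steps can look, with a minimal stand-in for a series helper; `db.query` in the comments is a hypothetical database call, not a real API:

```javascript
// Minimal series() stand-in: run tasks one after the other, stop
// on the first error, and collect results in order.
function series(tasks, done) {
  const results = [];
  (function next(i) {
    if (i === tasks.length) return done(null, results);
    tasks[i]((err, result) => {
      if (err) return done(err); // abort the "transaction"
      results.push(result);
      next(i + 1);
    });
  })(0);
}

// The queries read as a flat list instead of nested callbacks.
// Each step would normally forward to something like db.query().
series([
  (cb) => cb(null, 'BEGIN'),      // db.query('BEGIN', cb)
  (cb) => cb(null, 'INSERT ...'), // db.query('INSERT ...', cb)
  (cb) => cb(null, 'COMMIT')      // db.query('COMMIT', cb)
], (err, results) => {
  if (err) throw err;
  console.log(results.length, 'steps completed');
});
```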
Another common mistake when writing asynchronous code is throwing errors.
Callbacks are called outside the scope where they are defined, so you cannot
just wrap them in a try/catch block. Therefore, avoid throwing unless
it's a very critical error that should make your application stop and quit. In Node.js,
throwing an exception without catching it will trigger an uncaughtException event.
The platform has a rule that is consensual for most developers: the so-called
error-first callback style. This rule is of extreme importance, since it allows easier reuse
of your code. Even if you have a function where there's no chance of throwing an
error, or where you just don't want it to throw and use some kind of error handling
inside the function, your callback should always reserve the first argument for an
error, even if it's always null. This will allow your function to be used with an async
module. Also, other developers will be counting on this style when debugging, so
always reserve the first argument for an error object.
Plus, you should always reserve the last argument of the function as the callback.
Never define arguments after your callback:
function mySuperFunction(arg1, ..., argN, next) {
    // do some voodoo
    return next(null, my_result); // 1st argument reserved for error
}
2. Don't nest your conditions, and return as early as possible. If a condition
must return something from a function, then once you return, you don't
need an else statement. You also avoid a new indentation level,
reducing your code and simplifying its revision. If you don't do this, you
will end up in a condition hell, with several levels if you have two or more
conditions to satisfy:
// do this
if (someCondition) {
    return false;
}
return someThing;

// instead of this:
if (someCondition) {
    return false;
} else {
    return someThing;
}
3. Create small and simple functions. Don't let your functions span more lines
than your screen can handle. Even if a task cannot be reused, split the
function into smaller ones. It is even better to put them into a new module and
publish them. In this way, you can reuse them on the frontend if you need to.
This can also allow the engine to optimize some smaller functions when it is
unable to optimize the previous big function. Again, this is important if you
don't want a developer to be reading your application code for a week or two
before being able to touch anything.
After you start adding one or two tests, more will follow. One big advantage of
testing your module from the beginning is that when you spot a bug, you can make
a test case for it, to be able to reproduce it and avoid it in the future.
Code coverage is not crucial but can help you see how your tests cover your module
code base, and if you're just testing a small part. There are some coverage modules,
such as istanbul or jscoverage; choose the one that works best for you. Code
coverage is measured together with testing, so if you don't test, you won't be able
to see the coverage.
As you might want to improve the performance of an application, every dependency
module should be looked at for improvements. This can be done only if you test
them. Dependency version management is of great importance, and it can be hard to
keep track of new versions and changes, but they might give you some good news.
Sometimes, modules are refactored and performance is boosted. A good example of
this is database access modules.
Summary
Together, Node.js and NPM make a very good platform for developing
high-performance applications. Since the language behind them is JavaScript
and most applications these days are web applications, this combination makes
it an even more appealing choice, as it's one less server-side language to learn
(such as PHP or Ruby) and can ultimately allow a developer to share code on the
client and server sides. Also, frontend and backend developers can share, read, and
improve each other's code. Many developers pick this formula and bring with them
many of their habits from the client side. Some of these habits are not applicable
because on the server side, asynchronous tasks must rule as there are many clients
connected (as opposed to one) and performance becomes crucial.
In the next chapter, we will cover some development patterns that help applications
stay simple, fast, and scalable as more clients come along and start putting pressure
on your infrastructure.