webslap

webslap is our present-day update to Adam Twiss' ApacheBench utility. In a sincere attempt to pay homage to his work, webslap has been designed to be a drop-in equivalent of ApacheBench. Aside from multiple URL support (which Siege also provides), webslap is the only benchmarking tool that will keep up with common modern-day webserver configurations that use TLS and/or automatic gzip compression, without requiring a test client environment as large as the server environment. The key differences show most clearly when testing with TLS, cookies and gzip. webslap came into existence because the existing tools could not test large-scale web environments.

It goes without saying, but make sure your ulimit -n is large enough for the concurrency you intend to use beforehand, since every open connection needs its own file descriptor. Some of the common distros' defaults are surprisingly low.
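A pre-flight check along these lines can be scripted. The sketch below uses Python's standard resource module (Unix only) to read the soft limit on open files, which is the value ulimit -n reports. The headroom of 64 descriptors is an assumption made for illustration, not a webslap requirement.

```python
import resource


def enough_descriptors(soft_limit, concurrency, headroom=64):
    """Return True if soft_limit leaves room for `concurrency` sockets.

    `headroom` is an assumed allowance for stdio, output files and so on.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= concurrency + headroom


def check_open_file_limit(concurrency):
    # RLIMIT_NOFILE is the per-process limit that `ulimit -n` reports.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, enough_descriptors(soft, concurrency)


if __name__ == "__main__":
    soft, ok = check_open_file_limit(concurrency=500)
    print("soft limit:", soft, "sufficient for 500 channels:", ok)
```

If the check fails, raise the limit in the shell that will launch webslap before starting the test.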

As with all of our products, webslap is bundled with the HeavyThing library itself. NOTE: compiling from source is not required; the compiled binary is included with the library. Present-day features included in webslap that the other web infrastructure testing utilities are sorely missing:

  • TLS 1.0-1.2 session caching
  • gzip content encoding
  • Session cookie handling
  • ETag handling
  • Last-Modified handling
  • First URL - Then... capability

Download

A standalone executable binary of webslap can be downloaded from our Products page. Please note that you will have to chmod +x it after download.

As with all our products, webslap is bundled with the HeavyThing library itself. The download link for our library is in the top right of every page on our site, along with the SHA256 sum of the download itself. If you have downloaded the same version and your SHA256 does not match, it has been modified by parties other than ourselves.
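Checking a download against the published sum takes only a few lines. The sketch below uses Python's standard hashlib; the file name and the published value passed to it are whatever you downloaded and copied from the site, and nothing here is specific to the HeavyThing distribution.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sum(path, published_hex):
    # Compare case-insensitively and ignore stray whitespace from copy/paste.
    return sha256_of_file(path) == published_hex.strip().lower()
```

Call matches_published_sum() with the path of the downloaded archive and the SHA256 string shown next to the download link; a False result means the file differs from the one we published.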

Use case scenarios

Multi-stage deployment quality assurance

One of the beneficial use-case scenarios for webslap is accurately benchmarking deployment environments for ongoing quality assurance. In complex web application environments, it is all too common for a single developer mistake to go undetected and end up in a production environment with catastrophic results (read: downtime). Load testing a potentially complex trail of web requests to reasonably mimic real user behaviour has historically been difficult or impossible. Because webslap provides cookie handling along with a "first URL - then" capability, it is trivial to set up deployment test scenarios that can catch these mistakes well before they cause downtime.

Webserver configuration validation/testing

Because most performance-enhancing HTTP features are only implemented in browsers, many developers rely on single-session browser debug information to validate their webserver configuration and/or changes. For example, accurately measuring the timing and bandwidth effects of various gzip compression levels at the webserver level under load (versus a single browser debug session) is nigh on impossible. TLS session caching and its cascading effects are equally difficult to measure, especially when migrating to new cryptographic settings (bigger keys, different signature algorithms, etc). Thanks to webslap's support for gzip and TLS session caching, as well as ETag and Last-Modified, benchmarking results can be obtained that match real-world behaviour much more closely than is otherwise possible.

Topdown Service Level Monitoring

In complex web serving environments, HTTP requests often involve more than one node, instance or server. Performance monitoring scripts can of course be used, but webslap's "first URL - then" feature, combined with session cookies and the option to output results as JSON, means that you can simulate actual user logins and navigate real paths through your web application. By placing this setup on one or more remote virtual servers and regularly scheduling the webslap sampling, you can get very detailed performance data that is similar, if not identical, to what your real user traffic experiences.

Usage and options

$ ./webslap
Usage: webslap [options] [POST:filename:contenttype:]http[s]://hostname[:port]/path[?query][#ref] [...]
Options are:
    -n requests       Number of requests to perform
    -c concurrency    Number of simultaneous channels
    -cpu count        Number of processes to use
    -first URL        Visit URL before commencing tests
    -g filename       Output TSV per-request data
    -json filename    Output JSON results
    -nokeepalive      Disable keep-alive
    -nogz             Disable ungzip/Accept-Encoding: gzip headers
    -nocookies        Disable session cookies
    -notlsresume      Disable TLS session resumption
    -noetag           Disable ETag/If-None-Match
    -nolastmodified   Disable Last-Modified/If-Modified-Since
    -ordered          Visit URL arglist in order instead of randomly
    -noui             Do not fire up a user interface

The primary options are the same as ApacheBench's, so that webslap can be used as a drop-in replacement. Note, however, that all of the common browser features are enabled by default. This means that simply replacing ab with webslap will not necessarily produce the same results unless additional options are passed to webslap to make it behave the same as ab. Each option is detailed below:
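As an illustration of those additional options, the sketch below builds a webslap argument list with the browser-style features switched off. It uses only flags from the usage text above. Which combination reproduces ab's behaviour exactly is an assumption on our part here: the helper name, the choice of -cpu 1, and the treatment of TLS resumption are all illustrative.

```python
def ab_like_args(url, requests, concurrency, keepalive=False, tls_resume=True):
    """Build a webslap argument list with the browser-style features disabled."""
    args = ["./webslap", "-cpu", "1", "-n", str(requests), "-c", str(concurrency),
            "-nogz", "-nocookies", "-noetag", "-nolastmodified", "-noui"]
    if not keepalive:
        # ab only uses keep-alive when it is given -k.
        args.append("-nokeepalive")
    if not tls_resume:
        args.append("-notlsresume")
    args.append(url)
    return args


if __name__ == "__main__":
    print(" ".join(ab_like_args("https://127.0.0.1/test.php", 100, 10)))
```

The returned list can be passed directly to subprocess.run() or joined into a shell command line.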

Option: -n requests

The total number of requests to perform. Some care must be taken to arrive at the "correct" number when providing multiple URLs. In the simplest case, where only one URL is specified and the concurrency is 1, a single channel performs precisely this number of requests. If the concurrency is 2, then 2 channels are opened and each performs half this number. If multiple URLs are specified, each channel iterates through its list (in order if -ordered is given, otherwise randomly), but this does not affect the number of requests to perform. For example, if you wanted 500 channels to each request 3 URLs, you would set the number of requests to 1500, concurrency to 500, and provide three URLs to visit. Default value is 1.
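The arithmetic can be sketched in a couple of helper functions. The function names are ours, and how webslap distributes requests when -n is not an exact multiple of -c is not covered here, so the leftover is reported rather than assigned to a channel.

```python
def total_requests(channels, requests_each):
    """Value for -n so that each of `channels` channels makes `requests_each` requests."""
    return channels * requests_each


def requests_per_channel(total, concurrency):
    """Split -n across -c channels; returns (per_channel, leftover)."""
    return divmod(total, concurrency)


if __name__ == "__main__":
    n = total_requests(channels=500, requests_each=3)
    print("-n", n, "-c", 500, "->", requests_per_channel(n, 500))
```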

Option: -c concurrency

The number of simultaneous TCP connections to utilise; each connection also acts as a simulated "single client". In the real world, of course, each user's browser might open anywhere from 1 to 6 TCP connections per host. To keep benchmark results accurate, and despite the HeavyThing's webclient object supporting browser-style interaction, webslap forces each concurrency channel to be precisely 1:1 TCP-wise. Caution: watch out for SYN flooding detection if you go too crazy with this number (or disable SYN flood detection on your target webservers), because all of the initial concurrent SYNs go out in one fell swoop. Default value is 1.

Option: -cpu count

The number of processes to distribute the workload between. Unlike typical multithreaded models, these are truly lock-free child processes. Default value is 2.

Option: -first URL

If this option is specified, each concurrent channel will visit this URL first, before commencing the normal URL list tests. This first URL does not count toward the total number of requests to perform. The intent behind this option is to provide simulated login support for web application testing. An example would be to create a custom script in your webserver environment that checks for the lack of a session cookie, picks a random test user, sets the session cookie such that the remainder of the URLs for normal testing are done under potentially different simulated user logins. Default is no first URL visit.
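A login script of that kind might look like the following minimal sketch, written as a Python WSGI application using only the standard library. The cookie name, the pool of test users and the response body are all hypothetical; a real script would create whatever session your application actually expects.

```python
import random
from http.cookies import SimpleCookie

# Hypothetical pool of test accounts; use accounts that exist in your application.
TEST_USERS = ["test_user_1", "test_user_2", "test_user_3"]
COOKIE_NAME = "session"  # hypothetical session cookie name


def login_app(environ, start_response):
    """WSGI app: assign a random test user when no session cookie is present."""
    cookies = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    headers = [("Content-Type", "text/plain")]
    if COOKIE_NAME in cookies:
        # The channel already has a session; leave it untouched.
        user = cookies[COOKIE_NAME].value
    else:
        user = random.choice(TEST_USERS)
        fresh = SimpleCookie()
        fresh[COOKIE_NAME] = user
        fresh[COOKIE_NAME]["path"] = "/"
        headers.append(("Set-Cookie", fresh[COOKIE_NAME].OutputString()))
    start_response("200 OK", headers)
    return [("logged in as " + user + "\n").encode("utf-8")]
```

For a quick local trial the application can be served with wsgiref.simple_server.make_server from the standard library, and its URL passed to webslap with -first. Because each channel keeps its own cookie jar, each channel then runs the remaining URLs as whichever test user it was given.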

Option: -g filename

Similar to ApacheBench's -g option, this will create a TSV file containing one line for each individual request performed. The fields differ from ApacheBench's, however, so if you have existing gnuplot scripts, you'll need to modify them accordingly. Each line consists of URL, ctime, response code, ctime, dtime, ttime, wait. Default is no TSV output.

Option: -json filename

A very useful option for automated benchmarking at regular intervals when combined with JavaScript charting utilities such as amCharts. This option writes all of the values from the normal text output as JSON that JavaScript can parse directly.
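For scheduled sampling, a small wrapper can run webslap and read the results back. The sketch below assumes that -noui is appropriate for unattended runs and makes no assumption about the structure of the JSON itself; the function names and the ./webslap path are illustrative.

```python
import json
import subprocess


def webslap_command(url, requests, concurrency, json_path, binary="./webslap"):
    """Argument list for a non-interactive run that writes JSON results."""
    return [binary, "-n", str(requests), "-c", str(concurrency),
            "-noui", "-json", json_path, url]


def load_results(json_path):
    # The layout of the results is not assumed; the caller inspects it.
    with open(json_path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def sample(url, requests, concurrency, json_path):
    """Run webslap once and return the parsed JSON results."""
    subprocess.run(webslap_command(url, requests, concurrency, json_path), check=True)
    return load_results(json_path)
```

Calling sample() from a cron job or similar scheduler, with a timestamped json_path, leaves a series of result files that a charting page can load.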

Option: -nokeepalive

By default, HTTP/1.1 connections make use of the keep-alive feature. Specifying this option adds the Connection: close header to each request, but does not otherwise alter client behavior. Particularly when testing TLS session caching, this option can be used to benchmark specific webserver cryptographic timings.

Option: -nogz

By default, webslap sends the Accept-Encoding: gzip header along with each request. Specifying this option omits the header, and is useful for comparing baseline timings versus automatic gzip timings in webserver environments.

Option: -nocookies

By default, each individual concurrency channel maintains its own separate cookiejar object and will accept Set-Cookie headers and send Cookie headers in subsequent requests. Specifying this option disables the cookie handling altogether.

Option: -notlsresume

It is surprising [to us] how many relatively high profile web environments are operating without the use of TLS session caching, as its correct usage results in a considerable improvement in secure user experience in addition to reduced webserver loads. By default, webslap supports and actively attempts to reuse TLS sessions. Specifying this option disables TLS session resumption. Especially for larger key sizes and high-load environments, this option can be used to highlight maximum CPU tolerances for webservers.

Option: -noetag

By default, each individual concurrency channel maintains its own separate stringmap of URLs to corresponding ETag headers if the remote webserver(s) sent them, and subsequent requests for the same URL by said channel will include an If-None-Match header. Specify this option to disable ETag awareness per channel.

Option: -nolastmodified

By default, each individual concurrency channel maintains its own separate stringmap of URLs to corresponding Last-Modified headers if the remote webserver(s) sent them, and subsequent requests for the same URL by said channel will include an If-Modified-Since header. Specify this option to disable Last-Modified awareness per channel.
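The ETag and Last-Modified handling described in the last two options follows the standard HTTP conditional request mechanism. The sketch below illustrates that mechanism with one cache object per channel; it is not webslap's internal implementation, and the class and method names are ours.

```python
class ValidatorCache:
    """Per-channel map of URL -> validators taken from earlier responses."""

    def __init__(self, use_etag=True, use_last_modified=True):
        self.use_etag = use_etag                    # False models -noetag
        self.use_last_modified = use_last_modified  # False models -nolastmodified
        self._etags = {}
        self._last_modified = {}

    def remember(self, url, response_headers):
        # Header names are case-insensitive, so normalise before lookup.
        headers = {name.lower(): value for name, value in response_headers.items()}
        if self.use_etag and "etag" in headers:
            self._etags[url] = headers["etag"]
        if self.use_last_modified and "last-modified" in headers:
            self._last_modified[url] = headers["last-modified"]

    def conditional_headers(self, url):
        """Headers to add to the next request for `url` on this channel."""
        extra = {}
        if url in self._etags:
            extra["If-None-Match"] = self._etags[url]
        if url in self._last_modified:
            extra["If-Modified-Since"] = self._last_modified[url]
        return extra
```

A server that honours these headers can answer a repeat request with 304 Not Modified and no body, which is why leaving the features enabled gives timings closer to those of returning browser traffic.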

Option: -ordered

By default, each individual concurrency channel separately randomizes the list of URLs specified. Specifying this option causes each concurrency channel to visit the URL list in the order specified on the command line.

Option: -noui

By default, webslap starts with a very basic TUI that refreshes the otherwise text-only output 10 times per second, which is useful when conducting larger or longer-running tests because you can see results as they happen (and interrupt the run, etc). With the -noui option, webslap reverts to output that is nearly identical in form to ApacheBench's.

Tool Comparisons

For the tests that follow, an nginx 1.7.7 webserver was configured with the following setup:


worker_processes  4;

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile        on;

    keepalive_timeout  65;

    gzip  on;
    gzip_comp_level 6;

server {
        listen       443;
        server_name  2ton.com.au;

        ssl                  on;
        # Our key is 4096 bits
        ssl_certificate      2ton.crt;
        ssl_certificate_key  2ton.key;

        ssl_session_timeout  5m;
        ssl_session_cache shared:SSL:60m;

        # Our DH parameters are also 4096 bits
        ssl_dhparam dhparam.pem;

        ssl_protocols  TLSv1 TLSv1.1 TLSv1.2;
        ssl_ciphers  DHE-RSA-AES256-SHA256:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-DSS-AES256-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!3DES:!MD5:!PSK;
        ssl_prefer_server_ciphers   on;

        add_header Strict-Transport-Security 'max-age=31536000; includeSubDomains';

        ssl_stapling on;
        ssl_stapling_verify off;
        ssl_trusted_certificate gd_bundle-g2-g1.crt;

        resolver 10.0.0.1;

        location / {
            root   html;
            index  index.html index.htm;
        }
        # pass the PHP scripts to FastCGI server listening on unix:/dev/shm/php.sock:
        #
        location ~ \.php$ {
            root           html;
            fastcgi_pass   unix:/dev/shm/php.sock;
            fastcgi_index  index.php;
            fastcgi_param  SCRIPT_FILENAME  $document_root$fastcgi_script_name;
            include        fastcgi_params;
        }
}

}

For our tests, we are connecting to localhost. Our server is an openSUSE machine, which has its ApacheBench binary installed as ab2, and our test.php script is a one-liner PHP that makes a single call to phpinfo();. For brevity, we omitted all but the most important results from the output of each.

Test 1: ApacheBench, 10x10 keepalive

$ time ab2 -n 100 -c 10 -k https://127.0.0.1/test.php
Server Software:        nginx/1.7.7
Server Hostname:        127.0.0.1
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,DHE-RSA-AES256-SHA256,4096,256

Document Path:          /test.php
Document Length:        67880 bytes

Concurrency Level:      10
Time taken for tests:   11.850 seconds
Complete requests:      100
...
Keep-Alive requests:    0
Total transferred:      6809080 bytes
HTML transferred:       6787980 bytes
Requests per second:    8.44 [#/sec] (mean)
Time per request:       1184.995 [ms] (mean)
Time per request:       118.499 [ms] (mean, across all concurrent requests)
Transfer rate:          561.14 [Kbytes/sec] received
...
real	0m11.857s
user	0m10.635s
sys	0m0.015s

The first thing to notice here is that despite our requesting keep-alive (-k), no requests were done that way. Why this is, without diving into the ab code, is unclear. The second thing to take note of is the 10.635s of user CPU time, indicating how much CPU effort was consumed by ab during the test.

Test 2: webslap, 10x10 keepalive

$ time ./webslap -cpu 1 -n 100 -c 10 -noui https://127.0.0.1/test.php
Server Software:        nginx/1.7.7
X-Powered-By:           PHP/5.4.20

Concurrency Level:      10
Time taken for tests:   1.367s
Total requests:         100
...
Keep-alive requests:    90
Non-2xx requests:       0
Total transferred:      1,049,111 bytes
Headers transferred:    26,800 bytes
Body transferred:       6,803,280 bytes
Requests per second:    73.15 [#/sec] (mean)
Time per request:       136.700 [ms] (mean)
Time per request:       13.670 [ms] (mean, across all concurrent requests)
Wire Transfer rate:     749.09 [Kbytes/sec] received
Body Transfer rate:     4,859.55 [Kbytes/sec] received
...
real	0m1.372s
user	0m0.004s
sys	0m0.004s

Note that, in fairness, we restricted webslap to only utilise 1 CPU. In contrast to the Test 1 results, webslap successfully used keep-alive for 90 of the 100 requests, which is every request after the first on each of the 10 channels. Because fork() is used, user CPU time is not reported by time, but the overall (real) time is correctly indicated.
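The transfer figures in this output also show the effect of gzip. As a rough sketch, and assuming that "Total transferred" is the sum of the headers and the compressed body as seen on the wire, the compression ratio can be estimated as follows.

```python
def compression_summary(total_wire_bytes, header_bytes, body_bytes):
    """Estimate the on-the-wire body size and the resulting compression ratio."""
    # Assumption: wire total = headers + compressed body.
    wire_body = total_wire_bytes - header_bytes
    return wire_body, body_bytes / wire_body


# Figures reported by webslap in Test 2.
wire_body, ratio = compression_summary(1_049_111, 26_800, 6_803_280)
print(f"compressed body: {wire_body:,} bytes, ratio about {ratio:.2f}:1")
```

This is also why webslap reports separate wire and body transfer rates: the body rate counts the uncompressed content, while the wire rate counts what was actually received.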

Test 3: webslap, 10x10 no keepalive

$ time ./webslap -cpu 1 -n 100 -c 10 -nokeepalive -noui https://127.0.0.1/test.php
Server Software:        nginx/1.7.7
X-Powered-By:           PHP/5.4.20

Concurrency Level:      10
Time taken for tests:   1.692s
Total requests:         100
...
Keep-alive requests:    0
Non-2xx requests:       0
Total transferred:      1,047,998 bytes
Headers transferred:    26,300 bytes
Body transferred:       6,802,260 bytes
Requests per second:    59.10 [#/sec] (mean)
Time per request:       169.200 [ms] (mean)
Time per request:       16.920 [ms] (mean, across all concurrent requests)
Wire Transfer rate:     604.61 [Kbytes/sec] received
Body Transfer rate:     3,925.53 [Kbytes/sec] received
...
real	0m1.697s
user	0m0.007s
sys	0m0.003s

This test highlights the use of TLS session resumption, as its runtime is only slightly longer than Test 2's results, indicating that webslap did in fact make use of the resumption feature.

Test 4: webslap, 10x10 no keepalive, no TLS resume

$ time ./webslap -cpu 1 -n 100 -c 10 -nokeepalive -notlsresume -noui https://127.0.0.1/test.php
Server Software:        nginx/1.7.7
X-Powered-By:           PHP/5.4.20

Concurrency Level:      10
Time taken for tests:   4.297s
Total requests:         100
...
Keep-alive requests:    0
Non-2xx requests:       0
Total transferred:      1,048,054 bytes
Headers transferred:    26,300 bytes
Body transferred:       6,802,282 bytes
Requests per second:    23.27 [#/sec] (mean)
Time per request:       429.700 [ms] (mean)
Time per request:       42.970 [ms] (mean, across all concurrent requests)
Wire Transfer rate:     238.07 [Kbytes/sec] received
Body Transfer rate:     1,545.73 [Kbytes/sec] received
...
real	0m4.302s
user	0m0.009s
sys	0m0.002s

For this test, each and every request involved a new and complete TLS session; the result highlights the significant performance difference between OpenSSL and our HeavyThing library.
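To put the three webslap runs side by side, the "Time taken for tests" values from Tests 2, 3 and 4 can be compared directly. The sketch below does nothing more than divide the reported figures; the constant and function names are ours.

```python
# "Time taken for tests" in seconds, as reported in Tests 2, 3 and 4.
KEEPALIVE = 1.367
RESUMED = 1.692
FULL_HANDSHAKE = 4.297


def slowdown(baseline, other):
    """How many times longer `other` took than `baseline`."""
    return other / baseline


print(f"resumed vs keep-alive: {slowdown(KEEPALIVE, RESUMED):.2f}x")
print(f"full vs resumed:       {slowdown(RESUMED, FULL_HANDSHAKE):.2f}x")
print(f"full vs keep-alive:    {slowdown(KEEPALIVE, FULL_HANDSHAKE):.2f}x")
```

These are single runs against localhost, so the ratios indicate the relative cost of each connection style on this setup rather than precise constants.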

Test 5: ApacheBench, 10x100 keepalive

$ time ab2 -n 1000 -c 100 -k https://127.0.0.1/test.php
Server Software:        nginx/1.7.7
Server Hostname:        127.0.0.1
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,DHE-RSA-AES256-SHA256,4096,256

Document Path:          /test.php
Document Length:        67880 bytes

Concurrency Level:      100
Time taken for tests:   107.810 seconds
Complete requests:      1000
...
Keep-Alive requests:    0
Total transferred:      68090782 bytes
HTML transferred:       67879782 bytes
Requests per second:    9.28 [#/sec] (mean)
Time per request:       10781.002 [ms] (mean)
Time per request:       107.810 [ms] (mean, across all concurrent requests)
Transfer rate:          616.78 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      485 7483 2620.7   6967   11849
Processing:    34 3238 2999.2   3086   10280
Waiting:        2 2039 1739.2   1891    6707
Total:       8280 10721 2770.8   9735   19873
...
real	1m47.818s
user	1m45.922s
sys	0m0.120s

For the duration of the test, ab was using 100% of its single CPU core. As a result, these tests are mostly useless, as we are now only really looking at ab's (as well as OpenSSL's) speed, and NOT our webserver's. Note that this was also the case with the initial test, as we will see in the following test.

Test 6: webslap, 10x100 no keepalive, no TLS resume

$ time ./webslap -cpu 1 -n 1000 -c 100 -nokeepalive -notlsresume -noui https://127.0.0.1/test.php
Server Software:        nginx/1.7.7
X-Powered-By:           PHP/5.4.20

  code    count      min      avg      max     kbhdrs    kbtotal     kbbody
   200    1,000      468    3,778   11,989        256     10,234     66,428

Concurrency Level:      100
Time taken for tests:   39.285s
Total requests:         1,000
...
Total transferred:      10,480,514 bytes
Headers transferred:    263,000 bytes
Body transferred:       68,022,784 bytes
Requests per second:    25.46 [#/sec] (mean)
Time per request:       3,928.400 [ms] (mean)
Time per request:       39.284 [ms] (mean, across all concurrent requests)
Wire Transfer rate:     260.51 [Kbytes/sec] received
Body Transfer rate:     1,690.97 [Kbytes/sec] received

                      min      avg      max
Connect Time:          72    2,593    6,631
Processing Time:        0        1       12
Waiting Time:         457    3,777   11,989
Total Time:           468    3,778   11,989

real	0m39.290s
user	0m0.016s
sys	0m0.049s

With this test, things start to get a little more interesting. Watching top during the course of this test, we begin to notice nginx oddities for the initial batch of connections, whereby only a single nginx process (despite us having configured 4 worker processes) goes to 100% CPU utilisation. Then, once the next set of connections starts, all four worker processes go to 100%. With the previous ab test, the early nginx CPU utilisation was the same, but beyond that nginx did not end up running flat out as it does in our webslap test here. Also of note, webslap's single CPU utilisation hovered at 38% throughout the course of the test. As you can clearly see, ab is no longer a useful tool for dealing with TLS.

Test 7: Siege 3.0.7, 10x100

$ time siege -c100 -b -r10 https://127.0.0.1/test.php
Transactions:		        1000 hits
Availability:		      100.00 %
Elapsed time:		       45.72 secs
Data transferred:	        9.75 MB
Response time:		        4.44 secs
Transaction rate:	       21.87 trans/sec
Throughput:		        0.21 MB/sec
Concurrency:		       97.02
Successful transactions:        1000
Failed transactions:	           0
Longest transaction:	       12.03
Shortest transaction:	        0.87
 
real	0m45.733s
user	2m19.623s
sys	0m0.230s

Siege was able to keep nginx similarly at 100% CPU, but required nearly as much CPU for itself to do so. Similar to ab, there are no options to deal with TLS session caches, and unlike ab, there is no option to enable keep-alive behaviour either. We are unable to get real-world TLS timings and analysis done with Siege, and in order to test raw, uncached, no keep-alive TLS installations, you'd need roughly 1:1 CPU resources for your test machine as you have for your webserver, and the data you produce won't be relevant to real-world server scenarios.
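The client-side CPU cost can be read straight from the time output of each run. The sketch below converts the real, user and sys fields to seconds and divides CPU time by elapsed time, giving the average number of cores the client kept busy. It is applied only to the ab and Siege runs, since webslap's user time is not reported.

```python
import re


def parse_time(value):
    """Convert a shell `time` field such as '2m19.623s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def average_cores(real, user, sys):
    """Average number of CPU cores the client kept busy during the run."""
    return (parse_time(user) + parse_time(sys)) / parse_time(real)


# real/user/sys as printed for Test 5 (ab) and Test 7 (Siege).
print(f"ab:    {average_cores('1m47.818s', '1m45.922s', '0m0.120s'):.2f} cores")
print(f"siege: {average_cores('0m45.733s', '2m19.623s', '0m0.230s'):.2f} cores")
```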

Test Conclusions

Due to the dissimilar nature of the three tools' options, we were forced to keep these comparative tests very simple. Despite them being so simplistic, they greatly highlight the gap that webslap was built to fill. With minimal webserver setup, webslap can be used for ongoing TLS installation quality assurance and monitoring, and we believe this is an essential thing to have as more and more things on the net are secured.