Tutorial:
Getting Started with mod_perl
By Stas Bekman
Internet and Intranet programmer
http://singlesheaven.com/stas/
<sbekman@iname.com>
This document was originally written in POD, converted to HTML by the pod2html utility, and then to PostScript by the html2ps utility.
1 Tutorial’s Overview
1.1 What we will learn
We will start the tutorial with mod_perl coding guidelines. I'll explain the differences between scripts running under mod_cgi and mod_perl, and what changes must be applied to make existing scripts run under mod_perl. Along with the porting notes I'll provide guidelines for proper mod_perl programming.
Then we will proceed to the biggest and most important part -- improving performance. I'll explain the details of tuning the mod_perl-enabled server and the scripts running under it, so you can squeeze every ounce of power from your server. I'll present some utilities that help you benchmark and then fine-tune the server. These benchmarks are of a different kind: they compare not mod_perl against similar technologies, but different configurations of mod_perl servers, to guide you through the tuning process. I have to admit that performance tuning is a very hard task which demands a lot of understanding and experience, but once you acquire this knowledge you can work magic with your server.
Before you start with the mod_perl installation, you should see the overall picture of this wonderful technology. There is more than one way to use a mod_perl-enabled webserver, and you have to decide which mod_perl scheme you want to use. I'll talk about picking the right strategy, presenting various approaches and discussing the pros and cons of each one.
Once we know what fits our requirements best, we need to know how to make a short and successful installation. Real World Scenarios Implementation is exactly that: you will get all the implementation details of the scenarios we have just talked about.
When you have learned how to set up a basic mod_perl-enabled server, you will want to know about more advanced configuration details. I'll talk about extended configurations and give various configuration examples.
Once you have a running server, you will want to start and stop it, prevent server failures with the help of watchdogs, and learn how to run a personal webserver for each developer. We cover it all here.
Databases? You have come to the right place -- I'll show you how to make your database connections persistent.
Running an ISP business and planning to extend your services by providing mod_perl hosting? I'll tell you whether it's a fantasy or reality.
Finally, I'll give you a list of related information resources, covering topics like learning perl programming and SQL, understanding security, building databases and more. And most important -- how to get help if you are in trouble.
1.2 What prior knowledge is required.
;o)
2 mod_perl Technology Overview
2.1 What is mod_perl
The primary advantages of mod_perl are power and speed. You have full access to the inner workings of the web server and can intervene at any stage of request processing. This allows for customized processing of (to name just a few of the phases) URI->filename translation, authentication, response generation and logging. There is very little run-time overhead. In particular, it is not necessary to start a separate process, as is often done with web-server extensions. The most widespread such extension mechanism, the Common Gateway Interface (CGI), can be replaced entirely with perl code that handles the response generation phase of request processing. mod_perl includes two general-purpose modules for this purpose: Apache::Registry, which can transparently run existing perl CGI scripts, and Apache::PerlRun, which does a similar job but allows you to run "dirtier" (to some extent) scripts.
You can configure your httpd server and handlers in Perl. You can even define your own configuration
directives.
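For example, a <Perl> configuration section lets you write your configuration in plain perl (a small sketch; the particular values are made up):

# httpd.conf
<Perl>
  # plain perl code, evaluated at server startup;
  # these variables turn into the equivalent directives
  $ServerAdmin = "webmaster@example.com";
  @PerlModule  = qw(CGI DBI);
</Perl>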
Many people ask "How much of a performance improvement does mod_perl give?" Well, it all depends on what you are doing with mod_perl, and possibly whom you ask. Developers report speed boosts from 200% to 2000%. The best way to measure is to try it and see for yourself! (See http://perl.apache.org/tidbits.html and http://perl.apache.org/stories/ for the facts.)
;o)
3 mod_perl Coding Guidelines
3.1 Exposing Apache::Registry secrets
I will use a simple CGI script that initializes a $counter to 0, and prints its value to the browser while incrementing it.
counter.pl:
----------
#!/usr/bin/perl -w
use strict;

my $counter = 0;

for (1..5) {
  increment_counter();
}

sub increment_counter {
  $counter++;
  print "Counter is equal to $counter !<BR>\n";
}
----------
And that's what you see when you execute this script the first time. But let's reload it a few times... See, suddenly, after a few reloads, the counter doesn't start counting from 1 anymore. We continue to reload and see that it keeps on growing, but not steadily: 10, 10, 10, 15, 20... Weird...
Counter is equal to 6 !
Counter is equal to 7 !
Counter is equal to 8 !
Counter is equal to 9 !
Counter is equal to 10 !
We saw two anomalies in this very simple script: unexpected growth of the counter beyond 5, and inconsistent growth over reloads. Let's investigate this script.
What kind of error is this? We should ask perl to help us. I'm going to enable a special diagnostics mode by adding the following at the top of the script:
use diagnostics;
When the inner subroutine is called, it will probably see the value of
the outer subroutine’s variable as it was before and during the
*first* call to the outer subroutine; in this case, after the first
call to the outer subroutine is complete, the inner and outer
subroutines will no longer share a common value for the variable. In
other words, the variable will no longer be shared.
Perl actually detected a closure, which is sometimes a desired effect, but not in our case (see perldoc perlsub for more information about closures). While diagnostics.pm is sometimes handy for debugging purposes, it drastically slows down your CGI script. Make sure you remove it on your production server.
Do you see a nested named subroutine in my script? I do not!!! What is going on? I suggest you report a bug. But wait, maybe the perl interpreter sees the script differently, maybe the code goes through some changes before it is actually executed? The easiest way to check what's actually happening is to run the script under the debugger. But since we must debug it while it's being executed by the server, a normal debugging session wouldn't help, for we have to invoke the debugger from within the webserver. Luckily Doug wrote the Apache::DB module and we will use it to debug my script. I'll do it non-interactively (though you can debug interactively with Apache::DB). I change my httpd.conf with:
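The configuration snippet itself did not survive the conversion of this document; judging from the /tmp/db.out trace mentioned below and the PERLDB_OPTS line shown later in this tutorial, it was presumably something like this sketch:

PerlSetEnv PERLDB_OPTS "NonStop=1 LineInfo=/tmp/db.out AutoTrace=1"
PerlModule Apache::DB
<Location /perl>
  PerlFixupHandler Apache::DB
  SetHandler perl-script
  PerlHandler Apache::Registry
  Options ExecCGI
</Location>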
Comment out 'use diagnostics;', restart the server and call counter.pl from your browser. On the surface nothing has changed - we still see the correct output as before, but two things happened in the background: first, /tmp/db.out was written, with a complete trace of the code that was executed; second, the error_log file showed us the whole code that was executed, as a side effect of reporting the warning we saw before: Variable "$counter" will not stay shared at (eval 52) line 15.... In any case, this is the code that is actually being executed:
package Apache::ROOT::perl::conference::counter_2epl;
use Apache qw(exit);
sub handler {
  BEGIN {
    $^W = 1;
  };
  $^W = 1;
  use strict;

  my $counter = 0;

  for (1..5) {
    increment_counter();
  }

  sub increment_counter {
    $counter++;
    print "Counter is equal to $counter !<BR>\\n";
  }
}
What do we learn from this discovery? First, that every cgi script is cached under a package whose name is built from the Apache::ROOT:: prefix and the relative part of the script's URL (perl::conference::counter_2epl), with all occurrences of / replaced by ::. That's how mod_perl knows which script should be fetched from the cache - each script is just a package with a single subroutine named handler. Now you understand why the diagnostics pragma talked about an inner (nested) subroutine - increment_counter is actually a nested sub. In every script, each subroutine is nested inside the handler subroutine.
The simplest fix is to avoid the lexical variable, declaring it as a package global with use vars() instead:

#!/usr/bin/perl -w
use strict;
use vars qw($counter);

$counter = 0;

for (1..5) {
  increment_counter();
}

sub increment_counter {
  $counter++;
  print "Counter is equal to $counter !<BR>\n";
}
There is no more closure effect, since there is no my() (lexically) scoped variable being used in the nested subroutine. Another approach is to use fully qualified variables. This is even better, since less memory will be used, but it adds the overhead of extra typing:
#!/usr/bin/perl -w
use strict;

$main::counter = 0;

for (1..5) {
  increment_counter();
}

sub increment_counter {
  $main::counter++;
  print "Counter is equal to $main::counter !<BR>\n";
}
Now let's proceed to the second mystery: why did we see inconsistent results over numerous reloads? That's very simple. Every time the server gets a request to process, it hands it over to one of its children, generally in a round-robin fashion. So if you have 10 httpd children alive, the first 10 reloads might seem correct. Since the closure takes effect only from the second invocation within a given child, subsequent reloads return unexpected results. Moreover, children don't always serve requests in the same order: at any given moment one of the children may have served the same script more times than the others. That's why we saw that strange behavior.
A workaround is to run the server in single server mode. You achieve this by invoking the server with the -X parameter (httpd -X). Since there are no other server children running, you will detect the problem on the second reload. But before that, let the error_log help you detect most of the possible errors: many warnings can point to real errors, so you had better make sure to check every warning that perl detects, and probably write the code in such a way that no warnings show up in the error_log. If your error_log file fills up with hundreds of lines on every script invocation, you will have trouble locating and noticing real problems.
Of course none of the warnings will be reported if the warning mechanism is not turned on. With mod_perl it is also possible to turn on warnings globally via the PerlWarn directive; just add to httpd.conf:
PerlWarn On
You can turn warnings off on a local basis (inside a block) in your code with local $^W=0. If you write $^W=0 you disable warnings everywhere inside the child; $^W=1 enables them again. So if perl warns you about something that you are sure is not a problem, you can disable the warning locally, e.g.:
[snip]
# we want perl to be quiet here -
# we don't care whether $a was initialized
local $^W = 0;
print $a;
local $^W = 1;
[snip]
Of course this is not a way to fix initialization and other problems, but sometimes it helps. While having warnings turned on is a must on a development server, you had better turn them off globally on a production server: if every CGI script generates just one warning per request, and your server serves millions of requests per day, your log file will eat up all of your disk space and the machine will die. My production server has the following directive in httpd.conf:
While we are talking about control flags, another, even more important flag is -T, which turns taint mode on. Since this is a very broad topic I'll not discuss it here, but if you aren't forcing all of your scripts to run under taint mode you are asking for trouble (always remember malicious users). To turn it on, add to httpd.conf:
3.2 Sometimes it Works Sometimes it Does Not

Generally the problem is the use of global variables. Since global variables don't change from one script invocation to another unless you change them, you can find your scripts doing "fancy" things.

The first example is amazing -- Web Services. Imagine that you enter some site where you have an account (a free email account?). Now you want to see what other users are reading.
You type in your username and password, and you expect to enter your account, but instead you enter someone else's account. Cool, isn't it? Is it a bug or a feature? (For some of us it's a feature, while for others it's a bug.) Why in the world does this happen? The answer is simple: global variables. You have entered the account of someone who happened to be served by the same server child as you. Because of sloppy programming, a global variable was not reset at the beginning of the program and voila, you can easily peek into other people's email! You would think that it can't happen, since you have entered a login and password. I tell you, it happens! See for yourself:
use CGI;
use vars qw($authenticated);

my $q = new CGI;
my $username = $q->param('username');
my $passwd   = $q->param('passwd');
authenticate($username, $passwd);
# failed, break out
die "Wrong passwd" unless $authenticated == 1;
# user is OK, fetch user's data
show_user($username);

sub authenticate {
  my ($username, $passwd) = @_;
  # some checking
  $authenticated = 1 if (SOMETHING);
}
Do you see the catch? With the code above, I can type in any valid username and any dummy password and enter that user's account, if someone has successfully entered his account before me using the same child process! Since $authenticated is global, once it becomes 1 it stays 1 for the rest of the child's life!!! The solution is trivial: reset $authenticated to 0 at the beginning of the program (among many other possible solutions). Of course this example is trivial -- but believe me, it happens!
Here is another little one-liner that can spoil your day, assuming you forgot to reset the $allowed variable. It works perfectly OK under plain mod_cgi:

$allowed = 1 if $username eq 'admin';

But with the line above you will let any user administer your system (again assuming that a request authenticated as 'admin' was served earlier by the same child).
Another good example is the use of the /o regular expression modifier, which compiles a regular expression once, on its first execution, and never recompiles it again. This problem can be difficult to detect, since after restarting the server each request you make may be served by a different child process, and thus the regex pattern for that child will be compiled afresh. Only when you make a request that happens to be served by a child which has already cached the regex will you see the problem. Generally you miss it: you press reload, it works (a new, fresh child), and then it doesn't (a child that has already cached the regex and won't recompile it because of /o). An example of such a case would be:
my $pat = '^foo$'; # likely to be input from an HTML form field
foreach (@list) {
  print if /$pat/o;
}
To make sure you catch these bugs, always test your CGI scripts under a server running in single server mode. I'll talk at greater length about compiled regular expressions at the end of this section.
3.3 What's different about modperl

3.3.2 Name collisions with Modules and libs

Scenario 1

First, you can't have two identical module names running under the same server! Only the first one use()'d or require()'d will be compiled into its package; a request to load the other identical module will be skipped, since the server thinks it is already compiled - it's already in the child's %INC. (Having /perl-status enabled will allow you to find out what is loaded and where.)
So if you have two different Foo modules in two different directories and two scripts tool1.pl and tool2.pl, located like this:
./cgi/tool1/Foo.pm
./cgi/tool1/tool1.pl
./cgi/tool2/Foo.pm
./cgi/tool2/tool2.pl
./cgi/tool1/tool1.pl
--------------------
use Foo;
print "Content-type: text/html\n\n";
print "I'm Script number One<BR>\n";
foo();
--------------------
./cgi/tool1/Foo.pm
--------------------
sub foo {
  print "<B>I'm Tool Number One!</B><BR>\n";
}
1;
--------------------
./cgi/tool2/tool2.pl
--------------------
use Foo;
print "Content-type: text/html\n\n";
print "I'm Script number Two<BR>\n";
foo();
--------------------
./cgi/tool2/Foo.pm
--------------------
sub foo {
  print "<B>I'm Tool Number Two!</B><BR>\n";
}
1;
--------------------
Both scripts call use Foo, but only the first one called will know about Foo. When you call the second script it will not know about Foo at all - it's as if you had forgotten to write use Foo;. Run the server in single server mode to detect this kind of bug immediately.
Scenario 2
The above is true for the files you require() as well (assuming that the required files do not
declare a package). If you have:
./cgi/tool1/config.pl
./cgi/tool1/tool1.pl
./cgi/tool2/config.pl
./cgi/tool2/tool2.pl
Assume the content of the scripts and the config.pl files is exactly as in the example above. Only the first require() will actually be performed, for the same reason: %INC already includes the key "config.pl"! The second scenario is no different from the first, since there is no difference between use() and require() if you don't need to import symbols into the calling script.
Scenario 3
There are three solutions for this (make sure you read all the way through Solution 3):
Solution 1
The first two faulty scenarios can be solved by placing your library modules in a subdirectory struc-
ture so that they have different path prefixes. The file system layout will be something like:
./cgi/tool1/Tool1/Foo.pm
./cgi/tool1/tool1.pl
./cgi/tool2/Tool2/Foo.pm
./cgi/tool2/tool2.pl
But this solution is very bad: while it might work for you now, if you add another script that wants to use the same module or config.pl file, it still wouldn't work, as we saw in the third scenario. So let's see better solutions.
Solution 2
Another option is to use the full path in the require(), so that the full path becomes the key in %INC:
require "/full/path/to/the/config.pl";
This solution solves the first two scenarios. I was surprised, but it worked for the third scenario as well! However, with this solution you lose portability: if you move the tool around in the file system, you will have to change the base directory.
Solution 3
Declare a package in the required files! (Of course it should be unique among the package names you use.) Then %INC will use the package name for the key. It's a good idea to use at least two-level package names for your private modules, e.g. MyProject::Carp and not Carp, since the latter collides with an existing standard package. Even if a conflicting module doesn't exist at the time you write your code, one might enter a later perl distribution as a standard module and your code would break. Foresee problems like this and save yourself future trouble.
When you use()d or require()d files without package declarations, it was very convenient, since all the variables and subroutines were part of the main:: package, so any of them could be used as if they were part of the main script. With package declarations things get more complicated. To be correct -- not complicated, but awkward: you will have to use the Package::function() syntax to call a subroutine from package Package, and to access a global variable inside that package you will have to write $Package::some_variable, so you get a kind of typing overhead. You will also be unable to access lexically scoped variables (declared with my()) inside Package.
You can make things simpler by exporting the symbols from the use()’d package, like:
use Package qw(:mysubs sub_b $var1 :myvars);
You can export both subroutines and global variables. Note that this is a costlier approach, since it consumes more memory in the current process. (See perldoc Exporter for information about exporting variables.)
This solution completely covers the third scenario. By using different module names in package declarations, as explained above, you solve the first two as well.
From the above discussion it should be clear that you cannot run development and production versions of the tools on the same server! You have to run a separate server for each (it can still be the same machine, but the servers must use different ports).
You might start your scripts by overriding the exit sub (if you use Apache::exit() directly, you will have a problem testing the script from the shell, unless you add use Apache (); to your code). I use the following code:
BEGIN {
  # Auto-detect if we are running under mod_perl or CGI.
  $USE_MOD_PERL = ( (exists $ENV{'GATEWAY_INTERFACE'}
                     and $ENV{'GATEWAY_INTERFACE'} =~ /CGI-Perl/)
                    or exists $ENV{'MOD_PERL'} ) ? 1 : 0;
}
use subs (exit);
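The line that actually selects the implementation did not survive the conversion of this document; presumably it assigned to the exit glob, along these lines (a sketch; Apache::exit() is real, the rest is reconstruction):

# pick the implementation once, at compile time
*exit = $USE_MOD_PERL ? \&Apache::exit : sub { CORE::exit };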
Now each time the script calls exit(), the correct function will be chosen, whether you run the script under mod_perl or from the shell.
Note that if you run your script under Apache::Registry, the Apache function exit() overrides the Perl core built-in function. While exit() is listed in @EXPORT_OK of the Apache package, Apache::Registry does something you don't see and imports this function for you. This means that if your script runs under the Apache::Registry handler (Apache::PerlRun as well), you don't have to worry about exit().
Note that if you still use CORE::exit() in scripts running under mod_perl, the child will exit, but neither a proper exit nor logging will happen on the way. CORE::exit() cuts off the server's legs... If you need to shut down the child cleanly, use $r->child_terminate (which sets the internal MaxRequestsPerChild so that the child exits after the current request).
in httpd.conf:
PerlFixupHandler "sub { shift->child_terminate }"
Now the response line and common headers will be sent just as they are by mod_cgi. And, just as with mod_cgi, PerlSendHeader will not send a terminating newline; your script must send that itself, e.g.:
print "Content-type: text/html\r\n\r\n";
If you are using CGI.pm or CGI::Switch and print $q->header, you do _not_ need PerlSendHeader On.
3.3.10 NPH (Non Parsed Headers) scripts

To run a Non Parsed Headers script under mod_perl, enable unbuffered output by setting $| = 1 in your script. And if you normally set PerlSendHeader On, add this to your httpd.conf:
<Files */nph-*>
PerlSendHeader Off
</Files>
BEGIN blocks in modules and files pulled in via require() and/or use() will be executed:

  Only once, if pulled in by the parent process (e.g. via Apache::RegistryLoader); once per child process if not pulled in by the parent process.

  An additional time, once per child process, if the module is pulled in off the disk again via Apache::StatINC.

  An additional time, once per child process, if the script file has changed on disk.

  An additional time, in the parent process on each restart, if pulled in by the parent process via Apache::RegistryLoader and PerlFreshRestart is On.
3.3.12 END blocks

Normally, END blocks are executed by Perl during its perl_run() function, which is called once each time the Perl program is executed, e.g. once per (mod_cgi) CGI script. However, mod_perl calls perl_run() only once, during server startup. Any END blocks encountered during main server startup, i.e. those pulled in by PerlRequire or by any PerlModule, are suspended and run at server shutdown, aka child_exit (requires apache 1.3b3+).
Any END blocks that are encountered during compilation of Apache::Registry scripts are called after the script has completed (though not during the cleanup phase), including on subsequent invocations when the script is cached in memory. All other END blocks encountered during other Perl*Handler callbacks, e.g. PerlChildInitHandler, will be suspended while the process is running and called during child_exit() when the process is shutting down. Module authors might wish to use $r->register_cleanup as an alternative to END blocks if this behavior is not desirable.
If needed, you can always turn off the 'strict' pragma, or part of it, inside a block, e.g.:

{
  no strict 'refs';
  # ... some code
}
It's more important to have the strict pragma enabled under mod_perl than anywhere else. While it's not required, it is strongly recommended; it will save you much time in the long run. And, of course, clean scripts will still run under mod_cgi (plain CGI)!
If you need to, you can always turn off warnings with local $^W=0 in your code, in a section you don't want perl to warn about. The correct way to do this is:
{
  local $^W = 0;
  # some code
}
This preserves the previous value of $^W when you leave the block (so if warnings were enabled before, they will be enabled again after the block exits).
In production code, it can be a good idea to turn warnings off. Otherwise, if your code isn't very clean and spits out a few lines of warnings here and there, you will end up with a huge error_log file in a short time on a heavily loaded server. Also, enabling runtime warning checking has a small performance impact -- in any script, not just under mod_perl -- so your approach should be to enable warnings during development, and then disable them when your code is production-ready. Controlling the warnings mode through httpd.conf is much better, since you can control the behavior of all of the scripts from a central place. I have PerlWarn On on my development server and PerlWarn Off on the production machine.
The diagnostics pragma can shed more light on the errors and warnings you see, but again, it's better not to use it in production, since otherwise you incur the huge overhead of diagnostics examining every bit of code mod_perl executes. (You can run your script with -d:DProf to measure the overhead. See Devel::DProf for more info.)
e.g.:
PerlSetEnv PERLDB_OPTS "NonStop=1 LineInfo=/tmp/db.out AutoTrace=1"
vs.
package MyPackage;
use strict;
use vars qw(@ISA $VERSION);
@ISA = qw(...);
$VERSION = "1.00";
Perl doesn't return the memory it acquired from the kernel. It does reuse it though!

The first example involves reading a big file into memory. If the file is 5MB, the child that slurps it in will grow by exactly that size. Now if you have 20 children, and all of them serve this CGI, together they will consume an additional 20*5MB = 100MB of RAM! If that's the case, try another approach to processing the file, if possible: process a line at a time and print it back to the file. (If you need to modify the file itself, use a temporary file. When finished, overwrite the source file - and make sure to provide a locking mechanism!)
The second example demonstrates copying variables between functions (passing variables by value). Let's use the example above, assuming we have no choice but to read the whole file before any data processing takes place. Now imagine some process() subroutine that processes the data and returns it. What happens if you pass $content by value? You have just copied another 5MB, and the child has grown in size by another 5MB (watch your swap space!). Now multiply this by a factor of 20 again and you have 200MB of wasted RAM, which will eventually be reused, but it's still a waste! Whenever you think a variable may grow bigger than a few KB, pass it by reference!
Once I wrote a script that passed the contents of a little flat-file database to a function that processed it by value -- it worked, and it was fast, but in time the database grew bigger, so passing it by value became overkill -- I had to decide whether to buy more memory or rewrite the code. It's obvious that adding more memory would be merely a temporary solution. So it's better to plan ahead and pass variables by reference if a variable you are going to pass might grow bigger than you expect at the time you code. There are a few approaches for passing and using variables passed by reference. For example:
my $content = qq{foobarfoobar};
process(\$content);

sub process {
  my $r_var = shift;
  $$r_var =~ s/foo/bar/gs;
  # nothing returned - the variable $content outside
  # has already been modified
}
Another approach is to use the @_ array directly; accessing the elements of @_ directly does the job of passing by reference!

process($content);

sub process {
  $_[0] =~ s/foo/bar/gs;
  # nothing returned - the variable $content outside
  # has already been modified
}
Be careful when you write this kind of subroutine, since it can confuse a potential user. It's not obvious that a call like process($content); modifies the passed variable. Programmers (the users of your library in this case) are used to subs that either modify variables passed by reference or return the processed variable (e.g. $content=process($content);).
The third example demonstrates working with databases. When doing DB processing, you often need to read lots of records into your program and then print them to the browser after they are formatted. (I won't even mention the horrible case where programmers read the whole DB into memory and then process it with perl!!! Use a relational DB and let the SQL do the job, so you get only the records you need!!!)
We will use DBI for this. Assume that we are already connected to the DB (refer to perldoc DBI for the complete manual of the DBI module):

$sth->execute;
while (@row_ary = $sth->fetchrow_array) {
  <do DB accumulation into some variable>
}
<print the output using the data returned from the DB>
In the example above, the httpd process will grow by the size of the variables that were allocated for the records that matched the query. (Again, remember to multiply it by the number of children your server runs!)
A better approach is to not accumulate the records, but rather print them as they are fetched from the DB. Moreover, we will use the bind_col() and $sth->fetchrow_arrayref() (aliased to $sth->fetch()) methods to fetch the data in the fastest possible way. The example below prints an HTML TABLE with the matched data; the only additional memory used is the @cols array, which holds temporary row values:
my @select_fields = qw(a b c);

# create a list of cols values
my @cols = ();
@cols[0..$#select_fields] = ();

$sth = $dbh->prepare($do_sql);
$sth->execute;

# Bind perl variables to columns.
$sth->bind_columns(undef, \(@cols));

print "<TABLE>";
while ($sth->fetch) {
  print "<TR>",
        map("<TD>$_</TD>", @cols),
        "</TR>";
}
print "</TABLE>";
Note: the above method doesn't let you know how many records were matched. The workaround is to run an identical query beforehand, using SELECT count(*) ... instead of SELECT * ..., to get the number of matched records.
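For example (a sketch; the table name and WHERE clause are made up):

# run the cheap count query first, with the same WHERE clause
my ($matched) = $dbh->selectrow_array(
    "SELECT count(*) FROM tracker WHERE age > 20");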
For those who think that $sth->rows will do the job, here is the quote from the DBI manpage:
rows();
$rv = $sth->rows;
As a bonus, I wanted to write a single sub that flexibly processes any query, accepting conditions, a callback closure sub, select fields and restrictions.
# Usage:
# $o->dump(\%conditions, \&callback_closure, \@select_fields, @restrictions);
#
sub dump {
  my $self = shift;
  my %param = %{+shift};          # dereference hash
  my $rsub = shift;
  my @select_fields = @{+shift};  # dereference list
  my @restrict = shift || '';

  my $do_sql = '';
  my @where = ();
Now a callback closure sub can do lots of things. We need a closure to know what stage we are in: header, body, or tail. For example, we want a callback closure for formatting the rows to print:
my $rsub = eval {
  # make a copy of the @fields list, since it might go
  # out of scope when this closure is called
  my @fields = @fields;
  my @query_fields = qw(user dir tool act);  # no date field!!!
  my $header  = 0;
  my $tail    = 0;
  my $counter = 0;
  sub {
    # Header
    if (@_ and !$header) {
      print "<TABLE>\n";
      print $q->Tr(map { $q->td($_) } @fields);
      $header = 1;
    }

    # Body
    if (@_) {
      print $q->Tr(map { $q->td($_) } @_);
      $counter++;
      return;
    }

    # No record found
    unless ($header) {
      print $q->p($q->center($q->b("No record was found!\n")));
    }
  }  # end of sub {}
};   # end of my $rsub = eval {
3.4 Reloading Modules and Required Files

The situation is different with Apache::Registry, since the whole idea is to get maximum performance from the server. By default, the server won't spend time checking whether any included library modules have been changed. It assumes that they weren't, thus saving the few milliseconds it takes to stat() the source file (multiplied by however many modules/libraries you use() and/or require() in your script). The only check that is performed is whether your main script has been changed. So if you have only one script that doesn't use() (or require()) other perl modules (or packages), there is nothing new for you here. If, however, you are developing a script that includes other modules, the files you use() or require() aren't checked for modification.

Acknowledging this, how do we get our mod_perl-enabled server to recognize changes in library modules? There are a couple of techniques:
Apache::StatINC reloads the files recorded in %INC when they are updated on disk. When Perl pulls in a file via require(), it stores the filename in the global hash %INC. The next time Perl tries to require the same file, it sees the file in %INC and does not reload it from disk. This module's handler iterates over %INC and reloads any file that has changed on disk.
To be sure it really works, turn on debug mode on your development box with PerlSetVar StatINCDebug On. You end up with a configuration like this:
PerlModule Apache::StatINC
<Location /perl>
  SetHandler perl-script
  PerlHandler Apache::Registry::handler
  Options ExecCGI
  PerlSendHeader On
  PerlInitHandler Apache::StatINC
  PerlSetVar StatINCDebug On
</Location>
Beware that only the modules located in @INC are reloaded on change, and you can change @INC only before the server is started. Whatever you do in scripts and modules that are require()d after server startup will have no effect on @INC. When you write use lib qw(foo/bar);, @INC is changed only while the code is being parsed and compiled; afterwards @INC is reset to its original value. To make sure that you have set a correct @INC, fetch http://www.nowhere.com/perl-status?inc and watch the bottom of the page. (I assume you have configured the <Location /perl-status> section in httpd.conf as the mod_perl docs show.)
3.4.3 Reloading only specific files

Assume that you start your script by loading Foo::Bar and importing some tags:
use lib "/some/private/path";
use Foo::Bar qw(:tags_group tag1 tag2);
Now, to test for modification and reload the module at runtime, you need code along the following lines (continued in the sketch below):

# child's private global variable to keep the timestamps
use vars qw(%MODIFIED);

my $module = "Foo::Bar";
(my $path = $module) =~ s|::|/|g;   # Foo::Bar -> Foo/Bar.pm
$path .= ".pm";
require $path;
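The rest of the listing did not survive the conversion of this document; a minimal sketch of the reload logic, assuming the timestamps are keyed on the %INC-style path (the import list just repeats the one used above):

# once per request:
my $file  = $INC{$path};
my $mtime = (stat $file)[9];
$MODIFIED{$path} ||= $mtime;        # remember the timestamp on first load

if ($MODIFIED{$path} < $mtime) {
  delete $INC{$path};               # force require() to read the file again
  require $path;
  Foo::Bar->import(qw(:tags_group tag1 tag2));
  $MODIFIED{$path} = $mtime;
}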
You may want to add debug print statements to debug this code in your application.
3.5 Filehandlers and locks leakages

Make sure you close() every file you open somewhere before the end of the script: if you forget to close(), you might get file descriptor leakage and unlocking problems (if you flock()ed the file descriptor). Even if you do close the file, if for some reason the interpreter was stopped before the cleanup call (for example, because the user aborted the script), the leakage is still there. In the long run your machine might run out of file descriptors, and even worse, a file might be left locked and unusable by other invocations of the same and other scripts.
What can you do? Use IO::File (and the other IO::* modules), which allows you to assign the filehandle to a variable which can be my() (lexically) scoped. When this variable goes out of scope, the file or other file system entity will be properly closed and unlocked (if it was locked). A lexically scoped variable will always go out of scope at the end of the script's run, even if the script was aborted in the middle, or earlier if it was defined inside some internal block. For example:
{
  my $fh = new IO::File("filename") or die $!;
  # read from $fh
} # ...$fh is closed automatically at end of block, without leaks.
As I have just mentioned, you don't have to create a special block for this purpose: the file your code is written in acts as a virtual block as well, so you can simply write:

my $fh = new IO::File("filename") or die $!;
# read from $fh
# ...$fh is closed automatically at end of the script, without leaks.

What the first technique (using { BLOCK }) ensures is that the file is closed the moment the block is finished.
Use these approaches to ensure you have no leakages, but don't be too lazy to write close() statements; make it a habit.
3.6 The Script is too dirty, But It does the job and I can’t
afford rewriting it.
You can still gain from using mod_perl. One approach is to replace the Apache::Registry handler with Apache::PerlRun and define a new location (the script can reside in the same directory on disk):
# srm.conf
Alias /cgi-perl/ /home/httpd/cgi/

# httpd.conf
<Location /cgi-perl>
  #AllowOverride None
  SetHandler perl-script
  PerlHandler Apache::PerlRun
  Options ExecCGI
  allow from all
  PerlSendHeader On
</Location>
Another "bad", but working, method is to set MaxRequestsPerChild to 1, which forces each child to exit after serving only one request. You'll still get the preloaded modules etc., but the script will be compiled on each request and the child then killed off. This isn't good for high-traffic sites, as the parent server will need to fork a new child each time one is killed, but you can fiddle with StartServers and MinSpareServers so that the parent spawns more servers in advance and a killed child is immediately replaced with a fresh one. Again, that's probably not what you want.
Be aware though, that if you use packages that have internal variables with circular references, they will not be flushed!!! Apache::PerlRun flushes only your script's namespace, which does not include the namespaces of any other required packages. If there's a reference to a my() scoped variable that keeps it from being destroyed after leaving the eval scope (of Apache::PerlRun), that cleanup might not be taken care of until the server is shut down and perl_destruct() is run, which always happens when running command line scripts. Consider this example:
package Foo;
sub new { bless {} }
sub DESTROY {
  warn "Foo->DESTROY\n";
}

eval <<'EOF';
package my_script;
my $self = Foo->new;
#$self->{circle} = $self;
EOF

print $@ if $@;
print "Done with script\n";
Then, uncomment the line where $self makes a circular reference, and you’ll see:
Done with script
Foo->DESTROY
In this case, under mod_perl you wouldn't see Foo->DESTROY until server shutdown, unless your module properly takes care of things.
3.9 Compiled Regular Expressions

When the pattern in a match is interpolated from a variable that never changes, the /o modifier lets perl compile the regular expression only once. This is usually a big win in loops over lists, or when using the grep() or map() operators.
In long-lived mod_perl scripts, however, this can pose a problem if the variable changes from invocation to invocation. The first invocation in a fresh httpd child will compile the regex and perform the search correctly. However, all subsequent uses by that httpd child will continue to match the original pattern, regardless of the current contents of the Perl variables the pattern depends on. Your script will appear broken.
The first solution is to use eval q//, to force the code to be evaluated each time it runs. Here it is essential that the eval block covers the entire processing loop, not just the pattern match itself. Merely writing:

foreach (@list) {
  eval q{ print if /$pat/o; };
}

recompiles the regex for every element in the list instead of just once per loop.
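The correct form presumably wrapped the whole loop, so the pattern is compiled just once per pass through this section (a sketch):

eval q{
  foreach (@list) {
    print if /$pat/o;
  }
};
die $@ if $@;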
You can use this approach if you require more than one pattern match operator in a given section of code.
If the section contains only one operator (be it an m// or s///), you can rely on the property of the null
pattern, that reuses the last pattern seen. This leads to the second solution, which also eliminates the use of
eval.
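The code for this second solution was lost in conversion; the idea, sketched (note the gotcha below about the dummy match):

"dummy text" =~ /$pat/;   # boot the engine: compile and cache $pat
                          # (this match MUST succeed - see below)
foreach (@list) {
  print if //;            # the null pattern reuses the last successful pattern
}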
The only gotcha is that the dummy match that boots the regular expression engine must absolutely, posi-
tively succeed, otherwise the pattern will not be cached, and the // will match everything. If you can’t
count on fixed text to ensure the match succeeds, you have two possibilities.
If you can guarantee that the pattern variable contains no meta-characters (things like *, +, ^, $...), you can
use the dummy match:
"$pat" =~ /\Q$pat\E/; # guaranteed if no meta-characters present
If there is a possibility that the pattern can contain meta-characters, you should search for the pattern or
the unsearchable \377 character as follows:
"\377" =~ /$pat|^[\377]$/; # guaranteed if meta-characters present
Whether it pays off depends on the complexity of the regex you apply this technique to. One common case where a compiled regex is usually more efficient is when you "match any one of a group of patterns" over and over again. Maybe with some helper routine, it's easier to remember. Here is one, slightly modified from Jeffrey Friedl's example in his book "Mastering Regular Expressions". I find it quite useful:
#####################################################
# Build_MatchMany_Function
# -- Input:  list of patterns
# -- Output: A code ref which matches its $_[0]
#            against ANY of the patterns given in the
#            "Input", efficiently.
#
sub Build_MatchMany_Function {
  my @R = @_;
  my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
  my $matchsub = eval "sub { $expr }";
  die "Failed in building regex @R: $@" if $@;
  $matchsub;
}
Example usage:

@some_browsers = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
$Known_Browser = Build_MatchMany_Function(@some_browsers);

while (<ACCESS_LOG>) {
  # ...
  $browser = get_browser_field($_);
  if ( ! &$Known_Browser($browser) ) {
    print STDERR "Unknown Browser: $browser\n";
  }
  # ...
}
3.10 Finding the line number the error/warning has been triggered at

If you compile with the experimental PERL_MARK_WHERE=1, perl shows you almost the exact line number where a warning or error is triggered. Generally the line counter gets shifted, since (as we saw earlier) your script is compiled inside a wrapper. You can always stuff your code with special compiler directives that reset the counter to a value you specify. At the beginning of a line you write (the '#' must be in column 1):

#line 298 myscript.pl

or

#line 890 some_label_to_be_used_in_the_error_message

The label is optional - the filename of the script will be used by default. This directive sets the line number of the following line, not the line the directive is on. You can use a little script to stuff every N lines of your code with these directives, but then you will have to rerun this script every time you add or remove code lines. The script:
#!/usr/bin/perl
# Puts Perl line markers in a Perl program for debugging purposes.
# Also takes out old line markers.
die "No filename to process.\n" unless @ARGV;
my $filename = $ARGV[0];
my $lines = 100;
open IN, $filename or die "Cannot open file: $filename: $!\n";
open OUT, ">$filename.marked"
  or die "Cannot open file: $filename.marked: $!\n";
my $counter = 1;
while (<IN>) {
  next if /^#line /;  # take out old line markers (don't count them)
  print OUT "#line $counter\n" unless $counter % $lines;
  print OUT $_;
  $counter++;
}
close OUT;
close IN;
chmod 0755, "$filename.marked";
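3.11 Forking subprocesses from mod_perl

The first half of this example did not survive the conversion of this document. Judging from the program.pl side below, the parent presumably froze its data with FreezeThaw and launched the program in the background, something like this sketch (the @data_to_pass name is made up):

# in the mod_perl script:
use FreezeThaw ();
my $params = FreezeThaw::freeze(@data_to_pass);
system("./program.pl $params &");   # returns immediately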
and in program.pl:
use POSIX 'setsid';
use FreezeThaw ();

@params = FreezeThaw::thaw(shift @ARGV);
# check that @params is ok

close STDIN;
close STDOUT;
open STDERR, ">/dev/null";
setsid();   # detach from the parent process
At this point, program.pl is running in the "background" while the system() returns and permits apache to get on with life.

This has obvious problems. Not the least of them is that $params must not be bigger than whatever your architecture's command line length limit is (it could also depend on your shell).
However, you might be trying to do the "wrong thing". If what you want is to send information to the browser and then do some post-processing, look into PerlCleanupHandler.
3.12 Debugging your code in Single Server Mode

You want to test that your application correctly handles global variables (if you have any; the fewer of them you have, the better, but sometimes you just can't do without them). It's hard to test with multiple servers serving your cgi, since each child has a different set of values for its global variables. Imagine that you have a random() sub that returns a random number, and you have the following script:
use vars qw($num);
$num ||= random();
print ++$num;
This script initializes the variable $num with a random value, then increments it on each request and prints it out. Running this script in a multiple server environment will result in something like 1, 9, 4, 19 (a different number per reload), since each time your script will be served by a different child. (On some OSes, e.g. AIX, the parent httpd process will assign all of the requests to the same child process if all of the children are idle.) But if you run in httpd -X single server mode, you will get 2, 3, 4, 5... (assuming random() returned 1 on the first call).
But do not get too obsessed with this mode, since working only in single server mode sometimes hides problems that show up when you switch to multi-server mode. Consider an application that allows you to change the configuration at run time.
Let's say the script produces a form to change the background color of the page. This is not good design, but for the sake of demonstrating the potential problem we will assume that our script doesn't write the changed background color to disk, but simply changes it in memory.

So you have typed in a new color, and in response your script prints back the HTML with the new color - you think that's it! It was so simple. And if you keep running in single server mode, you will never notice that you have a problem...
If you run the same code in a multi-server environment, then after you submit the color change you will get the result you expect, but when you call the same URL again (not reload!) chances are that you will get back the old color, since no child except the one that processed the color change request knows about the change. Just remember that children can't share information, other than what they inherited from their parent when they were spawned.
Also note that when the server is running in single-process mode, if the output returns HTML with <IMG> tags, loading these images will take a long time. If you use Netscape while your server is running in single-process mode, HTTP's KeepAlive feature gets in the way: Netscape tries to open multiple connections and keep them open, and because there is only one server process listening, each connection has to time out before the next succeeds. Turn off KeepAlive in httpd.conf to avoid this effect while developing, or press STOP after a few seconds (assuming you use the image size parameters, so that Netscape is able to render the rest of the page).
In addition, you should know that when running with -X you will not see the control messages that the parent server normally writes to the error_log ("server started", "server stopped", etc.). Since httpd -X causes the server to handle all requests itself, without forking any children, there is no controlling parent to write status messages.
3.13 -M and other time() file tests under mod_perl

If you want the -M test to count time relative to the current request, you should reset the $^T variable, just as in any other long-running perl program. Add $^T=time; at the beginning of your scripts.
3.14 Handling the 'User pressed Stop button' case

It's important to note that when the user hits the browser's STOP button, the mod_perl script remains blissfully unaware until it tries to send some data to the browser. At that point Apache realizes that the browser is gone, and all the good cleanup stuff happens.
Starting from apache 1.3.6, apache no longer catches SIGPIPE, and mod_perl handles it much better. Here is an excerpt from the CHANGES file of Apache 1.3.6:
*) SIGPIPE is now ignored by the server core. The request write
routines (ap_rputc, ap_rputs, ap_rvputs, ap_rwrite, ap_rprintf,
ap_rflush) now correctly check for output errors and mark the
connection as aborted. Replaced many direct (unchecked) calls to
ap_b* routines with the analogous ap_r* calls. [Roy Fielding]
What happens if your mod_perl script has some global variables that are used for resource locking? It's possible not to notice the pitfall if the critical code section between lock and unlock is very short and finishes fast, so you never see it happen (you aren't fast enough to stop the code in the middle). But look at the following scenario:
1. lock resource
<critical section starts>
2. sleep 20 (== do some time consuming processing)
<critical section ends>
3. unlock resource
If the user presses STOP and Apache sends SIGPIPE before step 3, the resource will not be unlocked, since under mod_perl the lock variable is cached between requests. A kind of deadlock exists.
Here is a working example. Run the server with -X, and press STOP before the count-up to 10 has finished. Then rerun the script: it will hang in while(1)! The resource is no longer available.
use vars qw(%CACHE);
use CGI;
$|=1;
my $q = new CGI;
print $q->header,$q->start_html;
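# [the middle of this example was lost in conversion; a minimal
#  reconstruction of the idea: wait for the lock, grab it, then
#  count up slowly so there is time to press STOP]
while (1) {
  last unless $CACHE{LOCK};  # hangs here forever once the lock has leaked
}
$CACHE{LOCK} = 1;            # lock the resource
my $c = 0;
for (1..10) {
  sleep 1;                   # imitate time consuming processing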
  print $c++, "\n<BR>";
}
$CACHE{LOCK} = 0;            # unlock - never reached if STOP was pressed
You may ask: what is the solution to this problem? As we saw earlier, all END blocks encountered during compilation of Apache::Registry scripts are called after the script has finished running, including on subsequent invocations when the script is cached in memory. So if you are running under Apache::Registry, the following is your remedy:
END {
  $CACHE{LOCK} = 0;
}
Notice that the END block will be run after Apache::Registry::handler has finished (though not during the cleanup phase).
If you prefer the perl API, use the register_cleanup() method of the Apache class:
$r->register_cleanup(sub {$CACHE{LOCK} = 0;});
With the Apache API, the Apache->request->connection->aborted() construct can be used to test for an aborted connection.
I hope you noticed that this example is somewhat misleading: there is a separate instance of %CACHE in every child, so if you modify it, the change is known only inside the same child; none of the %CACHE variables in other children is affected. But if you work with code that lets you control variables that are visible to every child (shared memory or some other approach), the hazard this example shows still applies. Make sure you unlock your resources, both when you finish using them and when the script is aborted in the middle, before the normal unlocking would happen.
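3.15 Handling the server timeout cases and working with $SIG{ALRM}

The example that opened this section was lost in conversion; it follows the standard alarm() idiom, sketched here (the 10-second timeout is made up):

eval {
  local $SIG{ALRM} = sub { die "timeout\n" };
  alarm 10;   # allow the code below 10 seconds to complete
  # ... potentially long-running code ...
  alarm 0;    # cancel the alarm on success
};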
die $@ if $@;
But, as was recently discovered, local $SIG{'ALRM'} does not restore the original underlying C handler. This was fixed in mod_perl 1.19_01 (the CVS version). As a matter of fact, none of the local $SIG{FOO} handlers restore the original C handler.
;o)
4 Performance. Benchmarks.
4.1 Performance: The Overall picture

On the other side, you might tune your scripts and webserver to process incoming requests ultra fast, so you will need only a small number of working servers; but then you might find that the server processes are busy waiting for slow clients to complete the download. You will see more examples in this chapter. My point is that a web service is like a car: if one of the parts or mechanisms is broken, the car will not drive smoothly, and it can even stop dead if pushed further without first fixing it.
It is very important to understand that your goal is not to set MaxRequestsPerChild to 10000. Having a child serve 300 requests on precompiled code is already a huge speedup, so whether it is 100 or 10000 does not really matter, if the lower value saves you RAM through sharing. Do not forget that if you preload most of your code at server startup, the fork to spawn a new child will be very, very fast, because the child inherits most of the preloaded code and the perl interpreter from the parent process. But then, during the child's life, its memory pages (which at first aren't really its own; it uses the parent's pages) get dirty (as originally inherited and shared variables are updated or modified) and copy-on-write happens, which reduces the number of shared memory pages, thus increasing the memory demand. Killing the child and respawning a new one restores the pristine shared memory of the parent process.

The conclusion is that MaxRequestsPerChild should not be too big, otherwise you lose the benefits of memory sharing.
4.3 Preload Perl modules at server startup

An even better approach is to create a separate startup file (where you code in plain perl) and put things like this there:

use DBI;
use Carp;

1;   # a require()'d file must return a true value

Then you require() this startup file with the help of the PerlRequire directive in httpd.conf, placing it before the rest of the mod_perl configuration directives:

PerlRequire /path/to/start-up.pl
CGI.pm is a special case. Ordinarily CGI.pm autoloads most of its functions on an as-needed basis. This
speeds up the loading time by deferring the compilation phase. However, if you are using mod_perl,
FastCGI or another system that uses a persistent Perl interpreter, you will want to precompile the methods
at initialization time. To accomplish this, call the package function compile() like this:
use CGI ();
CGI->compile(':all');
The arguments to compile() are a list of method names or sets, identical to those accepted by the use() and import() operators. Note that in most cases you will want to replace ':all' with the tag names you actually use in your code, since generally only a subset of the subs is actually used.
Here is an example of the use of this technique. This code is included in a PerlRequire'd file, and walks the directory tree under which all registry scripts are installed. For each .pl file encountered, it calls the Apache::RegistryLoader::handler() method to preload the script in the parent server (before the child processes are forked):
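The top of this example was lost at a page break; based on the Apache::RegistryLoader manpage, it presumably began like this sketch (the perl/ directory name is illustrative):

use File::Find qw(finddepth);
use Apache::RegistryLoader ();

{
    my $perl_dir = "perl/";
    my $rl = Apache::RegistryLoader->new;
    finddepth(sub {
        return unless /\.pl$/;
        my $url = "/$File::Find::dir/$_";
        print "pre-loading $url\n";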
        my $status = $rl->handler($url);
        unless ($status == 200) {
            warn "pre-load of `$url' failed, status=$status\n";
        }
    }, $perl_dir);
}
Note that we didn't use the second argument to handler() here, as the module's manpage suggests. To make the loader smarter about the uri->filename translation, you might need to provide a trans() function to translate the uri to a filename. URI to filename translation normally doesn't happen until HTTP request time, so the module is forced to roll its own translation. If the filename is omitted and a trans() routine was not defined, the loader will try using the URI relative to ServerRoot.
The Apache::Leak module (derived from Devel::Leak) can help you find leaking code. For example:

use Apache::Leak;

my $global = "FooAAA";
leak_test {
  $$global = 1;
  ++$global;
};
The argument to leak_test() is an anonymous sub, so you can just wrap it around any code you suspect might be leaking. Beware: it will run the code twice. The first time in, new SVs are created, but that does not mean you are leaking; the second pass will give better evidence. You do not need to be inside mod_perl to use it. Run from the command line, the above script outputs:
Build a debuggable perl to see dumps of the SVs. The simple way to have both a normal and a debuggable
perl is to follow the hints in the SUPPORT doc for building libperld.a; when that is built, copy the
perl binary from that directory to your perl bin directory, but name it dperl.
4.8 Persistent DB Connections

Since $dbh is a global variable for the child, once the child has opened the connection it will use it over
and over again, unless you call disconnect().
Be careful to use different variable names for the handles if you open connections to different databases!
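The idiom looks like this (a minimal sketch -- the DSN, login and password are placeholders):

use strict;
use DBI ();
use vars qw($dbh);

# reuse the child's cached handle if one is already open,
# otherwise connect (this happens once per child)
$dbh ||= DBI->connect("DBI:mysql:test", "login", "passwd",
                      { RaiseError => 1, AutoCommit => 1 });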
Apache::DBI allows you to make a persistent database connection. With this module enabled, every
connect() request to the plain DBI module will be forwarded to the Apache::DBI module. This
looks to see whether a database handle from a previous connect() request has already been opened,
and if this handle is still valid using the ping method. If these two conditions are fulfilled it just returns the
database handle. If there is no appropriate database handle or if the ping method fails, a new connection is
established and the handle is stored for later re-use. There is no need to delete the disconnect()
statements from your code. They will not do a thing, as the Apache::DBI module overloads the
disconnect() method with a NOP. On the child's exit there is no explicit disconnect: the child dies and so
does the database connection. You may leave the use DBI; statement inside the scripts as well.
If you use DBI for DB connections and Apache::DBI to make them persistent, Apache::DBI also allows
you to preopen a connection to the DB for each child with the connect_on_init() method, thus saving the
connection overhead on the very first request of every child.
use Apache::DBI ();
Apache::DBI->connect_on_init("DBI:mysql:test",
"login",
"passwd",
{
RaiseError => 1,
PrintError => 0,
AutoCommit => 1,
}
);
This can be used as a simple way to have the apache children establish connections on server startup. This call
should be in a startup file require()d by PerlRequire or inside a <Perl> section. It will establish a
connection when a child is started, in that child process. See the Apache::DBI manpage for the
requirements of this method.
Another problem is with timeouts: some databases disconnect the client after a certain time of inactivity.
This problem is known as the morning bug. The ping() method ensures that this will not happen. Some
DBD drivers don't have this method; check the Apache::DBI manpage to see how to write a ping()
method for them.
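The recipe amounts to adding a ping() method to the driver's database handle package yourself; a sketch
along the lines of the Apache::DBI manpage (replace xxx with your driver's name and adapt the select
statement to your database):

{
    package DBD::xxx::db;   # hypothetical driver name
    sub ping {
        my ($dbh) = @_;
        my $ret = 0;
        eval {
            local $SIG{__DIE__}  = sub { return (0); };
            local $SIG{__WARN__} = sub { return (0); };
            $ret = $dbh->do('select 1');   # adapt to your database
        };
        return ($@) ? 0 : $ret;
    }
}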
Another approach is to change the client's connection timeout. For mysql users, starting from
mysql-3.22.x you can set the wait_timeout option at mysqld server startup to change the default value.
Setting it to 36 hours would probably fix the timeout problem.
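For example (the exact flag syntax may differ between mysql releases; 36 hours is 129600 seconds):

% mysqld --set-variable wait_timeout=129600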
4.9 Benchmarks. Impressing your Boss and Colleagues.
If you are going to write your own benchmarking utility, use the Benchmark module for heavy scripts and
the Time::HiRes module for very fast scripts (faster than 1 sec) where you need better time precision.
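Typical Benchmark usage looks like this (a minimal sketch):

use Benchmark;

my $t0 = new Benchmark;
# ... the code being timed ...
my $t1 = new Benchmark;
print "the code took: ", timestr(timediff($t1, $t0)), "\n";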
There is no need to write a special benchmark though. If you want to impress your boss or colleagues, just
take some heavy CGI script you have (e.g. a script that crunches some data and prints the results to
STDOUT), open 2 xterms and call the same script in mod_perl mode in one xterm and in mod_cgi mode
in the other. You can use lwp-get from the LWP package to emulate the web agent (browser). (The benchmark
directory of the mod_perl distribution includes such an example.)
All the above parameters should be specified on the basis of the resources you have. With a plain
apache server it is no big deal if you run too many servers (within reason), since the
processes are ~1Mb each and don't eat a lot of your RAM. Generally the numbers are even smaller if
memory sharing is taking place. The situation is different with mod_perl. I have seen mod_perl processes
of 20Mb and more. Now if you have MaxClients set to 50: 50x20Mb = 1Gb -- do you have 1Gb of
RAM? Probably not. So how do you tune these parameters? Generally by trying different combinations
and benchmarking the server. Again, mod_perl processes can be much smaller if sharing is in place.
Before you start this task you should be armed with a proper weapon. You need a crashme utility, which
will load your server with the mod_perl scripts you possess. It must be able to emulate a
multiuser environment, emulating multiple clients which call the mod_perl scripts on
your server simultaneously. While there are commercial solutions, you can get away with free ones which
do the same job. You can use the ApacheBench ab utility that comes with the apache distribution, or a
tool based on LWP::Parallel::UserAgent.
Another important issue is to make sure to run the testing client (the load generator) on a system that is more
powerful than the system being tested. After all, we are trying to simulate Internet users, where many
users try to reach your service at once; since the number of concurrent users can be quite large, your
testing machine must be very powerful and capable of generating a heavy load. Of course you should not
run the clients and the server on the same machine. If you do, your testing results will be skewed,
since the clients will consume CPU and memory that should be dedicated to the server, and vice versa.
4.10.1 Tuning with ab - ApacheBench

Let's try it. We will simulate 10 users concurrently requesting a very light script at
www.nowhere.com:81/test/test.pl. Each "user" makes 10 requests.
% ./ab -n 100 -c 10 www.nowhere.com:81/test/test.pl
Let's raise the load to 1000 requests (10 users, each making 100 requests):
% ./ab -n 1000 -c 10 www.nowhere.com:81/perl/access/access.cgi
Concurrency Level: 10
Complete requests: 1000
Failed requests: 0
Requests per second: 139.76
As expected nothing changes -- we have the same 10 concurrent users. Now let’s raise the number of
concurrent users to 50:
% ./ab -n 1000 -c 50 www.nowhere.com:81/perl/access/access.cgi
Complete requests: 1000
Failed requests: 0
Requests per second: 133.01
We see that the server is capable of serving 50 concurrent users at an amazing 133 req/sec! Let's find the
upper boundary. Using -n 10000 -c 1000 failed to produce results (Broken Pipe?), while -n 10000
-c 500 yielded 94.82 req/sec. So the server's performance goes down under a very high load.
Now let's kill each child after a single request; we will use the following configuration:
MinSpareServers 8
MaxSpareServers 6
StartServers 10
MaxClients 100
MaxRequestsPerChild 1
The benchmark timed out with the above configuration. Watching the output of ps as I ran it, the parent
process just wasn't capable of respawning the killed children at that rate. When I raised
MaxRequestsPerChild to 10 I got 8.34 req/sec -- very bad, about 16 times slower! (You can't benchmark the
importance of MinSpareServers, MaxSpareServers and StartServers with this kind of
test.)
Now let's return MaxRequestsPerChild to 1500, but lower MaxClients to 10 and run
the same test:
MinSpareServers 8
MaxSpareServers 6
StartServers 10
MaxClients 10
MaxRequestsPerChild 1500
I got 27.12 req/sec, which is better, but still 4-5 times slower than the 133 req/sec we got with a
MaxClients of 50. MinSpareServers, MaxSpareServers and StartServers are only important for user response
times (sometimes the user will have to wait a bit).
Also it is important to understand that we didn't test the response times in the tests above, but the ability
of the server to respond under a heavy load of requests. If the script used for testing had been heavier, the
numbers would be different, but the conclusions very similar.

After each server restart following changes to the server's configuration, I made sure the scripts were
preloaded by fetching a script at least once through every child.
It is important to notice that none of the requests timed out, even when a request was kept in the server's queue
for more than a minute! (That is the way ab works, which is OK for testing purposes but would be unacceptable in the
real world: users will not wait for more than 5-10 secs for a request to complete, and the client (browser)
will time out after a few minutes.)
Now let’s take a look at some real code whose execution time is more than a few millisecs. We will do
real testing and collect the data in tables for easier viewing.
Running a mod_perl script with lots of mysql queries (the script under test is mysqld bound)
(http://www.nowhere.com:81/perl/access/access.cgi?do_sub=query_form), with this configuration:
MinSpareServers 8
MaxSpareServers 16
StartServers 10
MaxClients 50
MaxRequestsPerChild 5000
gives us:
NR NC RPS comment
------------------------------------------------
10 10 3.33 # not statistically reliable
100 10 3.94
1000 10 4.62
1000 50 4.09
Conclusions: Here I wanted to show that when the application is slow -- not due to perl loading, code
compilation and execution, but bound by some external operation like mysqld queries which form the
bottleneck -- it almost does not matter what load we place on the server. The RPS (Requests per second) is
almost the same. (Given that all the requests have been served, you have the ability to queue the clients, but
be aware that anything that goes into the queue means a waiting client, and a client (browser) that might
time out!)
Now we will benchmark the same script without the mysql queries (purely perl-bound code)
(http://www.nowhere.com:81/perl/access/access.cgi); it's the same script, which just returns an HTML form
without making any SQL queries.
MinSpareServers 8
MaxSpareServers 16
StartServers 10
MaxClients 50
MaxRequestsPerChild 5000
NR NC RPS comment
------------------------------------------------
10 10 26.95 # not statistically reliable
100 10 30.88
1000 10 29.31
1000 50 28.01
1000 100 29.74
10000 200 24.92
100000 400 24.95
Conclusions: This time the script we executed was pure perl (not bound by I/O or mysql), so we see that
the server serves the requests much faster. You can see that the Requests Per Second (RPS) figure is almost the
same for any load, but goes lower when the number of concurrent clients goes beyond MaxClients.
With 25 RPS, a load of 400 concurrent clients will be served in 16 secs. To be
more realistic and assume a maximum concurrency of 100, with 30 RPS the clients will be served in about 3.5 secs,
which is pretty good for a highly loaded server.
Now we will use the server to its full capacity, by keeping all MaxClients children alive all the time and
setting a big MaxRequestsPerChild, so that no server is killed during the benchmarking.
MinSpareServers 50
MaxSpareServers 50
StartServers 50
MaxClients 50
MaxRequestsPerChild 5000
NR NC RPS comment
------------------------------------------------
100 10 32.05
1000 10 33.14
1000 50 33.17
1000 100 31.72
10000 200 31.60
Conclusion: In this scenario there is no overhead involving the parent server loading new children, all the
servers are available, and the only bottleneck is contention for the CPU.
Now we will change MaxClients and watch the results. Let's reduce it to 10:
MinSpareServers 8
MaxSpareServers 10
StartServers 10
MaxClients 10
MaxRequestsPerChild 5000
NR NC RPS comment
------------------------------------------------
10 10 23.87 # not statistically reliable
100 10 32.64
1000 10 32.82
1000 50 30.43
1000 100 25.68
1000 500 26.95
2000 500 32.53
Conclusions: Very little difference! 10 servers were able to serve with almost the same
throughput as 50 servers. Why? My guess is CPU time sharing. It seems that the 10 servers were
serving requests about 5 times faster than in the test above, where we worked with 50 servers and
each child received its CPU time slice 5 times less frequently. So a big value for MaxClients
doesn't mean that the performance will be better. You have just seen the numbers!
Now let's run the same script under mod_cgi:
NR NC RPS comment
------------------------------------------------
100 10 1.12
1000 50 1.14
1000 100 1.13
Conclusion: mod_cgi is much slower :) In the NReq/NClients 100/10 test, the RPS under mod_cgi was 1.12
versus 32 under mod_perl -- about 30 times faster! In the first test clients waited up to about 100 secs to be
served; in the second and third tests, up to 1000 secs!
4.10.2 Tuning with crashme script

The tool provides the same results as ab above, but it also allows you to set a timeout value, so requests
will fail if not served within the timeout period. You also get Latency (secs/Request) and Throughput
(Requests/sec) numbers. It can give you a better picture and simulate your favorite
Netscape browser more completely :).
While running these two benchmarking suites I noticed that ab gave me results 2.5-3.0 times better. Both
suites ran on the same machine, with the same load and the same parameters, but the implementations are
different.
Sample output:
URL(s): http://www.nowhere.com:81/perl/access/access.cgi
Total Requests: 100
Parallel Agents: 10
Succeeded: 100 (100.00%)
Errors: NONE
Total Time: 9.39 secs
Throughput: 10.65 Requests/sec
Latency: 0.85 secs/Request
use LWP::Parallel::UserAgent;
use Time::HiRes qw(gettimeofday tv_interval);
use strict;
###
# Configuration
###
my $nof_parallel_connections = 10;
my $nof_requests_total = 100;
my $timeout = 10;
my @urls = (
’http://www.nowhere.com:81/perl/faq_manager/faq_manager.pl’,
’http://www.nowhere.com:81/perl/access/access.cgi’,
);
##################################################
# Derived Class for latency timing
##################################################
package MyParallelAgent;
@MyParallelAgent::ISA = qw(LWP::Parallel::UserAgent);
use strict;
###
# Called when a connection is opened
###
sub on_connect {
my ($self, $request, $response, $entry) = @_;
$self->{__start_times}->{$entry} = [Time::HiRes::gettimeofday];
}
###
# Called when a connection is closed
###
sub on_return {
my ($self, $request, $response, $entry) = @_;
my $start = $self->{__start_times}->{$entry};
$self->{__latency_total} += Time::HiRes::tv_interval($start);
}
sub on_failure {
on_return(@_); # Same procedure
}
###
# Access function for new instance var
###
sub get_latency_total {
return shift->{__latency_total};
}
##################################################
package main;
##################################################
###
# Init parallel user agent
###
my $ua = MyParallelAgent->new();
$ua->agent("pounder/1.0");
$ua->max_req($nof_parallel_connections);
$ua->redirect(0); # No redirects
###
# Register all requests
###
foreach (1..$nof_requests_total) {
foreach my $url (@urls) {
my $request = HTTP::Request->new(’GET’, $url);
$ua->register($request);
}
}
###
# Launch processes and check time
###
my $start_time = [gettimeofday];
my $results = $ua->wait($timeout);
my $total_time = tv_interval($start_time);
###
# Requests all done, check results
###
my $succeeded = 0;
my %errors = ();
# (reconstructed step) tally successes and errors from the results
foreach my $entry (keys %$results) {
    my $response = $results->{$entry}->response;
    if ($response->is_success) {
        $succeeded++;    # another satisfied client
    }
    else {
        # an error: save the message
        $response->message("TIMEOUT") unless $response->code;
        $errors{$response->message}++;
    }
}
###
# Format errors if any from %errors
###
my $errors = join(',', map "$_ ($errors{$_})", keys %errors);
$errors = "NONE" unless $errors;
###
# Format results
###
my ($left, $right);
###
# Print out statistics
###
format STDOUT =
@<<<<<<<<<<<<<<< @*
"$left:", $right
.
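# reconstructed reporting section (a sketch only): feed the format
# above with the collected statistics, matching the sample output
my $total = $nof_requests_total * @urls;   # requests actually registered
foreach my $pair (
    ["URL(s)",          join(", ", @urls)],
    ["Total Requests",  $total],
    ["Parallel Agents", $nof_parallel_connections],
    ["Succeeded",       sprintf("%d (%.2f%%)", $succeeded, 100 * $succeeded / $total)],
    ["Errors",          $errors],
    ["Total Time",      sprintf("%.2f secs", $total_time)],
    ["Throughput",      sprintf("%.2f Requests/sec", $total / $total_time)],
    ["Latency",         sprintf("%.2f secs/Request", $ua->get_latency_total() / $total)],
) {
    ($left, $right) = @$pair;
    write;
}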
4.10.3 Choosing MaxClients
So if I have 400Mb left for the webserver to run with, I can set MaxClients to 40 if I know
that each child is limited to 10Mb of memory (e.g. with Apache::SizeLimit).
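Such a limit can be set from a startup file (a sketch assuming the interface described in the
Apache::SizeLimit manpage -- check it before use; sizes are in KB):

use Apache::SizeLimit;
$Apache::SizeLimit::MAX_PROCESS_SIZE = 10000;   # 10Mb

and in httpd.conf:

PerlFixupHandler Apache::SizeLimit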
Certainly you will wonder what happens to your server if there are more than MaxClients concurrent
users at some moment. This situation is accompanied by the following warning message in the
error_log file:

[Sun Jan 24 12:05:32 1999] [error] server reached MaxClients setting,
consider raising the MaxClients setting
There is no problem -- any connection attempts over the MaxClients limit will normally be queued, up
to a number based on the ListenBacklog directive. Once a child process is freed at the end of a different
request, the connection will be served.

It is logged as an error because clients are being put in the queue rather than served at once, despite the
fact that they do not get an error response. The condition can be allowed to persist to balance available system
resources and response time, but sooner or later you will need to get more RAM so you can run more
children. The best approach is to try not to have this condition reached at all, and to start worrying about it
as soon as it is.
It's important to understand how much real memory a child occupies. Your children can share
memory between them (when the OS supports that and you take action to make the sharing happen). If this is
the case, chances are that your MaxClients can be even higher. But it seems that it's not so simple to
calculate the absolute number. (If you come up with a solution please let us know!) If the shared memory
were of the same size throughout the child's life, we could derive a much better formula:
MaxClients=(Total_RAM+Shared_RAM_per_Child*MaxClients)/Max_Process_Size-1
which is:
MaxClients=(Total_RAM-Max_Process_Size)/(Max_Process_Size-Shared_RAM_per_Child)
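For example, with hypothetical numbers -- 400Mb of RAM available to the webserver, a maximal
process size of 10Mb and 4Mb of shared memory per child -- the formula gives:

MaxClients=(400-10)/(10-4)=65

compared to only 400/10 = 40 when sharing is not taken into account.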
4.10.4 Choosing MaxRequestsPerChild

Setting MaxRequestsPerChild to a non-zero limit has two beneficial effects: it bounds the damage of
memory leaks and it helps reduce the number of processes when the server load drops.

The first reason is the most crucial for mod_perl, since sloppy programming will cause a child process to
consume more memory after each request. If left unbounded, then after a certain number of requests the
children will use up all the available memory and leave the server to die from memory starvation. Note
that sometimes standard system libraries leak memory too, especially on OSes with bad memory management
(e.g. Solaris 2.5 on the x86 arch). If this is your case you can set MaxRequestsPerChild to a small
number, which will let the system reclaim the memory a greedy child process has consumed when it
exits after MaxRequestsPerChild requests. But beware -- if you set this number too low, you will
lose the speed bonus you get with mod_perl. Consider using Apache::PerlRun if this is the case.
Also, setting MaxSpareServers to a number close to MaxClients will improve the response time
(but your parent process will be busy respawning new children all the time!).

Another approach is to use the Apache::SizeLimit module. By using it, you should be able to
stop using MaxRequestsPerChild, although for some folks using both in combination does the
job.
4.10.5 Choosing MinSpareServers, MaxSpareServers and StartServers

Let's try some numbers. For a heavily loaded web site and a dedicated machine I would think of the following
(note that 400Mb is just an example):
Available to webserver RAM: 400Mb
Child’s memory size bounded: 10Mb
MaxClients: 400/10 = 40 (larger with mem sharing)
StartServers: 20
MinSpareServers: 20
MaxSpareServers: 35
However, if I want to use the server for many other tasks but still make it capable of handling a high load, I'd
think of:
Available to webserver RAM: 400Mb
Child’s memory size bounded: 10Mb
MaxClients: 400/10 = 40
StartServers: 5
MinSpareServers: 5
MaxSpareServers: 10
(These numbers are taken off the top of my head and shouldn't be used as a rule, but rather as examples
to show you some possible scenarios. Use this information wisely!)
4.10.6 Summary of Benchmarking to tune all 5 parameters
MaxRequestsPerChild
If your scripts are clean and don’t leak memory, set this variable to a number as large as possible
(10000?). If you use Apache::SizeLimit, you can set this parameter to 0 (equal to infinity).
You will want this parameter to be smaller if your code becomes unshared over the process’ life.
StartServers
If you keep a small number of servers active most of the time, keep this number low. Keep it especially
low if MaxSpareServers is low, as the just spawned servers would otherwise be killed before they are
utilized at all (when there is no load). If your service is heavily loaded, make this number close to MaxClients (and
keep MaxSpareServers equal to MaxClients as well).
MinSpareServers
If your server performs other work besides web serving, make this low so the memory of unused
children will be freed when there is no big load. If your server's load varies (you get loads in bursts)
and you want fast response for all clients at any time, you will want to make it high, so that new
children will be respawned in advance and be waiting to handle bursts of requests.
MaxSpareServers
The logic is the same as for MinSpareServers: low if you need the machine for other tasks, high
if it's a dedicated web host and you want a minimal response delay.
MaxClients
Not too low, so you don't get into a situation where clients are waiting for the server to start serving
them (they might wait, but not for too long). Do not set it too high either: under a high load, if all
requests are immediately granted and served, your CPU will have a hard time keeping up, and if
the child's size * number of running children is larger than the total available RAM, your server will
start swapping (which will slow down everything, in turn making things even slower,
until eventually your machine dies). It's important that you take pains to ensure that swapping
does not normally happen. Swap space is an emergency pool, not a resource to be used on a consistent
basis. If you are low on memory and you badly need it, buy it; memory is amazingly cheap
these days.
But based on the tests I conducted above, even if you have plenty of memory like I have (1Gb),
increasing MaxClients sometimes will give you no speedup. The more clients are running, the
more CPU time is required and the fewer CPU time slices each process receives. The response
latency (the time to respond to a request) will grow, so you won't see the expected improvement. The
best approach is to find the minimum requirement for your kind of service and the maximum capability
of your machine. Then start at the minimum and test like I did, successively raising this parameter
until you find the point on the latency and/or throughput curve where the improvement
becomes smaller. Stop there and use that value. Of course when you use these parameters on a production
server, you will have the ability to tune them more precisely, since then you will see the real
numbers. Also don't forget that if you add more scripts, or just modify the running ones, most probably
the parameters will need to be recalculated, since the processes will grow in size as you compile
in more code.
4.11 Using $|=1 under mod_perl and better print() techniques.

If you are using a _bad_ style in generating output, which consists of multiple print() calls, or you just
have too many of them, you will experience a degradation in performance. The severity depends on the
number of calls you make.
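For example, a style like the following (a reconstruction of the kind of code in question, printing
the same HTML as the qq{} example below):

print "<BODY BGCOLOR=\"black\" TEXT=\"white\">";
print "<H1>";
print "Hello";
print "</H1>";
print "<A HREF=\"foo.html\"> foo </A>";
print "</BODY>";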
This style reveals the following drawbacks: multiple print() calls mean performance degradation with $|=1,
and the backslashism makes the code less readable and makes it more difficult to format the HTML so that it is
easily readable as the CGI's output. The code below solves them all:
print qq{
<BODY BGCOLOR="black" TEXT="white">
<H1>
Hello
</H1>
<A HREF="foo.html"> foo </A>
</BODY>
};
I guess you see the difference. Be careful, though, when printing the <HTML> tag. The correct way is:
print qq{<HTML>
<HEAD></HEAD>
<BODY>
}
while the following, which prints a newline before the <HTML> tag, is incorrect:
print qq{
<HTML>
<HEAD></HEAD>
<BODY>
}
Some older browsers might not accept the output as HTML and might print it as plain text instead, since they
expect the first characters after the headers and the empty line to be <HTML>, and not spaces and/or an additional
newline followed by <HTML>. Even if it works with your browser, it might not work for others.
Now let's go back to the $|=1 topic. I still disable buffering, for two reasons: I use few print() calls, by
printing multiline HTML rather than a line per print(), and I want my users to see the output immediately.
So if I am about to produce the results of a DB query that might take some time to complete, I
want users to get some titles ahead of it. This improves the usability of my site. Ask yourself which you
like better: getting the output a bit more slowly, but steadily, from the moment you press the Submit button,
or having to watch the "falling stars" for a while and then receiving the whole output at once, even a few
millisecs faster (assuming the client (browser) did not time out by then).
Conclusion: Do not blindly follow suggestions, but think what is best for you in every given case.
4.12 Profiling
Profiling helps you to determine which subroutines (or just snippets of code) take the longest execution
time and which subroutines are called most often. You will probably want to optimize those
to improve the code's efficiency.
It is possible to profile code running under mod_perl with the Devel::DProf module, available on
CPAN. However, you must have apache version 1.3b3 or higher and the PerlChildExitHandler
enabled (during the httpd build process). When the server is started, Devel::DProf installs an END
block to write the tmon.out file. This block will be called at the server shutdown. Here is how to start
and stop a server with the profiler enabled:
% setenv PERL5OPT -d:DProf
% httpd -X -d `pwd` &
... make some requests to the server here ...
% kill `cat logs/httpd.pid`
% unsetenv PERL5OPT
% dprofpp
The Devel::DProf package is a Perl code profiler. It collects information on the execution time of
a Perl script and of the subs in that script (remember that print() and map() are just like any other
subroutines you write, except that they come bundled with Perl!).
Another approach is to use Apache::DProf, which hooks Devel::DProf into mod_perl. The
Apache::DProf module will run a Devel::DProf profiler inside each child server and write a
tmon.out file in the directory $ServerRoot/logs/dprof/$$ when the child is shut down (where
$$ is the child's process number). All it takes is adding this to httpd.conf:
PerlModule Apache::DProf
Remember that any PerlHandler that was pulled in before Apache::DProf in httpd.conf or
startup.pl would not have its code debugging info inserted. To run dprofpp, chdir to
$ServerRoot/logs/dprof/$$ and run:
% dprofpp
4.13 Sending plain HTML as a compressed output

Apache::GzipChain comes to help you with this task. If a client (browser) understands gzip encoding,
this module compresses the output and sends it downstream. The client decompresses the data upon
receipt and renders the HTML as if it were a plain HTML fetch.

Remember that it will work only if the browser claims to accept compressed input, via the
Accept-Encoding header. Apache::GzipChain keeps a list of user-agents; it also looks at the
User-Agent header for browsers known to accept compressed output.
For example, if you want to return compressed files which should additionally pass through the Embperl
module, you would write:
<Location /test>
SetHandler perl-script
PerlHandler Apache::OutputChain Apache::GzipChain Apache::EmbperlChain Apache::PassFile
</Location>
Hint: Watch the access_log file to see how many bytes were actually sent, and compare that with what a
regular configuration sends.
;o)
5 Choosing the Right Strategy

5.1 Do it like me?!
To be clear, I will not address all technologies and tools, but limit this discussion to those complementing
mod_perl.
Please let me stress it again: DO NOT blindly copy someone’s setup and hope for a good result. Choose
what is best for your situation -- it might take some effort to find it out.
1. Having one binary and one config file (one big binary for mod_perl).
2. Having two binaries and two config files (one big binary for mod_perl and one small for static
objects like images.)
3. Having one DSO-style binary (a mod_perl loadable object) and two config files. (Dynamic linking lets
you compile once and have a big and a small binary in memory, BUT you have to deal with a freshly
made solution that has weak documentation, is still subject to change, and is rather more
complex.)
4. Any of the above plus a reverse proxy server in http accelerator mode.
If you are a newbie, I would recommend that you start with the first option and work on getting your feet
wet with apache and mod_perl. Later, you can decide whether to move to the second one, which allows
better tuning at the expense of more complicated administration, or to the third option -- the more
state-of-the-art-yet-suspiciously-new DSO system -- or to the fourth option, which gives you even more
power.
1. The first option will kill your production site if you serve a lot of static data with ~2-12 MB
webserver processes. On the other hand, while testing you will have no other server interaction to
mask or add to your errors.
2. The second option allows you to seriously tune the two servers for maximum performance. On the
other hand, you have to deal with proxying or fancy site design to keep the two servers in
synchronization. In this configuration, you also need to choose between running the two servers on multiple
ports, multiple IPs, etc... This adds the burden of administrating more than one server.
3. The third option (DSO) -- as mentioned above -- means playing with the bleeding edge. In addition
mod_so (the DSO module) adds size and complexity to your binaries. With DSO, modules can be
added and removed without recompiling the server, and modules are even shared among multiple
servers. Again, it is bleeding edge and still somewhat platform specific, but your mileage may vary.
4. The fourth option (proxy in http accelerator mode), once correctly configured and tuned, improves
the performance of any of the above three options by caching and buffering page results.
The rest of this section discusses the pros and the cons of each of these presented configurations.
5.3 Standalone mod_perl Enabled Apache Server

The advantages:
Simplicity. You just follow the installation instructions, configure it, restart the server and you are
done.
No network changes. You do not have to worry about using additional ports as we will see later.
Speed. You get a very fast server, and you see an enormous speedup from the first moment you start to
use it.
The disadvantages:
The process size of a mod_perl-enabled Apache server is huge (starting from 4Mb at startup and
growing to 10Mb and more, depending on how you use it) compared to a typical plain Apache process. Of
course, if memory sharing is in place, the RAM requirements will be smaller.
You will probably have a few tens of child processes. The additional memory requirements add up in
direct relation to the number of child processes. Your memory demands grow by an order
of magnitude, but this is the price you pay for the additional performance boost of mod_perl. With
memory prices so cheap nowadays, the additional cost is low, especially when you consider the
dramatic performance boost mod_perl gives to your services with every 100Mb of RAM you add.
While you will be happy to have these monster processes serving your scripts with monster speed,
you should be very worried about having them serve static objects such as images and html files.
Each static request served by a mod_perl-enabled server means another large process running,
competing for system resources such as memory and CPU cycles. The real overhead depends on the
static object request rate. Remember that if your mod_perl code produces HTML which
includes images, each one will turn into another static object request. Having another plain webserver
to serve the static objects solves this unpleasant problem. Having a proxy server as a front end,
caching the static objects and freeing the mod_perl processes from this burden, is another solution.
Another drawback of this approach is that when serving output to a client with a slow connection, the
huge mod_perl-enabled server process (with all of its system resources) will be tied up until the
response is completely written to the client. While it might take a few milliseconds for your script to
complete the request, the process may still be busy for a number of seconds or even
minutes if the request comes from a client with a slow connection. As with the previous drawback, a proxy
solution can solve this problem. More on proxies later.
Proxying dynamic content is not going to help much if all the clients are on a fast local net (for
example, if you are administering an Intranet.) On the contrary, it can decrease performance. Still,
remember that some of your Intranet users might work from home through the slow modem links.
If you are new to mod_perl, this is probably the best way to get yourself started.
And of course, if your site is serving only mod_perl scripts (close to zero static objects, like images), this
might be the perfect choice for you!
5.4 One Plain and One mod_perl-enabled Apache Servers

Using these large processes to serve static objects like images and html documents is overkill. A better
approach is to run two servers: a very light, plain apache server to serve static objects and a heavier
mod_perl-enabled apache server to serve requests for dynamic (generated) objects (aka CGI).
From here on, I will refer to these two servers as httpd_docs (vanilla apache) and httpd_perl (mod_perl
enabled apache).
The advantages:
The heavy mod_perl processes serve only dynamic requests, which allows the deployment of fewer
of these large servers.
MaxClients, MaxRequestsPerChild and related parameters can now be optimally tuned for
both httpd_docs and httpd_perl servers, something we could not do before. This allows us to
fine tune the memory usage and get a better server performance.
Now we can run many lightweight httpd_docs servers and just a few heavy httpd_perl
servers.
An important note: When a user browses static pages and the base URL in the Location window points to
the static server, for example http://www.nowhere.com/index.html, all relative URLs (e.g.
<A HREF="/main/download.html">) are served by the light plain apache server. But this is
not the case with dynamically generated pages. For example, when the base URL in the Location window
points to the dynamic server (e.g. http://www.nowhere.com:8080/perl/index.pl), all
relative URLs in the dynamically generated HTML will be served by the heavy mod_perl processes. You
must use fully qualified URLs, not relative ones!
http://www.nowhere.com/icons/arrow.gif is a full URL, while /icons/arrow.gif is a
relative one. Using <BASE HREF="http://www.nowhere.com/"> in the generated HTML is
another way to handle this problem. Also, the httpd_perl server could rewrite the requests back to
httpd_docs (much slower), and you would still tie up the heavy servers' attention. This is not an issue if
you hide the internal port implementations, so the client sees only one server running on port 80.
The disadvantages:
Administration overhead.
A need for two different sets of configuration, log and other files. We need a special directory
layout to manage these. While some directories can be shared between the two servers (like the
include directory, containing the apache include files -- assuming that both are built from the
same source distribution), most of them should be separated and the configuration files updated
to reflect the changes.
If you are processing log files, you will now probably have to merge the two separate log files
into one before processing them.
We still have the problem of a mod_perl process spending its precious time serving slow clients
when the processing portion of the request was completed a long time ago, exactly as in the one server
approach. Deploying a proxy solves this, and will be covered in the next sections.

As with the one server approach, this is not a major disadvantage if you are on a fast local Intranet.
It is likely that you do not want a buffering server in this case.
5.5 One light non-Apache and One mod_perl enabled Apache Servers

Before you go on with this solution, you really want to look at the standalone webserver plus proxy
solution.
The Advantages:
More memory saving. Apache is about 4 times bigger than thttpd. If you spawn 30 children you use
about 30M of memory, while thttpd uses only 260k -- 100 times less! You could use the saved 30M
to run more mod_perl servers.

Note that this is not true if your OS supports memory sharing and you configured apache to use it (that
is the DSO approach; there is no memory sharing if apache modules are statically compiled into
httpd). If you do allow memory sharing, 30 light apache servers ought to use only about 3-4Mb,
because most of it will be shared. If this is the case, the savings over thttpd are much smaller.
Reported to be about 5 times faster than plain apache at serving static objects.
The Disadvantages:
Lacks some of apache’s features, like access control, error redirection, customizable log file formats,
and so on.
5.6 Adding a Proxy Server in http Accelerator Mode

Now I am going to convince you that you want to use a proxy server (in the http accelerator mode). The
advantages are:
Allows serving static objects from the proxy's cache (objects that previously were served entirely
by the httpd_docs server).

You get less I/O activity reading static objects from the disk (the proxy serves the most "popular"
objects from RAM -- of course you benefit more if you allow the proxy server to
consume more RAM). Since you do not wait for the I/O to complete, you are able to serve the
static objects much faster.
The proxy server acts as a sort of output buffer for the dynamic content. The mod_perl server sends
the entire response to the proxy and is then free to deal with other requests. The proxy server is
responsible for sending the response to the browser. So if the transfer is over a slow link, the
mod_perl server is not waiting around for the data to move.
Using numbers is always more convincing :) Let's take a user connected to your site with a 28.8 kbps
(bps == bits/sec) modem. It means that the speed of the user's link is 28.8/8 = 3.6 kbytes/sec. I
assume an average generated HTML page to be 20kb (kb == kilobytes) and an average script that
generates this output in 0.5 secs. How much time will the server wait before the user gets the whole
output response? A simple calculation reveals pretty scary numbers: it will have to wait for another
6 secs (20kb/3.6kb), time in which it could serve another 12 (6/0.5) dynamic requests. This very
simple example shows us that we need only one twelfth the number of children running, which means we
need only one twelfth of the memory (not quite true, because some parts of the code are
shared). But you know that nowadays scripts return pages which are sometimes blown
up with javascript code and the like, which makes them 100kb in size with a download time of...
(this calculation is left to you as an exercise :)
To make your download time numbers even worse, let me remind you that many users like to open
many browser windows and do many things at once (download files and browse heavy sites). So the
speed of 3.6kb/sec we assumed before may often be 5-10 times slower.
Also, we are going to hide the details of the server's implementation. Users will never see ports in the
URLs (more on that topic later). And you can have a few boxes serving the requests, with only one
serving as a front end which spreads the jobs between the servers in the way you configured it to. So
you can actually take one server down for an upgrade, and the end user will never notice, because
the front end server will dispatch the jobs to other servers. (Of course this is a pretty big issue, and it
will not be discussed within the scope of this document.)
For security reasons, using any httpd accelerator (or a proxy in httpd accelerator mode) is essential
because you do not let your internal server be attacked directly by arbitrary packets from whomever.
The httpd accelerator and the internal server communicate using expected HTTP requests. This allows
only your public "bastion" accelerating www server to get hosed in a successful attack, while
leaving your internal data safe.
Of course there are drawbacks. Luckily, these are not functionality drawbacks, but more of an administration
hassle. You add another daemon to worry about, and while proxies are generally stable, you
have to make sure to prepare proper startup and shutdown scripts, which are run appropriately at boot and
reboot. Also, perhaps a watchdog script running from the crontab.
Proxy servers can be configured to be light or heavy; the admin must decide what gives the highest
performance for his application. A proxy server like squid is light in the sense of having only one
process serving all requests. But it can appear pretty heavy when it loads objects into memory for
faster service.
Have I succeeded in convincing you that you want the proxy server?
If you are on a local area network (LAN), then the big benefit of the proxy buffering the output and
feeding a slow client is gone. You are probably better off sticking with a straight mod_perl server in this
case.
As of this writing, two proxy implementations are known to be used in tandem with mod_perl: the squid
proxy server and mod_proxy, which is part of the apache server. Let's compare the two of them.
5.7 The Squid Server

The Advantages:
Caching of static objects. These are served much faster, assuming that your cache size is big
enough to keep the most frequently requested objects in the cache.

Buffering of dynamic content. It takes the burden of returning the content generated by mod_perl
servers to slow clients, thus freeing the mod_perl servers from waiting for the slow clients to download
the data. Freed servers immediately switch to serving other requests; thus the number of required
servers goes down dramatically.

Non-linear URL space / server setup. You can use Squid to play some tricks with the URL space
and/or domain based virtual server support.
The Disadvantages:
Proxying dynamic content is not going to help much if all the clients are on a fast local net. Also, a
message on the squid mailing list implied that squid only buffers in 16k chunks, so it would not allow
a mod_perl process to complete immediately if the output is larger.

Speed. Squid is not very fast today when compared to the plain file based web servers available. Only if
you are using a lot of dynamic features such as mod_perl is speed a reason to use Squid,
and then only if the application and the server are designed with caching in mind.

HTTP protocol level. Squid is pretty much an HTTP/1.0 server, which seriously limits the deployment
of HTTP/1.1 features.

HTTP headers, dates and freshness. The squid server might give out "old" pages, confusing
downstream/client caches. Also, chances are that you will be giving out stale pages.

Stability. Compared to plain web servers, Squid is not the most stable.
The pros and cons presented above lead to the idea that you probably want squid for its dynamic
content buffering features, but only if your server serves mostly dynamic requests. So in this situation it is
better to have a plain apache server serving static objects, and squid proxying only the mod_perl enabled
server. At least when performance is the goal.
5.8 An Apache's mod_proxy
It is possible for a single front-end server to proxy to more than one back end, but there is a problem with fixing
redirects in a way that keeps the client's view of both server names and port numbers correct in all cases. The
difficult case is where you have DNS aliases that map to the same IP address, you want the redirect to use port 80
(when the server is really on a different port), but you want to keep the specific name the browser sent so
it does not change in the client's Location window.
The Advantages:
No additional server is needed. We keep the one plain plus one mod_perl enabled apache server setup. All
you need is to enable mod_proxy in the httpd_docs server and add a few lines to its
httpd.conf file.

The ProxyPass and ProxyPassReverse directives allow you to hide the internal redirects, so if
http://nowhere.com/modperl/ is actually http://localhost:81/modperl/, it will
be absolutely transparent to the user. ProxyPass redirects the request to the mod_perl server, and
when it gets the response, ProxyPassReverse rewrites the URL back to the original one, e.g.:
ProxyPass /modperl/ http://localhost:81/modperl/
ProxyPassReverse /modperl/ http://localhost:81/modperl/
It even does caching. You have to produce correct Content-Length, Last-Modified and
Expires http headers for it to work (see the sketch after this list). If some dynamic content does not
change constantly, you can dramatically increase performance by caching it with ProxyPass.

ProxyPass happens before the authentication phase, so you do not have to worry about authenticating
twice.
Apache is able to accelerate https (secure) requests completely, while also doing http acceleration (with
squid you have to use an external redirection program for that).

The Apache proxy accel mode is reported to be very stable as of apache 1.3.6.
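Emitting cache-friendly headers from a mod_perl handler might look like this (a sketch only; the
values are hypothetical, $r is the Apache request object, and HTTP::Date comes with the LWP
distribution):

use HTTP::Date ();
# assuming $html holds the complete response body and $mtime its
# last modification time in seconds since the epoch
$r->header_out('Last-Modified'  => HTTP::Date::time2str($mtime));
$r->header_out('Expires'        => HTTP::Date::time2str(time + 3600));
$r->header_out('Content-Length' => length $html);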
The Disadvantages:
Users have reported that it might be a bit slow, but the latest version is fast enough. (How fast is enough?
:)
;o)
6 Real World Scenarios Implementation

6.1 Standalone mod_perl Enabled Apache Server
That’s all!
Notes: Replace x.x.x with the real version numbers of mod_perl and apache.

Extract both sources. Usually I open all the sources in /usr/src/; your mileage may vary. Move the
sources to, and chdir into, the directory you want to keep them in. The gnu tar utility knows how to
uncompress too, with the z flag; if you have a non-gnu tar utility, it will be incapable of decompressing,
so you would do it in two steps: first uncompress the packages with gzip -d apache_x.x.x.tar.gz and gzip -d
mod_perl-x.xx.tar.gz, then un-tar them with tar xvf apache_x.x.x.tar and tar
xvf mod_perl-x.xx.tar.
% cd /usr/src
% tar zvxf apache_x.xx.tar.gz
% tar zvxf mod_perl-x.xx.tar.gz
Now build the Makefile. For basic use and a first time installation, the parameters in the example below
are the only ones you need. APACHE_SRC tells where the apache src directory is. If you have
followed my suggestion and extracted both sources under the same directory (/usr/src), do:
% perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
DO_HTTPD=1 USE_APACI=1 PERL_MARK_WHERE=1 EVERYTHING=1
There are many additional parameters. You can find some of them in the configuration-dedicated and
other sections. While running perl Makefile.PL ... the process will check for prerequisites and
tell you if something is missing. If you are missing some of the perl packages or other software, you will
have to install them before you proceed.
Now we make the project (building the mod_perl extension and calling make in the apache source directory
to build httpd), test it (by running various tests) and install the mod_perl modules.
% make && make test && make install
Note that if make fails, neither make test nor make install will be executed. If make test
fails, make install will not be executed.
Now change to the apache source directory and run make install to install apache's headers and default
configuration files, build the apache directory tree and put httpd in it.
% cd ../apache_x.x.x/src
% make install
When you execute the above command, the apache installation process will tell you how to start the freshly
built webserver (the path of apachectl, more about it later) and where the configuration files are.
Remember (or even better, write down) both, since you will need this information very soon. On my
machine the two important paths are:
/usr/local/apache/bin/apachectl
/usr/local/apache/conf/httpd.conf
Now the build and installation processes are complete. Just configure httpd.conf and start the
webserver.
6.1.3 Configuration Process

Shut the server down, open httpd.conf in your favorite editor and scroll to the end of the file,
where we will add the mod_perl configuration directives (of course you can place them anywhere in the
file).
Assume that you put all the scripts that should be executed by the mod_perl enabled server under the
/home/httpd/perl/ directory.
PerlModule Apache::Registry
<Location /perl>
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
PerlSendHeader On
allow from all
</Location>
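Now create a trivial test script at /home/httpd/perl/test.pl; a minimal sketch along these lines
will do (its path and content are assumptions matching the test below):

#!/usr/bin/perl -w
use strict;
print "Content-type: text/html\r\n\r\n";
print "It worked!!!\n";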
Make it executable and readable by the server. If your server is running as user nobody (hint: look for the
User directive in the httpd.conf file), do the following:
% chown nobody /home/httpd/perl/test.pl
% chmod u+rx /home/httpd/perl/test.pl
Test that the script runs from the command line by executing it:
% /home/httpd/perl/test.pl
It worked!!!
Now it is time to test our mod_perl server. Assuming that your config file includes Port 80, go to your
favorite Netscape browser and fetch the following URL (after you have started the server):
http://localhost/perl/test.pl
Make sure that you have a loop-back device configured; if not, use the real server name for this test, for
example:
http://www.nowhere.com/perl/test.pl
If something went wrong, go through the installation process again and make sure you didn't make a
mistake. If that doesn't help, read the INSTALL pod document (perldoc INSTALL) in the mod_perl
distribution directory.

Now copy some of your perl/CGI scripts into the /home/httpd/perl/ directory and see them working
much, much faster from the newly configured base URL (/perl/). Some of your scripts will not work
out of the box and will demand some minor tweaking, or a major rewrite, to make them work properly with
the mod_perl enabled server. Chances are that if you are not practicing sloppy programming techniques,
the scripts will work without any modifications at all.
The above setup is very basic. It will help you get a mod_perl enabled server running and give you the
good feeling of watching your previously slow CGIs now flying.

As with perl, you can start benefiting from mod_perl from the very first moment you try it. When you
become more familiar with mod_perl, you will want to start writing apache handlers and deploy more of
mod_perl's power.
6.2 One Plain and One mod_perl enabled Apache Servers

For this illustration, we will use /usr/local as our root directory. The Apache installation directories
will be stored under this root (/usr/local/bin, /usr/local/etc, etc.)

First let's prepare the sources. We will assume that all the sources go into the /usr/src dir. It is better
to use two separate copies of the apache sources, since you will probably want to tune each apache
version separately and make modifications and recompilations as time goes by. Having two independent
source trees will prove helpful, unless you use DSO, which is covered later in this section.
Replace the /tmp directory with the path to the downloaded files, and x.x.x with the version of the server
you have.

% cd /usr/src/httpd_docs
% ls -l
drwxr-xr-x 8 stas stas 2048 Apr 29 17:38 apache_x.x.x/

% cd /usr/src/httpd_perl
% ls -l
drwxr-xr-x 8 stas stas 2048 Apr 29 17:38 apache_x.x.x/
drwxr-xr-x 8 stas stas 2048 Apr 29 17:38 modperl-x.xx/
Time to decide on the desired directory structure layout (where the apache files go):
ROOT = /usr/local
The two servers can share the following directories (so we will not duplicate data):
/usr/local/bin/
/usr/local/lib
/usr/local/include/
/usr/local/man/
/usr/local/share/
Important: we assume that both servers are built from the same Apache source version.
The two servers keep their specific files in separate trees:

/usr/local/sbin/httpd_docs/
               httpd_perl/

/usr/local/var/httpd_docs/logs/
                          proxy/
                          run/
               httpd_perl/logs/
                          proxy/
                          run/
After the compilation and installation of both servers are complete, you will need to configure them. To
make things clear before we proceed into details: you should configure
/usr/local/etc/httpd_docs/httpd.conf as a plain apache config, with its Port directive set to 80.
6.2.1 Configuration and Compilation of the Sources.
The next step is to configure and compile the sources. Below are the procedures to compile both servers,
taking into account the directory layout I have just suggested.
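For httpd_docs, a configure invocation along these lines should do (a sketch only -- it mirrors the
layout above and the APACI_ARGS used for httpd_perl below; adjust it to your needs):

% cd /usr/src/httpd_docs/apache_x.x.x
% env CC=gcc ./configure --prefix=/usr/local \
    --sbindir=/usr/local/sbin/httpd_docs \
    --sysconfdir=/usr/local/etc/httpd_docs \
    --localstatedir=/usr/local/var/httpd_docs \
    --runtimedir=/usr/local/var/httpd_docs/run \
    --logfiledir=/usr/local/var/httpd_docs/logs \
    --proxycachedir=/usr/local/var/httpd_docs/proxy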
If you need some other modules, like mod_rewrite and mod_include (SSI), add them here as well:
--enable-module=include --enable-module=rewrite
Note: on AIX, gcc compiles an httpd that is 100K+ smaller than the one cc builds. Remove the env CC=gcc
part if you want to use the default compiler. If you do want to use it and you are a (ba)?sh user, you will
not need the env function; t?csh users will have to keep it in.
Note: add --layout to see the resulting directory layout without actually running the configuration
process.
Sources Compilation:
% make
% make install
Now update the apachectl utility to point to the renamed httpd, using your favorite text editor or
perl:
% perl -p -i -e ’s|httpd_docs/httpd|httpd_docs/httpd_docs|’ \
/usr/local/sbin/httpd_docs/apachectl
It is important to make clean, since some of the versions are not binary compatible (e.g. apache 1.3.3 vs
1.3.4), so any "third-party" C modules need to be recompiled against the latest header files.

Here I did not find a way to compile with gcc (my perl was compiled with cc, so we have to compile
with the same compiler!).
% cd /usr/src/httpd_perl/mod_perl-x.xx
% /usr/local/bin/perl Makefile.PL \
APACHE_PREFIX=/usr/local/ \
APACHE_SRC=../apache_x.x.x/src \
DO_HTTPD=1 \
USE_APACI=1 \
PERL_MARK_WHERE=1 \
PERL_STACKED_HANDLERS=1 \
ALL_HOOKS=1 \
APACI_ARGS=--sbindir=/usr/local/sbin/httpd_perl, \
--sysconfdir=/usr/local/etc/httpd_perl, \
--localstatedir=/usr/local/var/httpd_perl, \
--runtimedir=/usr/local/var/httpd_perl/run, \
--logfiledir=/usr/local/var/httpd_perl/logs, \
--proxycachedir=/usr/local/var/httpd_perl/proxy
Notice that all the APACI_ARGS (above) must be passed as one long line if you work with t?csh!!!
However, it works correctly the way it is shown above, with the long lines broken by '\', under (ba)?sh.
With t?csh it does not work, since t?csh passes the APACI_ARGS arguments to ./configure
with the newlines untouched but the original '\' stripped, thus breaking the configuration
process.
As with httpd_docs, you might need other modules like mod_rewrite, so add them here:
--enable-module=rewrite
Note: You may have noticed that we did not run make install in apache's source directory. When
USE_APACI is enabled, APACHE_PREFIX will specify the --prefix option for apache's configure
utility, specifying the installation path for apache. When this option is used, mod_perl's make
install will also do make install on the apache side, installing the httpd binary and the support
tools, along with the configuration, log and document trees.
If make test fails, look into t/logs and see what is in there.
While doing perl Makefile.PL ... mod_perl might complain, warning you about a missing
libgdbm. Users report that it is actually crucial, and that you must have it in order to complete
the mod_perl building process successfully.
6.2.2 Configuration of the servers
You can now start the plain server and make sure it works:

/usr/local/sbin/httpd_docs/apachectl start
The first thing to do is to set the Port directive. It should be different from 80, since we cannot bind 2 servers to the same port number on the same machine. Here we will use 8080. Some developers use port 81, but you can bind to it only if you have root permissions. If you are running on a multiuser machine, there is a chance that someone already uses that port, or will start using it in the future, which, as you understand, might cause a collision. If you are the only user on your machine, you can basically pick any unused port number. Choosing a port number is a controversial topic, since many organizations use firewalls which may block some of the ports, or allow only known ones. From my experience the most widely used port numbers are 80, 81, 8000 and 8080. Personally, I prefer port 8080. Of course with the 2-server scenario you can hide the nonstandard port number from firewalls and users, by using either mod_proxy's ProxyPass or a proxy server like squid.
Now we proceed to the mod_perl specific directives. It is a good idea to add them all at the end of httpd.conf, since you are going to fiddle with them a lot at the beginning.
First, you need to specify the location where all mod_perl scripts will be located.
From now on, all requests starting with /perl will be executed under mod_perl and will be mapped to
the files in /usr/local/myproject/perl/.
<Location /perl>
#AllowOverride None
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
allow from all
PerlSendHeader On
</Location>
This configuration causes all scripts called with a /perl path prefix to be executed under the Apache::Registry module and as CGI (hence the ExecCGI; if you omit this option, the script will be sent to the user's browser as plain text, or will possibly trigger a 'Save-As' window). The Apache::Registry module lets you run almost unaltered CGI/perl scripts under mod_perl.
PerlSendHeader On tells the server to send an HTTP header to the browser on every script invocation. You will want to turn this off for nph (non-parsed-headers) scripts.
6.3 Running 2 webservers and squid in httpd accelerator mode

First, let's understand what we have in hand and what we want from squid. We have an httpd_docs server listening on port 81 and an httpd_perl server listening on port 8080 (we have to move the httpd_docs server to port 81, since port 80 will be taken over by squid). Both reside on the same machine as squid. We want squid to listen on port 80, forward a request for a static object to the port the httpd_docs server listens on, and a request for a dynamic document to httpd_perl's port. Both servers return the data to the proxy server (unless it is already cached in squid), so the user never sees the other ports and never knows that there might be more than one server running. The proxy server makes all the magic behind it transparent to the user. Do not confuse this with mod_rewrite, where a server redirects the request somewhere according to the rules and forgets about it. The functionality described here is known as httpd accelerator mode in proxy dialect.
You should understand that squid can also be used as a straightforward proxy server, generally used at companies and ISPs to cut down incoming traffic by caching the most popular requests. However, we want to run it in the httpd accelerator mode. Two directives, httpd_accel_host and httpd_accel_port, enable this mode. We will see more details in a few seconds. If you are currently using squid in the regular proxy mode, you can extend its functionality by running both modes concurrently. To accomplish this, extend the existing squid configuration with the httpd accelerator mode's directives, or just create a new configuration from scratch.
As stated before, squid now listens on port 80, so we have to move the httpd_docs server to listen, for example, on port 81 (your mileage may vary :). So you have to modify httpd.conf in the httpd_docs configuration directory and restart the httpd_docs server (but not before we get squid running, if you are working on a production server). And as you remember, httpd_perl listens on port 8080.
Let's go through the changes we should make to the default configuration file. Since this file (/etc/squid/squid.conf) is huge (60k+) and we will not use 95% of it, my suggestion is to write a new one, including only the modified directives.
We want to enable the redirect feature, so that we can serve requests with more than one server (in our case the httpd_docs and httpd_perl servers). So we specify httpd_accel_host as virtual. This assumes that your server has multiple interfaces - squid will bind to all of them.
httpd_accel_host virtual
Then we define the default port: if a request is not redirected, httpd_docs will serve the pages. We assume that most requests will be of a static nature. We have our httpd_docs server listening on port 81.
httpd_accel_port 81
We do not use ICP (ICP is used for cache sharing between neighboring machines), which is more relevant in the proxy mode.
icp_port 0
hierarchy_stoplist defines a list of words which, if found in a URL, cause the object to be handled directly by this cache; in other words, use this so that neighbor caches are not queried for certain objects. Note that I have configured the /cgi-bin and /perl aliases for my dynamic documents; if you have named them differently, make sure to use the correct aliases here.
hierarchy_stoplist /cgi-bin /perl
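If you also want to prevent squid from caching those dynamic objects outright, the standard squid 2.x idiom is an acl plus a no_cache rule (a sketch; the acl name QUERY is arbitrary):

hierarchy_stoplist cgi-bin ?
acl QUERY urlpath_regex /cgi-bin /perl
no_cache deny QUERY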
Please note that these directives are somewhat controversial. If you want your scripts to be more compliant with the HTTP standards, their headers should carry the caching directives according to the HTTP specs. You will find a complete tutorial on this topic in Tutorial on HTTP Headers for mod_perl users by Andreas J. Koenig. If you set the headers correctly, there is no need to tell the squid accelerator NOT to try to cache something. The headers I am talking about are Last-Modified and Expires. What are they good for? Squid will not bother your mod_perl server a second time if a request is (a) cachable and (b) still in the cache. Many mod_perl applications will produce identical results on identical requests, at least if not much time passes between the requests. So your squid might have a hit ratio of 50%, which means that the mod_perl servers will have only half as much work to do as before. But this is only possible if you set the headers correctly.
Even if you insert a user ID and date in your page, caching can save resources when you set the expiration time to 1 second. A user might double-click where a single click would do, thus sending two requests in parallel; squid could then serve the second request from its cache.
But if you are lazy, or just have too many things to deal with, you can leave the above directives the way I described. Just keep in mind that one day you will want to reread this section and Andreas' tutorial, and squeeze even more power from your servers without investing money in additional memory and better hardware.
While testing you might want to enable the debugging options and watch the log files in /var/log/squid/. But turn them off on your production server. Below the directive is listed commented out (28 == access control routes):
# debug_options ALL, 1, 28, 9
We need to provide a way for squid to dispatch requests to the correct servers: requests for static objects should be redirected to httpd_docs (unless they are already cached), while dynamic ones should go to the httpd_perl server. The configuration below tells squid to fire off 10 redirect daemons at the specified path of the redirect daemon, and disables rewriting of any Host: headers in redirected requests (as suggested by squid's documentation). The redirection daemon script is listed below.
redirect_program /usr/lib/squid/redirect.pl
redirect_children 10
redirect_rewrites_host_header off
Maximum allowed request size in kilobytes. This one is pretty obvious. If you are using POST to upload
files, then set this to the largest file’s size plus a few extra kbytes.
request_size 1000 KB
Then we have the access permissions, which I will not explain here. You might want to read the documentation, so as to avoid any security flaws.
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl myserver src 127.0.0.1/255.255.255.255
acl SSL_ports port 443 563
acl Safe_ports port 80 81 8080 443 563
acl CONNECT method CONNECT
Since squid should be run as a non-root user, you need these next two directives if you are invoking squid as root:
cache_effective_user squid
cache_effective_group squid
Now we configure the memory size to be used for caching. The squid documentation warns that the actual size of the squid process can grow to be three times larger than the value you set here.
cache_mem 20 MB
Keep pools of allocated (but unused) memory available for future use. Read more about it in the squid
documents.
memory_pools on
Now tighten the runtime permissions of the cache manager CGI script (cachemgr.cgi, which comes bundled with squid) on your production server:
cachemgr_passwd disable shutdown
#cachemgr_passwd none all
Now the redirection daemon script (you should put it at the location specified by the redirect_program parameter in the config file above, and of course make it executable by the webserver):
#!/usr/local/bin/perl
$|=1;
while (<>) {
    # redirect to mod_perl server (httpd_perl)
    print($_), next if s|(:81)?/perl/|:8080/perl/|o;
    # otherwise pass the URL to the plain apache server untouched
    print;
}
In my scenario the proxy and the apache servers run on the same machine, which is why I just substitute the port. In the squid configuration presented here, requests that pass through squid are converted to point to localhost (which is 127.0.0.1). The above redirector can of course be more complex, but you know perl, right?
You must disable buffering; $|=1; does the job. If you do not disable buffering, STDOUT will be flushed only when its buffer becomes full, and its default size is about 4096 characters. So if you have an average URL of 70 characters, the buffer will be flushed only after 59 (4096/70) requests, and only then will those requests finally reach the target server. Your users will just have to wait until the buffer fills up.
If you think that this is a very inefficient way to redirect, I'll try to prove you the opposite. The redirector runs as a daemon: squid fires up N redirect daemons, so there is no perl interpreter loading overhead, exactly as with mod_perl -- perl is loaded all the time and the code is already compiled, so the redirect is very fast (hardly slower than a redirector written in C or the like). Squid keeps an open pipe to each redirect daemon, so there is not even the overhead of expensive system calls.
Almost... When you try the new setup, you will be surprised and upset to discover port 81 showing up in the URLs of static objects (like HTML files). But we did not want the user to see port 81 and use it instead of 80, since then requests would bypass the squid server and all the hard work we went through would be a waste of time. The solution is to make both squid and httpd_docs appear on the same port, which can be accomplished by binding each one to a specific interface. Modify httpd.conf in the httpd_docs configuration directory:

Port 80
BindAddress 127.0.0.1
Listen 127.0.0.1:80
On the squid side, bind squid's incoming address to the external interface of the machine; 123.123.123.3 in my configuration stands for the IP of the main server and should be replaced with yours. Now restart squid and httpd_docs (in either order), and voila - the port number is gone.
You must also have an entry in /etc/hosts (chances are that it is already there):

127.0.0.1 localhost.localdomain localhost
Now, if your scripts were generating HTML with fully qualified self references using port 8080 or another port, you should fix them to generate links pointing to port 80 (which means not specifying the port at all). If you do not, users will bypass squid, as if it were not there at all, by making direct requests to the mod_perl server's port. The only question left is what to do with users who have bookmarked your services with port 8080 in the URL. Do not worry about it. The most important thing is for your scripts to return full URLs; if a user comes in through a link with port 8080 in it, let it be, and just make sure that all subsequent calls to your server are rewritten correctly. Over time users will update their bookmarks. You can send them an email, if you have their addresses, or leave a note on your pages asking them to update their bookmarks. You could have avoided this problem altogether by not publishing the non-80 port in the first place.
Here, for reference, are the directives we have modified, gathered in one place:

icp_port 0
redirect_program /usr/lib/squid/redirect.pl
redirect_children 10
redirect_rewrites_host_header off
request_size 1000 KB
cache_effective_user squid
cache_effective_group squid
cache_mem 20 MB
memory_pools on
Note that all directives should start at the beginning of the line.
6.4 Running 1 webserver and squid in httpd accelerator mode

So I have decided to accept even more administration overhead and to stick with the squid, httpd_docs and httpd_perl scenario, where I can optimize and fine-tune everything. Of course this may not be your case. If you feel that the scenario from the previous section is too complicated for you, make it simpler: run only one server, with mod_perl built in, and let squid do most of the job that the plain light apache used to do. As explained above, you should pick this lighter setup only if you can make squid cache most of your static objects. If it cannot, your mod_perl server will end up doing work we do not want it to do.
If you are still with me, install apache with mod_perl, and squid. Then use a configuration similar to the previous section's, but now httpd_docs is not there anymore. Also, we do not need the redirector anymore, and we specify httpd_accel_host as the name of the server rather than virtual. There is no need to bind two servers to the same port, because we do not redirect, and there are neither Bind nor Listen directives in httpd.conf anymore.
# redirect_program /usr/lib/squid/redirect.pl
# redirect_children 10
# redirect_rewrites_host_header off
request_size 1000 KB
cache_effective_user squid
cache_effective_group squid
cache_mem 20 MB
memory_pools on
6.5 Using mod_proxy

The server on port 80 answers HTTP requests directly, and proxies the mod_perl enabled server in the following way:
ProxyPass /modperl/ http://localhost:81/modperl/
ProxyPassReverse /modperl/ http://localhost:81/modperl/
ProxyPassReverse (PPR) is the saving grace here, the feature that makes apache a win over squid: it rewrites the redirect on its way back to the original URI.
ProxyReceiveBufferSize 1048576

The above setting sets the buffer size to 1MB. If it is not set explicitly, the default buffer size is used, which depends on the OS; for Linux I suspect it is somewhere below 32k. So basically, to release the mod_perl server immediately instead of leaving it tied up waiting, ProxyReceiveBufferSize should be set to a value greater than the biggest response produced by any mod_perl script.
The ProxyReceiveBufferSize directive specifies an explicit buffer size for outgoing HTTP and
FTP connections. It has to be greater than 512 or set to 0 to indicate that the system’s default buffer size
should be used.
As the name states, its buffering feature applies only to downstream data (coming from the origin server to the proxy), not upstream data (i.e. it does not buffer the data being uploaded from the client browser to the proxy, and thus cannot free the httpd_perl origin server from being tied up during a large POST, such as a file upload).
Apache does caching as well. It is relevant to mod_perl only if you produce proper headers, so that your scripts' output can be cached. See the apache documentation for more details on the configuration of this capability.
Ask Bjoern Hansen has written a mod_proxy_add_forward module for apache that sets the X-Forwarded-For header when doing a ProxyPass, similar to what squid does (its location is given in the help section). Basically, the module adds an extra HTTP header to proxied requests. You can read that header in the mod_perl-enabled server and recover the IP of the remote client. You won't need to compile anything into the back-end server; if you are using Apache::{Registry,PerlRun}, just put something like the following into start-up.pl:
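What follows is a minimal sketch of such a handler; take it as an illustration built around the description below, not as the module's exact recommended code:

use Apache::Constants qw(OK);

sub My::ProxyRemoteAddr {
    my $r = shift;
    # trust the X-Forwarded-For header only when the request
    # comes from our own proxy at localhost
    return OK unless $r->connection->remote_ip eq "127.0.0.1";
    # take the last address in the header and make it the remote IP
    if (my ($ip) = $r->headers_in->{'X-Forwarded-For'} =~ /([^,\s]+)$/) {
        $r->connection->remote_ip($ip);
    }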
return OK;
}
And in httpd.conf:
PerlPostReadRequestHandler My::ProxyRemoteAddr
Different sites have different needs. If you're using the header to set the IP address apache believes it is dealing with (in the logging and so on), you really don't want anyone but your own system to be able to set the header. That's why the above ``recommended code'' checks where the request is really coming from before changing remote_ip.
From that point on, the remote IP address is correct. You should be able to access REMOTE_ADDR as usual. You could do the same thing with other environment variables (though I think several of them are preserved; you will want to run some tests to see which ones).
6.6 mod_perl server as DSO

If you run ./configure from the apache source, do not forget to add: --enable-shared=perl. You will find a complete explanation in the INSTALL.apaci pod which can be found in the mod_perl distribution. Some people have reported that a DSO-compiled mod_perl would not run on specific OS/perl version combinations; a threads-enabled perl has also been reported to sometimes break mod_perl as DSO. But it still may work for you.
6.7 HTTP Authentication with 2 servers + proxy
The only possible caveat in the config file is that your Auth stuff needs to be inside <Directory ...> ... </Directory> tags, because if you use <Location /...> ... </Location>, the proxying server takes the auth info for its own authentication and does not pass it on.
;o)
7 Installation Notes
7.1 Configuration and Installation
7.1.2 apache
It is a good idea to try to install the apache webserver without mod_perl first. This way, if something goes wrong, you will know that it is not the apache server's problem. You can skip this stage if you already have a working (non-mod_perl) apache server, or if you are just the daring type. In any case you should unpack the apache source distribution, preferably at the same level as the mod_perl distribution:
% ls -l /usr/src
drwxr-xr-x 8 stas bar 2048 Oct 6 09:46 apache_x.x.x/
drwxr-xr-x 19 stas bar 4096 Oct 2 14:33 mod_perl-x.xx/
7.1.3 mod_perl
Now we come to the main point of this document.
Here I will give only a short example of mod_perl installation. A dedicated section discusses this issue in more detail.
As with any perl package, the installation of mod_perl is very easy and standard. perldoc INSTALL
will guide you through the configuration and the installation processes.
Note: replace x.x.x with the version numbers you actually use.
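For reference, a typical build sequence looks like this (a sketch; the options are the ones discussed throughout this chapter):

% cd /usr/src/mod_perl-x.xx
% perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
  DO_HTTPD=1 USE_APACI=1 EVERYTHING=1
% make && make test && make install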
To change the installation target (if you are not root, or if you need to install a second copy for testing purposes), assuming you use /foo/server as the base directory, you have to run this:
% perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
DO_HTTPD=1 PERL_MARK_WHERE=1 EVERYTHING=1 \
APACHE_PREFIX=/foo/server PREFIX=/foo/server
where PREFIX specifies where to install the perl modules, and APACHE_PREFIX does the same for the apache files.
The next step is to configure the mod_perl sections of the apache configuration file. Fire up the server with /foo/server/sbin/apachectl start, and look for error reports in the error_log file in case the server does not start up (no error message will be printed to the console!).
7.2 How can I tell whether mod_perl is running

Assuming that your mod_perl enabled server listens on port 8080, telnet to your server at port 8080, type HEAD / HTTP/1.0 and then press the <ENTER> key TWICE:
% telnet localhost 8080<ENTER>
HEAD / HTTP/1.0<ENTER><ENTER>
HTTP/1.1 200 OK
Date: Tue, 01 Dec 1998 12:27:52 GMT
Server: Apache/1.3.6 (Unix) mod_perl/1.19
Connection: close
Content-Type: text/html
Connection closed.
The line Server: Apache/1.3.6 (Unix) mod_perl/1.19 confirms that you do have mod_perl installed, and that its version is 1.19. Of course in your case it will be the version you have installed.
However, just because you have mod_perl linked in there, it does not mean that you have configured your server to handle Perl scripts with mod_perl. You will have to configure it first.
7.2.4 Testing via a CGI script

Copy and paste the script below (no need for the first perl invocation (shebang) line!). Let's say you named it test.pl and saved it at the root of the CGI scripts directory, and that the CGI root is mapped directly to the /perl location of your server.
print "Content-type: text/html\n\n";
print "Server’s environment<P>\n";
print "<TABLE>";
foreach ( keys %ENV ) {
print "<TR><TD>$_ </TD><TD>$ENV{$_}</TR></TD>";
}
print "</TABLE>";
Now I run the same script in mod_cgi mode (configured with a /cgi-bin Alias; you will need to add the perl invocation line #!/bin/perl to the script above) and fetch http://www.nowhere.com/cgi-bin/test.pl:
SERVER_SOFTWARE Apache/1.3.6 (Unix)
GATEWAY_INTERFACE CGI/1.1
[...snipped]
You will see that two variables, SERVER_SOFTWARE and GATEWAY_INTERFACE, are different from the mod_perl case (where GATEWAY_INTERFACE is CGI-Perl/1.1). This gives you a hint of how to tell which mode your cgi scripts are running in. I start all my mod_perl-aware cgi scripts with:
BEGIN {
# Auto-detect if we are running under mod_perl or CGI.
$USE_MOD_PERL = ((exists $ENV{’GATEWAY_INTERFACE’}
and $ENV{’GATEWAY_INTERFACE’} =~ /CGI-Perl/)
or exists $ENV{’MOD_PERL’} );
# perl5.004 is a must under mod_perl
require 5.004 if $USE_MOD_PERL;
}
You might wonder why in the world you would need to know which mode you are running in. For example, you will want to use Apache::exit() and not CORE::exit() in your modules; so if you think your script might be used in both environments (mod_cgi vs. mod_perl), you will have to override the exit() subroutine and make a runtime decision about which method to use. Note that if you run scripts under the Apache::Registry handler, it takes care of overriding the exit() call for you, so it's not an issue in that case.
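A sketch of such an override, reusing the $USE_MOD_PERL flag computed in the BEGIN block above (an illustration, not canonical code):

use vars qw($USE_MOD_PERL);   # set in the BEGIN block above
use subs qw(exit);
sub exit {
    my $status = shift || 0;
    # Apache::exit() stops the script but keeps the child alive,
    # CORE::exit() would kill the whole process
    $USE_MOD_PERL ? Apache::exit($status) : CORE::exit($status);
}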
7.2.5 Testing via lwp-request

Assuming you have the libwww-perl (LWP) package installed (you will need it installed anyway in order to pass mod_perl's make test):

% lwp-request -e -d http://www.nowhere.com

will show you all the headers (the -d option disables printing the response content). To see just the server line:

% lwp-request -e -d http://www.nowhere.com | egrep '^Server:'
7.3 Is it possible to install and use apache/mod_perl without having root access?
You will not be able to have the server listen on a port lower than 1024 if you are not starting it as root, so choose a port number above 1024 (I use 8080 in most cases). Note that you will then have to use a URL like http://www.nowhere.com:8080, but that is not a problem, since usually users do not access CGI script URLs directly, but rather are directed to them from a link on a web page or by the ACTION of an HTML form, so they need not know at all that the port differs from the default port 80.
If you want your apache server to start automatically on system reboot, you will need to invoke the server
startup script from somewhere within the init scripts on your host. This is often somewhere under
/etc/rc.d, but this path can vary depending upon the flavor of Unix you are using.
One more important thing to keep in mind is system resources. mod_perl is memory hungry: if you run a lot of mod_perl processes on a public, multiuser (not dedicated) machine, most likely the system administrator of the host will ask you to use fewer resources, and may even ask you to shut down your mod_perl server and find another home for it. You have a few solutions:
Ask your ISP if you can put a dedicated machine into their computer room and be root there.
Look for another ISP with lots of resources, or one that supports mod_perl. You can find a list of such ISPs at http://perl.apache.org .
I have run into this problem many times: I build something by passing non-default parameters to the config script, and later, when I need to rebuild the tool, either to upgrade it or to make an identical copy on another machine, I find that I do not remember what parameters I altered.
The best solution to this problem is to prepare a run file with all the parameters that are about to be used, and to run it instead of typing everything by hand; later you will have the script handy for reuse. mod_perl suggests using the makepl_args.mod_perl file which comes with the mod_perl distribution; this is the file where you should specify all the parameters you are going to use.
But if you find yourself with a compiled tool and no traces of the specified parameters, you can usually still recover them, provided the sources were not make clean'd. You will find the apache-specific parameters in apache_x.x.x/config.status, and mod_perl's in mod_perl-x.xx/apaci/mod_perl.config.
DO_HTTPD=1 means: default to 'y' for the two prompts of apache's configure utility, (a) which source tree to configure against, and (b) whether to build the httpd in that tree. PREP_HTTPD=1 means: default to 'n' for the second prompt, i.e. do not build httpd (make) in the apache source tree. In other words, if you use PREP_HTTPD=1, the httpd will not be built; it will be built only if you use DO_HTTPD=1 and do not use PREP_HTTPD=1.
If you did not build the httpd, chdir to the apache source directory and execute:

make

Note that you will have to do the same if you do not pass APACHE_PREFIX=/path_to_installation_prefix during the perl Makefile.PL [options] stage.
For all RedHat fans: before you build apache yourself, remove the packaged one first:
rpm -e apache
;o)
8 Server Configuration
Before you start with mod_perl specific configuration, first configure apache, and see that it works. When
done, return here to continue...
[ Note that prior to version 1.3.4, the default apache install used three configuration files -- httpd.conf,
srm.conf, and access.conf. The 1.3.4 version began distributing the configuration directives in a single
file -- httpd.conf. The remainder of this chapter refers to the location of the configuration directives using
their historical location. ]
# mod_cgi mode
ScriptAlias /cgi-bin/ /usr/local/myproject/cgi/

# Apache::Registry mode
Alias /perl/ /usr/local/myproject/cgi/

# Apache::PerlRun mode
Alias /cgi-perl/ /usr/local/myproject/cgi/
Alias provides a mapping of a URL to a file system object under mod_perl. ScriptAlias is used for mod_cgi.
Alias defines the start of the URL path to the script you are referencing. For example, using the above configuration, fetching http://www.nowhere.com/perl/test.pl will cause the server to look for the file test.pl at /usr/local/myproject/cgi and execute it as an Apache::Registry script, if we define Apache::Registry to be the handler of the /perl location (see below). The URL http://www.nowhere.com/perl/test.pl will be mapped to /usr/local/myproject/cgi/test.pl. This means that you can have all your CGIs located in the same place in the file-system, and call a script in any of the three modes simply by changing the directory name component of the URL (cgi-bin|perl|cgi-perl) - isn't that neat? (That is the configuration you see above: all three Aliases point to the same directory within your file system, but of course they can be different.) If your script does not seem to work while running under mod_perl, you can easily call it in straight mod_cgi mode without making any script changes (in most cases), simply by changing the URL you invoke it by.
The ScriptAlias above implies SetHandler cgi-handler, which invokes mod_cgi. The latter will be overridden if you enable Apache::Registry. In other words, ScriptAlias does not really work for mod_perl; it only appears to work when the additional configuration is in there. If the Apache::Registry configuration came before the ScriptAlias, the scripts would be run under mod_cgi. While handy, ScriptAlias is a known kludge; it is always better to use Alias and SetHandler.
Of course you can choose any other Alias (you will use it later in httpd.conf), and you can choose to use all three modes or only one of them. (It is undesirable to run plain cgi-bin scripts from a mod_perl-enabled server - the price is too high; it is better to run these on a plain apache server.)
First we add:
<Location /perl>
#AllowOverride None
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
allow from all
PerlSendHeader On
</Location>
This configuration causes all scripts called with a /perl path prefix to be executed under the Apache::Registry module and as CGI (hence the ExecCGI; if you omit this option, the script will be sent to the caller's browser as plain text, or will possibly trigger a 'Save-As' window). PerlSendHeader On tells the server to send an HTTP header to the browser on every script invocation; you will want to turn this off for nph (non-parsed-headers) scripts. PerlSendHeader On means calling ap_send_http_header() after parsing your script's headers. It is only meant for CGI emulation; it is always better to use CGI->header from the CGI.pm module, or $r->send_http_header directly.
Remember the Alias from the section above? We must use the same Alias here. If you use a Location that does not have the same Alias defined in srm.conf, the server will fail to locate the script in the file system. (We are talking about script execution here; there are cases where a Location is handled by the server itself, without a corresponding file, like the /perl-status location.)
Make sure the module itself is loaded, e.g. with:

PerlModule Apache::Registry

before you specify the location that uses Apache::Registry as a PerlHandler. With that in place you can basically start running scripts in Apache::Registry mode...
There is nothing to do for the /cgi-bin location (mod_cgi), since it has nothing to do with mod_perl.
8.1.3 PerlFreshRestart
To reload PerlRequire, PerlModule, other use()’d modules and flush the
Apache::Registry cache on server restart, add:
PerlFreshRestart On
8.1.4.1 Configuration
Add this to httpd.conf:
<Location /perl-status>
SetHandler perl-script
PerlHandler Apache::Status
order deny,allow
#deny from all
#allow from
</Location>
If you are going to use Apache::Status, it is important to put it as the first module in the start-up file, or in httpd.conf (after Apache::Registry):
# startup.pl
use Apache::Registry ();
use Apache::Status ();
use Apache::DBI ();
If you don’t put Apache::Status before Apache::DBI then you don’t get Apache::DBI’s menu
entry in status.
8.1.4.2 Usage
Assuming that your mod_perl server listens on port 81, fetch http://www.nowhere.com:81/perl-status
Embedded Perl version 5.00502 for Apache/1.3.2 (Unix) mod_perl/1.16
process 187138, running since Thu Nov 19 09:50:33 1998
From some menus you can dig deeper, to peek at the perl internals of the server, watch the values of global variables in the packages, see the list of cached scripts and modules, and much more. Just click around...
PerlPassEnv passes, and PerlSetEnv sets and passes, environment variables to your scripts. You can access them in your scripts through %ENV (e.g. $ENV{"key"}).
PerlSetVar is very similar to PerlSetEnv, but the value is extracted with a different method. In <Perl> sections:
push @{ $Location{"/"}->{PerlSetVar} }, [ ’FOO’ => BAR ];
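That different method is dir_config(); a minimal sketch of reading the variable back on the request side, using the standard mod_perl request API:

my $r = Apache->request;
my $value = $r->dir_config('FOO');   # "BAR" with the setting above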
You can also run perl -c perl-startup to test the file's syntax.
Note that starting with $CGI::VERSION 2.46, the recommended method to precompile the code in
CGI.pm is:
use CGI qw(-compile :all);
8.1.6.2 What modules should you add to the startup file and why.
Modules that are loaded at server startup are shared among the server children, so only one copy of each module will be loaded, thus saving a lot of RAM for you. Usually I put most of the code I develop into modules and preload them from here. You can even preload your CGI scripts with Apache::RegistryLoader, and preopen the DB connections with Apache::DBI.
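For example, preloading a registry script at startup might look like this (a sketch; the URI/filename pair follows the example layout used in this tutorial):

# startup.pl
use Apache::RegistryLoader ();
Apache::RegistryLoader->new->handler(
    "/perl/test.pl" => "/usr/local/myproject/cgi/test.pl");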
It is usually wise to preload a module with an empty import list, use Foo ();, which means that the caller does not want any symbols to be imported. Why is this important? Some modules have @EXPORT set to a list of symbols to be exported by default, so when you write:

use Foo;

and think nothing is being imported, the import() call is still executed, and some symbols may well be imported. See the docs/source of the module in question to make sure you use() it correctly.
When you write your own modules, always remember that it is better to use @EXPORT_OK instead of @EXPORT, since the former does not export symbols unless it is asked to.
Since the symbols that you import into the startup script's namespace will not be visible to any of the children, scripts that need a module Foo's exported symbols have to pull it in just as if you had not preloaded Foo in the startup file. For example, just because you have use()'d Apache::Constants in the startup script does not mean you can have the following handler:
package MyModule;
sub handler {
    my $r = shift;
    return OK;   # WRONG: OK was never imported into MyModule
}
1;
You do have to define/import variables in your scripts, and they will be visible inside the child process that runs the script. They will not be shared between siblings. Remember that every script runs in a specially (uniquely) named package, so it cannot access variables from other packages unless it inherits from them or use()'s them.
'apachectl configtest' is the same as 'httpd -t', and it actually executes the code in startup.pl, not just parses it. <Perl> configuration has always started Perl during the configuration read, and Perl{Require,Module} do so as well.
If you want your startup code to get control over the -t (configtest) server launch, start the server configuration test with:

httpd -t -Dsyntax_check

and, if you want to prevent the code in the file from being executed, return early when that symbol is defined, e.g. by putting return if Apache->define('syntax_check'); at the top of the startup file.
<Perl> sections can contain *any* and as much Perl code as you wish. These sections are compiled into a special package, whose symbol table mod_perl can then walk, feeding the names and values of Perl variables/structures through the apache core configuration gears. Most of the configuration directives can be represented as scalars ($scalar) or lists (@list). An @List inside these sections is simply converted into a space-delimited string for you. Here is an example:
#httpd.conf
<Perl>
@PerlModule = qw(Mail::Send Devel::Peek);
$ServerAdmin = $User;
</Perl>
If a directive can take two *or* three arguments, you may push strings (and the lowest number of arguments will be shifted off the @List), or use an array reference to handle any number greater than the minimum for that directive:

push @Redirect, "/foo", "http://www.foo.com/";
To pass all environment variables to the children with a single configuration directive, rather than listing
each one via PassEnv or PerlPassEnv, a <Perl> section could read in a file and:
push @PerlPassEnv, [$key => $val];
or
Apache->httpd_conf("PerlPassEnv $key $val");
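Spelled out, such a section might look like the following sketch (the file name and its "KEY value" line format are assumptions made for the example):

<Perl>
    open ENV_LIST, "/usr/local/etc/httpd_perl/pass_env.txt"
        or die "Cannot open pass_env.txt: $!";
    while (<ENV_LIST>) {
        chomp;
        my ($key, $val) = split ' ', $_, 2;
        push @PerlPassEnv, [$key => $val];
    }
    close ENV_LIST;
</Perl>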
These are somewhat simple examples, but they should give you the basic idea. You can mix in any Perl code your heart desires. See eg/httpd.conf.pl and eg/perl_sections.txt in the mod_perl distribution for some examples.
Since the section is compiled as ordinary Perl code, an __END__ token makes mod_perl ignore everything below it, which is handy for temporarily disabling the tail of a section:

<Perl>
    # active configuration code above ...
    __END__
    # ... anything below __END__ is ignored
</Perl>
To enable <Perl> sections you should build mod_perl with perl Makefile.PL PERL_SECTIONS=1.
8.4.2 My script works under cgi-bin, but when called via mod_perl I see a 'Save-As' prompt
Did you put PerlSendHeader On in the configuration part of the <Location foo></Location>?
;o)
or maybe:
% ps -ef | grep httpd_perl
This will produce a list of all httpd_perl processes (the parent and the children). You are looking for the parent process. If you run your server as root, you will easily locate it, since it belongs to root. If you run the server as a regular user, most likely all the processes will belong to that user (unless defined differently in httpd.conf), but it is still easy to tell which is the parent: the one of the smallest size...
You will notice many httpd_perl executables running on your system, but you should not send signals to
any of them except the parent, whose pid is in the PidFile. That is to say you shouldn’t ever need to
send signals to any process except the parent. There are three signals that you can send the parent: TERM,
HUP, and USR1.
Sending the TERM signal to the parent causes it to immediately attempt to kill off all of its children.
This process may take several seconds to complete, following which the parent itself exits. Any
requests in progress are terminated, and no further requests are served.
That is the moment when the accumulated END blocks will be executed! Note that if you use Apache::Registry or Apache::PerlRun, END blocks are also executed at the end of each request.
Sending the HUP signal to the parent causes it to kill off its children like in TERM (Any requests in
progress are terminated) but the parent doesn’t exit. It re-reads its configuration files, and re-opens
any log files. Then it spawns a new set of children and continues serving hits.
The server will reread its configuration files, flush all the compiled and preloaded modules, and rerun
any startup files. It’s equivalent to stopping, then restarting a server.
Note: If your configuration file has errors in it when you issue a restart then your parent will not
restart but exit with an error. See below for a method of avoiding this.
The USR1 signal causes the parent process to advise the children to exit after their current request (or to exit immediately if they are not serving anything). The parent re-reads its configuration files and re-opens its log files. As each child dies off, the parent replaces it with a child from the new generation of the configuration, which begins serving new requests immediately.
The only difference between USR1 and HUP is that USR1 allows the children to complete any in-progress requests prior to killing them off.
By default, if a server is restarted (via kill -USR1 `cat logs/httpd.pid` or with the HUP signal), Perl scripts and modules are not reloaded. To reload PerlRequire's, PerlModule's and other use()'d modules, and to flush the Apache::Registry cache, add this directive to httpd.conf:

PerlFreshRestart On
It is worth mentioning that restart or termination can sometimes take quite a lot of time. Check out the PERL_DESTRUCT_LEVEL=-1 option during the mod_perl ./Configure stage, which speeds this up and leads to more robust operation in the face of problems, like running out of memory. It is only usable if no significant cleanup has to be done by perl END blocks and DESTROY methods when the child terminates, of course. What constitutes significant cleanup? Any change of state outside of the current process that would not be handled by the operating system itself. So committing database transactions is significant, but closing an ordinary file isn't.
Some folks prefer to specify signals using numerical values rather than symbolic names. If you are looking for these, check out your kill(3) man page. Mine points to /usr/include/sys/signal.h; the relevant entries are:
#define SIGHUP 1 /* hangup, generated when terminal disconnects */
#define SIGTERM 15 /* software termination signal */
#define SIGUSR1 30 /* user defined signal 1 */
Start httpd:
% /usr/apps/sbin/httpd_perl/apachectl start
Stop httpd:
% /usr/apps/sbin/httpd_perl/apachectl stop
See the next section for the implication of the above calls.
Replace httpd_perl with httpd_docs in the above calls to control the httpd_docs server.
There are other options for apachectl; use its help option to see them all.
It is important to understand that this script relies on the PID file, which is PIDFILE=/usr/apps/var/httpd_perl/run/httpd.pid. If you delete the file by hand, apachectl will fail to run.
Also, notice that apachectl is suitable for use from within your Unix system's startup files, so that your web server is automatically restarted upon system reboot. Either copy the apachectl file to the appropriate location (/etc/rc.d/rc3.d/S99apache works on my RedHat Linux system) or create a symlink with that name pointing to the canonical location. (If you do this, make certain that the script is writable only by root -- the startup scripts have root privileges during init processing, and you don't want to be opening any security holes.)
Note that you must be using a version of Perl that recognizes and emulates the suid bits in order for this to
work. The script will do different things depending on whether it is named start_http, stop_http
or restart_http. You can use symbolic links for this purpose.
#!/usr/bin/perl
# NOTE: the settings below are assumptions made to complete the
# fragment -- adjust the paths to your own layout
use strict;
use File::Basename;
my $PIDFILE = '/usr/apps/var/httpd_perl/run/httpd.pid';
my $HTTPD   = '/usr/apps/sbin/httpd_perl/httpd_perl';
# decide what to do based on the name we were invoked under
my $name = basename($0);
# read the pid of the parent httpd_perl process
my $pid = 0;
if (open PID, "<$PIDFILE") { chomp($pid = <PID> || 0); close PID; }

if ($name eq 'start_http') {
    system $HTTPD and die "Unable to start HTTP";
    print "HTTP started.\n";
    exit 0;
}
if ($name eq 'stop_http') {
    kill 'TERM', $pid or die "Unable to signal HTTP";
    print "HTTP stopped.\n";
    exit 0;
}
if ($name eq 'restart_http') {
    kill 'HUP', $pid or die "Unable to signal HTTP";
    print "HTTP restarted.\n";
    exit 0;
}
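Creating the three names as symbolic links to one copy of the script might look like this (http_admin is an arbitrary name chosen for illustration):

% ln -s http_admin start_http
% ln -s http_admin stop_http
% ln -s http_admin restart_http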
One simple solution is to use a slightly modified apachectl script, which I called apache.watchdog, and to put it into the crontab to be called every 30 minutes, or even every minute if it is critical to make sure the server stays up all the time.
The script:
#!/bin/sh
# admin’s email
EMAIL=webmaster@somewhere.far
#EMAIL=root@localhost
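The body of the watchdog could then continue along these lines (a sketch; the paths follow the layout used throughout this chapter):

PIDFILE=/usr/apps/var/httpd_perl/run/httpd.pid
APACHECTL=/usr/apps/sbin/httpd_perl/apachectl

# restart the server if there is no live process with the recorded pid
if ! kill -0 `cat $PIDFILE 2>/dev/null` 2>/dev/null ; then
    $APACHECTL restart
    date | mail -s "httpd_perl restarted by watchdog" $EMAIL
fi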
Another approach, probably even more practical, is to use the cool LWP perl package to test the server by trying to fetch some document (script) served by it. Why is it more practical? While the server can be up as a process, it can be stuck and not working. Failing to fetch the document will trigger a restart, and ``probably'' the problem will go away. (Just replace start with restart in the $restart_command below.)
Again, we put this script into a crontab to call it every 30 minutes. Personally I call it every minute, fetching some very light script. Why so often? If your server starts to spin and trash your disk space with multiple error messages, within 5 minutes you might run out of free space, which might bring your system to its knees; and chances are that no other child will be able to serve requests, since the system will be too busy writing to the error_log file. Think big: if you are running a heavy service (which is very fast, since you are running under mod_perl), adding one more request every minute will not be felt by the server at all.
use strict;
use diagnostics;
use URI::URL;
use LWP::MediaTypes qw(media_suffix);
my $VERSION = ’0.01’;
use vars qw($ua $proxy);
require LWP::UserAgent;
use HTTP::Status;
my $message = ($status == 0)
? "Server was down and successfully restarted!"
: "Server is down. Can’t restart.";
my $subject = ($status == 0)
? "Attention! Webserver restarted"
: "Attention! Webserver is down. can’t restart";
# Fetch document
my $res = $ua->request(HTTP::Request->new(GET => $url));
# failed
return 0;
} # end of sub checkurl
$messagebody
__END_OF_MAIL__
close MAIL;
}
When you execute the server with the -X option, it runs in the fg (foreground) of the shell you called it from, so to kill it you just press Ctrl-C.
Note that in -X mode the server will run very slowly while fetching images. If you use Netscape while your server is running in single-process mode, HTTP's KeepAlive feature gets in the way: Netscape tries to open multiple connections and keep them open, and because there is only one server process listening, each connection has to time out before the next one succeeds. Turn off KeepAlive in httpd.conf to avoid this effect while developing, or press STOP after a few seconds (assuming you use the image size parameters, so that Netscape can render the rest of the page).
In addition you should know that when running with -X you will not see any of the control messages that the parent server normally writes to the error_log (``server started'', ``server stopped'', etc.). Since httpd -X causes the server to handle all requests itself, without forking any children, there is no controlling parent to write the status messages.
I have approached it in another way: I use the -Dparameter startup option of the server. I call my version of the server with:

% httpd_perl -Dsbekman
In httpd.conf I wrote:
# Personal development server for sbekman
# sbekman uses the server running on port 8000
<IfDefine sbekman>
Port 8000
PidFile /usr/apps/var/httpd_perl/run/httpd.pid.sbekman
ErrorLog /usr/apps/var/httpd_perl/logs/error_log.sbekman
Timeout 300
KeepAlive On
MinSpareServers 2
MaxSpareServers 2
StartServers 1
MaxClients 3
MaxRequestsPerChild 15
</IfDefine>
# Personal development server for userfoo
# userfoo uses the server running on port 8001
<IfDefine userfoo>
Port 8001
PidFile /usr/apps/var/httpd_perl/run/httpd.pid.userfoo
ErrorLog /usr/apps/var/httpd_perl/logs/error_log.userfoo
Timeout 300
KeepAlive Off
MinSpareServers 1
MaxSpareServers 2
StartServers 1
MaxClients 5
MaxRequestsPerChild 0
</IfDefine>
What we have achieved with this technique: full control over start/stop, the number of children, a separate error log file, and port selection. This saves me from getting called every few minutes: ``Stas, I'm going to restart the server''.
To make things even easier (in the above technique you have to discover the PID of your parent httpd_perl process, written in /usr/apps/var/httpd_perl/run/httpd.pid.userfoo), we change the apachectl script to do the work for us. We make a copy for each developer, called apachectl.username, and we change 2 lines in each script:
PIDFILE=/usr/apps/var/httpd_perl/run/httpd.pid.sbekman
HTTPD=’/usr/apps/sbin/httpd_perl/httpd_perl -Dsbekman’
Of course you might think you could use just one control file and detect who is calling by the uid, but since you have to be root to start the server, it is not so simple.
The last thing was to give developers the option to run in single process mode:
/usr/apps/sbin/httpd_perl/httpd_perl -Dsbekman -X
In addition to making life easier, we decided to use relative links everywhere in the static docs (including the calls to CGIs). You may ask how, using relative links, you will get to the right server? Very simple: we utilized mod_rewrite to solve our problems. In access.conf of the httpd_docs server we have the following code (you have to configure your httpd_docs server with --enable-module=rewrite):
# sbekman's server
# port = 8000
RewriteCond %{REQUEST_URI} ^/(perl|cgi-perl)
RewriteCond %{REMOTE_ADDR} 123.34.45.56
RewriteRule ^(.*) http://ourserver.com:8000/$1 [R,L]
# userfoo’s server
# port = 8001
RewriteCond %{REQUEST_URI} ^/(perl|cgi-perl)
RewriteCond %{REMOTE_ADDR} 123.34.45.57
RewriteRule ^(.*) http://ourserver.com:8001/$1 [R,L]
where the IP numbers are the IPs of the developers' client machines (the machines where they run their web browsers). (I tried to use REMOTE_USER, since we have all the users authenticated, but it did not work for me.)
Of course you have another problem: the CGI generates some HTML which will be requested again. If it generates a URL with a hard-coded PORT, the above scheme will not work. There are 2 solutions:
First, generate relative URLs, so they reuse the technique above with the redirect (which is transparent for the user); but this will not work if you have something to POST (a redirect loses all the data!).
Second, use a general configuration module which generates the correct full URL according to REMOTE_USER. So if $ENV{REMOTE_USER} eq 'sbekman', I return http://www.nowhere.com:8000/perl/ as the cgi_base_url. Again, this works only if the user is authenticated.
All this is good for development. In production it is better to use full URLs, since if you have a static form whose Action is relative, and the static document is located on another server, pressing the form's submit will cause a redirect to the mod_perl server, and all the form's data will be lost during the redirect.
The best debugging approach is to write a wrapper that emulates the exact environment of the server, by first deleting environment variables like PERL5LIB, and then calling the same perl binary that is being used by the server. Next, the wrapper should set the environment identical to the server's, by copying the perl run directives from the server startup and configuration files. It will also allow you to remove the first line of the script completely, since mod_perl skips it and the wrapper knows how to call the script.
Below is an example of such a script. Note that we force -Tw when we call the real script. (I have also added the ability to pass parameters, which will not happen when you call the CGI from the web.)
#!/usr/apps/bin/perl -w
# It simulates the web server environment by setting the @INC and other
# stuff, so what will run under this wrapper will run under web and
# vice versa.
# Usage: wrap.pl some_cgi.pl [params]
BEGIN {
    use vars qw($basedir);
    $basedir = "/usr/apps";
    # completion sketch: these library paths are installation-specific
    @INC = ("$basedir/lib/perl5", "$basedir/lib/perl5/site_perl");
}
use strict;
use File::Basename;

my $cgi = shift or die "Usage:\n\t$0 some_cgi.pl [params]\n";
chdir dirname($cgi) or die "Can't chdir: $!" if index($cgi, '/') >= 0;
# force taint and warning modes on, just as the server does
system "$basedir/bin/perl", '-I.', '-Tw', basename($cgi), @ARGV;
#!/usr/local/bin/perl -Tw
use strict;
$ENV{PATH}='/bin:/usr/bin';
### configuration
my @logfiles = qw(access_log error_log);
umask 0;
my $server = "httpd_perl";
my $logs_dir = "/usr/apps/var/$server/logs";
my $restart_command = "/usr/apps/sbin/$server/apachectl restart";
my $gzip_exec = "/usr/intel/bin/gzip";
my ($sec,$min,$hour,$mday,$mon,$year) = localtime(time);
my $time = sprintf "%0.2d.%0.2d.%0.2d-%0.2d.%0.2d.%0.2d", $year,++$mon,$mday,$hour,$min,$sec;
$^I = ".".$time;
# rename the logs via perl's in-place edit, restart the server, then
# compress the rotated files (this tail completes the fragment above)
chdir $logs_dir;
@ARGV = @logfiles;
while (<>) {}
system $restart_command;
sleep 5;    # let the children complete their logging
system "$gzip_exec $_.$time" for @logfiles;
Cron fires off setuid script called log-roller that looks like this:
#!/usr/bin/perl -Tw
use strict;
use File::Basename;
$ENV{PATH} = "/usr/ucb:/bin:/usr/bin";

# assumptions made to complete the fragment -- adjust to your setup
my $CONF = "/usr/local/apache/conf/httpd.conf";
my $MIDNIGHT = "Midnight";   # per-log-directory rotation script

my ($user_id, $group_id, $pidfile);
my %midnights;
open CONF, "<$CONF" or die "Cannot open $CONF: $!";
while (<CONF>) {
    if (/^User (\w+)/i) {
        $user_id = getpwnam($1);
        next;
    }
    if (/^Group (\w+)/i) {
        $group_id = getgrnam($1);
        next;
    }
    if (/^PidFile (.*)/i) {
        $pidfile = $1;
        next;
    }
    next unless /^ErrorLog (.*)/i;
    my $midnight = (dirname $1)."/$MIDNIGHT";
    next unless -x $midnight;
    $midnights{$midnight}++;
}
close CONF;
# (the rest of the script would drop root privileges to the User/Group
# found above, run each collected $MIDNIGHT script, and then signal
# the parent process recorded in $pidfile)
The midnight script itself starts off the same way:

#!/usr/bin/perl -Tw
use strict;
Can you spot the security holes? Our trusted user base can’t or won’t. :) But these shouldn’t be used
in hostile situations.
It will produce a nice error in error_log, giving the line number of the call and the name of the undefined
subroutine.
Sometimes an error happens and causes the server to write millions of lines into your error_log file, bringing your server to its knees within a few minutes. For example, I get the error Callback called exit showing up in my error_log file many times, and the error_log file grows to 300 Mbytes in a few minutes. You should run a cron job to make sure this does not happen, and if it does, to take care of it. Andreas J. Koenig runs this shell script every minute:
S=`ls -s /usr/local/apache/logs/error_log | awk '{print $1}'`
if [ "$S" -gt 100000 ] ; then
/etc/rc.d/init.d/httpd restart
date | /bin/mail -s "error_log $S kB on inx" myemail@domain.com
fi
It seems that his script will trigger a restart every minute once the logfile grows past the 100000 limit, since it will stay that size unless you remove or rename it before the restart. On my server I run a watchdog every five minutes which restarts the server if it is getting stuck (it always works, since when some mod_perl child process goes wild, the I/O it causes is so heavy that its brother processes cannot serve requests normally).
,-----
| cyclog writes a log to disk. It automatically synchronizes the log
| every 100KB (by default) to guarantee data integrity after a crash. It
| automatically rotates the log to keep it below 1MB (by default). If
| the disk fills up, cyclog pauses and then tries again, without losing
| any data.
‘-----
;o)
10.2.1 Introduction
When loading the DBI module (do not confuse this with the Apache::DBI module), DBI checks whether the environment variable GATEWAY_INTERFACE starts with 'CGI-Perl' and whether the module Apache::DBI has been loaded. If so, every connect request is forwarded to the Apache::DBI module. Apache::DBI then checks whether a database handle from a previous connect request has already been stored, and whether this handle is still valid, using the ping method. If these two conditions are fulfilled, it just returns the database handle. If there is no appropriate database handle, or if the ping method fails, a new connection is established and the handle is stored for later re-use. In other words, when the script is run again from a child that has already connected (and is still connected), the host/username/password is checked against the cache of open connections, and if a matching one is available, it is used. There is no need to delete the disconnect statements from your code: they won't do anything, because the Apache::DBI module overloads the disconnect method with a NOP (an empty call).
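On the script side nothing changes; a minimal sketch (the DSN and credentials below are placeholders):

use DBI ();

# connect exactly as you would without Apache::DBI; under mod_perl,
# with Apache::DBI preloaded, this returns the cached handle
my $dbh = DBI->connect("DBI:mysql:test:localhost", "user", "passwd")
    or die $DBI::errstr;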
You want to use this module if you are opening just a few DB connections to the server. Apache::DBI will make them persistent per child, so if you have 10 children and each opens 2 different connections, you will have 20 open persistent connections in total. Thus, after the initial connect, you save the connection time for every connect request from your DBI module. This is a huge benefit for a mod_perl apache server with a high traffic of users using a relational DB.
As you understand, you must NOT use this module if you are opening a special connection for each of your users, since each of these connections will stay persistent, and in a short time the number of connections will be so big that your machine will scream and die. If you want to use Apache::DBI in both situations, as of this moment the only available solution is to run 2 mod_perl servers, one using Apache::DBI and one not.
10.2.2 Configuration
After installing this module, the configuration is simple: add the following directive to httpd.conf.

PerlModule Apache::DBI

Note that it is important to load this module before any other Apache*DBI module!
You can skip preloading DBI, since Apache::DBI does that, but there is no harm in leaving it in. As noted before, it is wise to use this method only if you want all of apache to be able to connect to the database server as one user (or a few users).
If you enable Apache::DBI's debug tracing (e.g. $Apache::DBI::DEBUG = 1 in the startup script), from then on you will see in the error_log file when Apache::DBI initializes a connection and when it just returns one from its cache. Use the following command to watch it in real time (your error_log file might be located at a different path):

tail -f /usr/local/apache/logs/error_log

I use an alias (in tcsh) so that I do not have to remember the path:

alias err "tail -f /usr/local/apache/logs/error_log"
Note that starting from version 0.82, Apache::DBI implements ping inside an eval block, so if the handle has timed out, it should reconnect without triggering the morning bug.
A: Indeed, Apache::DBI returns a cached database handle if and only if all parameters, including all options, are identical. But you are free to modify the handle right after you get it from the cache. So initiate the connection always using the same parameters, and set LongReadLen afterwards.
A: You did not use DBI->connect(), as in normal DBI usage, to get your $dbh database handle. Using Apache::DBI does not eliminate the need to write proper DBI code. As the man page states, you should program as if you were not using Apache::DBI at all; Apache::DBI will override the connect and return you a cached connection, and a disconnect() call will simply be ignored.
Use the example script eg/startup.pl: just remove the comment from the line

#use Apache::DebugDBI;

and adapt the connect string. Do not change anything in your scripts in order to use Apache::DBI.
;o)
You are an old sailor in the ISP business; you have seen it all; you know how many ISPs are out there, and you know that the sales margins are too low to keep you happy. You are looking for some new service that almost no one else provides, to attract more clients to become your users, and hopefully to win a bigger slice than the neighboring ISP.
If you are a user asking for a mod_perl service, or an ISP considering providing this service, this section should make things clear for both of you.
1. An ISP cannot afford to have a user running scripts under mod_perl on the main server, since the server would die very soon, for one of many reasons: sloppy programming, a user testing a just-updated script which probably has some syntax errors, and so on - no need to explain why, if you are familiar with mod_perl's peculiarities. The only scripts that CAN BE ALLOWED are the ones written by the ISP and not modifiable by the user (guest books, counters, etc. - the same standard scripts ISPs have been providing since they were born). So you have to say NO to this choice.
2. But hey, why can't I let my user run his own server, so that I wash my hands of it and do not care how dirty and sloppy the user's code is (assuming the user runs the server under his own username)?
This option is fine as long as you understand the new system requirements. If you have even very limited experience with mod_perl, you know that a mod_perl enabled Apache server, while freeing up your CPU and running scripts much, much faster, has huge memory demands (5-20 times what plain Apache uses). The actual size depends on the amount of code, the sloppiness of the programmer and possible memory leaks in the code, all multiplied by the number of children each server spawns. A very simple example: a server demanding 10MB of memory which spawns 10 children already raises your memory requirements by 100MB (the real requirement is actually smaller if your OS allows code sharing between processes and the programmer exploits these features in her code). Now multiply that figure by the number of users you intend to host and you get the total memory requirement. Since ISPs never say no, you had better use the opposite approach: take the largest amount of memory you can afford, divide it by one user's requirement as computed above - with, say, 256MB to spare and 100MB per user, that is only two users - and now you know how many mod_perl users you can afford :)
But who am I to predict how much memory your user will actually use? Her requirements for a single server can be very modest, but do you know how many servers she will run? After all, she has full control over httpd.conf - and it has to be that way, since such control is essential for anyone running mod_perl.
All this rambling about memory leads to a single question: can you prevent a user from using more than X amount of memory? Or a variation of the question: assuming you have as much memory as you want, can you charge the user for her average memory usage?
If the answer to either question is positive, you are all set and your clients will praise your name for letting them run mod_perl! There are tools to restrict resource usage (see for example the man pages for ulimit(3), getrlimit(2), setrlimit(2) and sysconf(3)); a minimal sketch of this idea appears right after this list.
If you have picked this option, you have to provide your client with:
Shutdown/startup scripts installed together with the rest of your daemon startup scripts (e.g. in the /etc/rc.d directory), so that when you reboot your machine the user's server is shut down correctly and is back online the moment your system comes up again. Also make sure each server is started under the username it belongs to, unless you are looking for big trouble.
Proxy services (in forward or httpd accelerator mode) for the user's virtual host. Since the user will have to run her server on an unprivileged port (>1024), you will have to forward all requests from user.given.virtual.hostname:80 (which is user.given.virtual.hostname without a port, since 80 is the default) to your.machine.ip:port_assigned_to_user, and the user will of course have to write her scripts so that self-referencing URLs use the user.given.virtual.hostname base.
Letting a user run a mod_perl server immediately adds the requirement that the user be able to restart and configure her own server. But only root can bind to port 80; that is why the user has to use port numbers above 1024.
Another problem you will have to solve is how to assign ports to users. Since a user can pick any port above 1024 to run her server on, you will have to impose some regulation here. A simple example will stress the importance of this problem: suppose I am a malicious user, or just a rival of some fellow who runs his server at your ISP. All I have to do is find out what port his server is listening on (e.g. with the help of netstat(8)) and configure my own server to listen on the same port. I cannot bind to it while his server holds it, but imagine what happens when you reboot your system and my startup script happens to run before my rival's! I get the port first, all requests are now directed to my server, and you can let your imagination run wild about the nasty things that might happen next. Of course the ugly trick will be discovered pretty soon, but by then the damage has been done.
3. A much better, but costlier, solution is co-location. Let the user hook her own (or the ISP's) standalone machine into your network, and forget about this user. Of course either the user or you will have to carry out all the system administration chores, and it will cost your client more money.
All in all, who are the people seeking mod_perl support? The ones who run serious projects and businesses, who can afford a standalone box, thus gaining their goal of autonomy while keeping their ISP happy. So money is not an obstacle.
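Here is the minimal sketch promised above of restricting a child's memory from the startup file. The choice of the BSD::Resource module from CPAN and the 64MB/80MB limits are illustrative assumptions, not a recipe:

    # startup.pl fragment: cap each httpd child's data segment.
    use strict;
    use BSD::Resource;  # exports setrlimit() and the RLIMIT_* constants

    # Soft limit 64MB, hard limit 80MB: a child that exceeds the
    # soft limit fails its memory allocations and dies, instead of
    # dragging the whole machine down with it.
    setrlimit(RLIMIT_DATA, 64*1024*1024, 80*1024*1024)
        or warn "could not set RLIMIT_DATA: $!";

Apache::SizeLimit takes a different approach: each child checks its own size after serving a request and exits if it has grown too big.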
;o)
12.2 Acknowledgements
I want to thank all the people who donated their time and effort to make this amazing idea of mod_perl a reality. This includes Doug MacEachern, the author of mod_perl, and all the developers who contributed bug patches, modules and help. And of course the numerous unseen users who helped to find bugs and to advocate mod_perl around the world.
mod_perl home
http://perl.apache.org
Getting mod_perl
Get the latest mod_perl sources and documentation from http://perl.apache.org . Try the direct download link http://perl.apache.org/dist/ .
http://www.modperl.com is the home site of The Apache Modules Book, a book about creating Web server modules using the Apache API, written by Lincoln Stein and Doug MacEachern. You can now purchase the book at your local bookstore or from an online dealer. O'Reilly lists this book as "Writing Apache Modules with Perl and C".
mod_perl Guide
at http://perl.apache.org/guide/
mod_perl FAQ
at http://perl.apache.org/faq/
cgi_to_mod_perl manpage
at http://perl.apache.org/dist/cgi_to_mod_perl.html
mod_perl_traps manpage
at http://perl.apache.org/dist/mod_perl_traps.html
http://www.refcards.com (Apache and other refcards are available from this link)
The Apache/Perl mailing list (modperl@apache.org) is available for mod_perl users and developers to share ideas, solve problems and discuss things related to mod_perl and the Apache::* modules. To subscribe to this list, send mail to majordomo@apache.org with an empty Subject and with the following Body:
subscribe modperl
Perl is most likely already installed on your machine, but you should at least check which version you are using. It is highly recommended that you have perl version 5.004 or higher. You can get the latest perl from http://www.perl.com/ . Try the direct download link http://www.perl.com/pace/pub/perldocs/latest.html . You can get the perl documentation from the same location.
Perl FAQ
http://www.perl.com/CPAN/doc/FAQs/FAQ/PerlFAQ.html
perl home
http://www.perl.com/
The Perl Journal
http://www.tpj.com/
Perl/CGI FAQ
at http://www.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html
http://www.singlesheaven.com/stas/TULARC/webmaster/myfaq.html
The Idiot's Guide to Solving Perl CGI Problems
http://www.perl.com/CPAN/doc/FAQs/cgi/idiots-guide.html
The WWW Security FAQ
http://www.w3.org/Security/Faq/www-security-faq.html
Apache home
http://www.apache.org
Getting apache
Get the latest apache webserver and documentation from http://www.apache.org . Try the direct
download link http://www.apache.org/dist/ .
http://www.refcards.com (Apache and other refcards are available from this link)
Apache FAQ
http://www.apache.org/docs/misc/FAQ.html
Apache documentation
http://www.apache.org/docs/
Apache Handlers
http://www.apache.org/docs/handler.html
DBI at Hermetica
http://www.hermetica.com/technologia/DBI/
http://www.fugue.com/dbi/
dbi-users mailing list archives
http://outside.organic.com/mail-archives/dbi-users/
Persistent database connections (from the mod_perl documentation)
http://perl.apache.org/src/mod_perl.html#PERSISTENT_DATABASE_CONNECTIONS
Squid FAQ - http://squid.nlanr.net/Squid/FAQ/FAQ.html
thttpd - http://www.acme.com/software/thttpd/
mod_proxy_add_forward
Ask Bjoern Hansen has written a mod_proxy_add_forward.c module for Apache that sets the X-Forwarded-For field when doing a ProxyPass, similar to what Squid can do. His patch is at http://modules.apache.org/search?id=124 or at ftp://ftp.netcetera.dk/pub/apache/
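For context, here is a minimal httpd.conf sketch of the front-end forwarding setup discussed in the ISP section, using plain mod_proxy; the hostnames follow the placeholders used there and the back-end port 8042 is arbitrary:

    # Front-end server (port 80): relay a user's virtual host to her
    # mod_perl server listening on an unprivileged port.
    <VirtualHost user.given.virtual.hostname>
        ServerName       user.given.virtual.hostname
        ProxyPass        / http://your.machine.ip:8042/
        ProxyPassReverse / http://your.machine.ip:8042/
    </VirtualHost>

mod_proxy_add_forward can then be compiled into the front-end server so that the back-end sees the real client address in the X-Forwarded-For field.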
;o)
Table of Contents
3.15 Handling the server timeout cases and working with $SIG{ALRM}
4 Performance. Benchmarks.
4.1 Performance: The Overall picture
4.2 Sharing Memory
4.3 Preload Perl modules at server startup
4.4 Preload Registry Scripts
4.5 Avoid Importing Functions
4.6 How can I find if my mod_perl scripts have memory leaks (and where)
4.7 Limiting the size, resource, speed of the processes
4.8 Persistent DB Connections
4.9 Benchmarks. Impressing your Boss and Colleagues.
4.9.1 Benchmarking scripts with execution times below 1 second :)
4.9.2 PerlHandler's Benchmarking
4.10 Tuning the Apache's configuration variables for the best performance
4.10.1 Tuning with ab - ApacheBench
4.10.2 Tuning with crashme script
4.10.3 Choosing MaxClients
4.10.4 Choosing MaxRequestsPerChild
4.10.5 Choosing MinSpareServers, MaxSpareServers and StartServers
4.10.6 Summary of Benchmarking to tune all 5 parameters
4.11 Using $|=1 under mod_perl and better print() techniques.
4.12 Profiling
4.13 Sending plain HTML as a compressed output
4.14 Apache::GzipChain - compress HTML (or anything) in the OutputChain
5 Choosing the Right Strategy
5.1 Do it like me?!
5.2 mod_perl Deployment Overview
5.3 Standalone mod_perl Enabled Apache Server
5.4 One Plain and One mod_perl-enabled Apache Servers
5.5 One light non-Apache and One mod_perl enabled Apache Servers
5.6 Adding a Proxy Server in http Accelerator Mode
5.7 The Squid Server
5.8 An Apache's mod_proxy
6 Real World Scenarios Implementation
6.1 Standalone mod_perl Enabled Apache Server
6.1.1 Installation in 10 lines
6.1.2 Installation in 10 paragraphs
6.1.3 Configuration Process
6.2 One Plain and One mod_perl enabled Apache Servers
6.2.1 Configuration and Compilation of the Sources
6.2.1.1 Building the httpd_docs Server
6.2.1.2 Building the httpd_perl (mod_perl enabled) Server
6.2.2 Configuration of the servers
6.2.2.1 Basic httpd_docs Server's Configuration
6.2.2.2 Basic httpd_perl Server's Configuration
6.3 Running 2 webservers and squid in httpd accelerator mode
6.4 Running 1 webserver and squid in httpd accelerator mode
6.5 Using mod_proxy
6.6 mod_perl server as DSO
6.7 HTTP Authentication with 2 servers + proxy
7 Installation Notes
7.1 Configuration and Installation
7.1.1 perl
7.1.2 apache
7.1.3 mod_perl
7.2 How can I tell whether mod_perl is running
7.2.1 Testing by checking the error_log file
7.2.2 Testing by viewing /perl-status
7.2.3 Testing via telnet
7.2.4 Testing via a CGI script
7.2.5 Testing via lwp-request
7.3 Is it possible to install and use apache/mod_perl without having a root access?
7.4 Is it possible to determine which options were given to modperl's Makefile.PL
7.5 Server Installation problems
7.5.1 make test fails
7.5.2 mod_perl.c is incompatible with this version of apache
7.5.3 Should I rebuild mod_perl if I have upgraded my perl?
8 Server Configuration
8.1 mod_perl Specific Configuration
8.1.1 Alias Configurations
8.1.2 Location Configuration
8.1.3 PerlFreshRestart
8.1.4 /perl-status location
8.1.4.1 Configuration
8.1.4.2 Usage
8.1.4.3 Compiled Registry Scripts section seems to be empty.
8.1.5 PerlSetVar, PerlSetEnv and PerlPassEnv
8.1.6 perl-startup file
8.1.6.1 Sample perl-startup file
8.1.6.2 What modules should you add to the startup file and why.
8.1.6.3 The confusion with use() clause at the server startup?
8.1.6.4 The confusion with defining globals in startup
8.2 Running 'apachectl configtest' or 'httpd -t'
8.3 Perl Sections
8.4 General pitfalls
8.4.1 My cgi/perl code is being returned as a plain text instead of being executed by the webserver?