Transparent Proxy With Linux and Squid mini-HOWTO
Table of Contents
Transparent Proxy with Linux and Squid mini-HOWTO
Daniel Kiracofe
1. Introduction
1.1 Comments
1.2 Copyrights and Trademarks
1.3 #include <disclaimer.h>
2. Overview of Transparent Proxying
2.1 Motivation
2.2 Scope of this document
2.3 HTTPS
2.4 Proxy Authentication
3. Configuring the Kernel
4. Setting up squid
5. Setting up iptables (Netfilter)
6. Transparent Proxy to a Remote Box
6.1 First method (simpler, but does not work for some esoteric cases)
6.2 Second method (more complicated, but more general)
6.3 Method One: What if iptables-box is on a dynamic IP?
7. Transparent Proxy With Bridging
8. Put it all together
9. Troubleshooting
10. Further Resources
Transparent Proxy with Linux and Squid
mini-HOWTO
Daniel Kiracofe
v1.15, August 2002
This document provides information on how to set up a transparent caching HTTP proxy server using only
Linux and squid.
1. Introduction
1.1 Comments
Comments and general feedback on this mini HOWTO are welcome and can be directed to its author, Daniel
Kiracofe, at drk@unxsoft.com.
1.2 Copyrights and Trademarks
This manual may be reproduced in whole or in part, without fee, subject to the following restrictions:
• The copyright notice above and this permission notice must be preserved complete on all complete or
partial copies
• Translation to another language is permitted, provided that the author is notified prior to the
translation.
• Any derived work must be approved by the author in writing before distribution.
• If you distribute this work in part, instructions for obtaining the complete version of this manual must
be included, and a means for obtaining a complete version provided.
• Small portions may be reproduced as illustrations for reviews or quotes in other works without this
permission notice if proper citation is given.
Exceptions to these rules may be granted for academic purposes: Write to the author and ask. These
restrictions are here to protect us as authors, not to restrict you as learners and educators. Any source code
(aside from the SGML this document was written in) in this document is placed under the GNU General
Public License, available via anonymous FTP from the GNU archive.
2.1 Motivation
In ``ordinary'' proxying, the client specifies the hostname and port number of a proxy in their web browsing
software. The browser then makes requests to the proxy, and the proxy forwards them to the origin servers.
This is all fine and good, but sometimes one of several situations arises. Either
• You want to force clients on your network to use the proxy, whether they want to or not.
• You want clients to use a proxy, but don't want them to know they're being proxied.
• You want clients to be proxied, but don't want to go to all the work of updating the settings in
hundreds or thousands of web browsers.
This is where transparent proxying comes in. A web request can be intercepted by the proxy, transparently.
That is, as far as the client software knows, it is talking to the origin server itself, when it is really talking to
the proxy server. (Note that the transparency only applies to the client; the server knows that a proxy is
involved, and will see the IP address of the proxy, not the IP address of the user. However, squid may pass an
X-Forwarded-For header, so that the server can determine the original user's IP address if it groks that
header.)
Cisco routers support transparent proxying. So do many switches. But (surprisingly enough) Linux can act as
a router, and can perform transparent proxying by redirecting TCP connections to local ports. However, we
also need to make our web proxy aware of the effect of the redirection, so that it can make connections to the
proper origin servers. There are two general ways this works:
The first is when your web proxy is not transparent proxy aware. You can use a nifty little daemon called
transproxy that sits in front of your web proxy and takes care of all the messy details for you. transproxy was
written by John Saunders, and is available from
ftp://ftp.nlc.net.au/pub/linux/www/ or your local metalab mirror. transproxy will not be discussed further in
this document.
A cleaner solution is to get a web proxy that is aware of transparent proxying itself. The one we are going to
focus on here is squid. Squid is an Open Source caching proxy server for Unix systems. It is available from
www.squid-cache.org.
Alternatively, instead of redirecting the connections to local ports, we could redirect the connections to remote
ports. This is discussed in the Transparent Proxy to a Remote Box section. Readers interested in this approach
should skip down to that section. Readers interested in doing everything on one box can safely ignore that
section.
If you are using a development kernel or a development version of squid, you are on your own. This
document may help you, but YMMV.
I only focus on squid here, but Apache can also function as a caching proxy server. (If you are not sure which
to use, I recommend squid, since it was built from the ground up to be a caching proxy server, whereas Apache's
caching proxy features are more of an afterthought added to an already existing system.) If you want to use
Apache instead of squid, follow all the instructions in this document that pertain to the kernel and iptables
rules. Ignore the squid-specific sections, and instead look at
http://lupo.campus.uniroma2.it/progetti/mod_tproxy/ for source code and instructions for a transparent proxy
module for Apache (thanks to Cristiano Paris (c.paris@libero.it) for contributing this).
2.3 HTTPS
Finally, as far as transparently proxying HTTPS (i.e. secure web pages using SSL, TLS, etc.) goes, you can't do it.
Don't even ask. For the explanation, do a search for 'man-in-the-middle attack'. Note that you probably don't
really need to transparently proxy HTTPS anyway, since squid cannot cache secure pages.
3. Configuring the Kernel
If your kernel is not configured for transparent proxying, you will need to recompile. Recompiling a kernel is
a complex process (at least at first), and it is beyond the scope of this document. If you need help compiling a
kernel, please see The Kernel HOWTO.
The options you need to set in your configuration are as follows. (Note: if you prefer modules, some, but not
all, of these can be built as modules. Luckily, everything that is not modularizable is probably already in your
kernel anyway.)
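If you already have a kernel configuration file at hand, a quick grep can tell you whether the netfilter pieces
are there before you commit to a recompile. The following is only a sketch: the symbol names are my assumption
of the usual 2.4-era names for packet filtering, connection tracking, iptables, NAT and the REDIRECT target,
and check_kernel_opts is a made-up helper name. Check the names against your own kernel's help text.

```shell
#!/bin/sh
# check_kernel_opts CONFIG_FILE
# Prints every assumed option that is neither built in (=y) nor a module (=m).
# Returns 0 if all of them are present, 1 otherwise.
# NOTE: the symbol list is an assumption for 2.4-era kernels, not an official list.
check_kernel_opts() {
    config_file="$1"
    missing=0
    for opt in CONFIG_NETFILTER CONFIG_IP_NF_CONNTRACK CONFIG_IP_NF_IPTABLES \
               CONFIG_IP_NF_NAT CONFIG_IP_NF_TARGET_REDIRECT; do
        if ! grep -q "^${opt}=[ym]\$" "$config_file"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

Call it with the path to your configuration, for example ``check_kernel_opts /usr/src/linux/.config''. Where
the file lives depends on your distribution.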
Once you have your new kernel up and running, you may need to enable IP forwarding. IP forwarding allows
your computer to act as a router. Since this is not what the average user wants to do, it is off by default and
must be explicitly enabled at run-time. However, your distribution might do this for you already. To check, do
``cat /proc/sys/net/ipv4/ip_forward''. If you see ``1'' you're good. Otherwise, do ``echo '1' >
/proc/sys/net/ipv4/ip_forward''. You will then want to add that command to your appropriate bootup scripts
(depending on your distribution, these may live in /etc/rc.d, /etc/init.d, or maybe somewhere else entirely).
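The check-then-enable step above can be wrapped in a small function for a bootup script. This is a minimal
sketch; ensure_ip_forward is a hypothetical name, and the optional file argument exists only so you can try
the function on an ordinary file without being root.

```shell
#!/bin/sh
# ensure_ip_forward [FILE]
# Writes 1 to the ip_forward control file unless it already reads 1.
# FILE defaults to the real /proc entry; writing there requires root.
ensure_ip_forward() {
    ctl="${1:-/proc/sys/net/ipv4/ip_forward}"
    if [ "$(cat "$ctl")" = "1" ]; then
        echo "IP forwarding already enabled"
    else
        echo 1 > "$ctl"
        echo "IP forwarding enabled"
    fi
}
```

In a bootup script you would simply call ``ensure_ip_forward'' with no arguments.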
4. Setting up squid
Now, we need to get squid up and running. Download the latest source tarball from www.squid-cache.org.
Make sure you get a STABLE version, not a DEVEL version. The latest as of this writing was
squid-2.4.STABLE4.tar.gz. Note that AFAIK, you must have squid-2.4 for linux kernel 2.4. The reason is that
the mechanism by which the process determines the original destination address has changed from linux 2.2,
and only squid-2.4 has this new code in it. (For those of you who are interested, previously the getsockname()
call was hacked to provide the original destination address, but now the call is getsockopt() with a level of
SOL_IP and an option of SO_ORIGINAL_DST).
Now, untar and gunzip the archive (use ``tar -xzf <filename>''). Run the autoconfiguration script and tell it to
include netfilter code (``./configure --enable-linux-netfilter''), compile (``make'') and then install (``make
install'').
Now, we need to edit the default squid.conf file (installed to /usr/local/squid/etc/squid.conf, unless you
changed the defaults). The squid.conf file is heavily commented. In fact, some of the best documentation
available for squid is in the squid.conf file. After you get it all up and running, you should go back and reread
the whole thing. But for now, let's just get the minimum required. Find the following directives, uncomment
them, and change them to the appropriate values:
• httpd_accel_host virtual
• httpd_accel_port 80
• httpd_accel_with_proxy on
• httpd_accel_uses_host_header on
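Since squid.conf is long and it is easy to leave one of these lines commented out, a quick check can help.
The sketch below is an illustration only: check_accel_conf is a made-up name, and it looks for exactly the
four directive/value pairs listed above as uncommented lines.

```shell
#!/bin/sh
# check_accel_conf SQUID_CONF
# Reports each of the four directives that is not present as an
# uncommented line with the expected value. Returns 1 if any is missing.
check_accel_conf() {
    conf="$1"
    status=0
    while read -r name value; do
        if ! grep -Eq "^[[:space:]]*${name}[[:space:]]+${value}[[:space:]]*\$" "$conf"; then
            echo "not set: $name $value"
            status=1
        fi
    done <<EOF
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
EOF
    return "$status"
}
```

With the default install location, that would be ``check_accel_conf /usr/local/squid/etc/squid.conf''.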
Next, look at the cache_effective_user and cache_effective_group directives. Unless the default
nobody/nogroup has been created on your system (AFAIK, it is not created out of the box on many popular
distributions, including RH7.1), you'll either need to create those, or create another username/group for squid
to run under. I strongly recommend that you create a username/group of squid/squid and run under that, but
you could use any existing user/group if you want.
Finally, look at the http_access directive. The default is usually ``http_access deny all''. This will prevent
anyone from accessing squid. For now, you can change this to ``http_access allow all'', but once it is working,
you will probably want to read the directions on ACLs (Access Control Lists), and set up the cache such that
only people on your local network (or whatever) can access the cache. This may seem silly, but you should
put some kind of restrictions on access to your cache. People behind filtering firewalls (such as porn filters, or
filters in nations where speech is not very free) often ``hijack'' onto wide open proxies and eat up your
bandwidth.
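As a rough idea of what a "local network only" restriction looks like, the helper below prints three
squid.conf lines for a network you give it. Treat it as a sketch: print_local_acl and the ACL name
``localnet'' are arbitrary choices of mine, the example network is made up, and it assumes your squid.conf
still defines the stock ``all'' ACL that the default ``http_access deny all'' line relies on.

```shell
#!/bin/sh
# print_local_acl NETWORK/NETMASK
# Emits squid.conf lines that admit one source network and refuse everyone else.
# "localnet" is just a label picked for this sketch.
print_local_acl() {
    cat <<EOF
acl localnet src $1
http_access allow localnet
http_access deny all
EOF
}
```

For example, ``print_local_acl 192.168.1.0/255.255.255.0'' prints lines you could use in place of
``http_access allow all''. Squid checks http_access lines in order and stops at the first match, so the
allow line has to come before the deny line.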
Initialize the cache directories with ``squid -z'' (if this is not a new installation of squid, you should skip this
step).
Now, run squid using the RunCache script in the /usr/local/squid/bin/ directory. If it works, you should be
able to set your web browser's proxy settings to the IP of the box and port 3128 (unless you changed the
default port number) and access squid as a normal proxy.
For additional help configuring squid, see the squid FAQ at www.squid-cache.org.
5. Setting up iptables (Netfilter)
To set up the rules, you will need to know two things: the interface that the to-be-proxied requests are coming
in on (I'll use eth0 as an example) and the port squid is running on (I'll use the default of 3128 as an example).
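With those two values, a single rule in the nat table's PREROUTING chain using the REDIRECT target does the
job. Here is a sketch wrapped in a function; redirect_http is a hypothetical name, and the IPTABLES variable
is a convention I am assuming so that the rule can be printed for review instead of applied.

```shell
#!/bin/sh
# redirect_http IFACE SQUID_PORT
# Redirects TCP port 80 traffic arriving on IFACE to the local squid port.
# Set IPTABLES="echo iptables" to print the rule instead of applying it.
redirect_http() {
    ${IPTABLES:-iptables} -t nat -A PREROUTING -i "$1" -p tcp --dport 80 \
        -j REDIRECT --to-ports "$2"
}
```

With the example values, that is ``redirect_http eth0 3128'', run as root.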
You will want to add the above command to your appropriate bootup script under /etc/rc.d/. Readers
upgrading from 2.2 kernels should note that this is the only command needed. 2.2 kernels required two extra
commands in order to prevent forwarding loops. The infrastructure of netfilter is much nicer, and only this
command is needed.
6. Transparent Proxy to a Remote Box
For the purposes of example commands, let's assume we have two boxes called squid-box and iptables-box,
and that they are on the network local-network. In the commands below, replace these strings with the actual
IP addresses or names of your machines and network.
6.1 First method (simpler, but does not work for some esoteric cases)
First, the machine that squid will be running on, squid-box. You do not need iptables or any special
kernel options on this machine, just squid. You *will*, however, need the 'http_accel' options as described
above. (Previous versions of this HOWTO suggested that you did not need those options. That was a mistake.
Sorry to have confused people...)
Now, the machine that iptables will be running on, iptables-box. You will need to configure the kernel as
described in section 3 above (except that you don't need the REDIRECT target support). Now, for the iptables
commands. You need three:
The first one sends the packets to squid-box from iptables-box. The second makes sure that the reply gets sent
back through iptables-box, instead of directly to the client (this is very important!). The last one makes sure
the iptables-box will forward the appropriate packets to squid-box. It may not be needed. YMMV. Note that
we specified '-i eth0' and then '-o eth0', which stands for input interface eth0 and output interface eth0. If your
packets are entering and leaving on different interfaces, you will need to adjust the commands accordingly.
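The three rules described above (a DNAT, an SNAT and a FORWARD accept) can be sketched as follows.
Assumptions on my part: the function name remote_proxy_nat, squid listening on port 3128, and the exclusion
of squid-box's own requests from the first rule so that they do not loop back. The ``! -s'' ordering is what
current iptables expects; releases from the 2.4 era wrote it as ``-s ! address''.

```shell
#!/bin/sh
# remote_proxy_nat SQUID_BOX IPTABLES_BOX LOCAL_NET
# Method one rules for iptables-box. Set IPTABLES="echo iptables" to
# print the rules instead of applying them.
remote_proxy_nat() {
    squid_box="$1"; iptables_box="$2"; local_net="$3"
    ipt="${IPTABLES:-iptables}"
    # 1. Send web requests (except squid-box's own, an assumption) to squid-box.
    $ipt -t nat -A PREROUTING -i eth0 ! -s "$squid_box" -p tcp --dport 80 \
        -j DNAT --to-destination "$squid_box:3128"
    # 2. Rewrite the source so replies come back through iptables-box.
    $ipt -t nat -A POSTROUTING -o eth0 -s "$local_net" -d "$squid_box" \
        -j SNAT --to-source "$iptables_box"
    # 3. Let iptables-box forward the redirected packets to squid-box.
    $ipt -A FORWARD -s "$local_net" -d "$squid_box" -i eth0 -o eth0 \
        -p tcp --dport 3128 -j ACCEPT
}
```

Pass the real addresses in place of the three names, for example
``remote_proxy_nat 192.168.1.2 192.168.1.1 192.168.1.0/24'' (made-up addresses).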
6.2 Second method (more complicated, but more general)
You'll also need the iproute2 tools. Your distribution probably already has them installed, but if not, look at
ftp://ftp.inr.ac.ru/ip-routing/.
Note that the choice of firewall mark (3) and routing table (2) was fairly arbitrary. If you are already using
policy routing or firewall marking for some other purpose, make sure you choose unique numbers here.
Otherwise, don't worry about it.
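Using firewall mark 3 and routing table 2, the iptables-box side of this method can be sketched as two
mangle rules plus one ``ip rule'' and one ``ip route''. The function name policy_route_to_squid, the
IPTABLES and IP override variables, and passing the outgoing device as an argument are assumptions of this
sketch.

```shell
#!/bin/sh
# policy_route_to_squid SQUID_BOX DEV
# Marks port-80 traffic and routes marked packets to squid-box.
# Set IPTABLES="echo iptables" and IP="echo ip" to print instead of apply.
policy_route_to_squid() {
    squid_box="$1"; dev="$2"
    ipt="${IPTABLES:-iptables}"
    iproute="${IP:-ip}"
    # squid-box's own requests pass unmarked (ACCEPT ends this chain's traversal).
    $ipt -t mangle -A PREROUTING -p tcp --dport 80 -s "$squid_box" -j ACCEPT
    # Every other port-80 packet gets firewall mark 3.
    $ipt -t mangle -A PREROUTING -p tcp --dport 80 -j MARK --set-mark 3
    # Packets carrying mark 3 are routed with table 2 ...
    $iproute rule add fwmark 3 table 2
    # ... whose only entry is a default route via squid-box.
    $iproute route add default via "$squid_box" dev "$dev" table 2
}
```

The order of the two mangle rules matters: the ACCEPT for squid-box has to come first, otherwise squid's own
outgoing requests would be marked and sent straight back to it.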
Next, squid-box. Use this command, which should look remarkably similar to a command we've seen
previously.
Here is a brief explanation of how this works: in method one, we used Network Address Translation to get the
packets to the other box. The result of this is that the packet gets altered. This alteration is what causes some
kinds of clients mentioned above to fail. In method two, we use a magic thing called policy routing. The first
thing we do is to select the packets we want. Thus, all packets on port 80, except those coming from
squid-box itself, are MARKed. Then, when the kernel goes to make a routing decision, the MARKed packets
aren't routed using the normal routing table that you access with the ``route'' command, but with a special
table. This special table has only one entry, a default gateway to squid-box. Thus, the packet is sent merrily on
its way without ever having been altered. So, even HTTP/1.0 connections can be handled perfectly. (Thanks
to Michal Svoboda for suggesting and helping to write this section.)
6.3 Method One: What if iptables-box is on a dynamic IP?
This change avoids having to specify the IP address of iptables-box in the command. Since it will change
often, you'd have to change your commands to reflect it. This will save you a lot of hassle.
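One way to get this effect with netfilter is the MASQUERADE target, which uses whatever address the outgoing
interface has at the moment instead of a fixed SNAT address. The sketch below assumes that is the intended
replacement for the second (source-rewriting) rule of method one; masquerade_to_squid is a made-up name.

```shell
#!/bin/sh
# masquerade_to_squid LOCAL_NET SQUID_BOX
# Source-rewriting rule that does not name iptables-box's address.
# Set IPTABLES="echo iptables" to print the rule instead of applying it.
masquerade_to_squid() {
    ${IPTABLES:-iptables} -t nat -A POSTROUTING -o eth0 -s "$1" -d "$2" \
        -j MASQUERADE
}
```

As with the other rules, adjust eth0 if your packets leave on a different interface.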
7. Transparent Proxy With Bridging
If you are trying to set up a transparent proxy on a Linux machine that has been configured as a bridge, you
will need to add one additional iptables command to what we had in section 5. Specifically, you need to
explicitly allow connections to the machine on port 3128 (or any other port squid is listening on), otherwise
the machine will just forward them over to the other interface like a good little bridge. Here are the magic
words:
Replace interface with the interface that corresponds to your_bridge_ip (typically eth0 or eth1). First-time
bridge users should also note that you'll probably want to repeat the same command with ``3128'' replaced by
``telnet'' if you want to administer your bridge remotely.
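A rule of this kind is an ACCEPT in the INPUT chain for TCP connections addressed to the bridge's own IP on
the squid port. The following is a minimal sketch, not necessarily the exact rule intended here;
allow_on_bridge is a hypothetical name, and the port argument can be a number or a service name.

```shell
#!/bin/sh
# allow_on_bridge INTERFACE YOUR_BRIDGE_IP PORT
# Accepts TCP connections addressed to the bridge itself on PORT.
# Set IPTABLES="echo iptables" to print the rule instead of applying it.
allow_on_bridge() {
    ${IPTABLES:-iptables} -A INPUT -i "$1" -p tcp -d "$2" --dport "$3" -j ACCEPT
}
```

For squid that would be something like ``allow_on_bridge eth0 your_bridge_ip 3128'', and for remote
administration the same call with ``telnet'' as the last argument.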
9. Troubleshooting
There is one problem that occurs often enough to mention here. If you get the following error:
/lib/modules/2.4.2-2/kernel/net/ipv4/netfilter/ip_tables.o
init_modules: Device or resource busy Hints: insmod errors can
be caused by incorrect module parameters; including invalid IO
or IRQ parameters.
then you are probably running Red Hat 7.x. The folks at Red Hat, in all their wisdom, decided to load the
ipchains module by default on startup. I guess this was for backwards compatibility for those who haven't
learned iptables yet. However, the problem is that ipchains and iptables are mutually incompatible. Since
ipchains has been secretly loaded by RH, you cannot use iptables commands. To see if this is your problem,
do the command ``lsmod'' and look for the module named ``ipchains''. If you see it, that is your problem. The
quick fix is to execute the command ``rmmod ipchains'' before you issue any iptables commands. To
permanently stop ipchains from being loaded by your startup scripts, the following command should work:
``/sbin/chkconfig --level 2345 ipchains off''. (Thanks to Rasmus Glud for pointing this command out to me.)
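If you want your firewall script to guard against this by itself, the lsmod check can be automated. This is
a small sketch; ipchains_loaded is a made-up name, and it only looks at the first column of whatever
``lsmod'' output is piped into it.

```shell
#!/bin/sh
# ipchains_loaded
# Reads `lsmod` output on stdin; succeeds (exit 0) if a module named
# exactly "ipchains" is listed, fails (exit 1) otherwise.
ipchains_loaded() {
    awk '$1 == "ipchains" { found = 1 } END { exit !found }'
}
```

In a script that would read ``if lsmod | ipchains_loaded; then rmmod ipchains; fi'', placed before the first
iptables command.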