Commit 6fdd71c

committed
Add to mmap discussion.
1 parent 29c18bc commit 6fdd71c

File tree

1 file changed

+392
-0
lines changed

doc/TODO.detail/mmap

Lines changed: 392 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2014,3 +2014,395 @@ KwvG7YLsJ+xpsTUS67KD+4M=
20142014

20152015
--HjNkcEWJ4DMx36DP--
20162016

2017+
From pgsql-performance-owner+M1354=pgman=candle.pha.pa.us@postgresql.org Fri Mar 7 01:09:07 2003
2018+
Return-path: <pgsql-performance-owner+M1354=pgman=candle.pha.pa.us@postgresql.org>
2019+
Received: from relay2.pgsql.com (relay2.pgsql.com [64.49.215.143])
2020+
by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id h27693604295
2021+
for <pgman@candle.pha.pa.us>; Fri, 7 Mar 2003 01:09:05 -0500 (EST)
2022+
Received: from postgresql.org (postgresql.org [64.49.215.8])
2023+
by relay2.pgsql.com (Postfix) with ESMTP id 95CD2EDFD3B
2024+
for <pgman@candle.pha.pa.us>; Fri, 7 Mar 2003 01:09:03 -0500 (EST)
2025+
X-Original-To: pgsql-performance@postgresql.org
2026+
Received: from perrin.int.nxad.com (internal.ext.nxad.com [69.1.70.251])
2027+
by postgresql.org (Postfix) with ESMTP id F16034768E2
2028+
for <pgsql-performance@postgresql.org>; Fri, 7 Mar 2003 01:04:33 -0500 (EST)
2029+
Received: by perrin.int.nxad.com (Postfix, from userid 1001)
2030+
id 7969A21065; Thu, 6 Mar 2003 22:04:12 -0800 (PST)
2031+
Date: Thu, 6 Mar 2003 22:04:12 -0800
2032+
From: Sean Chittenden <sean@chittenden.org>
2033+
To: Neil Conway <neilc@samurai.com>
2034+
cc: Tom Lane <tgl@sss.pgh.pa.us>,
2035+
Christopher Kings-Lynne <chriskl@familyhealth.com.au>,
2036+
PostgreSQL Performance <pgsql-performance@postgresql.org>
2037+
Subject: Re: [PERFORM] [COMMITTERS] pgsql-server/ /configure /configure.in rc/incl ...
2038+
Message-ID: <20030307060412.GA19138@perrin.int.nxad.com>
2039+
References: <20030306031656.1876F4762E0@postgresql.org> <032f01c2e390$b1842b20$6500a8c0@fhp.internal> <11077.1046921667@sss.pgh.pa.us> <033f01c2e392$71476570$6500a8c0@fhp.internal> <12228.1046922471@sss.pgh.pa.us> <20030306094117.GA79234@perrin.int.nxad.com> <15071.1046964336@sss.pgh.pa.us> <20030307003640.GF79234@perrin.int.nxad.com> <1046998072.10527.67.camel@tokyo>
2040+
MIME-Version: 1.0
2041+
Content-Type: multipart/signed; micalg=pgp-sha1;
2042+
protocol="application/pgp-signature"; boundary="KsGdsel6WgEHnImy"
2043+
Content-Disposition: inline
2044+
In-Reply-To: <1046998072.10527.67.camel@tokyo>
2045+
User-Agent: Mutt/1.4i
2046+
X-PGP-Key: finger seanc@FreeBSD.org
2047+
X-PGP-Fingerprint: 3849 3760 1AFE 7B17 11A0 83A6 DD99 E31F BC84 B341
2048+
X-Web-Homepage: http://sean.chittenden.org/
2049+
Precedence: bulk
2050+
Sender: pgsql-performance-owner@postgresql.org
2051+
Status: OR
2052+
2053+
--KsGdsel6WgEHnImy
2054+
Content-Type: text/plain; charset=us-ascii
2055+
Content-Disposition: inline
2056+
Content-Transfer-Encoding: quoted-printable
2057+
2058+
> > I don't have my copy of Steven's handy (it's some 700mi away atm
2059+
> > otherwise I'd cite it), but if Tom or someone else has it handy, look
2060+
> > up the example re: the performance gain from read()'ing an mmap()'ed
2061+
> > file versus a non-mmap()'ed file. The difference is non-trivial and
2062+
> > _WELL_ worth the time given the speed increase.
2063+
>=20
2064+
> Can anyone confirm this? If so, one easy step we could take in this
2065+
> direction would be adapting COPY FROM to use mmap().
2066+
2067+
Weeee! Alright, so I got to have some fun writing out some simple
2068+
tests with mmap() and friends tonight. Are the results interesting?
2069+
Absolutely! Is this a simple benchmark? Yup. Do I think it
2070+
simulates PostgreSQL? Eh, not particularly. Does it demonstrate that
2071+
mmap() is a win and something worth implementing? I sure hope so. Is
2072+
this a test program to demonstrate the ideal use of mmap() in
2073+
PostgreSQL? No. Is it a place to start a factual discussion? I hope
2074+
so.
2075+
2076+
I have here four tests that are conditionalized by cpp.
2077+
2078+
# The first one uses read() and write() but with the buffer size set
2079+
# to the same size as the file.
2080+
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -o test-mmap test-mmap.c
2082+
/usr/bin/time ./test-mmap > /dev/null
2083+
Beginning tests with file: services
2084+
2085+
Page size: 4096
2086+
File read size is the same as the file size
2087+
Number of iterations: 100000
2088+
Start time: 1047013002.412516
2089+
Time: 82.88178
2090+
2091+
Completed tests
2092+
82.09 real 2.13 user 68.98 sys
2093+
2094+
# The second one uses read() and write() with the default buffer size:
2095+
# 65536
2096+
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -o test-mmap test-mmap.c
2098+
/usr/bin/time ./test-mmap > /dev/null
2099+
Beginning tests with file: services
2100+
2101+
Page size: 4096
2102+
File read size is default read size: 65536
2103+
Number of iterations: 100000
2104+
Start time: 1047013085.16204
2105+
Time: 18.155511
2106+
2107+
Completed tests
2108+
18.16 real 0.90 user 14.79 sys
2109+
# Please note this is significantly faster, but that's expected
2110+
2111+
# The third test uses mmap() + madvise() + write()
2112+
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -DDO_MMAP=1 -o test-mmap test-mmap.c
2114+
/usr/bin/time ./test-mmap > /dev/null
2115+
Beginning tests with file: services
2116+
2117+
Page size: 4096
2118+
File read size is the same as the file size
2119+
Number of iterations: 100000
2120+
Start time: 1047013103.859818
2121+
Time: 8.4294203644
2122+
2123+
Completed tests
2124+
7.24 real 0.41 user 5.92 sys
2125+
# Faster still, and twice as fast as the normal read() case
2126+
2127+
# The last test calls mmap() only once, when the file is opened, and
2128+
# msync()'s, munmap()'s, and close()'s the file only once, at exit.
2129+
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -DDO_MMAP=1 -DDO_MMAP_ONCE=1 -o test-mmap test-mmap.c
2131+
/usr/bin/time ./test-mmap > /dev/null
2132+
Beginning tests with file: services
2133+
2134+
Page size: 4096
2135+
File read size is the same as the file size
2136+
Number of iterations: 100000
2137+
Start time: 1047013111.623712
2138+
Time: 1.174076
2139+
2140+
Completed tests
2141+
1.18 real 0.09 user 0.92 sys
2142+
# Substantially faster
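
The two approaches compared in the tests above can be sketched roughly as follows. This is a minimal illustration, not the actual test-mmap.c: the function names copy_with_read(), copy_with_mmap(), and write_all() are hypothetical, and the 65536-byte buffer simply mirrors the "default read size" reported in the output. The first function copies a file to a descriptor through a read()/write() loop; the second maps the whole file, hints sequential access with madvise(), and writes straight from the mapping.

```c
#define _DEFAULT_SOURCE          /* exposes madvise() on glibc */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying on short writes.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Copy path to out_fd through a fixed-size buffer with read()/write().
 * Returns the number of bytes copied, or -1 on error. */
long copy_with_read(const char *path, int out_fd)
{
    char buf[65536];
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        if (write_all(out_fd, buf, (size_t) n) < 0) {
            close(fd);
            return -1;
        }
        total += n;
    }
    close(fd);
    return n < 0 ? -1 : total;
}

/* Copy path to out_fd by mapping the whole file and writing from the
 * mapping. Returns the number of bytes copied, or -1 on error. */
long copy_with_mmap(const char *path, int out_fd)
{
    struct stat st;
    void *map;
    size_t len;
    long total;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {       /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }
    len = (size_t) st.st_size;
    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                   /* the mapping stays valid after close() */
    if (map == MAP_FAILED)
        return -1;
    madvise(map, len, MADV_SEQUENTIAL);   /* a hint only; result ignored */
    total = write_all(out_fd, map, len) < 0 ? -1 : (long) len;
    munmap(map, len);
    return total;
}
```

Calling copy_with_mmap() once per iteration corresponds loosely to the third test (map and unmap every time); keeping the mapping alive across iterations would correspond to the fourth.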
2143+
2144+
2145+
Obviously this isn't perfect, but reading and writing data is faster
2146+
(specifically moving pages through the VM/OS). Doing partial writes
2147+
from mmap()'ed data should be faster along with scanning through
2148+
mmap()'ed portions of - or completely mmap()'ed - files because the
2149+
pages are already loaded in the VM. PostgreSQL's LRU file descriptor
2150+
cache could easily be adjusted to add mmap()'ing of frequently
2151+
accessed files (specifically, system catalogs come to mind). It's not
2152+
hard to figure out how often particular files are accessed and to
2153+
either _avoid_ mmap()'ing a file that isn't accessed often, or to
2154+
mmap() files that _are_ accessed often. mmap() does have a cost, but
2155+
I'd wager that mmap()'ing the same file a second or third time from a
2156+
different process would be more efficient. The speedup of searching
2157+
through an mmap()'ed file may be worth it, however, to mmap() all
2158+
files if the system is under a tunable resource limit
2159+
(max_mmaped_bytes?).
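
One way to picture the resource-limit idea is a small table of mappings that evicts the least recently used one whenever a new mapping would exceed a byte budget. The sketch below is purely illustrative and is not PostgreSQL code: struct map_cache, map_cache_init(), map_cache_get(), MAX_SLOTS, and the max_mapped_bytes field (standing in for the suggested max_mmaped_bytes setting) are all hypothetical names. It also ignores pinning, so a caller must not keep a pointer across a later map_cache_get() call, and it does not notice files that change size after being mapped.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_SLOTS 8              /* hypothetical fixed table size */

struct map_slot {
    void *addr;                  /* NULL when the slot is free */
    size_t len;
    dev_t dev;                   /* (dev, ino) identifies the file */
    ino_t ino;
    unsigned long last_use;      /* logical clock value of last access */
};

struct map_cache {
    struct map_slot slots[MAX_SLOTS];
    size_t mapped_bytes;         /* total bytes currently mapped */
    size_t max_mapped_bytes;     /* byte budget (hypothetical limit) */
    unsigned long clock;
};

void map_cache_init(struct map_cache *c, size_t max_bytes)
{
    memset(c, 0, sizeof *c);
    c->max_mapped_bytes = max_bytes;
}

/* Unmap the least recently used slot. Returns 0 on success, -1 if empty. */
static int evict_lru(struct map_cache *c)
{
    struct map_slot *victim = NULL;

    for (int i = 0; i < MAX_SLOTS; i++) {
        struct map_slot *s = &c->slots[i];
        if (s->addr && (!victim || s->last_use < victim->last_use))
            victim = s;
    }
    if (!victim)
        return -1;
    munmap(victim->addr, victim->len);
    c->mapped_bytes -= victim->len;
    victim->addr = NULL;
    return 0;
}

/* Return a read-only mapping of path, or NULL if it cannot be mapped
 * within the budget (the caller would then fall back to read()). */
const void *map_cache_get(struct map_cache *c, const char *path, size_t *len_out)
{
    struct stat st;
    struct map_slot *slot;
    size_t len;
    void *addr;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size <= 0 ||
        (size_t) st.st_size > c->max_mapped_bytes) {
        close(fd);
        return NULL;
    }
    len = (size_t) st.st_size;

    /* Cache hit: the file is already mapped. */
    for (int i = 0; i < MAX_SLOTS; i++) {
        struct map_slot *s = &c->slots[i];
        if (s->addr && s->dev == st.st_dev && s->ino == st.st_ino) {
            close(fd);
            s->last_use = ++c->clock;
            *len_out = s->len;
            return s->addr;
        }
    }

    /* Evict until a slot is free and the new mapping fits the budget. */
    for (;;) {
        slot = NULL;
        for (int i = 0; i < MAX_SLOTS; i++)
            if (!c->slots[i].addr) { slot = &c->slots[i]; break; }
        if (slot && c->mapped_bytes + len <= c->max_mapped_bytes)
            break;
        if (evict_lru(c) < 0) {
            close(fd);
            return NULL;
        }
    }

    addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (addr == MAP_FAILED)
        return NULL;
    slot->addr = addr;
    slot->len = len;
    slot->dev = st.st_dev;
    slot->ino = st.st_ino;
    slot->last_use = ++c->clock;
    c->mapped_bytes += len;
    *len_out = len;
    return addr;
}
```

A real implementation would need reference counts or pinning so that a mapping still in use is never evicted, and would presumably hang off the existing file descriptor cache rather than a separate table.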
2160+
2161+
If someone is so inclined or there's enough interest, I can reverse
2162+
this test case so that data is written to an mmap()'ed file, but the
2163+
same performance difference should hold true (assuming this isn't a
2164+
write to a tape drive ::grin::).
2165+
2166+
The URL for the program used to generate the above tests is at:
2167+
2168+
http://people.freebsd.org/~seanc/mmap_test/
2169+
2170+
2171+
Please ask if you have questions. -sc
2172+
2173+
--=20
2174+
Sean Chittenden
2175+
2176+
--KsGdsel6WgEHnImy
2177+
Content-Type: application/pgp-signature
2178+
Content-Disposition: inline
2179+
2180+
-----BEGIN PGP SIGNATURE-----
2181+
Comment: Sean Chittenden <sean@chittenden.org>
2182+
2183+
iD8DBQE+aDZc3ZnjH7yEs0ERAid6AJ9/TAYMUx2+ZcD2680OlKJBj5FzrACgquIG
2184+
PBNCzM0OegBXrPROJ/uIKDM=
2185+
=y7O6
2186+
-----END PGP SIGNATURE-----
2187+
2188+
--KsGdsel6WgEHnImy--
2189+
2190+
From pgsql-performance-owner+M1358=pgman=candle.pha.pa.us@postgresql.org Fri Mar 7 16:47:38 2003
2191+
Return-path: <pgsql-performance-owner+M1358=pgman=candle.pha.pa.us@postgresql.org>
2192+
Received: from relay2.pgsql.com (relay2.pgsql.com [64.49.215.143])
2193+
by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id h27LlX429809
2194+
for <pgman@candle.pha.pa.us>; Fri, 7 Mar 2003 16:47:35 -0500 (EST)
2195+
Received: from postgresql.org (postgresql.org [64.49.215.8])
2196+
by relay2.pgsql.com (Postfix) with ESMTP id D40CBEDFE05
2197+
for <pgman@candle.pha.pa.us>; Fri, 7 Mar 2003 16:47:32 -0500 (EST)
2198+
X-Original-To: pgsql-performance@postgresql.org
2199+
Received: from perrin.int.nxad.com (internal.ext.nxad.com [69.1.70.251])
2200+
by postgresql.org (Postfix) with ESMTP id 913B5474E44
2201+
for <pgsql-performance@postgresql.org>; Fri, 7 Mar 2003 16:46:50 -0500 (EST)
2202+
Received: by perrin.int.nxad.com (Postfix, from userid 1001)
2203+
id A55392105B; Fri, 7 Mar 2003 13:46:30 -0800 (PST)
2204+
Date: Fri, 7 Mar 2003 13:46:30 -0800
2205+
From: Sean Chittenden <sean@chittenden.org>
2206+
To: Tom Lane <tgl@sss.pgh.pa.us>
2207+
cc: Neil Conway <neilc@samurai.com>,
2208+
Christopher Kings-Lynne <chriskl@familyhealth.com.au>,
2209+
PostgreSQL Performance <pgsql-performance@postgresql.org>
2210+
Subject: Re: [PERFORM] [COMMITTERS] pgsql-server/ /configure /configure.in rc/incl ...
2211+
Message-ID: <20030307214630.GI79234@perrin.int.nxad.com>
2212+
References: <032f01c2e390$b1842b20$6500a8c0@fhp.internal> <11077.1046921667@sss.pgh.pa.us> <033f01c2e392$71476570$6500a8c0@fhp.internal> <12228.1046922471@sss.pgh.pa.us> <20030306094117.GA79234@perrin.int.nxad.com> <15071.1046964336@sss.pgh.pa.us> <20030307003640.GF79234@perrin.int.nxad.com> <1046998072.10527.67.camel@tokyo> <20030307060412.GA19138@perrin.int.nxad.com> <29933.1047047386@sss.pgh.pa.us>
2213+
MIME-Version: 1.0
2214+
Content-Type: multipart/signed; micalg=pgp-sha1;
2215+
protocol="application/pgp-signature"; boundary="TALVG7vV++YnpwZG"
2216+
Content-Disposition: inline
2217+
In-Reply-To: <29933.1047047386@sss.pgh.pa.us>
2218+
User-Agent: Mutt/1.4i
2219+
X-PGP-Key: finger seanc@FreeBSD.org
2220+
X-PGP-Fingerprint: 3849 3760 1AFE 7B17 11A0 83A6 DD99 E31F BC84 B341
2221+
X-Web-Homepage: http://sean.chittenden.org/
2222+
Precedence: bulk
2223+
Sender: pgsql-performance-owner@postgresql.org
2224+
Status: OR
2225+
2226+
--TALVG7vV++YnpwZG
2227+
Content-Type: text/plain; charset=us-ascii
2228+
Content-Disposition: inline
2229+
Content-Transfer-Encoding: quoted-printable
2230+
2231+
> > Absolutely! Is this a simple benchmark? Yup. Do I think it
2232+
> > simulates PostgreSQL? Eh, not particularly.
2233+
2234+
I think quite a few of these Q's would have been answered by reading
2235+
the code/Makefile....
2236+
2237+
> This would be on what OS?
2238+
2239+
FreeBSD, but it shouldn't matter. Any reasonably written VM should
2240+
have similar numbers (though BSD is generally regarded as having the
2241+
best VM, which, I think Linux poached not that long ago, iirc
2242+
::grimace::).
2243+
2244+
> What hardware?
2245+
2246+
My ultra-pathetic laptop with some fine - overly-noisy and can hardly
2247+
buildworld - IDE drives.
2248+
2249+
> What size test file?
2250+
2251+
In this case, only 72K. I've just updated the test program to use an
2252+
array of files though.
2253+
2254+
> Do the "iterations" mean so many reads of the entire file, or so
2255+
> many buffer-sized read requests?
2256+
2257+
In some cases, yes. With the file mmap()'ed, sorta. One of the test
2258+
cases (the one that did it in ~8s), mmap()'ed and munmap()'ed the file
2259+
every iteration and was twice as fast as the vanilla read() call.
2260+
2261+
> Did the mmap case actually *read* anything, or just map and unmap
2262+
> the file?
2263+
2264+
Nope, read it and wrote it out to stdout (which was redirected to
2265+
/dev/null).
2266+
2267+
> Also, what did you do to normalize for the effects of the test file
2268+
> being already in kernel disk cache after the first test?
2269+
2270+
That honestly doesn't matter too much since I wasn't testing the rate
2271+
of reading in files from my hard drive, only the OS's ability to
2272+
read/write pages of data around. In any case, I've updated my test
2273+
case to iterate through an array of files instead of just reading in a
2274+
copy of /etc/services. My laptop is generally a poor benchmark for
2275+
disk read performance given it takes 8hrs to buildworld, over 12hrs to
2276+
build mozilla, 18 for KDE, and about 48hrs for Open Office. :)
2277+
Someone with faster disks may want to try this and report back, but it
2278+
doesn't matter much in terms of relevancy for considering the benefits
2279+
of mmap(). The point is that there are calls that can be used that
2280+
substantially speed up read()'s and write()'s by allowing the VM to
2281+
align pages of data and give hints about its usage. For the sake of
2282+
argument re: the previously done tests, I'll reverse the order in
2283+
which I ran them and I bet dime to dollar that the times will be
2284+
identical.
2285+
2286+
% make =
2287+
~/open_source/mmap_test
2288+
cp -f /etc/services ./services
2289+
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -DDO_MMAP=1 -DDO_MMAP_ONCE=1 -o mmap-test mmap-test.c
2291+
/usr/bin/time ./mmap-test > /dev/null
2292+
Beginning tests with file: services
2293+
2294+
Page size: 4096
2295+
File read size is the same as the file size
2296+
Number of iterations: 100000
2297+
Start time: 1047064672.276544
2298+
Time: 1.281477
2299+
2300+
Completed tests
2301+
1.29 real 0.10 user 0.92 sys
2302+
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -DDO_MMAP=1 -o mmap-test mmap-test.c
2304+
/usr/bin/time ./mmap-test > /dev/null
2305+
Beginning tests with file: services
2306+
2307+
Page size: 4096
2308+
File read size is the same as the file size
2309+
Number of iterations: 100000
2310+
Start time: 1047064674.266191
2311+
Time: 7.486622
2312+
2313+
Completed tests
2314+
7.49 real 0.41 user 6.01 sys
2315+
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -o mmap-test mmap-test.c
2317+
/usr/bin/time ./mmap-test > /dev/null
2318+
Beginning tests with file: services
2319+
2320+
Page size: 4096
2321+
File read size is default read size: 65536
2322+
Number of iterations: 100000
2323+
Start time: 1047064682.288637
2324+
Time: 19.35214
2325+
2326+
Completed tests
2327+
19.04 real 0.88 user 15.43 sys
2328+
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -o mmap-test mmap-test.c
2330+
/usr/bin/time ./mmap-test > /dev/null
2331+
Beginning tests with file: services
2332+
2333+
Page size: 4096
2334+
File read size is the same as the file size
2335+
Number of iterations: 100000
2336+
Start time: 1047064701.867031
2337+
Time: 82.4294540875
2338+
2339+
Completed tests
2340+
81.57 real 2.10 user 69.55 sys
2341+
2342+
2343+
Here's the updated test that iterates through. Ooh! One better, the
2344+
files I've used are actual data files from ~pgsql. The new benchmark
2345+
iterates through the list of files and calls bench() once for each
2346+
file and restarts at the first file after reaching the end of its
2347+
list (ARGV).
2348+
2349+
Whoa, if these tests are even close to real world, then we at the very
2350+
least should be mmap()'ing the file every time we read it (assuming
2351+
we're reading more than just a handful of bytes):
2352+
2353+
find /usr/local/pgsql/data -type f | /usr/bin/xargs /usr/bin/time ./mmap-test > /dev/null
2355+
Page size: 4096
2356+
File read size is the same as the file size
2357+
Number of iterations: 100000
2358+
Start time: 1047071143.463360
2359+
Time: 12.109530
2360+
2361+
Completed tests
2362+
12.11 real 0.36 user 6.80 sys
2363+
2364+
find /usr/local/pgsql/data -type f | /usr/bin/xargs /usr/bin/time ./mmap-test > /dev/null
2366+
Page size: 4096
2367+
File read size is default read size: 65536
2368+
Number of iterations: 100000
2369+
.... [been waiting here for >40min now....]
2370+
2371+
2372+
Ah well, if these tests finish this century, I'll post the results in
2373+
a bit, but it's pretty clearly a win. In terms of the data that I'm
2374+
copying, I'm copying ~700MB of data from my test DB on my laptop. I
2375+
only have 256MB of RAM so I can pretty much promise you that the data
2376+
isn't in my system buffers. If anyone else would like to run the
2377+
tests or look at the results, please check it out:
2378+
2379+
o1 and o2 should be the only targets used if FILES is bigger than the
2380+
RAM on the system. o3's by far and away the fastest, but only in rare
2381+
cases will a DBA have more RAM than data. But, as mentioned earlier,
2382+
the LRU cache could easily be modified to munmap() infrequently
2383+
accessed files to keep the size of mmap()'ed data down to a reasonable
2384+
level.
2385+
2386+
The updated test programs are at:
2387+
2388+
http://people.FreeBSD.org/~seanc/mmap_test/
2389+
2390+
-sc
2391+
2392+
--=20
2393+
Sean Chittenden
2394+
2395+
--TALVG7vV++YnpwZG
2396+
Content-Type: application/pgp-signature
2397+
Content-Disposition: inline
2398+
2399+
-----BEGIN PGP SIGNATURE-----
2400+
Comment: Sean Chittenden <sean@chittenden.org>
2401+
2402+
iD8DBQE+aRM23ZnjH7yEs0ERAoqhAKCFgmhpvNMqe9tucoFvK1H6J50z2QCeIZEI
2403+
mgBHwu/H1pe1sXIX9UG2V+I=
2404+
=cFRQ
2405+
-----END PGP SIGNATURE-----
2406+
2407+
--TALVG7vV++YnpwZG--
2408+
