@@ -2014,3 +2014,395 @@ KwvG7YLsJ+xpsTUS67KD+4M=
--HjNkcEWJ4DMx36DP--
+ From pgsql-performance-owner+M1354=pgman=candle.pha.pa.us@postgresql.org Fri Mar 7 01:09:07 2003
+ Return-path: <pgsql-performance-owner+M1354=pgman=candle.pha.pa.us@postgresql.org>
+ Received: from relay2.pgsql.com (relay2.pgsql.com [64.49.215.143])
+     by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id h27693604295
+     for <pgman@candle.pha.pa.us>; Fri, 7 Mar 2003 01:09:05 -0500 (EST)
+ Received: from postgresql.org (postgresql.org [64.49.215.8])
+     by relay2.pgsql.com (Postfix) with ESMTP id 95CD2EDFD3B
+     for <pgman@candle.pha.pa.us>; Fri, 7 Mar 2003 01:09:03 -0500 (EST)
+ X-Original-To: pgsql-performance@postgresql.org
+ Received: from perrin.int.nxad.com (internal.ext.nxad.com [69.1.70.251])
+     by postgresql.org (Postfix) with ESMTP id F16034768E2
+     for <pgsql-performance@postgresql.org>; Fri, 7 Mar 2003 01:04:33 -0500 (EST)
+ Received: by perrin.int.nxad.com (Postfix, from userid 1001)
+     id 7969A21065; Thu, 6 Mar 2003 22:04:12 -0800 (PST)
+ Date: Thu, 6 Mar 2003 22:04:12 -0800
+ From: Sean Chittenden <sean@chittenden.org>
+ To: Neil Conway <neilc@samurai.com>
+ cc: Tom Lane <tgl@sss.pgh.pa.us>,
+     Christopher Kings-Lynne <chriskl@familyhealth.com.au>,
+     PostgreSQL Performance <pgsql-performance@postgresql.org>
+ Subject: Re: [PERFORM] [COMMITTERS] pgsql-server/ /configure /configure.in rc/incl ...
+ Message-ID: <20030307060412.GA19138@perrin.int.nxad.com>
+ References: <20030306031656.1876F4762E0@postgresql.org> <032f01c2e390$b1842b20$6500a8c0@fhp.internal> <11077.1046921667@sss.pgh.pa.us> <033f01c2e392$71476570$6500a8c0@fhp.internal> <12228.1046922471@sss.pgh.pa.us> <20030306094117.GA79234@perrin.int.nxad.com> <15071.1046964336@sss.pgh.pa.us> <20030307003640.GF79234@perrin.int.nxad.com> <1046998072.10527.67.camel@tokyo>
+ MIME-Version: 1.0
+ Content-Type: multipart/signed; micalg=pgp-sha1;
+     protocol="application/pgp-signature"; boundary="KsGdsel6WgEHnImy"
+ Content-Disposition: inline
+ In-Reply-To: <1046998072.10527.67.camel@tokyo>
+ User-Agent: Mutt/1.4i
+ X-PGP-Key: finger seanc@FreeBSD.org
+ X-PGP-Fingerprint: 3849 3760 1AFE 7B17 11A0 83A6 DD99 E31F BC84 B341
+ X-Web-Homepage: http://sean.chittenden.org/
+ Precedence: bulk
+ Sender: pgsql-performance-owner@postgresql.org
+ Status: OR
+
+ --KsGdsel6WgEHnImy
+ Content-Type: text/plain; charset=us-ascii
+ Content-Disposition: inline
+ Content-Transfer-Encoding: quoted-printable
+
+ > > I don't have my copy of Stevens handy (it's some 700mi away atm
+ > > otherwise I'd cite it), but if Tom or someone else has it handy, look
+ > > up the example re: the performance gain from read()'ing an mmap()'ed
+ > > file versus a non-mmap()'ed file. The difference is non-trivial and
+ > > _WELL_ worth the time given the speed increase.
+ >
+ > Can anyone confirm this? If so, one easy step we could take in this
+ > direction would be adapting COPY FROM to use mmap().
+
+ Weeee! Alright, so I got to have some fun writing out some simple
+ tests with mmap() and friends tonight. Are the results interesting?
+ Absolutely! Is this a simple benchmark? Yup. Do I think it
+ simulates PostgreSQL? Eh, not particularly. Does it demonstrate that
+ mmap() is a win and something worth implementing? I sure hope so. Is
+ this a test program to demonstrate the ideal use of mmap() in
+ PostgreSQL? No. Is it a place to start a factual discussion? I hope
+ so.
+
+ I have here four tests that are conditionalized by cpp.
+
+ # The first one uses read() and write() but with the buffer size set
+ # to the same size as the file.
+ gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -o test-mmap test-mmap.c
+ /usr/bin/time ./test-mmap > /dev/null
+ Beginning tests with file: services
+
+ Page size: 4096
+ File read size is the same as the file size
+ Number of iterations: 100000
+ Start time: 1047013002.412516
+ Time: 82.88178
+
+ Completed tests
+ 82.09 real 2.13 user 68.98 sys
+
+ # The second one uses read() and write() with the default buffer size:
+ # 65536
+ gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -o test-mmap test-mmap.c
+ /usr/bin/time ./test-mmap > /dev/null
+ Beginning tests with file: services
+
+ Page size: 4096
+ File read size is default read size: 65536
+ Number of iterations: 100000
+ Start time: 1047013085.16204
+ Time: 18.155511
+
+ Completed tests
+ 18.16 real 0.90 user 14.79 sys
+ # Please note this is significantly faster, but that's expected
+
+ # The third test uses mmap() + madvise() + write()
+ gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -DDO_MMAP=1 -o test-mmap test-mmap.c
+ /usr/bin/time ./test-mmap > /dev/null
+ Beginning tests with file: services
+
+ Page size: 4096
+ File read size is the same as the file size
+ Number of iterations: 100000
+ Start time: 1047013103.859818
+ Time: 8.4294203644
+
+ Completed tests
+ 7.24 real 0.41 user 5.92 sys
+ # Faster still, and twice as fast as the normal read() case
+
+ # The last test calls mmap() only once, when the file is opened, and
+ # calls msync(), munmap(), and close() only once, at exit.
+ gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -DDO_MMAP=1 -DDO_MMAP_ONCE=1 -o test-mmap test-mmap.c
+ /usr/bin/time ./test-mmap > /dev/null
+ Beginning tests with file: services
+
+ Page size: 4096
+ File read size is the same as the file size
+ Number of iterations: 100000
+ Start time: 1047013111.623712
+ Time: 1.174076
+
+ Completed tests
+ 1.18 real 0.09 user 0.92 sys
+ # Substantially faster
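
The two I/O paths compared by the four tests can be sketched as follows. This is a minimal illustration, not the code from test-mmap.c: the names copy_with_read(), copy_with_mmap(), and write_all() are assumptions introduced here, and the msync() step mentioned for the last test is left out because the mapping is read-only.

```c
#define _DEFAULT_SOURCE /* glibc: expose madvise(); harmless elsewhere */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes to fd, retrying after short writes. 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t) n;
    }
    return 0;
}

/* read()/write() path: copy the file through a heap buffer of bufsize bytes.
 * Returns the number of bytes copied, or -1 on error. */
long copy_with_read(const char *path, int out_fd, size_t bufsize)
{
    long total = 0;
    ssize_t n;
    char *buf;
    int fd;

    assert(bufsize > 0);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    buf = malloc(bufsize);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    while ((n = read(fd, buf, bufsize)) > 0) {
        if (write_all(out_fd, buf, (size_t) n) < 0) {
            n = -1;
            break;
        }
        total += n;
    }
    free(buf);
    close(fd);
    return n < 0 ? -1 : total;
}

/* mmap() path: map the whole file read-only, hint sequential access, and
 * write() straight out of the mapping. Returns bytes copied, or -1 on error. */
long copy_with_mmap(const char *path, int out_fd)
{
    struct stat st;
    size_t len;
    void *map;
    long result;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    len = (size_t) st.st_size;
    if (len == 0) {             /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }
    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    if (map == MAP_FAILED)
        return -1;
    (void) madvise(map, len, MADV_SEQUENTIAL);  /* advisory only */
    result = write_all(out_fd, map, len) == 0 ? (long) len : -1;
    munmap(map, len);
    return result;
}
```

Calling copy_with_read() with the file size as bufsize corresponds roughly to the first test, a bufsize of 65536 to the second, and copy_with_mmap() to the third.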
2143
+
2144
+
2145
+ Obviously this isn't perfect, but reading and writing data is faster
2146
+ (specifically moving pages through the VM/OS). Doing partial writes
2147
+ from mmap()'ed data should be faster along with scanning through
2148
+ mmap()'ed portions of - or completely mmap()'ed - files because the
2149
+ pages are already loaded in the VM. PostgreSQL's LRU file descriptor
2150
+ cache could easily be adjusted to add mmap()'ing of frequently
2151
+ accessed files (specifically, system catalogs come to mind). It's not
2152
+ hard to figure out how often particular files are accessed and to
2153
+ either _avoid_ mmap()'ing a file that isn't accessed often, or to
2154
+ mmap() files that _are_ accessed often. mmap() does have a cost, but
2155
+ I'd wager that mmap()'ing the same file a second or third time from a
2156
+ different process would be more efficient. The speedup of searching
2157
+ through an mmap()'ed file may be worth it, however, to mmap() all
2158
+ files if the system is under a tunable resource limit
2159
+ (max_mmaped_bytes?).
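
One way to picture the tunable limit floated above is a small byte budget consulted before every mapping. The sketch below is only an assumption about how such accounting might look; struct mmap_budget, budget_map_file(), and budget_unmap() are hypothetical names, and nothing here reflects PostgreSQL's actual file descriptor cache.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical accounting for a max_mmaped_bytes-style limit. */
struct mmap_budget {
    size_t limit;   /* maximum bytes allowed to be mapped at once */
    size_t used;    /* bytes currently mapped through this budget */
};

/* Map the whole file read-only if that keeps us within the budget.
 * Returns the mapping and stores its length in *len_out, or returns NULL
 * when the caller should fall back to plain read(). */
void *budget_map_file(struct mmap_budget *b, const char *path, size_t *len_out)
{
    struct stat st;
    size_t len;
    void *map;
    int fd;

    assert(b != NULL && len_out != NULL && b->used <= b->limit);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    len = (size_t) st.st_size;
    if (len > b->limit - b->used) {     /* would exceed the budget */
        close(fd);
        return NULL;
    }
    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping outlives the descriptor */
    if (map == MAP_FAILED)
        return NULL;
    b->used += len;
    *len_out = len;
    return map;
}

/* Unmap a region from budget_map_file() and return its bytes to the budget. */
void budget_unmap(struct mmap_budget *b, void *map, size_t len)
{
    assert(b != NULL && len <= b->used);
    if (munmap(map, len) == 0)
        b->used -= len;
}
```

A caller that gets NULL back would simply use read() for that file, so rarely accessed files never count against the limit.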
2160
+
2161
+ If someone is so inclined or there's enough interest, I can reverse
2162
+ this test case so that data is written to an mmap()'ed file, but the
2163
+ same performance difference should hold true (assuming this isn't a
2164
+ write to a tape drive ::grin::).
2165
+
2166
+ The URL for the program used to generate the above tests is at:
2167
+
2168
+ http://people.freebsd.org/~seanc/mmap_test/
2169
+
2170
+
2171
+ Please ask if you have questions. -sc
2172
+
2173
+ --=20
2174
+ Sean Chittenden
2175
+
2176
+ --KsGdsel6WgEHnImy
2177
+ Content-Type: application/pgp-signature
2178
+ Content-Disposition: inline
2179
+
2180
+ -----BEGIN PGP SIGNATURE-----
2181
+ Comment: Sean Chittenden <sean@chittenden.org>
2182
+
2183
+ iD8DBQE+aDZc3ZnjH7yEs0ERAid6AJ9/TAYMUx2+ZcD2680OlKJBj5FzrACgquIG
2184
+ PBNCzM0OegBXrPROJ/uIKDM=
2185
+ =y7O6
2186
+ -----END PGP SIGNATURE-----
2187
+
2188
+ --KsGdsel6WgEHnImy--
2189
+
+ From pgsql-performance-owner+M1358=pgman=candle.pha.pa.us@postgresql.org Fri Mar 7 16:47:38 2003
+ Return-path: <pgsql-performance-owner+M1358=pgman=candle.pha.pa.us@postgresql.org>
+ Received: from relay2.pgsql.com (relay2.pgsql.com [64.49.215.143])
+     by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id h27LlX429809
+     for <pgman@candle.pha.pa.us>; Fri, 7 Mar 2003 16:47:35 -0500 (EST)
+ Received: from postgresql.org (postgresql.org [64.49.215.8])
+     by relay2.pgsql.com (Postfix) with ESMTP id D40CBEDFE05
+     for <pgman@candle.pha.pa.us>; Fri, 7 Mar 2003 16:47:32 -0500 (EST)
+ X-Original-To: pgsql-performance@postgresql.org
+ Received: from perrin.int.nxad.com (internal.ext.nxad.com [69.1.70.251])
+     by postgresql.org (Postfix) with ESMTP id 913B5474E44
+     for <pgsql-performance@postgresql.org>; Fri, 7 Mar 2003 16:46:50 -0500 (EST)
+ Received: by perrin.int.nxad.com (Postfix, from userid 1001)
+     id A55392105B; Fri, 7 Mar 2003 13:46:30 -0800 (PST)
+ Date: Fri, 7 Mar 2003 13:46:30 -0800
+ From: Sean Chittenden <sean@chittenden.org>
+ To: Tom Lane <tgl@sss.pgh.pa.us>
+ cc: Neil Conway <neilc@samurai.com>,
+     Christopher Kings-Lynne <chriskl@familyhealth.com.au>,
+     PostgreSQL Performance <pgsql-performance@postgresql.org>
+ Subject: Re: [PERFORM] [COMMITTERS] pgsql-server/ /configure /configure.in rc/incl ...
+ Message-ID: <20030307214630.GI79234@perrin.int.nxad.com>
+ References: <032f01c2e390$b1842b20$6500a8c0@fhp.internal> <11077.1046921667@sss.pgh.pa.us> <033f01c2e392$71476570$6500a8c0@fhp.internal> <12228.1046922471@sss.pgh.pa.us> <20030306094117.GA79234@perrin.int.nxad.com> <15071.1046964336@sss.pgh.pa.us> <20030307003640.GF79234@perrin.int.nxad.com> <1046998072.10527.67.camel@tokyo> <20030307060412.GA19138@perrin.int.nxad.com> <29933.1047047386@sss.pgh.pa.us>
+ MIME-Version: 1.0
+ Content-Type: multipart/signed; micalg=pgp-sha1;
+     protocol="application/pgp-signature"; boundary="TALVG7vV++YnpwZG"
+ Content-Disposition: inline
+ In-Reply-To: <29933.1047047386@sss.pgh.pa.us>
+ User-Agent: Mutt/1.4i
+ X-PGP-Key: finger seanc@FreeBSD.org
+ X-PGP-Fingerprint: 3849 3760 1AFE 7B17 11A0 83A6 DD99 E31F BC84 B341
+ X-Web-Homepage: http://sean.chittenden.org/
+ Precedence: bulk
+ Sender: pgsql-performance-owner@postgresql.org
+ Status: OR
+
+ --TALVG7vV++YnpwZG
+ Content-Type: text/plain; charset=us-ascii
+ Content-Disposition: inline
+ Content-Transfer-Encoding: quoted-printable
+
+ > > Absolutely! Is this a simple benchmark? Yup. Do I think it
+ > > simulates PostgreSQL? Eh, not particularly.
+
+ I think quite a few of these Q's would have been answered by reading
+ the code/Makefile....
+
+ > This would be on what OS?
+
+ FreeBSD, but it shouldn't matter. Any reasonably written VM should
+ have similar numbers (though BSD is generally regarded as having the
+ best VM, which I think Linux poached not that long ago, iirc
+ ::grimace::).
+
+ > What hardware?
+
+ My ultra-pathetic laptop with some fine - overly-noisy and can hardly
+ buildworld - IDE drives.
+
+ > What size test file?
+
+ In this case, only 72K. I've just updated the test program to use an
+ array of files though.
+
+ > Do the "iterations" mean so many reads of the entire file, or so
+ > many buffer-sized read requests?
+
+ In some cases, yes. With the file mmap()'ed, sorta. One of the test
+ cases (the one that did it in ~8s) mmap()'ed and munmap()'ed the file
+ every iteration and was twice as fast as the vanilla read() call.
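
The difference between the ~8s case (map and unmap on every iteration) and the ~1s case (map once, unmap at exit) can be sketched like this. The helper names map_file(), bench_map_each(), and bench_map_once() are assumptions made for illustration and are not taken from the test program; madvise() and msync() are omitted to keep the sketch short.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Map the whole file read-only. Returns MAP_FAILED on any error. */
static void *map_file(const char *path, size_t *len)
{
    struct stat st;
    void *map;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return MAP_FAILED;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return MAP_FAILED;
    }
    *len = (size_t) st.st_size;
    map = mmap(NULL, *len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    return map;
}

/* Write len bytes to fd, retrying after short writes. 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Map, write out, and unmap the file on every iteration.
 * Returns the total bytes written, or -1 on error. */
long bench_map_each(const char *path, int out_fd, int iterations)
{
    long total = 0;
    int i;

    assert(iterations >= 0);
    for (i = 0; i < iterations; i++) {
        size_t len;
        void *map = map_file(path, &len);
        int rc;

        if (map == MAP_FAILED)
            return -1;
        rc = write_all(out_fd, map, len);
        munmap(map, len);
        if (rc < 0)
            return -1;
        total += (long) len;
    }
    return total;
}

/* Map once, write the same mapping out on every iteration, unmap at the end.
 * Returns the total bytes written, or -1 on error. */
long bench_map_once(const char *path, int out_fd, int iterations)
{
    long total = 0;
    size_t len;
    void *map;
    int i;

    assert(iterations >= 0);
    map = map_file(path, &len);
    if (map == MAP_FAILED)
        return -1;
    for (i = 0; i < iterations; i++) {
        if (write_all(out_fd, map, len) < 0) {
            total = -1;
            break;
        }
        total += (long) len;
    }
    munmap(map, len);
    return total;
}
```

Both functions write the same number of bytes; the only difference is how often the open()/fstat()/mmap()/munmap() setup cost is paid.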
+
+ > Did the mmap case actually *read* anything, or just map and unmap
+ > the file?
+
+ Nope, it read the file and wrote it out to stdout (which was
+ redirected to /dev/null).
+
+ > Also, what did you do to normalize for the effects of the test file
+ > being already in kernel disk cache after the first test?
+
+ That honestly doesn't matter too much since I wasn't testing the rate
+ of reading in files from my hard drive, only the OS's ability to
+ read/write pages of data around. In any case, I've updated my test
+ case to iterate through an array of files instead of just reading in a
+ copy of /etc/services. My laptop is generally a poor benchmark for
+ disk read performance given it takes 8hrs to buildworld, over 12hrs to
+ build mozilla, 18 for KDE, and about 48hrs for Open Office. :)
+ Someone with faster disks may want to try this and report back, but it
+ doesn't matter much in terms of relevancy for considering the benefits
+ of mmap(). The point is that there are calls that can be used that
+ substantially speed up read()'s and write()'s by allowing the VM to
+ align pages of data and give hints about its usage. For the sake of
+ argument re: the previously done tests, I'll reverse the order in
+ which I ran them and I bet dime to dollar that the times will be
+ identical.
+
+ % make                                        ~/open_source/mmap_test
+ cp -f /etc/services ./services
+ gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -DDO_MMAP=1 -DDO_MMAP_ONCE=1 -o mmap-test mmap-test.c
+ /usr/bin/time ./mmap-test > /dev/null
+ Beginning tests with file: services
+
+ Page size: 4096
+ File read size is the same as the file size
+ Number of iterations: 100000
+ Start time: 1047064672.276544
+ Time: 1.281477
+
+ Completed tests
+ 1.29 real 0.10 user 0.92 sys
+ gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -DDO_MMAP=1 -o mmap-test mmap-test.c
+ /usr/bin/time ./mmap-test > /dev/null
+ Beginning tests with file: services
+
+ Page size: 4096
+ File read size is the same as the file size
+ Number of iterations: 100000
+ Start time: 1047064674.266191
+ Time: 7.486622
+
+ Completed tests
+ 7.49 real 0.41 user 6.01 sys
+ gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -o mmap-test mmap-test.c
+ /usr/bin/time ./mmap-test > /dev/null
+ Beginning tests with file: services
+
+ Page size: 4096
+ File read size is default read size: 65536
+ Number of iterations: 100000
+ Start time: 1047064682.288637
+ Time: 19.35214
+
+ Completed tests
+ 19.04 real 0.88 user 15.43 sys
+ gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -o mmap-test mmap-test.c
+ /usr/bin/time ./mmap-test > /dev/null
+ Beginning tests with file: services
+
+ Page size: 4096
+ File read size is the same as the file size
+ Number of iterations: 100000
+ Start time: 1047064701.867031
+ Time: 82.4294540875
+
+ Completed tests
+ 81.57 real 2.10 user 69.55 sys
+
+
+ Here's the updated test that iterates through a list of files. Ooh!
+ One better, the files I've used are actual data files from ~pgsql.
+ The new benchmark iterates through the list of files, calls bench()
+ once for each file, and restarts at the first file after reaching the
+ end of its list (ARGV).
+
+ Whoa, if these tests are even close to real world, then we at the very
+ least should be mmap()'ing the file every time we read it (assuming
+ we're reading more than just a handful of bytes):
+
+ find /usr/local/pgsql/data -type f | /usr/bin/xargs /usr/bin/time ./mmap-test > /dev/null
+ Page size: 4096
+ File read size is the same as the file size
+ Number of iterations: 100000
+ Start time: 1047071143.463360
+ Time: 12.109530
+
+ Completed tests
+ 12.11 real 0.36 user 6.80 sys
+
+ find /usr/local/pgsql/data -type f | /usr/bin/xargs /usr/bin/time ./mmap-test > /dev/null
+ Page size: 4096
+ File read size is default read size: 65536
+ Number of iterations: 100000
+ .... [been waiting here for >40min now....]
+
+
+ Ah well, if these tests finish this century, I'll post the results in
+ a bit, but it's pretty clearly a win. As for the data, I'm copying
+ ~700MB from my test DB on my laptop. I only have 256MB of RAM, so I
+ can pretty much promise you that the data isn't in my system buffers.
+ If anyone else would like to run the tests or look at the results,
+ please check it out:
+
+ o1 and o2 should be the only targets used if FILES is bigger than the
+ RAM on the system. o3's by far and away the fastest, but only in rare
+ cases will a DBA have more RAM than data. But, as mentioned earlier,
+ the LRU cache could easily be modified to munmap() infrequently
+ accessed files to keep the size of mmap()'ed data down to a reasonable
+ level.
+
+ The updated test programs are at:
+
+ http://people.FreeBSD.org/~seanc/mmap_test/
+
+ -sc
+
+ -- 
+ Sean Chittenden
+
+ --TALVG7vV++YnpwZG
+ Content-Type: application/pgp-signature
+ Content-Disposition: inline
+
+ -----BEGIN PGP SIGNATURE-----
+ Comment: Sean Chittenden <sean@chittenden.org>
+
+ iD8DBQE+aRM23ZnjH7yEs0ERAoqhAKCFgmhpvNMqe9tucoFvK1H6J50z2QCeIZEI
+ mgBHwu/H1pe1sXIX9UG2V+I=
+ =cFRQ
+ -----END PGP SIGNATURE-----
+
+ --TALVG7vV++YnpwZG--
+