
Commit 7c360d6

Jan Wieck authored and committed

Added documentation for the new interface between the buffer manager
and the cache replacement strategy as well as a description of the ARC
algorithm and the special tailoring of that done for PostgreSQL. Jan

1 parent 320a138 commit 7c360d6

File tree

  • src/backend/storage/buffer/README

1 file changed: +153 −1 lines

src/backend/storage/buffer/README

Lines changed: 153 additions & 1 deletion
@@ -1,4 +1,4 @@
-$Header: /cvsroot/pgsql/src/backend/storage/buffer/README,v 1.4 2003/10/31 22:48:08 tgl Exp $
+$Header: /cvsroot/pgsql/src/backend/storage/buffer/README,v 1.5 2003/11/14 04:32:11 wieck Exp $

 Notes about shared buffer access rules
 --------------------------------------
@@ -95,3 +95,155 @@ concurrent VACUUM. The current implementation only supports a single
waiter for pin-count-1 on any particular shared buffer. This is enough
for VACUUM's use, since we don't allow multiple VACUUMs concurrently on a
single relation anyway.
Buffer replacement strategy interface:

The two files freelist.c and buf_table.c contain the buffer cache
replacement strategy. The interface to the strategy is:

BufferDesc *
StrategyBufferLookup(BufferTag *tagPtr, bool recheck)

    This is always the first call made by the buffer manager
    to check if a disk page is in memory. If so, the function
    returns the buffer descriptor and no further action is
    required.

    If the page is not in memory, StrategyBufferLookup()
    returns NULL.

    The flag recheck tells the strategy that this is a second
    lookup after flushing a dirty block. If the buffer manager
    has to evict another buffer, it will release the bufmgr lock
    while doing the write IO. During this time, another backend
    could possibly fault in the same page this backend is after,
    so we have to check again after the IO is done whether the
    page is in memory now.

BufferDesc *
StrategyGetBuffer(void)

    The buffer manager calls this function to get an unpinned
    cache buffer whose content can be evicted. The returned
    buffer might be empty, clean or dirty.

    The returned buffer is only a candidate for replacement.
    It is possible that while the buffer is being written, another
    backend finds and modifies it, so that it is dirty again.
    The buffer manager will then call StrategyGetBuffer()
    again to ask for another candidate.

void
StrategyReplaceBuffer(BufferDesc *buf, Relation rnode,
                      BlockNumber blockNum)

    Called by the buffer manager at the time it is about to
    change the association of a buffer with a disk page.

    Before this call, StrategyBufferLookup() still has to find
    the buffer even if it was returned by StrategyGetBuffer()
    as a candidate for replacement.

    After this call, this buffer must be returned for a
    lookup of the new page identified by rnode and blockNum.
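
Put together, these calls compose into the buffer manager's read path.
The following is a minimal, self-contained sketch of that sequence; all
types and the trivial strategy implementations here are made-up
stand-ins for illustration, not the real bufmgr.c/freelist.c code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the real PostgreSQL types. */
typedef struct { int rnode; int blockNum; } BufferTag;
typedef struct { BufferTag tag; bool valid; bool dirty; } BufferDesc;

#define NBUFFERS 4
static BufferDesc buffers[NBUFFERS];

static BufferDesc *
StrategyBufferLookup(BufferTag *tagPtr, bool recheck)
{
    (void) recheck;                     /* hint only, unused in this toy */
    for (int i = 0; i < NBUFFERS; i++)
        if (buffers[i].valid &&
            buffers[i].tag.rnode == tagPtr->rnode &&
            buffers[i].tag.blockNum == tagPtr->blockNum)
            return &buffers[i];
    return NULL;
}

static BufferDesc *
StrategyGetBuffer(void)
{
    return &buffers[0];                 /* toy policy: always slot 0 */
}

static void
StrategyReplaceBuffer(BufferDesc *buf, int rnode, int blockNum)
{
    buf->tag.rnode = rnode;
    buf->tag.blockNum = blockNum;
    buf->valid = true;
}

/* Read path as described above: lookup, pick a victim, flush if dirty,
 * recheck after the IO, then rebind the buffer to the new page. */
static BufferDesc *
ReadBufferSketch(int rnode, int blockNum)
{
    BufferTag tag = { rnode, blockNum };
    BufferDesc *buf = StrategyBufferLookup(&tag, false);

    if (buf != NULL)
        return buf;                     /* page already cached */

    buf = StrategyGetBuffer();          /* eviction candidate */
    if (buf->dirty)
    {
        /* Flush; the bufmgr lock would be released during the write IO,
         * so another backend may have faulted the page in meanwhile. */
        buf->dirty = false;
        BufferDesc *other = StrategyBufferLookup(&tag, true);
        if (other != NULL)
            return other;
    }
    StrategyReplaceBuffer(buf, rnode, blockNum);
    /* ...read the page from disk into buf here... */
    return buf;
}
```

The retry described under StrategyGetBuffer() (a candidate turning
dirty again during the write) is omitted here for brevity.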
void
StrategyInvalidateBuffer(BufferDesc *buf)

    Called from various places to inform the strategy that the
    content of this buffer has been thrown away. This happens
    for example in the case of dropping a relation.

    The buffer must be clean and unpinned on call.

    If the buffer was associated with a disk page, StrategyBufferLookup()
    must not return it for this page after the call.

void
StrategyHintVacuum(bool vacuum_active)

    Because VACUUM reads all relations of the entire database
    through the buffer manager, it can greatly disturb the
    buffer replacement strategy. This function is used by VACUUM
    to inform the strategy that all subsequent buffer lookups are
    caused by VACUUM scanning relations.
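
As a sketch, the hint can be modeled as a flag the strategy consults
when deciding how to treat incoming pages (the flag name here is
illustrative; the actual mechanism is internal to freelist.c):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in: a flag the strategy checks when deciding
 * where a newly read page enters the cache during VACUUM. */
static bool strategy_vacuum_active = false;

static void
StrategyHintVacuum(bool vacuum_active)
{
    strategy_vacuum_active = vacuum_active;
}
```

VACUUM would bracket its scans with StrategyHintVacuum(true) and
StrategyHintVacuum(false); the effect of the flag is described under
the VACUUM specialty at the end of this section.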
171+
172+
173+
Buffer replacement strategy:
174+
175+
The buffer replacement strategy actually used in freelist.c is a
176+
version of the Adaptive Replacement Cache (ARC) special tailored for
177+
PostgreSQL.
178+
179+
The algorithm works as follows:
180+
181+
C is the size of the cache in number of pages (conf: shared_buffers)
182+
ARC uses 2*C Cache Directory Blocks (CDB). A cache directory block
183+
is allwayt associated with one unique file page and "can" point to
184+
one shared buffer.
185+
186+
All file pages known in by the directory are managed in 4 LRU lists
187+
named B1, T1, T2 and B2. The T1 and T2 lists are the "real" cache
188+
entries, linking a file page to a memory buffer where the page is
189+
currently cached. Consequently T1len+T2len <= C. B1 and B2 are
190+
ghost cache directories that extend T1 and T2 so that the strategy
191+
remembers pages longer. The strategy tries to keep B1len+T1len and
192+
B2len+T2len both at C. T1len and T2 len vary over the runtime
193+
depending on the lookup pattern and its resulting cache hits. The
194+
desired size of T1len is called T1target.
195+
196+
Assuming we have a full cache, one of 5 cases happens on a lookup:
197+
198+
MISS On a cache miss, depending on T1target and the actual T1len
199+
the LRU buffer of T1 or T2 is evicted. Its CDB is removed
200+
from the T list and added as MRU of the corresponding B list.
201+
The now free buffer is replaced with the requested page
202+
and added as MRU of T1.
203+
204+
T1 hit The T1 CDB is moved to the MRU position of the T2 list.
205+
206+
T2 hit The T2 CDB is moved to the MRU position of the T2 list.
207+
208+
B1 hit This means that a buffer that was evicted from the T1
209+
list is now requested again, indicating that T1target is
210+
too small (otherwise it would still be in T1 and thus in
211+
memory). The strategy raises T1target, evicts a buffer
212+
depending on T1target and T1len and places the CDB at
213+
MRU of T2.
214+
215+
B2 hit This means the opposite of B1, the T2 list is probably too
216+
small. So the strategy lowers T1target, evicts a buffer
217+
and places the CDB at MRU of T2.
218+
219+
Thus, every page that is found on lookup in any of the four lists
220+
ends up as the MRU of the T2 list. The T2 list therefore is the
221+
"frequency" cache, holding frequently requested pages.
222+
223+
Every page that is seen for the first time ends up as the MRU of
224+
the T1 list. The T1 list is the "recency" cache, holding recent
225+
newcomers.
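
The five cases can be sketched in a tiny, self-contained model. Names,
the fixed cache size C, and the list representation below are all
illustrative; ghost-list trimming and the actual buffer contents are
omitted for brevity, so this only demonstrates the list movements and
the T1target adaptation, not the freelist.c implementation:

```c
#include <assert.h>
#include <string.h>

#define C 3
#define MAXLEN (4 * C)                  /* generous bound for this demo */

typedef struct { int page[MAXLEN]; int len; } PageList;

/* Index 0 is the LRU end, index len-1 the MRU end. */
static PageList T1, T2, B1, B2;
static int T1target = 0;                /* desired T1 length */

static int find(PageList *l, int page)
{
    for (int i = 0; i < l->len; i++)
        if (l->page[i] == page)
            return i;
    return -1;
}

static void removeAt(PageList *l, int i)
{
    memmove(&l->page[i], &l->page[i + 1], (l->len - i - 1) * sizeof(int));
    l->len--;
}

static void pushMRU(PageList *l, int page) { l->page[l->len++] = page; }
static int  popLRU(PageList *l) { int p = l->page[0]; removeAt(l, 0); return p; }

/* Evict the LRU of T1 or T2, moving its CDB to the matching B list. */
static void evict(void)
{
    if (T2.len == 0 || (T1.len > 0 && T1.len > T1target))
        pushMRU(&B1, popLRU(&T1));
    else
        pushMRU(&B2, popLRU(&T2));
}

/* One lookup; returns the name of the case that fired. */
static const char *lookup(int page)
{
    int i;

    if ((i = find(&T1, page)) >= 0)     /* T1 hit: promote to T2 MRU */
    {
        removeAt(&T1, i);
        pushMRU(&T2, page);
        return "T1";
    }
    if ((i = find(&T2, page)) >= 0)     /* T2 hit: move to T2 MRU */
    {
        removeAt(&T2, i);
        pushMRU(&T2, page);
        return "T2";
    }
    if ((i = find(&B1, page)) >= 0)     /* B1 hit: grow T1target */
    {
        if (T1target < C)
            T1target++;
        removeAt(&B1, i);
        if (T1.len + T2.len >= C)
            evict();
        pushMRU(&T2, page);
        return "B1";
    }
    if ((i = find(&B2, page)) >= 0)     /* B2 hit: shrink T1target */
    {
        if (T1target > 0)
            T1target--;
        removeAt(&B2, i);
        if (T1.len + T2.len >= C)
            evict();
        pushMRU(&T2, page);
        return "B2";
    }
    /* MISS: evict if the cache is full, then enter at the MRU of T1 */
    if (T1.len + T2.len >= C)
        evict();
    pushMRU(&T1, page);
    return "MISS";
}
```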
226+
227+
The tailoring done for PostgreSQL has to do with the way, the
228+
query executor works. A typical UPDATE or DELETE first scans the
229+
relation, searching for the tuples and then calls heap_update() or
230+
heap_delete(). This causes at least 2 lookups for the block in the
231+
same statement. In the case of multiple matches in one block even
232+
more often. As a result, every block touched in an UPDATE or DELETE
233+
would directly jump into the T2 cache, which is wrong. To prevent
234+
this the strategy remembers which transaction added a buffer to the
235+
T1 list and will not promote it from there into the T2 cache during
236+
the same transaction.
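
The promotion check can be sketched as follows (all names here are
made up for illustration): each T1 entry remembers the transaction
that put it there, and a later hit promotes it to T2 only if it comes
from a different transaction.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int TransactionId;

typedef struct
{
    int           page;
    TransactionId t1_xid;               /* xact that added this entry to T1 */
} T1Entry;

/* A T1 hit promotes to T2 only across transaction boundaries, so the
 * repeated lookups within a single UPDATE/DELETE stay in T1. */
static bool
should_promote_to_t2(const T1Entry *cdb, TransactionId current_xid)
{
    return cdb->t1_xid != current_xid;
}
```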
237+
238+
Another specialty is the change of the strategy during VACUUM.
239+
Lookups during VACUUM do not represent application needs, so it
240+
would be wrong to change the cache balance T1target due to that
241+
or to cause massive cache evictions. Therefore, a page read in to
242+
satisfy vacuum (not those that actually cause a hit on any list)
243+
is placed at the LRU position of the T1 list, for immediate
244+
reuse. Since Vacuum usually requests many pages very fast, the
245+
natural side effect of this is that it will get back the very
246+
buffers it filled and possibly modified on the next call and will
247+
therefore do it's work in a few shared memory buffers, while using
248+
whatever it finds in the cache already.
249+
