4
4
5
5
<abstract>
6
6
<para>
7
- A description of the database file default page format.
7
+ A description of the database file page format.
8
8
</para>
9
9
</abstract>
10
10
11
11
<para>
12
- This section provides an overview of the page format used by <productname>PostgreSQL</productname>
13
- tables. User-defined access methods need not use this page format.
12
+ This section provides an overview of the page format used by
13
+ <productname>PostgreSQL</productname> tables and indexes. (Index
14
+ access methods need not use this page format. At present, all index
15
+ methods do use this basic format, but the data kept on index metapages
16
+ usually doesn't follow the item layout rules exactly.) TOAST tables
17
+ and sequences are formatted just like a regular table.
14
18
</para>
15
19
16
20
<para>
17
21
In the following explanation, a
18
22
<firstterm>byte</firstterm>
19
23
is assumed to contain 8 bits. In addition, the term
20
24
<firstterm>item</firstterm>
21
- refers to data that is stored in <productname>PostgreSQL</productname> tables.
25
+ refers to an individual data value that is stored on a page. In a table,
26
+ an item is a tuple (row); in an index, an item is an index entry.
22
27
</para>
23
28
24
29
<para>
25
30
26
- <xref linkend="page-table"> shows how pages in both normal
27
- <productname>PostgreSQL</productname> tables and
28
- <productname>PostgreSQL</productname> indexes (e.g., a B-tree index)
29
- are structured. This structure is also used for toast tables and sequences.
31
+ <xref linkend="page-table"> shows the basic layout of a page.
30
32
There are five parts to each page.
31
33
32
34
</para>
48
50
49
51
<row>
50
52
<entry>PageHeaderData</entry>
51
- <entry>20 bytes long. Contains general information about the page to allow to access it.</entry>
53
+ <entry>20 bytes long. Contains general information about the page, including
54
+ free space pointers.</entry>
52
55
</row>
53
56
54
57
<row>
55
- <entry>itemPointerData </entry>
56
- <entry>List of (offset,length) pairs pointing to the actual item .</entry>
58
+ <entry>ItemPointerData </entry>
59
+ <entry>Array of (offset,length) pairs pointing to the actual items .</entry>
57
60
</row>
58
61
59
62
<row>
62
65
</row>
63
66
64
67
<row>
65
- <entry>items </entry>
66
- <entry>The actual items themselves. Different access method have different data here. </entry>
68
+ <entry>Items </entry>
69
+ <entry>The actual items themselves.</entry>
67
70
</row>
68
71
69
72
<row>
70
73
<entry>Special Space</entry>
71
- <entry>Access method specific data. Different method store different data. Unused by normal tables.</entry>
74
+ <entry>Index access method specific data. Different methods store different
75
+ data. Empty in ordinary tables.</entry>
72
76
</row>
73
77
74
78
</tbody>
78
82
<para>
79
83
80
84
The first 20 bytes of each page consists of a page header
81
- (PageHeaderData). It's format is detailed in <xref
85
+ (PageHeaderData). Its format is detailed in <xref
82
86
linkend="pageheaderdata-table">. The first two fields deal with WAL
83
87
related stuff. This is followed by three 2-byte integer fields
84
- (<firstterm>lower</firstterm>, <firstterm>upper</firstterm>, and
85
- <firstterm>special</firstterm>). These represent byte offsets to the start
88
+ (<structfield>pd_lower</structfield>, <structfield>pd_upper</structfield>,
89
+ and <structfield>pd_special</structfield>). These represent byte offsets to
90
+ the start
86
91
of unallocated space, to the end of unallocated space, and to the start of
87
92
the special space.
88
93
104
109
<row>
105
110
<entry>pd_lsn</entry>
106
111
<entry>XLogRecPtr</entry>
107
- <entry>6 bytes</entry>
112
+ <entry>8 bytes</entry>
108
113
<entry>LSN: next byte after last byte of xlog</entry>
109
114
</row>
110
115
<row>
@@ -132,68 +137,94 @@ Item
132
137
<entry>Offset to start of special space.</entry>
133
138
</row>
134
139
<row>
135
- <entry>pd_opaque </entry>
136
- <entry>OpaqueData </entry>
140
+ <entry>pd_pagesize_version </entry>
141
+ <entry>uint16 </entry>
137
142
<entry>2 bytes</entry>
138
- <entry>AM-generic information. Currently just stores the page size .</entry>
143
+ <entry>Page size and layout version number information .</entry>
139
144
</row>
140
145
</tbody>
141
146
</tgroup>
142
147
</table>
143
148
149
+ <para>
150
+ All the details may be found in src/include/storage/bufpage.h.
151
+ </para>
152
+
144
153
<para>
145
154
Special space is a region at the end of the page that is allocated at page
146
155
initialization time and contains information specific to an access method.
147
- The last 2 bytes of the page header, <firstterm>opaque</firstterm>,
148
- currently only stores the page size. Page size is stored in each page
149
- because frames in the buffer pool may be subdivided into equal sized pages
150
- on a frame by frame basis within a table (is this true? - mvo).
151
-
156
+ The last 2 bytes of the page header,
157
+ <structfield>pd_pagesize_version</structfield>, store both the page size
158
+ and a version indicator. Beginning with
159
+ <productname>PostgreSQL</productname> 7.3 the version number is 1; prior
160
+ releases used version number 0. (The basic page layout and header format
161
+ has not changed, but the layout of heap tuple headers has.) The page size
162
+ is basically only present as a cross-check; there is no support for having
163
+ more than one page size in an installation.
152
164
</para>
153
165
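Reading aid for the pd_pagesize_version paragraph above: the text says only that the field holds both the page size and a version number, not how the two are packed. The sketch below assumes the size sits in the high byte and the version in the low byte, which works only because page sizes are multiples of 256. That packing and the sketch_ helper names are assumptions for illustration, not PostgreSQL functions; src/include/storage/bufpage.h remains the authoritative definition.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed packing (not stated in the text): high byte = page size,
 * low byte = layout version. */
static uint16_t sketch_pack_pagesize_version(uint16_t page_size, uint8_t version)
{
    /* The size must leave the low byte free for the version number. */
    assert((page_size & 0x00FF) == 0);
    return (uint16_t) (page_size | version);
}

/* Recover the page size by masking off the version byte. */
static uint16_t sketch_page_size(uint16_t pd_pagesize_version)
{
    return (uint16_t) (pd_pagesize_version & 0xFF00);
}

/* Recover the layout version (1 from 7.3 on, 0 before, per the text). */
static uint8_t sketch_layout_version(uint16_t pd_pagesize_version)
{
    return (uint8_t) (pd_pagesize_version & 0x00FF);
}
```

Under this assumption, an 8192-byte page with layout version 1 would carry the value 0x2001 in the field.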
154
166
<para>
155
167
156
168
Following the page header are item identifiers
157
- (<firstterm>ItemIdData</firstterm>). New item identifiers are allocated
158
- from the first four bytes of unallocated space. Because an item
159
- identifier is never moved until it is freed, its index may be used to
160
- indicate the location of an item on a page. In fact, every pointer to an
161
- item (<firstterm>ItemPointer</firstterm>, also know as
162
- <firstterm>CTID</firstterm>) created by
163
- <productname>PostgreSQL</productname> consists of a frame number and an
164
- index of an item identifier. An item identifier contains a byte-offset to
169
+ (<type>ItemIdData</type>), each requiring four bytes.
170
+ An item identifier contains a byte-offset to
165
171
the start of an item, its length in bytes, and a set of attribute bits
166
172
which affect its interpretation.
173
+ New item identifiers are allocated
174
+ as needed from the beginning of the unallocated space.
175
+ The number of item identifiers present can be determined by looking at
176
+ <structfield>pd_lower</>, which is increased to allocate a new identifier.
177
+ Because an item
178
+ identifier is never moved until it is freed, its index may be used on a
179
+ long-term basis to reference an item, even when the item itself is moved
180
+ around on the page to compact free space. In fact, every pointer to an
181
+ item (<type>ItemPointer</type>, also known as
182
+ <type>CTID</type>) created by
183
+ <productname>PostgreSQL</productname> consists of a page number and the
184
+ index of an item identifier.
167
185
168
186
</para>
169
187
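Reading aid for the item identifier paragraph above: since identifiers are four bytes each and are allocated from the start of unallocated space, pd_lower alone implies how many exist. The sketch below takes the 20-byte header length and the 4-byte identifier size from the text; the sketch_ names are hypothetical, and the real logic lives behind the macros in bufpage.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_HEADER_SIZE 20  /* header length as given in the text */
#define SKETCH_ITEM_ID_SIZE      4  /* each ItemIdData takes four bytes */

/* Number of item identifiers implied by pd_lower. */
static size_t sketch_item_id_count(uint16_t pd_lower)
{
    assert(pd_lower >= SKETCH_PAGE_HEADER_SIZE);
    return (size_t) (pd_lower - SKETCH_PAGE_HEADER_SIZE) / SKETCH_ITEM_ID_SIZE;
}

/* Bytes of unallocated space between the identifier array and the items. */
static size_t sketch_free_space(uint16_t pd_lower, uint16_t pd_upper)
{
    assert(pd_lower <= pd_upper);
    return (size_t) (pd_upper - pd_lower);
}

/* New pd_lower after allocating one more identifier, or 0 if the
 * identifier alone does not fit in the remaining free space. */
static uint16_t sketch_alloc_item_id(uint16_t pd_lower, uint16_t pd_upper)
{
    if (sketch_free_space(pd_lower, pd_upper) < SKETCH_ITEM_ID_SIZE)
        return 0;
    return (uint16_t) (pd_lower + SKETCH_ITEM_ID_SIZE);
}
```

The sketch only accounts for the identifier itself; storing a new item also needs room for the item data, which is carved off the other end of the free space by lowering pd_upper.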
170
188
<para>
171
189
172
190
The items themselves are stored in space allocated backwards from the end
173
191
of unallocated space. The exact structure varies depending on what the
174
- table is to contain. Sequences and tables both use a structure named
175
- <firstterm >HeapTupleHeaderData</firstterm >, describe below.
192
+ table is to contain. Tables and sequences both use a structure named
193
+ <type >HeapTupleHeaderData</type >, described below.
176
194
177
195
</para>
178
196
179
197
<para>
180
198
181
199
The final section is the "special section" which may contain anything the
182
200
access method wishes to store. Ordinary tables do not use this at all
183
- (indicated by setting the offset to the pagesize).
201
+ (indicated by setting <structfield>pd_special</> to equal the pagesize).
184
202
185
203
</para>
186
204
187
205
<para>
188
206
189
- All tuples are structured the same way. A header of around 31 bytes
190
- followed by an optional null bitmask and the data. The header is detailed
191
- below in <xref linkend="heaptupleheaderdata-table">. The null bitmask is
192
- only present if the <firstterm>HEAP_HASNULL</firstterm> bit is set in the
193
- <firstterm>t_infomask</firstterm>. If it is present it takes up the space
194
- between the end of the header and the beginning of the data, as indicated
195
- by the <firstterm>t_hoff</firstterm> field. In this list of bits, a 1 bit
196
- indicates not-null, a 0 bit is a null.
207
+ All table tuples are structured the same way. There is a fixed-size
208
+ header (occupying 23 bytes on most machines), followed by an optional null
209
+ bitmap, an optional object ID field, and the user data. The header is
210
+ detailed
211
+ in <xref linkend="heaptupleheaderdata-table">. The actual user data
212
+ (fields of the tuple) begins at the offset indicated by
213
+ <structfield>t_hoff</>, which must always be a multiple of the MAXALIGN
214
+ distance for the platform.
215
+ The null bitmap is
216
+ only present if the <firstterm>HEAP_HASNULL</firstterm> bit is set in
217
+ <structfield>t_infomask</structfield>. If it is present it begins just after
218
+ the fixed header and occupies enough bytes to have one bit per data column
219
+ (that is, <structfield>t_natts</> bits altogether). In this list of bits, a
220
+ 1 bit indicates not-null, a 0 bit is a null. When the bitmap is not
221
+ present, all columns are assumed not-null.
222
+ The object ID is only present if the <firstterm>HEAP_HASOID</firstterm> bit
223
+ is set in <structfield>t_infomask</structfield>. If present, it appears just
224
+ before the <structfield>t_hoff</> boundary. Any padding needed to make
225
+ <structfield>t_hoff</> a MAXALIGN multiple will appear between the null
226
+ bitmap and the object ID. (This in turn ensures that the object ID is
227
+ suitably aligned.)
197
228
198
229
</para>
199
230
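Reading aid for the tuple layout paragraph above: the rules for t_hoff can be turned into a small calculation. The sketch below assumes the 23-byte fixed header mentioned in the text, a 4-byte object ID (the size the old t_oid row gives), and a MAXALIGN value passed in as a parameter because it is platform dependent. The sketch_ names are hypothetical; htup.h holds the real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_FIXED_HEADER_SIZE 23  /* "23 bytes on most machines" per the text */
#define SKETCH_OID_SIZE           4  /* assumed size of the object ID field */

/* Round len up to the next multiple of maxalign (must be a power of two). */
static size_t sketch_maxalign(size_t len, size_t maxalign)
{
    assert(maxalign != 0 && (maxalign & (maxalign - 1)) == 0);
    return (len + maxalign - 1) & ~(maxalign - 1);
}

/* Bytes needed for a null bitmap holding one bit per column. */
static size_t sketch_null_bitmap_bytes(unsigned natts)
{
    return (natts + 7) / 8;
}

/* Offset of the user data: fixed header, optional bitmap, optional OID,
 * rounded up to a MAXALIGN multiple. */
static size_t sketch_t_hoff(unsigned natts, bool has_null, bool has_oid,
                            size_t maxalign)
{
    size_t len = SKETCH_FIXED_HEADER_SIZE;

    if (has_null)
        len += sketch_null_bitmap_bytes(natts);
    if (has_oid)
        len += SKETCH_OID_SIZE;
    return sketch_maxalign(len, maxalign);
}

/* The OID sits just before the t_hoff boundary, so any padding falls
 * between the null bitmap and the OID. */
static size_t sketch_oid_offset(size_t t_hoff)
{
    return t_hoff - SKETCH_OID_SIZE;
}
```

For example, with nine columns, a null bitmap, an object ID, and MAXALIGN of 8, the unpadded length is 23 + 2 + 4 = 29 bytes, so t_hoff rounds up to 32 and the object ID starts at offset 28.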
@@ -211,34 +242,34 @@ Item
211
242
</thead>
212
243
<tbody>
213
244
<row>
214
- <entry>t_oid </entry>
215
- <entry>Oid </entry>
245
+ <entry>t_xmin </entry>
246
+ <entry>TransactionId </entry>
216
247
<entry>4 bytes</entry>
217
- <entry>OID of this tuple </entry>
248
+ <entry>insert XID stamp </entry>
218
249
</row>
219
250
<row>
220
251
<entry>t_cmin</entry>
221
252
<entry>CommandId</entry>
222
253
<entry>4 bytes</entry>
223
- <entry>insert CID stamp</entry>
254
+ <entry>insert CID stamp (overlays with t_xmax) </entry>
224
255
</row>
225
256
<row>
226
- <entry>t_cmax </entry>
227
- <entry>CommandId </entry>
257
+ <entry>t_xmax </entry>
258
+ <entry>TransactionId </entry>
228
259
<entry>4 bytes</entry>
229
- <entry>delete CID stamp</entry>
260
+ <entry>delete XID stamp</entry>
230
261
</row>
231
262
<row>
232
- <entry>t_xmin </entry>
233
- <entry>TransactionId </entry>
263
+ <entry>t_cmax </entry>
264
+ <entry>CommandId </entry>
234
265
<entry>4 bytes</entry>
235
- <entry>insert XID stamp</entry>
266
+ <entry>delete CID stamp (overlays with t_xvac) </entry>
236
267
</row>
237
268
<row>
238
- <entry>t_xmax </entry>
269
+ <entry>t_xvac </entry>
239
270
<entry>TransactionId</entry>
240
271
<entry>4 bytes</entry>
241
- <entry>delete XID stamp </entry>
272
+ <entry>XID for VACUUM operation moving tuple </entry>
242
273
</row>
243
274
<row>
244
275
<entry>t_ctid</entry>
@@ -256,30 +287,28 @@ Item
256
287
<entry>t_infomask</entry>
257
288
<entry>uint16</entry>
258
289
<entry>2 bytes</entry>
259
- <entry>Various flags</entry>
290
+ <entry>various flags</entry>
260
291
</row>
261
292
<row>
262
293
<entry>t_hoff</entry>
263
294
<entry>uint8</entry>
264
295
<entry>1 byte</entry>
265
- <entry>length of tuple header. Also offset of data. </entry>
296
+ <entry>offset to user data</entry>
266
297
</row>
267
298
</tbody>
268
299
</tgroup>
269
300
</table>
270
301
271
302
<para>
272
-
273
- All the details may be found in src/include/storage/bufpage.h.
274
-
303
+ All the details may be found in src/include/access/htup.h.
275
304
</para>
276
305
277
306
<para>
278
307
279
308
Interpreting the actual data can only be done with information obtained
280
309
from other tables, mostly <firstterm>pg_attribute</firstterm>. The
281
- particular fields are <firstterm >attlen</firstterm > and
282
- <firstterm >attalign</firstterm >. There is no way to directly get a
310
+ particular fields are <structfield >attlen</structfield > and
311
+ <structfield >attalign</structfield >. There is no way to directly get a
283
312
particular attribute, except when there are only fixed width fields and no
284
313
NULLs. All this trickery is wrapped up in the functions
285
314
<firstterm>heap_getattr</firstterm>, <firstterm>fastgetattr</firstterm>
293
322
the next. Then make sure you have the right alignment. If the field is a
294
323
fixed width field, then all the bytes are simply placed. If it's a
295
324
variable length field (attlen == -1) then it's a bit more complicated,
296
- using the variable length structure <firstterm >varattrib</firstterm >.
325
+ using the variable length structure <type >varattrib</type >.
297
326
Depending on the flags, the data may be either inline, compressed or in
298
327
another table (TOAST).
299
328