
Commit c7a165a

Code review for HeapTupleHeader changes. Add version number to page headers
(overlaying low byte of page size) and add HEAP_HASOID bit to t_infomask, per earlier discussion. Simplify scheme for overlaying fields in tuple header (no need for cmax to live in more than one place). Don't try to clear infomask status bits in tqual.c --- not safe to do it there. Don't try to force output table of a SELECT INTO to have OIDs, either. Get rid of unnecessarily complex three-state scheme for TupleDesc.tdhasoids, which has already caused one recent failure. Improve documentation.
1 parent fcd34f9 commit c7a165a


49 files changed: 506 additions, 554 deletions

contrib/fulltextindex/fti.c

Lines changed: 1 addition & 1 deletion
@@ -190,7 +190,7 @@ fti(PG_FUNCTION_ARGS)
 tupdesc = rel->rd_att; /* what the tuple looks like (?) */

 /* get oid of current tuple, needed by all, so place here */
-oid = rel->rd_rel->relhasoids ? HeapTupleGetOid(rettuple) : InvalidOid;
+oid = HeapTupleGetOid(rettuple);
 if (!OidIsValid(oid))
 elog(ERROR, "Full Text Indexing: Oid of current tuple is invalid");

contrib/rserv/rserv.c

Lines changed: 1 addition & 4 deletions
@@ -102,10 +102,7 @@ _rserv_log_()

 if (keynum == ObjectIdAttributeNumber)
 {
-snprintf(oidbuf, "%u", 64,
-rel->rd_rel->relhasoids
-? HeapTupleGetOid(tuple)
-: InvalidOid);
+snprintf(oidbuf, "%u", sizeof(oidbuf), HeapTupleGetOid(tuple));
 key = oidbuf;
 }
 else

contrib/tablefunc/tablefunc.c

Lines changed: 1 addition & 1 deletion
@@ -614,7 +614,7 @@ make_crosstab_tupledesc(TupleDesc spi_tupdesc, int num_catagories)
 * spi result input column.
 */
 natts = num_catagories + 1;
-tupdesc = CreateTemplateTupleDesc(natts, WITHOUTOID);
+tupdesc = CreateTemplateTupleDesc(natts, false);

 /* first the rowname column */
 attnum = 1;

doc/src/sgml/page.sgml

Lines changed: 94 additions & 65 deletions
@@ -4,29 +4,31 @@

 <abstract>
 <para>
-A description of the database file default page format.
+A description of the database file page format.
 </para>
 </abstract>

 <para>
-This section provides an overview of the page format used by <productname>PostgreSQL</productname>
-tables. User-defined access methods need not use this page format.
+This section provides an overview of the page format used by
+<productname>PostgreSQL</productname> tables and indexes. (Index
+access methods need not use this page format. At present, all index
+methods do use this basic format, but the data kept on index metapages
+usually doesn't follow the item layout rules exactly.) TOAST tables
+and sequences are formatted just like a regular table.
 </para>

 <para>
 In the following explanation, a
 <firstterm>byte</firstterm>
 is assumed to contain 8 bits. In addition, the term
 <firstterm>item</firstterm>
-refers to data that is stored in <productname>PostgreSQL</productname> tables.
+refers to an individual data value that is stored on a page. In a table,
+an item is a tuple (row); in an index, an item is an index entry.
 </para>

 <para>

-<xref linkend="page-table"> shows how pages in both normal
-<productname>PostgreSQL</productname> tables and
-<productname>PostgreSQL</productname> indexes (e.g., a B-tree index)
-are structured. This structure is also used for toast tables and sequences.
+<xref linkend="page-table"> shows the basic layout of a page.
 There are five parts to each page.

 </para>
@@ -48,12 +50,13 @@ Item

 <row>
 <entry>PageHeaderData</entry>
-<entry>20 bytes long. Contains general information about the page to allow to access it.</entry>
+<entry>20 bytes long. Contains general information about the page, including
+free space pointers.</entry>
 </row>

 <row>
-<entry>itemPointerData</entry>
-<entry>List of (offset,length) pairs pointing to the actual item.</entry>
+<entry>ItemPointerData</entry>
+<entry>Array of (offset,length) pairs pointing to the actual items.</entry>
 </row>

 <row>
@@ -62,13 +65,14 @@ Item
 </row>

 <row>
-<entry>items</entry>
-<entry>The actual items themselves. Different access method have different data here.</entry>
+<entry>Items</entry>
+<entry>The actual items themselves.</entry>
 </row>

 <row>
 <entry>Special Space</entry>
-<entry>Access method specific data. Different method store different data. Unused by normal tables.</entry>
+<entry>Index access method specific data. Different methods store different
+data. Empty in ordinary tables.</entry>
 </row>

 </tbody>
@@ -78,11 +82,12 @@ Item
 <para>

 The first 20 bytes of each page consists of a page header
-(PageHeaderData). It's format is detailed in <xref
+(PageHeaderData). Its format is detailed in <xref
 linkend="pageheaderdata-table">. The first two fields deal with WAL
 related stuff. This is followed by three 2-byte integer fields
-(<firstterm>lower</firstterm>, <firstterm>upper</firstterm>, and
-<firstterm>special</firstterm>). These represent byte offsets to the start
+(<structfield>pd_lower</structfield>, <structfield>pd_upper</structfield>,
+and <structfield>pd_special</structfield>). These represent byte offsets to
+the start
 of unallocated space, to the end of unallocated space, and to the start of
 the special space.

@@ -104,7 +109,7 @@ Item
 <row>
 <entry>pd_lsn</entry>
 <entry>XLogRecPtr</entry>
-<entry>6 bytes</entry>
+<entry>8 bytes</entry>
 <entry>LSN: next byte after last byte of xlog</entry>
 </row>
 <row>
@@ -132,68 +137,94 @@ Item
 <entry>Offset to start of special space.</entry>
 </row>
 <row>
-<entry>pd_opaque</entry>
-<entry>OpaqueData</entry>
+<entry>pd_pagesize_version</entry>
+<entry>uint16</entry>
 <entry>2 bytes</entry>
-<entry>AM-generic information. Currently just stores the page size.</entry>
+<entry>Page size and layout version number information.</entry>
 </row>
 </tbody>
 </tgroup>
 </table>

+<para>
+All the details may be found in src/include/storage/bufpage.h.
+</para>
+
 <para>
 Special space is a region at the end of the page that is allocated at page
 initialization time and contains information specific to an access method.
-The last 2 bytes of the page header, <firstterm>opaque</firstterm>,
-currently only stores the page size. Page size is stored in each page
-because frames in the buffer pool may be subdivided into equal sized pages
-on a frame by frame basis within a table (is this true? - mvo).
-
+The last 2 bytes of the page header,
+<structfield>pd_pagesize_version</structfield>, store both the page size
+and a version indicator. Beginning with
+<productname>PostgreSQL</productname> 7.3 the version number is 1; prior
+releases used version number 0. (The basic page layout and header format
+has not changed, but the layout of heap tuple headers has.) The page size
+is basically only present as a cross-check; there is no support for having
+more than one page size in an installation.
 </para>

 <para>

 Following the page header are item identifiers
-(<firstterm>ItemIdData</firstterm>). New item identifiers are allocated
-from the first four bytes of unallocated space. Because an item
-identifier is never moved until it is freed, its index may be used to
-indicate the location of an item on a page. In fact, every pointer to an
-item (<firstterm>ItemPointer</firstterm>, also know as
-<firstterm>CTID</firstterm>) created by
-<productname>PostgreSQL</productname> consists of a frame number and an
-index of an item identifier. An item identifier contains a byte-offset to
+(<type>ItemIdData</type>), each requiring four bytes.
+An item identifier contains a byte-offset to
 the start of an item, its length in bytes, and a set of attribute bits
 which affect its interpretation.
+New item identifiers are allocated
+as needed from the beginning of the unallocated space.
+The number of item identifiers present can be determined by looking at
+<structfield>pd_lower</>, which is increased to allocate a new identifier.
+Because an item
+identifier is never moved until it is freed, its index may be used on a
+long-term basis to reference an item, even when the item itself is moved
+around on the page to compact free space. In fact, every pointer to an
+item (<type>ItemPointer</type>, also known as
+<type>CTID</type>) created by
+<productname>PostgreSQL</productname> consists of a page number and the
+index of an item identifier.

 </para>

 <para>

 The items themselves are stored in space allocated backwards from the end
 of unallocated space. The exact structure varies depending on what the
-table is to contain. Sequences and tables both use a structure named
-<firstterm>HeapTupleHeaderData</firstterm>, describe below.
+table is to contain. Tables and sequences both use a structure named
+<type>HeapTupleHeaderData</type>, described below.

 </para>

 <para>

 The final section is the "special section" which may contain anything the
 access method wishes to store. Ordinary tables do not use this at all
-(indicated by setting the offset to the pagesize).
+(indicated by setting <structfield>pd_special</> to equal the pagesize).

 </para>

 <para>

-All tuples are structured the same way. A header of around 31 bytes
-followed by an optional null bitmask and the data. The header is detailed
-below in <xref linkend="heaptupleheaderdata-table">. The null bitmask is
-only present if the <firstterm>HEAP_HASNULL</firstterm> bit is set in the
-<firstterm>t_infomask</firstterm>. If it is present it takes up the space
-between the end of the header and the beginning of the data, as indicated
-by the <firstterm>t_hoff</firstterm> field. In this list of bits, a 1 bit
-indicates not-null, a 0 bit is a null.
+All table tuples are structured the same way. There is a fixed-size
+header (occupying 23 bytes on most machines), followed by an optional null
+bitmap, an optional object ID field, and the user data. The header is
+detailed
+in <xref linkend="heaptupleheaderdata-table">. The actual user data
+(fields of the tuple) begins at the offset indicated by
+<structfield>t_hoff</>, which must always be a multiple of the MAXALIGN
+distance for the platform.
+The null bitmap is
+only present if the <firstterm>HEAP_HASNULL</firstterm> bit is set in
+<structfield>t_infomask</structfield>. If it is present it begins just after
+the fixed header and occupies enough bytes to have one bit per data column
+(that is, <structfield>t_natts</> bits altogether). In this list of bits, a
+1 bit indicates not-null, a 0 bit is a null. When the bitmap is not
+present, all columns are assumed not-null.
+The object ID is only present if the <firstterm>HEAP_HASOID</firstterm> bit
+is set in <structfield>t_infomask</structfield>. If present, it appears just
+before the <structfield>t_hoff</> boundary. Any padding needed to make
+<structfield>t_hoff</> a MAXALIGN multiple will appear between the null
+bitmap and the object ID. (This in turn ensures that the object ID is
+suitably aligned.)

 </para>

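Reviewer sketch (not part of the commit): the tuple layout rules described in the hunk above can be illustrated with a few lines of C. Everything named here is an assumption made for the example: `FIXED_HEADER_SIZE` uses the "23 bytes on most machines" figure from the text, `MAXIMUM_ALIGNOF` is set to 8 although the real MAXALIGN distance is platform-dependent, and the helper names are hypothetical. The bit order inside the null bitmap (least significant bit first) is also an assumption, since the documentation text does not state it.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; both sizes are platform-dependent. */
#define FIXED_HEADER_SIZE 23	/* fixed tuple header, "on most machines" */
#define MAXIMUM_ALIGNOF   8	/* hypothetical MAXALIGN distance */
#define OID_SIZE          4	/* size of the optional object ID field */

/* Round len up to the next multiple of the MAXALIGN distance. */
static size_t
maxalign(size_t len)
{
	return (len + MAXIMUM_ALIGNOF - 1) & ~((size_t) (MAXIMUM_ALIGNOF - 1));
}

/* Bytes needed for a null bitmap holding one bit per column. */
static size_t
null_bitmap_len(int natts)
{
	assert(natts >= 0);
	return ((size_t) natts + 7) / 8;
}

/*
 * Offset to user data: fixed header, optional null bitmap, padding,
 * optional object ID, with the total rounded up to a MAXALIGN multiple.
 * Because the OID sits just before the t_hoff boundary, the padding
 * ends up between the bitmap and the OID.
 */
static size_t
compute_hoff(int natts, int has_nulls, int has_oid)
{
	size_t		len = FIXED_HEADER_SIZE;

	if (has_nulls)
		len += null_bitmap_len(natts);
	if (has_oid)
		len += OID_SIZE;
	return maxalign(len);
}

/* Byte offset of the object ID, assuming the tuple has one. */
static size_t
oid_offset(size_t hoff)
{
	return hoff - OID_SIZE;
}

/* A 1 bit means not-null, a 0 bit means null (LSB-first order assumed). */
static int
column_is_null(const unsigned char *bitmap, int col)
{
	return !(bitmap[col >> 3] & (1 << (col & 7)));
}
```

Under these assumed constants, a three-column tuple with a null bitmap and an OID needs 23 + 1 + 4 = 28 bytes, so the user data starts at offset 32 and the OID occupies bytes 28 through 31.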
@@ -211,34 +242,34 @@ Item
 </thead>
 <tbody>
 <row>
-<entry>t_oid</entry>
-<entry>Oid</entry>
+<entry>t_xmin</entry>
+<entry>TransactionId</entry>
 <entry>4 bytes</entry>
-<entry>OID of this tuple</entry>
+<entry>insert XID stamp</entry>
 </row>
 <row>
 <entry>t_cmin</entry>
 <entry>CommandId</entry>
 <entry>4 bytes</entry>
-<entry>insert CID stamp</entry>
+<entry>insert CID stamp (overlays with t_xmax)</entry>
 </row>
 <row>
-<entry>t_cmax</entry>
-<entry>CommandId</entry>
+<entry>t_xmax</entry>
+<entry>TransactionId</entry>
 <entry>4 bytes</entry>
-<entry>delete CID stamp</entry>
+<entry>delete XID stamp</entry>
 </row>
 <row>
-<entry>t_xmin</entry>
-<entry>TransactionId</entry>
+<entry>t_cmax</entry>
+<entry>CommandId</entry>
 <entry>4 bytes</entry>
-<entry>insert XID stamp</entry>
+<entry>delete CID stamp (overlays with t_xvac)</entry>
 </row>
 <row>
-<entry>t_xmax</entry>
+<entry>t_xvac</entry>
 <entry>TransactionId</entry>
 <entry>4 bytes</entry>
-<entry>delete XID stamp</entry>
+<entry>XID for VACUUM operation moving tuple</entry>
 </row>
 <row>
 <entry>t_ctid</entry>
@@ -256,30 +287,28 @@ Item
 <entry>t_infomask</entry>
 <entry>uint16</entry>
 <entry>2 bytes</entry>
-<entry>Various flags</entry>
+<entry>various flags</entry>
 </row>
 <row>
 <entry>t_hoff</entry>
 <entry>uint8</entry>
 <entry>1 byte</entry>
-<entry>length of tuple header. Also offset of data.</entry>
+<entry>offset to user data</entry>
 </row>
 </tbody>
 </tgroup>
 </table>

 <para>
-
-All the details may be found in src/include/storage/bufpage.h.
-
+All the details may be found in src/include/access/htup.h.
 </para>

 <para>

 Interpreting the actual data can only be done with information obtained
 from other tables, mostly <firstterm>pg_attribute</firstterm>. The
-particular fields are <firstterm>attlen</firstterm> and
-<firstterm>attalign</firstterm>. There is no way to directly get a
+particular fields are <structfield>attlen</structfield> and
+<structfield>attalign</structfield>. There is no way to directly get a
 particular attribute, except when there are only fixed width fields and no
 NULLs. All this trickery is wrapped up in the functions
 <firstterm>heap_getattr</firstterm>, <firstterm>fastgetattr</firstterm>
@@ -293,7 +322,7 @@ Item
 the next. Then make sure you have the right alignment. If the field is a
 fixed width field, then all the bytes are simply placed. If it's a
 variable length field (attlen == -1) then it's a bit more complicated,
-using the variable length structure <firstterm>varattrib</firstterm>.
+using the variable length structure <type>varattrib</type>.
 Depending on the flags, the data may be either inline, compressed or in
 another table (TOAST).

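Reviewer sketch (not part of the commit): the page header changes documented in page.sgml can be illustrated as follows. The commit message says the version number overlays the low byte of the page size, which only works when the page size is a multiple of 256. The masks and helper names below are assumptions chosen for this example; the authoritative definitions are in src/include/storage/bufpage.h, which this sketch does not reproduce.

```c
#include <assert.h>
#include <stdint.h>

/* Figures taken from the documentation text; names are hypothetical. */
#define PAGE_HEADER_SIZE  20	/* "The first 20 bytes of each page" */
#define ITEM_ID_SIZE      4	/* each item identifier requires four bytes */

#define PAGE_SIZE_MASK    0xFF00u	/* assumed: high byte holds the size */
#define PAGE_VERSION_MASK 0x00FFu	/* assumed: low byte holds the version */

/* Combine a page size and layout version into one 16-bit field. */
static uint16_t
pack_pagesize_version(uint16_t page_size, uint8_t version)
{
	/* The overlay only works if the size's low byte is zero. */
	assert((page_size & PAGE_VERSION_MASK) == 0);
	return (uint16_t) (page_size | version);
}

static uint16_t
page_size_of(uint16_t pd_pagesize_version)
{
	return (uint16_t) (pd_pagesize_version & PAGE_SIZE_MASK);
}

static uint8_t
page_version_of(uint16_t pd_pagesize_version)
{
	return (uint8_t) (pd_pagesize_version & PAGE_VERSION_MASK);
}

/* Item identifiers fill the space between the header and pd_lower. */
static unsigned
item_id_count(uint16_t pd_lower)
{
	return (unsigned) (pd_lower - PAGE_HEADER_SIZE) / ITEM_ID_SIZE;
}

/* Ordinary tables set pd_special equal to the page size (no special space). */
static int
special_space_is_empty(uint16_t pd_special, uint16_t pd_pagesize_version)
{
	return pd_special == page_size_of(pd_pagesize_version);
}
```

With an 8192-byte page and layout version 1, the packed field is 0x2001. A `pd_lower` of 28 then corresponds to two item identifiers.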