
Commit 641a5b7

doc: improve build for non-Latin1 characters
Add README.non-ASCII to explain non-ASCII doc behavior; some text moved
from release.sgml.

Change UTF8 SGML characters to use HTML entities. Remove unnecessary
UTF8 spaces. Add SVG file check for check-nbsp target. Add dummy 'pdf'
Makefile target.

Reported-by: Yugo Nagata

Discussion: https://postgr.es/m/20241011114122.c90f8a871462da36f2e2afeb@sraoss.co.jp

Backpatch-through: master
1 parent fc7dded commit 641a5b7

6 files changed, 56 additions, 36 deletions

doc/src/sgml/Makefile

Lines changed: 6 additions & 5 deletions
@@ -59,7 +59,7 @@ GENERATED_SGML = version.sgml \
 features-supported.sgml features-unsupported.sgml errcodes-table.sgml \
 keywords-table.sgml targets-meson.sgml wait_event_types.sgml

-ALLSGML := $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml) $(GENERATED_SGML)
+ALL_SGML := $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml) $(GENERATED_SGML)

 ALL_IMAGES := $(wildcard $(srcdir)/images/*.svg)

@@ -68,7 +68,7 @@ ALL_IMAGES := $(wildcard $(srcdir)/images/*.svg)
 # we're at it, also resolve all entities (that is, copy all included
 # files into one big file). This helps tools that don't understand
 # vpath builds (such as dbtoepub).
-postgres-full.xml: postgres.sgml $(ALLSGML)
+postgres-full.xml: postgres.sgml $(ALL_SGML)
 $(XMLLINT) $(XMLINCLUDE) --output $@ --noent --valid $<


@@ -143,11 +143,12 @@ postgres.txt: postgres.html
 ## Print
 ##

-postgres.pdf:
+postgres.pdf pdf:
 $(error Invalid target; use postgres-A4.pdf or postgres-US.pdf as targets)

 XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'

+# XSL Formatting Objects (FO), https://en.wikipedia.org/wiki/XSL_Formatting_Objects
 %-A4.fo: stylesheet-fo.xsl %-full.xml
 $(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) $(XSLTPROC_FO_FLAGS) --stringparam paper.type A4 -o $@ $^

@@ -194,7 +195,7 @@ MAKEINFO = makeinfo
 ##

 # Quick syntax check without style processing
-check: postgres.sgml $(ALLSGML) check-tabs check-nbsp
+check: postgres.sgml $(ALL_SGML) check-tabs check-nbsp
 $(XMLLINT) $(XMLINCLUDE) --noout --valid $<


@@ -264,7 +265,7 @@ check-tabs:
 # Use perl command because non-GNU grep or sed could not have hex escape sequence.
 check-nbsp:
 @ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
-$(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
+$(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/images/*.svg $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \
 (echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1)

 ##
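The `check-nbsp` recipe is a Perl one-liner: every line containing the UTF-8 byte pair `C2 A0` (U+00A0, NO-BREAK SPACE) is printed as `file:line`, and `exit($n>0)` makes the rule fail if any were found; this commit only widens the file list to include `images/*.svg` and `images/*.xsl`. A rough Python equivalent, offered as an illustrative sketch rather than anything in the PostgreSQL tree (the names `scan_bytes` and `check_nbsp` are hypothetical), could look like this:

```python
import sys
from pathlib import Path

# U+00A0 NO-BREAK SPACE as it appears in a UTF-8 encoded file.
NBSP_UTF8 = b"\xc2\xa0"


def scan_bytes(name, data):
    """Return "name:line" strings for every line of data containing a UTF-8 NBSP."""
    hits = []
    for line in data.splitlines():
        if NBSP_UTF8 in line:
            # Decode only for display; undecodable bytes are replaced, not fatal.
            hits.append(f"{name}:{line.decode('utf-8', errors='replace')}")
    return hits


def check_nbsp(paths):
    """Scan each file, print offending lines, and return a make-style exit status."""
    hits = []
    for path in paths:
        hits.extend(scan_bytes(path, Path(path).read_bytes()))
    for hit in hits:
        print(hit)
    if hits:
        print("Non-breaking spaces appear in SGML/XML files", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    # Assumes the caller passes already-expanded paths, as the Makefile's
    # $(wildcard ...) does for *.sgml, ref/*.sgml, images/*.svg, *.xsl, images/*.xsl.
    sys.exit(check_nbsp(sys.argv[1:]))
```

Working on raw bytes, as the Perl version does, means a file that is not valid UTF-8 cannot make the check itself crash.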

doc/src/sgml/README.non-ASCII

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+<!-- doc/src/sgml/README.non-ASCII -->
+
+Representation of non-ASCII characters
+--------------------------------------
+
+Find non-ASCII characters using:
+
+grep --recursive --color='auto' -P '[\x80-\xFF]' .
+
+Convert to HTML4 named entity (&) escapes
+-----------------------------------------
+
+We support several output formats:
+
+* html (supports all Unicode characters)
+* man (supports all Unicode characters)
+* pdf (supports only Latin-1 characters)
+* info
+
+While some output formatting tools support all Unicode characters,
+others only support Latin-1 characters. Specifically, the PDF rendering
+engine can only display Latin-1 characters; non-Latin-1 Unicode
+characters are displayed as "###".
+
+Therefore, in the SGML files, we only use Latin-1 characters. We
+typically encode these characters as HTML entities, e.g., &Aacute;lvaro.
+It is also possible to safely represent Latin-1 characters in UTF8
+encoding for all output formats.
+
+Do not use UTF numeric character escapes (&#nnn;).
+
+HTML entities
+official: http://www.w3.org/TR/html4/sgml/entities.html
+one page: http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
+other lists: http://www.zipcon.net/~swhite/docs/computers/browsers/entities.html
+http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
+https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
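The new README describes a find-then-convert workflow: locate non-ASCII characters, replace Latin-1 ones with HTML4 named entities, avoid numeric `&#nnn;` escapes, and keep anything outside Latin-1 out of the SGML sources. A minimal Python sketch of that workflow follows; it is not a tool shipped with PostgreSQL, the function names are hypothetical, and it relies on the standard-library table `html.entities.codepoint2name`. Unlike the `grep` command, it works on decoded text, so it reports characters rather than raw bytes.

```python
from html.entities import codepoint2name


def to_entities(text):
    """Replace Latin-1 characters U+00A0..U+00FF with HTML4 named entities.

    Characters outside Latin-1 are left untouched; audit_non_ascii() reports them.
    No numeric &#nnn; escapes are ever produced, per the README.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        name = codepoint2name.get(cp)
        if 0xA0 <= cp <= 0xFF and name:
            out.append(f"&{name};")
        else:
            out.append(ch)
    return "".join(out)


def audit_non_ascii(text):
    """List (line_no, char, entity_or_None) for each non-ASCII character.

    A named entity is suggested only for Latin-1 characters; anything above
    U+00FF gets None because the PDF toolchain cannot render it.
    """
    report = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            cp = ord(ch)
            if cp < 0x80:
                continue
            name = codepoint2name.get(cp) if cp <= 0xFF else None
            report.append((line_no, ch, f"&{name};" if name else None))
    return report
```

For example, `to_entities("\u00c1lvaro")` gives `&Aacute;lvaro`, the spelling the README recommends, while curly quotes such as U+201C are left in place and flagged by `audit_non_ascii` for manual rewriting.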

doc/src/sgml/charset.sgml

Lines changed: 5 additions & 5 deletions
@@ -1225,7 +1225,7 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
 <programlisting>
 -- ignore differences in accents and case
 CREATE COLLATION ignore_accent_case (provider = icu, deterministic = false, locale = 'und-u-ks-level1');
-SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
+SELECT '&Aring;' = 'A' COLLATE ignore_accent_case; -- true
 SELECT 'z' = 'Z' COLLATE ignore_accent_case; -- true

 -- upper case letters sort before lower case.
@@ -1282,7 +1282,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
 <entry><literal>'ab' = U&amp;'a\2063b'</literal></entry>
 <entry><literal>'x-y' = 'x_y'</literal></entry>
 <entry><literal>'g' = 'G'</literal></entry>
-<entry><literal>'n' = 'ñ'</literal></entry>
+<entry><literal>'n' = '&ntilde;'</literal></entry>
 <entry><literal>'y' = 'z'</literal></entry>
 </row>
 </thead>
@@ -1346,7 +1346,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true

 <para>
 At every level, even with full normalization off, basic normalization is
-performed. For example, <literal>'á'</literal> may be composed of the
+performed. For example, <literal>'&aacute;'</literal> may be composed of the
 code points <literal>U&amp;'\0061\0301'</literal> or the single code
 point <literal>U&amp;'\00E1'</literal>, and those sequences will be
 considered equal even at the <literal>identic</literal> level. To treat
@@ -1430,8 +1430,8 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
 <entry><literal>false</literal></entry>
 <entry>
 Backwards comparison for the level 2 differences. For example,
-locale <literal>und-u-kb</literal> sorts <literal>'àe'</literal>
-before <literal>'aé'</literal>.
+locale <literal>und-u-kb</literal> sorts <literal>'&agrave;e'</literal>
+before <literal>'a&eacute;'</literal>.
 </entry>
 </row>
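The normalization paragraph in this diff (`'&aacute;'` as either `U&'\0061\0301'` or `U&'\00E1'`) and the `ignore_accent_case` examples can be illustrated outside the database with Python's standard `unicodedata` module. This is only an analogy: PostgreSQL delegates these comparisons to ICU, and the accent and case folding below (NFD decomposition, dropping combining marks, then `casefold()`) is a crude stand-in for the `und-u-ks-level1` collation, not how ICU computes collation keys. The helper names are hypothetical.

```python
import unicodedata

COMPOSED = "\u00e1"          # 'a with acute' as the single code point U+00E1
DECOMPOSED = "\u0061\u0301"  # 'a' followed by COMBINING ACUTE ACCENT


def canonically_equal(a, b):
    """Compare after NFC normalization, so composed/decomposed forms match."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def fold_accents_and_case(s):
    """Crude approximation of a level-1 key: strip combining marks, fold case."""
    decomposed = unicodedata.normalize("NFD", s)
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base_only.casefold()


def ignore_accent_case_equal(a, b):
    return fold_accents_and_case(a) == fold_accents_and_case(b)


if __name__ == "__main__":
    print(COMPOSED == DECOMPOSED)                   # False: different code points
    print(canonically_equal(COMPOSED, DECOMPOSED))  # True: same after NFC
    for a, b in [("\u00c5", "A"), ("z", "Z"), ("n", "\u00f1"), ("y", "z")]:
        print(repr(a), repr(b), ignore_accent_case_equal(a, b))
```

The folding approach misses letters that have no canonical decomposition (for example U+00F8 or U+00DF), which is one reason real accent-insensitive comparison is left to ICU.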

doc/src/sgml/images/genetic-algorithm.svg

Lines changed: 2 additions & 2 deletions

doc/src/sgml/release.sgml

Lines changed: 0 additions & 18 deletions
@@ -16,24 +16,6 @@ pg_[A-Za-z0-9_]+ <application>, <structname>
 \<[a-z]+_[a-z_]+\> <varname>, <structfield>
 <systemitem class="osname">

-non-ASCII characters find using grep -P '[\x80-\xFF]' or
-(remove 'X') grep -X-color='auto' -P -n "[\x80-\xFF]"
-convert to HTML4 named entity (&) escapes
-
-official: http://www.w3.org/TR/html4/sgml/entities.html
-one page: http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
-other lists: http://www.zipcon.net/~swhite/docs/computers/browsers/entities.html
-http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
-https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
-
-We cannot use UTF8 because rendering engines have to
-support the referenced characters.
-
-Do not use numeric _UTF_ numeric character escapes (&#nnn;),
-we can only use Latin1.
-
-Example: Alvaro Herrera is &Aacute;lvaro Herrera
-
 wrap long lines

 For new features, add links to the documentation sections.

doc/src/sgml/stylesheet-man.xsl

Lines changed: 6 additions & 6 deletions
@@ -213,12 +213,12 @@
 <!-- Slight rephrasing to indicate that missing sections are found
 in the documentation. -->
 <l:context name="xref-number-and-title">
-<l:template name="chapter" text="Chapter %n, %t, in the documentation"/>
-<l:template name="sect1" text="Section %n, “%t”, in the documentation"/>
-<l:template name="sect2" text="Section %n, “%t”, in the documentation"/>
-<l:template name="sect3" text="Section %n, “%t”, in the documentation"/>
-<l:template name="sect4" text="Section %n, “%t”, in the documentation"/>
-<l:template name="sect5" text="Section %n, “%t”, in the documentation"/>
+<l:template name="chapter" text="Chapter %n, &quot;%t&quot;, in the documentation"/>
+<l:template name="sect1" text="Section %n, &quot;%t&quot;, in the documentation"/>
+<l:template name="sect2" text="Section %n, &quot;%t&quot;, in the documentation"/>
+<l:template name="sect3" text="Section %n, &quot;%t&quot;, in the documentation"/>
+<l:template name="sect4" text="Section %n, &quot;%t&quot;, in the documentation"/>
+<l:template name="sect5" text="Section %n, &quot;%t&quot;, in the documentation"/>
 </l:context>
 </l:l10n>
 </l:i18n>
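In an XML attribute value, `&quot;` is one of the five predefined XML entities, and the parser expands it to a plain ASCII double quote (U+0022). The rewritten templates therefore keep the stylesheet source free of the curly quotation marks U+201C and U+201D, which lie outside Latin-1, at the cost of rendering straight rather than curly quotes. A quick check with Python's `xml.etree.ElementTree`, as a sketch only: the namespace URI `urn:example:l10n` is a placeholder standing in for whatever the real stylesheet binds to the `l:` prefix, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI bound to the "l:" prefix (assumption for this sketch).
NS = {"l": "urn:example:l10n"}

SNIPPET = """\
<l:context xmlns:l="urn:example:l10n" name="xref-number-and-title">
  <l:template name="chapter" text="Chapter %n, &quot;%t&quot;, in the documentation"/>
  <l:template name="sect1" text="Section %n, &quot;%t&quot;, in the documentation"/>
</l:context>
"""


def template_texts(xml_text):
    """Map each l:template name to its parsed (entity-expanded) text attribute."""
    root = ET.fromstring(xml_text)
    return {t.get("name"): t.get("text") for t in root.findall("l:template", NS)}


def is_latin1(s):
    """True if every character is in the range U+0000..U+00FF."""
    return all(ord(ch) <= 0xFF for ch in s)


if __name__ == "__main__":
    for name, text in template_texts(SNIPPET).items():
        print(name, text, is_latin1(text))
    # The old sect1 value used U+201C/U+201D, which fail the Latin-1 check.
    print(is_latin1("Section %n, \u201c%t\u201d, in the documentation"))
```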
