
Commit bdb839c

Update unicode.org URLs

Use https, consistent host name, remove references to ftp. Also update the URLs for CLDR, which has moved from Trac to GitHub.

1 parent 9abb2bf commit bdb839c

File tree

9 files changed: +31 −31 lines changed


contrib/unaccent/generate_unaccent_rules.py

+8-8
@@ -24,9 +24,9 @@
 # Latin-ASCII.xml, the latest data sets released can be browsed directly
 # via [3]. Note that this script is compatible with at least release 29.
 #
-# [1] http://unicode.org/Public/8.0.0/ucd/UnicodeData.txt
-# [2] http://unicode.org/cldr/trac/export/14746/tags/release-34/common/transforms/Latin-ASCII.xml
-# [3] https://unicode.org/cldr/trac/browser/tags
+# [1] https://www.unicode.org/Public/8.0.0/ucd/UnicodeData.txt
+# [2] https://raw.githubusercontent.com/unicode-org/cldr/release-34/common/transforms/Latin-ASCII.xml
+# [3] https://github.com/unicode-org/cldr/tags

 # BEGIN: Python 2/3 compatibility - remove when Python 2 compatibility dropped
 # The approach is to be Python3 compatible with Python2 "backports".
@@ -113,7 +113,7 @@ def is_mark(codepoint):

 def is_letter_with_marks(codepoint, table):
     """Returns true for letters combined with one or more marks."""
-    # See http://www.unicode.org/reports/tr44/tr44-14.html#General_Category_Values
+    # See https://www.unicode.org/reports/tr44/tr44-14.html#General_Category_Values

     # Letter may have no combining characters, in which case it has
     # no marks.
@@ -226,7 +226,7 @@ def special_cases():
     return charactersSet

 def main(args):
-    # http://www.unicode.org/reports/tr44/tr44-14.html#Character_Decomposition_Mappings
+    # https://www.unicode.org/reports/tr44/tr44-14.html#Character_Decomposition_Mappings
     decomposition_type_pattern = re.compile(" *<[^>]*> *")

     table = {}
@@ -243,7 +243,7 @@ def main(args):
     for line in unicodeDataFile:
         fields = line.split(";")
         if len(fields) > 5:
-            # http://www.unicode.org/reports/tr44/tr44-14.html#UnicodeData.txt
+            # https://www.unicode.org/reports/tr44/tr44-14.html#UnicodeData.txt
             general_category = fields[2]
             decomposition = fields[5]
             decomposition = re.sub(decomposition_type_pattern, ' ', decomposition)
@@ -281,8 +281,8 @@ def main(args):

 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description='This script builds unaccent.rules on standard output when given the contents of UnicodeData.txt and Latin-ASCII.xml given as arguments.')
-    parser.add_argument("--unicode-data-file", help="Path to formatted text file corresponding to UnicodeData.txt. See <http://unicode.org/Public/8.0.0/ucd/UnicodeData.txt>.", type=str, required=True, dest='unicodeDataFilePath')
-    parser.add_argument("--latin-ascii-file", help="Path to XML file from Unicode Common Locale Data Repository (CLDR) corresponding to Latin-ASCII transliterator (Latin-ASCII.xml). See <http://unicode.org/cldr/trac/export/12304/tags/release-28/common/transforms/Latin-ASCII.xml>.", type=str, dest='latinAsciiFilePath')
+    parser.add_argument("--unicode-data-file", help="Path to formatted text file corresponding to UnicodeData.txt.", type=str, required=True, dest='unicodeDataFilePath')
+    parser.add_argument("--latin-ascii-file", help="Path to XML file from Unicode Common Locale Data Repository (CLDR) corresponding to Latin-ASCII transliterator (Latin-ASCII.xml).", type=str, dest='latinAsciiFilePath')
     parser.add_argument("--no-ligatures-expansion", help="Do not expand ligatures and do not use Unicode CLDR Latin-ASCII transliterator. By default, this option is not enabled and \"--latin-ascii-file\" argument is required. If this option is enabled, \"--latin-ascii-file\" argument is optional and ignored.", action="store_true", dest='noLigaturesExpansion')
     args = parser.parse_args()
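The comment URLs fixed above sit around the script's UnicodeData.txt parsing. As a hedged, standalone sketch of that parsing step (field positions per UAX #44: field 2 is the General_Category, field 5 the Decomposition_Mapping with an optional `<type>` tag the script strips; `parse_line` here is a hypothetical helper, not part of the script):

```python
import re

# Strip an optional decomposition-type tag such as "<compat>" (UAX #44).
decomposition_type_pattern = re.compile(" *<[^>]*> *")

def parse_line(line):
    """Parse one semicolon-separated UnicodeData.txt record."""
    fields = line.split(";")
    if len(fields) <= 5:
        return None
    codepoint = int(fields[0], 16)
    general_category = fields[2]   # e.g. "Lu" for an uppercase letter
    decomposition = re.sub(decomposition_type_pattern, " ", fields[5]).strip()
    return codepoint, general_category, decomposition

# U+00C0 LATIN CAPITAL LETTER A WITH GRAVE: category Lu, decomposes to 0041 0300.
print(parse_line("00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;;;;00E0;"))
```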

doc/src/sgml/acronyms.sgml

+1-1
@@ -728,7 +728,7 @@
     <term><acronym>UTF</acronym></term>
     <listitem>
      <para>
-      <ulink url="http://www.unicode.org/">Unicode Transformation
+      <ulink url="https://www.unicode.org/">Unicode Transformation
       Format</ulink>
      </para>
     </listitem>

doc/src/sgml/charset.sgml

+4-4
@@ -832,12 +832,12 @@ CREATE COLLATION german (provider = libc, locale = 'de_DE');
    </varlistentry>
   </variablelist>

-   See <ulink url="http://unicode.org/reports/tr35/tr35-collation.html">Unicode
+   See <ulink url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode
   Technical Standard #35</ulink>
   and <ulink url="https://tools.ietf.org/html/bcp47">BCP 47</ulink> for
   details. The list of possible collation types (<literal>co</literal>
   subtag) can be found in
-   the <ulink url="http://www.unicode.org/repos/cldr/trunk/common/bcp47/collation.xml">CLDR
+   the <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR
   repository</ulink>.
   The <ulink url="https://ssl.icu-project.org/icu-bin/locexp">ICU Locale
   Explorer</ulink> can be used to check the details of a particular locale
@@ -900,7 +900,7 @@ CREATE COLLATION french FROM "fr-x-icu";
   different Unicode normal forms. It is up to the collation provider to
   actually implement such insensitive comparisons; the deterministic flag
   only determines whether ties are to be broken using bytewise comparison.
-   See also <ulink url="https://unicode.org/reports/tr10">Unicode Technical
+   See also <ulink url="https://www.unicode.org/reports/tr10">Unicode Technical
   Standard 10</ulink> for more information on the terminology.
  </para>

@@ -1926,7 +1926,7 @@ RESET client_encoding;
 </varlistentry>

 <varlistentry>
-  <term><ulink url="http://www.unicode.org/"></ulink></term>
+  <term><ulink url="https://www.unicode.org/"></ulink></term>

  <listitem>
   <para>

src/backend/utils/mb/Unicode/Makefile

+7-7
@@ -119,7 +119,7 @@ DOWNLOAD = wget -O $@ --no-use-server-timestamps
 #DOWNLOAD = curl -o $@

 BIG5.TXT CNS11643.TXT:
-	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/$(@F)
+	$(DOWNLOAD) https://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/$(@F)

 euc-jis-2004-std.txt sjis-0213-2004-std.txt:
 	$(DOWNLOAD) http://x0213.org/codetable/$(@F)
@@ -131,19 +131,19 @@ GB2312.TXT:
 	$(DOWNLOAD) 'http://trac.greenstone.org/browser/trunk/gsdl/unicode/MAPPINGS/EASTASIA/GB/GB2312.TXT?rev=1842&format=txt'

 JIS0212.TXT:
-	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/$(@F)
+	$(DOWNLOAD) https://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/$(@F)

 JOHAB.TXT KSX1001.TXT:
-	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/$(@F)
+	$(DOWNLOAD) https://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/$(@F)

 KOI8-R.TXT KOI8-U.TXT:
-	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/$(@F)
+	$(DOWNLOAD) https://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/$(@F)

 $(ISO8859TEXTS):
-	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/ISO8859/$(@F)
+	$(DOWNLOAD) https://www.unicode.org/Public/MAPPINGS/ISO8859/$(@F)

 $(filter-out CP8%,$(WINTEXTS)) CP932.TXT CP950.TXT:
-	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/$(@F)
+	$(DOWNLOAD) https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/$(@F)

 $(filter CP8%,$(WINTEXTS)):
-	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/$(@F)
+	$(DOWNLOAD) https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/$(@F)
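Every rule in this Makefile follows the same pattern: the target's base name (`$(@F)`) appended to a mapping directory under the new canonical https host. A hedged Python sketch of that mapping (the `RULES` table and `download_url` helper are illustrative only, not part of the Makefile):

```python
# Base of the Unicode mapping-file tree after this commit's URL change.
BASE = "https://www.unicode.org/Public/MAPPINGS"

# A few target -> directory pairs taken from the Makefile rules above.
RULES = {
    "BIG5.TXT": "OBSOLETE/EASTASIA/OTHER",
    "JIS0212.TXT": "OBSOLETE/EASTASIA/JIS",
    "KOI8-R.TXT": "VENDORS/MISC",
    "8859-1.TXT": "ISO8859",
    "CP932.TXT": "VENDORS/MICSFT/WINDOWS",
}

def download_url(target):
    """Build the URL a rule would pass to $(DOWNLOAD) for a given target."""
    return f"{BASE}/{RULES[target]}/{target}"

print(download_url("KOI8-R.TXT"))
# → https://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT
```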

src/backend/utils/mb/Unicode/UCS_to_BIG5.pl

+2-2
@@ -8,8 +8,8 @@
 # map files provided by Unicode organization.
 # Unfortunately it is prohibited by the organization
 # to distribute the map files. So if you try to use this script,
-# you have to obtain the map files from the organization's ftp site.
-# ftp://www.unicode.org/Public/MAPPINGS/
+# you have to obtain the map files from the organization's download site.
+# https://www.unicode.org/Public/MAPPINGS/
 #
 # Our "big5" comes from BIG5.TXT, with the addition of the characters
 # in the range 0xf9d6-0xf9dc from CP950.TXT.

src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl

+2-2
@@ -8,8 +8,8 @@
 # map files provided by Unicode organization.
 # Unfortunately it is prohibited by the organization
 # to distribute the map files. So if you try to use this script,
-# you have to obtain the map files from the organization's ftp site.
-# ftp://www.unicode.org/Public/MAPPINGS/
+# you have to obtain the map files from the organization's download site.
+# https://www.unicode.org/Public/MAPPINGS/
 # We assume the file include three tab-separated columns:
 #   JOHAB code in hex
 #   UCS-2 code in hex

src/backend/utils/mb/Unicode/UCS_to_most.pl

+2-2
@@ -8,8 +8,8 @@
 # map files provided by Unicode organization.
 # Unfortunately it is prohibited by the organization
 # to distribute the map files. So if you try to use this script,
-# you have to obtain the map files from the organization's ftp site.
-# ftp://www.unicode.org/Public/MAPPINGS/
+# you have to obtain the map files from the organization's download site.
+# https://www.unicode.org/Public/MAPPINGS/
 # We assume the file include three tab-separated columns:
 #   source character set code in hex
 #   UCS-2 code in hex

src/common/unicode/Makefile

+1-1
@@ -23,7 +23,7 @@ DOWNLOAD = wget -O $@ --no-use-server-timestamps
 # These files are part of the Unicode Character Database. Download
 # them on demand.
 UnicodeData.txt CompositionExclusions.txt NormalizationTest.txt:
-	$(DOWNLOAD) http://unicode.org/Public/UNIDATA/$(@F)
+	$(DOWNLOAD) https://www.unicode.org/Public/UNIDATA/$(@F)

 # Generation of conversion tables used for string normalization with
 # UTF-8 strings.

src/common/unicode_norm.c

+4-4
@@ -3,7 +3,7 @@
  * Normalize a Unicode string to NFKC form
  *
  * This implements Unicode normalization, per the documentation at
- * http://www.unicode.org/reports/tr15/.
+ * https://www.unicode.org/reports/tr15/.
  *
  * Portions Copyright (c) 2017-2019, PostgreSQL Global Development Group
  *
@@ -109,7 +109,7 @@ get_decomposed_size(pg_wchar code)
 	/*
 	 * Fast path for Hangul characters not stored in tables to save memory as
 	 * decomposition is algorithmic. See
-	 * http://unicode.org/reports/tr15/tr15-18.html, annex 10 for details on
+	 * https://www.unicode.org/reports/tr15/tr15-18.html, annex 10 for details on
 	 * the matter.
 	 */
 	if (code >= SBASE && code < SBASE + SCOUNT)
@@ -234,7 +234,7 @@ decompose_code(pg_wchar code, pg_wchar **result, int *current)
 	/*
 	 * Fast path for Hangul characters not stored in tables to save memory as
 	 * decomposition is algorithmic. See
-	 * http://unicode.org/reports/tr15/tr15-18.html, annex 10 for details on
+	 * https://www.unicode.org/reports/tr15/tr15-18.html, annex 10 for details on
 	 * the matter.
 	 */
 	if (code >= SBASE && code < SBASE + SCOUNT)
@@ -362,7 +362,7 @@ unicode_normalize_kc(const pg_wchar *input)
 			continue;

 		/*
-		 * Per Unicode (http://unicode.org/reports/tr15/tr15-18.html) annex 4,
+		 * Per Unicode (https://www.unicode.org/reports/tr15/tr15-18.html) annex 4,
 		 * a sequence of two adjacent characters in a string is an
 		 * exchangeable pair if the combining class (from the Unicode
 		 * Character Database) for the first character is greater than the
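The Hangul fast path these comments point at needs no lookup tables because TR15 (annex 10) defines the decomposition arithmetically. A hedged Python sketch of that arithmetic (the constants come from the standard; the C code's actual structure may differ):

```python
# Hangul decomposition constants, per Unicode TR15 annex 10.
SBASE, LBASE, VBASE, TBASE = 0xAC00, 0x1100, 0x1161, 0x11A7
LCOUNT, VCOUNT, TCOUNT = 19, 21, 28
NCOUNT = VCOUNT * TCOUNT   # 588 syllables per leading consonant
SCOUNT = LCOUNT * NCOUNT   # 11172 precomposed syllables in total

def decompose_hangul(code):
    """Decompose a precomposed Hangul syllable into its jamo code points."""
    if not (SBASE <= code < SBASE + SCOUNT):
        raise ValueError("not a precomposed Hangul syllable")
    sindex = code - SBASE
    lead = LBASE + sindex // NCOUNT
    vowel = VBASE + (sindex % NCOUNT) // TCOUNT
    trail = TBASE + sindex % TCOUNT
    # trail == TBASE means the syllable has no trailing consonant.
    return [lead, vowel, trail] if trail != TBASE else [lead, vowel]

print([hex(c) for c in decompose_hangul(0xAC01)])
# → ['0x1100', '0x1161', '0x11a8']
```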

0 commit comments
