Hyph-Utf8: The Package and Hyphenation With TeX
Main description: June 2011
Latest editorial change: 16 March 2018
Abstract:
In 2008, all the existing hyphenation patterns from TeX distributions were collected in a
single package, hyph-utf8, converted to UTF-8 encoding and adapted for use with different
TeX engines. Unicode-aware engines such as LuaTeX and XeTeX can use the patterns directly;
for pTeX, pdfTeX and Knuth's TeX, a conversion mechanism maps the patterns to the appropriate
8-bit encoding.
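To illustrate the conversion mechanism, here is a simplified, hypothetical loader for the reformed-spelling German patterns. It assumes the file names hyph-de-1996.tex (the UTF-8 patterns) and conv-utf8-ec.tex (a UTF-8 to EC mapping) from the hyph-utf8 distribution, and it detects Unicode engines by testing for the \XeTeXversion and \directlua primitives. The loader files shipped with the package differ in detail; this only sketches the idea.

```tex
% Hypothetical, simplified loader (not the file shipped with hyph-utf8).
% The group keeps any catcode changes made by the conversion file local.
\begingroup
\ifx\XeTeXversion\undefined
  \ifx\directlua\undefined
    % 8-bit engine (e.g. pdfTeX): make the UTF-8 byte sequences in the
    % pattern file stand for the corresponding EC (T1) character codes.
    \message{EC German hyphenation patterns (reformed spelling)}
    \input conv-utf8-ec.tex
  \fi
\fi
% Unicode engines read the UTF-8 patterns directly; 8-bit engines read
% them through the conversion set up above.
\input hyph-de-1996.tex
\endgroup
```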
1 Using hyphenation patterns
1.1 Plain TeX
In formats that read language.def, switch to a set of hyphenation patterns with
\uselanguage{langname}
where langname is the string identifying a particular hyphenation file in language.def (see Section 2).
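A minimal plain TeX file might look as follows. It assumes a format built with language.def, for example the pdftex or etex format in TeX Live (Knuth's own tex format contains only the original US English patterns), and it uses ukenglish from the list in Section 2. \showhyphens is used only to report the chosen break points in the log.

```tex
% Plain TeX sketch; run with a language.def-based format such as pdftex.
\uselanguage{ukenglish}   % British English patterns (en-gb)
A paragraph of British English text, hyphenated with the en-gb patterns.
% Write the hyphenation points found with the current patterns to the log.
\showhyphens{colour favourite organisation}
\bye
```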
1.2 LaTeX
In LaTeX, pass the language name to Babel as a package option:
\usepackage[languagename]{babel}
In 8-bit engines you also need to load a font encoding that supports all the characters used in the
language of your choice, for example:
\usepackage[T1]{fontenc}
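Putting these pieces together, a minimal document for pdfLaTeX could look like this. ngerman is used purely as an example of a language name from Section 2, and T1 is one font encoding that covers the German letters.

```tex
% Minimal pdfLaTeX sketch; "ngerman" is only an example language name.
\documentclass{article}
\usepackage[utf8]{inputenc} % UTF-8 input
\usepackage[T1]{fontenc}    % 8-bit font encoding covering ä, ö, ü, ß
\usepackage[ngerman]{babel} % selects the de-1996 patterns
\begin{document}
Silbentrennung mit den Trennmustern der reformierten Rechtschreibung.
\end{document}
```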
N.B.: You can use Babel with any TeX engine; however, it has never been properly adapted to work well with
Unicode engines. If you are using XeTeX, it is advisable to use Polyglossia instead:
\usepackage{polyglossia}
\setmainlanguage[optional settings]{langname}
\setotherlanguages{otherlangname}
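For comparison, a minimal Polyglossia document for XeLaTeX is sketched below. The language choices are only examples; \textenglish is the inline command Polyglossia provides for a language loaded with \setotherlanguages.

```tex
% Minimal XeLaTeX sketch; german/english are example choices.
\documentclass{article}
\usepackage{polyglossia}
\setmainlanguage{german}
\setotherlanguages{english}
\begin{document}
Deutscher Text, getrennt mit den deutschen Trennmustern.
\textenglish{A short English phrase inside the German text.}
\end{document}
```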
\language=\l@<langname>
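The low-level assignment above switches only the hyphenation patterns; captions, dates and other language settings are left alone. Because \l@<langname> contains @, it needs \makeatletter inside a document. A sketch, assuming the format defines \l@ngerman (as it does when the German patterns are loaded into the format):

```tex
% Low-level pattern switch; assumes \l@ngerman is defined by the format.
\documentclass{article}
\begin{document}
\makeatletter
\language=\l@ngerman
\makeatother
Ein Absatz, der mit den deutschen Mustern getrennt wird.
\end{document}
```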
The user command is supposed to be
\hyphenrules{langname}
or
and should work with any flavour of LaTeX; however, we could not make it work.
1.3 ConTeXt
ConTeXt does not load patterns for all the languages that hyph-utf8 provides. If a language you need is
missing, please contact the mailing list. For supported languages, you can select the language by its full
name or by its two-letter code, for example:
\usetypescript[iwona][qx]
\setupbodyfont[iwona]
\mainlanguage[polish]
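The same selection can be made with the two-letter code. The sketch below assumes ConTeXt MkIV, where the input is UTF-8 and no font encoding has to be chosen; \language[en] switches the language, and with it the hyphenation patterns, inside a single group.

```tex
% ConTeXt MkIV sketch: "pl" is the two-letter code for Polish.
\mainlanguage[pl]
\starttext
Zażółć gęślą jaźń.

% Switch language (and thus hyphenation) inside this group only.
{\language[en] A short English sentence inside the Polish text.}
\stoptext
```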
ConTeXt loads hyphenation patterns in several encodings: the Czech and Slovak patterns, for example, can be
used with both the EC and the IL2 font encodings. The right patterns are chosen based on the current font
encoding.
1.4 Some advanced examples
\documentclass{article}
\usepackage{polyglossia}
% the language used for main document
\setmainlanguage{asturian}
% American English with extended hyphenation patterns
\setotherlanguage[variant=usmax]{english}
% German with experimental patterns "ngerman-x-latest"
\setotherlanguage[spelling=new,latesthyphen=true]{german}
\setotherlanguages{spanish,catalan,french}
\begin{document}
\begin{german}
Deutscher Text ... (with the hyphenation patterns selected above:
"ngerman-x-latest")
\end{german}
\begin{german}[script=fraktur,spelling=old]
Deutſcher Text ... (set in Fraktur, with traditional hyphenation).
\end{german}
\end{document}
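The languages loaded with \setotherlanguages can then be used inline or as environments in the body of the document above. A sketch of such a fragment, following Polyglossia's \text<language> and <language>-environment naming:

```tex
% Fragment for the body of the document above.
\textspanish{Un texto breve en español.}
\textcatalan{Un text breu en català.}
\begin{french}
Un paragraphe en français, coupé selon les motifs français.
\end{french}
% Options can also be given locally, as with the environment form.
\textgerman[spelling=old]{Deutscher Text in alter Rechtschreibung.}
```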
2 List of supported languages
For several languages, there is additional documentation in a separate file: see
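Each entry gives the language, followed by one line per set of patterns: language code, name used in
language.def, and synonyms (if any).
English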
- english usenglish, USenglish, american
en-us usenglishmax
en-gb ukenglish british, UKenglish
Afrikaans
af afrikaans
Ancientgreek
grc ancientgreek
grc-x-ibycus ibycus
Arabic
ar arabic
Armenian
hy armenian
Assamese
as assamese
Basque
eu basque
Belarusian
be belarusian
Bengali
bn bengali
Bulgarian
bg bulgarian
Catalan
ca catalan
Chinese
zh-latn-pinyin pinyin
Church Slavonic
cu churchslavonic
Coptic
cop coptic
Croatian
hr croatian
Czech
cs czech
Danish
da danish
Dutch
nl dutch
Esperanto
eo esperanto
Estonian
et estonian
Ethiopic
mul-ethi ethiopic amharic, geez
Farsi
fa farsi persian
Finnish
fi finnish
French
fr french patois, francais
Friulan
fur friulan
Galician
gl galician
Georgian
ka georgian
German
de-1901 german
de-1996 ngerman
de-ch-1901 swissgerman
Greek
el-monoton monogreek
el-polyton greek polygreek
Gujarati
gu gujarati
Hindi
hi hindi
Hungarian
hu hungarian
Icelandic
is icelandic
Indonesian
id indonesian
Interlingua
ia interlingua
Irish
ga irish
Italian
it italian
Kannada
kn kannada
Kurmanji
kmr kurmanji
Latin
la latin
la-x-classic classiclatin
la-x-liturgic liturgicallatin
Latvian
lv latvian
Lithuanian
lt lithuanian
Malayalam
ml malayalam
Marathi
mr marathi
Mongolian
mn-cyrl mongolian
mn-cyrl-x-lmc mongolianlmc
Norwegian
nb bokmal norwegian, norsk
nn nynorsk
Occitan
oc occitan
Oriya
or oriya
Panjabi
pa panjabi
Piedmontese
pms piedmontese
Polish
pl polish
Portuguese
pt portuguese portuges
Romanian
ro romanian
Romansh
rm romansh
Russian
ru russian
Sanskrit
sa sanskrit
Serbian
sr-latn serbian
sr-cyrl serbianc
Slovak
sk slovak
Slovenian
sl slovenian slovene
Spanish
es spanish espanol
Swedish
sv swedish
Tamil
ta tamil
Telugu
te telugu
Thai
th thai
Turkish
tr turkish
Turkmen
tk turkmen
Ukrainian
uk ukrainian
Uppersorbian
hsb uppersorbian
Welsh
cy welsh
Babel defines a few more synonyms (which consequently only work in LaTeX):
english canadian
british australian, newzealand
german austrian
ngerman naustrian, nswissgerman
portuguese brazilian, brazil
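The Babel-only synonyms are passed to Babel like any other language name. In this sketch, brazilian (the last option) is the main language and naustrian is selected later; per the table above, they use the portuguese and ngerman patterns respectively.

```tex
% Sketch using two Babel-only synonyms.
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[naustrian,brazilian]{babel} % the last option is the main language
\begin{document}
Um parágrafo em português do Brasil.

\selectlanguage{naustrian}
Ein österreichischer Absatz, getrennt mit den ngerman-Mustern.
\end{document}
```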