TB 121 Ltnews 28

48 TUGboat, Volume 39 (2018), No. 1
LaTeX News
Issue 28, April 2018

LaTeX News, and the LaTeX software, are brought to you by the LaTeX3 Project Team; Copyright 2018, all rights reserved.

Contents
A new home for LaTeX 2ε sources
Bug reports for core LaTeX 2ε
UTF-8: the new default input encoding
    The new default
    Compatibility
    BOM: byte order mark handling
A general rollback concept
Integration of remreset and chngcntr packages
Testing for undefined commands
Changes to packages in the tools category
    LaTeX table columns with fixed widths
    Obscure overprinting with multicol fixed
Changes to packages in the amsmath category
    Updated user's guide

A new home for LaTeX 2ε sources
In the past the development version of the LaTeX 2ε source files has been managed in a Subversion source control system with read access for the public. This way it was possible, in an emergency, to download the latest version even before it was released to CTAN and made its way into the various distributions.

We have recently changed this setup and now manage the sources using Git, with the master sources placed on GitHub at

https://github.com/latex3/latex2e

where we already store the sources for expl3 and other work. As before, direct write access is restricted to LaTeX Project Team members, but everything is publicly accessible, including the ability to download, clone (using Git) or check out (using SVN). More details are given in [1].

Bug reports for core LaTeX 2ε
For more than two decades we used GNATS, an open source bug tracking system developed by the FSF. While it served us well in the past, it started to show its age more and more. So as part of this move we also decided to finally retire the old LaTeX bug database and replace it with the standard “Issue Tracker” available at GitHub.

The requirements and the workflow for reporting a bug in the core LaTeX software are documented at

https://www.latex-project.org/bugs/

and, with further details, also discussed in [1].

UTF-8: the new default input encoding
The first TeX implementations only supported reading 7-bit ASCII files: any accented or otherwise “special” character had to be entered using commands, if it could be represented at all. For example, to obtain an “ä” one would enter \"a, and to typeset a “ß” the command \ss. Furthermore, fonts at that time contained 128 glyphs, holding the ASCII characters, some accents to build composite glyphs from a letter and an accent, and a few special symbols such as parentheses.

With 8-bit TeX engines such as pdfTeX this situation changed somewhat: it was now possible to process 8-bit files, i.e., files that could encode 256 different characters. However, 256 is still a fairly small number, and with this limitation only a few languages can be covered by a single encoding; for other languages one would need to change the encoding (i.e., interpret the character positions 0–255 in a different way). The first code points 0–127 were essentially standardized (corresponding to ASCII), while the second half, 128–255, would vary, holding different accented characters to support a certain set of languages.

Each computer used one of these encodings when storing or interpreting files, and as long as two computers used the same encoding it was (easily) possible to exchange files between them and have them interpreted and processed correctly.

But different computers may have used different encodings, and given that a computer file is simply a sequence of bytes with no indication of which encoding is intended, chaos could easily happen, and has happened. For example, the German word “Größe” (height) entered on a German keyboard could show up as “GrŤàe” on a different computer using a different encoding by default.

So in summary the situation wasn't at all good, and it was clear in the early nineties that LaTeX 2ε (which was being developed to provide a LaTeX version usable across the world) had to provide a solution to this issue.

The LaTeX 2ε answer was the introduction of the inputenc package [2] through which it is possible to
provide support for multiple encodings. It also makes it possible to correctly process a file written in one encoding on a computer using a different encoding, and it even supports documents where the encoding changes midway.

Since the first release of LaTeX 2ε in 1994, LaTeX documents that used any characters outside ASCII in the source (i.e., any characters in the range 128–255) were supposed to load inputenc and specify the file encoding in which they were written and stored. If the inputenc package was not loaded, LaTeX used a “raw” encoding, which essentially took each byte from the input file and typeset whatever glyph happened to be in that position in the current font: something that sometimes produces the right result but often enough does not.

In 1992 Ken Thompson and Rob Pike developed the UTF-8 encoding scheme, which enables the encoding of all Unicode characters as sequences of 8-bit bytes. Over time this encoding has gradually taken over the world, replacing the legacy 8-bit encodings used before. These days all major computer operating systems use UTF-8 to store their files, and it requires some effort to explicitly store files in one of the legacy encodings.

As a result, whenever LaTeX users wanted to use accented characters from their keyboard (instead of resorting to \"a and the like), they always had to use

\usepackage[utf8]{inputenc}

in the preamble of their documents, as otherwise LaTeX would produce gibberish.

The new default
With this release, the default encoding for LaTeX files has been changed from the “fall through raw” encoding to UTF-8 when used with classic TeX or pdfTeX. The implementation is essentially the same as the existing UTF-8 support from \usepackage[utf8]{inputenc}.

The LuaTeX and XeTeX engines have always supported UTF-8 as their native input encoding, so with these engines inputenc was always a no-op.

This means that with new documents one can assume UTF-8 input, and it is no longer required to always specify \usepackage[utf8]{inputenc}. But if this line is present it will not hurt either.

Compatibility
For most existing documents this change will be transparent:
• documents using only ASCII in the input file and accessing accented characters via commands;
• documents that specified the encoding of their file via an option to the inputenc package and then used 8-bit characters in that encoding;
• documents that had already been stored in UTF-8 (whether or not they specified this via inputenc).

Only documents that have been stored in a legacy encoding and used accented letters from the keyboard without loading inputenc (relying on the similarities between the input used and the T1 font encoding) are affected.

These documents will now generate an error stating that they contain invalid UTF-8 sequences. However, such documents may easily be processed by adding the new command \UseRawInputEncoding as the first line of the file. This will reinstate the previous “raw” encoding default.

\UseRawInputEncoding may also be used on the command line to process existing files without requiring the file to be edited. For example,

pdflatex '\UseRawInputEncoding \input' file

will process the file using the previous default encoding.

Possible alternatives are re-encoding the file to UTF-8 using a tool (such as recode, iconv or an editor) or adding the line

\usepackage[⟨encoding⟩]{inputenc}

to the preamble, specifying the ⟨encoding⟩ that fits the file encoding. In many cases this will be latin1 or cp1252. For other encoding names and their meaning, see the inputenc documentation.

As usual, this change may also be reverted via the more general latexrelease package mechanism, by specifying a release date earlier than this release.

BOM: byte order mark handling
When using Unicode, the first bytes of a file may be a so-called BOM character (byte order mark) indicating the byte order used in the file. While this is not required with UTF-8 encoded files (where the byte order is known), it is nevertheless allowed by the standard, and some editors add that byte sequence to the beginning of a file. In the past such files would have generated a “Missing \begin{document}” error or displayed strange characters when loaded at a later stage.

With the addition of UTF-8 support to the kernel it is now possible to identify and ignore such BOM characters even before \documentclass, so these issues will no longer show up.

A general rollback concept for packages and classes
In 2015 a rollback concept for the LaTeX kernel was introduced. Providing this feature allowed us to make corrections to the software (which more or less didn't happen for nearly two decades) while continuing to maintain backward compatibility to the highest degree.

In this release we have now extended this concept to the world of packages and classes, which was not covered initially. As the classes and the extension packages
have different requirements compared to the kernel, the approach is different (and simplified). This should make it easy for package developers to apply it to their packages, and for authors to use it when necessary.

The documentation of this new feature is given in an article submitted to TUGboat and also available from our website [3].

Integration of remreset and chngcntr packages into the kernel
With the optional argument to \newcounter, LaTeX offers to automatically reset counters when some other counter is stepped; e.g., stepping the chapter counter resets the section counter (and, recursively, all other heading counters). However, what was missing until now was a way to undo such a link between counters, or to link two counters after they have been defined.

This can now be done with \counterwithout and \counterwithin, respectively. In the past one had to load the chngcntr package for this. At the programming level we also added \@removefromreset as the counterpart of the already existing \@addtoreset command. Up to now this was offered by the remreset package.

Testing for undefined commands
LaTeX packages often use \@ifundefined to test whether a command is defined. Unfortunately this had the side effect of defining the command as \relax if it had no definition. The new release uses a modified definition (using extra testing possibilities available in ε-TeX). The new definition is more natural; however, code that relied on a previously undefined command becoming defined (as \relax) as a side effect of being tested may have to add \let⟨command⟩\relax.

Changes to packages in the tools category

LaTeX table columns with fixed widths
Frank published a short paper in TUGboat [4] on producing tables that have columns with fixed widths. The outlined approach, using the column specifiers “w” and “W”, has now been integrated into the array package.

Obscure overprinting with multicol fixed
A rather peculiar bug in multicol was reported on StackExchange. If the column/page breaking was fully controlled by the user (through \columnbreak) instead of letting the environment do its job, and if more \columnbreak commands then showed up on the last page, the balancing algorithm was thrown off track. As a result, some parts of the columns overprinted each other.

The fix required a redesign of the output routines used by multicol, and while it “should” be transparent in other cases (and all tests in the regression test suite came out fine), there is the off-chance that code that hooked into the internals of multicol needs adjustment.

Changes to packages in the amsmath category
With this release of LaTeX a few minor issues with amsmath have been corrected.

Updated user's guide
Furthermore, amsldoc.pdf, the AMS user's guide for the amsmath package [5], has been updated from version 2.0 to 2.1 to incorporate changes and corrections made between 2016 and 2018.

References
[1] Frank Mittelbach: New rules for reporting bugs in the LaTeX core software. In: TUGboat, 39#1, 2018. https://www.latex-project.org/publications/
[2] Frank Mittelbach: LaTeX 2ε Encoding Interface — Purpose, concepts, and Open Problems. Talk given in Brno, June 1995. https://www.latex-project.org/publications/
[3] Frank Mittelbach: A rollback concept for packages and classes. Submitted to TUGboat. https://www.latex-project.org/publications/
[4] Frank Mittelbach: LaTeX table columns with fixed widths. In: TUGboat, 38#2, 2017. https://www.latex-project.org/publications/
[5] American Mathematical Society and The LaTeX3 Project: User's Guide for the amsmath package (Version 2.1). April 2018. Available from https://www.ctan.org and distributed as part of every LaTeX distribution.