YP.-QY: Conversion of TEX Fonts Into Type 1 Format
YP.-QY: Conversion of TEX Fonts Into Type 1 Format
YP.-QY: Conversion of TEX Fonts Into Type 1 Format
-QY
Conversion of TEX fonts into Type 1 format
ter Szabo
Pe
abstract. This paper analyses the problem of converting TEX fonts to Type 1 fonts,
describes TEXtrace, a new free conversion program, and compares it to other possible
methods and existing utilities. TEXtrace works by rendering the font in high
resolution and then tracing (vectorizing) it.
keywords: PDF, font conversion, Type1, METAFONT, vector, outline, raster, bitmap,
pdfTEX
"
here are two major ways to describe the shape of the glyphs1 of a font: using
bitmap images or vector outlines. A bitmap image is a rectangular array of
dots; each dot can be painted white or black independently from the others.
At a high resolution of around 600 DPIthe human eye cannot distinguish individual
pixels and sees the glyph as if it was continuous. On the other hand, vector outlines
specify the black regions by the curves (lines, arcs, ellipses and others) which delimit
them. The most important advantage of this representation is that glyph outlines can
be transformed (rotated, scaled etc.) without quality loss.
Most fonts used in todays desktop publishing systems are distributed in some kind
of vector outline format. (Such formats are: Type 1, TrueType, OpenType, Speedo,
METAFONT source.) The process of converting an outline font to its bitmap equivalent
is called rendering or rasterization. The rendering must surely happen before actually
printing the document onto paper or displaying it on screen, because todays printers
and screens display everything using a rectangular layout of dots. So the difference
between document preparation systems lies not in the presence of the rendering, but
the moment when it occurs.
When printing a document onto paper, the exact time instant of the rendering is
not important 4 the author wants the final output be beautiful, and all other inner
parts of the printing process are of minor importance. However, the case is different
when the document is about to be distributed in electronic form. Since readers might
use different software for viewing and/or printing, the author should ensure that the
document looks and prints nice in most environments, i.e. the document should be
Many
1 glyph:
szabo
created to be portable. This involves selecting a font format that works well in most
reader software, and using this font format exclusively in electronically published
versions of the document.
In our specific case, the file format used for document publishing is PDF, and one
of the best font formats for embedding is Type 1, a traditional vector outline format. (These will be presented later in more detail.) Todays TEX-to-PDF generating
software support Type 1 fonts, but unfortunately most TEX fonts are currently not
available in Type 1 format. So they have to be converted.
This paper focuses on a quite difficult font conversion problem. The most common
source format, METAFONT is rather complicated (in fact, it is an almost full-featured
programming language), and the destination format, Type 1 has too many restrictions.
Probably this is the reason why there have been no good and robust programs to
do the conversion. TEXtrace, described later in this document, overcomes the first
problem by not even trying to understand the METAFONT files, but letting the original
METAFONT program render the glyphs, and then calling the AutoTrace program which
traces the bitmaps (approximates them with vector outlines). Fortunately AutoTraces
output needs only minor modifications to comply the restrictions of the Type 1 format.
The main disadvantage of this blind conversion approach is a minor quality loss
caused by both the rendering and the tracing process.
The availability of the fonts in a vector outline format is of primary importance
when publishing the document as PDF, because Acrobat Reader, the most widely
used PDF viewer displays bitmap fonts slowly and ugly on screen.
Motivation
Why PDF?
The PDF file format is very popular nowadays among authors to distribute their work
electronically. PDF has become popular because it
9
9
9
represents letters, spaces, line breaks and images etc. accurately: documents retain
their structure and original visual appearance between PDF viewers (as opposed
to markup languages, such as SGML and HTML).
follows the industry standard PostScript page drawing model, PostScript files can
be converted to PDF without quality loss (and vice versa).
is portable and device-independent. Printing PDF files and rendering them on
screen is often easier and more straightforward than doing the same for PostScript
files. It is possible to create a portable PDF document, but one can never be sure
about PostScript portability.
has widespread freely available viewers (e.g. Acrobat Reader: either stand-alone
or in the web browser, Ghostscript, xpdf, PDF e-Book readers) for most platforms.
Most systems have some sort of PDF viewer installed.
has hyperlink, form submit and other web-related capability. (This article ignores
these facilities and deals only with static PDF files.)
Figure 1: The difference between vector (nice) and bitmap fonts in Acrobat Reader
is well suited for both short and long documents, including scientific articles, reports, books and software documentation.
Of course, it is possible to convert TEX output into PDF format either by calling
pdfTEX [6], which directly creates PDF files instead of DVI files, or using a DVI-toPDF or PostScript-to-PDF converter. Preparing a PDF version of a TEX document
is somewhat harder than just printing the PostScript version, and requires a bit more
time. To enjoy the benefits listed above, the author must use the proper version of
the utilities, she must adjust the settings, and she must make sure that the proper
versions of the fonts get embedded.
szabo
is merely syntactical: similar to converting plain text into a program that just prints
the text into its standard output and quits: we just convert a glyph outline to METAFONT drawing instructions that draw the outline. However, this direction of conversion
is quite useless, because TEX can already deal with Type 1 fonts well (even for PDF
files), plus the inner structure of the glyphs wouldnt be revealed, so wed lose the
benefit of the geometrical (programmed) font generation philosophy of METAFONT.
The other way, converting .mf to Type 1 is expected to be harder, because the
METAFONT language is much more complicated than the Type 1 syntax. So we cannot
convert the METAFONT programs exactly without understanding them. Because
current .mf files tend to be quite complicated, full of dirty tricks, and they are utilizing
most features of the language, the only reliable way to understand these files is using
a full METAFONT interpreter. There are two such interpreters available: mf itself and
mpost (METAPOST). mf doesnt fit our needs, because it generates bitmap images
and not vector outlines. On the other hand, mpost generates EPS (Encapsulated
PostScript) output, containing vector outlines which are very similar to the Type 1
outlines. But, due to the restrictions of the Type 1 format, these outlines must be
post-processed. This cleanup process is so complicated that currently no program
exists that can do it reliably. Alternative ways of using of METAFONT or METAPOST
for conversion will be discussed two sections later.
Of course 3 as in the case of TEXtrace 4 one may go ahead without trying to
understand the METAFONT language, and looking at only mfs bitmap output. These
bitmaps must somehow be converted (traced) into vector outlines that approximate
the bitmap. Fortunately there exist automatic utilities for this.
A tour from CM to EC
TEXs default and most common font family is the Computer Modern (CM) family,
designed by the author of TEX and METAFONT, Donald Knuth himself, between 1979
and 19855 . The fonts were available only in METAFONT format till 1992, when BlueSky
converted them to Type 1. Now the Type 1 .pfb files are available from CTAN and
they are part of most standard TEX distributions (such as teTEX and MikTEX).
pdfTEX is configured to use these .pfb files by default (pdftex.cfg includes bsr.
map which contains all the available font names), so there should be no problem when
creating PDF from those TEX documents which use the CM fonts.
The EC abbreviation stands for the European Computer Modern family, designed
by Jorg Knappen between 1995 and 1997. European means that these fonts suit
typesetting in some non-English European languages better than the CM fonts. Apart
from this, the EC fonts are basically the same as the CM fonts with a couple of minor
additions and adjustments. It is almost impossible to distinguish a font in the EC
family from its CM counterpart6 . Tha main difference between these two families is
technical: the EC family contains the accented characters directly in the font, while
the CM family contains the letters and the accents separately and TEX macros are
used to position the accents. The EC fonts are thus larger, but they have an important
5 for
those who are still wondering: this paper uses the Computer Modern fonts almost exclusively.
of the visible differences are the accents; for example the CM acute accent is taller, and the
CM Hungarian umlaut accent is more angled than its EC counterpart
6 one
szabo
benefit: text entered using those fonts can be automatically hyphenated by TEX (in
the case of CM fonts the macros prevent automatic hyphenation).
Unfortunately the EC fonts are not available in Type 1 format yet, and most European TEX users are suffering from this problem. The primary reason why TEXtrace
was written is to do the conversion. However, several methods existed before TEXtrace to overcome this Ugly Bit. Some of them are:
9
The AE fonts with the LATEX package ae simulate the EC fonts from the available
CM fonts with virtual fonts7 . Both requirements are fulfilled: automatic hyphenation works because the virtual font gives TEX all the accented glyphs, and PDF
files look nice because the virtual font points to the CM fonts, already available in
Type 1 format from BlueSky.
The most serious problem of the AE fonts is that only the most commonly used
EC fonts are simulated, and they are available only in some of the EC design sizes
(58 of 14). Of course some glyphs are missing because they were missing from the
CM fonts (examples: eth , gulliemots ). Another problem is that the metrics
are a bit different from the original font, and this affects line and page breaks of
the document. (This could be easily solved by overwriting the .tfm files from the
good EC fonts.)
Correcting the accents and creating all design sizes would have been possible
with a couple of weeks of work, but the missing glyphs imposed an unsolvable
limitation. And the method worked only for the EC fonts, it couldnt cope with
all TEX fonts.
The ZE fonts are similar to the AE fonts, but they collect the missing glyphs from
common Adobe fonts (such as /Times-Roman). Apart from this, they have the
same fundamental problems as the AE fonts.
Automatic hyphenation works well when typesetting an English (and other latin
non-accented) document. The already available CM fonts can be used for this.
In LATEX, the only thing the author must ensure is that the document preamble
doesnt contain \usepackage{t1enc}.
There are language-specific conversion utilities which modify .tex source files and
insert discretionary hyphens (\-) into the appropriate places. So the hyphenation
problem is solved, the automatic hyphenation provided by TEX isnt used at all.
These utilities are not part of TEX itself, and have different interface for different
languages. They contradict the generic, consistent and accurate way as Babel
works, and make typesetting with TEX more difficult and error-prone. The solution
is font-independent, so it works with the CM fonts.
Thus, in the case of the Computer Modern family, one can use the tricks above to
manage without the Type 1 versions of the EC fonts. But there are no such tricks
for most other TEX fonts: they need conversion, and for the conversion, one needs a
proper converter.
7a
TEX virtual font builds its glyphs from glyphs of other fonts. Virtual fonts can be used for
re-encoding and/or collecting glyphs from several fonts.
VX]X-X]XW
TEXtrace
TEXtrace is a collection of scripts for UNIX that convert any TEX font into a Type 1
.pfb outline font immediately suitable for use with DVIPS, pdfTEX, Acrobat Reader
(and many other programs). The documents using these fonts cannot be visually
distinguished from the documents using the original TEX fonts, moreover the PDF
documents display quickly and nicely in Acrobat Reader.
The main goal for writing TEXtrace was to convert the EC fonts to Type 1 format
with it. This goal has been fulfilled, 372 of the 392 font files (28 fonts 14 design sizes)
are available as .pfb files with an average file size of 86 kb. The missing 20 files are the
Typewriter fonts at 5pt and the experimental Fibonacci and Funny fonts at various
sizes. They are missing because METAFONT failed to generate them. The conversion
took 40 hours with a 333 MHz Celeron processor and 128 Mb memory.
The 392 TC (Text Companion) fonts and some others have been also converted.
(With some failures, because of METAFONT overflow or font program error).
TEXtrace is in beta status, the author is waiting bug reports from users.
How it works
TEXtrace renders all the 256 glyphs of the TEX font in a very high resolution (
7000 DPI), then it converts the resulting bitmaps to individual vector outlines, and
finally it assembles these outlines to form a syntactically correct Type 1 font. The
final result is a .pfb font file, which can be read by both DVIPS and pdfTEX to be
included into documents.
The principle sounds simple, but the implementation is quite complicated because
of the usual quirks of programs used, incompatible software and loose (or disobeyed)
file format specifications.
TEXtrace operates fully automatically. It doesnt require human intervention,
it doesnt ask questions. The output of TEXtrace is useful immediately, needs no
further conversion, re-editing or adjustment in font editors. TEXtrace can operate
in batch mode; this means converting all the fonts listed in a .map file.
The operations of TEXtrace for a single font:
1. A .tex file is generated, each page with a different glyph from the font.
Not only the shape of the glyph is important, but the absolute coordinates of
the character origin must also be specified: this guarantees that glyphs will line
up to basepoint and follow each other the same distances in the original and the
converted font. So the TEX primitive \shipout is used to output pages. This way
page numbers and the various glues are completely avoided, so the glyph is typeset
to a well-defined, exact place on the page.
The font is requested at a relatively large size (120.45 pt for 600 DPI and fonts
with design size of 10 pt), so that 1 em will be 1000 pixels long when rendered.
TEX is instructed to output glyph metrics into a separate .log file.
szabo
2. tex is invoked to create the .dvi file from the .tex file. This is a quite simple job
for TEX: no line or page breaks etc. occur. As a side effect, tex runs mktextfm (or
whatever font generation mechanism is used in the TEX distribution) if the .tfm
file was missing.
3. dvips is run, which converts the .dvi file to .ps and automatically embeds the
font into the .ps file.
DVIPS reads the font from the appropriate .pk file (e.g. ecrm1000.7227pk)
most of the time. If the .pk doesnt already exist (which is probably true since
the chances for a previous use of the font at 120.45 pt or so are negligible), it calls
mktexpk to create it. mktexpk looks around and passes the job to mf.
All this happens automatically, without TEXtrace having to know about TEX
font generation and embedding internals.
4. gs (GNU Ghostscript) is called to render (rasterize) the .ps file.
This is almost a syntactical conversion, because METAFONT fonts are already
inserted into .ps file in rasterized bitmap .pk format by DVIPS8 . Of course the
output bitmaps of Ghostscript are not sent to the printer, but they are saved to
individual image files (in the PBM format).
Special care must be taken for glyphs with negative sidebearing (i.e. glyphs
extending to the left from their origin). For normal printing, this isnt a problem,
because a sub-millimeter negative sidebearing is negligable with a left margin of
2 cm. In the case of TEXtrace, we must make sure that the glyph wont stick out
of the page. This could be accomplished by moving a large (but constant) amount
to the right and hoping that the left sidebearing is smaller than that amount. TEXtrace uses a more precise approach: it measures the glyph bounding box (with a
careful trickery using the bbox device of Ghostscript), and moves the appropriate
amount to the right.
5. The individual bitmaps are converted to vector outlines by the free external program AutoTrace (written by Martin Weber <martweb@gmx.net>).
This converter supports several input and output file formats, but these are
mostly equivalent. The EPS format is used for output, because it is very similar
to how Type 1 represents glyphs. However, Type 1 imposes some restrictions on
the glyphs, so a little modification of the original AutoTrace output routine was
required to obey these restrictions:
9 Path directions (clockwise/counter-clockwise) had to be corrected, so that black
regions would always be to the right-hand side from the path.
9 Multiple paths had to be converted into a single one.
9 The initial clearing of the image to white color had to be removed.
9 Curves didnt touch each other in the original output, so no modifications were
necessary.
The AutoTrace program has bugs: it errs out with some failed assertion error
for 0.12% of the glyphs. The author of AutoTrace is aware of the bugs, but he
currently has no fix for them. The bug is worked around empirically, by calling
8 an
exception: when DVIPS has embedded a font in other than .pk format.
original
AutoTrace
Freehand
CorelTrace
Figure 2: AutoTrace compared to other tracing software
Streamline
original
-corner-threshold 80
-corner-threshold 90
Figure 3: AutoTrace cannot find corners at low resolution
10
szabo
Availability
TEXtrace is free software, licensed under the GNU General Public License. It can
be downloaded from its home page:
http://www.inf.bme.hu/~pts/textrace/
TEXtrace requires a UNIX system with the teTEX distribution, GNU Ghostscript,
Perl and GNU Bash to run. Successful runs have been reported on MacOS X and
Win32, but those platforms are not supported. Of course, .pfb font files generated by
TEXtrace are platform independent, and they can be used in any TEX environment.
Some fonts converted by TEXtrace are available for download from the same home
page. These include all the EC and TC fonts, some missing AMS fonts and others.
Problems to be solved
9
Huge font files. The .pfb files TEXtrace generates are around 3.15 times larger
than the optimum. This is mostly because of AutoTrace cannot find the optimal
Bezier-curves that describe the outline. To solve this, AutoTrace has to be partially
rewritten or an other tracer must be used. (The author doesnt know of free tracers
comparable to or better than AutoTrace.)
Corners get smoothed. Finding the corners (sharp angles) of the outline is a weak
point AutoTrace: sometimes it recognizes corners as small arcs. The author tried
adjusting the tracing parameters to solve the problem, but hasnt succeeded.
Hinting information is missing. Type 1 fonts use the so-called hinting technology
to improve rendering on small resolutions (e.g. on screen, 100 DPI). This requires
global and glyphwise additional information in the font, called hints. To achieve
high quality, this information should be inserted by experts with great care, the
process can be automated only with loss of quality. Fonts generated by TEXtrace completely lack hinting, so glyphs are expected to be badly readable in low
resolutions. Fortunately, in practice, fonts generated by TEXtrace can be read
quite well in Acrobat Reader unless one turns antialiasing off explicitly9 .
Conversion is slow. With a 333 MHz Celeron processor and 128 Mb of memory,
conversion of a single font (e.g. ecit1728) takes around six minutes. This is
tolerable unless one wants to convert hundreds of fonts (such as all the 392 EC
fonts). Although this might sound quite slow, currently TEXtrace is the fastest in
its category. Moreover, the process is fully automatic, and the result is immediately
9 with
FilePreferencesGeneralSmooth. . .
11
usable without any further human intervention. Although doubled speed could be
achieved with complete rewrite of the code, this wouldnt worth optimizing.
Limited portability. TEXtrace is requires a UNIX environment, but could be easily
ported to other systems (with TEX, DVIPS and Ghostscript) in 12 days of work.
TEXtrace consists of programs written in Bash (shell script), Perl, PostScript and
ANSI C. The C part cannot be avoided because the engine, AutoTrace is written in
C. The PostScript part is OK, because Ghostscript is needed anyway. Shell scripts
can be easily rewritten in Perl, and Perl scripts are possible to be rewritten in C,
but it would require too much work.
Written for experts. Although the documentation describes the conversion process
for beginners, expert level knowledge is needed to enjoy TEXtraces benefits. The
selection of source fonts, and the proper usage of the destination fonts in documents
is left to the user. So is the recovery from errors: TEXtrace is quite paranoid and
stops on most errors, but the error message doesnt always indicate the real cause,
and the user is left alone for figuring out and making corrections.
Fails for some .mf files. METAFONT uses fixed point arithmetic to make rounding
errors identical on all architectures. The bits representing the fractions are chopped
from the integer part, so the range of numbers in METAFONT is small. The large
rendering resolution asked by TEXtrace requires large numbers. This results
sometimes in overflow errors for some glyphs, and mf fails to generate those glyphs.
TEXtrace currently refuses to convert fonts with bad glyphs. The problem is hard
to solve in the general case: the implementation of arithmetics in METAFONT
should be rewritten from scratch.
No Unicode support. TEXtrace operates on normal TEX fonts with 256 or less
glyphs. Although theoretically it wouldnt be hard to implement Omega and/or
Unicode support, it is not available yet. TEXtrace deals only with glyph indices,
because encodings would be already resolved by the time fonts generated by TEXtrace are needed.
12
szabo
psfonts.map.
9 pdfTEX. The file is any of the .map files mentioned in pdftex.cfg (in texmf/
pdftex/config). There is no default filename.
Note that the original .map files do not have to be overwritten, it is enough to
place the new version into the current directory, thanks to Kpathsea.
2. appending a line describing the font to the .map file. This line is of the form:
hTEX-font-namei hPostScript-FontNamei <hPFB-file-namei
Example:
ecrm1000 TeX-ecrm1000 <fcrm1000.pfb
The hPostScript-FontNamei can be read from the .pfb file after /FontName, but in
the case of TEXtrace it is simply the word TeX- prepended to hTEX-font-namei.
See the file contrib/ecpfb.map of the TEXtrace distribution.
3. copying the .pfb file to its final, system-wide location. This step isnt mandatory,
the font is also functional in the current directory, thanks to Kpathsea. The most
common system-wide location for Type 1 fonts is texmf/fonts/type1/*.
4. testing whether the utility really finds the .pfb file by running it on a sample
document and checking the console output whether it includes the .pfb file.
VX]X-X]XW
Alternatives
The method used by TEXtrace isnt ultimate, there are many other possible solutions.
13
Respects of comparison
(F)Final quality of the output. This is the most important respect. Legibility and
beauty must be considered in both low (screen) and high (printer) resolutions.
The converted font must be accurately reflect the original, there should be no
visual difference, not even in the smallest corners.
(C)Compatibility. The output must be compatible with both document preparation
and reader software.
(S) Size. Although the capacity of hard disks increase rapidly nowadays, font files must
be kept small because they might be loaded into printers very limited memory (even
as few as 512 kb). The font sizes also affect download time of the PDF document.
(A)Amount of human effort needed. The more automated the process is, the better.
Human effort is expected to be much for TEX fonts, because they are available in
many (often 10 or more) design sizes and many styles.
Possible approaches
9
Avoid conversion by using a virtual fonts to get a mixture of fonts that are already
available in Type1 format. (See the subsection about EC fonts above.)
The most serious limitation is that this approach is useless if no similar font
is available in Type 1 format. (F), (C), (S): no problems. (A): it is not difficult to arrange that metrics and displacements in the virtual fonts are generated
automatically.
The AE, ZE and ZD fonts use this method.
Design new fonts parallel in METAFONT and Type1. Develop a system that glues
existing tools to aid parallel font development by allowing generation of both fonts
from the same (new) source file (see [4]).
The most serious problem is that existing fonts cannot be converted this way.
Modify METAFONT somehow to output Type1 fonts (or vector outlines) instead
of bitmaps.
Modifying the mf engine itself would require too much effort. Another way is to
modify only the output generation process of METAFONT by adding macros and
.log file processing scripts. Such utilities already exist (for example: mf2ps by
Daniel Berry and Shimon Yanai and the MF-PS package by Boguslav Jackowski),
but they no longer work, cannot comply the restrictions of the Type 1 format or
work only for .mf files specially written for using them.
Interpret and execute .mf files directly, suppressing METAFONT altogether.
This would be as hard as to completely rewrite METAFONT from scratch having
Type 1 output capability in mind, because METAFONT is a full, complex programming language and most .mf files make use of this complexity. A faithful rewrite
of METAFONT is really hard because there are too many (dirty) tricks in both the
language and the current implementation.
Post-process METAFONTs output.
Effectively, post-processing means tracing. This is what TEXtrace does. (F):
font quality is far from the best, because of hinting information is missing. Traced
14
szabo
fonts looks almost always ugly in low resolutions, unless (H) a hinting is inserted
manually, using a great deal of human resources. (S): font size can be reduced
with sophisticated tracing techniques, possibly exploiting the curve representation
model of the renderer. AutoTrace isnt so sophisticated, so it generates somewhat
large fonts.
9
Hinting mechanism is much better, fonts are eye-pleasing to read even on screen
and without antialiasing.
It uses second order curves instead of third order (Bezier-) curves. Second level
curves approximate the outline less precisely, so more control points are needed to
achieve the quality of Type 1. This results larger file sizes.
For our intentions, neither of these respects is really relevant; the two font formats
provide an equally good solution for the problem. However, considering compatibility
issues, Type 1 turns out to be more adequate: it is older than TrueType, and more
TEX-related utilities support it. Moreover, Type 1 has no dialects, can be represented
in plain text canonically, and is easier to assemble and disassemble.
VX]X-X]XW
15
Conclusion
For those dealing with computers, Ugly Bits are experienced in everyday life, not just
in memories. European TEX users have suffered for years because of an Ugly Bit in
font inclusion to PDF documents. With the appearance of TEXtrace, the problem
became less severe (and documents became much less ugly).
Font conversion retaining the quality of the original is almost as tough as designing
new fonts. For best results, both of them must be done manually. However, handconverting hundreds of similar fonts is an enormous work, needs expert level knowledge
and isnt really inspiring. Probably thats the reason why most of the TEX fonts
werent available in Type 1 format despite of the great need for it. (It is clear that the
conversion in high quality is possible.)
TEXtrace is an automatic and generic converter. It doesnt provide the best quality
theoretically achievable in todays technology and human resources, but its output is
good enough for most purposes. The three most important benefits of TEXtrace are
that it works for all TEX fonts (with a few technical exceptions), it runs completely
automatically, without any human intervention, and it is free.
Due to the reasons mentioned above, professional quality Type 1 variants of all
popular TEX fonts are not expected to appear for years. Until this happens, fonts
generated by TEXtrace will be the best alternative.
YP-.QY
References
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
r
George Williams. PfaEdit A PostScript
Font Editor, Users manual. http:
//pfaedit.sourceforge.net/overview.html, 2000.