Chung-Hwa Buddhist Journal (2012: 25:149-166)
Taipei: Chung-Hwa Institute of Buddhist Studies
ISSN: 1017-7132
A Relational Database for Text-Critical Studies
Wojciech Simson
Univ. of Zürich
Abstract
After a brief introduction to the scope of a project digitizing the Confucian Analects and a
short explanation of the working principles of relational databases in general, the
architecture of the relational database actually used in the project is outlined. The database
was designed to store, classify, compare and sort textual variants and to handle a
considerable number of textual witnesses in such a way that strains of transmission can
be compared with one another. Some attention is also paid to the handling of problems
typical of manuscripts, such as illegible or doubtful characters, lacunae and non-standard
characters that are not included in the Unicode standard. The database is further enhanced
by a tagging system that makes it possible to classify and analyze different types of variants. Finally,
an evaluation of the whole system and suggestions for its further development are given.
Keywords:
Digitization, Relational Database, Textual Criticism, Stemmatology, Lunyu
Introduction
Whereas most papers in this volume deal with the digitization of East Asian texts by
means of mark-up languages, the following project is quite different as to the digitization
method and the kind of text that has been digitized. The text in question is not Buddhist but
Confucian and is none other than the well-known Analects or Sayings of Confucius (論語),
and it was digitized not in a mark-up language but in a relational database. I think,
however, that there is quite a bit of common ground: Among the copious textual
witnesses of the Analects incorporated into the database there were, among others, more
than 70 fragments of Tang-period manuscripts stemming mostly from Dunhuang and partly
from the ancient city of Gaochang near the modern village of Astana in Xinjiang province.
They are very similar in age and in provenience to the Chan texts. The problems with
digitization are, therefore, similar: we regularly have to deal with variant characters, some
of which cannot be found even in the largest dictionaries, as well as many textual variants,
hardly legible or even illegible passages, and large lacunae. These features are even more
prevalent in the Analects manuscripts than in Buddhist texts, because the copies of the
Analects were produced not by accomplished scribes but by children who underwent an
elementary education in a more or less public school that must have been integrated into
the monastery of Dunhuang. The Analects manuscripts were never intended to be treated
as holy script that was to be preserved for future generations in a library. They look rather
like the wastepaper left over from the school’s daily practice. Owing to the very frequent
scribal errors committed by the young students, the Analects manuscripts have to be treated
with great caution as witnesses of the ancient text. They are nevertheless very important witnesses,
because they antedate by several centuries the earliest extant printed editions on which the
textus receptus is based. Moreover, the Dunhuang manuscripts represent strains of
transmission that are clearly distinct from the printed editions and are, therefore, of great text-critical
interest. The very frequent corruptions in the manuscripts were not regarded as a
deficiency of this material but, on the contrary, became a special focus of interest.
Scope of the Project
The primary goal of the project was not to produce a critical edition of the Analects, as
might be expected from what has been said so far, but to provide the necessary material
and methods for such a task. The scope of the project was therefore:
1) To gather the relevant material and to arrange it as flexibly as possible for further
investigation.
2) To study the mechanisms of textual corruption, i.e. to determine the conditions
under which certain corruptions occur and, where possible, to establish rules that
would enable a textual critic to distinguish original readings from errors. For this
purpose it turned out to be a great advantage that the elementary students in
Dunhuang and Astana had produced a great number of very obvious scribal
errors that could not be regarded as valid textual variants but made it possible to
determine with great certainty which reading is original and which is a corruption.
Without a clear and reliable identification of the original readings and the errors
it would have been simply impossible to develop an adequate
understanding of the corruption process.
3) To test the applicability of stemmatology to Chinese manuscripts.[1] Stemmatology
has been a major and, in certain cases, extremely efficient and reliable text-critical
tool in biblical studies and classical European philology.
4) Finally, the project resulted in a comprehensive textual history of the Analects,[2]
separating and describing the main strains of textual transmission and distinguishing
within these strains the dependent from the independent textual witnesses.
The Relational Database Approach
Most contributors to the present volume are concerned with the digitization of
manuscripts, i.e. they produce digital representations of manuscripts that can be
reproduced and read in standard Chinese characters and can be electronically searched or
processed otherwise on a computer. Features of the manuscript that might be of interest
but cannot be represented in standard Chinese characters like lacunae, uncertain readings,
emendations and many others are usually represented by means of a mark-up language, a
versatile and extendable code that has been designed to describe such features.
The present project was concerned not so much with the manuscripts themselves as
with their differences. From the beginning it seemed to be a detour to digitize every
single textual witness of the Analects. Individual digital representations of the
many textual witnesses could easily have been produced by introducing their variants into an
already digitized version of the text, but these variants would then have had to be
extracted again from the digitized versions by collating them with
specialized software. The whole procedure would have been very susceptible to input
and processing errors, compatibility problems etc. It seemed, therefore, more straightforward
to store only the variants right from the start. Storing them in a relational
database made it possible to keep the data in a relatively flexible form that allowed further
modifications and, most importantly, could be searched and sorted according to various
criteria.
[1] Cf. Simson (2002).
[2] Simson (2006).
Relational Databases
For those not familiar with relational databases, this section gives a short outline of their
underlying working principles; it may be skipped by those who are well
acquainted with them.
Relational databases were not developed to store whole texts of unlimited length.
They are mainly designed to store and handle different types of standardized information
of a well-defined length. Each type of information is assigned a so-called field, and
types of information that belong inseparably together are stored together in the same table.
In a bibliographical database, for example, we would group the title of the book together
with its publishing year in the same table, and for each physical book we would always
have one data set with the same structure.
Figure 1: Books table (fields: Title, Year of Publication)
There is, of course, other very important information about the book, such as the author or
the publishing house, which is, however, better stored in separate tables. The reason for
splitting up the tables is that one and the same author may write several books, just
as one and the same publisher will publish many different books. By separating the authors
and publishers from the book titles we need to store each piece of information only once.
This saves a lot of input time and avoids typing errors because we don’t have to retype
the name of the author each time one of his books is entered into the database. Moreover,
when the data of the author or publisher have to be updated or corrected, we have to
change them only once. Ideally the database is built in such a way that it contains no
redundant or contradictory data sets; this state is also called data consistency.
The different tables into which the information about the books in our bibliographic
database has been subdivided must be related to one another, otherwise we will never find
out which book belongs to which author or publishing house. This is achieved by keys.
Figure 2: Authors, Books and Publishers tables. Authors (AuthorKey, Author's First Name, Author's Surname) 1:m Books (AuthorKey, PublisherKey, Title, Year of Publication) n:1 Publishers (PublisherKey, Publishing House, Place of Publication)
Keys are, in most cases, integer numbers generated by the database system automatically.
Each author, for example, is assigned a number and this number is stored not only in the
respective author’s data set but also in the Books table to indicate the author of a book.
While the AuthorKey is unique to the Authors table it can be stored in the Books table an
unlimited number of times. This is called a one-to-many relation and is the mathematical
representation of the fact that in real life one author may write several books. The
Publishers are linked to the Books table accordingly. The relation between the Authors
and the Publishers is called a many-to-many relation, and represents the real life situation
where the same publishing house publishes books by various authors and the same author
publishes his books with different publishing houses. Such many-to-many relations always
require an intermediate or pivot table (here the Books table) and can never be established
directly between two tables.
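The bibliographic example can be written down as a minimal sketch with Python's built-in sqlite3 module. The column names follow Figure 2 in shortened form; the sample author, publisher and titles are invented for illustration, and the helper name titles_by_surname is an assumption, not part of any existing system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Authors (
    AuthorKey INTEGER PRIMARY KEY,   -- integer key assigned by the database
    FirstName TEXT,
    Surname   TEXT
);
CREATE TABLE Publishers (
    PublisherKey    INTEGER PRIMARY KEY,
    PublishingHouse TEXT,
    Place           TEXT
);
-- Books carries both foreign keys and thereby also links authors to publishers.
CREATE TABLE Books (
    Title        TEXT,
    Year         INTEGER,
    AuthorKey    INTEGER REFERENCES Authors(AuthorKey),
    PublisherKey INTEGER REFERENCES Publishers(PublisherKey)
);
""")

# Invented sample data: the author is stored once and referenced twice.
conn.execute("INSERT INTO Authors VALUES (1, 'Ann', 'Example')")
conn.execute("INSERT INTO Publishers VALUES (1, 'Sample Press', 'Zurich')")
conn.executemany(
    "INSERT INTO Books VALUES (?, ?, ?, ?)",
    [("First Book", 2001, 1, 1), ("Second Book", 2005, 1, 1)],
)


def titles_by_surname(surname):
    """Follow the one-to-many relation from an author to his or her books."""
    rows = conn.execute(
        "SELECT b.Title FROM Books b "
        "JOIN Authors a ON a.AuthorKey = b.AuthorKey "
        "WHERE a.Surname = ? ORDER BY b.Year",
        (surname,),
    )
    return [row[0] for row in rows]
```

Correcting the author's name now means changing a single row in Authors; both books pick up the change through the key.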
A relational database system provides not only the keys and safeguards their
consistency but it is, furthermore, able to maintain a powerful indexing system which
allows one to search millions of datasets within fractions of a second.
Chopping up the Analects
After this short introduction to relational databases it is time to ask how the text of the
Analects, which comprises around 15,000 Chinese characters, can be stored in the limited fields of
such a relational data structure. As already mentioned, it need not be stored there at all,
because the focus of interest is not the text as a whole but its variants.
Collecting the variants alone, however, makes little sense without knowing to which
place in the text they belong. It is, therefore, necessary to refer to text passages in an
unambiguous way, and to do this it is necessary to lay a grid of coordinates over the text.
The Reference System
The grid follows the conventional way of referring to passages of the Lunyu but
has to be refined further so that it can point unambiguously to short passages
or even single characters within the chapters.
Traditionally the Lunyu is divided into 20 books (篇) and each book is further
subdivided into chapters (章). Most of these chapters contain one saying of Confucius
and comprise no more than a few dozen characters. Both the books and the chapters have
a conventional numbering generally accepted among Western scholars. This reference
system is taken over in the database and further refined by numbering the characters
within each chapter. Because different versions of the text differ slightly in the total
number of characters they contain, it is necessary to stick to one particular text
version to keep the reference system unambiguous. The version chosen is an easily available
electronic version of the textus receptus. This leads to the following data structure:
Figure 3: Chapters and Passages tables. Chapters (ChapterKey, ChapterNumber, Text) 1:m Passages (ChapterKey, Start, End, Commentaries etc.)
The Chapters table contains all the chapter numbers of the Lunyu, 01.01 being the first
chapter of the first book and so on. The reference text is included for practical reasons,
but strictly speaking, it is but a help for the user and not an indispensable part of the
database. The Passages table contains, of course, all the passages to which variants are
found. Because a passage consists very often of only one character and the same character
or even short phrases can possibly reappear several times within the same chapter, it is
essential to store the beginning and end of the passage in the table. The wording of the
passage is stored in the table too, but, strictly speaking, it could also be discarded as
redundant information.
No overlapping passages are allowed. Otherwise consistency in the overlapping
sections of the passages would be very difficult to maintain. Some other information, like
transmitted commentaries referring to the passage, is also stored in the Passages table.
They are skipped here, however, to keep the focus on the essentials.
The two tables are connected with one another by a one-to-many relation with the
Passages table on the many side. This is the representation of the simple fact that there
can be more than one passage within one and the same chapter.
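The reference grid and the ban on overlapping passages can be sketched as follows. This is only an illustration under stated assumptions: character positions are taken to be 1-based and inclusive, the names Passage, overlaps, add_passage and wording are hypothetical, and the reference string is the unpunctuated opening of chapter 01.01.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    chapter: str  # chapter number such as "01.01"
    start: int    # position of the first character in the reference text (assumed 1-based)
    end: int      # position of the last character (assumed inclusive)


def overlaps(a, b):
    """Two passages overlap if they lie in the same chapter and share a position."""
    return a.chapter == b.chapter and a.start <= b.end and b.start <= a.end


def add_passage(passages, new):
    """Append a passage, refusing any overlap with those already registered."""
    if new.start < 1 or new.start > new.end:
        raise ValueError("invalid character range")
    for existing in passages:
        if overlaps(existing, new):
            raise ValueError(f"{new} overlaps {existing}")
    passages.append(new)


def wording(reference_text, passage):
    """Recover the wording from start and end, which is why storing it is redundant."""
    return reference_text[passage.start - 1:passage.end]


# Opening of chapter 01.01 without punctuation, used as the reference text.
REFERENCE_0101 = "子曰學而時習之不亦說乎"
```

Because the wording can always be recomputed from the chapter, start and end, keeping it in the table is a convenience for the reader and not a necessity.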
The Variants
Having built a reference system that enables us to localize the variants within the text, we can
proceed to collect the variants. For this we have to attach a further table. It is linked to
the Passages table by a one-to-many relation, because there is always more than one variant
for a certain passage. It is, of course, essential to know where each variant was found.
Therefore we have to introduce another table storing all the textual witnesses of the text.
Figure 4: Chapters, Passages, Variants, Witnesses. Chapters (ChapterKey, ChapterNumber, Text) 1:m Passages (PassageKey, ChapterKey, Start, End) 1:m Variants (PassageKey, WitnessKey, Reading) m:1 Witnesses (WitnessKey, Description); further fields are elided as … in the figure.
One may ask here why the Witnesses table is related to the Variants table by a one-to-many
relation with the Variants table on the many side. One and the same witness
usually bears not only a great number of variants referring to different passages, but the same
variant is also very often found in several witnesses. This seems to be a typical many-to-many
relation. Basically this is true, and it is possible to build the database in such a way. The
system would then store each variant only once and ignore the readings that coincide
with the textus receptus. This would make the data less redundant and more consistent.
For text-critical purposes it is, however, more convenient to have all the readings right at
hand and not to have to determine first whether a witness that is not listed among the variants
has the same reading or no reading at all and therefore has to be counted as a lacuna.
The Variants table therefore stores the readings of all the textual witnesses, whether they contain a
deviation from the textus receptus or not. This entails a lot of redundant data but is more
practical for text-critical comparisons and for the presentation to the user, who gets a good
overview of all extant readings (see figure 5). It is, moreover, much easier to write
queries with such an arrangement of data than with a mathematically more consistent one.
To the user the data structure established so far is presented as follows.
Figure 5: Main window of the user interface (labelled areas: book and chapter; variant type; commentaries; notes; passage in the textus receptus; textus receptus; witness; variant reading)
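The decision to store one reading per witness and passage, whether or not it deviates from the textus receptus, can be illustrated with a small sqlite3 sketch. The Label column, the sample witnesses and the readings are assumptions made for this illustration only; the function name readings is likewise hypothetical.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Witnesses (WitnessKey INTEGER PRIMARY KEY, Label TEXT);
CREATE TABLE Variants (
    PassageKey INTEGER,
    WitnessKey INTEGER REFERENCES Witnesses(WitnessKey),
    Reading    TEXT
);
INSERT INTO Witnesses VALUES (1, 'lqHYdh01'), (2, 'lqHYdh02'), (3, 'lqHYnp01');
-- Invented readings for one passage: the second witness agrees with the
-- reference text, the third has a lacuna of one character.
INSERT INTO Variants VALUES (1, 1, '悅'), (1, 2, '說'), (1, 3, '■');
""")


def readings(passage_key):
    """Group the witnesses of a passage by the reading they carry."""
    rows = conn.execute(
        "SELECT v.Reading, w.Label FROM Variants v "
        "JOIN Witnesses w ON w.WitnessKey = v.WitnessKey "
        "WHERE v.PassageKey = ? ORDER BY w.Label",
        (passage_key,),
    )
    grouped = defaultdict(list)
    for reading, label in rows:
        grouped[reading].append(label)
    return dict(grouped)
```

A witness that does not appear in the result simply does not cover the passage; no further lookup is needed to tell agreement with the textus receptus from a lacuna.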
The Representation of Lacunae, Doubtful Readings and Rare Characters
Two kinds of lacunae are differentiated. One type is represented by a black upright square
(■). It is used when the number of missing characters can be counted. This is the case
with block prints, stone steles and Japanese manuscripts, which have a regular number of
characters per line. In cases where the number of missing characters cannot be
determined, a twisted black square is used. The reason for this differentiation is that
in some cases the witnesses show variants that differ in the number of characters. In such
cases we can still decide which version was followed by a certain witness if we count the
missing characters.
Illegible characters are represented by an upright outlined square (□) when the
illegible characters can be counted and by a twisted outlined square when they
cannot. In doubtful readings every single character is put in brackets.
Another symbol, a black circle (●), is used to represent characters that are missing
from one version but present in others. Strictly speaking, such a symbol is
unnecessary, but it is much more conspicuous than the mere absence of a character. This is of
some practical importance when one has to scan dozens or hundreds of variants (cf.
figure 5).
Rare characters that are not included in the Unicode standard pose a very serious
difficulty for any attempt to digitize Chinese manuscripts. One rather elegant method of
treating them is simply to extend the Unicode standard and to add these new characters to the
Chinese font on the computer. The non-standard characters can then be processed by the
computer without difficulty. The Lunyu database takes, however, a different and more
complicated approach.
Each character beyond the Unicode standard is put in braces ({}) which contain
the assumed standard form of the character. In many cases this is not enough to identify
the character in question unambiguously, because there often exists more than one scribal
variant of one and the same character. Therefore, a special table with all the scribal
variants was attached to the Variants table where all the characters beyond the Unicode
standard are stored together with their pronunciation and a description of how they were
written.
Example: The scribal variant 扵 for 於 is stored as {於} together with a
description such as “扌 on the left + 仒 on the right-hand side” in a separate table. At first
glance the user sees only the form in braces, {於}, but can open the auxiliary form by
double-clicking on the character and read the description.
Though such a complicated treatment of rare characters entails considerable
programming effort to process the data correctly, it also has some advantages to offer.
When comparing textual witnesses in order to establish a genealogical tree of
the textual witnesses, one is usually not interested in orthographical variants, because they
mostly give no reliable hints about the lineage of manuscripts but can be used by
scribes at random. One is looking for so-called significative errors instead, because they
are, unlike most orthographical variants, irreversible and can therefore be interpreted as
traces of the transmission process. By simply ignoring the braces in a query, most of the
ubiquitous scribal variants are ignored too, and this helps a lot to focus on those variant
readings that could be useful for stemmatological investigations.
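Ignoring the braces amounts to a simple normalization step before two readings are compared. The following sketch shows one way this might be done with a regular expression; the function names are hypothetical and the project's own implementation is not documented here.

```python
import re

# A braced group such as {於} or { 於 }: braces around the assumed standard form.
BRACED = re.compile(r"\{\s*([^{}]*?)\s*\}")


def normalize(reading):
    """Replace every {X} by X, treating a scribal variant as its standard form."""
    return BRACED.sub(r"\1", reading)


def same_ignoring_scribal_variants(a, b):
    """True if two readings differ at most in their orthography."""
    return normalize(a) == normalize(b)
```

Two witnesses that write 於 and {於} respectively then count as agreeing, so that only differences in wording remain for stemmatological comparison.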
Classifying the Witnesses
We have already touched several times upon the problem of tracing the lineage of textual
witnesses. This is necessary in order to apply the stemmatological method, which is able
to decide between variant readings on the basis of their position in the pedigree of textual
transmission and not on interpretative criteria. In order to establish the pedigree, or the
so-called stemma, it is very useful to have a reference system for the textual witnesses that
makes it possible to group together witnesses that possibly belong to the same strand of
transmission in order to check them for uniformity or to compare them against other
branches of the pedigree. Though the result looks very logical and simple, it took some
time and several revisions of the data structure to establish a classifying system for the
textual witnesses that is simple and efficient enough to satisfy these needs. The witnesses
are classified according to three main criteria:
The Main Lines of Descent
The earliest historical report of the transmission of the Analects makes a distinction
between three main traditions in which the Lunyu appeared in Early Han times, namely:
L – (Lu 魯論) Tradition from the ancient state of Lu, the native state of
Confucius.
Q – (Qi 齊論) Tradition from the ancient state of Qi.
G – (Guwen 古文論語) Tradition in ancient script found in the wall of
Confucius’ house during the Former Han.
None of these three has survived as an entire text to our days, but some readings could be
identified as belonging definitely to the Guwen tradition and some fragments of two lost
chapters of the Qi tradition have been handed down to us in commentaries and
encyclopedias. All transmitted text versions of the Lunyu go back to one major collation:
LQ – A text based on the Lu version into which Qi readings were
introduced by Zhang Yu, the Marquis of Anchang (張禹 † 5 BC),
during the Former Han.
Another mixed version had to be introduced for the sake of a single but very important
textual witness:
LG – The text of a bamboo manuscript found in modern Dingzhou dating
back to the first half of the first century BC.
The Commentarial Traditions
It can be expected that the text of the Lunyu was transmitted together with the
commentaries which were written down together with the text on the same paper scroll
from the second century AD on. By distinguishing the commentarial traditions we can
also separate the lines of transmission. In analogy with the main lines of descent these
commentarial traditions are abbreviated with the initials of the commentator’s name. The
earliest extant commentaries stem from the end of the second and the beginning of the
third centuries AD:
ZX – Zheng Xuan (鄭玄 127-200)
HY – He Yan (何晏 190-249)
The He Yan commentary was subcommented three times:
HK – Huang Kan (皇侃 488-545)
XB – Xing Bing (邢昺 932-1010)
LDM – Lu Deming (陸德明 550?-630)
Two other later commentaries are included too, because they also contain variant readings
that have to be taken into consideration:
HL – Han Yu (韓愈 768-824) and Li Ao (李翺 772-841)
ZZ – Zhu Xi (朱熹 1130-1200)
There is also a handful of manuscript fragments, one print and all the stone steles that
don’t carry a commentary, though they are all clearly derived from He Yan’s version.
These are abbreviated as BW for baiwen (白文) or plain text.
Types of Witnesses
Apart from the commentaries a further, rather vague, distinction was introduced to
account for differences between strains of transmission due to geographical diversification:
dh – manuscripts from Dunhuang (敦煌)
gc – manuscripts from the ancient city of Gaochang (高昌)
np – for Nippon (日本), prints and manuscripts from Japan
Some other types of witnesses were introduced to distinguish groups of witnesses that
show typical problems originating from different methods or circumstances of
reproduction:
zj – for zhujian (竹簡) or bamboo slips
kb – for keben (刻本) or Chinese block prints
sj – for shijing (石經) or the stone classics erected by various dynasties in
front of the imperial academy
Moreover, quotations of the Lunyu found with early authors are marked by the initials of
the respective author.
Texts having all three criteria – i.e. descent, commentator and type – in common are
further differentiated by adding a consecutive number to the abbreviation. Thus a short
but meaningful label for every witness is provided.
Figure 6: Labeling system. The label lqHYdh01 consists of descent (lq), commentary (HY), type (dh) and consecutive number (01).
“lqHYdh01” for example denotes the first manuscript from Dunhuang with the He Yan
commentary which goes back to the collation of the Lu and the Qi versions of the Lunyu.
In queries single elements of such a caption can be skipped. We can refer to the whole
group of Dunhuang manuscripts as “dh” or to the subgroup of Dunhuang manuscripts
with the Zheng Xuan commentary as “ZXdh” and so on. This system allows us to build
very precise queries that are able to pinpoint the data we are looking for.
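How such labels might be taken apart and matched against partial queries is sketched below. The sketch assumes that descent and type are written in lower case and the commentary in upper case, as in the example lqHYdh01; it covers only the abbreviations listed above and leaves out quotations marked by an author's initials. The names parse and matches are hypothetical.

```python
import re

DESCENTS = {"l", "q", "g", "lq", "lg"}
TYPES = {"dh", "gc", "np", "zj", "kb", "sj"}
PATTERN = re.compile(r"([a-z]*)([A-Z]*)([a-z]*)(\d*)")


def parse(label):
    """Split a label or partial query into (descent, commentary, type, number)."""
    m = PATTERN.fullmatch(label)
    if m is None:
        raise ValueError(f"not a witness label or query: {label!r}")
    descent, commentary, wtype, number = m.groups()
    if not commentary and not wtype:
        # Without a commentary in between, the lower-case run may hold a
        # descent, a type or both; split it using the known abbreviations.
        lower = descent
        for i in range(len(lower) + 1):
            if lower[:i] in DESCENTS | {""} and lower[i:] in TYPES | {""}:
                descent, wtype = lower[:i], lower[i:]
                break
        else:
            raise ValueError(f"unknown abbreviation in {label!r}")
    return descent, commentary, wtype, number


def matches(label, query):
    """A skipped element in the query matches any value in the label."""
    return all(q in ("", part) for part, q in zip(parse(label), parse(query)))
```

With this, matches("lqHYdh01", "dh") selects every Dunhuang witness, while the query "ZXdh" narrows the selection to Dunhuang manuscripts with the Zheng Xuan commentary.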
Represented in database terms, this results in a four table structure:
Figure 7: Witnesses with related Descents, Commentaries, Types. Witnesses (WitnessKey, DescentKey, CommentaryKey, TypeKey, ConsecutiveNo, Description, …) n:1 Descents (DescentKey, Descent); n:1 Commentaries (CommentaryKey, Commentary); n:1 Types (TypeKey, Type)
The Witnesses table is also used to store further information about the witness like a
description of its physical appearance or which parts of the text are covered by the
witness.
Organizing the data in such a way provides not only a unique label for each textual
witness which can be used to refer to the witness for example in an apparatus criticus,
but it also makes it possible to bundle or separate single strands of transmission and to
compare them with one another, which is, of course, a necessary procedure when
establishing a genealogical tree.
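The four-table structure can be sketched in sqlite3 as follows, with the label assembled from the three lookup tables and the consecutive number. The table and column names follow Figure 7; the sample rows, the two-digit formatting of the number and the function name labels are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Descents (DescentKey INTEGER PRIMARY KEY, Descent TEXT);
CREATE TABLE Commentaries (CommentaryKey INTEGER PRIMARY KEY, Commentary TEXT);
CREATE TABLE Types (TypeKey INTEGER PRIMARY KEY, Type TEXT);
CREATE TABLE Witnesses (
    WitnessKey    INTEGER PRIMARY KEY,
    DescentKey    INTEGER REFERENCES Descents(DescentKey),
    CommentaryKey INTEGER REFERENCES Commentaries(CommentaryKey),
    TypeKey       INTEGER REFERENCES Types(TypeKey),
    ConsecutiveNo INTEGER,
    Description   TEXT
);
INSERT INTO Descents VALUES (1, 'lq');
INSERT INTO Commentaries VALUES (1, 'HY'), (2, 'ZX');
INSERT INTO Types VALUES (1, 'dh'), (2, 'np');
-- Invented sample witnesses.
INSERT INTO Witnesses VALUES
    (1, 1, 1, 1, 1, 'sample'), (2, 1, 2, 1, 1, 'sample'), (3, 1, 1, 2, 1, 'sample');
""")

LABEL_SQL = """
SELECT d.Descent || c.Commentary || t.Type || printf('%02d', w.ConsecutiveNo)
FROM Witnesses w
JOIN Descents d ON d.DescentKey = w.DescentKey
JOIN Commentaries c ON c.CommentaryKey = w.CommentaryKey
JOIN Types t ON t.TypeKey = w.TypeKey
"""


def labels(type_abbrev=None):
    """Return the witness labels, optionally restricted to one type of witness."""
    sql, params = LABEL_SQL, ()
    if type_abbrev is not None:
        sql += " WHERE t.Type = ?"
        params = (type_abbrev,)
    sql += " ORDER BY w.WitnessKey"
    return [row[0] for row in conn.execute(sql, params)]
```

Since the label is derived from the keys, restricting a query to one strand of transmission is a matter of adding a condition on the corresponding lookup table.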
Classifying the Variants
Apart from the stemmatological investigation of the Lunyu tradition one major aim of the
project was to provide a tool that could be used for the study of textual corruption.
Understanding the mechanisms of corruption is a precondition for what every textual
critic is aiming at, namely emendation. To understand these mechanisms it is necessary to
sort out certain types of variants and to study them in their specific contexts in order to
discover similarities that could be the reason for textual corruption or regularities that
could help us to establish rules of emendation. What is needed here is a versatile and
handy sieve for textual variants. Because the computer is but a stupid machine that
mechanically processes binary data without even the slightest idea of what they
stand for, it cannot be expected to sort the variants in a useful way unless we
implement some of our own intelligence into the machine. This is achieved by attaching
at least one label to each passage. This label indicates the type of variant that is found
among the various witnesses of the text that cover the passage in question.
Figure 8: Passages, TypeLink, VariantTypes. Passages (PassageKey, ChapterKey, Start, End) 1:m VariantTypeLink (PassageKey, VariantTypeKey) m:1 VariantTypes (VariantTypeKey, Description)
As can be seen from figure 8, the relation between the passages and the variant types is
many-to-many. This is because a potentially unlimited number of variants can occur for
one and the same passage. Moreover, one and the same variant can be classified in more
than one way for different purposes. On the other side of the many-to-many relation the
same type of variant can occur in different passages. The intermediate VariantTypeLink
table is necessary to link the Passages with the VariantTypes in a many-to-many relation,
because only one-to-one and one-to-many relations are allowed between two tables.
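The role of the intermediate table can be shown in a short sqlite3 sketch. The table names follow Figure 8, with the Start and End fields of the Passages table left out for brevity; the sample passages, the type descriptions and the two function names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Passages (PassageKey INTEGER PRIMARY KEY, ChapterKey INTEGER);
CREATE TABLE VariantTypes (VariantTypeKey INTEGER PRIMARY KEY, Description TEXT);
-- The link table holds one row per (passage, variant type) pair.
CREATE TABLE VariantTypeLink (
    PassageKey     INTEGER REFERENCES Passages(PassageKey),
    VariantTypeKey INTEGER REFERENCES VariantTypes(VariantTypeKey),
    PRIMARY KEY (PassageKey, VariantTypeKey)
);
INSERT INTO Passages VALUES (1, 1), (2, 1);
INSERT INTO VariantTypes VALUES
    (1, 'quantitative/doubling/character'), (2, 'qualitative/phonetic');
INSERT INTO VariantTypeLink VALUES (1, 1), (1, 2), (2, 2);
""")


def types_of(passage_key):
    """All variant types attached to one passage."""
    rows = conn.execute(
        "SELECT t.Description FROM VariantTypeLink l "
        "JOIN VariantTypes t ON t.VariantTypeKey = l.VariantTypeKey "
        "WHERE l.PassageKey = ? ORDER BY t.VariantTypeKey",
        (passage_key,),
    )
    return [row[0] for row in rows]


def passages_with(description):
    """All passages that carry a given variant type."""
    rows = conn.execute(
        "SELECT l.PassageKey FROM VariantTypeLink l "
        "JOIN VariantTypes t ON t.VariantTypeKey = l.VariantTypeKey "
        "WHERE t.Description = ? ORDER BY l.PassageKey",
        (description,),
    )
    return [row[0] for row in rows]
```

Each side sees an ordinary one-to-many relation to the link table, and the many-to-many relation emerges only when both are followed in one query.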
What types of variants are distinguished then? – At first a very formal classification
system for the variants was devised that does not contain any interpretative criteria, i.e. it
does not suggest which variants are to be interpreted as errors and which as original
readings.
Figure 9: Hierarchy of variant types. Variants divide into quantitative, transpositions and qualitative. The quantitative branch shows doublings and others (實詞 [content words], 虛詞 [particles]); the qualitative branch shows visual, phonetic, semantic and kinetic similarity. Further node labels: proper names; more subcategories and non-formal criteria.
The main distinction is between quantitative variants, qualitative variants and transpositions.
Quantitative means that there are differences in the number of characters between the
variant readings of a certain passage. These quantitative variants are subdivided into
doublings of one or more characters and others. These others can consist in non-repetitive
differences of only one character or of whole phrases. The differences in single
characters are further differentiated into those concerning particles and those concerning
other words and so on.
The qualitative variants, on the other hand, describe variant readings differing not in
the number of characters but in the characters themselves. This category comprises a long
series of subcategories of which I want to mention only the major ones: Variants between
characters that have a phonetic, visual, logical or kinetic similarity. As the investigation
proceeds the need for more differentiation grows constantly and new categories and
subcategories can be easily introduced into the system. Subcategories are usually defined
in such a way that the label of a generic category is extended. The generic category called
“quantitative” (variants) is extended into two subcategories “quantitative/doubling” and
“quantitative/others”. The former is extended into “quantitative/doubling/character” and
“quantitative/doubling/phrase”. The former can be further differentiated into “quantitative/
doubling/character/particle”, “quantitative/doubling/character/proper name” and so on. In
queries wildcards can be used to refer to a generic category or to a whole group of
subcategories. For example, we can search the database only for “*particle” variants and
we will get all variants that concern particles, be they quantitative, qualitative or
transpositions. Based on the results of this query we can make a statistical analysis of the
types of variants that occur with particles and so on. The whole tagging system is very
versatile and easily expandable in order to cope with new questions.
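The wildcard queries over hierarchical labels can be imitated with shell-style pattern matching from Python's standard library. The quantitative labels are the ones quoted above; the qualitative and transposition labels are hypothetical, and the function names are assumptions.

```python
from collections import Counter
from fnmatch import fnmatchcase

LABELS = [
    "quantitative/doubling/character/particle",
    "quantitative/doubling/character/proper name",
    "quantitative/doubling/phrase",
    "quantitative/others",
    "qualitative/phonetic/particle",   # hypothetical label
    "transposition/particle",          # hypothetical label
]


def select(labels, pattern):
    """Return the labels matching a wildcard pattern such as '*particle'."""
    return [label for label in labels if fnmatchcase(label, pattern)]


def count_by_main_category(labels, pattern):
    """Count the matching labels by their generic (top-level) category."""
    return Counter(label.split("/")[0] for label in select(labels, pattern))
```

Because a subcategory label merely extends the label of its generic category, the pattern "quantitative/*" reaches all quantitative subcategories, and "*particle" cuts across the three main branches.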
Evaluation of the System
Capabilities
The Lunyu is probably the best-attested secular text in ancient Chinese literature. This
means that we have not only a large amount of early textual witnesses at our disposal, but
that these witnesses have also a great diversity in age, geographical distribution and
commentarial tradition. Moreover, most of the many thousands of variant readings that
could be collected in the database are obvious errors. To put it the other way around: In
most cases we know with great certainty what the correct reading has to be. We have
therefore a very rich source of material that allows us to study under which conditions
errors occur and what shape they take. Such a study would provide an empirical data
basis for a methodology of textual criticism of ancient Chinese texts. Such an empirically
based methodology would be of great importance for everyone dealing with ancient
Chinese texts. At present, scholars rely instead on traditional Chinese emendation strategies
and concepts of textual corruption. Some of these assumptions cannot be corroborated by
empirical data at all or seem at least extremely far-fetched when tested against real-life
material.
An empirically grounded methodology, though not yet fully materialized,[3] was a
major goal of the database project. The aforementioned labeling system of variant
readings is the main device for such investigations. It allows us to extract variants of a
certain type from the database in order to provide the necessary data for the study of
textual corruption. We could focus, for example, on variants that show phonetic
similarities and investigate what degrees of phonetic similarity occur. We can just as
easily confine our selection of phonetic variants to those coming from Dunhuang in order
to investigate the local dialect spoken there in the Middle Ages. The whole classification
system of variants can be adapted very easily to satisfy different needs.
The combination of the two classification systems, for textual witnesses and for variant
categories respectively, also proves to be a very powerful tool when it comes to tracing the
lines of transmission. The database can answer questions like “What variants do the
Dunhuang manuscripts have in common with the Japanese tradition that are not shared by
other witnesses?” The result of such a query can be further confined to potentially
significant variants by applying the variant categories to it. To sift out and to evaluate
such variants is the essential task of the stemmatological approach in textual criticism.
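The kind of question quoted above maps naturally onto a grouped query. The following is a minimal sketch using Python's built-in `sqlite3` module; the table and column names, the witness sigla and the placeholder readings `A` to `E` are all invented for illustration and do not reflect the actual schema of the Lunyu database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE witness (id TEXT PRIMARY KEY, tradition TEXT NOT NULL);
    CREATE TABLE reading (
        passage    TEXT NOT NULL,
        witness_id TEXT NOT NULL REFERENCES witness(id),
        form       TEXT NOT NULL
    );
""")

# Hypothetical witnesses, grouped by tradition.
conn.executemany("INSERT INTO witness VALUES (?, ?)", [
    ("DH1", "Dunhuang"), ("DH2", "Dunhuang"),
    ("JP1", "Japan"), ("SJ1", "Stone classics"),
])

# Hypothetical readings: one row per passage and witness.
conn.executemany("INSERT INTO reading VALUES (?, ?, ?)", [
    ("1.1", "DH1", "A"), ("1.1", "DH2", "A"), ("1.1", "JP1", "A"), ("1.1", "SJ1", "B"),
    ("1.2", "DH1", "C"), ("1.2", "DH2", "C"), ("1.2", "JP1", "C"), ("1.2", "SJ1", "C"),
    ("1.3", "DH1", "D"), ("1.3", "DH2", "E"), ("1.3", "JP1", "E"), ("1.3", "SJ1", "D"),
])


def shared_only(conn, group_a, group_b):
    """Readings attested in both groups and in no witness outside them."""
    sql = """
        SELECT r.passage, r.form
        FROM reading AS r
        JOIN witness AS w ON w.id = r.witness_id
        GROUP BY r.passage, r.form
        HAVING SUM(w.tradition = ?) > 0
           AND SUM(w.tradition = ?) > 0
           AND SUM(w.tradition NOT IN (?, ?)) = 0
        ORDER BY r.passage, r.form
    """
    return conn.execute(sql, (group_a, group_b, group_a, group_b)).fetchall()


for passage, form in shared_only(conn, "Dunhuang", "Japan"):
    print(passage, form)
```

Restricting the result to potentially significant variants would, under the same assumptions, amount to an additional join against a table of variant categories.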
Strictly speaking, the database does not provide us with the desired variants
themselves, but retrieves all the passages for which variants of a certain type are
found in a certain group of witnesses. For each passage we therefore get a long list of
variant readings that we have to sift again manually to arrive at the variants we are
looking for. This may look somewhat laborious at first glance, but variant readings
exist only in contrast with other readings and have to be viewed together with them in
order to be understood as variants. We need, however, to accept that there may be several
distinct variants for the same passage and that the result of a query may also contain some
undesired material that we have to sort out manually. Although more than 130 witnesses
were incorporated into the database, this has never really become a problem.
To sum up, we can say that the relational organization of the variants provides a
capable tool for the study of textual corruption and for stemmatological investigations. It
also stores all the data necessary for a critical edition. The text-critical work can even be
supported by adding notes and commentaries to the passages.
3 Partial results will be published in Simson (2013).
Problems
At this point it must first be mentioned that the Lunyu database was never intended to be
published or to be used by anyone other than the author himself. It has always remained a
never-ending construction site that was modified gradually in order to cope with newly
arising questions and a shifting focus of interest. As with all software, a long list of known
problems has to be added:
● A major inconvenience when processing Chinese characters with computers is that
computers do not provide a useful sorting order for Chinese characters. At best
they can be arranged according to their position in the Unicode code table, which
roughly follows the stroke number of the characters. An arrangement according to
pronunciation or radical, for example, would be much more useful for most practical
purposes. This is, however, not a problem of the database approach but of
computing in general.
● As some readers may already have noticed, the database also makes use of mark-ups.
The handling of rare, doubtful or illegible characters with their brackets and
braces requires special treatment of such mark-ups, which has to be implemented
in the database. This means a lot of programming effort and a slowdown of the
whole database, because each time characters are processed the character string
has to be checked for mark-ups. Moreover, the routines handling the mark-ups
are compiled at run-time, and this makes them much slower than precompiled
programming code.
● Most variants of the text cover only one character, and such one-character variants
can be handled easily by the system. Variants spanning whole phrases are
sometimes more cumbersome, especially when there is more than one variant for
the passage and each has developed its own subvariants. The representation in the
database then becomes rather intricate, and the results of queries contain a lot of
redundant data that have to be sorted out manually. This has never become a real
problem in the project, but the system would have serious difficulties in tackling
texts that regularly differ in longer passages of several sentences.
● A special difficulty was the handling of lacunae. They were treated as a special
sort of variant and required an even larger amount of programming than the
mark-ups mentioned before.
● As already mentioned, the database is organized in such a way that the variant
readings for a certain passage are stored for each witness separately. This
involves a lot of redundant data, because the same variant reading is usually found
in several witnesses. Moreover, it takes a lot of programming and computing time
to maintain consistency among these redundant data when they are manipulated by
introducing new witnesses or passages. A more consistent data structure should
therefore be considered for the further development of the database.
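One possible shape for the more consistent data structure suggested in the last item is sketched below with Python's built-in `sqlite3` module. It is an assumption about how the redundancy could be removed, not a description of the existing database: each distinct reading of a passage is stored once, and a separate link table records which witnesses attest it. All table names, column names, sigla and readings are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each distinct reading of a passage is stored exactly once.
    CREATE TABLE reading (
        id      INTEGER PRIMARY KEY,
        passage TEXT NOT NULL,
        form    TEXT NOT NULL,
        UNIQUE (passage, form)
    );
    -- Witnesses point to readings instead of carrying their own copy.
    CREATE TABLE attestation (
        reading_id INTEGER NOT NULL REFERENCES reading(id),
        witness_id TEXT NOT NULL,
        PRIMARY KEY (reading_id, witness_id)
    );
""")


def attest(conn, passage, form, witness_id):
    """Record that a witness has a given reading, reusing the reading if known."""
    conn.execute(
        "INSERT OR IGNORE INTO reading (passage, form) VALUES (?, ?)",
        (passage, form),
    )
    (reading_id,) = conn.execute(
        "SELECT id FROM reading WHERE passage = ? AND form = ?",
        (passage, form),
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO attestation VALUES (?, ?)",
        (reading_id, witness_id),
    )
    return reading_id


# Three witnesses share reading "A"; a fourth has reading "B".
for siglum in ("W1", "W2", "W3"):
    attest(conn, "1.1", "A", siglum)
attest(conn, "1.1", "B", "W4")

reading_count = conn.execute("SELECT COUNT(*) FROM reading").fetchone()[0]
attestation_count = conn.execute("SELECT COUNT(*) FROM attestation").fetchone()[0]
print(reading_count, attestation_count)
```

With this layout, adding a new witness only adds rows to the link table, so there are no duplicated reading strings that would have to be kept consistent.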
References
Maas, Paul. 1950. Textkritik (2., verbesserte Auflage). Leipzig: B.G. Teubner
Verlagsgesellschaft.
Pasquali, Giorgio. 1988. Storia Della Tradizione e Critica del Testo. Firenze: Casa Editrice
Le Lettere.
Reenen, Pieter van; Mulken, Margot van, ed. 1996. Studies in Stemmatology.
Amsterdam/Philadelphia: John Benjamins Publishing Company.
Reenen, Pieter van; Hollander, August den; Mulken, Margot van, ed. 2004. Studies in
Stemmatology II. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Simson, Wojciech Jan. 2002. Applying Stemmatology to Chinese Textual Traditions. Textual
Scholarship in Chinese Studies. Ed. Vogelsang, Kai. Papers from the Munich
Conference 2000; Asiatische Studien/Études Asiatiques 2002/3. 587–608.
Simson, Wojciech Jan. 2006. Die Geschichte der Aussprüche des Konfuzius. Bern: Peter
Lang.
Simson, Wojciech Jan. 2013 (forthcoming). Contaminations in Chinese Manuscripts. The
Idea of Writing – Lapses, Glitches and Blunders in Writing Systems. Ed. Behr,
Wolfgang; Voogt, Alex de. Leiden: E. J. Brill.
Chung-Hwa Buddhist Journal (2012, 25:167-194)
Taipei: Chung-Hwa Institute of Buddhist Studies
中華佛學學報第二十五期 頁 167-194 (民國一百零一年),臺北:中華佛學研究所
ISSN:1017-7132
Digital Editions of Premodern Chinese Texts: Methods
and Problems – Exemplified Using the Daozang Jiyao 道藏輯要¹
Christian Wittern
Kyoto University
Abstract
Digital editions have great potential for opening new avenues of research, but they also pose
vexing research questions that have to be resolved adequately in order to make the
resulting edition useful in the long run. One of the many differences between printed
editions of texts and digital editions is the open-endedness of the latter, which means that
they can be produced incrementally and updated without incurring substantial expenses. The
medium of digital editions requires the creator to make many assumptions about the texts
explicit and to record them in a way that can be processed automatically. This is a new
concept, which seems foreign to the agenda of a scholar whose ultimate aim is to engage
with the text. This article demonstrates that what seems like a detour actually
advances the understanding of the text, and that the need to objectify a text in this way
gives access to new dimensions of it. It then goes on to provide details of a conceptual
model for describing a premodern text digitally, which has been developed while working on a
digital edition of the early Qing Daoist collection Daozang jiyao.
Keywords:
Text Encoding, Digital Editions, Character Encoding, XML, Daoist Studies
1 I would like to thank the anonymous reviewers for this journal for their very helpful
suggestions for clarifications and general improvement of the article.
前現代漢語文本的數位版本:方法與問題
以《道藏輯要》為例
維習安
京都大學
摘要
數位版本對於研究的新方向有著極大的潛力,但它們也引起令人困擾的研究問題,
而這些問題必須適當地解決,以使得此版本的成果對長期來說是有所助益的。紙本
與數位版本的許多差異之一是後者的開放性,也就是能夠不需要實質上的花費而增
加或更新。使用數位版本工作需要建立者建構許多有關文本的清晰假設,且能夠以
自動執行的方式紀錄。這是一個新的概念,似乎對目標為參與文本的學者之預設立
場不同。本篇文章說明那些看起來像是繞道而行,但事實上卻是增進文本理解與客
觀化文本需求的情形,依此開啟進入了解文本的新面向。同時並詳盡地提供說明數
位化前現代文本的概念模式,而此工作是建立在清初的《道藏輯要》之數位版本上
而發展的。
關鍵詞:文本編碼、數位版本、字元編碼、XML、道教研究
Introduction
Text transmitted on traditional written surfaces is immediately available and transparent
to the reader, without any additional steps involved. In contrast to this, any text stored
digitally, in whatever format, has to be rendered to the screen (or paper) by correctly
interpreting (decoding) the values of 0 and 1 that have been used to prepare (encode) the
text. Without this correct interpretation, the result of the decoding will be just illegible
garbage that does not make any sense whatsoever.
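The dependence of decoding on knowledge of the encoding model can be illustrated with a small Python sketch. The sample string and the choice of UTF-8 and Latin-1 as the "right" and "wrong" models are assumptions made for the example only.

```python
# Encode a short sample text into bytes using one model (UTF-8).
text = "道藏輯要"
encoded = text.encode("utf-8")

# Decoding with the same model recovers the text.
decoded_with_right_model = encoded.decode("utf-8")

# Decoding the same bytes with a different model (Latin-1) succeeds
# technically, but yields one meaningless character per byte.
decoded_with_wrong_model = encoded.decode("latin-1")

print(decoded_with_right_model == text)
print(decoded_with_wrong_model == text)
print(len(text), len(encoded), len(decoded_with_wrong_model))
```

The bytes are identical in both cases; only the model applied to them differs.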
In order to make this decoding successful, the model according to which the
encoding was done has to be known at the time of the decoding. Even more importantly,
as is true for any digital format, the encoding of text into digital format cannot be done
without a model of the text. The activity of developing and enhancing a model of the text
thus becomes a crucial, foundational activity, laying the groundwork for the actual
digitization of the texts themselves.
The first fundamental decision that has to be made when devising such a model is
whether to treat the text as a series of symbols or as a two-dimensional array of
spots of different color spread out over a flat surface. Models of the first type
lead to a transcribed version of a text (an example of a page is shown in
Figure 1), while models of the second type lead to some kind of facsimile
representation of the text, which will be called a digital facsimile (see Figure 2). Neither of
these representations is intrinsically superior to the other; they in fact complement
each other very nicely.
Figure 1: An example of a transcribed text
Figure 2: An example of a digital facsimile
If a text is to be used for information retrieval or any other purpose that requires access to
its symbolic content, such as text analysis or even the creation of a new version
with a different layout, it has to be encoded in a way that somehow represents the
symbols used to write the text. This requires a reading of the text and is thus always also
an interpretation of the text.
While the transcription of a text as a series of symbols is comparatively
straightforward in most alphabetical languages, the logographic languages of East Asia
pose specific problems, since exactly this transcription is not a given, but is open to
various interpretations and in fact has to be considered part of the research question. What
is needed is thus a model that makes these interpretations transparent instead of hiding
them in the transcription process, which takes place before the text even gets to the reader.
This paper discusses models used for such a representation and proposes a new
working model specific to premodern Chinese text.
It might be tempting to try to avoid the whole issue of legacy character encoding and
to come up with a completely different way to encode characters. One such attempt is
the CHISE project,² which tries to build a whole ontology of characters and character
information. In the model discussed here, the encoding is based on Unicode, but an
intermediate layer of indirection is introduced, as explained below.
In the practice of transcribing primary sources, there is an additional complication
in that there might be more than one witness for a text, so that a
collation and analysis of textual variants in other witnesses might be required. The
model will have to be able to account for this.
One last requirement is that it has to be possible to establish and maintain a
normalized version of the text in addition to establishing a copy text faithful to the
original.
Preliminaries and Prerequisites
Before starting to describe the proposed new model, some preliminaries and basic
assumptions have to be discussed. This involves a very brief description of the model
most widely used for transcribing primary sources, but will also involve a brief discussion
of the writing system for Chinese and how its basic properties have been reflected in
today's most widely used character encoding, Unicode.
2 See the CHISE (Character Information Service Environment) project
(http://www.kanji.zinbun.kyoto-u.ac.jp/projects/chise/).
The TEI/XML Text Model
Text encoding according to the recommendations of the Text Encoding Initiative (TEI) is
today the most widely used format for the creation and processing of texts for research in
the Humanities.³
In XML, which is the technical basis for the TEI text format, a text is basically seen
as a hierarchy of textual content objects, expressed as a hierarchy of XML elements and
attributes;⁴ this is the so-called OHCO (Ordered Hierarchy of Content Objects) view of a
text. While this provides a powerful model to deal with many aspects of a text and allows
the definition of sophisticated vocabularies, there are a few problems that are hard to
solve using this model.
One of these problems is that digital texts do in fact require different hierarchical
views, depending on the purpose of the creation and the intended processing of the text.
The TEI attempts to solve this problem in several ways. One of them is to
consider one of the hierarchies in a document as the primary hierarchy (Guidelines,
20.3 Fragmentation and Reconstitution of Virtual Elements). Textual features that do not
nest cleanly into this hierarchy are then arbitrarily split into two (or more) parts, and
additional notions are introduced that can be used, for example, to virtually join together
elements that have been arbitrarily split within the primary hierarchy.
Another way to overcome this problem is to use elements without text content to
indicate points in a text at which features of the 'other' hierarchy start. A classic example
of this is the use of milestones in TEI. Since the main hierarchy of a TEI document is
constructed using elements that describe the semantic content of the document (e.g.
<body>, <div>, <p>), elements that hold the content of pages and lines⁵ cannot exist in
the same hierarchy. Pages (and columns and lines; these are all generalized into the
concept of 'milestones') are thus only indicated by marking the point in the text flow
where a new page begins. This makes it possible to work with both hierarchies at the
same time, but there is a tradeoff: it prioritizes one hierarchy, thus making it considerably
more difficult to retrieve the content of a page than the content of, e.g., a
paragraph.
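The asymmetry between the two hierarchies can be made concrete with a small sketch using Python's standard `xml.etree.ElementTree` module. The sample document is a simplified, namespace-free stand-in for a TEI file (real TEI documents use the TEI namespace), and the function name `collect_pages` is an invention for this example.

```python
import xml.etree.ElementTree as ET

# Simplified sample: <p> belongs to the primary hierarchy, <pb/> is a milestone.
SAMPLE = '<body><div><pb n="1"/><p>AAA<pb n="2"/>BBB</p><p>CCC</p></div></body>'
root = ET.fromstring(SAMPLE)

# Retrieving paragraphs is trivial, because <p> elements contain their text.
paragraphs = ["".join(p.itertext()) for p in root.iter("p")]


def collect_pages(root):
    """Walk the whole document in text order and assign text to the current page."""
    pages = {}
    current = None

    def walk(element):
        nonlocal current
        if element.tag == "pb":
            # A milestone only marks the point where a new page begins.
            current = element.get("n")
            pages.setdefault(current, "")
        elif element.text and current is not None:
            pages[current] += element.text
        for child in element:
            walk(child)
            # Text following a child element still belongs to the current page.
            if child.tail and current is not None:
                pages[current] += child.tail

    walk(root)
    return pages


page_text = collect_pages(root)
print(paragraphs)
print(page_text)
```

Paragraph retrieval is a single expression, whereas page retrieval requires a stateful traversal of the entire text flow, which is the tradeoff described above.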
3 It goes without saying that TEI can be used to encode premodern Chinese texts, which is
amply demonstrated for example by the texts produced by the Chinese Buddhist Electronic
Text Association (CBETA), whose latest release had to be put on a DVD, since even in
compressed form a CD-ROM could no longer hold the amount of material. The earliest of
these texts are nearly 2000 years old.
4 See for example Renear, Mylonas and Durand (1996).
5 Earlier versions of the TEI contained elements such as <page>, <col> and <line>, which could be
used to construct a concurrent hierarchy that reflects how the text was laid down on the
text-bearing surface, but these have been removed in the latest release, P5.
There is also another difficulty of a more practical nature, namely the
procedure through which the encoded text is created. If text encoding is seen as a process
of gaining insight and enhancing the understanding of a text, this will be a circular process
that adds more information in several passes through the text. What this means is that the
sophistication of the TEI model, while serving the needs of text encoders well in
providing the expressive power to encode the features observed in a text, puts an
enormous burden on text encoders wishing to employ the system for their texts. This
seems to be especially true for premodern Chinese texts, where not only does the writing
system pose additional difficulties, but there is also usually no indication of paragraph or
sentence boundaries and no punctuation; the only given is the text as it is divided into
'scrolls', pages and lines. For the purpose of this model, then, the main hierarchy in the
document is that of the physical representation of the text on the text-bearing surface of
the witness that serves as the source for digitization. As the encoding of the text progresses,
markers of the points of change in the content hierarchy are inserted, thus gradually
bringing this other hierarchy into existence. In some ways this is thus an inversion of the
relationship between these hierarchies as they exist in the TEI model. The following
discussion is targeted at the requirements of Chinese text, and no claims are made about
its usefulness in other areas.
The model described in this paper is not intended as a replacement for the TEI text
model, but rather as a heuristic, methodological model that allows the creation of a
sophisticated text, most likely as the childhood of a text that will prepare it to spend its
adult life in a TEI environment.
Writing System
The main difficulty with encoding Chinese texts lies in the writing system. Over
thousands of years, the script used to write Chinese texts has evolved and has seen many
changes in conventions, styles and character usage. The result is a rich and deep
cultural heritage, which engraves in the writing system the memories of a people that values
history and memory in a way few others do, and a writing system that contains an
open-ended, unknown number of distinct characters.⁶ Since the beginning of the 20th
century, there have been attempts at dealing with this problem from a practical side, by
limiting the use of characters in daily life and thus making it possible for the first time
for more than a tiny elite to acquire enough knowledge of the writing system to
participate in a modern society based on the written word, be it application forms,
contracts, newspapers or novels.
6 The largest dictionary known to this writer is the Zhonghua zihai, which contains 85000
characters, but the difficulty here is not really the number of distinct characters, but the
question of what has to be seen as a character as opposed to a mere variant of another
character. We will return to this question.
The latest version of the Unicode character set provides almost 75000 Chinese
characters.⁷ Here, too, the definition of what has to be considered a separate
character changed significantly during the process of defining them, which has been
going on for more than 20 years.⁸
Although there are now assigned code-points for all characters in daily use and even
for most rare characters that appear in historical sources, there are still problems with the
character encoding that are intrinsic to the way it was defined and has evolved over the
years of its development: unwanted unification and unwanted separation of characters.⁹
● Unwanted unification: Especially in the early phase of the development, when
insufficient code space had been set aside and processing memory was limited, efforts
were made to unify similar-looking character shapes into one code-point value.
This makes it impossible to refer in a universal way to just one of the character
shapes as opposed to the other character shapes also covered by a given code-point.¹⁰
● Unwanted separation: On the other hand, there are certain code-points that encode
characters of a slightly different shape separately, the most famous being 説 (U+8AAC)
and 說 (U+8AAA). In many fonts the character shapes in this group do indeed look
identical, which makes it extremely difficult to use only one of them consistently
and to avoid the unwanted other member of the pair.¹¹ Another reason for this
is the 'code separation rule', which meant that characters already encoded separately
in one of the character encodings that formed the source of Unicode had to
be treated separately.
● Inconsistencies, duplications and wrong assignments¹² also exist, but these are
not by design and are much less disruptive.
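Unwanted separation is usually handled by preprocessing a document with a table that folds the unwanted member of a pair into the desired one. A minimal Python sketch of such a table follows; the decision to prefer 說 (U+8AAA) over 説 (U+8AAC) and the names `PREFERRED`, `normalize` and `report_folded` are hypothetical project choices, not part of any standard.

```python
# Hypothetical project policy: keep U+8AAA and fold U+8AAC into it.
PREFERRED = {"\u8aac": "\u8aaa"}
FOLD_TABLE = str.maketrans(PREFERRED)


def normalize(text):
    """Replace every unwanted member of a separated pair by the preferred one."""
    return text.translate(FOLD_TABLE)


def report_folded(text):
    """List (offset, code-point) for each character that normalize() would change."""
    return [(i, f"U+{ord(c):04X}") for i, c in enumerate(text) if c in PREFERRED]


sample = "不亦\u8aac乎"
print(report_folded(sample))
print(normalize(sample) == "不亦\u8aaa乎")
```

Keeping the report alongside the normalized text preserves the information about which form the source actually used.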
While these are annoying problems when dealing with Unicode, it is clear that the
advantages of using a universal encoding for all texts far outweigh the problems
mentioned here. The strategy adopted here is thus not the development or use of a
different encoding system, but rather a strategy to deal with these problems within and on
top of Unicode. This will be achieved through a character database and the definition of
additional private characters where necessary.
7 With the release of Unicode 6.1 the total count of CJK characters is 74617.
8 Development of Unicode started with a document
(http://www.unicode.org/history/unicode88.pdf) by Joe Becker of Xerox Corporation,
published in August 1988.
9 It would be more precise to talk about glyphs here, but what it comes down to in digital text is
code-points.
10 In practice, this can be done by specifying one specific font to be used to represent a character.
Modern font technology also allows fonts to contain several character shapes for one
code-point and allows a rendering program to select them as needed. There is however no
standardized way to do so across applications.
11 In practice, the only way to deal with this is to preprocess a document with a table that
changes the unwanted member of such a pair into the desired one.
12 See Kawabata (2006) for some examples.
The Process of Encoding a Character
It might be useful here to look a bit more carefully into what exactly happens in the
process of encoding a character, that is, transcribing a character from a source text to its
digital equivalent. In an encoded character set, each character that has been assigned to a
code-point can be seen as a kind of platonic, ideal character that stands for any number of
real-world, existing character shapes (glyphs), as we see them on a text-bearing surface.
However, it is impossible to design such an encoded character set in a way that each
platonic character is represented only once, because in many cases a
specific glyph shape cannot be assigned unambiguously to only one character: it is not only
the shape, but also meaning and sound that contribute to this assignment, and all of these
may in addition depend on the specifics of area and era. In the case
of the Unicode/ISO 10646 character set, this has led to a development where more and
more glyphs that had already been represented as members of the set of glyphs
covered by a given character are now also encoded separately. The result is that
a given glyph can be logically represented in several sets.
In such a situation, the process of assigning a character code to a given glyph has to
look for the set of glyphs that as a whole most closely resembles the given glyph, or, to put
it differently, to look for the most specific representation of a given character. If that
cannot be found, there are in principle two choices:
● To add this glyph (G) to an existing set, encoded by an existing character code (C),
thus in fact extending the set to accommodate this new glyph.
● To add a new character code (N) to the system, with this glyph as the most
representative of the set of glyphs represented by this character code.
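The lookup and the two fallback choices listed above can be sketched as a small data structure in Python. Everything here is an assumption made for illustration: glyphs are identified by invented string labels, "most specific" is approximated as "the smallest glyph set containing the glyph", and new codes are taken from the Private Use Area starting at U+E000.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

PUA_START = 0xE000  # first code point of the Private Use Area in the BMP


@dataclass
class CharacterSet:
    # Maps a character code to the (open) set of glyphs it is taken to represent.
    glyph_sets: Dict[int, Set[str]] = field(default_factory=dict)
    next_private: int = PUA_START

    def most_specific(self, glyph: str) -> Optional[int]:
        """Return the code whose glyph set is the smallest one containing the glyph."""
        candidates = [code for code, glyphs in self.glyph_sets.items() if glyph in glyphs]
        return min(candidates, key=lambda code: len(self.glyph_sets[code]), default=None)

    def extend(self, code: int, glyph: str) -> int:
        """First choice: add glyph G to the set of an existing code C."""
        self.glyph_sets[code].add(glyph)
        return code

    def add_private(self, glyph: str) -> int:
        """Second choice: create a new code N with this glyph as its representative."""
        code = self.next_private
        self.next_private += 1
        self.glyph_sets[code] = {glyph}
        return code


# Hypothetical glyph labels; "g2" is covered by two codes, as described above.
charset = CharacterSet({0x8AAA: {"g1", "g2"}, 0x8AAC: {"g2"}})

print(charset.most_specific("g2") == 0x8AAC)
print(charset.most_specific("g9"))
new_code = charset.add_private("g9")
print(hex(new_code))
```

Which of the two choices is appropriate for a given glyph remains an editorial decision; the sketch only records the outcome.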
The first option assumes that G has been recognized as in principle belonging to
the set of glyphs represented by C, which presupposes knowledge of G and of the set of
allowed representatives for C. Since the set of allowed representatives for C is an open set,
which is not defined exhaustively in the relevant standards but only by giving a sample
of such representatives, this decision has to be made case by case and cannot be
generalized.¹³ The second option does not require any knowledge of the character beyond
this glyph and is the only one available if nothing more is known about this character.
The downside is of course that this new character is not integrated into the network of
implicit knowledge that is already in the system, through system-level character
properties and/or a database. It would therefore be wise also to provide a way to add such
information together with the character.
Figure 3: The semantic fields around the character 保 according to the HYDCD
Given this situation, information about the relationship between the characters in the
character set has to be maintained. Different types of such relations have to be
distinguished.
On the one hand, characters can be seen as mere variants of each other, serving
essentially as replacements for each other. More often, however, such a relationship
covers only part of the semantic field of a given character, which makes it necessary to
allow for a character to belong to different groups of variant characters, depending on
which aspect of its meaning is called upon.¹⁴ In other cases, the relationship might be due
to a phonetic replacement or even an error. Dictionaries and commentaries have for a long
time collected such information, which has to be taken into account. This type of
relationship could be called a generic relationship, which is true for all characters in this
set; it is thus a relationship (to use a technical term) on the level of the class of characters,
not of the instances.
13 Text encoding is in this respect more of an art than an exact science in that many decisions
depend on the encoder. This can and should be made less arbitrary than it sounds by
recognizing this fact and defining a policy as to what exactly the set of represented
glyphs should be. The first step towards this could be, for example, to use a specific
reference font and to define what kinds of deviation from the glyphs used in this font are
allowable. Such definitions should go into the project documentation.
14 The historical dimension of the development of the writing system towards more specific
characters also plays a role here; what had been written with the same character in earlier
texts might be delegated to different characters later on.
On the other hand, out of all the possible relationships that exist on a class level, or
sometimes even in addition to these, the corresponding modern character form needs to be
established for every instance of a character that is not identical
with the character in modern usage. While this might not seem necessary for a purely
diplomatic transcription of a text, it is necessary for proper searches and other
text-analytic tasks. Without it, the value of a transcribed version is not much greater
than that of a digital facsimile.
Between these two types of relationships, the one completely generic and the other
completely tied to the specific instance, it might well be useful to generalize from the
instance-specific relationships to relationships that are relevant for the whole text, text
corpus or text collection, thus forming a third type of relationship (of which a
number of sub-types could exist, depending on the scope).
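The three types of relationship can be represented as records that differ only in their scope. The sketch below is one possible reading, not the data model of the project: the field names, the scope labels, the precedence rule (instance before text before class) and the sample relations are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed precedence: the most specific scope wins.
SCOPE_ORDER = {"instance": 0, "text": 1, "class": 2}


@dataclass(frozen=True)
class Relation:
    source: str                       # character as found in the source
    target: str                       # corresponding modern character form
    kind: str                         # e.g. "variant", "loan", "error"
    scope: str                        # "class", "text" or "instance"
    text_id: Optional[str] = None     # required for "text" and "instance"
    position: Optional[Tuple[int, int, int]] = None  # (page, line, char)


def modern_form(char, text_id, position, relations):
    """Return the modern form for one character instance, or the character itself."""
    matches = [
        r for r in relations
        if r.source == char and (
            r.scope == "class"
            or (r.scope == "text" and r.text_id == text_id)
            or (r.scope == "instance" and r.text_id == text_id and r.position == position)
        )
    ]
    if not matches:
        return char
    return min(matches, key=lambda r: SCOPE_ORDER[r.scope]).target


# Illustrative relations only.
relations = [
    Relation("\u8aac", "\u8aaa", "variant", "class"),
    Relation("\u8aaa", "悅", "loan", "instance", "sample-text", (1, 1, 3)),
]

print(modern_form("\u8aaa", "sample-text", (1, 1, 3), relations))
print(modern_form("\u8aaa", "sample-text", (2, 1, 1), relations))
print(modern_form("\u8aac", "other-text", (1, 1, 1), relations))
```

Relations of the third, text-wide type would simply be records with scope `"text"` and no position.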
A New Model for Encoding Chinese Primary Sources
In this paper, a new model is presented, together with a description of an implementation
that acts on the model. The model is described in two complementary parts,
namely (1) a representation of the text and (2) a database of characters.
Representation of the Text
With respect to character encoding, the main problem for premodern Chinese texts is
that there is friction between modern usage, as reflected in the encoding systems
available for digital texts, and the characters as they are used in a source text. In order to
learn more about the writing system and to better understand the development of character
forms and usages, one ideally should not have to rely on modern encoding systems for
premodern texts, since they tend to hide exactly the differences that are the object of such
a study; but if we are to transcribe the texts digitally, there is in fact hardly any other way
than to use such a modern encoding system. The only realistic way out is to give up on
using character encoding as the only trace of the characters from the written source. This
is, however, not easily achieved, since in the way text encoding is done at the moment
the character encoding is a given, on which the layer of markup is built. Although there is
some support, for example in TEI P5, for reaching down into the encoding layer and
introducing additional characters through markup, this mechanism is not flexible enough
for cases where the research questions involve investigation of the writing system itself.
The reason character encoding is performed is that it opens the way to
dealing with the encoded symbols in a computationally simple manner and to abstracting
from the idiosyncrasies of the actual written characters. In alphabetical languages this is
very seldom problematic, and even for logographic languages it is problematic only where
fundamental questions about the characters themselves need to be answered. On the other
hand, if character encoding does not provide the stable framework on which the subsequent
interpretative layers can be built, something else has to take its place.
The fundamental difference with respect to character encoding in the model
proposed here is that first and foremost the location of a position in the text is recorded.
Only in a second step is this position then associated with an encoded character that might
provisionally serve to represent it.¹⁵
The model proposed here takes one representative edition of a text as a reference
edition for digital encoding. This text is seen for the purpose of this model as a sequence
of pages (or scrolls or other writing surfaces), which contain a sequence of lines, and the
lines again containing a sequence of characters. While there is a provisional transcription
into encoded characters, these encoded characters are considered to be preliminary and
serve mainly as placeholders to mark slots for the positions in the text they fill. The
characters used might be replaced by others or further annotated and linked to. The
encoding is considered to be mainly positional (that is, identifying a character at a
specific position in a text), rather than mainly symbolic (i.e. identifying the symbol that
will be used for all such characters in this text).
In addition to the transcribed text of the reference edition, there are additional layers
of text, that might contain characters as they are found on other witnesses of the text, or
for example a regularized form that reflects modern usage. These layers are considered to
be linked positionally through the sequential numbers of the pages, lines and characters
(See Figure 5). The number of layers is unlimited, but for practical purposes they are
assigned to different categories:
●
The new edition to be created
●
The reference edition
●
Editions used for collation
●
Other editions
16
By convention, any character position left empty will be filled by the character in the
reference edition, which has to be present for all characters. In addition to these
transcribed layers, a digital facsimile of the reference edition is linked to each page. If
necessary, a cutout from this digital facsimile can be linked to the characters at a given
position, thus providing a connection between these two different representations of the
text. The model also allows for the possibility of linking a digital facsimile of other
editions (with a possibly different page arrangement)¹⁷ to the reference edition.
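The positional linking of layers and the fallback to the reference edition can be pictured with a small sketch. This is only an illustration under stated assumptions, not the project's code: the class `LayeredText`, its method names, the layer name `collation-A` and the sample characters are all invented for the example.

```python
from typing import Dict, Tuple

# A position is (page, line, character), each a 1-based sequence number.
Position = Tuple[int, int, int]


class LayeredText:
    """Hypothetical container for positionally linked layers of one text."""

    def __init__(self, reference: Dict[Position, str]) -> None:
        # The reference edition has to supply a character for every position.
        self.reference = dict(reference)
        self.layers: Dict[str, Dict[Position, str]] = {}

    def set_reading(self, layer: str, pos: Position, char: str) -> None:
        """Record a reading of another layer at a position of the reference."""
        if pos not in self.reference:
            raise KeyError(f"position {pos} is not in the reference edition")
        self.layers.setdefault(layer, {})[pos] = char

    def reading(self, layer: str, pos: Position) -> str:
        """Return the layer's reading, falling back to the reference edition."""
        return self.layers.get(layer, {}).get(pos, self.reference[pos])


# Invented sample data: three character positions on page 1, line 1.
reference = {(1, 1, 1): "道", (1, 1, 2): "可", (1, 1, 3): "道"}
text = LayeredText(reference)
text.set_reading("collation-A", (1, 1, 2), "何")

print(text.reading("collation-A", (1, 1, 2)))  # reading recorded for this layer
print(text.reading("collation-A", (1, 1, 3)))  # taken from the reference edition
```

Storing only the readings that differ keeps each additional layer sparse, which fits the convention that empty positions are filled from the reference edition.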
15 This idea is of course not new; it has been used implicitly in previous work, for example
Yasuoka (2005).
16 This category includes for example other electronic transcriptions of the text that are linked to
the reference edition to improve the proofreading, but are not in themselves witnesses of the text.
17 This can become rather complex and may in practice be difficult to realize if there are big
differences in the arrangement of text in different sources.
Figure 4: Representation of the different editions
Figure 5: Attempt to visualize the connection between two layers.
The provisional encoding is by no means the only or final encoding that should be used;
its main purpose is simply to occupy the position and show a representative that might
stand for the character used at that position. Closer examination of this and other similar
characters might bring up other possible candidates.
The transcription of the text is not seen just as a precondition for dealing
computationally with the text, but is in itself a means to acquire better understanding of
the writing system used to write the text and ultimately the content of the text. To gain an
increasingly detailed understanding of the text, a kind of hermeneutical circle has to be
performed, consisting of several steps to be performed in sequence.
● For every character that seems doubtful, unintelligible or a non-standard
  representation, the word intended by this character needs to be established.
  This can be done by
    ● looking at the context of the occurrence of this character and
      comparing it with other, similar contexts
    ● looking at characters that are similar, either in visual, phonetic
      or semantic respects
● The result of this research is registered in the database and thus provides
  context for future lookups.
● Information about context and registered variants becomes available only as the
  processing of the text progresses; therefore several loops of this activity have to be
  performed.
● Like a hermeneutical circle, this activity is in principle open-ended and holds the
  potential for ever new discoveries and observations.
By performing several loops of proofreading and digesting different
representations of characters, a new understanding of the text and of the conventions and
idiosyncrasies used to write it is gained.
Quite separate from these layers of textual representation, there is an interpretative
layer that might be thought to hover over the positional layer; in this layer, connections or
disconnections between similar or different characters are established and investigations
of characters and their contexts are conducted.
Character Database
The model developed here relies on a database of characters. In this database, relations
between characters, their occurrences within the text and among groups of characters are
registered.
The groupings of the characters can be organized according to different properties of
the characters, thus allowing the researcher to build sets of characters that are similar in
their phonetic, semantic or visual properties. Since the relation to the occurrence of the
character in the text is maintained, these relations are never thought to be abstract and
generic, but are specific to the text under investigation.
Information in the database is held in two parts. One holds generalized relations
as they are recorded in dictionaries; here the table of variant characters of the Hanyu
dacidian 漢語大詞典 (HYDZD) and the Dictionary of Variant Characters compiled by
the Taiwanese Ministry of Education are used, as these are the most comprehensive tables of
this kind. This serves as a backdrop for a specific database, which records the relations as
they are observed in the text. This information is thus specific to the text it was developed
with, and the records of the database are always tied to the context the information was
abstracted from. Nevertheless, as the number of texts processed with this system
increases, and information held for these texts in the databases is aggregated, it is hoped
that more general information on the Chinese writing system and its development, which is
not available at the moment, can be gained.¹⁸
18 It should be noted here that the development of encoded character sets by necessity predates
the creation of textual material using these character sets. This of course precludes any
statistical basis that might be used as guidance in developing such encoded character sets.
The results of work using systems such as the one developed here could serve as guidance
for the future development of such character sets.
The database connects the specific instance of the character, which is registered not
with a character code but with the location of the character within the text, with a generic
identifier, that is, an encoded representation of the character, if such a representation is
available in the encoded character set. If no such representation is available, a private
character will be created in order to allow computational processing and representation of
this character. In such cases, structural information about the character as well as an
image cut from the digital facsimile are added to the record for this character.
If a suitable representation can be found among the almost 75,000 character codes
registered in Unicode, there might still be slight differences in appearance that cannot be
accounted for using the standard glyphs present in the operating system of the computer
in use. In such cases, and whenever a doubt about this character arises, an image cut
from the facsimile representation of the text will be added to the record. The database can
thus also be seen as connecting the digital facsimile representation and the transcribed
representation of the text.
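The link between an occurrence (identified by its location) and a generic identifier, including the creation of a private character where no encoded representation exists, might be sketched as follows. The names `CharRegistry` and `register`, the use of the Private Use Area of the Basic Multilingual Plane (U+E000 to U+F8FF) for private characters, and the sample structure string and image path are assumptions made for this illustration; the text above does not say how private characters are allocated.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Position = Tuple[int, int, int]          # (page, line, character)
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF     # Private Use Area of the Unicode BMP


@dataclass
class CharRecord:
    codepoint: int
    structure: Optional[str] = None      # structural description, e.g. an IDS
    image: Optional[str] = None          # hypothetical path to a facsimile cutout


@dataclass
class CharRegistry:
    records: Dict[int, CharRecord] = field(default_factory=dict)
    occurrences: Dict[Position, int] = field(default_factory=dict)
    next_private: int = PUA_FIRST

    def register(self, pos: Position, char: Optional[str] = None,
                 structure: Optional[str] = None,
                 image: Optional[str] = None) -> int:
        """Link the occurrence at `pos` to a code point and return it."""
        if char is not None:
            codepoint = ord(char)             # an encoded representation exists
        else:
            if self.next_private > PUA_LAST:
                raise RuntimeError("private use area exhausted")
            codepoint = self.next_private     # create a private character
            self.next_private += 1
        record = self.records.setdefault(codepoint, CharRecord(codepoint))
        if structure is not None:
            record.structure = structure
        if image is not None:
            record.image = image
        self.occurrences[pos] = codepoint
        return codepoint


registry = CharRegistry()
encoded = registry.register((1, 1, 1), char="道")
private = registry.register((1, 1, 2), structure="⿰氵道",
                            image="cutouts/p1-l1-c2.png")
```

The occurrence table is keyed by location rather than by character code, so replacing the generic identifier later does not disturb the record of where the character stands in the text.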
The Daozang Jiyao and its Editing Environment
The Daozang Jiyao
After the Daoist Canon of the Ming period (正統道藏 Zhengtong Daozang, 1445), the
Daozang jiyao (Essentials of the Daoist Canon) is the most important collection of Daoist
texts. It is by far the largest anthology of premodern Daoist texts and an indispensable
source for research on Daoism in the Ming and Qing period (fourteenth to late nineteenth
century). Although the collection is chiefly derived from the Ming Canon, it contains
more than 100 texts that are not included there and thus is undoubtedly the most valuable
collection of Daoist literature of the late imperial period. It features texts on neidan or
inner alchemy, cosmology, philosophy, ritual, precepts, commentaries on Buddhist,
Confucian and Daoist classics, hagiographic, topographic, epigraphic and literary works,
and much else.
At the Institute for Research in Humanities in Kyoto, a research project on the DZJY
is being conducted. It was started by the late Monica Esposito with the help of
Mugitani Kunio and Christian Wittern, with the aim of investigating the origin of the
collection, but also of creating a new critical electronic edition and developing the tools for
exploring all aspects of its content.¹⁹
19 More on the history of the Daozang jiyao and the projects sponsored by the Chiang Ching-kuo
Foundation (CCK) and the Japan Society for the Promotion of Science (JSPS) can be
found at http://www.daozangjiyao.org. Due to the untimely passing of Dr. Monica
Esposito in March 2011, the project has been reassessed and will be continued under the
leadership of Lai Chi-tim and in close collaboration with the Centre of Daoist Studies at the
Chinese University of Hong Kong.
The genesis of this collection has hardly been explored so far. According to the most
common account, often presented even in recent articles and primarily based on the
hypothesis of Zhao Zongcheng (1995),²⁰ it is believed that there are at least three different
editions of the Daozang jiyao:
● by 彭定求 Peng Dingqiu (1645-1719), compiled around 1700 and containing 200
  titles from the Ming Canon;
● by 蔣元庭 Jiang Yuanting (予蒲 Yupu, 1755-1819), who reportedly added 79
  texts not contained in the Ming Canon (Weng Dujian, 1935) during the Jiaqing era
  (1796-1820);
● by 賀龍驤 He Longxiang and 彭瀚然 Peng Hanran, published in 1906 at the
  二仙菴 Erxian'an of Chengdu (Sichuan) under the name of Chongkan Daozang jiyao
  重刊道藏輯要 (New Edition of the Essentials of the Daoist Canon), and
  (according to this hypothesis) containing a total of 319 titles.
However, as early as 1955, 吉岡義豊 Yoshioka Yoshitoyo, in his work entitled Dōkyō
kyōten shiron 道教經典史論 (Historical Studies on Daoist Scriptures), cast doubt on this
belief and affirmed that there were only two editions of the Daozang jiyao (number 2 and
number 3 in the list above).
One avenue that might shed new light on this controversy is the establishment of a
stemma of the existing textual witnesses, which should help to answer this question.
This requires, however, a close reading and comparison of the existing witnesses, as well as a
method to compare these versions computationally and to calculate the relative closeness
of individual witnesses.
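What "calculating the closeness of witnesses" could mean in the simplest case is sketched below. It is a deliberately naive measure, not the method used in the project: it assumes that the witnesses are already aligned position by position (as the positional model provides) and simply reports the share of positions at which two witnesses disagree. The function names and the Latin placeholder letters standing in for characters are invented.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def disagreement(a: List[str], b: List[str]) -> float:
    """Share of aligned positions at which two witnesses read differently."""
    if len(a) != len(b):
        raise ValueError("witnesses must be aligned to the same positions")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_table(witnesses: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Pairwise disagreement for every pair of witnesses."""
    return {
        (m, n): disagreement(witnesses[m], witnesses[n])
        for m, n in combinations(sorted(witnesses), 2)
    }


# Invented sample: eight aligned positions in three witnesses.
witnesses = {
    "W1": list("ABCDEFGH"),
    "W2": list("ABCDEFGX"),
    "W3": list("ABYDEFZX"),
}
distances = distance_table(witnesses)
for pair, value in distances.items():
    print(pair, value)
```

A table of this kind is only raw material; building an actual stemma would additionally require weighting the types of variants and a tree-building procedure, neither of which is attempted here.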
Editing Environment
The editing environment has been realized as a web application that can be used from
any compatible browser, anywhere on the Internet. One of the reasons for choosing this
platform was to allow collaborative editing in a distributed environment;
another was the hope of using this interface, either directly or at least most of it, for a
web-based publication of the texts.
20 Zhao (1995).
Mapping to a Relational Database
A relational database management system (in this case, PostgreSQL 8.3) has been used to
hold the data, while the user interface was developed with the Python-based web
application framework Django (post 1.0 SVN version) and the JavaScript framework
ExtJS. In Django terms, there are two applications, 'textcoll' for holding the textual
content and 'chardb' for the character database; these two are glued together with a
front end called 'md'. One of the difficult tasks at the outset was to model the text collection,
which has been done in the following tables:²¹
Tablename | Kind of Information | Relations
Work | Title of the work, date and other information |
Edition | Information about the edition, editor, publication details | Work
TextPage | Page number, graphical image of the page, serial number of the first character, number of characters | Edition, TextChar
TextLine | Line number, serial number of the first character, number of characters | TextPage, TextChar
TextChar | Serial number of character, associated extra information²², Unicode value of the character, serial number of previous and next character | TextLine, Edition, TextChar, Interpunction
As can be seen, there is in principle a hierarchical relationship from the Work through
Edition, TextPage and TextLine down to the TextChar table, which holds all the
information related to the character at this position. It goes without saying that this incurs
a tremendous overhead for the storage and processing of a simple text, but it should be
kept in mind that this is the equivalent of a scanning electron microscope, which tries to
study the atomic units of a text, so some effort has to be spent on isolating and handling
these atomic units. There are some anomalies in the hierarchy, introduced for the
convenience of processing: through the serial numbers of the first character
on pages and lines, the TextPage and TextLine tables are also linked directly to the TextChar
table, which in turn has internal links to the previous and following character positions. Any
character that spans more than one position in the grid, as well as talismans, outlines of
movements in rituals or similar material that falls outside this simplistic model of the
layout of a text, is treated as a graphic outside of the textual flow.
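To make the hierarchy and its shortcuts more concrete, here is a minimal schema sketch. It is not the project's actual Django model code: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, all column names are invented, and only the fields mentioned in the table above are included. It shows how the "serial number of the first character" stored on a line gives direct access to the TextChar rows.

```python
import sqlite3

# Hypothetical, simplified schema following the table description above.
SCHEMA = """
CREATE TABLE work     (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE edition  (id INTEGER PRIMARY KEY,
                       work_id INTEGER REFERENCES work(id), sigle TEXT);
CREATE TABLE textpage (id INTEGER PRIMARY KEY,
                       edition_id INTEGER REFERENCES edition(id),
                       page_no INTEGER, first_char INTEGER, n_chars INTEGER);
CREATE TABLE textline (id INTEGER PRIMARY KEY,
                       page_id INTEGER REFERENCES textpage(id),
                       line_no INTEGER, first_char INTEGER, n_chars INTEGER);
CREATE TABLE textchar (serial INTEGER PRIMARY KEY,
                       line_id INTEGER REFERENCES textline(id),
                       edition_id INTEGER REFERENCES edition(id),
                       codepoint INTEGER,
                       prev_serial INTEGER, next_serial INTEGER);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO work VALUES (1, 'Sample work')")
conn.execute("INSERT INTO edition VALUES (1, 1, 'CK-KZ')")
conn.execute("INSERT INTO textpage VALUES (1, 1, 1, 1, 3)")
conn.execute("INSERT INTO textline VALUES (1, 1, 1, 1, 3)")

sample = "道可道"  # invented sample line
for serial, ch in enumerate(sample, start=1):
    prev_serial = serial - 1 if serial > 1 else None
    next_serial = serial + 1 if serial < len(sample) else None
    conn.execute("INSERT INTO textchar VALUES (?, 1, 1, ?, ?, ?)",
                 (serial, ord(ch), prev_serial, next_serial))


def line_text(conn: sqlite3.Connection, line_id: int) -> str:
    """Rebuild a line from the serial number of its first character."""
    first, count = conn.execute(
        "SELECT first_char, n_chars FROM textline WHERE id = ?", (line_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT codepoint FROM textchar "
        "WHERE serial >= ? AND serial < ? ORDER BY serial",
        (first, first + count),
    )
    return "".join(chr(cp) for (cp,) in rows)


print(line_text(conn, 1))
```

The range query on serial numbers is what the "anomalous" link from TextLine to TextChar buys: a line or page can be reassembled without walking the chain of previous and next pointers.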
In addition to these tables representing the text and allowing the modeling of its
digital representation, there are a few other tables necessary for holding information about
the text structure and content, as follows:
21 Only tables and information relevant to this discussion are shown; implementation details are
ignored to keep the table simple.
22 Information about interpunction or other extra characters attached to this character is held
here. This includes the possibility of adding further information, for example in the case
of space characters that are used honorifically before names.
Tablename | Kind of Information | Relations
Attribute | key, value, note | TextChar (start), TextChar (end), Mark
Mark | tag, name, gloss, scope, note, color |
Interpunction | position, category |
The Mark table provides the tags that can be associated with locations in the text, whereas
the Attribute table provides the actual connection between an instance of a mark and
a specific text location, given by its start and end TextChar. Interpunction, except for space
that is already present in the source text, is held in a separate table, linked to the text from
the TextChar; besides the character used to represent the interpunction, the position
relative to the character²³ and a category²⁴ are recorded.
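The relation between Mark and Attribute amounts to stand-off annotation over character positions. The following sketch illustrates the idea with invented names and data; treating the end position as inclusive is an assumption, since the description above does not specify it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mark:
    tag: str
    name: str


@dataclass(frozen=True)
class Attribute:
    mark: Mark
    start: int   # serial number of the first TextChar covered
    end: int     # serial number of the last TextChar covered (assumed inclusive)


def marks_at(attributes: List[Attribute], serial: int) -> List[str]:
    """Tags of all marks whose range covers the given character position."""
    return [a.mark.tag for a in attributes if a.start <= serial <= a.end]


# Invented sample: a title and an inline note whose ranges overlap.
title = Mark("title", "title of a text")
note = Mark("note", "inline note")
attributes = [Attribute(title, 1, 4), Attribute(note, 3, 10)]

print(marks_at(attributes, 3))   # position covered by both ranges
print(marks_at(attributes, 5))   # position covered by the note only
```

Because each range is stored independently of the others, overlapping ranges pose no problem here, which is harder to achieve in a strictly nested markup hierarchy.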
The following table lists the tables in 'chardb', the part of the application that maintains
the character database:
Tablename | Kind of Information | Relations
Char | Unicode code point, character, types | external link to TextChar
Unihan | key, value | Char
CharGroup | members, type | Char through Variant
Variant | type, character, note | Char, CharGroup
Pinyin | pinyin reading | Char
IDS | IDS (Ideographic Description Sequence)²⁵ | Char
Groups of characters are built by linking the characters through the Variant table to a
CharGroup, thereby declaring membership of that group. Additional properties can be
set on Variant and CharGroup. Semantics is currently modeled through the
definition in the Unihan table; the sound is modeled through the Pinyin table. This is
provisional and is awaiting a more thorough solution.
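How group membership through a Variant table can be queried is sketched below. The rows, group identifiers and type labels are invented sample data, and the function `related` is a hypothetical helper, not part of the application.

```python
from typing import List, Optional, Set, Tuple

# One row per membership, as in a Variant table: (character, group, type).
# The groups and types below are invented sample data.
VariantRow = Tuple[str, str, str]

variants: List[VariantRow] = [
    ("為", "g1", "graphic"),
    ("爲", "g1", "graphic"),
    ("于", "g2", "graphic"),
    ("於", "g2", "graphic"),
    ("為", "g3", "phonetic"),
    ("謂", "g3", "phonetic"),
]


def related(char: str, rows: List[VariantRow],
            kind: Optional[str] = None) -> Set[str]:
    """Characters sharing at least one group with `char`.

    If `kind` is given, only memberships of that type are considered.
    """
    groups = {g for c, g, t in rows
              if c == char and (kind is None or t == kind)}
    return {c for c, g, t in rows if g in groups and c != char}


print(related("為", variants))
print(related("為", variants, kind="graphic"))
```

Keeping the membership rows separate from both characters and groups is what allows one character to belong to several groups of different types at once.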
User Interface
The user interface is accessed by opening the URL of the web application, which requires a
user account. Upon login, the user is presented with the last page visited before leaving
the system, as in Figure 6. The initially visible screen space is divided into three parts: at
the right is a page of the text as a digital facsimile, in the center pane is a transcribed version
of this same page, while the left pane holds some administrative functions. First there is
information about the current page and the user (including a logout button and the possibility to
look at a change log), in the second part is a panel for navigating the text collection, and
finally the bottom left has a multifunction panel for showing additional information and
performing other tasks on this text page.
23 This is given as one of eight compass positions with the character in question at the center,
numbered clockwise and starting in the 'East', that is, after the character.
24 At the moment, the categories are phrase-end, sentence-end and phrase/sentence-start.
25 The IDS is a sequence of operators and character parts that together describe how a character
is composed.
Figure 6: The web application interface for establishing the source text
The main functions for interacting with the text, however, are not visible here. Most
editing actions are performed by clicking or selecting text and through the dialog boxes
that pop up following such an action. Figure 7 shows an example of such a popup window;
in this case the fourth character position in the second line has been clicked. As visual
feedback to show which character position is the target of the actions taken in this
dialog, the character in this position is highlighted. The new window gives in
its top line the TextLine of this position, the character and then a number of input boxes.
The first input box has the current character for the edition [CK-KZ],²⁶ which is given in
the second box. By providing a different character and selecting a different edition, the
user can associate a new reading with another witness of the text, or give a different
character to be used in the JYE edition. If the correction or replacement occurs
several times, the scope for this action can be set in the third selection box to be
valid for the current character, for the whole page, or even for the remaining part of the
text.²⁷
26 The sigla identifying the editions are constructed as follows: Currently, there are
two edition groups, indicated by CK and YP. The actual edition within the group is then
indicated in the second part of the siglum; in this case it is the Kaozheng reprint of the
Chongkan edition, CK-KZ. An exception to this scheme is the new regularized edition created
here, which is indicated as JYE.
27 This is mainly to make the editorial process more efficient, under the assumption that only
text not yet seen will be touched.
Below this line, there are four tabs for further action or inspection; by default the window
opens to the second tab, which provides a glimpse into the information in the character
database for the character at this position. Among other things, the number of occurrences
of the character is given (here 464), together with images of the character as it has been cut from
the text. The main part gives additional information about the character, including
pronunciation and definition according to the Unihan database.²⁸ More important
for the present context, however, is the ability to maintain character relations here. The
information about character variants that is held for this character is shown in Figure 8. In this
case, the Hanyu da zidian 漢語大字典,²⁹ on which the initial information is based, has
assigned this character to five different groups of characters. For all characters in these
groups, the Unicode code point, the number of occurrences in the DZJY, as well as the definition
and pronunciation are given. Characters can be added to or deleted from groups, and
new groups created as necessary, which makes it possible to model this information exactly as
needed for this text collection. In addition, to assist the user in distinguishing
characters that might be mistaken for each other, it is also possible to register characters
in the system that are not cognates of the current character.
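The three scopes available for a correction in the dialog (current character, whole page, remaining part of the text) can be illustrated with a small selection function. This is a sketch under assumptions: the function name, the scope labels `char`, `page` and `rest`, and the reading of "whole page" as every matching position on that page are invented for the example, as is the sample data.

```python
from typing import List, Tuple

# The text as a flat list of (page number, character), indexed by position.
Text = List[Tuple[int, str]]


def target_positions(text: Text, start: int, scope: str) -> List[int]:
    """Positions to which a correction made at `start` would apply."""
    page, old = text[start]
    if scope == "char":
        return [start]
    if scope == "page":
        # Every position on the same page holding the same character.
        return [i for i, (p, c) in enumerate(text) if p == page and c == old]
    if scope == "rest":
        # The clicked position and every later one holding the same character.
        return [i for i, (p, c) in enumerate(text) if i >= start and c == old]
    raise ValueError(f"unknown scope: {scope}")


# Invented sample: two pages, Latin letters standing in for characters.
sample = [(1, "A"), (1, "B"), (1, "A"), (2, "A"), (2, "C"), (2, "A")]

for scope in ("char", "page", "rest"):
    print(scope, target_positions(sample, 2, scope))
```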
Figure 7: The dialog box that opens when a character position in the
transcribed text is clicked
28 This is a database of basic character properties, maintained by the Unicode Consortium.
29 Hanyu da zidian weiyuanhui (1986-1989).
Figure 8: Information about the character held in the character database
The first tab on this window allows the user to cut an image from the digital facsimile and
associate it with the current position in the transcribed text. In addition, this image is also
associated with the corresponding character in the character database.
Figure 9: Cutting a character from the text
The next tab of this window allows the user to see all information associated with a
character, as shown in Figure 10. Here, a regularized version of the character has been
registered for the JYE edition. It is also possible to add further notes on the character in
the text box to the right. The last tab (not shown) allows larger chunks of text to be
added or deleted.
Figure 10: Detailed information about this text location
Another way to interact with the text is to select a string of characters. The action
following a selection can be configured either to copy the selected string to the search box
or to apply markup to the selection, as shown in Figure 11. Currently, this is mostly used
to record characters that have been printed smaller as inline notes, but it will also be
used for titles, personal names and other items of interest in the text. To record structural
elements in the text, like paragraphs, verse lines or section headings, yet another dialog
can be used, which pops up when the horizontal bar at the top of a text line is clicked (see
Figure 12); this assumes, however, that the feature starts at the beginning of the line.
Figure 11
Figure 12: Applying markup to a line
Context
The discussion here stands in the context of practical experience and theoretical
considerations with digital text in Chinese. Some of the ideas have been pursued and
discussed in earlier presentations and articles. In particular, over the last several years I have
been developing an ontological model³⁰ for understanding text from a perspective quite
different from the one taken here. The model presented here is meant to complement the
earlier one, filling some of its gaps.
The work here can also be seen as a continuation of an earlier line of thought, which
was concerned with a 'scholarly workbench', the last incarnation of which was a
FileMaker-based application called KanDoku that supported annotation, translation and
markup of digital texts. When I tried to implement support for more flexible handling of
character representation and variant readings for different text witnesses, I quickly ran
into the limitations inherent in that platform. The present work should be seen as aiming
in a similar direction, except that this time an attempt has been made to start with a firm
foundation. It is planned, however, to gradually add more of the capabilities of the earlier
KanDoku. Another difference between the present work and KanDoku is that the latter took as
its input a completed TEI P5 compatible digital version of a text, while the former will
attempt to produce such a version as its output (among other things). In fact, one of its
design goals is to improve the workflow of creating high-quality digital editions of texts,
but hopefully its usefulness will extend beyond that and allow the user to gain new
insights into the text itself.
In the Daozang jiyao project, the work was initially done by editing TEI-conformant
XML files with the XML editor oXygen. This was considered cumbersome and
time-consuming by the researchers involved, so this editing application has been developed to
provide a more convenient interface that makes specific tasks on the text easier to perform
than would otherwise be possible. It should be noted, however, that such a specialization also
involves an enormous limitation on what can be done while editing the text; there will
therefore be many cases where such a solution cannot be applied. It is planned to add a
routine to export the texts edited using this interface into TEI-conformant XML
documents.
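Since the export routine is only planned, nothing can be said about its actual design. Purely as an illustration of the direction, the following sketch turns positional page and line data into a fragment using the TEI milestone elements `pb` (page beginning) and `lb` (line beginning). The function name, the input format and the sample text are assumptions, and the output is a bare body fragment without namespace or header, not a complete or valid TEI document.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical input: a list of (page number, lines on that page).
Pages = List[Tuple[int, List[str]]]


def export_body(pages: Pages) -> str:
    """Serialize positional page/line data as a TEI-like body fragment."""
    body = ET.Element("body")
    para = ET.SubElement(body, "p")
    for page_no, lines in pages:
        ET.SubElement(para, "pb", n=str(page_no))      # page beginning
        for line_no, line in enumerate(lines, start=1):
            lb = ET.SubElement(para, "lb", n=str(line_no))  # line beginning
            lb.tail = line                              # text following the milestone
    return ET.tostring(body, encoding="unicode")


# Invented sample data: one page with two short lines.
fragment = export_body([(1, ["道可道", "非常道"])])
print(fragment)
```

Using empty milestone elements keeps the page and line grid of the source recoverable from the exported file without forcing the text into a page-based element hierarchy.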
As it stands at the moment, the application is very much a work in progress, and much of the
necessary functionality is still missing, for example a way to visualize textual context that
takes into account the several different layers of characters that might be available at a
given point in the text. The results that have been achieved so far in the context of
work on the Daozang jiyao seem to suggest that the work is going in the right direction
and will indeed be able to open up new avenues for digital texts.
30 In English, this is presented in most detail in Wittern (2007), but more references can be found
at http://kanji.zinbun.kyoto-u.ac.jp/~wittern/publications articles/index.html.
It will be interesting to see how well this approach could also be applied to earlier
stages of the development of the Chinese writing system, such as bronze inscriptions or
texts on bamboo slips. The model presented here implicitly assumes a regular grid for
the layout of a text, so that model would require some extension, but it will have to be
actually tried with such a text to see in what way such extensions should be implemented.
Abbreviations
CBETA   Chinese Buddhist Electronic Text Association 中華電子佛典協會 (see http://www.cbeta.org)
CJK     Chinese, Japanese and Korean Characters
CK      Chongkan 重刊 (reprint) edition of the DZJY, Sichuan 1906ff.
CK-KZ   Facsimile edition of CK published by the Kaozheng publishing company
DZJY    Daozang Jiyao 道藏輯要
HYDCD   Hanyu Dacidian 漢語大詞典
IDS     Ideographic Description Sequences
JYE     New electronic edition of the DZJY
TEI     Text Encoding Initiative (see http://www.tei-c.org)
UCS     Universal Character Set, also known as Unicode (see http://www.unicode.org)
XML     eXtensible Markup Language (see http://www.w3.org/XML/)
YP      DZJY original edition by Jiang Yupu 蔣予蒲 (1755-1819)
References
Becker, Joe. 1988. Unicode 88. (http://www.unicode.org/history/unicode88.pdf. Accessed
2012-03-23).
Hanyu da zidian weiyuanhui 漢語大字典委員會, eds. 1986-1989. Hanyu da Zidian
漢語大字典. 8 vols. Wuhan: Hubei cishu chubanshe and Sichuan cishu chubanshe.
Kawabata, Taichi 川幡太一. 2005. Possible Multiple-encoded Ideographs in the UCS.
(http://www.cse.cuhk.edu.hk/~irg/irg/irg25/IRGN1155_Possible_Duplicates.pdf. Accessed
2012-03-23).
Kawabata, Taichi 川幡太一. 2006. IDS による UCS 漢字の 同一性 の判定手法
(Methods to Assert 'Sameness' of a Character in UCS Kanji Through IDS). 東洋学への
コンピュータ利用 第 17 回研究セミナー. Kyoto: Institute for Research in
Humanities.
Leng, Yulong 冷玉龍 and Wei, Yixin 韋一心, eds. 1994. Zhonghua Zihai 中華字海.
Beijing: Zhonghua shuju.
Luo, Zhufeng 羅竹風, ed. 1987-1994. Hanyu Dacidian 漢語大詞典. Shanghai: Dictionary
Publishing House.
Morioka, Tomohiko 守岡知彦. The CHISE (Character Information Service Environment)
project. (http://www.kanji.zinbun.kyoto-u.ac.jp/projects/chise/. Accessed 2012-03-23).
Renear, Alan; Mylonas, Elli; Durand, David. 1996. Refining Our Notion of What Text Really
Is: The Problem of Overlapping Hierarchies. Research in Humanities Computing. Ed. Ide,
Nancy and Hockey, Susan. Oxford: Oxford University Press.
Wittern, Christian. 2007. Digital Text, Meaning and the World: Preliminary Considerations
for a Knowledgebase of Oriental Studies. Higashi Ajia ni Okeru Reigi to Keibatsu
東アジアにおける儀礼と刑罰 (Ritual and Punishment in East Asia). Ed. Tomiya, Itaru 冨谷
至. Kyoto: Institute for Research in Humanities. 41-58.
Yasuoka, Koichi 安岡孝一. 2005. Text-Searchable Image and Its Applications.
(http://kanji.zinbun.kyoto-u.ac.jp/~yasuoka/publications/2005-01-22.pdf. Accessed 2012-03-23).
Yoshioka, Yoshitoyo 吉岡義豊. 1955. Dōkyō Kyōten Shiron 道教經典史論 (Historical
Studies on Daoist Scriptures). Tokyo: Gogatsu shobo.
Zhao, Zongcheng 趙宗誠. 1995. Daozang Jiyao de Bianzuan yu Zengbu 道蔵輯要的編纂與
增補 (The Compilation of the Daozang Jiyao and its Enlarged Editions). Sichuan
Wenwu 四川文物 2:27-31.
Chung-Hwa Buddhist Journal (2012, 25:87-104)
Taipei: Chung-Hwa Institute of Buddhist Studies
中華佛學學報第二十五期 頁 87-104 (民國一百零一年),臺北:中華佛學研究所
ISSN:1017-7132
The Corpus Search and Results Handling System
Glossa – a Description
Janne Bondi Johannessen
The Text Laboratory, Department of linguistics and Nordic studies
University of Oslo
Abstract
The paper presents and describes Glossa, a corpus search and results handling system that
has two main characteristics: It is advanced with respect to search and handling options,
and it is very user-friendly. Also, it is freely downloadable. The system is suitable for
monolingual and parallel corpora, and for combining different kinds of information in the
search results. In the paper I show how sound, video and maps, as well as sets of double
transcriptions, are presented to the Glossa user.
Keywords:
Corpus Search System, User-friendly Interface, Advanced Search, Parallel Corpora,
Speech Corpora
語料庫搜尋與結果處理系統 Glossa 之說明
Janne Bondi Johannessen
奧斯陸大學
摘要
此篇文章介紹並說明語料庫搜尋與結果處理系統 Glossa,此系統有兩個主要的特
性:它具有搜尋與處理選擇上的優越性,並相當考慮使用者的需要。另外,此系統
可免費下載,且適用於單一語言及平行語料庫,同時可以在搜尋結果中結合不同的
資訊。在此文,我將說明聲音,影像及地圖,及一套雙重抄寫如何呈現於 Glossa
的使用者。
關鍵詞:語料庫搜尋系統、人性化用戶界面、進階搜尋、平行語料庫、口語語料庫
Introduction¹
The paper presents and describes Glossa, a corpus search and results handling system that
has two main characteristics: It is advanced with respect to search and handling options,
and it is very user-friendly. Also, it is freely downloadable, which means that those who
have a corpus and would like to make it available on the web through a convenient interface
can use Glossa. Many corpora are used with Glossa both at the University of Oslo and elsewhere.
The paper is structured as follows. In section 2, I briefly describe the importance of
user-friendliness. Section 3 illustrates querying with Glossa, showing options with texts as
different as parallel corpora and speech corpora. That section concludes with an illustration of
the indispensability of Glossa for certain types of research. The illustration shows how
finding isoglosses for variation in noun morphology depends on the Glossa options of
maps and parallel transcription search. Section 4 gives the technical details, including
requirements on input data and a short discussion of the use of Google APIs. Section 5
concludes the paper.
Importance of User-Friendliness
There are several corpus interfaces available, see e.g. Johannessen et al. (2000), Bick
(2004), Hoffmann and Evert (2006). However, they often have limitations: some are not
network-enabled (i.e. each user has to download and manage corpora), some lack
flexibility with regard to queries, results display and post-processing, many are tied to a
specific corpus, and few are completely GUI-driven.
Typically, corpus applications require queries to be formed as regular expressions in
some formal language. Many corpus users find it difficult to learn such query languages,
with their requirements for accurate use of parentheses, asterisks, percentage signs etc.
Furthermore, applications often require the users to know the full tag set before querying
the corpus.
Many corpus users find it hard to have to know the tag inventory, tag names and
necessary tag abbreviations, as well as abbreviations for source texts, etc. For many
potential users, these issues act as an obstacle, preventing them from making easy or
efficient use of corpus tools.
We believe that an easy-to-use, flexible graphical user interface is important for
maximizing the potential of corpora in research, development and teaching. Furthermore,
the interface should not presuppose full-text access to the corpora, as licence conditions
may prohibit free redistribution, even if they often do allow web-based querying. Glossa
satisfies these criteria.
1 I would like to thank the two anonymous reviewers for very good advice. I also thank my
colleagues at the Text Laboratory, University of Oslo, especially Joel Priestley, Anders
Nøklestad and Kristin Hagen, who are vital for the Glossa development, and Lars Nygaard for
his important contributions in the early development phase.
Querying with Glossa
The corpus user can query the corpus by linguistic features or by non-linguistic features,
or by a combination. The most common linguistic queries involve specifying a token by
given attributes: word, lemma, affix or part of word (start, middle, end of word), part of
speech, morphological features, syntactic functions, sentence position. These queries can
always be done in a user-friendly way.
In (1) we exemplify what a search using a search language of regular expressions
would be like, in order to search for a plural noun starting with the letter sequence jump.
In figure 1 we see the same query in Glossa, with its pull-down menus. (The latter search
is translated by Glossa into regular expressions.)
(1) (word="jump.*"%c&(number="pl")&(pos="n"))
Figure 1: Querying Glossa using linguistic specifications.
All searches are done using checkboxes, pull-down menus, or by typing plain letters to
form words or other strings.
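To make the translation step concrete, the following is a minimal sketch of how menu selections might be assembled into an expression of the shape shown in (1). It is not Glossa's actual code: the function name `build_token_query` and its parameters are hypothetical, and only the output format is taken from example (1).

```python
def build_token_query(word=None, prefix=False, case_insensitive=True, **features):
    """Assemble a token query shaped like example (1) from menu-style choices.

    Hypothetical helper for illustration only; not part of Glossa.
    """
    parts = []
    if word is not None:
        # "start of word" is expressed as the word followed by the wildcard .*
        pattern = word + ".*" if prefix else word
        flag = "%c" if case_insensitive else ""
        parts.append(f'word="{pattern}"{flag}')
    # Each further menu choice (number, pos, ...) becomes a bracketed constraint.
    for name, value in features.items():
        parts.append(f'({name}="{value}")')
    return "(" + "&".join(parts) + ")"


query = build_token_query("jump", prefix=True, number="pl", pos="n")
print(query)
```

With the selections from figure 1 (a plural noun starting with jump), the sketch produces the same string as (1).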
The querying in figure 1 is a monolingual search. In figure 2 we see how a query can
address more than one language in a parallel translation corpus. The user has indicated
that (s)he wishes to get all hits where the English text contains jump followed by a
preposition, and the Norwegian translation equivalent contains hopp.
Figure 2: Querying for a parallel search.
The parallel search in figure 2 is translated to a regular expression by the system,
presented in (2), and the search results are presented in figure 3. Without the interface, the
users would themselves have to write this regular expression.
(2) "([((word="jump" %c))][((pos="prep"))]) :OMC4_NO ([((word="hopp.*"%c))]) ;"
Figure 3: Some results from a parallel corpus query.
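The structure of the parallel query in (2) can be sketched in the same spirit: a sequence of token constraints for the source language, the name of the aligned corpus, and a sequence of constraints for the translation. The function `build_parallel_query` below is a hypothetical illustration that only mirrors the layout of (2); it is an assumption about how such a string could be composed, not a description of Glossa's internals.

```python
def build_parallel_query(source_tokens, target_corpus, target_tokens):
    """Compose a parallel query laid out like example (2).

    Each token is given as a ready-made constraint such as 'pos="prep"'.
    Hypothetical helper for illustration only.
    """
    def sequence(tokens):
        # One bracketed slot per token position, in order.
        return "(" + "".join(f"[(({tok}))]" for tok in tokens) + ")"

    return f"{sequence(source_tokens)} :{target_corpus} {sequence(target_tokens)} ;"


parallel = build_parallel_query(
    ['word="jump" %c', 'pos="prep"'],   # English: jump followed by a preposition
    "OMC4_NO",                          # aligned Norwegian corpus, as named in (2)
    ['word="hopp.*"%c'],                # Norwegian: a word starting with hopp
)
print(parallel)
```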
The examples we have seen up to this point are ones that query linguistic features. The
corpus user can also filter the searches by non-linguistic features, on the same query page.
Here the choices are hidden in clickable boxes that appear when the little plus-sign is
expanded. This is shown in figure 4.
Figure 4: The non-linguistic features that can be used for filtering the search.
The Nordic Dialect Corpus (Johannessen et al. 2009) is a corpus that contains a lot of
information: the speech of five languages, hundreds of dialects, tagging, at least one
transcription for each dialect, but sometimes two, information on informants, like age and
sex, type of recording, recording year etc. But querying the corpus is no more
complicated than with simpler corpora.
Figure 5 illustrates how to search for a suffix. In the Övdalian dialect of Sweden, the
suffix –um has two functions: dative plural on nouns and 1st person plural
on verbs. It is an interesting search option since the Övdalian dialect is rapidly changing,
and one does not know whether the various inflections are still used. (They are no longer
used in standard Swedish.) Use of the dative suffix in particular is clearly losing
ground rapidly. Since –um is a non-standard suffix, there is no standard orthography for handling
it, and consequently, a standard orthographic search is not viable. The user must specify
that the search should be performed in the phonetic transcription (the option at the bottom
of the pull-down menu). The standard orthography simply leaves out the suffix altogether,
so there is no way this could be used (see figure 7 for the results of the search, illustrating
also the difference between the two types of transcription).
Figure 5: A simple query for the suffix –um.
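The reason the search must target the phonetic transcription can be shown with a small sketch. It assumes a simplified data model in which every token carries two renderings, one per transcription tier; the field names `orth` and `phon`, the function `search_suffix`, and the token strings themselves are invented placeholders, not real Övdalian forms and not Glossa's storage format.

```python
import re

# Artificial placeholder tokens (NOT real dialect data): the orthographic tier
# omits the suffix, the phonetic tier keeps it.
TOKENS = [
    {"orth": "stem", "phon": "stemum"},
    {"orth": "other", "phon": "other"},
]


def search_suffix(tokens, suffix, tier):
    """Return the tokens whose form in the chosen tier ends with `suffix`."""
    pattern = re.compile(re.escape(suffix) + r"$")
    return [tok for tok in tokens if pattern.search(tok[tier])]


phonetic_hits = search_suffix(TOKENS, "um", "phon")
orthographic_hits = search_suffix(TOKENS, "um", "orth")
print(len(phonetic_hits), len(orthographic_hits))
```

In this toy setting only the phonetic tier yields a hit, which mirrors why the user has to pick the phonetic option in the pull-down menu.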
Figure 6 illustrates many of the non-linguistic variables that can also be used to limit the
search. In addition to those regarding informants, there are also some other choices that
deal with the presentation of the result (top of figure 6). I will mention particularly the
option of choosing one or both types of transcription. This option is independent of
whether the researcher originally searched in the phonetic or orthographic transcription.
Figure 6: Non-linguistic search options in the Nordic Dialect Corpus.
In figure 7 some of the search results for the suffix query for –um are displayed,
and we see how the two transcriptions complement each other.
Figure 7: Three different displays of search results: two types of transcription (orthographic and phonetic) plus an English translation – in that order.
Without being able to search in the phonetic transcription we would not have been able to
find these suffixes. Without the orthographic transcription a non-expert dialect speaker
would not have been able to understand the phonetic transcription, given how far it is
from the standard. Note that the displayed results are
translated into English using a Google Translate API. This has to be done for each
concordance line separately, and is a service to users who are less proficient in the Nordic languages.
Figure 8 shows what the results window looks like when the film icon button next to
a result line is pushed. The video and audio give exactly the same segment as the text line
in the results list.
Figure 8: Search results with audio and video.
In addition to the many search options, there are also various options for handling the
results. The Action menu visible in figures 7 and 8 gives a large selection of choices, for
example: sorting on matching phrases, bibliographic information or arbitrary points in the
context, counting matched phrases, downloading result sets in various formats (e.g. tab
separated values and Excel spreadsheets), collocation analysis, co-occurrence analysis,
user-defined annotation, singling out individual hits or whole results file for saving or
deletion, viewing with regards to metadata distribution, frequency count of all hits.
In figure 9, we have simply asked for a count of the results from a search on jump as
first part of a word. This option gives the researcher a very nice overview of the words of
the resulting search concordance. Here, the case-sensitive option has been chosen,
thereby distinguishing jump from Jump. This is a choice the user has to make before
displaying the result of the count.
Figure 9: Word count
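The effect of the case-sensitive option on such a count can be sketched as follows. The list of hits is an invented example and `count_forms` is a hypothetical helper; the sketch only illustrates the counting principle, not how Glossa computes its figures.

```python
from collections import Counter


def count_forms(hits, case_sensitive=True):
    """Count the distinct word forms among the hits of a concordance."""
    forms = hits if case_sensitive else [h.lower() for h in hits]
    return Counter(forms)


# Invented hits for a search on "jump" as first part of a word.
hits = ["jump", "Jump", "jumped", "jump", "jumping"]

sensitive = count_forms(hits, case_sensitive=True)     # jump and Jump kept apart
insensitive = count_forms(hits, case_sensitive=False)  # jump and Jump merged
print(sensitive.most_common())
print(insensitive.most_common())
```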
The word count can also be represented by a pie chart or a histogram, among other things,
as illustrated in figures 10 and 11:
Figure 10: Frequency displayed as a pie chart.
Figure 11: Frequency displayed as a histogram.
The Action menu also gives the possibility of showing collocation data, as in
figure 12.
Figure 12: Collocations
As mentioned, Glossa is continuously being developed and is getting new features. I have
not shown all the options that can be had with this corpus search and results handling
system, but I would like to mention one of the newest additions to the system; that of
showing maps for each concordance line. Thus, if we make a search for some feature that
is distributed geographically, a map display is very useful.
I will present a final example that illustrates how useful the Glossa options are
for linguistic research, taking as the overall research question the isoglosses of noun
morphology. A topic that has interested Norwegian dialectologists over many years is the
distribution of the various noun suffixes. While detailed maps were drawn for the noun
morphology in the mid 1900s, it is expected that the situation differs now, but it is costly
to do a full dialect survey only for this topic. With the Nordic Dialect Corpus available in
Glossa, a simple search for a specific, common noun such as ungene ‘the children.MASCULINE’
will give the desired results, revealing the geographical distribution of this
plural definite suffix within seconds. There are 568 hits, and the results page shows each
form of the noun as in figure 13, and the geographical distribution on a Google map, as in
figure 14. It should be mentioned that I could also have chosen to search for just the
string –ene ‘plural definite suffix’, but have chosen not to do so here, since that would
have given hits for all three genders (neuter and feminine as well). Since many dialects
distinguish the plural definite suffix according to gender, I would have gotten many more
forms, which I do not find useful for this illustrative example.
Figure 13: The full range of pronunciations of the word ungene ‘the children’ in the Nordic Dialect Corpus, transcribed in a traditional Norwegian system.
The corpus users themselves choose which words to group together by way of a colour
code. Here I have chosen to distinguish between four types: the full two-syllable suffix
–ane (green [editor's note: download PDF for color reproductions]), the apocoped one-syllable
suffix –an (black), the short non-nasal suffix –a (orange), and the dative suffix
consisting of a rounded vowel and a bilabial consonant –om (yellow). In figure 14 the
geographical distribution of these types is clearly displayed, and the isoglosses are easy to see.
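The grouping behind the colour code can be approximated by a simple suffix classifier. The sketch below is an assumption about how such a grouping could be automated; in Glossa the user assigns the colours by hand. The function `colour_for` is hypothetical, and the example forms are schematic spellings built from the four suffixes, not transcriptions taken from the corpus.

```python
# Suffix-to-colour mapping from the text; longer suffixes are listed first so
# that -ane is tested before -an and -a.
SUFFIX_COLOURS = [
    ("ane", "green"),
    ("an", "black"),
    ("om", "yellow"),
    ("a", "orange"),
]


def colour_for(form):
    """Return the marker colour for a word form, or None if no suffix matches."""
    for suffix, colour in SUFFIX_COLOURS:
        if form.endswith(suffix):
            return colour
    return None


# Schematic example forms (stem + suffix), for illustration only.
for form in ["ungane", "ungan", "unga", "ungom"]:
    print(form, colour_for(form))
```

Ordering the suffixes from longest to shortest matters: a form ending in –ane also ends in –a, so the shorter suffix must be checked last.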
Figure 14: Map with the distribution of the plural definite suffix.
The map shows that the full suffix –ane (green markers) is commonly used in the south
and west parts of southern Norway. The apocoped suffix –an (black) is used in all of
north Norway and the middle part of south Norway down to the coast. The short suffix –a
(orange) is mainly found in the eastern part of south Norway and in one place in north
Norway. The latter could be evidence of an immigrant group that came to this area in the
1700s–1800s from the eastern valleys of south Norway, a fact that, even today, is clearly
reflected in the language. The dative suffix –om (yellow) is only found in a few places in
the northern part of south Norway. Dative case is slowly dying in Norway, just as it is on
the other side of the border (recall the discussion on the similar Övdalian dative case
suffix –um).
The last suffix search with the resulting map illustrates two important features of
Glossa. If it had not had the possibility of searching for aligned phonetic transcription
variants via an orthographic search, finding so many versions of the suffix would have
been nearly impossible. With only access to orthographic transcription, no variation
would have been found, and conversely, with access only to phonetic transcription, a
comprehensive search would have required detailed knowledge of all the dialect forms, a
near-impossible requirement. The second important feature for this search is the map.
Without the visual illustration, the isoglosses would have been hard to spot – with so
many places and so many linguistic forms.
Technical Details
Glossa (Nygaard 2007, Johannessen et al. 2008) is implemented partly through new
programming and partly with other reusable resources. The corpus search part is
performed with the IMS Corpus Workbench (CWB, Christ 1994), and the meta
information is put into a relational MySQL database. Although the web interface is
simple, it allows users to create complex queries in very simple ways, browse, process,
download result sets etc. Glossa supports all types of corpora, both multilingual and
multimodal, with various amounts and kinds of annotation. The statistics options are
implemented with the Ngram package (Pedersen 2008). Google Translate and Google
Maps are used for added value of display of search results. All in all, as indicated, Glossa
combines several features and functions, and makes them all available in the same user
interface. We know of no other interface that combines so many options. We have used
existing APIs for the programs mentioned here, but have not developed additional ones.
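The division of labour between the search engine and the metadata database can be illustrated with a small sketch: hits returned by the corpus search carry an informant identifier, and the relational database decides which informants satisfy the non-linguistic filters. Everything here is an assumption made for illustration: the table and column names are invented, the function `filter_hits` is hypothetical, and Python's built-in `sqlite3` stands in for MySQL so that the example is self-contained.

```python
import sqlite3


def filter_hits(conn, hits, sex=None, min_age=None):
    """Keep only the hits whose informant matches the metadata filters."""
    query = "SELECT informant_id FROM informant WHERE 1=1"
    params = []
    if sex is not None:
        query += " AND sex = ?"
        params.append(sex)
    if min_age is not None:
        query += " AND age >= ?"
        params.append(min_age)
    allowed = {row[0] for row in conn.execute(query, params)}
    return [hit for hit in hits if hit["informant_id"] in allowed]


# Invented metadata table (sqlite3 used here in place of MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE informant (informant_id TEXT PRIMARY KEY, sex TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO informant VALUES (?, ?, ?)",
    [("inf1", "F", 72), ("inf2", "M", 25)],
)

# Invented search hits, each tagged with the informant who produced the line.
hits = [
    {"informant_id": "inf1", "line": "hit 1"},
    {"informant_id": "inf2", "line": "hit 2"},
]
older_speakers = filter_hits(conn, hits, min_age=50)
print(older_speakers)
```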
The use of Google APIs for translation and map functions deserves a comment.
Google is a commercial company with whom one does not communicate directly. They
offer good quality programs via APIs for free and have therefore been a valuable choice
for us. For example, we could have gotten Norwegian electronic maps free, from a
different company, as a university institution. However, we needed maps for many
countries in Northern Europe, while our institution only had agreements with one
company for Norwegian maps. Thus, Google’s free service turned out to be our only
option. Their API covers a lot and their functionality is good. A problem with Google,
however, is that they as a commercial company change their terms of service along the
way. Thus, the translation option that we have described here was originally provided free of
charge, but now has to be paid for. Using Google thus makes some of the modules less
predictable in the long run.
When it comes to formats, Glossa needs texts to be in the format required by the
CWB, i.e., tab-separated text with XML tags. Glossa uses the XML tags for structural
(i.e., not about individual words) information such as sentence ID and time codes (for
audio and video files). If input texts come with TEI or other XML markup, information
from these tags will be extracted and inserted into the MySQL database. For Glossa to be
able to communicate with the third-party services (for example maps) and to link the
corpus text directly to audio and video, the corpus must have markup that includes
latitude/longitude coordinates and time codes, respectively. Grammatical tags are part of
the input tab-separated text, and must be mapped to the menu-structure of the search
interface. Mapping for TreeTagger for some languages is included in the system. But any
tag set and values can be imported. Glossa itself requires simply text in tab-separated
format and a MySQL database for metadata (extra-linguistic information on informants or
text sources). The programming languages used are Perl, Ruby, PHP and JavaScript.
CWB allows Unicode.
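The input format described above, one token per line with tab-separated attributes and XML tags for structural units, can be sketched as follows. The choice of the three columns (word, lemma, part of speech), the tag values and the function name `to_vertical` are assumptions for illustration; an actual corpus would use whatever attributes and tag set it has been annotated with.

```python
from xml.sax.saxutils import quoteattr


def to_vertical(sentences):
    """Render sentences as tab-separated token lines wrapped in <s> tags."""
    lines = []
    for sent in sentences:
        # Structural information (here: the sentence ID) goes into an XML tag.
        lines.append(f"<s id={quoteattr(sent['id'])}>")
        # Word-level information goes into tab-separated columns.
        for word, lemma, pos in sent["tokens"]:
            lines.append("\t".join((word, lemma, pos)))
        lines.append("</s>")
    return "\n".join(lines) + "\n"


# Invented example sentence with illustrative tags.
example = [
    {"id": "s1", "tokens": [("the", "the", "det"), ("children", "child", "n")]},
]
vertical = to_vertical(example)
print(vertical)
```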
Configuration of the interface and the mapping from corpus data to menus and
search options is achieved using a set of corpus-specific configuration files. Search results
can be exported to several formats, such as tab separated and comma separated text, Excel etc.
Glossa is freely downloadable under a GPL licence from GitHub, and is undergoing
regular development and improvement in close contact between users and developers.
Some installation support can be given upon request. The Glossa package includes scripts
that convert written texts in TEI formats, as well as spoken language in Transcriber-XML,
into a full corpus and database.
Many types of corpora use Glossa (speech corpora, parallel written
corpora, and monolingual written corpora). For a list, please consult the end of the paper.
Conclusion
The paper describes some features of the corpus search and results handling system
Glossa, developed at the Text Laboratory, UiO. We have seen that the basic search
system is the same for any kind of corpus, but that specific features (audio, video or
translated texts) will give various additions to the usability. Glossa is currently used for
monolingual and multilingual, parallel written corpora and for speech corpora with audio
and video.
The Glossa system is freely downloadable (see web site below) and some support
can be given for corpus installation.
References
Bick, Eckhard. 2004. Corpuseye: Et Brugervenligt Webinterface for Grammatisk Opmærkede
Korpora. Møde om Udforskningen af Dansk Sprog, Proceedings. Ed. Peter Widell and
Mette Kunøe. Denmark: Århus University. 46-57.
Christ, Oli. 1994. A Modular and Flexible Architecture for an Integrated Corpus Query
System. COMPLEX '94. Budapest: Research Institute for Linguistics, Hungarian Academy
of Sciences.
Evert, Stefan. 2005. The CQP Query Language Tutorial. Germany: Institute for Natural
Language Processing, University of Stuttgart.
(http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPTutorial)
Hoffmann, Sebastian and Evert, Stefan. 2006. Bncweb (cqp-edition): The Marriage of two
Corpus Tools. Corpus Technology and Language Pedagogy: New Resources, New Tools,
New Methods, volume 3 of English Corpus Linguistics. Eds. S. Braun, K. Kohn, and J.
Mukherjee. Frankfurt am Main: Peter Lang. 177-195.
Johannessen, Janne Bondi; Nøklestad, Anders; Hagen, Kristin. 2000. A Web-Based
Advanced and User-Friendly System: The Oslo Corpus of Tagged Norwegian Texts.
Second International Conference on Language Resources and Evaluation. Proceedings.
Johannessen, Janne Bondi; Nygaard, Lars; Priestley, Joel; Nøklestad, Anders. 2008. Glossa:
a Multilingual, Multimodal, Configurable User Interface. Proceedings of the Sixth
International Conference on Language Resources and Evaluation (LREC'08). Paris: European
Language Resources Association (ELRA).
Johannessen, Janne Bondi; Priestley, Joel; Hagen, Kristin; Åfarli, Tor Anders; Vangsnes,
Øystein Alexander. 2009. The Nordic Dialect Corpus - an Advanced Research Tool.
Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA
2009. NEALT Proceedings Series Volume 4. Eds. Kristiina Jokinen and Eckhard Bick.
Denmark: Northern European Association for Language Technology.
Nygaard, Lars. 2007. The Glossa Manual. Norway: The Text Laboratory.
Pedersen, Ted. 2008. Ngram Statistics Package. (http://www.d.umn.edu/~tpederse)
Corpora that use Glossa
Big Brother Corpus (Speech), Norwegian: http://www.tekstlab.uio.no/nota/bigbrother/
The European Parliamentary Comparable and Parallel Corpora (ECPC) (under
development): http://www.ecpc.uji.es/EN/home.php?language=en
Lexicographical Bokmål Corpus:
http://www.hf.uio.no/iln/forskning/samlingene/bokmal/index.html#bokmalskorpus
Lule Sámi Corpus: http://giellatekno.uit.no/doc/lang/corp/corpus-smj.html
Macedonian Text Corpus:
http://www.tekstlab.uio.no/glossa/html/index_dev.php?corpus=mak
Mörkuð íslensk málheild (Icelandic Corpus): http://mim.hi.is/
Nordic Dialect Corpus (Speech): http://www.tekstlab.uio.no/nota/scandiasyn/
North Sámi Corpus: http://giellatekno.uit.no/doc/lang/corp/corpus-sme.html
NoTa Oslo Speech Corpus: http://www.tekstlab.uio.no/nota/oslo/
Oslo Multilingual Corpus: http://www.hf.uio.no/ilos/OMC/
Ruija Speech Corpus of Kven:
http://www.hf.uio.no/iln/tjenester/kunnskap/sprak/korpus/talesprakskorpus/ruija/index.html
RUN Parallel Corpus:
http://www.hf.uio.no/ilos/forskning/forskningsprosjekter/run/corpus/
TAUS Speech Corpus of Norwegian: http://www.tekstlab.uio.no/nota/taus/index.html
UPUS Speech Corpus Multiethnic Norwegian:
http://www.hf.uio.no/iln/forskning/prosjekter/upus/
Other Web Sites
GitHub: https://github.com/
Glossa: http://www.hf.uio.no/tekstlab/glossa.html
Google Translate: http://translate.google.com
IMS Corpus Workbench: http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/
MySQL: http://www.mysql.com
Open Source: http://www.opensource.org
Text Laboratory: http://www.hf.uio.no/tekstlab/
Chung-Hwa Buddhist Journal (2012, 25:3-6)
Taipei: Chung-Hwa Institute of Buddhist Studies
中華佛學學報第二十五期 頁 3-6 (民國一百零一年),臺北:中華佛學研究所
ISSN:1017-7132
Introduction
Christoph Anderl
The papers collected in this volume are originally based on lectures presented at the
workshop “Resources in the Mark-up and Digitization of Historical Texts”, organized at
the University of Oslo, bringing together scholars from a variety of backgrounds and with
different research interests.1 The workshop was organized in connection with a conference
on various aspects of Dūnhuáng manuscripts and early Chán Buddhism, and also included
a TEI meeting.2
Whereas the Dūnhuáng Conference was primarily concerned with early Chán in the
Dūnhuáng area and its relation to ritual practices, esoteric Buddhism and Daoism, as
reflected in Dūnhuáng manuscripts and early Chán historiographical materials, one of the
main objectives of the Resources Conference was to address questions concerning the analysis
of text/manuscript materials and methods of presenting them to scholarly communities
and to the general public.
Many aspects of text/manuscript digitization and text mark-up have undergone
tremendous progress in recent years and it is sometimes difficult to follow the new
developments and approaches in this field. Originally, the workshops focused on East
Asian materials and the scripts used in these texts and manuscripts; however, by
including a variety of other languages and locations, the interdisciplinary and
comparative approach greatly enriched the discussion. The workshop featured
descriptive presentations of existing database projects (providing an overview of
current projects),3 lectures dealing with specific analytical research questions, and
approaches focusing on the technological aspects of digitization and mark-up. In the
context of this volume, research-focused presentations primarily concerned with text
analysis and the development of analytical tools were selected.
1
For the conference website, see http://folk.uio.no/christoa/ZenManus_Front.html; see also
http://www.hf.uio.no/ikos/english/research/projects/zen/. The conference (September 27th –
October 1st, 2009) was a joint project with the Institute of Research in Humanities (Kyōto
University), and was co-organized by Christian Wittern.
2
The conference and workshops were part of a larger interdisciplinary project on Chán/Zen
Buddhist culture, literature, and language at the Department of Culture Studies and Oriental
Languages (IKOS, University of Oslo). The results of the 2008 conference on Chán/Sŏn/Zen
Buddhist rhetoric were recently published in the form of an edited volume (Anderl, Christoph. ed.
2012. Zen Buddhist Rhetoric in China, Korea, and Japan. Leiden/Boston: Brill).
3
This included presentations concerning the Text Encoding Initiative (TEI), CBETA, the Digital
Dictionary of Buddhism (DDB), Thesaurus Literaturae Buddhicae (Oslo Univ.), the Thesaurus
Linguae Sericae Project (Oslo/Heidelberg Univ.), The International Dunhuang Project (British
Library), digitization projects at the Dharma Drum College (Taiwan), a project on the Daoist
Canon (Kyōto University), the PROIEL parallel corpus (Oslo Univ.), the Text Laboratory
(Oslo Univ.), digitization projects in the context of the Turfan Collection in Berlin, a project
on Old Japanese syntax at Oxford University, a relational database on Confucius' Analects, as
well as the Organon Knowledge Editor (AnaCypher, Oslo). The presentations gave an overview
of important current projects within the field of Digital Humanities. For more information and a
pdf of the abstracts, see http://folk.uio.no/christoa/Zen%20conference_2009_abstracts_03.pdf.
The meetings were complemented by workshops on text mark-up, in addition to a course on
text mark-up for Master and Ph.D. students at Oslo University.
Through the presentation of a variety of analytical approaches to texts and
manuscripts, as well as methods of (visual) presentations, we hope to stimulate further
interdisciplinary work and collaboration in this field. Digital Humanities have become an
indispensable aspect of continuously increasing significance in historical, religious, and
linguistic studies with all their sub-fields, and it is a research area where technology and
philology (in the broadest sense) interact in a dynamic and fascinating way.
The workshops also focused on some problems arising through the enormous speed
with which aspects of Digital Humanities have been developing during recent times.
Many philologically inclined scholars (including myself!) were not originally
trained in the arts of mark-up and programming, or the magic skills of transformations, or
have only been confronted with these disciplines during a later stage of their scholarly
development. My interest in XML-based methods of structuring and analyzing texts was
generated by concrete research questions (and – I have to admit – by a certain fascination
concerning the flexibility of XML on the one hand, and the rigid structure which has to be
imposed on the material, on the other hand). Often it is difficult for the non-specialist to
make the right decisions concerning the adaptation of appropriate methods in the
framework of specific research projects. In addition, it is difficult to maintain an overview
of ongoing projects and the technological and mark-up strategies chosen and developed.
As such, collaboration and interdisciplinary communication are crucial for coping with
these challenges, and in order to avoid redundant work. With the publication of this
volume, we hope to make a small contribution to these ongoing developments.
I want to extend my special thanks to the co-organizer of the conference and the
workshops, Christian Wittern. Without him and his expertise in the field of Digital
Humanities, neither the conference nor the subsequent publications would have
materialized.
We want to thank all the participants who contributed during the workshops and
discussions, as well as the staff of IKOS who helped with the organization of the
conference, especially Arne Bugge Amundsen (Head of Department), Rune Svarverud
(Head of Research), Mona Bjørbæk (Head of Academics), Cecilie Wingerei Lilleheil
(Research Advisor), and Sathya Sritharan (Economy Officer). Our gratitude also extends
to the conference assistants Bori Kim, Therese Sollien, Øystein Krogh Visted, and Kevin
Dippner. Special thanks to Daniel Paul O’Donnell (TEI chair in 2009) for co-organizing
the TEI meeting.
We are also greatly indebted to Marcus Bingenheimer, Bill Magee, and Su-an Lin
for providing the opportunity to publish the articles in this journal, and for their great efforts
in editing and proofreading the papers. Our thanks also extend to the readers/reviewers
who provided many good suggestions for improving the quality of the papers.
The conference for which these papers were written was generously funded by The
Research Council of Norway (NFR), The Chiang Ching-kuo Foundation (Taiwan),
and The Department of Culture Studies and Oriental Languages (IKOS) at the
University of Oslo.
An editorial note on the illustrations and photographs used in the articles:
Originally, several of the articles contained high-resolution color illustrations, as well as
sections of text marked with different colors. Since these features could not be integrated
in the black-and-white printing of the journal, we kindly ask the reader to refer to the
following web site containing these photographs and pictures:
http://www.chibs.edu.tw/ch_html/index_ch00_07.html
Chung-Hwa Buddhist Journal (2012, 25:7-50)
Taipei: Chung-Hwa Institute of Buddhist Studies
中華佛學學報第二十五期 頁 7-50 (民國一百零一年),臺北:中華佛學研究所
ISSN:1017-7132
Some Reflections on the Mark-up and Analysis of
Dūnhuáng Manuscripts:
Exemplified by the Platform Sūtra
Christoph Anderl (University of Ghent)
Kevin Dippner (Malakoff High School)
Øystein Krogh Visted (Jiao Xie Center for Chinese Culture and Language)
Abstract
This paper deals with several questions and problems related to the editing, digitization
and analysis of Buddhist Dūnhuáng texts. The Dūnhuáng corpus of Chán (Zen)
manuscripts is the most important source for the study of the early history of this Chinese
Buddhist school. The authors discuss paleographic and textual features of the manuscripts
and investigate several possibilities of TEI-compatible mark-up concerning the collation,
translation, annotation, and semantic and syntactic analysis of this type of manuscript
literature, in addition to methods of transformations into visual media. The approaches
are exemplified by an experimental mark-up of the Dūnhuáng versions of the Platform
Sūtra. In the second part of the paper, the newly initiated Chan Database Project is
introduced and collaborative methods of dealing with Chán literature are discussed. In the
appendix to the paper, the system of phonetic loans, as well as scribal conventions and
errors in the manuscript versions of the Platform Sūtra are described and compared.
Keywords:
Platform Sūtra of the Sixth Patriarch (Liùzǔ Tánjīng), Dūnhuáng Manuscripts, Phonetic
Loan Characters, Analytic Mark-up, Zen Buddhism
檢視敦煌寫 的標記
析—
祖壇經為例
Christoph Anderl (根特大學)
Kevin Dippner (馬拉科 高中)
Øystein Krogh Visted (交 中國文 語 中心)
摘要
篇文 處理有關 教敦煌文獻的編輯,數位 及 析 的問題,而其中有關
禪的文集是研究中國 教宗派 期歷史的重要資源
者討論寫 的 文 學及文
性質並探討許多 文獻編碼協定(TEI)可相容性之標記的各種可能性,而除了影
像的轉 外, 些是有關 類文獻的校對 翻
註解 語意和 法之 析 其方
法是
壇經 的敦煌文 之實驗性的標記為例 而 文的第 部 則是 紹新近
開始的禪學資料專案(Chan Database Project) 及討論處理有關禪學文集的協 方
式 在附錄,則 述並比較 壇經 不 版 的形聲系 ,抄寫慣例及錯誤
關鍵字:
祖壇經
敦煌寫
借字
析性標記 禪學
Introduction:
The Significance of Dūnhuáng Manuscripts
1
In ca. 1900, thousands of manuscripts were found behind a wall of the Mògāo 莫高 Cave
16/17 (Dūnhuáng, Gānsù Province, China). Soon after, most of the manuscripts were
removed from China by several expeditions from Great Britain, France, Russia, and Japan.
Today, the majority of the Dūnhuáng manuscripts are stored at various institutions such
as the British Library (Stein Collection) and the Bibliothèque Nationale (Pelliot
Collection), as well as collections in Russia (The Institute of Oriental Manuscripts), Japan
(e.g., Ryūkoku Univ., Kyoto), and China (e.g., The Dūnhuáng Academy, The National
Library of China in Běijīng; Běijīng University Library; there are also collections in
Tiānjīn, Shànghǎi, and other places in China).2 Especially since World War II,
‘Dūnhuáng studies’ have developed into a major field of research and today numerous
individual scholars and institutions are investigating the textual and iconographic
materials from a variety of perspectives.
The manuscripts are one of our most important sources for the study of medieval
Chinese religion and culture. Whereas most of the Chinese manuscripts consist of copies
of canonical Buddhist scriptures, there are also a significant number of texts on popular
religion, as well as sectarian texts. Many of these non-canonical texts were not
transmitted after the Táng Dynasty and the Dūnhuáng materials give us a unique window
for studying Buddhist history, doctrine and practice from ca. the 7th to the 10th centuries.
Texts of the early Chán 禪 Schools, Esoteric Buddhism, Buddho-Daoist texts, ‘popular’
Chinese religion and related topics (including devotional and ritual texts, almanacs,
prognostication and astronomical texts, talisman manuals, etc.) have received special
attention among scholars.
Until the discovery of the Dūnhuáng texts, our understanding of the early history of
Chán was to a great degree based on much later Sòng Dynasty materials and the
3
retrospective understanding of Táng Chán during that period. The study of the
1
2
3
We want to thank the two anonymous reviewers of the article for their many helpful
comments.
For a very good introduction to Dūnhuáng studies and the history of the manuscripts, see the
following webpage (‘The International Dunhuang Project’): http://idp.bl.uk/. 10.000s of
manuscripts and manuscript fragments are digitized in high quality and freely downloadable
(the digitization of the Pelliot and Stein collections is nearly complete, whereas only parts of
the Russian and Chinese collections are included so far). The digitized manuscripts are most
conveniently found by manuscript number, other search functions of the webpage are
unfortunately only at a rudimentary stage.
The Sòng versions of Táng materials were often heavily revised and altered, and,
retrospectively, a Sòng Dynasty understanding of the development of the Chán School(s) was
imposed on earlier materials. Táng texts which did not fit the doctrinal or historiographic
10 Chung-Hwa Buddhist Journal Volume 25 (2012)
Dūnhuáng Chán texts revolutionized the study of the early period in the evolution of
Chán. However, despite the immense progress of Chán studies from the 1970s to the
1990s there are still many texts which have not been properly edited, analyzed or
4
translated, and many problems pertaining to the texts have not been solved.
The Scholarly Value of Dūnháng Manuscripts
The manuscripts are not only an important source for the study of medieval Chinese
Buddhism but also for research in the development of the semantics and syntax of
medieval Chinese, including colloquial grammatical constructions (classifier
constructions, plural formation, coverb constructions, sentence finals, etc.).
There are certain types of Dūnhuáng manuscripts which contain a considerable
amount of vernacular elements, most importantly the so-called Transformation Texts
(biànwén 變文) and related genres.5 Also certain types of Chán treatises contain
important information on the development of medieval vernacular Chinese (e.g., the
treatises attributed to Shénhuì and his disciples, and the Lìdài fǎbǎo jì 歷代法寶記).6 As
such, these materials are important sources for the study of the transition from treatises
written in Buddhist Hybrid Chinese to more vernacular types of narratives (many of these
texts are characterized by containing a considerable portion of passages with direct
speech).7
Copied by hand, the manuscripts are equally important for the study of palaeography
during the Táng period, as well as for the study of scribal conventions and errors,
phonetic loans, dialects, and vernacularisms. Medieval manuscripts are a significant
source for reconstructing the development of Middle Chinese with its colloquial
vocabulary and vernacular grammatical constructions. Many grammaticalized function
words still current in Modern Mandarin and other modern varieties of Chinese originated
during the late Táng (or, more precisely, surfaced in texts during that time). Thus, some
standards of the Sòng Dynasty were often not transmitted at all (on “text sanitation” during
the transition period from Táng to Sòng, see for example Anderl 2012a, 16-26).
4 E.g., the interdependence between texts; there are also few properly collated and annotated texts
at this point, and many textual and philological problems have only been touched upon.
5 On the genre of Transformation Texts, see for example Mair (1989).
6 For a recent excellent study of that text, see Adamek (2007).
7 Naturally, vernacular elements appear in passages recording direct speech and as such
reflect the spoken word to some degree. This can also be observed in another early
vernacular text dating from the middle of the 10th century, the Zǔtáng jí 祖堂集 (ZTJ). In
this text, the frame narratives usually use a more conservative language, whereas many
of the passages in direct speech are written in the vernacular (on aspects of the language of
ZTJ, see Anderl 2004; more generally, on the features of vernacular Chán texts, see Anderl
2012a).
manuscripts contain many early written forms of function words used in spoken Chinese.
Since many of these function words were representations of words used in the spoken
language, Chinese characters were borrowed in order to represent their phonetic value. It was
usually not before the Sòng period that specific characters were created to represent these
colloquial words. A good example is the appearance of the pronoun shénme 什麼 (什么),
which was written in various forms in Dūnhuáng manuscripts, e.g., 是沒 (Dūnbó 77),
是摩 / 甚摩 (Stein 2503), 甚謨 (Stein 2669), 甚物 / 甚沒 (Bǎolín zhuàn 寶林傳, 801 AD),
甚麼 (10th cent.). Dūnhuáng Chán materials reflect different degrees of colloquialisms,
depending on the period they were written in and which genre they belong to.
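The attested spellings of shénme listed above lend themselves to a simple lookup table. The following Python sketch is only an illustration and not part of the CDP: the names SHENME_VARIANTS and normalize_shenme are hypothetical, and the table contains nothing beyond the forms and sources cited in this paragraph.

```python
# Hypothetical lookup table: written forms of the pronoun shénme,
# restricted to the forms and sources cited in the paragraph above.
SHENME_VARIANTS = {
    "是沒": "Dūnbó 77",
    "是摩": "Stein 2503",
    "甚摩": "Stein 2503",
    "甚謨": "Stein 2669",
    "甚物": "Bǎolín zhuàn (801 AD)",
    "甚沒": "Bǎolín zhuàn (801 AD)",
    "甚麼": "10th cent.",
}

STANDARD_FORM = "什麼"


def normalize_shenme(text):
    """Replace every listed variant with the modern form.

    Returns the normalized text together with a list of
    (offset, original form) pairs, so that the manuscript
    spelling is recorded rather than lost.
    """
    found = []
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in SHENME_VARIANTS:
            found.append((i, pair))
            out.append(STANDARD_FORM)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out), found
```

Keeping the offsets of the original spellings means that a normalized reading text and a record of the manuscript orthography can be produced from the same transcription.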
The Chan Database Project (CDP)
The recently initiated CDP8 aims at electronically publishing Chán texts with a critical
apparatus and a set of analytical modules. In this paper, certain strategies and problems
concerning this aim will be discussed. Although a variety of Chán texts (including the
printed editions starting from the Sòng Dynasty) are included in this project, one of the
major challenges will be the technical and analytical framework for the publication of the
corpus of the Dūnhuáng Chán manuscripts. In this paper, only a few problems will be
addressed and illustrated by an experimental edition of the Dūnhuáng manuscripts of the
famous Platform sūtra.9 The aim was the production of a collated and annotated version
of the Dūnhuáng Platform sūtra which allowed annotations and comments on several
aspects of the text.
One of the motivations for the initiation of such a project was the realization that,
despite the above-described importance of the manuscripts in terms of Buddhist and
linguistic studies, there are frequently no authoritative and collated editions of many
important manuscript texts, and often the philological and linguistic aspects have been
somewhat neglected in the study of the materials. In many studies of Chinese Buddhist
texts in the West, there seems to be an overall contrast to the approach taken in the
research on Sanskrit Buddhist texts and Gāndhārī manuscripts, for example (which shows
a strong emphasis on thoroughly edited texts and philological studies).10 Not only being a
8 This project was originally initiated by the late John McRae, Christian Wittern, and Christoph
Anderl, and aims at creating and applying tools for editing and analyzing Chán/Zen Buddhist
texts, as well as organizing collaboration within the field of Chán/Zen Buddhist text studies.
9 This work on the Platform sūtra edition was originally started as a master class on Buddhist
Dūnhuáng texts at Oslo University taught by Christoph Anderl, with Christian Wittern (Kyoto
University) supervising the work on TEI compatibility and programming. The basic
programming and transformation of the XML mark-up was done by Kevin Dippner. The mark-up
and analysis were done by Anderl and Visted. We want to thank all participants of the
course for their helpful comments.
10 An exception to this tendency is the study of (early) Buddhist translation literature in China;
purpose in itself, thorough philological research on the texts will reflect back on our
understanding of their contents, as well as being helpful in contextualizing them
historically and intertextually.11
Some Important Features of the Manuscript Texts
Variant Characters
The study of character variants has developed into a significant subfield in the study of
Dūnhuáng manuscripts and the materials are important sources for the study of the
orthography and writing conventions of the Táng period. The history of many ‘non-standard’
characters is extremely complex and important for deciphering the texts.
Historically, many Chinese characters which served as models for establishing the
abbreviated characters in the process of the language reforms in 20th century China were
actually based on ‘vulgar’ (and other) forms of Táng and Sòng characters, in addition to
‘ancient’ forms of characters which were revived during these periods. After the Táng,
Dūnhuáng texts gradually ceased to circulate in China and many forms of characters
typical for Dūnhuáng writing conventions were forgotten or became obsolete. On the
other hand, many character forms were transmitted to Japan and continued to circulate
there until modern times.12 By recording the palaeographic features of the manuscripts
these studies are deeply influenced by the philological approach of Sanskrit/Pāli studies.
11 Specifically, modern Chán Buddhist studies in the West often seem somewhat reluctant to
approach texts also from a linguistic and philological angle, occasionally resulting in
interpretations and translations based on a fragmentary understanding of the language they
are written in. Part of the problem is perhaps the fact that there is hardly any systematic training
in the semantics and syntax of Buddhist Hybrid or Medieval Vernacular Chinese at Western
universities. These types of texts are in many respects fundamentally different from texts written
in ‘Literary Chinese’ (for a good contrastive case study, see for example Harbsmeier 2012; for a
grammar of the vernacular language of the 10th century, see Anderl 2004).
12 Interesting examples are the contractions for púsà 菩薩 ‘bodhisattva’, nièpán 涅槃 ‘nirvāṇa’,
and pútí 菩提 ‘bodhi’, which were widely used in Dūnhuáng texts
but eventually ceased to be used in China. However, these characters continued to circulate in
Japan and are nowadays even frequently recognized by non-specialists! For a list of special
characters used in Japanese Buddhist manuscripts, see Ui (1983). The history of many
Dūnhuáng variants needs further investigation. Dictionaries such as the Lóngkān shŏujìng
龍龕手鏡 (10th century) were criticized by scholars of subsequent periods for containing
unusual Chinese character forms. However, after the discovery of the Dūnhuáng manuscripts
in 1900 it became clear that the compilation of this dictionary aimed at
providing the reader with the correct pronunciation of characters, as well as providing
reference to non-standard characters widely circulating in handwritten manuscripts and
inscriptions. Even for early Sòng Buddhists themselves, it had become difficult to understand
texts written in countless different forms of characters. Establishing the ‘correct’ (zhèng 正)
and collecting them in a database, the development of the Chinese characters during these
periods can be studied in a more systematic way.13 In addition, orthography and
calligraphy can be an important factor in dating the copies of the manuscripts.
In many Dūnhuáng materials, multiple forms of the same character can appear in the
very same text. Below, there are a few examples of character forms appearing in the
beginning section of the Stein (left) and Dūnbó (right) versions of the Platform sūtra:14
Scribal Errors and Conventions
By contrast to the often heavily edited and revised printed Chán scriptures of the Sòng
period (many of them eventually being integrated in the official Buddhist canon
sanctioned by the imperial court), Dūnhuáng Chán manuscripts were copied by hand
and—besides giving us information about the early stages of a text’s formation—are a
rich source for studying scribal conventions during different periods of the Táng dynasty,
in addition to errors and inaccuracies typical for the process of copying. The study and
identification of these typical errors and misreadings (for a few examples, see below)
facilitate the reading of handwritten manuscripts and the identification of corrupt
pronunciation and form was of great concern for the Buddhist scholars during the Táng and
later periods; on the one hand for philological reasons (there was an amazingly high
level of insight by many Buddhist scholars concerning the phonological, palaeographic, and
semantic aspects of texts), on the other hand based on the assumption that only correctly
pronounced characters/words were soteriologically efficient (especially in the dhāraṇī and
mantra texts which became greatly popular among all Buddhists from the 8th century onwards).
13 For a discussion of character databases, see the article by Christian Wittern in this volume.
14 There are differences in character shapes both internally (i.e., within the same text) and
in comparison with the other manuscripts.
passages. Dūnhuáng manuscripts are also a rich source for studying conventions of
adding diacritics and markers in the texts. During the process of editing texts during the
Sòng dynasty, these markers (including section markers) were usually removed. Thus,
Táng dynasty manuscripts give us important information not only on the process of
copying but also on the conventions of reading the texts15 (often, markers are inserted by
the reader or monastery librarian rather than the copyist).16 A rich source for errors is the
similarity of characters in their handwritten forms which—in the process of copying—
are confused with each other.
Dūnhuáng manuscripts are also a very important source for the oral features of
texts and the phonetic loans used in them (for a list of phonetic loans in the Platform
sūtra, see the Appendix to the article). An important subtype are dialect phonetic loans
which appear in a number of manuscripts and usually reflect the language of the
Northwestern regions during the periods of the Táng Dynasty.
Some Important Aspects in the Digitization of Buddhist
Manuscripts
The digitization of Buddhist texts and the availability of manuscript facsimiles have
progressed immensely in recent years. This opens up the possibility of developing
tools for enhancing our understanding of these texts and manuscripts through an
analytical ‘fine-reading’.
Analytical Modules
The multi-faceted features (paleography, orthography, linguistic and Buddhological
aspects, etc.) of manuscript study call for flexible approaches in the study of the
15 E.g., there are ‘performance markers’ (text portions usually inserted with smaller characters) in
the manuscripts, suggesting that the scripture was used in ritual contexts related to the bestowal
of the precepts/commandments. The inserted passage informs the reader how often sets of
precepts have to be recited in unison during the ceremony. These markers are usually not
extant in the Sòng editions.
16 For an interesting study of these markers, see Galambos (forthcoming). For a more thorough
forthcoming study on these features of the Platform sūtra, see Anderl (2012b). In this paper, I
also try to show that a thorough philological approach can unravel new aspects of a text.
Concretely, a study of the textual features, internal structure, and intertextual relations (i.e.,
certain features typical for ‘esoteric’ texts can be found) of the Platform manuscripts suggests
certain re-evaluations of the text, for example, the possibility that the title Tánjīng 壇經
(Platform sūtra) originally did not refer to the text itself at all, but rather to the Diamond
sūtra, a text which was especially important in the Platform rituals of conferring the
Mahāyāna precepts at large congregations. As such, the text itself possibly originated as a
commentary on the Diamond sūtra, and the Platform sūtra only gradually developed an
‘internal’ reference to itself (for a detailed forthcoming study, see Anderl 2012b).
materials.17 The development and implementation of XML-based markup seems to
accommodate many needs in this respect, including analytic ‘modules’ for different
purposes, the possibility for constant revision, multiple transformations and visualizations,
as well as entering into an interactive dialogue with the ‘text consumer’ or fellow researcher.18
Some Objectives for the Study of Chán Texts
- Web-based editions of important Chán manuscripts and texts can be permanently
updated, extended, and revised.
- Once developed, the edited texts can be analyzed by a set of analytical tools (e.g.,
syntactic analysis, terminology/dictionary tools, ‘text dependency’ analysis, character
analysis).
- Chán materials in non-Chinese languages (e.g., Tibetan, Uighur, Tangut, etc.)—which
are of great importance for the development of this branch of Buddhism in the East
Asian context—have so far been rather neglected in Chán studies.
- Manuscripts give us a unique insight into the processes of text production and
reproduction (as opposed to extant printed texts edited and ‘sanitized’ during the Sòng
period, for example). A thorough documentation of these features is the basis of a
better understanding of these processes. A documentation of textual features is not only
important for palaeographic and linguistic studies but also in the framework of religious
studies; e.g., the textual build-up and structure can give us important information on the
development of a text, which in turn might reflect the evolution of doctrines or lineage
systems, for example. In addition, the study of textual features can be important for the
17 A similar approach was taken in a recently initiated database project on Buddhist narratives at
the Ruhr University Bochum (The Mercator/Ceres Database of Buddhist Narratives; edited
by Christoph Anderl and Jessie Pons). Based on the diversity of the materials (both textual
and iconographic materials, in addition to information on locations), a system of dynamically
interconnected sets of sub-collections was used in the XML database. According to specific
needs arising during the concrete work with the iconographic and textual materials, custom-tailored
tools and modules are developed and implemented (e.g., input masks for subsets of
data, analytical tools, visualizations, etc.). The ca. 20 sub-databases are held together by a
system of ‘labels’ for narratives, texts/manuscripts, and places (which can be interconnected
to each other). The internal research database has been online since 2011, whereas a public
version will be published in November 2012.
18 As is also pointed out in other contributions, the XML approach also entails certain
difficulties, such as the necessity to follow a strictly hierarchical build-up and nesting. Thus,
multiple mark-up of the same text might overlap and violate this rule. A ‘module’
approach could facilitate the work on the text, i.e., different aspects of the same text are
analyzed and marked up separately (“stand-off” mark-up; as a by-product, the reader can
activate or deactivate specific modules when reading the text). Another problem is naturally
the time-consuming aspect of implementing analytical mark-up to texts. As such, questions of
quantity versus analytical quality have to be constantly considered and balanced.
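The overlap problem and the ‘module’ approach mentioned in this note can be illustrated with a small sketch. The Python below is an illustration under stated assumptions rather than the project's implementation: annotations are stored apart from the base text as character offsets, so that spans from different modules may cross freely, and a reader can switch modules on or off. The module names, the Annotation class, and the placeholder text are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    module: str   # hypothetical module name, e.g. "syntax" or "names"
    start: int    # offset of the first character covered
    end: int      # offset just past the last character covered
    label: str


def overlaps(a, b):
    """True if the two spans cross without one containing the other.

    Such pairs cannot be written as properly nested inline XML elements.
    """
    crossing = a.start < b.start < a.end < b.end
    return crossing or (b.start < a.start < b.end < a.end)


def active(annotations, enabled_modules):
    """Keep only the annotations of the modules the reader has switched on."""
    return [a for a in annotations if a.module in enabled_modules]


base_text = "ABCDEFGH"  # placeholder for a transcribed passage
layers = [
    Annotation("syntax", 0, 5, "phrase"),
    Annotation("names", 3, 8, "persName"),
]
```

In this toy case the two spans cross (characters 3 to 4 belong to both), which inline nesting cannot express but which is unproblematic when each module keeps its own offsets.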
dating of texts, as well as for linking and ‘contextualizing’ them within a corpus/group
of texts.19
- The analysis of Chinese characters: The Táng Dynasty witnessed the emergence of
numerous new character forms (specifically vulgar and abbreviated forms of Chinese
characters).
- Syntactic analysis (see below).
- The development of Chán terminology: The mark-up and registration of Chán
terminology in the relevant texts can provide researchers with important information on
the evolution of terms.
- A ‘text dependency’ module will enable the mark-up of relationships between texts and
parallel passages. This will facilitate the study of the often complex relations between
texts or text portions and also aid in the dating of the manuscript texts. Such a tool
would also help researchers to retrace the origin, development, and interdependence of
themes, topics, ideas, and concepts as they appear in texts from various periods. Ideally,
instead of marking up text portions or narrative sections by hand, dependent texts could
be automatically identified by sets of overlapping items.
- Dictionary module (e.g., the linking with internal referential databases or external
databases such as the DDB).20
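The idea of identifying dependent texts "by sets of overlapping items" can be sketched as follows. This is one possible reading, not the method of the project: the "items" are taken to be character n-grams, the measure is the Jaccard similarity of the two n-gram sets, and the values n = 4 and threshold = 0.3 are arbitrary assumptions. All function names are hypothetical.

```python
def char_ngrams(text, n=4):
    """Return the set of character n-grams of a passage."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def overlap_score(a, b, n=4):
    """Jaccard similarity of the n-gram sets of two passages (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def candidate_pairs(passages, threshold=0.3, n=4):
    """Yield (id_a, id_b, score) for passage pairs at or above the threshold.

    `passages` maps a passage identifier to its transcribed text.
    """
    ids = sorted(passages)
    for i, id_a in enumerate(ids):
        for id_b in ids[i + 1:]:
            score = overlap_score(passages[id_a], passages[id_b], n)
            if score >= threshold:
                yield id_a, id_b, score
```

A list of candidate pairs produced in this way would still have to be checked by a reader; the sketch only narrows down where parallel passages are likely to be found.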
19 See also the Appendix to the paper: the study of manuscript features can give us important
information on the actual function of texts, e.g., the emphasis on ‘orality’ and ritual functions
(as indicated by ‘performance markers’ which were often removed in edited and printed
versions of texts).
20 On the Digital Dictionary of Buddhism (DDB), see Charles Muller’s article in this volume.
Illustration 1: Library building at Haein-sa 海印寺 where the Tripiṭaka Koreana is stored
(Second Kǒryo 高麗 edition; also referred to as Chaejo Taejanggyǒng 再調大藏經). The
project was initiated in 1236 by King Kojong 高宗 in order to secure help from Buddhas and
Bodhisattvas against a pending invasion of Korea by foreign armies (i.e., a project in the context of
‘state-protecting Buddhism’). The work of carving the 81,258 wood blocks (most of them carved
on both sides, amounting to 162,516 surfaces) lasted until 1251. One woodblock measures ca.
67 x 23 cm and is ca. 3 cm thick, weighing around 3.5 kg. There are typically 23 lines carved on
each surface, each line consisting of 14 Chinese characters (ca. 322 per surface), totaling about
52,330,000 characters. After having disappeared from China during the Sòng dynasty, the Zǔtáng jí (ZTJ)
survived in Korea and was carved in the 15th century as part of the ‘supplementary canon’ of the
Tripiṭaka Koreana. However, the text was never printed before the printing blocks were
rediscovered in the beginning of the 20th century in Korea. ZTJ (which is one of our main sources
of early Chán historiography) was carved on 386 surfaces (ca. 190,000 characters). Today, the
canon is still stored in the library building which dates back to the 15th century. There was an
attempt to move the printing blocks to a modern library facility but within weeks the woodblocks
started to decay and had to be returned to the old building. The original building appears to have
been designed intuitively to provide ideal storage conditions (e.g., windows of different size ensure
natural ventilation; a special kind of moisture-absorbing clay covers the floor; the way the
woodblocks are arranged on shelves; etc.).21
21 Photograph by C. Anderl; on the background of the printing of ZTJ, see Anderl (2004, 1:2-52).
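The figures given in the caption of Illustration 1 for the canon as a whole can be checked with a few lines of arithmetic. The sketch below assumes, as the caption's own total does, that every block is counted with two carved surfaces and that every surface carries the typical 23 lines of 14 characters; the variable names are merely illustrative.

```python
blocks = 81_258            # wood blocks, as given in the caption
surfaces = blocks * 2      # every block counted with two carved sides
lines_per_surface = 23     # typical number of lines per surface
chars_per_line = 14        # characters per line

chars_per_surface = lines_per_surface * chars_per_line
total_chars = surfaces * chars_per_surface

print(surfaces)            # 162516
print(chars_per_surface)   # 322
print(total_chars)         # 52330152
```

The result agrees with the rounded total of about 52,330,000 characters stated in the caption.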
Illustration 2: Detail of a printing-block of ZTJ; scribes outlined each character on the
woodblock in mirror-writing and afterwards the wood surrounding each character was chiseled
out; the tool marks are still recognizable on the blocks; the wood (birch tree) is of exceptional
hardness and was especially prepared for carving during a process lasting several years
(photograph by C. Anderl).
Work-steps in the Establishment of a Chán Database:
- Determining the text corpus22
- Input and text collation
- Linking of facsimiles with digital editions
- Basic mark-up and linking the text with reference materials (e.g. information on proper
names, Buddhist terms, etc.)
22 The most important groups of materials consist of (1) Dūnhuáng texts, (2) the printed texts of
‘classical’ Sòng Dynasty Chán (including primarily historical transmission texts (chuándēng
lù 傳燈錄), recorded sayings texts (yǔlù 語錄), and collections (gōngàn 公案)); (3) materials
which complement and contextualize the above materials, e.g., letter-exchanges between
monks and officials, descriptions of Chán Buddhism in non-Buddhist materials, funeral and
pagoda inscriptions, imperial edicts, Neo-Confucian yǔlù, ritual texts, texts on monastic rules,
iconographic materials, lineage charts and other diagrams, etc. Another important aspect is
the inclusion of non-Chinese materials (e.g., in Tibetan, Tangut, Uighur). Whereas the corpus
of (2) is relatively easy to determine, it is considerably more difficult to pinpoint the relevant
Dūnhuáng manuscript materials. The point of departure is the list of texts in Yanagida
Seizan’s Zenseki kaidai 禪籍解題 (Nishitani, Keiji 西谷啟治/Yanagida, Seizan 柳田聖山
1974, 445-514). This list was recently expanded by Tanaka, Ryoshū; see also Sørensen (1989)
for a discussion of early Chán materials (with an emphasis on the esoteric texts). More research
needs to be done concerning the manuscripts stored in the minor collections (e.g., the
collections of Peking University and the Peking National Library, and those in Shànghǎi,
Tiānjīn, Dūnhuáng, etc.).
- Development and implementation of analytical modules (terminology, syntactic analysis,
text dependency,…)
- Collaboration, development of (multiple-user) ‘interfaces’,23 specific projects, etc.
Illustration 3: Experimental transformation of a Zǔtáng jí mark-up into an edited text parallel
to the woodblock facsimile. Circled items mark place and personal names, respectively, and can
be connected to referential databases on proper names. In addition, the edited text was linked with
an XML version of Anderl’s grammar on ZTJ. Entries in the grammar are automatically matched
with the text and the grey dots on top make the grammatical annotations by Anderl visible (the
initial mark-up of ZTJ and the transformation/programming was done by Christian Wittern; this
version of ZTJ is currently off-line).
23 The implementation of input and analysis interfaces for specific tasks can facilitate the work
on the mark-up considerably, as compared to the time-consuming work in programs such as
Oxygen.
Illustration 4: This diagram shows the complex interrelation between the manuscript and
printed versions of the Platform sūtra (the diagram is drawn based on Yáng Zēngwén’s
reconstruction of the genealogy of the text).24
The Mark-up of the Platform Sūtra:
Collations
Many Chán texts exist in several versions, having varying textual features. An important
issue for analytical web editions will be the collation of these manuscripts and the
inclusion of other important witnesses (on the Platform sūtra versions, see ill. 4; for a
short description, see the bibliography).25
In the concrete work on the Platform scripture, one of the specific problems was
related to the question of how the label <lem> should be applied. All manuscripts of the
Dūnhuáng text contain a great number of errors, phonetic loans, and corrupt passages.
The <lem> label was, somewhat atypically, used for marking an ‘ideal’ reading of the
text; thus it is the ‘reconstruction’ of an ideal textual version according to the view of the
24 Yáng (1993, 297) and Lǐ (1999a, 19).
25 In the work on the text, it was attempted to include all extant manuscript witnesses
(Or.8210/S.5475, Dūnbó 77, BD.48; the Lǚshùn manuscript was recently ‘rediscovered’ in
China; however, no facsimile reproductions were accessible during the work on the text), in
addition to occasional references to Sòng printed versions. For a description of the manuscripts,
see Anderl (2012b); for the Sòng editions, see Schlütter (2007, 394-405).
editors. The differing readings of the other witnesses are added with the <rdg> label. In
future versions of the web publication there will be the choice to read the text according
to one specific manuscript version or to read an ‘ideal’ text with notes on the readings of
the differing versions.
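The choice between an ‘ideal’ text and the text of one witness can be sketched in a few lines of Python. This is a minimal illustration under several assumptions, not the project's transformation: the fragment carries no TEI namespace, the characters 甲 and 乙 are placeholders rather than manuscript text, and a witness that is not named on any <rdg> is assumed to agree with the <lem>. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Toy fragment: the <lem>/<rdg> pair is the one discussed in this paper;
# the surrounding characters 甲 and 乙 are placeholders.
FRAGMENT = (
    '<s>甲<app><lem wit="#Stein_5475">業</lem>'
    '<rdg wit="#Dunbo_77" type="errShape">葉</rdg></app>乙</s>'
)


def reading_of(app, witness=None):
    """Return the text of one apparatus entry for the chosen witness.

    witness=None selects the editors' 'ideal' text, i.e. the <lem>.
    A witness not named on any <rdg> is assumed to agree with the <lem>.
    """
    if witness is not None:
        for rdg in app.findall("rdg"):
            if witness in rdg.get("wit", "").split():
                return "".join(rdg.itertext())
    lem = app.find("lem")
    return "".join(lem.itertext()) if lem is not None else ""


def render(elem, witness=None):
    """Flatten an element to plain text, resolving every <app> on the way."""
    if elem.tag == "app":
        return reading_of(elem, witness)
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child, witness))
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(FRAGMENT)
ideal_text = render(root)               # text according to the <lem>
dunbo_text = render(root, "#Dunbo_77")  # text according to Dūnbó 77
```

Because the witness is only a parameter of the rendering step, the marked-up source remains the same for every reading mode.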
Illustration 5: Portion of the Platform sūtra mark-up and manuscript collation in Oxygen. Note
that sentence and phrase borders are generated with the <s> and <phr> tags. The basic mark-up
contains references to personal names (‘persName’, subdivided into several categories), title
(‘roleName’, with subdivisions), place names (‘placeName’), and terms (‘term’, with subdivisions).
The collation within the apparatus <app> includes references to an ‘ideal’ reading according to the
editors and mostly based on a manuscript witness. If all manuscripts have ‘corrupt’ readings, then a
<lem> reading according to a later Sòng edition and/or the editors is established (e.g., <lem
wit="#Editor">). Notes on the collation and the witnesses are inserted with <witDetail>, including
references to the secondary literature. Additions, notes, deletions, etc. are also recorded in the
manuscript description.
22 Chung-Hwa Buddhist Journal Volume 25 (2012)
Example of Recording and Commenting Different Readings:
<app><lem wit="#Stein_5475"> 業 </lem><rdg wit="#Dunbo_77" type="errShape"
xml:id="w093-02"> 葉 </rdg><witDetail target="#w093-02" wit="#Dunbo_77">The
characters 葉 and 業 are frequently confused with each other in Dūnhuáng treatises. Note
that they have the same pronunciation and at the same time are similar in shape with each
other. As such, this is a “mixture” of errShape and phonLoan, or a case where characters
are habitually interchanged with each other although they do not have a direct connection
with each other.</witDetail></app>
Within the apparatus (<app>) the lemma (<lem>) establishes the ‘correct’ reading
according to the witness “#Stein_5475”, whereas the ‘corrupt’ reading in the Dunbo_77
manuscript (wit=“#Dunbo_77”) is cited within <rdg>, with a reference to the type of
corruption (type=“errShape”, i.e., based on a confusion of handwritten characters).
Details on the type of corruption are provided in <witDetail>.
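For analytical purposes it can be convenient to turn such apparatus entries into flat records, one per variant reading, with the note attached via the xml:id that <witDetail> points to. The following Python sketch shows one way of doing this; it is not part of the published transformation, the element carries no TEI namespace here, the text of the note is shortened for the example, and the function name and record keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Key under which ElementTree stores the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The entry quoted above, with the note shortened for this example.
ENTRY = (
    '<app><lem wit="#Stein_5475">業</lem>'
    '<rdg wit="#Dunbo_77" type="errShape" xml:id="w093-02">葉</rdg>'
    '<witDetail target="#w093-02" wit="#Dunbo_77">Similar shape and '
    'same pronunciation.</witDetail></app>'
)


def variant_records(app):
    """Turn one <app> element into one flat record per <rdg>."""
    lem = app.find("lem")
    # Index the notes by the xml:id they point to ("#w093-02" -> "w093-02").
    notes = {
        d.get("target", "").lstrip("#"): "".join(d.itertext()).strip()
        for d in app.findall("witDetail")
    }
    records = []
    for rdg in app.findall("rdg"):
        rdg_id = rdg.get(XML_ID)
        records.append({
            "lemma": "".join(lem.itertext()).strip() if lem is not None else "",
            "lemma_wit": lem.get("wit", "") if lem is not None else "",
            "reading": "".join(rdg.itertext()).strip(),
            "reading_wit": rdg.get("wit", ""),
            "type": rdg.get("type", ""),
            "note": notes.get(rdg_id, "") if rdg_id else "",
        })
    return records


records = variant_records(ET.fromstring(ENTRY))
```

Records of this kind could then be counted by type (e.g., errShape versus phonLoan) or by witness.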
Example of Recording a Scribal Intervention:
<app><lem wit="#Stein_5475 #Huixin"></lem><rdg wit="#Dunbo_77" type="annotation" hand="reader"
rend="small"><add place="right"> </add></rdg></app>
In this example the ‘correct’ reading (<lem>) is indicated as the absence of a character
(by the lack of any information between the <lem></lem> tags) which is incorrectly
inserted in the Dunbo_77 manuscript on the right side (place=“right”) by an unidentified
‘reader’ of the manuscript (this can be for example either the copyist himself, a later
reader or a temple librarian who archived the manuscript, hand=“reader”), rendered in
small characters (rend=“small”).
XSL defining the transformation into HTML for the <app> element (including
<lem>, <rdg>, <witDetail>, etc.), with inserted programming commands in Javascript:
<xsl:template match="tei:app">
<div class="balloonstyle" id="{generate-id(.)}">
<xsl:text>Reading(s):</xsl:text><br/>
<xsl:apply-templates select="tei:rdg"/>
<xsl:apply-templates select="tei:witDetail"/>
</div>
<a rel="{generate-id(.)}" onclick="right_side('{generate-id((preceding::tei:pb[@ed='#Stein_5475'])[last()])}','{generate-id(.)}');"><xsl:apply-templates
select="tei:lem"/></a>
</xsl:template>
<xsl:template match="tei:lem">
<font color="00bb00"><xsl:apply-templates/></font>
</xsl:template>
<xsl:template match="tei:rdg">
<script type="text/javascript">document.write(getWitName("<xsl:value-of
select="@wit"/>"));</script>
<xsl:text>:</xsl:text><br/>
<script type="text/javascript">document.write(getRdgErrorType("<xsl:value-of
select="@type"/>"));</script>
<xsl:text>: </xsl:text>
<xsl:apply-templates/>
<br />
</xsl:template>
<xsl:template match="tei:witDetail">
<p/><xsl:text>Details:</xsl:text><br/>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="tei:teiHeader">
<xsl:variable name="witnesstext"><xsl:apply-templates select="//tei:witness"/></xsl:variable>
<script type="text/javascript">
function newWindow()
{
var generator=window.open('','vindu','height=500,width=600,scrollbars=1');
generator.moveTo("300","150");
generator.document.write('<html><head><title>Witness details</title></head>');
generator.document.write('<body bgcolor="#aaaaaa"><h2>Witness
details</h2><br/><xsl:value-of select="normalize-space($witnesstext)"/>');
generator.document.write('</body></html>');
}
</script>
<a href="javascript:newWindow();"><div align="center"><b>View witness
details</b></div></a>
</xsl:template>
<xsl:template match="tei:witness">
<h3><xsl:value-of select="@xml:id"/></h3>
<xsl:variable name="a">'</xsl:variable>
<xsl:variable name="b">"</xsl:variable>
<xsl:value-of select="translate(., $a, $b)"/>
</xsl:template>
Illustration 6: A ‘tripartite’ visualization of the marked-up text: On the left, the facsimile
reproduction of the manuscript passage; in the middle, the collated version of the text, circled
passages indicate parts where the manuscripts have different readings. The ‘ideal’ reading (<lem>)
of the text can be chosen, or one of the readings recorded in the <rdg> section. By clicking on the
green text portions the information on different readings is projected to the right column. Proper
names are underlined. Translations and notes in the middle can be shown or hidden. In upcoming
versions, the digitized text will be arranged vertically. Mark-up and text collation by C. Anderl and
Ø. K. Visted; transformation/programming by K. Dippner (with support by C. Wittern). In order to
encourage scholarly collaboration and permanent revision of the entries, future versions envisage a
‘comment box’ (concretely, the above entry could be modified by noting that wú 吾 actually did
not become “obsolete” after the Hàn but that the usage of the pronoun decreased until the Middle
Táng period).
- As part of the collation process, the differences between the witnesses were analyzed
and categorized (phonetic loans; erroneous characters caused by similar shapes; added
characters; scribal interventions, etc.). Since this type of mark-up is very time-consuming,
other possibilities for collating texts should be considered, e.g., the
preparation of electronic versions of the different manuscripts, which are successively
‘overlapped’ so that a record of the differences is generated automatically. As a second step,
these differences have to be analyzed ‘manually’. In addition, specific interfaces for
mark-up work could be developed.
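The automatic comparison suggested in the last point can be sketched as follows. This is only a minimal illustration, assuming that each witness is available as a plain Unicode transcription; it uses Python's standard difflib module, which is not the tool used in the project. The function name collate and the two sample strings are hypothetical and are not quoted from any manuscript.

```python
from difflib import SequenceMatcher


def collate(base: str, witness: str):
    """Return (operation, base_reading, witness_reading) for every difference."""
    # autojunk=False keeps difflib from ignoring frequently repeated characters
    matcher = SequenceMatcher(None, base, witness, autojunk=False)
    records = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            records.append((tag, base[i1:i2], witness[j1:j2]))
    return records


# Hypothetical transcriptions, for illustration only
base_text = "弘忍和尚問能"
witness_text = "弘忍和尚聞能也"

for record in collate(base_text, witness_text):
    print(record)
```

Such a record only lists where the witnesses differ; deciding whether a difference is a phonetic loan, a corruption or a scribal intervention remains the ‘manual’ second step described above.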
Typology of Textual Features in Manuscript Collations:
- General ‘visual’ features, i.e., information about paper features, writing tools, text
arrangement, general character size, characters per column/line, alignment of
columns/lines, features of the title section, calligraphic/paleographic information: the
description of these important features is difficult to integrate into the formalized
collation itself; alternatively, more ‘narrative’ descriptions of manuscript sections could
be useful, or an integration into the ‘head’ section of the mark-up. A useful aspect of
the ‘tripartite’ visual presentation of the material is that these features can be viewed directly
in the facsimile reproduction shown on the left.
- Markers and scribal interventions26 (punctuation, repetition markers, markers for
reversing reading sequence (e.g. ), markers for superfluous characters (e.g. ),27
scratched-out characters ( ), empty spaces, inserted characters, small-sized
characters): information on these features is integrated in the ‘collation’ part of the
manuscripts.
Example of a passage with characters inserted on the right side of the column/line: As an
interesting feature, the text in small characters also includes repetition markers (rm)
which do not mark the repetition of a single character but of the group of characters
preceding them (and, in addition, this group extends beyond sentence borders). This being
the case, the passage must be analyzed in the following way:
…祖[弘忍 rm 和尚 rm]問能… > …祖弘忍和尚弘忍和尚問能…
- Textual variations and ‘deviations’: this includes information on ‘missing’ characters,
superfluous characters, corrupted characters,28 phonetic loans, and
the wrong sequence of characters. An important aspect here is not only the recording of
these deviations but also reflection on their typology and causes.29 Other variations
26 It is sometimes difficult to decide by which ‘hand’ these interventions were inserted: by
the copyist himself (who read through his copy of the manuscript), by an owner/reader, or by
a temple librarian. Sometimes, manuscripts have layers of interventions and annotations.
27 Stein 5475:03.01; Stein 5475:20.04.03.
28 Corruptions are often caused by the speed of the copying process and by the copyist's decreasing
concentration in the course of copying a text. Many of the corruptions are
inherited from one copy to the next, and in some cases even become fixed parts of a text. One
special type of corruption is ‘miscopying by context’, i.e., the copyist copies a
character which appears in the columns/lines to the right or left. Another corruption could be
called ‘miscopying based on conventionalized sequences’ and often appears in disyllabic
terms/words: the copyist replaces a somewhat unusual character combination with one which
is ‘fixed’ in his mind, e.g., frequently used Buddhist terms.
29 For a typology of phonetic loan characters and of miscopying based on vernacular, handwritten
forms of the characters, see the Appendix.
encountered consist of the replacement of characters by (near-)synonyms or the
replacement of a term/concept by a related term/concept.
Examples for Frequently Miscopied Characters, Based on Their Handwritten Forms
[image] > [image] (Stein 5475: 04-01-09)
伐 > [image]30 (Stein 5475: 05-03-02; etc.)
特 > 持 (Stein 5475: 04-02-05)
[image] > 自 (Stein 5475: 05-02-10; 05-04-02)
[image] > 但 (Stein 5475: 09-01)
記 > 訖31 (Stein 5475: 04-11-17)
Some of the Many Handwritten ‘Vulgar’ Forms of Characters Found in the
Platform Manuscripts:32
zuì 最 (modification/replacement of the determinative and right part of the
phoneticum)
bān 般 (modification of the upper right part of the phoneticum, typical for
handwritten/inscribed forms during that period)
jīng 經 (abbreviation of the phonetic part)
xiàng 相 (replacement of the determinative and modification of the
phoneticum)33
jiān 兼 (modification/replacement of the lower part of the character)
shēng 昇
30 This error can be found throughout the manuscript! For a thorough list of this type of error,
see the table in the Appendix.
31 Note that the error is also motivated by the fact that the compound 集記 appeared earlier in
the manuscript (‘error generated by the context’).
32 Recently, many good reference works on Dūnhuáng variant characters have been published in
the PRC. A very good resource is also the ‘Dictionary of Chinese Character Variants’
(http://140.111.1.40/main.htm), which records more than 100,000 different variants and provides
references to dozens of historical dictionaries (of major importance in this respect is the
10th-century Lóngkān shŏujìng 龍龕手鏡).
33 In the handwriting of many Dūnhuáng manuscripts, the number of strokes within ‘boxes’ is
often modified, and structural elements such as 目 and 日 become indistinguishable.
zuò 座 (modification of the upper left part of the phoneticum, 人 > ,
typically the same modification appears in other characters containing the
phoneticum 坐; compare also the upper right part of bān above.)
xué 學 (a typical way of writing 學 in certain Dūnhuáng manuscripts; it is
not incidental that the replacement wén 文 ‘pattern; Chinese character;
literature’ is chosen for the character meaning ‘to study’; this is actually an
ancient form of this character.)
zōng 宗 (an odd variant form of this character, replacing the determinative
and modifying the phonetic part)
zhǐ (‘slight’ modification of the upper part)
dì 遞 (a radical abbreviation of the phonetic part)
- The edition should be flexible enough to allow annotations and comments on several
levels (multiple translations; multiple comments; linguistic analysis,…). These modules
can be made visible or excluded, according to the interests of the reader.
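The reading of group repetition markers illustrated in the typology above (…祖[弘忍 rm 和尚 rm]問能… > …祖弘忍和尚弘忍和尚問能…) can be expressed as a small procedure. The following is only a sketch: the bracket-and-‘rm’ notation is taken from the example, whereas the function name and the generalization (every segment followed by a marker is read a second time after the whole bracketed group) are assumptions made for illustration.

```python
import re


def expand_group_repetition(text: str) -> str:
    """Expand bracketed groups such as '[弘忍 rm 和尚 rm]' into their full reading."""

    def expand(match: re.Match) -> str:
        tokens = match.group(1).split()
        first_pass, repeated = [], []
        for index, token in enumerate(tokens):
            if token == "rm":
                continue
            first_pass.append(token)
            # A segment is read a second time if a marker directly follows it
            if index + 1 < len(tokens) and tokens[index + 1] == "rm":
                repeated.append(token)
        return "".join(first_pass) + "".join(repeated)

    return re.sub(r"\[([^\]]*)\]", expand, text)


print(expand_group_repetition("祖[弘忍 rm 和尚 rm]問能"))
```

Keeping the markers in the transcription and expanding them only at display time preserves the manuscript evidence while still yielding a readable text.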
Tripartite Structure
An important question is how best to structure and visualize the edition of such a text.
In this respect, too, the flexibility of XML is convenient since different types of
visualization can be generated according to specific purposes (e.g., printed editions,
different types of web editions, ‘working’ editions, etc.). For our project, the following
solution was chosen: on the left side, a reproduction of the original (restricted by copyright
limitations; in the text version only the Stein version is visible); in the middle, the
edited and collated text; on the right side, the annotations to the textual features (see ill. 6).
Some Notes on Syntactic Analysis
One of the challenges of the CDP is to find proper methods for recording the textual and
linguistic features of Dūnhuáng texts, in addition to providing other analytical tools.
Many manuscripts pose great problems in terms of linguistic analysis, partly because
many texts have heterogeneous (hybrid) features, i.e., they integrate a variety of
syntactic and semantic features based on a variety of styles, genres, and periods of
language development. The section on grammatical mark-up in the TEI manuals is in this
respect not yet fully developed and may also have to be better adapted to non-European
languages.34 For consistent syntactic mark-up it would also be necessary to develop
visual aids and interfaces for specific analytical purposes.
Ideally, there should be the possibility for a layered analysis which covers different
features of a text, e.g., the mark-up of syntactic units and the relationship between them,
the identification and analysis of grammatical function words, the marking of modal and
style features, etc. These reflections on useful grammatical analysis are still in a very
tentative stage since considerable technical problems are involved.
In terms of Literary Chinese/Buddhist Chinese, an ‘immediate constituent’ approach
to the analysis of sentences seems to be useful since the sentence structure fits well with
the hierarchical structure of XML mark-up. As such, the syntactic units are identified and
the relationships between them determined. This kind of approach could be enormously
useful as an aid for producing more analytical approaches to Buddhist texts and
eventually more reliable translations.
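How an immediate-constituent analysis maps onto the XML hierarchy can be shown with a small sketch. The element names <s>, <phr> and <w> and the attribute names ‘type’, ‘subtype’ and ‘function’ follow the description of Illustration 7; the sample sentence, all attribute values and the omission of the TEI namespace are assumptions made purely for illustration and do not reproduce the project's actual mark-up.

```python
import xml.etree.ElementTree as ET

# Hypothetical mark-up; attribute values are invented for this example
SAMPLE = """
<s>
  <phr type="NP" function="subject">
    <w type="noun" subtype="proper">弘忍</w>
    <w type="noun">和尚</w>
  </phr>
  <phr type="VP" function="predicate">
    <w type="verb">問</w>
  </phr>
</s>
"""


def outline(element, depth=0):
    """Yield one indented line per constituent, walking the tree depth-first."""
    label = element.tag
    if element.get("type"):
        label += f"[{element.get('type')}]"
    # Join all text below this node, dropping the layout whitespace
    text = "".join("".join(element.itertext()).split())
    yield "  " * depth + f"{label}: {text}"
    for child in element:
        yield from outline(child, depth + 1)


lines = list(outline(ET.fromstring(SAMPLE.strip())))
print("\n".join(lines))
```

Each level of indentation corresponds to one analytical ‘break-down’ (sentence, phrase, word), which is the kind of successive display envisaged for the visualization of the grammatical mark-up.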
Another promising approach is the implementation of an underlying narrative
grammar in XML format which is linked to the texts (as described in the example above,
where in a collaborative effort a mark-up version of ZTJ by Wittern was linked to an XML
version of Anderl’s grammar of the text).35
In the course of the work on the Platform sūtra, several possibilities concerning the
linguistic mark-up were considered. However, these considerations are only at an
experimental stage (one problem is also the time-consuming nature of this mark-up).
34 For a very interesting approach to the mark-up of Old Japanese, see the article by Kerri L.
Russell and Stephen Wright Horn in this volume.
35 After the transformation, the XML file of the grammatical notes still has to be ‘cleaned up’
for the next version.
Illustration 7: Mark-up of a sentence in the Platform sūtra; <s> and <phr> are used to
indicate the phrase structure, and constituents are broken down to word level (<w>), specified
with ‘type’ and ‘subtype’; further specification is given by the ‘function’ and ‘ana’ attributes; ‘next’ and ‘prev’
are used untypically (in terms of their definition in the TEI manual) to indicate relations between
immediate constituents; in future versions, this will be replaced by ‘links’ (which will be used to
define the relations between the phrases).
Illustration 8: Possible ‘visualization’ of a grammatical mark-up based on the immediate
constituent analysis; successive analytical ‘break-down’: sentence level, phrase level, word level,
etc. The relationship between the constituents is indicated by a set of symbols.
Appendix: A Comparison of Some Textual Features of
the Platform Manuscripts
Conventions Used in the Table with Notes on the ‘Northwestern’ Dialect
In the table below, the variations in the use of Chinese characters in the four manuscripts
are compared.36 The addition and deletion of characters and other aspects of important
differences between the manuscripts are not taken into account here.37 The focus is on
phonetic loans, alterations of parts of the characters (such as the determinative or
phonetic parts of the Chinese characters) and on mistakes made by the copyists based on
similar (and often ‘vernacular’) shapes of the characters in the handwritings. There is also
a minor category marked with ‘c’, indicating mistakes based on the context in which the
characters appear.38
In addition to the registration of the ‘dialect phonetic loans’, an attempt was made to
analyze the system of ‘regular phonetic loans’ as well. Occasionally, it was difficult to
determine whether a character variation was caused by an alteration of the determinative
part (a very common phenomenon encountered in Dūnhuáng manuscripts) or should
rather be interpreted as a phonetic substitution. It can be shown that, apart from the rather high
number of dialect loans and a small number of other uncommon phonetic loans, the
manuscripts of the Platform sūtra generally use a system of more or less established
phonetic substitutions, some having a very long tradition. As such, the use of phonetic
loan characters is by no means arbitrary in the manuscripts.39
Attention has been given to the uncommon phonetic loans based on the dialect of the
Northwestern region during the late Táng period. These loans are marked with ‘*’ and
36 In the table, the Dūnbó 77 manuscript is abbreviated to ‘D.’, Stein 5475 to ‘S.’, the
Běijīng manuscript to ‘B.’, the Lǚshùn manuscript to ‘L.’ (for a discussion of these
manuscript copies, see Anderl 2012b). To the left, the assumed ‘correct’ character is
listed. References to the later Kōshōji (‘K.’, reflecting the Huìxīn version, based on
Yampolsky’s edition) and Zōngbǎo (‘Z.’) editions are provided only occasionally for
purposes of comparison. They also nicely illustrate how loans and mistakes were ‘normalized’
or ‘sanitized’ in the Sòng versions of the Platform sūtra (on this issue, see also Schlütter
1989 and Anderl 2012a, 16-26). The characters are usually listed according to their first
appearance in the manuscripts; however, phenomena such as phonetic loans which are related
to each other are grouped together (the characters taken out of their order of appearance are
marked with ‘/’). This method aims at allowing a more direct comparison and at illustrating
‘clusters’ of phonetic loans, for example.
37 Concerning this aspect of the manuscripts, see Anderl (2012b).
38 E.g., the case when the copyist mistakenly inserts a character which also appears in the right
or left line/column.
39 Two large dictionaries of phonetic loans have been used in the analysis of
the system of loan characters (Loan 1 and Loan 2, see the bibliography).
references to explanations in Dèng and Róng (1999) are provided. These loans are of
great importance for determining the regional character of the manuscript copies and the
differences in the use of this kind of loan among them. Although the Stein, Dūnbó and
Běijīng manuscripts all use dialect loans, it is very obvious that they are most commonly
used in the Stein manuscript (i.e., the ‘*’ appears most frequently in the ‘S.’ column of
the table). The abundant use of regular and dialect loans also shows the important role of
‘orality’ in this type of manuscript, i.e., the recording of the ‘sound’ of these texts was
more important than focusing on orthography and finding the ‘standardized’ characters.
This phenomenon can be observed in many Dūnhuáng manuscripts but seems to be
especially current in texts originating during the Táng period (as, for example, the Chán
treatises).40 As such, there is an abundant use of phonetic loans in this rather short text, in
40 Luó Chángpéi 羅常培 (1933) was one of the first who tried to reconstruct the North-Western
dialect based on a selection of Buddhist scriptures. However, the sources he had
available for this purpose were rather limited. Later on, these dialect studies were expanded
based on the identification of an ever-growing number of Dūnhuáng manuscripts in which
dialect loans were detected. The most important scholar in this respect is Takata Tokio (e.g.,
Takata 1987 and 1988). He discerns two specific types of dialect which can be detected in
Dūnhuáng materials. The first is the dialect based on the language of Cháng’ān, the capital of Táng
China. The ‘standard’ colloquial language of that time was based on this dialect, which was also
current in Dūnhuáng until it came under the control of Tibet (787 AD). The other is the
Héxī 河西 dialect, also referred to as the North-Western (Xīběi 西北) dialect,
which started to prosper after relations with the central government of China were cut.
According to Takata, the dialect was also influenced by elements of the Tibetan language
(e.g., zhū 諸 was pronounced ‘ci’). The usage of the dialect was at its height after 851, when
Dūnhuáng became a quasi-independent area.
Typical for the dialect loans used in the Dūnhuáng Platform sūtra, especially the Stein
version, is the feature that syllables with a nasal final ‘-ng’ are not distinguished from those
without, resulting in homophones such as mí 迷 – míng , tǐ 體 – tīng 聽, dì 第 – dìng 定,
xī 西 – xīng 星, lǐ 禮 – lìng , etc. In addition, the initial consonants (shēngmǔ 聲母) of
the 端 – 定 and the 審 – 心 categories are not differentiated, nor are the finals (rhymes)
of the 侵 and 庚 groups (see Dèng and Róng 1999, 25-26; for other studies concerning the
Northwestern dialect, see for example Shào Róngfēn 1963; for more bibliographic references,
see Dèng and Róng 1999, 39-40).
More recently, Takata (2000) has drawn attention to the heavy influence of the Tibetan
language during the period of the Dūnhuáng occupation, and during the 10th century when
Dūnhuáng was quasi-independent and communication with Central China was reduced to a
minimum. Large copying projects were initiated by the Tibetans (especially during 815-841,
ibid.:7) and bilingual communities (Chinese-Tibetan) were prospering. Eventually, many
Chinese would even use the Tibetan writing system for writing Chinese! “What is important
here is the fact that the tradition of writing Chinese and the Tibetan script established during
the period of Tibetan rule was still maintained in the tenth century under Return-to-Allegiance
Army of the Cáo.” (ibid.:9). The developments outlined by Takata might as well
be one of the factors that are reflected in the complex textual features of the late copies of
addition to exchanges of parts of the characters such as the determinatives (for example, in
Dūnhuáng manuscripts the exchange between the ‘tree’ 木 and ‘hand’ 扌 determinatives
is frequently encountered), the many passages where characters are mistakenly left out or
added, and the many corrupt passages based on the copyists’ misreading of the
handwritten characters. These are all factors which make parts of the Dūnhuáng versions
of the Platform sūtra difficult to decipher and understand.
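The merger of syllables with and without the nasal final ‘-ng’, reported in note 40 as typical of the dialect loans, suggests a simple first test for candidate dialect loans. The sketch below is a deliberately rough approximation: it compares modern Pinyin readings (the notation used in the note) with tones ignored, rather than reconstructed Táng-period pronunciations, and the function names are hypothetical.

```python
import unicodedata


def strip_tones(syllable: str) -> str:
    """Remove tone diacritics from a Pinyin syllable (the diaeresis of ü is kept)."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = [
        ch for ch in decomposed
        if unicodedata.category(ch) != "Mn" or ch == "\u0308"
    ]
    return unicodedata.normalize("NFC", "".join(kept)).lower()


def merged_form(syllable: str) -> str:
    """Drop a final '-ng', mimicking the merger described for the dialect."""
    base = strip_tones(syllable)
    return base[:-2] if base.endswith("ng") else base


def could_be_dialect_loan(reading_a: str, reading_b: str) -> bool:
    """True if the two readings coincide once '-ng' and tones are ignored."""
    return merged_form(reading_a) == merged_form(reading_b)


# Pairs cited in note 40
for pair in [("mí", "míng"), ("tǐ", "tīng"), ("dì", "dìng"), ("xī", "xīng")]:
    print(pair, could_be_dialect_loan(*pair))
```

A positive result only flags a pair for closer inspection; whether a given substitution really is a dialect loan still has to be checked against the references in Dèng and Róng (1999).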
The corrupt characters based on copyists’ errors are marked with ‘#’ in the table.
Although it is clear that the Stein manuscript has a larger number of corrupt characters in
this category, the Dūnbó manuscript nevertheless also contains plenty of mistakes
based on misreadings and wrong interpretations of character forms.41 A comparison of
the use of phonetic loans and the number and type of corrupt characters also shows that
the Dūnbó and Běijīng manuscripts are clearly closer to each other concerning their
textual features (although by no means identical!).42
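The marking conventions introduced above (‘*’ for dialect loans, ‘#’ for corruptions based on copyists’ errors, ‘c’ for mistakes based on the context, and a leading ‘/’ for characters taken out of their order of appearance) are regular enough to be read by a program. The following sketch assumes that each table cell is available as a short string such as ‘世*’ or ‘/小’; the class and function names are hypothetical, and doubtful or parenthesized entries such as ‘無(无#?)’ are deliberately not handled.

```python
import re
from dataclasses import dataclass

FLAGS = {"*": "dialect loan", "#": "copyist corruption", "c": "context mistake"}


@dataclass
class Cell:
    reading: str        # the character(s) as written in the manuscript
    flags: tuple        # categories derived from the trailing markers
    out_of_order: bool  # True if the entry was moved out of its order of appearance


def parse_cell(cell: str) -> Cell:
    """Split a table cell into its reading and its classification markers."""
    cell = cell.strip()
    out_of_order = cell.startswith("/")
    cell = cell.lstrip("/").strip()
    # Lazy first group: everything up to the trailing run of markers
    match = re.fullmatch(r"(.*?)([*#c]*)", cell)
    reading, markers = match.group(1), match.group(2)
    return Cell(reading, tuple(FLAGS[m] for m in markers), out_of_order)


for raw in ["世*", "特#", "小c", "/性"]:
    print(parse_cell(raw))
```

Once the cells are parsed in this way, counting the ‘dialect loan’ flags per manuscript column would give a rough quantitative counterpart to the observation that such loans are most frequent in the Stein manuscript.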
Many confusions concerning the copying of characters are caused by the use of
‘vernacular’ forms of characters and the structural similarities between them. A thorough
analysis of the orthography and paleographic features lies beyond the scope of this paper.
Generally, it can be observed that there are major differences
concerning the calligraphy and choice of character forms between the Stein and Běijīng
manuscripts. In addition to the differences between the individual manuscripts, there are
also significant internal differences, i.e., several forms of the same character are used in
the same manuscript. The calligraphy of the Dūnbó manuscript (and also the Běijīng
manuscript) is without doubt more ‘tidy’ and somewhat less ‘vernacular’ than the
characters in the Stein manuscript.
the Platform sūtra, which include many oral and dialect features, a particular system of
phonetic loans, vernacular and often faulty orthography, and all kinds of textual corruptions.
41 Especially in Chinese secondary literature, the Stein manuscript is referred to as a ‘bad copy’
(èběn 惡本), as opposed to the ‘good’ Dūnbó and Běijīng manuscripts. Another aspect of
this judgment is the fact that the number of mistakenly added or deleted characters is
somewhat smaller in the Dūnbó manuscript, in addition to its much more even style of
writing and text arrangement and its use of less distorted character forms as compared to the
Stein manuscript. The Stein manuscript, on the other hand, often gives the impression that it
was copied in a hasty and sloppy way.
42 A quantitative analysis is also difficult in this respect since only ca. one third of the text
is extant in the Běijīng manuscript.
Table
'CORRECT'
S.
D.
般
授
般
波
B.
淨
官
陽
官
陽
K.
Z.
COMMENTS/REFERENCES
般
授
/授
授
靜
L.
凈
Traditionally not distinguished
(Loan 1:#1529)
Several occurrences;
frequently interchanged in Dūnhuáng texts
Loan 1:#2914
#
小
小
少
/小
/小
亦
/亦
少
小
小c
無(无#?)
少
小
亦
亦
乏
賣
之#
買
乏
賣
客
明
客
容#
明
小/少 (which are originally two
forms of the same character) are
frequently interchanged
少
少
又
亦
明
問
明
聞^
/問
/聞
/(聞)
聞
問^
門
聞
聞
問
聞
/問
縣
(見=)現
門
問
縣
見
#
Mistake in S. (deriving from
structural similarities of the
abbreviated version of 無?)
which transforms by negation
the meaning to its opposite
賣
/明
問
見
小
Deletion of the upper part of
the character; traditionally, 買
is also a loan for 賣 (Loan
1:#0464)
Many occurrences, but does not
seem to be a regular phonetic
loan
Several occurrences
問
問
聞
Note that S. often incorrectly
interchanges 聞 and 問; this is
not a regular phonetic loan; note
the cluster of these interchanges
in all manuscripts
Loan 1:#4591
Deletion of the inner part of the
character in S.; however, 門 can
function as a phonetic loan for both
問 and 聞 (Loan 1:#4588,
#4589, #4590)
縣
Phonetic loan (Loan 1:#4909)
In the Sòng editions, 見 and 現
are usually differentiated
Note the mistake in all mss.!
'CORRECT'
S.
/
/
/
/
D.
B.
#
#
#
L.
K.
特#
持
持
/待
持#
待
待
業
業
葉#
業
/業
葉#
等#
業
等#
嶺
性
/性
/世
/性
/聖
領
嶺
世*
性*
聖*
性*
性
世
性
聖
語
*
差
記#
訖
*
*
着
記#
說#
汝
外
汝
汝
*
如*
汝*
汝*
汝*
汝* *
汝
汝
汝
(汝)
/汝
/汝
/
/
/
/ 汝
/
/性
COMMENTS/REFERENCES
Note that in this series the
confusion of the two characters
appears in all mss.!
Often confused in Dūnhuáng
texts; several occurrences
(扌 ‘hand’ – 牜 ‘ox’)
Typical substitution / confusion
of determinatives
(扌 ‘hand’ – 彳 ‘step’)
This is probably not a phonetic
loan. The replacement based on
structural similarities occurs
several times in D. (and in many
other Dūnhuáng mss.)43
#
持
訖
訖
汝
Z.
#
#
#
葉#
In 善業; Dèng and Róng
1999:398, n.1
嶺
性
性
性
性
Often interchanged
Dèng and Róng 1999:327, n.13
Dèng and Róng 1999:421, n.1
Dèng and Róng 1999:371, n.7
聖
Several occurrences; Dèng and
Róng 1999:250, n.6; 390, n.2
Dèng and Róng 1999:223, n.3
Synonym
說#
*
汝
Many occurrences; Dèng and
Róng 1999:226, n.5; 397, n.19,
n.21; 400, n.9; 411, n.4
汝
Dèng and Róng 1999:244, n.4
Dèng and Róng 1999:383, n.1
Dèng and Róng 1999:399, n.7
Dèng and Róng 1999:371, n.10
Dèng and Róng 1999:278, n.1
已*
汝
Dèng and Róng 1999:369, n.12
44
*
Dèng and Róng 1999:371, n.9
Dèng and Róng 1999:313, n.3
43 Very similar shape in vernacular writing!
44 Can be interpreted as reversed sequence or as (twofold) dialect phonetic loan.
'CORRECT'
S.
D.
求
/求
汝等
救
求
汝汝
求
救
汝汝
B.
L.
K.
Z.
智慧/知
慧
之知
知
智
知之
之知
知
之
知
之
/之
/之
/之
悟
/悟
知
知
智
諸*
吾
悟
之
之
智
之
悟
吾
/悟
/悟
/悟
/悟
/吾
急
澄
息
吾
吾
伍
俉
悟
急
呈
息
吾
伍
悟
悟
吾
呈
識*(?)
澄
息
識
衣
息*
於*
識
衣
衣
衣
/依
於*
於*
衣
依
Dèng and Róng 1999:229, n.9;
several on S.; note this cluster of
phonetic dialect loans
Dèng and Róng 1999:324, n.8
依
/依
於*
於*
/依
/於
/於
衣
於
放#
依
衣*
放#
依
Several dialect replacements on
S.; e.g., Dèng and Róng 1999:
400, n.22
Several occurrences; Dèng and
Róng 1999:407, n.11; 421, n.7;
both mss. use the dialect loan!
Here, of course, a ‘regular’ loan!
/於
衣*
於
於
救
COMMENTS/REFERENCES
Probably not a phonetic loan?
Compare above!
汝等
Plural by reduplication (rare with
pronouns!) as opposed to plural
by suffixing
Reversed sequence (or ‘reversed
loans’!)
Often interchanged (as
demonstrated by the clusters
below); but probably not a
regular phonetic loan.
之知
之
之
智
Dèng and Róng 1999:423, n.9
悟
Note this cluster of interchanges!
吾 for 悟 is a traditional
phonetic loan (Loan 1:#0598)
悟
悟
吾
吾
Several occurrences in S.
c
衣
依
放#
依
澄
Dèng and Róng 1999:229, n.7
interprets this as dialect form
Making this cluster of interchanges even more complicated,
this corruption by structural
similarity is intermixed with the
above
Dèng and Róng 1999:278, n.3;
279, n.11
'CORRECT'
S.
D.
/於
壁
衣*
壁
於
糪
教
/教
/教
求法即
善
終
間
/鄣
+教
故#
敬#
即善求
法
修#
問#
教
教
教
即善求
法
修#
間
鄣
/ /鄣
/ /鄣
秉
知
拂
喚
讀訖
留
問
祖
不/
B.
請#記#
流
門#
祖
K.
Z.
This is probably not a loan but a
copying mistake
A rare case of an added radical
教
教
求法
即善
終
間
求法
即善
終
鄣
Mistake in both manuscripts!
#
知
拂
喚
請#記#
拂
喚
留
門#
但#
不
Loan 2:54
Confusion of determinatives
Changed by modern editors; 請
記 makes sense in the original
context
Loan 2:653
留
問
(Near-)synonym
Corruption?
Several occurrences;
Dèng and Róng 1999:238, n.13
for further examples
Dèng and Róng 1999:340, n.8
是*
青
從#
但(#)
知
Reversed sequence
Usually no differentiation in
Dūnhuáng manuscript texts
Deleted determinative in S.
唱
是*
/
題
清
徒
COMMENTS/REFERENCES
Dèng and Róng 1999:401, n.1
鄣
鄣
秉
和#
L.
題
清
徒
Loan 1:#405
Loan 1:#2665
This does not seem to be a
regular phonetic loan
Missing determinative
法
法
氣如
氣如
去(#)
氣如茲#
命如
起
於
去*
生#
起
起
起
去
去*
起*
起
起*
Several occurrences!
Dèng and Róng 1999:247, n.1
Dèng and Róng 1999:272, n.9
去
Dèng and Róng 1999:264, n.12;
266, n.1
廋
庚#
庚#
命如
起
廋
廋
'CORRECT'
S.
D.
捉/害
僚
除
餘
頭#
奪#
餘
除
捉
寮
除
餘
如
智
/智
/智
/知
/智
遇
於*
知
智
諸*
諸*
遇
如
智
知
智
知
智
愚
/遇
/愚
/愚
/過
定 等
遇
遇
愚
愚#
等
遇
愚
遇
遇#
等
坐
坐
(直心)
曲
情
/情
/情
/情
須
被
/被
座
座
真(#)心
典#
清
親*
性
性
順#
彼
被
置
盤
盤
故
明*
*
坐
座
真(#)心
曲
情
親*
性
情
須
彼
彼
置
般
槃
故
迷
迷
無#
念#念
讀
為
念#念
續
般
/槃
(固)
迷
迷
為
念
續
B.
L.
K.
Z.
COMMENTS/REFERENCES
僚
除
Loan 2:1051
This ‘direction’ (除 > 餘) of
loaning is unusual!
Dèng and Róng 1999:251, n.9
如
智
Commonly interchanged
Dèng and Róng 1999:383, n.5
知
Dèng and Róng 1999:267, n.7
Dèng and Róng 1999:365, n.7
Often interchanged; see
Dèng and Róng 1999:251, n.11
Loan 2:611 (愚 > 遇)
愚
愚
愚
Loan 2:917 (遇 > 愚)
過
座
坐
坐
直心
曲
情
情
過
定
等
Often interchanged
直心
Also similar semantics
Several occurrences
Loan 2:460
情
Dèng and Róng 1999:390, n.11
Dèng and Róng 1999:401, n.2
Dèng and Róng 1999:402, n.3
須
Loan 2:249
彼
Loan 2:726
Loan 2:663 (entry #1)
Loan 2:663
Loan 2:410
迷
迷
Dèng and Róng 1999:259, n.10
Several occurrences;
Dèng and Róng 1999:264, n.7;
277, n. 20; 282, n.8; 325, n.4;
383, n.2; 407, n.6
為
念
Mistake in both manuscripts
What looks like a change or
confusion of determinatives
(糹‘silk’ –
‘speech’) is
'CORRECT'
S.
是
為
K.
Z.
為
為
是 無
住為
是 無
住為
離
雜見
境
雜#
雜見
鏡
離
離#境
境
境
/境
/境
/境
邪
/邪
/(耶)
須
/雖
敬
境
境
境
邪
耶
那#
雖*
須*
#
邪
那#
雖*
雖
/雖
第
雖
弟
須*
第
着
體
/體
起心
看#
體
聽*
心起
看#
凈
起心
起心
不見
人過患
見
人過患
見
人過患
見
人過患
既
/記
記*[?]
既*[?]
記*[?]
記
記*[?]
記
見
見
見
自#
是*(?)
須#
願
體*
在自
在自性
思量
西*
參(#?)
妄
億
唱
曰
時
原
源
德
自在
自性在
思
星
森
妄
意
唱
曰
時
原
源
德
自在
無住
D.
見...
曰
時
源/原
/源
德
在自
在自性
思 /思量
星
森
妄
意
唱
無住
B.
無住
L.
COMMENTS/REFERENCES
actually an ‘established’ loan
(Loan 2:966)
I did not find any precedent for
this exchange
No precedent found
境
境
Loan 2:689
Loan 2:689
邪
邪
Common replacement
Dèng and Róng 1999:266, n.2
雖
Several occurrences; Dèng and
Róng 1999:347, n.11; 429, n.3
Dèng and Róng 1999:407, n.9
Several occurrences; originally
identical characters (Loan 2:98)
Several occurrences
着
Corruption in D.
Dèng and Róng 1999:399, n.5
Reversed sequence
不見
人過
患
不見
人過
患
Missing negation in all
manuscripts (generating the
opposite meaning of the passage)
Dèng and Róng 1999:271, n.6
記
見
時
見
Dèng and Róng 1999:298, n.5
classified as phonetic loan and
not as dialect loan [?]
More precise reference in later
(K. and Z.) editions
Dèng and Róng 1999:273, n.18
Not an established phonetic loan
Dèng and Róng 1999:275, n.8
‘Conventionalized sequence’
在自性
Sequence
思
星
森
Synonymous
Dèng and Róng 1999:280, n.17
Loan 2:594
意
昌
Loan 2:90
Loan 2:420 (entry #4.2)
'CORRECT'
S.
D.
各各
既
/即/則
/即/則
前
各各
既
則
即
前
各各
即
即
則
何
矯誑
矯誰#?
妬
垢#
證
如
如
如如
B.
冬冬#
L.
K.
Z.
既
COMMENTS/REFERENCES
Loan 2:129
則
則
何
Synonymous
Two occurrences of this
corruption
西#
矯雜#
疫#
垢#
西#
證
證
如如
如如
矯雜#
疫#
矯誑
矯誑
垢#
如
如
如
如
Shapes very similar in vernacular
writing!
Change / confusion of
determinatives
‘conventionalized sequence’ 如
如 which is a frequently used
Buddhist term
Loan 2:647
猶
/猶
河
愚
彼
無
到
猶
何
愚
彼
無
到
何
思#
波#
不
到
河
思#
彼
不
倒
/倒
憶
倒
億
到
億
到
億
增
般若
增
般若
曾
增
譬
辟
譬
/
盡
心
深
/心
聞
是故
示
示
謂
見性
譬
來#
#
盡
深*
心*
身*
聞
是故
亦#
是*
為
見
Replacement by conceptually /
terminologically related items
Loan 2:969 (#10)
盡
心
深
身*
聞
是
示
示
為
見
#
心
Dèng and Róng 1999:315, n.1
脫
說
脫
脫
Loan 2:48
2 occurrences
到 > 倒 seems to be more
common than the ‘reverse’ loan
性
憶
憶
性
億> 憶 does not seem to be an
established phonetic loan
Loan 2:434
Dèng and Róng 1999:426, n.11
心
聞
是
示
示
為
見
心
Dèng and Róng 1999:421, n.4
Reversed sequence
示
Dèng and Róng 1999:319, n.6
謂
見性
脫
Loan 2:537(#2)
見性
Substitution by a term of related
semantics
There is a long history of the
replacement of 脫 with 說
'CORRECT'
S.
D.
B.
縛
傳
傳#
縛#
縛
傳
縛
傳
縛
傳
/傳
轉
傳
謗
頌
謾#
訟
謗
頌
傍(#)
頌
頌
/頌
而
造
但
而
在
但
如
在
如
在
造
造
在
造
造
在
在
在
在
元
/元
出
無#
元
在
元
無# (无)
/元
悔
大
裏
願
海#
大
中
自#
元
悔
疑
摩
磨
*
磨
摩
L.
K.
頌
#
裏
Z.
COMMENTS/REFERENCES
(Loan 2:948)
Confusion of determinative in S.
‘Complementary’ confusion (縛
< > 傳)
The somewhat more usual
direction of loaning is 傳 > 轉
and not, as here, 轉 > 傳
This is a rather common
replacement
頌
45
Dèng and Róng 1999:326, n.7
是
Confusion by context (see also
below)?
Confusion by context (see also
above)?
46
Several occurrences47
元
Note that the confusion appears
in both S. (above) and D., based
on the abbreviated version of 無!
No precedent found
悔
#
裏
悔
Two occurrences
Near-synonym
Frequently confused in S.
自#
疑
摩
磨
疑
摩
Dèng and Róng 1999:329, n.11
Near-synonym and homophone!
See above, but in reverse!
45 但是頓教 vs. 頌是頓教.
46 Dèng and Róng (1999, 350, n.1) consider chū 出 a mistake; however, this is not clear since
the passage reads 邪見出 (在/是) 世間,正見出世間,邪正悉打卻,(菩提性宛然) (the
last phrase is inserted according to Kōshōji and is lacking in the manuscripts). It could be
considered a ‘mistake by context’ since 出 appears in the second phrase and the copyist
may have sensed a parallel construction. In addition, 出 can have several meanings which fit the
context, either ‘to emerge from’ (first phrase) or ‘to transcend’ (second phrase); Kōshōji has
the copula shì 是 instead of 出 (Stein) or zài 在 (Dūnbó: ‘be located in’). Possible
translations which all make sense: “Wrong views emerge from the mundane (or: “Wrong
views are located in the mundane”), right views emerge from the mundane (or: “right views
transcend the mundane”), if ‘wrong’ and ‘right’ are all smashed, (the nature of bodhi is just as
such).” The whole passage must have posed problems to the copyist/reader since the last
phrase (the ‘conclusion’) was missing in the manuscripts.
47 The abbreviated form of 無 (无) is easy to confuse with 元.
'CORRECT'
S.
D.
B.
/魔
摩
伐#
魔
伐#
花
伐#
帝
帝
已
大*
德
/花
/
帝
48
L.
K.
Z.
Several occurrences
Omitted determinative (or loan?);
several occurrences, but no
precedent found for a ‘loan’
See above!
花
已
陀
得
陀
得
已
大*
德
得
不
種
/種
方
彈
德
不
重*
眾*
者
禪#
得
不
種
種
者
禪#
遠
遙(#)
帝
These are frequently
interchanged in Dūnhuáng texts;
Dèng and Róng 1999:334, n.12
否
種
Common interchange
Dèng and Róng 1999:402, n.5
者
禪#
彈
/但
目
坦*
日#
壞
壞
壞
海 /大
海
除人
大海
海
害
害
肉#
破
波
破
破
了
西
人#
#
了
西
了
西
無
This does not seem to be a
common replacement (confusion
by ‘convention’, maybe, since
has a much higher frequency than
帝 in Buddhist texts)
Dèng and Róng 1999:334, n.10
遠
但*
但
目
無人
COMMENTS/REFERENCES
彈
遠
但*
High-frequency character 禪 in
Buddhist texts; easily confused in
the copying process
Synonymous and similar in shape
Dèng and Róng 1999:340, n.4
48
目
Dèng and Róng 1999:423, n.5
Frequently interchanged in
Dūnhuáng texts (compare
and 自)
> 壞 is a ‘common’
replacement (Loan 2:625)
Similar meaning
海
除人
人
Corruption or misunderstanding of
this passage in the manuscripts
The vernacular character for 肉:
宍 is similar in shape to 害
No precedent for the
replacement of these
(phonetically distinct) characters
found; thus, rather a confusion or
exchange of determinatives
Could that also be interpreted as modification or confusion of the determinative instead of a
dialect loan?
'CORRECT'
處
理
S.
D.
K.
Z.
COMMENTS/REFERENCES
處
理
處
理
Corruption in S.
離
處
理
悉
俱
俱
悉
B.
悉
若欲
覓
破彼
得悟自
迷(#?)
若欲覓
真
破彼
得悟自
疑
疑
城
請
當(#?)
漕
疑
癡
誠
清
當(#?)
漕
癡
癡
除
喻
時
喻
除
如*
/悉
彼有
得悟自
性
誠
請
常
L.
No historical precedence for this
replacement found
Corruption in S.; replaced by a
synonym ‘all’ (悉 > 俱) in the
Sòng editions
欲得見
者
彼有
得悟
自性
誠
彼有
得悟
自性
In the phrase 無
彼有疑
Missing character in both
manuscripts!
Loan 2:164
Loan 2:461 (#5)
常
常
49
All occurrences in the mss.
疑
疑
Mistake in both manuscripts!
Maybe motivated by the
structural similarity and the
somewhat related semantics in
the Buddhist context (‘doubt’ vs.
‘ignorance’)
[?]
Dèng and Róng 1999:370, n.1
(>於)
文
聞
文
字
覺
#
家(#)
覺
字
文見
覺
覺
覺
覺
華
曾
Dèng and Róng 1999:374, n.1
‘decomposed’ character > 斍50
斍
#
(#?)
僧
即
Loan 2:1028 (#2)
(#?)
曾
即
華
曾
Note this series of mistakes/
alternations on the D. manuscript
involving the same character!
Here motivated by the
resemblance of the abbreviated
form 斍 (覺) with
.
Mistake in both manuscripts!
Added determinative in S.
Mistake in both manuscripts!
49 Probably not a confusion triggered by similar shape after all: there is a history of 當
replacing words of the ‘陽禪 ’ phonetic group (such as 嘗 and 償); however, no concrete
precedence for the replacement 當 > 常 was found.
50 斍 is a vernacular form of 覺 misread by the copyist as two characters.
'CORRECT'
S.
D.
B.
L.
K.
Z.
幸
人
把
想
入#
把
相
人
犯#
想
味
觸
含
獨#
含
味
觸
合#
/含
含
舍#
(用?)
#
No precedence found (usually,
none of the two characters are
loaned or have phonetic loans)
Several occurrences
Missing determinative in S. (no
precedence for a phonetic loan
found)
Loan 2:333
味
觸
Altered determinative
Dèng and Róng 1999:387, n.4
#
定
定
空#
/定
弟*
定
Dèng and Róng 1999:404, n.5
油
火
能/解
遞
壇
十/拾
大#
解
迎(#?)
檀
拾
COMMENTS/REFERENCES
Added determinative in S. (same
phonetic value, ‘ 喻’, however,
no concrete precedence found)
(憂)
義
禮
淨
有
語*
*
諍
火
能
遞
壇
十
因
有
義
禮
淨
久/永
撩
若
鞠
永
遼
共(#)
因#
掬
久
遼
共(#)
因#
鞠
四十
十四
四十
Reversed sequence
嶮
劍
嶮
中
眾
中
No precedence as phonetic loan
found
No precedence found but
probably an unusual phonetic
loan; both characters can have the
Synonymous
Several occurrences
Several occurrences
Synonymous
Omitted determinative in D.
義
憂
義
Not phonetically identical
Dèng and Róng 1999:402, n.8
Dèng and Róng 1999:402, n.9
久
撩
久
撩
Can be loaned for ‘靜從 ’
phonetics, such as 靖, 靜, etc.
As such, this should be regarded
as phonetic loan
Near-synonym
Loan 2:926
Altered determinative or phonetic
loan?
'CORRECT'
S.
D.
B.
L.
K.
Z.
pronunciation ‘東知 ’ (Loan
2:10 and 744#4); both characters
are sometimes loaned for 終
(which has the same
pronunciation; see Loan 1:3352
and 3354)
Very common loan (Loan 2:218)
員
覓/求
求
覓
報
保
保
遂
遂
Synonymous
報
日
日
日
處(#)
根
材
(建立)
No precedence found and
probably not a phonetic loan (
tone vs. 去 tone)
One character is ‘decomposed’
into two in the process of
copying
香
氛氛
崩
朋
據
報
#
香
氛
崩
COMMENTS/REFERENCES
#立
Loan 2:546
Two characters misread
(‘composed’) as one
據
根
林#
氛
Confusion of determinatives
林#
#立
Manuscripts, Editions, Bibliography
Manuscript Witnesses
Dunbo_77: The manuscript Dūnbó 77 is preserved at the Dūnhuáng Museum (Dūnhuáng
bówùguǎn 敦煌博物館) as a booklet of 93 pages (‘butterfly binding’), containing 4
texts, three claiming to be authored by Shénhuì 神會 and/or disciples, the Platform
sūtra, and a Commentary to the Heart sūtra by the Northern School master Jìngjué 淨覺.
Jorgensen (2005, 596) assumes that the texts were combined into a book in Dūnhuáng,
since at the end of the 8th century a disciple of Shénhuì by the name of Móhēyán 摩訶衍
(‘Mahāyāna’) tried to harmonize the teachings of the ‘Northern’ and ‘Southern’ Schools.
P. 2045 contains the three Shénhuì texts in the same order, and one can assume that the
texts were written at about the same time (during the period when Dūnhuáng was under
Tibetan administration; see Jorgensen 2002, 399-404 and Jorgensen 2005, 597). In
Anderl (2012b), it is argued that the texts may have been combined because
they all deal with the teachings of prajñāpāramitā thought.
The page references of the digital edition follow the edition in Dèng and Róng (1999),
who count each side (and not full pages) of the butterfly binding. The facsimile
edition of Gansu (1999) counts the pages differently. The
manuscript is complete, contains somewhat fewer variations and corruptions than the
Stein manuscript, and has a more even and visually appealing calligraphic style.
Stein_5475: The British Library manuscript with the number Or.8210/S.5475 is nearly
complete; only three lines in the middle are missing. It is a booklet consisting of 52
pages (including six blank pages: pp. 1, 44, 49-52 and two half-blank pages: pp. 2, 48).
The manuscript is accessible as a high-resolution facsimile reproduction at the IDP
(International Dunhuang Project; http://idp.bl.uk/database/). The first facsimile reproduction
appeared in Yabuki 1933, 102-103 and is also the source of the edition in T 48/2007,
337a01-345b17 (many mistakes!). The manuscript is also the source of the critical edition and
translation of Yampolsky 1967, as well as of the translation of Chan 1963. The edited text
was also published by Suzuki/Kudo 1934 (divided into 57 sections, a structure which
was adopted by Yampolsky in his translation) and Ui 1939-1943, vol.2:117-172. In this
edition, each ‘page’ of the booklet is counted separately; each page thus usually consists
of 6 lines/columns (the page with the title consisting of 4 lines).
Beijing_48: Manuscript BD.48 (8024) is preserved at the Běijīng National Library. Parts of
the beginning and the end are missing and only ca. one third is extant. The text is written
on the back of an apocryphal sūtra, the Wúliàng shòu zōngyào jīng 無量壽宗要經. This
version of the text was probably copied somewhat later than the Dunbo 77 copy.51
51 There is a manuscript fragment of the Platform sūtra stored at the same institution. However,
BD.79 (8958) only contains four and a half lines of the text. For a facsimile reproduction, see
Lǐ Shēn and Fāng Guǎngchāng (1999, 232).
Lushun: This manuscript is preserved at the Lǚshùn 旅順 Museum (Lǚshùn bówùguăn
旅順博物館) near Dàlián 大連 (Liáoníng Province) and has a complicated history.
Previously it was part of the Ōtani Collection (which was scattered into public and
private collections throughout Asia in 1914). In 1954, 620 Dūnhuáng manuscripts were
removed and incorporated into the Běijīng National Library collection. Only 9
Dūnhuáng manuscripts remained at the museum, together with the bulk of ca. 20,000
manuscript fragments from Central Asia (Turfan, Kharakhoto). The manuscript with the
Platform sūtra (no number) originally consisted of 45 folios (a booklet with butterfly
binding), folded into 90 pages (dated 959 AD). The whereabouts of the manuscript were
long unknown, and until recently only two photographs of the beginning and the end were
extant (Ryūkoku Library in Japan). Recently, however, the manuscript was
‘rediscovered’ and seems to be complete (the discovery was celebrated as a sensation in
the Chinese press, and an exhibition was organized at the Lǚshùn Museum). During the
work on this paper, no facsimile reproduction was yet available. We want to express our
gratitude to John Jorgensen, who has just informed us of a recent publication of the
rediscovered manuscript. This version will be considered in our future work on the
Platform sūtra.
Printed Editions as Witnesses52
Huixin: This refers to the ‘reconstructed’ early Sòng Dynasty edition by Huìxīn 惠昕 (967).
Huìxīn introduced the title Liù-zǔ tánjīng 六祖壇經. In contrast to the extremely lengthy
title of the Dūnhuáng manuscripts, in which the referent of the appellation ‘sūtra’ is unclear,
the title by Huìxīn leaves no doubt that the text itself is regarded as a ‘sūtra’ (see
Yanagida 1976 on this edition).
Koshoji: The edition preserved at the Kōshō-ji temple (Kyoto, discovered in the 1930s) is
based on this text. This version of the sūtra is much longer than the Dūnhuáng
manuscript versions discussed above, and includes materials appended during the Sòng
dynasty (in addition to being heavily revised). The Qisong, Zongbao and Deyi versions
consist of ca. 20,000 graphs. On the Koshoji, see Ui 1939-1943, vol. 2:113; reproduced
photolithographically by Suzuki 1938; for a printed version, see Suzuki/Kudo 1934.
Qisong: The edition by Qìsōng 契嵩 dates from 1056; he changed the title to Liùzǔ dàshī
fǎbǎo tánjīng cáoqī yuánběn 漕溪大師法寶壇經曹溪原本 (The Platform sūtra of the
dharma treasure of the great master Cáoqī: the original Cáoqī edition), usually
referred to as Cáoqī yuánběn 曹溪原本 (Yanagida 1976). The text consists of 20,000
characters, as compared to ca. 12,000 characters of the Dūnhuáng manuscript versions
and ca. 14,000 of the Huixin version.
52 For more extensive information on the manuscripts, see Anderl (2012b, forthcoming); on the
Sòng editions, see Schlütter (1989). For an extensive and exquisite study on the formation of
the hagiography of Huineng, see Jorgensen (2005). The study also includes useful materials
on the manuscripts and editions, as well as a discussion of ZTJ in the context of Platform
sūtra studies. Jorgensen’s work will be the foundation of subsequent studies in this field for
many years to come.
Zongbao: The Zōngbǎo edition dates from 1291 and has the title Liù-zǔ dàshī fǎbǎo tánjīng
六祖大師法寶壇經. This edition became the ‘canonical’ version of the text and is the
source of T 48/2008, 245-265.
Deyi: The Déyì 德異 edition is another edition from the Yuán period, edited in: Gen en’yū
kōrai kokubon rokuso daishi hōbō dankyō 元延祐高麗刻本六祖大師法寶壇經
(Zengaku kenkyū 禪學研究 23 [1935]:1-63).
Xixia: The extant parts of the Xīxià 西夏 edition can be found in Shǐ (1993). In 1929
Beiping (Peking) University obtained more than 100 manuscripts from the Xīxià
Buddhist canon, among those were 5 pages of the Platform sūtra (a translation into
Chinese and reproductions of photographs were published in Luó 1932).
Yampolsky_1967: This version, for a long time the authoritative edition and translation in the
West, is based on Stein 5475, compared and supplemented with the Koshoji edition.
Bibliography of Modern Editions and Secondary Literature
Adamek, Wendi. 2007. The Mystique of Transmission: On an Early Chan History and Its
Context. New York: Columbia University Press.
Anderl, Christoph. 2012a. Zen Rhetoric: An Introduction. Zen Rhetoric in China, Korea, and
Japan. Ed. Christoph Anderl. Leiden/Boston: Brill. 1-94.
Anderl, Christoph. 2012b (forthcoming). Was the Platform Sūtra Always a Sūtra? Studies in
the Textual Features of the Platform Scripture Manuscripts from Dūnhuáng. Chinese
Manuscripts: Copies and Originals. Ed. Imre Galambos. Budapest: Eötvös Loránd
University.
Anderl, Christoph. 2004. Studies in the Language of Zǔ-táng jí 祖堂集. 2 vols. Oslo: Unipub.
App, Urs. 1993. Rokuso Dankyō Ichiji Sakuin 六祖壇經一字索引. Kyoto: Hanazono
daigaku kokusai zengaku kenkyūjo 花園大學國際禪學研究所.
Dèng, Wénkuān 鄧文寬 and Róng, Xīnkuān 榮新寬. 1998. Dūnbó běn Chán-jí Lùjiào
敦博本禪籍錄校. Nánjīng: Jiāngsū gǔjí chūbǎnshè 江蘇古籍出版社.
DDB. Digital Dictionary of Buddhism (general editor: Charles Muller). http://www.buddhismdict.net/ddb/.
Dīng, Zh ngyòu 仲祜. 2000. Liùzǔ Tánjīng 祖壇經. Hong Kong: Xi nggǎng fójīng
liút ng chù 香港 經流 處.
Fāng, Guǎngchāng 方廣錩. 1999. Tán Dūnhuáng běn Tánjīng Biāotí de Géshì 談敦煌本壇
經標題的格式. Dūnhuáng Tánjīng Héjiào Jiǎnzhù 敦煌壇經合校簡注. 139-144.
Fāng, Guǎngchāng 方廣錩. 2001. Guānyú Dūnhuáng běn Tánjīng 關於敦煌本壇經.
Dūnhuáng Wénxiàn Lùnjí 壇經文獻論集. Ed. Hǎo, Chūnwén 郝春文. Shěnyáng:
Liáoníng rénmín chūbǎnshè 遼寧人民出版社.
Féng, Qíyōng 馮其庸 and Dèng, Ānshēng 鄧安生. 2006. Tōngjiǎ Zìhuìshì 通假字彙釋
(An Explanation of Phonetic Loan Characters). Běijīng: Běijīng chūbǎnshè 北京出版
社. (Abbreviated reference in the table: Loan 2)
Galambos, Imre. (forthcoming). Punctuation Marks in Medieval Chinese Manuscripts.
Manuscript Cultures: Mapping the Field. Ed. Jan-Ulrich Söbisch and Jörg
B. Quenzer. Berlin: de Gruyter. (Page numbers according to the draft version)
Gānsù cáng Dūnhuáng wénxiàn biānwěihuì 甘肅藏敦煌文獻編委會, ed. 1999. Gānsù Cáng
Dūnhuáng Wénxiàn 甘肅藏敦煌文獻. 6 vols. Lánzhōu: Gānsù rénmín chūbǎnshè 甘肅
人民出版社.
Guó, Péng 郭朋, ed. 1981. Tánjīng Duìkān
壇經 對勘. Jìnán: Qílǔ shūdiàn 齊魯 社.
Guó, Péng 郭朋, ed. 1983. Tánjīng Jiàoshì 壇經校釋. Bĕijīng: Zh nghuá shūjú 中華 局.
Guó, Péng 郭朋. 1987. Tánjīng dǎo dú 壇經導讀. Chéngdū: B shŭ chūbǎnshè 巴蜀 社.
Harbsmeier, Christoph. 2012. Reading the One Hundred Parables Sūtra: The Dialogue
Preface and the Gāthā Postface. Zen Rhetoric in China, Korea, and Japan. Ed. Christoph
Anderl. Leiden/Boston: Brill. 163-204.
Jorgensen, John. 2002. The Platform Sutra and the Corpus of Shen-hui: Recent Critical Text
Editions and Studies. Revue Bibliographique de Sinologie. 399-438.
Jorgensen, John. 2005. Inventing Hui-neng, the Sixth Patriarch – Hagiography and
Biography in Early Ch’an. Leiden: Brill.
Lǐ, Shēn 李申 and F ng, Guǎngch ng 方廣錩, eds. 1999. Dūnhuáng Tánjīng Héjiào
Jiǎnzhù 敦煌壇經合校簡注. Tàiyuán: Sh nxī gǔjí chūbǎnshè 山西 籍出版社.
Lǐ, Shēn 李申. 1999a. Tánjīng Bànběn Chúyì 壇經 版 芻 . Dūnhuáng Tánjīng Héjiào
Jiǎnzhù 敦煌壇經合校簡注. 12-26.
Lǐ, Shēn 李申. 1999b. S nbù Dūnhuáng Tánjīng Jiàoběn dú hòu 部敦煌 壇經 校 讀
後. Dūnhuáng Tánjīng Héjiào Jiǎnzhù 敦煌壇經合校簡注. 109-138.
Luó, Chángpéi 羅常培. 1933. Táng-Wǔdài Xīběi Fāngyīn 唐
西 方音. Shànghǎi:
Academia Sinica.
Luó, Fúchéng 羅福 . 1932. Liùzǔ Dàshī Făbăo Tánjīng Cánběn Shìwén 祖大師法寶壇
經殘 釋文. Guólì Běipíng Túshūguănkān 國立
圖 館刊 4/3 (Xīxià wén
zhu nhào 西夏文專號).
Mair, Victor. 1989. T’ang Transformation Texts: A Study of the Buddhist Contributions to the
Rise of Vernacular Fiction and Drama in China. Cambridge (Mass.) and London:
Harvard University Press. (Harvard-Yenching Institute Monograph Series 28)
McRae, John R. 1986. The Northern School and the Formation of Early Ch’an Buddhism.
Honolulu: University of Hawaii Press.
McRae, John R., tr. 2000. The Platform Sutra of the Sixth Patriarch. Berkeley: Numata
Center for Buddhist Translation and Research.
Nakagawa, Taka 中川孝. 1953. Rokuso Dankyō no Ihon ni Tsuite 六祖壇經の異本に就
いて. Indogaku Bukkyōgaku Kenkyū 3:155-156.
Nakagawa, Taka 中川孝. 1954. Dankyō no Shisōshiteki Kenkyū 壇經の思想史的研究.
Indogaku Bukkyōgaku Kenkyū 5:281-284.
Nakagawa, Taka 中川孝. 1976. Rokuso Dangyō 六祖壇経. Zen no Goroku 禅の語録 4.
Tōkyō: Chikuma shobō 筑摩書房.
Nishitani, Keiji 西谷啟治 and Yanagida Seizan 柳 聖山, eds. 1972. Zenke Goroku 禪家
語錄, vol. 2. Sekai Koten Bungaku Zenshû 世界 典文學 集, no. 36B. T ky :
Chikuma shob 筑摩
.
P n, Zhòngguī 潘重規. 1996. Dūnhuáng běn Liùzǔ Tánjīng dú hòu Guǎnjiàn 敦煌
祖
壇經 讀後管見. Dūnhuáng Tǔlǔfān xué Yánjiū Lùnjí 敦煌吐魯 學研究論集.
Shūmù wénxiàn chūbǎnshè .
Shào, Róngfēn 邵榮芬. 1963. Dūnhuáng Súwénxué Zuòpĭn Zh ng de Biézì Yìwén hé Táng
Wǔdài Xīběi F ngyīn 敦煌俗文學 品中的別字異文和唐
西 方音. Zhōngguó
Yǔwén 中國語文 3:193-217.
Shǐ, Jīnb 史金波. 1993. Xīxià wén Liùzǔ Tánjīng cán yè Yìshì 西夏文
祖壇經 殘頁
釋. Shìjiè Zōngjiào Yánjiū 世界宗教研究 3:90-100.
Schlütter, Morten. 1989. A Study of the Genealogy of the Platform Sutra. Studies in Central
& East Asian Religions 2:53-114.
Sørensen, H. Henrik. 1989. Observations on the Characteristics of the Chinese Chan
Manuscripts from Dunhuang. Studies in Central & East Asian Religions 2:115-139.
Suzuki, Daisetsu (Teitar ) 鈴木大拙 貞 郎 and Kuda, Rentar
連 郎. 1934.
Tonkō Shutsudo Jinne Zenji Goroku Kaisetsu Oyobi Mokuji; Tonkō Shutsudo Rokuso
Dankyō Kaisetsu Oyobi Mokuji; Kōshōji bon Rokuso Dankyō Kaisetsu Oyobi Mokuji.
煌出土神會禪師語錄解說及目次; 煌出土 祖壇經解說及目次; 聖寺
祖壇經解說及目次. T ky : Morie shoten 森江 店.
Suzuki, Daisetsu (Teitar ) 鈴木大拙 (貞 郎), ed. 1942. Jōshū Sōkei-zan Roku Soshi
Dangyō 韶
溪山 祖師壇經. T ky : Iwanami shoten 岩波 店.
Takata, Tokio 高 時雄. 1987. Le Dialecte Chinois de la Region du Hexi. Cahiers
d’Extrême-Asie 3:93-102.
Takata, Tokio. 1988. Tonkō Shiryō ni Yoru Chūgokugo shi no Kenkyū: Kyū, Jusseiki no Kasei
Hōgen 敦煌資料 見中國語 史之研究: 九・十世紀の河西方 . T ky : S bunsha
創文社.
Takata, Tokio. 2000. Multilingualism in Tun-huang. Acta Asiatica 78:49-70. (page number
references in this paper according to a digitized draft version of the article)
Tanaka, Ry shū 中良昭. 1983. Tonkō Zenshū Bunken no Kenkyū 敦煌禪宗文獻の研究.
T ky : Dait shuppansha 大東出版社.
Ui, Hakuju 宇 伯壽. 1966. Zengaku shi Kenkyū 禪學史研究. 3 vols. T ky : Iwanami
shoten 岩波 店.
Wáng, Huī 王輝, ed. 2008. Gǔwénzì Tōngjiǎ Zìdiǎn 文字
字典 (A Dictionary of
Ancient Phonetic Loan Characters). Běijīng: Zh nghuá shūjú 中華 局. (Abbreviated
reference in the table: Loan 1)
Xiàndài fójiào xuéshù cóngkān biānjí wěiyuánhuì 現代佛教學術叢刊編輯委員會, ed. 1976.
Liùzǔ Tánjīng Yánjiū Lùnjí 六祖壇經研究論集. Táiběi: Dàshèng wénhuà chūbǎnshè
大乘文化出版社.
Yabuki, Keiki 矢吹慶輝. 1930. Meisha Yoin – Tonkō Shutsudo Miden Koitsu Butten Kaihō
鳴沙餘韻﹣敦煌出土 傳 逸 典開寶 (Rare and Unknown Chinese Manuscript
Remains of Buddhist Literature Discovered in Tun-huang Collected by Sir Aurel Stein
and Preserved in the British Museum). T ky : Iwanami shoten 岩波 店.
Yampolsky, Philip. 1967. The Platform Sūtra of the Sixth Patriarch. New York: Columbia
University Press.
Yanagida, Seizan 柳 聖山. 1972. Zenseki Kaidai 禪籍解題. Nishitani and Yanagida.
445-514.
Yanagida, Seizan 柳 聖山, ed. 1976. Rokuso Dangyō Shohon Shūsei 祖壇經諸 集 .
Ky to: Chūmon shuppansha 中文出版社.
Yáng, Zēngwén 楊曾文. 1993. Dūnhuáng Xīnběn Liùzǔ Tánjīng 敦煌新本六祖壇經.
Shànghǎi: Shànghǎi gǔjí chūbǎnshè 上海古籍出版社.
Yáng, Zēngwén 楊曾文. 1996. Shénhuì Héshàng Chán-huà lù 神會和尚禪話錄. Běijīng:
Zh nghuá shūjú 中華 局.
Yè, G ngchuò 葉恭綽 1926. Lǚshùn Gu nd ng-tíng Bówùguăn suŏ cún Dūnhuáng Chūtŭ
zhī Fójiào Jīngdiăn 旅順關東廳博物館 存敦煌出土之 教經典. Túshūguăn xué
Jìkān 圖 館學季刊 1/4.
Zh u, Shàoliáng 周紹良. 1997. Dūnhuáng Xiěběn Tánjīng Yuánbĕn 敦煌寫 壇經原 .
Běijīng: Wénwù chūbǎnshè 文物出版社.
Zh u, Shàoliáng 周紹良. 1998. Xù èr 序 . Dūnbó běn Chán-jí Lùjiào 敦博 禪籍錄校.
1-26.
Chung-Hwa Buddhist Journal (2012, 25:51-86)
Taipei: Chung-Hwa Institute of Buddhist Studies
中華佛學學報第二十五期 頁 51-86(民國一百零一年),臺北:中華佛學研究所
ISSN:1017-7132
Bibliographical Notes on Buddhist Temple Gazetteers,
Their Prefaces and Their Relationship to the Buddhist
Canon
Marcus Bingenheimer
Temple University
Abstract
This article is part of the Buddhist Temple Gazetteer Project funded by the Chung-hwa
Institute of Buddhist Studies. The project resulted in the digitization of more than 230
gazetteers (zhi 志) of Chinese Buddhist sites.1 The task of compiling a high-quality
digital archive involves making both academic and technological decisions, which in turn
necessitate research. In order to visualize gazetteer literature in various ways according to
temporal or geographic parameters, we need first to understand the provenance of the
texts, which often have complex edition histories. The aim of this paper is to summarize
some of the bibliographical data for the more than 230 mountain and temple gazetteers of
which the archive is comprised, to compare the two available print collections, to
illustrate the importance of prefaces for understanding these texts and to outline the
relationship between texts on Buddhist religious sites and the Buddhist canon.
Keywords:
Buddhist History, Temple Gazetteers, Chinese Temples, Digital Archive, Digitization
Project
1 The project is conducted at the Dharma Drum Buddhist College and the archive is currently
hosted at http://buddhistinformatics.ddbc.edu.tw/fosizhi/ (July 2009). The data presented here
is largely the result of sustained team work. The catalog data was produced 2008-2009 by Lin
Zhimiao 林智妙, Ke April 柯春玉, Peng Chuanqin 彭 芩, Lin Xiuli 林綉麗 and myself.
Many of the texts cited here were first examined in a reading group led by Lin Zhimiao 林智
妙, whose explanations solved many difficult passages. I am grateful to Simon Wiles for
improving the English, and Peter Bol, John Kieschnick and Wu Jiang for their valuable
comments. The text also profited from the helpful suggestions made by two anonymous
reviewers.
目註記—佛寺志及 序言 之於佛教藏經的關係
馬德偉
大學
摘要
篇文章是中華佛學研究所佛教寺廟志計 的一部分, 計 包含超過230個中國
佛教寺廟志
計 之任務在於匯編一高品質的數位 藏,而
藏涉及一必要的
研究,也就是學
技
的 斷 根據時間 地理 的參考點,為了從多方面檢
視地方志文本,需要 了解文本的來源,而 些文本 常
複雜的版本 史
篇文章旨在針對
藏所包含的超過230個山岳 寺院的地方志,總集部分的 目
資料,比較 個刻版的收藏,說明 序文對於了解 些文本的重要性,並闡述存於
佛教據點的文本 藏經的關係
關鍵字:佛教
史
寺廟志
中國寺院
數位
藏
數位
計
Introduction
Among the most precious sources for the study of later Chinese Buddhist history are the
large number of gazetteers on Buddhist sites and institutions. Gazetteers, as Sinology has
come to translate zhi 志 (or its variant 誌), are composite works compiled from texts
belonging to different genres (topographic descriptions, biographies, essays, poems,
epigraphia, maps, portraits etc.). The contribution of the compiler was to select, collect
and arrange the texts, his own additions ranging from merely adding a preface to writing
or rewriting a substantial amount of the volume. The Temple Gazetteer Project, of which
this paper is a part, aims at collecting and digitally editing all available temple gazetteers
of Buddhist sites with the goal of making them available to a wider audience.
Filled to the brim with facts and legends about a location, the vast majority of these
gazetteers were published between the 16th and the 20th centuries2 and offer valuable
information about the history of Buddhism. The mature form of the gazetteer, which
attempts to provide a comprehensive, cultural description of a site, was widely adopted
only after the Northern Song. At that time too it became common practice to include the
term zhi 志 in the title (Hargett 1996, 419). The first work on Buddhist chorography
which uses the term fangzhi 方志 was Daoxuan’s 道宣 Shijia fangzhi 釋迦方志 (T
2088) of 650, in which he lists places in India and Central Asia.
In the corpus of texts discussed here only a few important ‘proto-gazetteers’, such as
the Luoyang qielan ji 洛陽伽藍記 (ZFSH 001)3 or the Tiantai shengji lu 天台勝蹟錄
(ZFSH 064), were published prior to the mid-16th century. As in the case of paper editions,
digital editions need to carefully record the provenance of their content and describe its
relationship with other texts. In the following, therefore, we will give a bibliographic
overview of the corpus at hand.
2 See Hargett (1996), Hahn (1997), and Bol (2001) on the antecedents of the gazetteer genre
before the Ming and bibliographic references to more extensive discussions of the topic in
Chinese. Gu (2010) is the most comprehensive reference work for Song gazetteers. For an
analysis of the often mixed Buddhist and Daoist character of “sacred mountains” see Robson
(2009). For a periodization and overview of gazetteer production on Buddhist sites see Cao
(2011, 235-243).
3 Zhongguo fosi shizhi huikan 中國佛寺史志彙刊.
Bibliographic Research
A considerable amount of bibliographic research has been done on gazetteers in general.4
More still remains to be done. This is a rather dull but indispensable task, both because of
the large quantity of gazetteers and because of the complicated edition histories many of
them have. Most of the more than 8500 gazetteers5 that are known to us are on
governmental administrative divisions such as counties (xian 縣), subprefectures (zhou
州), prefectures (fu 府) or provinces (sheng 省). However, as Brook (2002, 31) has
remarked, there are other types of gazetteers, and he provides valuable bibliographic
information on 860 “topographical and institutional gazetteers” which take landscape
features and individual institutions as their subjects. While the gazetteers on
administrative divisions usually include information on Buddhist and Daoist temples and
monastics for the region, this information is generally terse and cannot compare with the
breadth of cultural information that gazetteers dedicated to a site and compiled or
commissioned with religious intent can offer.6
Hahn (1997) in his dissertation on “mountain gazetteers” (shanzhi 山志) pays
attention to the importance of gazetteer literature for the understanding of religious space;
his focus, however, is exclusively on the category of mountain gazetteers, many of which
treat Daoist sites. Our project, on the other hand, includes both mountain and temple
gazetteers (sizhi 寺志), but we are interested only in records of Buddhist sites. Among
such gazetteers two subgroups can be distinguished: gazetteers that relate information on
a number of Buddhist sites and institutions within a certain region; and those only
concerned with one temple and its adjacent sites. The former subsumes many of the
mountain gazetteers that Hahn (1997) has described, but also includes gazetteers that
describe Buddhist sites of a city or region (e.g. ZFSH 1, ZFSH 7, ZFSH 57).
4 In English see Brook (2002), Dow (1969), and Franke (1968) and in Chinese Zhuang et al.
(1985), Jin & Hu (1996), Gu (2010) to name only a few.
5 The most comprehensive catalog so far, the Zhongguo difangzhi zongmu tiyao 中國地方志總
目提要 (Jin & Hu 1996), lists 8577 gazetteers. Even this catalog, however, is not exhaustive,
because it includes only gazetteers on administrative regions published before 1949.
According to the editorial policy statement “Mountain-, river-, temple-gazetteers and the like
were not included” (Jin & Hu 1996, 凡例 1). This means that none of the temple gazetteers
discussed here are listed.
6 Nevertheless valuable quantitative information can be culled from these sources. Eberhard
(1964), in one of the first projects that made use of computers to digitize information, analyzed
temple building activity in Chinese history on the basis of the founding dates of temples as
included in a significant number of entries. To my knowledge his dataset (encoded with punchcards) was never migrated to a newer format.
In the following, we will outline some basic bibliographic parameters which
describe what is known about available gazetteers of Buddhist sites.7 First, the paper
record: The last three decades saw the appearance of two collections of reprints of
Buddhist temple and mountain gazetteers:8
Zhongguo Fosi Shizhi Huikan 中國佛寺史志彙刊. Taipei: Mingwen shuju
明文書局. 1980-1985. Compiled by Du Jiexiang 杜潔祥. (= ZFSH) 110 vols.
Zhongguo Fosizhi Congkan 中國佛寺志叢刊. Hangzhou: Guangling
shushe 廣陵書社. 2006. Compiled by Zhang Zhi 張智. (= ZFC) 130 vols.
The 110 volumes of the ZFSH contain 100 gazetteers and the 130 volumes of the ZFC
contain 197 gazetteers. Although the ZFC is the larger and newer collection, the ZFSH is
the better edited. Its editor, Du Jiexiang, has compiled detailed and helpful tables of
contents for each gazetteer, and gazetteers that appear in both collections are more often
complete in the ZFSH. Most gazetteers in each collection are from Ming and Qing
dynasty woodblock prints, while some are copies of manuscripts, and still others are from
newer printed editions set in movable type.
In order to build an electronic edition it is necessary to understand the overlap
between these two collections. This has not been attempted before, because it is only now
that we have the data available to answer some important questions.
How Many Buddhist Gazetteers do we Have in Hand?
Of the 100 gazetteers in the ZFSH and the 197 gazetteers of the ZFC, 78 have an overlap
with a gazetteer in the other set, forming 78 gazetteer pairs, i.e. two gazetteers, one from
the ZFC and one from the ZFSH, that describe the same location and might have the same
or a similar name. In 39 of these 78 pairs the gazetteers are identical, i.e. the reprints in
ZFSH and ZFC were made from identical editions.9 The relationships or rather the types
of relationship that exist between the remaining 39 pairs are more complex and can be
grouped broadly into the following categories:
7 After the work on this paper was concluded, Cao (2011) published his seminal work on
Buddhist temple gazetteers in the Ming dynasty. Had it been available earlier, this paper would
have looked different, though its main task, to document the printed and digital gazetteer
corpus of ZFSH and ZFC, would have remained the same. Unfortunately, there was no time to
incorporate all of Cao’s important results into this article.
8 In 2009 the Beijing National Library announced the planned publication of a collection
named Quanguo difangzhi fodaojiao wenxian huibian 全國地方志佛道教文獻匯編. This
collection will only contain excerpted passages pertaining to Buddhist and Daoist sites from
more general gazetteers, much like the data Eberhard (1964) studied. It stands to become an
important new resource and hopefully will enable us to follow Eberhard’s early lead in
performing quantitative research on the history of religious geography.
9 Here we include re-prints (chongkan 重刊) from the same woodblocks.
- In 22 pairs the reprints were made from essentially the same work, but in one edition
some content has been omitted or added. These omissions and additions are usually
short, but sometimes significant. Omissions often reflect the fact that the original, from
which the ZFSH or ZFC reprint was taken, was already incomplete. Sometimes one
edition has been expanded: ZFC 63, for instance, includes two additional chapters (外
篇二卷) which are not found in the correlate ZFSH 72. At other times the situation is
even more complicated: in the pair ZFSH 39/ZFC 71, for instance, we find that the first
chapter of ZFSH 39 lacks pp. 151-154 and 259-260 of ZFC 71. In the second
chapter, on the other hand, ZFC 71 lacks the material on pp. 267-270 of ZFSH 39.
- In two cases we have different works on the same location with a similar title.
- In five cases we find that on top of omissions and additions, the chapter order or
organization differs.
- With ten pairs the relationship is that of print and manuscript, i.e. one edition is a
manuscript copy of the other.
This typology does not cover all cases,10 but gives a sufficient overview of the field of
similarities and differences. For a more detailed survey of the differences between
gazetteers in these sets see Appendix B.
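The counts given above for the 78 gazetteer pairs can be cross-checked with a few lines of code. The following is only a sketch: the figures are those stated in the text, while the category labels and the function name are hypothetical and were introduced here purely for illustration.

```python
# Figures as stated in the text; the dictionary keys are illustrative labels,
# not terms used in the project's own catalog.
TOTAL_PAIRS = 78
IDENTICAL_PAIRS = 39

NON_IDENTICAL_BY_TYPE = {
    "omissions_or_additions": 22,
    "different_work_similar_title": 2,
    "differing_chapter_order": 5,
    "print_and_manuscript": 10,
}


def unaccounted_pairs(total, identical, by_type):
    """Return how many pairs are left over once the identical pairs and
    the typed non-identical pairs have been subtracted from the total."""
    return total - identical - sum(by_type.values())


if __name__ == "__main__":
    print(sum(NON_IDENTICAL_BY_TYPE.values()))
    print(unaccounted_pairs(TOTAL_PAIRS, IDENTICAL_PAIRS, NON_IDENTICAL_BY_TYPE))
```

With the figures as printed, the four categories sum to 39, which matches the number of non-identical pairs.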
For the temple gazetteer archive we have digitized all the gazetteers from the ZFSH
and the ZFC, except those 39 in the ZFC which are completely identical with a gazetteer
already found in the ZFSH and another 21 that exhibit only minimal differences, such as
a few missing pages, a different set of maps etc. All in all, 237 gazetteers have been
digitized, and fifteen will be made available as digital full text for the first time. These
fifteen will benefit from new punctuation and XML/TEI mark-up identifying person and
place names as well as dates. Of these fifteen, twelve have been selected for a follow-up
project for a printed re-edition of the texts, with new punctuation, person and place name
indices and annotation.11
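The figure of 237 digitized gazetteers follows from the numbers given in this section. A minimal sketch of the arithmetic, in which the function and parameter names are assumptions made for illustration:

```python
def digitized_count(zfsh_total, zfc_total, zfc_identical, zfc_near_identical):
    """Number of gazetteers digitized when every ZFSH gazetteer is kept and
    ZFC gazetteers that duplicate a ZFSH gazetteer, exactly or with only
    minimal differences, are skipped."""
    zfc_kept = zfc_total - zfc_identical - zfc_near_identical
    return zfsh_total + zfc_kept


if __name__ == "__main__":
    # 100 ZFSH gazetteers, 197 ZFC gazetteers, 39 identical, 21 nearly identical
    print(digitized_count(100, 197, 39, 21))
```

Counting this way treats the ZFSH as the base collection and the ZFC as a supplement, which is how the text describes the selection.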
10 In one case we have two different manuscripts (ZFC 45/ZFSH 97) of the same text. There
are also a few rare instances where a gazetteer was reprinted with a different layout, i.e. not
from the original woodblocks. The Hangzhou shangtianzhu jiangsi zhi 杭州上天竺講寺志
was re-carved in 1897 (ZFSH 24), while the ZFC (ZFC 88) preserves an older woodblock print of
1646. The Nanchao si kao 南朝寺考, of which the ZFC contains a 1907 woodblock print,
was re-set in movable type in 1944 for inclusion in the (never completed) Puhui Canon 普慧
大藏經 (ZFSH 56). In the case of the Tiantaishan fangwai zhi 天台山方外志, the ZFC
includes a reprint made from the original woodblocks (ZFC 115) and the ZFSH contains a
movable type edition made in Shanghai in 1922 (ZFSH 89). The ZFSH (ZFSH 46) preserves
a Wanli 萬曆-era print of the Helinsi zhi 鶴林寺志, while ZFC 76 is an edition with a
different layout from 1909.
11 The series will be published with Xinwenfeng publishers 新文豐, Taipei, starting in 2013. It
will comprise ZFSH 8: Chongxiu putuoshan zhi 重修普陀山志, ZFSH 9: Putuoluojia xinzhi
How Many Gazetteers of Buddhist Sites are There?
Before taking stock of what we know, let us briefly assess what we cannot know. Exactly
how many temple gazetteers in total have been compiled is impossible to know for sure.
As with most of Chinese literature, many gazetteers are lost forever.[12] As we will see
below, descriptions of sacred sites were only recently included in the Buddhist canon.
Neither did they command the same esteem as gazetteers on administrative divisions,
which had a role in the administration of the realm and therefore the attention of the state
apparatus. As a result, throughout the Ming and Qing neither Buddhist nor Confucian
communities were strongly committed to the preservation of gazetteers of Buddhist sites.
Another reason why gazetteers were lost is that they were superseded by newer ones.
Gazetteers on administrative regions needed to be updated to stay useful and the same
need was perceived for other kinds of gazetteers as well.[13] The woodblocks for older
editions were sometimes lost and once the woodblocks were gone, it was often more
practical to recompile a new, updated gazetteer than to re-cut the woodblocks from an old
paper copy. Even the more popular gazetteers only had print-runs of a few hundred copies
(Brook 2002, 38). This explains why 51 of the gazetteers in our archive have survived
only as manuscripts copied from a printed edition.
Often the print copies perished together with their woodblocks in fires and wars,
especially during the fall of the Ming (ca.1640-1660) and during the Taiping rebellion
(1850-1864). The Taiping were especially destructive in the Lower Yangzi area where, as
[11, continued] 普陀洛迦新志, ZFSH 10: Mingzhou ayuwangshan zhi 明州阿育王山志, ZFSH 11:
Mingzhou ayuwangshan xuzhi 明州阿育王山續志, ZFSH 17: Yucenshan huiyin gaoli
huayanjiaosi zhi 玉岑山慧因高麗華嚴教寺志, ZFSH 43: Hanshansi zhi 寒山寺志, ZFSH
49: Emeishan zhi 峨眉山志, ZFSH 62: Fujian quanzhou kaiyuansi zhi 福建泉州開元寺志,
ZFSH 77: Jiuhuashan zhi 九華山志, ZFSH 81: Qingliangshan zhi 清涼山志, ZFSH 84:
Jizushan zhi 雞足山志, ZFSH 86: Huangboshansi zhi 黃檗山寺志, ZFSH 89: Tiantaishan
fangwai zhi 天台山方外志.
[12] Dudbridge (2000, 8) cites an estimate from the 17th century to the effect that less than forty or
fifty percent of books that had been available in the Song survived. Hahn (1997, 17) cites
estimates that only 10% of the works listed in the Jingji zhi 經籍誌 chapter of the Suishu 隋書
survived until the Qing.
[13] In his postscript (dated 1589) to the first Ming edition of the Putuoshan gazetteer, Hou
Jigao 侯繼高 writes: “It was no longer possible, in the end, to obtain a copy [of the previous
edition] for one’s armchair travels.... Since [Sheng] Ximing [盛]熙明 wrote the [previous]
gazetteer more than 230 years have passed. What is contained in the four parts [of his
gazetteer] can hardly be all there is [to tell]. When it comes to our Ming, with the increasing
incense fires the [literary] writings about the place also increased. Until now no one like
Sheng Ximing came and turned them into a chronicle. I sighed and said: ‘These famous
mountains, these great temples have to be made known to the world, they should not go
without description.’” (ZFSH 9: 594).
we will see below, most gazetteers were produced. The rebels sacked Nanjing, Hangzhou,
Suzhou, and Ningbo, singling out temples and religious sites for destruction.
Having acknowledged these losses, we must proceed to assess the extent of the
corpus that is still available. Beyond the gazetteers digitized in this project, how many
gazetteers on Buddhist sites do we know of? How many are still available in libraries?
Our database contains bibliographical references from several other works, especially
Hahn (1997), Brook (2002) and unpublished notes by Du Jiexiang (2009), who kindly
shared this material with us. Next to the 219 distinct gazetteers from the ZFSH and the
ZFC, these sources yield bibliographic data on 59 additional temple gazetteers, most of which
are still available in libraries, adding up to a total of 278 in our database.
It is unlikely that more than a few pre-Ming gazetteers on Buddhist sites have
escaped the attention of bibliographers, as the overall number was so much smaller. For
the Ming dynasty Cao (2011, 71-75), against a list of 87 extant temple gazetteers, gives a
list of 65 “lost” gazetteers, which are mentioned in catalogs or cited in other works.
Although Cao has mainly used library holdings in China, and some of the titles might
eventually be found elsewhere, this means that ca. 40% of known Buddhist gazetteers
from the Ming are now lost. For the Qing, which saw the largest number of gazetteers
produced in the 17th and again in the 19th century after the Taiping rebellion, the situation
is less clear. Our database lists 131 existing gazetteers for the Qing (1644-1911) and 59
published during the Republican period (1912-1949), the relatively high figure for the
latter reflecting increased publication numbers both for the book market in general and
for Buddhist material in particular.[14]
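The figures quoted in this section can be checked with a few lines of arithmetic. The following is only a minimal sketch: the counts are taken from the text, while the function and variable names are illustrative assumptions and do not reflect the structure of the project's database.

```python
# Counts as quoted in the text; all names here are illustrative.
zfsh_zfc_distinct = 219   # distinct gazetteers from the ZFSH and the ZFC
additional_titles = 59    # further titles known from bibliographies
extant_ming = 87          # extant Ming temple gazetteers listed by Cao (2011)
lost_ming = 65            # Ming titles known only from catalogs or citations


def lost_share(extant: int, lost: int) -> float:
    """Return the fraction of all known titles that are lost."""
    known = extant + lost
    if known == 0:
        raise ValueError("no known titles to compare")
    return lost / known


database_total = zfsh_zfc_distinct + additional_titles
share = lost_share(extant_ming, lost_ming)

print(database_total)    # 278
print(f"{share:.1%}")    # 42.8%
```

With 65 lost titles out of 152 known ones, the share comes to roughly 43%, which the text rounds to ca. 40%.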
After assessing the available bibliographic information, it would be surprising if the
final number of known gazetteers on Buddhist sites published before 1950 were to exceed
500, and the final tally of extant gazetteers is likely to be between 300 and 400.
How Many Locations do the Gazetteers in our Collection Describe?
Although temple gazetteers by definition tend to focus on one location, there are a
number of gazetteers which describe several Buddhist sites on a mountain range or in a
metropolitan area, such as the proto-gazetteer Luoyang qielan ji 洛陽伽藍記 (ZFSH
001), which describes the temples of Luoyang in the early 6th century; the Qingliangshan
zhi 清涼山志 (ZFSH 081) on the temples on Mt. Wutai in the 16th century; the huge
Jinling fancha zhi 金陵梵剎志 (ZFSH 006), a collection of material on the temples of
Nanjing; or the Wulin fan zhi 武林梵志 (ZFSH 007), a guide to the more than four
[14] The latter is documented in Gregory Scott’s database Bibliography of Modern Chinese
Buddhism (http://bib.buddhiststudies.net/ [Nov. 2011]), which is part of his forthcoming PhD
dissertation “Conversion by the Book - Buddhist Print Culture in Republican China”
(Columbia University).
hundred temples in the Wulin hills near Hangzhou’s famous West Lake. Since one
gazetteer may describe multiple sites, while a single temple may be the subject of more
than one gazetteer, the analysis must be performed with care.[15] One visualization of the
sites described in the one hundred gazetteers of the ZFSH references 116 temples. As the
map shows, most of these are in Zhejiang and Jiangsu provinces:
Fig.1: Location of Buddhist sites described in the 100
gazetteers contained in the ZFSH.
Clearly recognizable are the centers of some “macro-regions” often used to discuss later
imperial China.[16] This correlation tells us more about the economics of publishing
than about the level of Buddhist activity in the region. For the production of gazetteers
considerable resources were needed. Moreover, once printed, a gazetteer had to be sold to an
audience of interested literati, a market that was not available outside these centers. We
therefore see a cluster of sites in Guangdong (Lingnan), one along the coast of Fujian
(Southeast), the many sites around Ningbo, Hangzhou, Suzhou and Nanjing (Lower
Yangzi), fewer in the area around Jiujiang, Wuhan and Nanchang (Middle Yangzi) and a
cluster in the north around Beijing and Mt. Wutai. Interesting too are the absences of
gazetteers in certain regions. In the ZFSH, which has no regional bias, there are no
gazetteers describing sites in Shandong,[17] none in the vast region comprising Hunan,
[15] Current visualizations of the archive, which plot the referenced temples on a map, can be
found at our website in KML format. At the time of writing, the visualization includes all the
main sites described in the ZFSH and ZFC.
[16] See a discussion of these macro-regions during the 18th century (when many of our gazetteers
were compiled) in Naquin and Rawski (1987).
[17] For the low level of Buddhist activity in Shandong, see Brook (1993, 238-240). The only
gazetteer from Shandong in our archive is the Lingyan zhi 靈巖志 (ZFC 18).
Guangxi, Guizhou, and eastern Sichuan (today Chongqing Municipality), and none north
of Mt. Wutai.
Somewhat remarkable is the absence of temple gazetteers for the old heartland of
Chinese Buddhism around Chang’an, in the area of today’s Xi’an in Shaanxi. There in the
northwest we find various editions and continuations of Yang Xuanzhi’s 楊衒之 Luoyang
qielan ji 洛陽伽藍記, and the famous Songshan shaolinsi jizhi 嵩山少林寺輯志 (ZFSH 78)
of 1612, but on the whole surprisingly few gazetteers were produced in this region. This reflects
the fact that during the Ming and Qing Chinese Buddhism in the Northwest was much weaker
than during its heyday in the Tang. Though there still were many temples, some of considerable
antiquity, culturally, Chinese Buddhism faced competition in this region from both Islam and
Tibetan Buddhism. Moreover, Xi’an was not exactly a hotbed of literary activity. According to
Naquin and Rawski, “the elite of the northwest played now [in the 18-19th cent.] only a minor
role in national literati culture. There were few academies, and the region took a negligible part
in the scholarly projects so typical of the Qing period.”[18] Gazetteer writing became popular
during the Song in the lower Yangzi region. It was a product of later Chinese literati culture, the
tastes and sensibilities of which were not universally accepted on the northwestern border of the
empire, where Chinese, Muslims, Tibetans, Mongols and Manchus co-existed uneasily.
Even more sites could be added to the visualization above, if all the temples mentioned in e.g.
the Nanchao fosi zhi 南朝佛寺志 (ZFSH 5) or the Jiangnan fancha zhi 江南梵剎志 (ZFSH 57)
were included. Moreover, information on one temple can be found in several gazetteers. The site
of the Jinshan si 金山寺 in Zhenjiang, for instance, is associated with at least four gazetteers
(ZFSH 37, ZFSH 38, ZFSH 39, ZFSH 57).
Fig.2: Gazetteers in the ZFSH that contain descriptions of the Jinshan si temple.
As a result of the many-to-many relationship of gazetteers and temples described in them,
the archive contains descriptions of at least 400-500 temples. About 50% of these were or
are located in the lower Yangzi region (Jiangsu, Zhejiang and Anhui).[19]
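The many-to-many relationship between gazetteers and temples can be illustrated with a small sketch. The Jinshan si pairs below follow the text; the second site is a hypothetical placeholder, and all function and variable names are assumptions made for illustration, not the schema of the project's database.

```python
from collections import defaultdict

# (gazetteer, site) pairs. The Jinshan si pairs are taken from the text;
# "Other site (hypothetical)" is a placeholder and not archive data.
pairs = [
    ("ZFSH 37", "Jinshan si"),
    ("ZFSH 38", "Jinshan si"),
    ("ZFSH 39", "Jinshan si"),
    ("ZFSH 57", "Jinshan si"),
    ("ZFSH 57", "Other site (hypothetical)"),
]


def build_indexes(pairs):
    """Build lookup tables in both directions from (gazetteer, site) pairs."""
    sites_by_gazetteer = defaultdict(set)
    gazetteers_by_site = defaultdict(set)
    for gazetteer, site in pairs:
        sites_by_gazetteer[gazetteer].add(site)
        gazetteers_by_site[site].add(gazetteer)
    return sites_by_gazetteer, gazetteers_by_site


sites_by_gazetteer, gazetteers_by_site = build_indexes(pairs)

# All gazetteers that describe one temple.
print(sorted(gazetteers_by_site["Jinshan si"]))
# Each site is counted once, however many gazetteers describe it.
print(len(gazetteers_by_site))
```

Counting the keys of the site index rather than the pairs avoids counting a temple twice when several gazetteers describe it, which is the kind of care the analysis above calls for.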
[18] Naquin and Rawski (1987, 192).
[19] A geo-referenced dataset of the sites described in the 234 gazetteers has been built and is
available from the author on request.
Prefaces
When a gazetteer for a location had become unavailable, or an update was in order, new
editions were produced.[20] Usually, the prefaces or, more rarely, the postscripts of
previous editions were included in later ones and from these an outline of the gazetteer’s
evolution can be traced. This is true for gazetteers in general as well as for the gazetteers
on Buddhist sites. Prefaces and postscripts, therefore, play an important role in
understanding the genre itself and they are also one of the best places to look for
information about the author-compilers, their motivation and the history of the
compilation. This provides a useful angle for understanding how literati culture interacted
with Buddhism during the Ming and Qing. Moreover, prefaces are often the only place
where the voice of the compiler appears at all. As compiled works, gazetteers contain for
the most part texts (e.g. biographies, poems, epigraphy) that were collected from earlier
sources; the preface, postscript or, in later times, a section on ‘edition policy’ (fanli 凡例)
is therefore essential for understanding the selection criteria.
With the gazetteer, as with other historiographical forms,[21] a traditional genre was
adapted to record the history of Buddhism. Most of the earlier gazetteers of Buddhist sites
were not compiled by Buddhist scholar-monks, but by literati scholar-officials or
members of the local gentry.[22] Often a gazetteer for a site was commissioned by Buddhist
monks or lay believers from a literati writer, who was perceived as sympathetic or at least
indifferent to Buddhism. The commissioned compilers were, however, rarely purely
religiously motivated. Sometimes a Buddhist monk would later re-edit or rewrite the
gazetteer from a more Buddhist perspective. This was especially common during the
Chinese Buddhist revival of the late Qing and the Republican era.[23] In the 1938 edition of
[20] The relationship between a gazetteer and its previous editions is complicated at best (see Qiu
2008 for the edition history of some temple gazetteers from Guangdong and Guangxi). Only
sometimes would new editions be marked in the title as such. Generally text from the older
edition would be reused in varying amounts, while it was mainly up to the (re-)compiler what
to add.
[21] The genres used by Chinese Buddhist historiographers are without exception drawn from
already existing precedents. Although Buddhism strongly influenced Chinese language and
literature, it did not develop a distinct way of writing history. (On the use of genre in
Buddhist historiography, see Bingenheimer 2009).
[22] Of the 87 Ming gazetteers listed in Cao (2011) only 21 were compiled by monks.
[23] During the 18th and 19th centuries the Qing emperors had generally favored Tibetan over Chinese
Buddhism. Moreover the Taiping rebellion (1850-1864) had destroyed much of the Buddhist
infrastructure in the lower Yangzi region, the heartland of Chinese Buddhism. Therefore the
founding of the Jinling Scriptural Press 金陵刻經處 in Nanjing by Yang Wenhui 楊文會
(1837-1911) in 1866 is widely seen as the beginning of a new chapter for Chinese Buddhism.
the Jiuhuashan zhi 九華山志 (ZFSH 77), for instance, the eminent monk Yinguang 印光
(1861-1940)[24] compares the new edition with previous ones:
The earlier editions of this gazetteer were written by literati, who would not
even dream of the Buddhist teachings. To them, to believe or to doubt the
miraculous stories about [the Bodhisattva] Dizang 地藏 was all the same,
and they included his biography among those of [ordinary] humans, which
were placed after the chapters with literary texts and biographies of Daoist
immortals [my emphasis, M.B.]. In our new edition of the gazetteer the first
chapter is dedicated to the saintly traces [of Dizang’s deeds]. […] The
earlier editions gave pride of place to the temples that were established by
imperial decree or had received the inscription above their gate from the
court. Those temples that were built by private donations, or for which the
funds were collected [by the clergy] were called hermitages, chapels, groves,
or halls, and placed after the former. [...] From the Tang to our days more
than a thousand years have passed. There have been many upheavals,
[dynasties] rose and fell. Only a few monks might [nowadays] live in what
was designated a “temple” in the past, and what was called a “hermitage” or
a “chapel” now houses many. Society too has changed and no longer
follows the will of a king. In this gazetteer we therefore put the large [public]
conglin 叢林 monasteries, where monks from all directions gather, first.
After that we include the smaller family temples [where the monks from one
ordination lineage reside].[25]
Yinguang seems to have relished the freedom gained after the fall of the empire. During
the Republican era it was possible for Buddhists to claim superiority for their religious
sites in an unprecedented way. Being liberated from the need for rhetorical tributes to the
greatness of imperial power, Yinguang wryly comments on the lack of devotion the five
marchmounts (wuyue 五嶽) now inspired:
When talking about Jiuhua mountain people often used to regret that it was
not included in the five marchmounts where the imperial court makes
offerings. Did they not know that at the marchmounts’ temples no one but
the local government officials in charge make two offerings per year, one in
spring and one in autumn? At Jiuhua mountain, however, devotees from all
over the country offer their sincere respects, and the burning of incense and
[24] Next to Taixu 太虛, Hongyi 弘一 and Xuyun 虛雲, Yinguang was one of the most
influential monks of the Republican era. Before the Jiuhuashan zhi (1938) he had organized
the re-edition of the gazetteers of the three other ‘great mountains’ of Buddhism: the
Putuoshan zhi (ZFSH 9) (1924), the Qingliangshan zhi 清涼山志 (ZFSH 81) (1933), and the
Emeishan zhi 峨眉山志 (ZFSH 49) (1934).
[25] ZFSH 77: 32.
the prayers do not cease from dawn to dusk. How could the five
marchmounts ever hope to compare?[26]
That temple gazetteers were indeed compiled with the secular attitude criticized by
Yinguang can be seen in the three prefaces of an edition of the Putuoshan gazetteer
(Chongxiu putuoshan zhi ZFSH 8). These were clearly not written from a Buddhist
perspective and the three authors, all of them jinshi 進士 scholars writing in the early
17th century, mainly praise the emperor and the landscape, and emphasize the role secular
officials played in reconstructing the site. Clerics and the Bodhisattva Guanyin 觀音, to
whom Mount Putuo was dedicated, are mentioned only in passing.[27]
The literati rhetoric, which downplays both any possible religious motivation on the
part of the authors and the religious context of the site, was not limited to literati authors.
Consider Yuanxian’s 元賢 (1578-1657) preface to the gazetteer of the Kaiyuan temple
開元寺 in Quanzhou written in 1643, at a time when Confucian hegemony was still
unchallenged. Compared to Yinguang, Yuanxian had to couch his critique of Confucian
literati writing on Buddhist sites in more careful language:
The first records [about the Kaiyuan temple of Quanzhou] were composed
in the Song, when Xu Lie 許列 wrote the “Biographies of Eminent Monks
of the Kaiyuan Temple”. The Yuan dynasty master Mengguan 夢觀[28]
accused Xu’s work of being unreliable and based on hearsay, its
explanations being unfounded and labored, coarse and unrefined, not
worthy of being read. Master Mengguan then wrote the “Biographies of
Bodhisattvas”, his work was erudite and knowledgeable. […]
Since then more than 300 years have passed and today’s chan 禪 practice
cannot compare to that of yesteryear. In these days of decline there hardly
seems anything worth reporting. Nevertheless, the ups and downs, the
continuities and changes should be recorded somehow. In 1596 Master
Chen “Zhizhi” 陳[29] first produced a gazetteer, but his research was
superficial and people felt he did not do a very good job of it. Then in the
winter of 1635-1636 some gentlemen of Wenling asked me to teach at the
Kaiyuan temple. [...(Yuanxian is asked several times to write a history of the
Kaiyuan temple)].
[26] ZFSH 77: 31.
[27] This attitude in the Chongxiu putuoshan zhi (ZFSH 8) of 1607 is much different from and in
fact a reaction to the first full-fledged gazetteer of the site, which was produced by the Admiral
Hou Jigao 侯繼高 and the poet Tu Long 屠隆 only some years earlier in 1589. Hou and his
friends were on the Buddhist side of the Confucian-Buddhist syncretist spectrum and broadly
sympathetic to Buddhism.
[28] This is probably the monk Dagui 大圭 (14th century).
[29] Otherwise unknown. Zhizhi was almost certainly a style name.
Though I do not have the ability to write a gazetteer – me being just a rustic
from Nanzhou, who, not successful in studying Confucianism, gave up and
studied Buddhism instead [!] – I have followed the wishes of these
gentlemen. […] I have just tried to fill a gap. Someday a better writer will
come and this gazetteer may be replaced.[30]
Two things should be noted here. Firstly, the overview of previous gazetteers of the site –
a standard constituent of gazetteer prefaces – illustrates the change in genre: while in the
Song and Yuan dynasties the history of a temple was written in the (by then well-known)
form of collected biographies (zhuan 傳) (i.e. the works of Xu Lie and Mengguan), in the
late Ming Yuanxian is asked to write a gazetteer. The gazetteer as a genre continues the
historiographical tradition of earlier times. Secondly, Yuanxian, in spite of his humble
rhetoric, deftly disparages previous attempts by non-clerical writers to write about the
Kaiyuan temple. And yet, that the monk Yuanxian, during the last days of the Ming,
wrote passages like “not successful in studying Confucianism, gave up and studied
Buddhism instead” 學儒不成棄而學佛 testifies to the hegemony of the Confucian
discourse, of which Yinguang three hundred years later was newly freed.
Obviously, prefaces are the first place to look for the compilers’ intentions, but their
evaluation must take account of context and allow for semantic and rhetorical
polyvalence. When looking for prefaces one should bear in mind that they are not always
found at the beginning of a gazetteer; sometimes they are prefixed only to certain
chapters, while older prefaces might be collected in a special section somewhere within
the body of the text.[31] Then again there are different types of texts called “preface” xu 序.
The Huangbo gazetteer (ZFSH 86) preserves, attached as “prefaces,” two interesting
endorsements of fund-raising appeals.[32] The first, titled Preface to the Fund-raising
Efforts for the Reconstruction of Huangbo 重 黃檗募緣序, was written by Ye
Xianggao 葉向高 (1559-1627) sometime between October 1614 and 1620.[33] Ye, who
was a Fujian native, rose through the ranks to become one of the most important grand
secretaries during the Ming. He was a gifted writer and starts his preface in literary
fashion with a line from the Liang dynasty poet Jiang Yan 江淹 (444-505), who
described the mountain scenery in his Journey to Mount Huangbo 游黃檗山: “The
[30] ZFSH 62: 4-8.
[31] The prefaces of previous editions of the 1607 gazetteer of Mt. Putuo, for instance, are found
in Ch.4 (ZFSH 8: 312-389).
[32] Endorsements gave the monastery a quasi-legal backing to approach prospective donors and
presumably were helpful in raising money from among the gentry. See Brook (1993, 196-213)
for a discussion of some other examples of these fund-raising appeals.
[33] ZFSH 86: 240-242. For Ye Xianggao 葉向高 see his entry in Goodrich (1976, sub voc.).
dazzling Luan birds glide by sunlit peaks, in shaded brooks gush dragon springs; (...) On
the crimson cliffs the cries of birds, monkeys shout in clear and empty spaces.”[34]
The reference to Jiang Yan, who went to the Huangbo mountains before Buddhist
activity is recorded for the area, sets the tone for a secular recommendation in support of
a religious institution. Ye keeps his text largely devoid of Buddhist imagery. He recounts
how the Wanli emperor, on the occasion of the death of his mother the Empress Dowager
Cisheng 慈聖 (1546-1614), donated a set of the Tripitaka to the monastery and uses this
and earlier land donations by the Hongwu 洪武 emperor as precedents to justify his own
support for the fund-raiser.[35] He alludes to the fact that official support may not be taken
for granted:
Some say the Buddhist teachings are sheer nonsense, are to be avoided by
Confucians, and do not merit respect. [These people] do not realize that in this
universe this way does exist after all, and cannot just be abolished.
[Nevertheless, in spite of the example] of his majesty Emperor Gao [the
founder of the Ming] himself, there are [still people] saying this. When I stayed
in the capital I saw how in its vicinity everywhere there were landholdings that
temples had received from emperor Gao. Huangbo [Monastery] is more than a
thousand years old, and again our Emperor has given orders [to support it].
How can one not admire this?[36]
Ye supports the rebuilding of the temple, which had been destroyed by a fire in the Jiajing
嘉靖 period (1522-1566) and urges the “believers of the four directions” to assist the
monks in this task.
About two hundred years later the temple was again in dire straits and the monks
approached Zhang Jinyun 張縉雲, an official posted in the area. His argument is similar
to that of Ye. In his Preface to Donation Records 黃檗寺緣簿序 (c.1823-1826) Zhang
writes:
When I came to this area in 1823, I visited first Lingshi 靈石 [monastery],
then Huangbo. Both temples had fields that had been appropriated by
someone. I sent a messenger to make inquiries, and the people returned the
[34] 陽岫飛鸞彩,陰 噴龍泉,鳥 丹壁 ,猿嘯清虛間. As quoted (rather freely) by Ye in
ZFSH 86: 240.
[35] Huangbo, under its abbot Zhongtian Zhengyuan 中天正圓 (1537-1610), received one of
only six sets that were given to various monasteries on this occasion. The late Empress
Dowager had been an important supporter of Buddhism. In 1602 it was due to her influence
that abbot Xinkong Mingkai 心空明開 (1568-1641) received a Tripitaka set for his
Guangming monastery (Brook 1993, 241; also 206 and 262). Recently, a comprehensive
study of these events, especially the promotion of Buddhism by the Empress Dowager and
her son the Wanli emperor, has been completed (Zhang 2010).
[36] ZFSH 86: 241.
fields to the temples, without charging for it.[...] [A while ago] the monks
from the [neighboring] Lingshi monastery asked me to write an
endorsement [for a fund-raiser]. I consented and less than one year later, the
monks from Huangbo too asked me to write an endorsement to raise funds.
Huangbo’s buildings are even more numerous than those of Lingshi, the
repair costs are huge and the monks have no choice but to ask for help. The
teachings of the two masters [Buddhism and Daoism] are not greatly
admired by [us] Confucians, but I felt that as the local official I would be at
fault if I did not see to the repair of the famous sites of the area that have
been continued for centuries.[37]
Both Ye and Zhang are hedging here against possible criticism from conservative
Confucians. Timothy Brook in his study of the relationship of late-Ming gentry with
Buddhism outlines the attempt of Neo-Confucians to integrate Confucianism and
Buddhism as well as the conservative backlash against this trend. The conservative
reaction against members of the gentry assimilating Buddhist practices had teeth. In 1602
Li Zhi 李贄 (1527-1602), the radical champion of a synthesis of Confucian and
Buddhist ideals, committed suicide in prison after being impeached for heterodoxy. Ye,
who as a fellow Fujianese would have known Li personally, certainly remembered the
case. Even Zhang two hundred years later probably would have known about the incident,
as the indictment was widely circulated in later times.[38]
This is one of the reasons why, although both Ye and Zhang were supportive of
Buddhism on other occasions as well, it is difficult to gauge the depth of their interest in
Buddhism. Belonging as they did to “Neo-Confucianism’s captive audience”,[39] they had
to frame their support as part of their administrative duties and put a certain rhetorical
distance between themselves and their Buddhist subjects.
Gazetteers and the Canon
What is the relationship of the corpus of Buddhist temple gazetteers and the corpus of
religious texts preserved in canonical editions? Catalogs and editions of the Buddhist
canon existed before the gazetteer emerged as a genre. The Buddhist canon was never
closed, however, and new material was included in every new edition. Although by late
imperial times some of the proto-gazetteers were already several hundred years old,[40]
[37] ZFSH 86: 264-265.
[38] The indictment (first translated by Franke 1938, 23-24) is fiercely critical of literati families
practicing Buddhism.
[39] Brook (1993, 90).
[40] The Luoyang qielan ji, written in 547 CE, even neared the 1000th anniversary of its
publication.
neither they nor the newer temple or mountain gazetteers were included in canonical
editions during the Ming and Qing.[41] There are no a priori reasons why temple gazetteers
should not be included. The canon contains many works from secular genres such as
catalogs, biographies or dictionaries. However, many of the early gazetteers were not
written by monks and even the Taishō, one of the most liberal and inclusive editions,
contains very few texts that were not written by monks.
Another issue for the incorporation into the canon of post-Tang Chinese Buddhist
literature was timing: the older a text the more likely its inclusion. The annals of Song
and Yuan Buddhist historiography, for instance, appear only infrequently in canonical
editions of the Ming and Qing, and many of them are first included only in a Japanese
edition, the Man[ji] zokuzōkyō 卍續藏經 supplement to the canon proper (ed. 1905-1912).
Of the approximately 230 different gazetteers in our archive fewer than ten were
produced before 1600. This is in line with what Brook (1993, 64) proposed about the
adoption of Buddhism among the gentry during the Ming: that the popularity of
Buddhism among the gentry became visible again only in the latter half of the sixteenth
century. This perception, that the time between roughly 1350 and 1550 saw less Buddhist
activity in China than either before or after, is corroborated by the data Eberhard (1964,
280 and 298) has assembled from gazetteers.[42] The decline of Buddhism in terms of
personnel and real estate during the early and mid-Ming was apparently due to the
restrictive regulations, which the Hongwu and Yongle emperors placed on Buddhist
activities.
It is therefore no coincidence that, modeled on gazetteers on administrative units, the
first sizhi 寺志 and shanzhi 山志 gazetteers on Buddhist sites appear only after this
period of relative decline. It was, however, too late for their inclusion in the Ming
canonical editions. The three official Ming editions (Hongwu nanzang 洪武南藏, Yongle
nanzang 永樂南藏, Yongle beizang 永樂北藏) were all carved before 1440 and only the
privately funded Jiaxing 嘉興 canon would have been late enough to accommodate the
then brand-new gazetteer literature. Understandably, the editors of that edition decided that
the gazetteers did not merit inclusion, as many of them were compiled by lay-men and there
was no precedent for the inclusion of gazetteers. The major canonical edition of the Qing,
the Long zang 龍藏 created 1733-1738, was conservative with regard to inclusion and
[41] The Ming especially saw the production of several canonical editions, both by the court and
by private hands. The Jiaxing zang 嘉興藏 in particular added many scriptures that were
produced during the Song, Yuan and early Ming dynasties and were being included in a
canonical edition for the first time.
[42] On the decline of Buddhism in the middle period of the Ming see Yü (1998, 918). See also
the dissertation of Zhang Dewei (2010), who clearly traces the impact of the imperial support
for Buddhism by the Wanli emperor and his mother.
contained fewer texts than the Jiaxing zang.[43] Only with the Taishō edition, created in the
early 20th century by Japanese scholars rather than government officials or lay Buddhists,
were twelve proto-gazetteers included.[44] All of them were first published before the Ming.
T 2092 (5 juan): Luoyang qielan ji 洛陽伽藍記
  Content: On the temples of Luoyang in ca. 500.
  Author / Editor: Yang Xuanzhi 楊衒之 (active around 547)
  Date: after 534
T 2088 (2 juan): Shijia fangzhi 釋迦方志
  Content: On place names of India and Central Asia related to Buddhism. Last four
  sections deal with the introduction and establishment of Buddhism in China.
  Author / Editor: Daoxuan 道宣 (596-667)
  Date: dated 650
T 2091 (1 juan): Dunhuang lu 敦煌錄 [45]
  Content: Dunhuang fragment S.5448 (893 characters) describing Buddhist sites in
  and around Dunhuang.
  Author / Editor: Author unknown
  Date: after 756
T 2093 (1 juan): Si ta ji 寺塔記
  Content: Short descriptions of some temples in Luoyang (esp. the 大興善寺) ca. 843.
  Author / Editor: Duan Chengshi 段成式 (c. 803-863)
  Date: after 843
T 2094 (1 juan): Liangjing si ji 梁京寺記
  Content: Short notes on nine temples in Nanjing during the first half of the 6th century.
  Author / Editor: Compiler unknown
  Date: after 1160 [46]
T 2095 (5 juan): Lushan ji 廬山記
  Content: Describes the Buddhist sites on Mt. Lu. Records biographies of eminent monks,
  poems and inscriptions. First work resembling the mature gazetteer genre in
  containing geographic and historical information as well as belles lettres. [47]
  Author / Editor: Chen Shunyu 陳舜俞 (d. 1075)
  Date: 1072
[43] Including all supplements the Jiaxing zang contains 2090 texts and the Long zang only 1669.
[44] I use the term proto-gazetteers for the chorographical works that do not yet have the size,
the self-awareness and the attitude of later gazetteers, but already exhibit the combined
interest in history, literary and topographical description that is found in the mature form.
Proto-gazetteers generally do not yet use zhi 志 in the title, but ji 記 or zhuan 傳.
[45] Translated by L. Giles (Giles 1914, Giles 1915, cf. Hu 1915).
[46] See Suwa (1977, 91) for the complicated history of the short text, which was compiled from
several earlier sources.
[47] As a student of Ouyang Xiu 歐陽脩, the author Chen Shunyu was well versed in historiography.
The Lushan ji has been studied by Reiter (1978 & 1980). Lushan is one of the sites which
have a large number of gazetteers: next to T 2095, there are ZFSH 75, ZFC 28, 29, and 118,
which remain unstudied.
T 2096 (1 juan): Tiantaishan ji 天台山記
  Content: The earliest account of sacred sites around the Tiantai mountains.
  Written by a Daoist associated with the Shangqing 上清 school and
  mentioning Buddhist influence only in passing.
  Author / Editor: Xu Lingfu 徐靈府 (c. 760-841)
  Date: after 815
T 2097 (3 juan): Nanyue zongsheng ji 南嶽總勝集
  Content: On the sites around Nanyue. Nanyue, the “Southern Marchmount” in the
  system of five sacred peaks, is commonly called Hengshan 衡山. The text is also
  included in the Daoist canon.
  Author / Editor: Chen Tianfu 陳田夫; preface by Sun Xingyan 孫星衍 [48]
  Date: 1131 - 1163; preface dated: 嘉慶六年六月朔日 (1801-07-11 CE)
T 2098 (2 juan): Gu Qingliang zhuan 古清涼傳
  Content: Three short proto-gazetteers (T 2098, T 2099 and T 2100) on the
  sacred sites at Mt. Wutai 五台.
  Author / Editor: Huixiang 慧祥; preface by Guangying 廣英 [49]
  Date: after 680; preface dated: 大定辛丑歲 十 七日 (1181-03-04 CE)
T 2099 (3 juan): Guang Qingliang zhuan 廣清涼傳
  Author / Editor: Que Jichuan 郄濟川
  Date: dated: 嘉祐紀號龍集庚子 (1060-02-05 to 1060-03-04 CE)
T 2100 (2 juan): Xu Qingliang zhuan 續清涼傳
  Author / Editor: Zhang Shangying 張商英
  Date: dated: 大定四年九月十七日 (1164-10-04 CE)
T 2101 (1 juan): Butuoluojiashan zhuan 補陀洛迦山傳
  Content: Contains descriptions and scriptural sources for the Guanyin cult at Mt. Putuo.
  Author / Editor: Sheng Ximing 盛熙明 (1323-1363)
  Date: 1349-1359 [50]
The editors of the Taishō made the decision to include only topographical descriptions
that were written before the Ming. This was innovative as most of these texts had not
been part of canonical editions before. Although the Taishō has never been superseded as
the authoritative edition, there have been a number of attempts to re-edit or supplement
the canon. Some of these editions also noticed the value of topographical literature.
[48] Robson makes ample use of this text and translated parts of the preface (2009, 2-3).
[49] Cao (1999, 195).
[50] Based on the remark by Hou Jigao in June 1589 to the effect that Sheng’s work predates the
gazetteer commissioned by Hou “more than 230 years” (ZFSH 8: 334).
The uncompleted Puhui Canon 普慧大藏經 (1944), an unsuccessful and in the end
aborted attempt to create a new Chinese Buddhist canon in the 1930s and 40s, does
include the Nanchao si kao 南朝寺考.[51] The Dazangjing bubian 大藏經補編 (Lan
1984), a little known recent supplement to the canon, shows a growing concern with
topographical sources and includes for the first time works such as the Jinling fancha zhi
(ZFSH 6) and the Wulin fan zhi (ZFSH 7). As the editing principles for canonical editions
of Chinese Buddhist texts grow more comprehensive, it is likely that the trend to include
topographic descriptions will continue and the (digital) Buddhist canons of the 21st
century will eventually include gazetteers. Gazetteers of Buddhist sites are valuable
sources for researchers trying to understand the actual practice of Buddhism in a certain
place, at a certain time. We hope that the digital archive of Chinese temple gazetteers will
make these sources better accessible to all.
[51] The Nanchao si kao (ZFSH 56) is not a gazetteer in the narrow sense, but a Qing dynasty
attempt to gather or reconstruct information on temples during the Southern dynasties. For an
analysis of the information concerning the temples constructed in the Liang dynasty see Suwa
(1980).
Appendix A
These are the tables of contents for the ZFC and the ZFSH. To my knowledge these lists
are not available elsewhere, not even within the collections themselves. They constitute a
finding aid by indexing the location of each gazetteer in the collections. The tables also
cross-reference the collections: “=” indicates that the gazetteer in the other collection is for all
practical purposes identical; “~” indicates that the other collection contains another, similar
edition of this gazetteer. In the latter case the user might want to consult Appendix B for
more information. An alphabetical listing of the titles according to the pinyin
romanization is available on the web.[52]
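The entry format used in the two lists is regular enough to be read mechanically. The following is a minimal sketch in Python and not part of the original finding aid: the name parse_entry and the field names are hypothetical, and the pattern assumes that each entry sits on a single line in the form "ZFC 004 (Vol.002): title ( ~ ZFSH 099)".

```python
import re

# Hypothetical pattern for one list entry, e.g.
# "ZFC 004 (Vol.002): Shang fang shan zhi 方山志 ( ~ ZFSH 099)"
ENTRY = re.compile(
    r"^(?P<coll>ZFC|ZFSH)\s+(?P<num>\d{3})\s+"   # collection and running number
    r"\((?P<loc>[^)]*)\):\s+"                     # location, e.g. "Vol.002" or "Part1 Vol.01"
    r"(?P<title>.*?)"                             # romanized title plus characters
    r"(?:\s*\(\s*(?P<rel>[=~])\s*(?P<other>ZFC|ZFSH)\s+(?P<onum>\d{3})\s*\))?\s*$"
)

# Meaning of the two cross-reference signs as defined in the text above.
RELATION = {"=": "identical", "~": "similar edition"}


def parse_entry(line):
    """Return a dict for one list entry, or None if the line does not match."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    entry = {
        "id": f"{m['coll']} {m['num']}",
        "location": m["loc"],
        "title": m["title"].strip(),
        "relation": None,
        "counterpart": None,
    }
    if m["rel"]:
        entry["relation"] = RELATION[m["rel"]]
        entry["counterpart"] = f"{m['other']} {m['onum']}"
    return entry
```

Entries that are broken across two lines, or that cross-reference more than one counterpart, would not match this pattern and would need to be handled separately.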
ZFC: Zhongguo Fosizhi Congkan 中國佛寺志叢刊:
ZFC 001 (Vol.001): Huang ming si guan zhi 皇明寺觀志
ZFC 002 (Vol.001): Bei ping miao yu tong jian 北平廟宇通檢
ZFC 003 (Vol.002): Bei jing miao yu zheng cun lu 北京廟宇征存錄
ZFC 004 (Vol.002): Shang fang shan zhi 上方山志 ( ~ ZFSH 099)
ZFC 005 (Vol.003): Fa yuan si zhi gao 法源寺志稿
ZFC 006 (Vol.003): Tan zhe shan xiu yun si zhi 潭柘山岫雲寺志 ( ~ ZFSH 047)
ZFC 007 (Vol.004-005): Pan shan zhi 盤山志 ( ~ ZFSH 080)
ZFC 008 (Vol.006): Shao lin si zhi 少林寺志
ZFC 009 (Vol.007): Luo yang qie lan ji he jiao ben 洛陽伽藍記合校本 ( = ZFSH 004)
ZFC 010 (Vol.007): Luo yang qie lan ji gou chen 洛陽伽藍記鉤沉 ( = ZFSH 003)
ZFC 011 (Vol.008): Luo yang long men zhi 洛陽龍門志
ZFC 012 (Vol.008): Long men zhi xu zuan 龍門志續纂
ZFC 013 (Vol.008): Mai ji shan shi ku zhi 麥積山石窟志
ZFC 014 (Vol.008): Da tong wu zhou shi ku si ji 大同武 石窟寺記
ZFC 015 (Vol.009): Qing liang shan zhi 清涼山志 ( ~ ZFSH 081)
ZFC 016 (Vol.009): Bi shan xiao zhi 碧山小志
ZFC 017 (Vol.009): Qi yan shan zhi 七岩山志
ZFC 018 (Vol.010): Ling yan zhi 靈岩志
ZFC 019 (Vol.010): Zi peng shan zhi 紫蓬山志
ZFC 020 (Vol.011): Lang ye shan zhi 瑯琊山志
ZFC 021 (Vol.012): Ye fu shan zhi 冶父山志
ZFC 022 (Vol.012): Yun ling zhi 雲嶺志
ZFC 023 (Vol.013): Huang shan cui wei si zhi 黃山翠微寺志
ZFC 024 (Vol.013): Jiu hua shan zhi 九華山志 ( ~ ZFSH 077)
ZFC 025 (Vol.014-015): Yu quan si zhi 玉泉寺志 ( ~ ZFSH 096)
ZFC 026 (Vol.015): Hong shan bao tong chan si zhi 洪山寶 禪寺志 ( ~ ZFSH 095)
[52] http://buddhistinformatics.ddbc.edu.tw/fosizhi/ (August 2011).
ZFC 027 (Vol.016): Lian feng zhi 蓮 志
ZFC 028 (Vol.016): Lu shan gui zong si zhi 廬山 宗寺志
ZFC 029 (Vol.017): Lu shan xiu feng si zhi 廬山秀 寺志
ZFC 030 (Vol.018-019): Qing yuan zhi lue 青原志略 ( ~ ZFSH 094)
ZFC 031 (Vol.020): E hu feng ding zhi 鵝湖峰頂志
ZFC 032 (Vol.020): Hui li si zhi 慧力寺志
ZFC 033 (Vol.021): Yun ju shan zhi 雲居山志 ( ~ ZFSH 074)
ZFC 034 (Vol.022-025): Jin ling fan cha zhi 金陵梵剎志 ( = ZFSH 006)
ZFC 035 (Vol.026): Zhe yi fan cha zhi 折疑梵剎志
ZFC 036 (Vol.027): Jin ling da bao en si ta zhi 金陵大報恩寺塔志 ( = ZFSH 068)
ZFC 037 (Vol.027): Nan chao si kao 南朝寺考 ( ~ ZFSH 056)
ZFC 038 (Vol.028): Nan chao fo si zhi 南朝佛寺志 ( = ZFSH 005)
ZFC 039 (Vol.028): Xian hua yan zhi 獻花岩志 ( ~ ZFSH 070)
ZFC 040 (Vol.029): Ling gu chan lin zhi 靈谷禪林志 ( ~ ZFSH 067)
ZFC 041 (Vol.030): Niu shou shan zhi 牛首山志 ( ~ ZFSH 069)
ZFC 042 (Vol.030-031): She shan zhi 攝山志 ( ~ ZFSH 034)
ZFC 043 (Vol.031): Qi xia xiao zhi 栖霞小志
ZFC 044 (Vol.031): Wei mo si zhi 維摩寺志
ZFC 045 (Vol.032-038): Wu du fa cheng 吳都法乘 ( ~ ZFSH 097)
ZFC 046 (Vol.039): Cang hai si zhi 藏海寺志
ZFC 047 (Vol.039): Chang shu xing fu si zhi 常熟 福寺志 ( ~ ZFSH 036)
ZFC 048 (Vol.039): San feng qing liang chan si zhi 三峰清涼禪寺志
ZFC 049 (Vol.040-041): San feng qing liang si zhi 三峰清涼寺志
ZFC 050 (Vol.041): Su zhou fu bao en si zhi 蘇州府報恩寺志
ZFC 051 (Vol.041): Kai yuan si zhi 開元寺志
ZFC 052 (Vol.042): Han shan si zhi 寒山寺志 ( = ZFSH 043)
ZFC 053 (Vol.042): Han shan zi shi ji 寒山子詩集
ZFC 054 (Vol.042): Han shan si han tong fo xiang ti yong 寒山寺漢銅佛像題詠
ZFC 055 (Vol.042): Han shan si xiao zhi 寒山寺小志
ZFC 056 (Vol.043): Yao feng shan zhi 堯 山志 ( ~ ZFSH 066)
ZFC 057 (Vol.043): Feng huang shan yong qing si zhi 凰山永慶寺志
ZFC 058 (Vol.043): Zhu tang si zhi 堂寺志
ZFC 059 (Vol.043): Zhu tang si zhi bu 堂寺志補
ZFC 060 (Vol.044-045): Deng wei shan sheng en si zhi 鄧尉山聖恩寺志 ( = ZFSH 042)
ZFC 061 (Vol.045): Wu jin tian ning si zhi 武進 寧寺志 ( ~ ZFSH 035)
ZFC 062 (Vol.046): Ling yan shan zhi 靈岩山志
ZFC 063 (Vol.046): Ling yan ji lue 靈岩紀略 ( ~ ZFSH 072)
ZFC 064 (Vol.047): Ling yan zhi lue 靈岩志略 ( = ZFSH 073)
ZFC 065 (Vol.047): Ling yan xiao zhi 靈岩小志
ZFC 066 (Vol.047): Wu xi nan chan si zhi 無錫南禪寺志
ZFC 067 (Vol.047): Ren cao an zhi 忍草庵志 ( = ZFSH 098)
ZFC 068 (Vol.047): Guan hua cong lu 貫華叢錄
ZFC 069 (Vol.047): Fu hui shuang xiu an xiao ji 福慧雙修庵小記
ZFC 070 (Vol.048): Jin shan zhi 金山志 ( = ZFSH 038)
ZFC 071 (Vol.049): Xu jin shan zhi 續金山志 ( ~ ZFSH 039)
ZFC 072 (Vol.049-050): Jin shan long you chan si zhi 金山龍游禪寺志 ( = ZFSH 037)
ZFC 073 (Vol.050): Jin shan jiang tian si xiao zhi 金山江 寺小志
ZFC 074 (Vol.051): Jing kou jia shan zhu lin si zhi 口夾山 林寺志
ZFC 075 (Vol.051-052): Zhao yin shan zhi 招隱山志
ZFC 076 (Vol.052): He lin si zhi 鶴林寺志 ( ~ ZFSH 046)
ZFC 077 (Vol.053-054): Bao hua shan zhi 寶華山志 ( = ZFSH 041)
ZFC 078 (Vol.054): Jian long si zhi lue 建隆寺志略
ZFC 079 (Vol.055): Ping shan tang tu zhi 山堂圖志 ( = ZFSH 040)
ZFC 080 (Vol.056): Yuan jin chan yuan xiao zhi 圓津禪院小志
ZFC 081 (Vol.056): Hui yin si zhi 慧因寺志 ( = ZFSH 017)
ZFC 082 (Vol.057-058): Wu lin fan zhi 武林梵志 ( ~ ZFSH 007)
ZFC 083 (Vol.059): Long jing jian wen lu 龍 見聞錄 ( = ZFSH 020)
ZFC 084 (Vol.060): Ling yin si zhi 靈隱寺志 ( = ZFSH 021)
ZFC 085 (Vol.061): Yun lin si zhi 雲林寺志 ( = ZFSH 022)
ZFC 086 (Vol.062): Yun lin si xu zhi 雲林寺續志 ( = ZFSH 023)
ZFC 087 (Vol.063-066): Jing ci si zhi 凈慈寺志 ( ~ ZFSH 016)
ZFC 088 (Vol.067): Shang tian zhu shan zhi 上天竺山志 ( ~ ZFSH 024)
ZFC 089 (Vol.068): Fa jing si zhi 法凈寺志
ZFC 090 (Vol.068): Lin ping an yin si zhi 臨 安隱寺志
ZFC 091 (Vol.068): Chong fu si zhi 崇福寺志 ( = ZFSH 030)
ZFC 092 (Vol.068): Xu chong fu si zhi 續崇福寺志 ( = ZFSH 031)
ZFC 093 (Vol.069): Xi xi fan yin zhi 西 梵隱志 ( = ZFSH 029)
ZFC 094 (Vol.069): Xi xi qiu xue an zhi 西 秋雪庵志
ZFC 095 (Vol.070): Lian ju an zhi 蓮居庵志
ZFC 096 (Vol.070): Xiao ci an ji 孝慈庵集
ZFC 097 (Vol.070): Bian li yuan zhi 辯利院志 ( = ZFSH 092)
ZFC 098 (Vol.071): Da zhao qing lu si zhi 大昭慶律寺志 ( = ZFSH 015)
ZFC 099 (Vol.071): Zhao xian si lue ji 招賢寺略記
ZFC 100 (Vol.072): Hu pao fo zu cang dian zhi 虎跑佛祖藏殿志
ZFC 101 (Vol.072): Sheng guo si zhi 聖果寺志 ( = ZFSH 018)
ZFC 102 (Vol.073): Long xing xiang fu jie tan si zhi 龍 祥符戒壇寺志 ( = ZFSH 028)
ZFC 103 (Vol.074): Yun ju sheng shui si zhi 雲居聖水寺志 ( = ZFSH 025)
ZFC 104 (Vol.074): Sheng yin jie dai si zhi 聖因接待寺志 ( ~ ZFSH 088)
ZFC 105 (Vol.075): Yun qi zhi 雲栖志
ZFC 106 (Vol.076): Yun qi ji shi 雲栖紀 ( = ZFSH 027)
ZFC 107 (Vol.076): Guang shou hui yun si zhi 廣壽慧雲寺志
ZFC 108 (Vol.077): Li an si zhi 理安寺志 ( = ZFSH 019)
ZFC 109 (Vol.078): Jing shan ji 山集
ZFC 110 (Vol.078): Yun he xian da qing si zhi 雲和縣大慶寺志
ZFC 111 (Vol.078): Cheng shan cheng xin si zhi 偁山偁心寺志
ZFC 112 (Vol.079): Jin su si zhi 金粟寺志
ZFC 113 (Vol.079): Yun men zhi lue 雲門志略
ZFC 114 (Vol.080): Yun men xian sheng si zhi 雲門顯聖寺志
ZFC 115 (Vol.081): Tian tai shan fang wai zhi 台山方外志 ( ~ ZFSH 089)
ZFC 116 (Vol.082): Pu tuo luo jia xin zhi 普陀洛迦新志 ( = ZFSH 009)
ZFC 117 (Vol.083): Bao guo si zhi 國寺志
ZFC 118 (Vol.083): Lu shan si zhi 山寺志
ZFC 119 (Vol.083): Wu lei si zhi 磊寺志
ZFC 120 (Vol.083): Xian jue si zhi lue 覺寺志略
ZFC 121 (Vol.083): Chan yue si zhi 禪悅寺志
ZFC 122 (Vol.084): San mao pu an si zhi 茅 安寺志
ZFC 123 (Vol.084-085): Tian tong si zhi 童寺志 ( = ZFSH 012)
ZFC 124 (Vol.086): Tian tong si xu zhi 童寺續志
ZFC 125 (Vol.087-088): Xue dou si zhi 雪竇寺志
ZFC 126 (Vol.088): Xue dou si zhi lue 雪竇寺志略 ( ~ ZFSH 091)
ZFC 127 (Vol.088): Xue dou xiao zhi 雪竇小志
ZFC 128 (Vol.089-090): A yu wang shan si zhi 阿育王山寺志 ( ~ ZFSH 010 and 011)
ZFC 129 (Vol.091): Qi ta si zhi 七塔寺志 ( ~ ZFSH 013)
ZFC 130 (Vol.091): Yong shan he bai que si zhi 甬山和白雀寺志
ZFC 131 (Vol.091): Yue lin si zhi 岳林寺志 ( ~ ZFSH 014)
ZFC 132 (Vol.092): Jiang xin zhi 江心志
ZFC 133 (Vol.093): Xian yan si zhi 仙岩寺志
ZFC 134 (Vol.094): Xian yan shan zhi 仙岩山志
ZFC 135 (Vol.095): Xi tian mu zu shan zhi 西 目祖山志 ( ~ ZFSH 033)
ZFC 136 (Vol.096): Dong tian mu zhao ming chan si zhi 東 目昭明禪寺志
ZFC 137 (Vol.096): Bei tian mu ling feng si zhi 北天目靈峰寺志
ZFC 138 (Vol.097-098): Gu shan zhi 鼓山志 ( ~ ZFSH 053)
ZFC 139 (Vol.099): Xu xiu gu shan zhi gao 續修鼓山志稿
ZFC 140 (Vol.099): He shan ji le si zhi 鶴山極樂寺志
ZFC 141 (Vol.100): Xi chan chang qing si zhi 西禪長慶寺志
ZFC 142 (Vol.100): Xi chan xiao ji 西禪小記
ZFC 143 (Vol.100): Nan shan lue ji 南山略紀
ZFC 144 (Vol.101): An xi qing shui yan zhi 安 清水岩志
ZFC 145 (Vol.102): Huang bo shan si zhi 黃檗山寺志 ( ~ ZFSH 086)
ZFC 146 (Vol.103): Xue feng zhi 雪 志 ( = ZFSH 061)
ZFC 147 (Vol.103): Jiu feng zhi 九 志
ZFC 148 (Vol.104): Ling shi si zhi 靈石寺志
ZFC 149 (Vol.104): Long hua si zhi 龍華寺志
ZFC 150 (Vol.104): Sha jing long quan si zhi 沙 龍泉寺志
ZFC 151 (Vol.105): Xia men nan pu tuo si zhi 廈門南普陀寺志 ( = ZFSH 063)
ZFC 152 (Vol.105): Zhi ti si zhi 支提寺志
ZFC 153 (Vol.106): Wen ling kai yuan si zhi 陵開元寺志 ( = ZFSH 062)
ZFC 154 (Vol.106): Pu tian guang hua si zhi 莆田廣 寺志
ZFC 155 (Vol.106): Ling guang bei chan shi ji he ke 靈光北禪事跡合刻
ZFC 156 (Vol.107-108): Dan xia shan zhi 丹霞山志
ZFC 157 (Vol.108-109): Yu xia shan zhi 禺峽山志
ZFC 158 (Vol.110): Ding hu shan qing yun si zhi 鼎湖山慶雲寺志 ( = ZFSH 051)
ZFC 159 (Vol.111-112): Cao xi tong zhi 曹溪通志 ( ~ ZFSH 058)
ZFC 160 (Vol.113): Guang xiao si zhi 孝寺志 ( ~ ZFSH 085)
ZFC 161 (Vol.113): Qi xia si zhi 栖霞寺志
ZFC 162 (Vol.114): Xiang shan zhi 湘山志
ZFC 163 (Vol.115-116): Ji zu shan zhi 雞足山志 ( ~ ZFSH 084)
ZFC 164 (Vol.117): Ji zu shan zhi bu 雞足山志補
ZFC 165 (Vol.117): E mei shan zhi 峨嵋山志 ( ~ ZFSH 049)
ZFC 166 (Vol.118): Jin yun shan zhi 縉雲山志
ZFC 167 (Vol.118): Hua yan bei zhi 華岩備志
ZFC 168 (Vol.118): Hua yan si xu zhi 華岩寺續志
ZFC 169 (Vol.118): Shi lin ji jing 石林即
ZFC 170 (Vol.119): Hua yin shan zhi 華銀山志
ZFC 171 (Vol.120): Chong xiu zhao jue si zhi 重修昭覺寺志 ( = ZFSH 087)
ZFC 172 (Vol.121): Guang ji si xin zhi 廣濟寺新志 ( = ZFSH 048)
ZFC 173 (Vol.121): Xian shou shan zhi 賢首山志
ZFC 174 (Vol.121): Pu du si ling qiu zhi 渡寺靈湫志
ZFC 175 (Vol.121): Da xing shan si ji lue 大 善寺紀略
ZFC 176 (Vol.121): Qu jiang ci en si jin xi zhuang kuang ji 曲江慈恩寺今昔狀況記
ZFC 177 (Vol.122): Xin ban e shan tu zhi 新版峨山圖志 ( ~ ZFSH 050)
ZFC 178 (Vol.123): Chong xiu ma ji shan zhi 重修馬跡山志
ZFC 179 (Vol.124): Lu quan si zhi 鹿泉寺志 ( = ZFSH 044)
ZFC 180 (Vol.124): Huang mei lao si zhong shan zhi 黃梅老寺中山志
ZFC 181 (Vol.124): Zhu ming si chong xiu ji 珠明寺重修記
ZFC 182 (Vol.124): Cui shan si zhi 翠山寺志 ( = ZFSH 093)
ZFC 183 (Vol.124): Qing hua guang li chan si zhi 清 廣利禪寺志
ZFC 184 (Vol.125): Bao yan si zhi 寶嚴寺志
ZFC 185 (Vol.125): Liu ting an zhi 柳 庵志
ZFC 186 (Vol.126): Hu pao quan ding hui si zhi 虎跑泉定慧寺志 ( ~ ZFSH 026)
ZFC 187 (Vol.126): Ji shi ta yuan zhi 濟師塔院志
ZFC 188 (Vol.127): Hu yin chan yuan ji shi 湖隱禪院記
ZFC 189 (Vol.127): Chang shui ta yuan ji 長水塔院紀
ZFC 190 (Vol.127): Bao qing si zhi 慶寺志
ZFC 191 (Vol.127): Ming en si zhi 明恩寺志
ZFC 192 (Vol.128): Shou feng xian jue si zhi lue 壽峰 覺寺志略
ZFC 193 (Vol.128): Jin e si zhi 金峨寺志
ZFC 194 (Vol.129): You xi bie zhi 幽溪別志 ( = ZFSH 090)
ZFC 195 (Vol.130): Shang hai ming xin si zhi 海明心寺志
ZFC 196 (Vol.130): Ming xin si zhi 明心寺志
ZFC 197 (Vol.130): Long hua zhi 龍華志
ZFSH: Zhongguo Fosi Shizhi Huikan 中國佛寺史志彙刊
ZFSH 001 (Part1 Vol.01): Luo yang qie lan ji 洛陽伽藍記
ZFSH 002 (Part1 Vol.01): Luo yang qie lan ji ji zheng 洛陽伽藍記集證
ZFSH 003 (Part1 Vol.01): Luo yang qie lan ji gou chen 洛陽伽藍記鉤沉 ( = ZFC 010)
ZFSH 004 (Part1 Vol.01): Luo yang qie lan ji he jiao ben 洛陽伽藍記合校本 ( = ZFC 009)
ZFSH 005 (Part1 Vol.02): Nan chao fo si zhi 南朝佛寺志 ( = ZFC 038)
ZFSH 006 (Part1 Vol.03-06): Jin ling fan cha zhi 金陵梵剎志 ( = ZFC 034)
ZFSH 007 (Part1 Vol.07-08): Wu lin fan zhi 武林梵志 ( ~ ZFC 082)
ZFSH 008 (Part1 Vol.09): Chong xiu pu tuo shan zhi 重修普陀山志
ZFSH 009 (Part1 Vol.10): Pu tuo luo jia xin zhi 普陀洛迦新志 ( = ZFC 116)
ZFSH 010 (Part1 Vol.11): Ming zhou a yu wang shan zhi 明州阿育王山志 ( ~ ZFC 128)
ZFSH 011 (Part1 Vol.12): Ming zhou a yu wang shan xu zhi 明州阿育王山續志 ( ~ ZFC 128)
ZFSH 012 (Part1 Vol.13-14): Tian tong si zhi 童寺志 ( = ZFC 123)
ZFSH 013 (Part1 Vol.15): Qi ta si zhi 七塔寺志 ( ~ ZFC 129)
ZFSH 014 (Part1 Vol.15): Ming zhou yue lin si zhi 明 岳林寺志 ( ~ ZFC 131)
ZFSH 015 (Part1 Vol.16): Da zhao qing lu si zhi 大昭慶律寺志 ( = ZFC 098)
ZFSH 016 (Part1 Vol.17-19): Jing ci si zhi 淨慈寺志 ( ~ ZFC 087)
ZFSH 017 (Part1 Vol.20): Yu cen shan hui yin gao li hua yan jiao si zhi 玉岑山慧因高麗華嚴教寺志 ( = ZFC 081)
ZFSH 018 (Part1 Vol.20): Feng huang shan sheng guo si zhi 鳳凰山聖果寺志 ( = ZFC 101)
ZFSH 019 (Part1 Vol.21): Wu lin li an si zhi 武林理安寺志 ( = ZFC 108)
ZFSH 020 (Part1 Vol.22): Long jing jian wen lu 龍井見聞錄 ( = ZFC 083)
ZFSH 021 (Part1 Vol.23): Wu lin ling yin si zhi 武林靈隱寺志 ( = ZFC 084)
ZFSH 022 (Part1 Vol.24): Zeng xiu yun lin si zhi 增修雲林寺志 ( = ZFC 085)
ZFSH 023 (Part1 Vol.25): Yun lin si xu zhi 雲林寺續志 ( = ZFC 086)
ZFSH 024 (Part1 Vol.26): Hang zhou shang tian zhu jiang si zhi 杭州上天竺講寺志 ( ~ ZFC 088)
ZFSH 025 (Part1 Vol.27): Yun ju sheng shui si zhi 雲居聖水寺志 ( = ZFC 103)
ZFSH 026 (Part1 Vol.28): Hu pao ding hui si zhi 虎跑定慧寺志 ( ~ ZFC 186)
ZFSH 027 (Part1 Vol.28): Yun qi ji shi 雲棲紀 ( = ZFC 106)
ZFSH 028 (Part1 Vol.29): Long xing xiang fu jie tan si zhi 龍興祥符戒壇寺志 ( = ZFC 102)
ZFSH 029 (Part1 Vol.30): Xi xi fan yin zhi 西谿梵隱志 ( = ZFC 093)
ZFSH 030 (Part1 Vol.30): Chong fu si zhi 崇福寺志 ( = ZFC 091)
ZFSH 031 (Part1 Vol.30): Xu chong fu si zhi 續崇福寺志 ( = ZFC 092)
ZFSH 032 (Part1 Vol.31-32): Jing shan zhi 山志
ZFSH 033 (Part1 Vol.33): Xi tian mu zu shan zhi 西 目祖山志 ( ~ ZFC 135)
ZFSH 034 (Part1 Vol.34): She shan zhi 攝山志 ( ~ ZFC 042)
ZFSH 035 (Part1 Vol.35): Wu jin tian ning si zhi 武進 寧寺志 ( ~ ZFC 061)
ZFSH 036 (Part1 Vol.35): Po shan xing fu si zhi 破山 福寺志 ( ~ ZFC 047)
ZFSH 037 (Part1 Vol.36-37): Jin shan long you chan si zhi lue 金山龍游禪寺志略 ( = ZFC 072)
ZFSH 038 (Part1 Vol.38-39): Jin shan zhi 金山志 ( = ZFC 070)
ZFSH 039 (Part1 Vol.39): Xu jin shan zhi 續金山志 ( ~ ZFC 071)
ZFSH 040 (Part1 Vol.40): Ping shan tang tu zhi 山堂圖志 ( = ZFC 079)
ZFSH 041 (Part1 Vol.41): Bao hua shan zhi 寶華山志 ( = ZFC 077)
ZFSH 042 (Part1 Vol.42): Deng wei shan sheng en si zhi 鄧尉山聖恩寺志 ( = ZFC 060)
ZFSH 043 (Part1 Vol.43): Han shan si zhi 寒山寺志 ( = ZFC 052)
ZFSH 044 (Part1 Vol.43): Lu quan si zhi 鹿泉寺志 ( = ZFC 179)
ZFSH 045 (Part1 Vol.43): He lin si zhi (jing kou san shan quan zhi) 鶴林寺志( 口 山 志)
ZFSH 046 (Part1 Vol.43): He lin si zhi (shi ming xian ben ) 鶴林寺志(釋明賢本) ( ~ ZFC 076)
ZFSH 047 (Part1 Vol.44): Tan zhe shan xiu yun si zhi 潭柘山岫雲寺志 ( ~ ZFC 006)
ZFSH 048 (Part1 Vol.44): Hong ci guang ji si xin zhi 弘慈廣濟寺新志 ( = ZFC 172)
ZFSH 049 (Part1 Vol.45): E mei shan zhi 峨眉山志 ( ~ ZFC 165)
ZFSH 050 (Part1 Vol.46): Xin ban e shan tu zhi 新版峨山圖志 ( ~ ZFC 177)
ZFSH 051 (Part1 Vol.47-48): Ding hu shan zhi 鼎湖山志 ( = ZFC 158)
ZFSH 052 (Part1 Vol.48): Hua feng shan zhi 華 山志
ZFSH 053 (Part1 Vol.49-50): Gu shan zhi 鼓山志 ( ~ ZFC 138)
ZFSH 054 (Part2 Vol.01): Luo yang qie lan ji jiao zhu 洛陽伽藍記校注
ZFSH 055 (Part2 Vol.02): Chong kan luo yang qie lan ji 重刊洛陽伽藍記
ZFSH 056 (Part2 Vol.02): Nan chao si kao 南朝寺考 ( ~ ZFC 037)
ZFSH 057 (Part2 Vol.03): Jiang nan fan cha zhi 江南梵剎志
ZFSH 058 (Part2 Vol.04-05): Chong xiu cao xi tong zhi 重修曹溪通志 ( ~ ZFC 159)
ZFSH 059 (Part2 Vol.06): Yun men shan zhi 雲門山志
ZFSH 060 (Part2 Vol.06): Da yu shan zhi 大 山志
ZFSH 061 (Part2 Vol.07): Xue feng zhi 雪 志 ( = ZFC 146)
ZFSH 062 (Part2 Vol.08): Quan zhou kai yuan si zhi 泉 開元寺志 ( = ZFC 153)
ZFSH 063 (Part2 Vol.08): Xia men nan pu tuo si zhi 廈門南普陀寺志 ( = ZFC 151)
ZFSH 064 (Part2 Vol.09): Tian tai sheng ji lu 台勝蹟錄
ZFSH 065 (Part2 Vol.10): Yan shan zhi 山志
ZFSH 066 (Part2 Vol.11): Yao feng shan zhi 堯 山志 ( ~ ZFC 056)
ZFSH 067 (Part2 Vol.12): Ling gu chan lin zhi 靈谷禪林志 ( ~ ZFC 040)
ZFSH 068 (Part2 Vol.13): Jin ling da bao en si ta zhi 金陵大報恩寺塔志 ( = ZFC 036)
ZFSH 069 (Part2 Vol.13): Niu shou shan zhi 牛首山志 ( ~ ZFC 041)
ZFSH 070 (Part2 Vol.13): Xian hua yan zhi 獻花巖志 ( ~ ZFC 039)
ZFSH 071 (Part2 Vol.14): Qi xia shan zhi 棲霞山志
ZFSH 072 (Part2 Vol.14): Ling yan ji lue 靈巖記略 ( ~ ZFC 063)
ZFSH 073 (Part2 Vol.14): Ling yan zhi lue 靈巖志略 ( = ZFC 064)
ZFSH 074 (Part2 Vol.15): Yun ju shan zhi 雲居山志 ( ~ ZFC 033)
ZFSH 075 (Part2 Vol.16-20): Lu shan zhi 廬山志
ZFSH 076 (Part2 Vol.21): Yang shan sheng 仰山乘
ZFSH 077 (Part2 Vol.22): Jiu hua shan zhi 九華山志 ( ~ ZFC 024)
ZFSH 078 (Part2 Vol.23-24): Song shan shao lin si ji zhi 嵩山少林寺輯志
ZFSH 079 (Part2 Vol.25): Ji fu fan cha zhi 畿輔梵剎志
ZFSH 080 (Part2 Vol.26-28): Qin ding pan shan zhi 欽定盤山志 ( ~ ZFC 007)
ZFSH 081 (Part2 Vol.29): Qing liang shan zhi 清涼山志 ( ~ ZFC 015)
ZFSH 082 (Part2 Vol.29): Yun gang shi ku si zhi 雲岡石窟寺志
ZFSH 083 (Part2 Vol.30): E mei shan zhi bu 峨眉山志補
ZFSH 084 (Part3 Vol.01-02): Ji zu shan zhi 雞足山志 ( ~ ZFC 163)
ZFSH 085 (Part3 Vol.03): Guang xiao si zhi 孝寺志 ( ~ ZFC 160)
ZFSH 086 (Part3 Vol.04): Huang bo shan si zhi 黃檗山寺志 ( ~ ZFC 145)
ZFSH 087 (Part3 Vol.05-06): Chong xiu zhao jue si zhi 重修昭覺寺志 ( = ZFC 171)
ZFSH 088 (Part3 Vol.07): Sheng yin jie dai si zhi 聖因接待寺志 ( ~ ZFC 104)
ZFSH 089 (Part3 Vol.08-10): Tian tai shan fang wai zhi 天台山方外志 ( ~ ZFC 115)
ZFSH 090 (Part3 Vol.11-12): You xi bie zhi 幽溪別志 ( = ZFC 194)
ZFSH 091 (Part3 Vol.13): Xue dou si zhi lue 雪竇寺志畧 ( ~ ZFC 126)
ZFSH 092 (Part3 Vol.13): Bian li yuan zhi 辯利院志 ( = ZFC 097)
ZFSH 093 (Part3 Vol.13): Cui shan si zhi 翠山寺志 ( = ZFC 182)
ZFSH 094 (Part3 Vol.14-15): Qing yuan zhi lue 青原志略 ( ~ ZFC 030)
ZFSH 095 (Part3 Vol.16): Hong shan bao tong si zhi 洪山寶 寺志 ( ~ ZFC 026)
ZFSH 096 (Part3 Vol.17-18): Yu quan si zhi 玉泉寺志 ( ~ ZFC 025)
ZFSH 097 (Part3 Vol.19-28): Wu du fa sheng 吳都法乘 ( ~ ZFC 045)
ZFSH 098 (Part3 Vol.29): Ren cao an zhi 忍草庵志 ( = ZFC 067)
ZFSH 099 (Part3 Vol.29): Shang fang shan zhi 方山志 ( ~ ZFC 004)
ZFSH 100 (Part3 Vol.30): Qing liang shan xin zhi 清涼山新志
Appendix B
Both ZFC (197 gazetteers) and ZFSH (100 gazetteers) [53] consist of facsimiles of
manuscripts, woodblock or movable-type prints. 78 gazetteers have a counterpart in the
other collection. This appendix describes the relationship between the gazetteers in these
78 pairs, in the hope that it will enable researchers to quickly decide which edition to
consult first and inform them of differences early on in the course of their study.
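The 78 pairs fall into the five groups described in the remainder of this appendix, and the group sizes can be checked with a short tally. This is only an illustrative sketch in Python: the group labels are informal paraphrases introduced here, while the counts are those stated in the text.

```python
# Counts as stated in this appendix; the labels are informal paraphrases.
pair_groups = {
    "identical for all practical purposes": 39,
    "independent works on the same site": 2,
    "manuscript copy vs. printed edition": 10,
    "different chapter number or arrangement": 5,
    "minor differences, omissions, additions": 22,
}

total_pairs = sum(pair_groups.values())


def share(count, total):
    """Percentage of all pairs that fall into one group, rounded to one decimal."""
    return round(100 * count / total, 1)


for label, count in pair_groups.items():
    print(f"{label}: {count} ({share(count, total_pairs)}%)")
print("total:", total_pairs)
```

The groups add up to the 78 pairs mentioned above, and exactly half of them are pairs in which either collection can be consulted.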
In the following 39 cases the gazetteers in ZFC and ZFSH are for all practical purposes
identical:[54]
[ZFC 009 - ZFSH 004], [ZFC 010 - ZFSH 003], [ZFC 034 - ZFSH 006],
[ZFC 036 - ZFSH 068], [ZFC 038 - ZFSH 005], [ZFC 052 - ZFSH 043],
[ZFC 060 - ZFSH 042], [ZFC 064 - ZFSH 073], [ZFC 067 - ZFSH 098],
[ZFC 070 - ZFSH 038], [ZFC 072 - ZFSH 037], [ZFC 077 - ZFSH 041],
[ZFC 079 - ZFSH 040], [ZFC 081 - ZFSH 017], [ZFC 083 - ZFSH 020],
[ZFC 084 - ZFSH 021], [ZFC 085 - ZFSH 022], [ZFC 086 - ZFSH 023],
[ZFC 091 - ZFSH 030], [ZFC 092 - ZFSH 031], [ZFC 093 - ZFSH 029],
[ZFC 097 - ZFSH 092], [ZFC 098 - ZFSH 015], [ZFC 101 - ZFSH 018],
[ZFC 102 - ZFSH 028], [ZFC 103 - ZFSH 025], [ZFC 106 - ZFSH 027],
[ZFC 108 - ZFSH 019], [ZFC 116 - ZFSH 009], [ZFC 123 - ZFSH 012],
[ZFC 146 - ZFSH 061], [ZFC 151 - ZFSH 063], [ZFC 153 - ZFSH 062],
[ZFC 158 - ZFSH 051], [ZFC 171 - ZFSH 087], [ZFC 172 - ZFSH 048],
[ZFC 179 - ZFSH 044], [ZFC 182 - ZFSH 093], [ZFC 194 - ZFSH 090].
Two pairs are two different gazetteers written independently on the same location.
1. [ZFC 004 - ZFSH 099] Both prints are titled Shangfangshan zhi 上方山志.
The ZFSH 99 was printed by the famous Sanshan tang 三善堂 publishers in
1892. Originally the work in five chapters with an introduction was compiled
by the monk Ziru 自如 (1706-1796). ZFC 4, on the other hand, is a copy of a
work printed in 1933. It was compiled in 1930 in ten chapters by the famous
and reclusive artist Pu Xinyu 溥心畬 (aka Puru 溥儒) (1896-1963), who
almost became the last emperor of China.
2. [ZFC 186 - ZFSH 026]. The Hupaoquan dinghuisi zhi 虎跑泉定慧寺志
(ZFC 186) and the Hupao dinghuisi zhi 虎跑定慧寺志 (ZFSH 26) are both
reproductions of manuscripts. ZFC 186 consists of an introduction followed
by six chapters. The original is preserved in the Shanghai Library and was
composed by the monk Changren 常仁 (aka Anren 安忍). ZFSH 26 is a
manuscript by the monk Shengguang 聖光 dated 1900. It is not a complete
gazetteer, but the draft for a later, probably never realized, edition. It is not
divided into chapters.
[53] Much of the detailed comparison between the collections was carried out in spring 2009 by
Mrs. Lin Xiuli: her help is acknowledged and deeply appreciated.
[54] In a few cases (e.g. ZFC 70/ZFSH 38) one of the facsimiles was taken from a reprint,
whereas the other was done from the original.
In ten gazetteer pairs, one of the two is a manuscript copy, usually a transcription from a
print, and the other is a printed edition. The text is often identical, allowing for minor
mistakes and omissions (usually in the manuscript). The date given is usually taken from
the preface. Where the same date is given for manuscript and print, the date in the
manuscript might simply be copying the date of the print: it is not to be confused with the
actual date of the transcription. Further research on the relationship between the two
editions is needed in almost every case. Here only the general results:
Panshan zhi 盤山志: ZFC 007 (Ms dated 1755); ZFSH 080 (Siku quanshu 四庫全書 edition dated 1755)
Qingyuan zhi lue 青原志略: ZFC 030 (1669); ZFSH 094 (Ms)
Yunjushan zhi 雲居山志: ZFC 033 (Ms dated 1727); ZFSH 074 (printed in Hongkong 1959)
Xianhuayan zhi 獻花岩(巖)志: ZFC 039 (Ms); ZFSH 070 (dated 1603)
Niushoushan zhi 牛首山志: ZFC 041 (Ms); ZFSH 069 (print dated 1579, handwritten preface added 1639)
Poshan xingfusi zhi 破山(常熟)興福寺志: ZFC 047 (movable-type print 1919); ZFSH 036 (Ms dated 1643)
Yaofengshan zhi 堯峰山志: ZFC 056 (Ms (Chapters 4-6) dated 1943) [55]; ZFSH 066 (print dated 1638)
Lingyan ji lue 靈岩紀(記)略: ZFC 063 (early Qing); ZFSH 072 (Ms, early Qing)
Wulin fan zhi 武林梵志: ZFC 082 (Ms dated 1864); ZFSH 007 (Siku quanshu edition dated 1780)
Shengyin jiedaisi zhi 聖因接待寺志: ZFC 104 (Ms); ZFSH 088 (print dated 1748)
[55] ZFC 056 was done from a copy in which three missing chapters (ch.4-6) were supplied in
manuscript in 1943. Chapters 1-3 and the introduction are identical with ZFSH 066.
In the Chinese textual universe, print copies are preferred over manuscripts. There are
good reasons for this: usually the print copy is better proofed and provides a more reliable
and readable text. When the woodblocks had been lost and no new print copies could be
ordered, a scholar might transcribe or excerpt a gazetteer, or hire someone to do so.
Transcription, however, almost always introduces errors. A typical example is a date in
the manuscript copy of the Wulin fan zhi 武林梵志 (ZFC 082, p.15), which is given as
宋紹興二十二年 (1152 CE). The correct print version (ZFSH 007, p.7) has 宋紹興三十二年
(1162 CE).
Generally, in the case of the ten gazetteer pairs above the print versions are to be
preferred, but there are exceptions. In the pair ZFC 047 and ZFSH 036, the ZFSH
manuscript precedes the print by almost 300 years and is more complete (ZFC lacks the
text on p.127-132 in ZFSH).
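The Wulin fan zhi example above turns on a single numeral in a reign-year date, and slips of this kind are easy to spot once reign years are converted to CE years. The sketch below is an illustration in Python and not part of the project's tooling: the function name and the small lookup table are hypothetical, the first-year values are the conventional ones, the reign years 22 and 32 are inferred here from the CE equivalents 1152 and 1162 given in the text, and the conversion ignores the offset between the lunar and the Gregorian new year.

```python
# First CE year of a few reign eras mentioned in this article (conventional values).
ERA_START = {
    "Shaoxing": 1131,  # 紹興, Song
    "Dading": 1161,    # 大定, Jin
    "Guangxu": 1875,   # 光緒, Qing
}


def reign_year_to_ce(era, year):
    """Approximate CE year for the n-th year of a reign era (year 1 = first year)."""
    if year < 1:
        raise ValueError("reign years are counted from 1")
    return ERA_START[era] + year - 1


# Reign years inferred from the CE years in the text, not read from the facsimiles.
manuscript = reign_year_to_ce("Shaoxing", 22)  # reading in ZFC 082
print_copy = reign_year_to_ce("Shaoxing", 32)  # reading in ZFSH 007
print(manuscript, print_copy, print_copy - manuscript)
```

A difference of exactly ten years between two witnesses points to a miscopied tens numeral, which is the kind of error described above.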
Five gazetteers that appear in both collections differ in chapter number or arrangement:
ZFC 025 / ZFSH 096: Yuquansi zhi 玉泉寺志
  While the text in ZFSH has 6 chapters and an introduction, the ZFC
  edition has a seventh chapter (added later). Moreover, ZFSH lacks
  three pages of the second chapter (ZFC, p.207-208, 252).
ZFC 037 / ZFSH 056: Nanchaosi kao 南朝寺考
  The ZFSH text was printed for inclusion in the (never finished)
  Puhui Canon. It contains two additional chapters (the 梁京寺志,
  and the 寺塔記, in themselves small gazetteers). While the ZFC
  edition of 1907 is divided in six juan-chapters, the ZFSH is arranged
  according to temple sections.
ZFC 061 / ZFSH 035: Wujin tianningsi zhi 武進天寧寺志
  Two different editions, each completed at around the same time. The
  ZFSH version contains an addendum (pp. 383-340). The ZFC also
  lacks the introduction and the maps that are preserved in ZFSH,
  pp.1-6.
ZFC 129 / ZFSH 013: Qitasi zhi 七塔寺志
  The ZFSH edition contains an addendum (pp.235-242).
ZFC 135 / ZFSH 033: Xi tianmu zushan zhi 西天目祖山志
  The edition preserved in the ZFSH is about a third more voluminous
  than the ZFC: It has eight chapters, plus an introduction and two
  addenda. Against this the ZFC edition consists of only six chapters.
  The editions contain different maps.
For the following 22 gazetteer pairs, the editions contained in ZFC and ZFSH show
various minor differences, omissions and additions.
ZFC 006 Tan zhe shan xiu yun si zhi 潭柘山岫雲寺志
ZFSH 047 Tan zhe shan xiu yun si zhi 潭柘山岫雲寺志
The chapter 名勝古蹟 in ZFSH, pp.139-170 was
moved into the addendum (續刊) of the ZFC, pp.149-180.
ZFC lacks ZFSH, pp.181-188 (再集唐句十首).
ZFC 015 Qing liang shan zhi 清涼山志
ZFSH 081 Qing liang shan zhi 清涼山志
1. Responsibility statement in ZFC is given as
釋鎮澄纂, in the ZFSH as 釋印光重修
(the original author was found to be 釋鎮澄; see preface).
2. ZFC and ZFSH contain a different map.
ZFC 024 Jiu hua shan zhi 九華山志
ZFSH 077 Jiu hua shan zhi 九華山志
ZFC lacks ZFSH, pp.3-4 (Dizang Image).
ZFC 026 Hong shan bao tong chan si zhi 洪山寶通禪寺志
ZFSH 095 Hong shan bao tong si zhi 洪山寶通寺志
1. ZFC, pp.139, 145 and 151 are illegible.
2. ZFC Ch. 3 lacks ZFSH, pp.177-178.
3. ZFSH Ch. 3 lacks ZFC, pp.180-181.
ZFC 040 Ling gu chan lin zhi 靈谷禪林志
ZFSH 067 Ling gu chan lin zhi 靈谷禪林志
1. Responsibility statement in ZFC is 謝元福纂輯, in
ZFSH as 釋德鎧撰 (The author is indeed 釋德鎧,
see preface).
2. ZFC edition printed in 光緒十三年 (1887), ZFSH is a
reprint of the 光緒十二年 (1886) edition.
3. ZFC lacks ZFSH, pp.3-4 (Preface by 青芝老人).
4. ZFC Ch.14, p.414 differs slightly from ZFSH Ch.14,
p.420.
ZFC 042 She shan zhi 攝山志
ZFSH 034 She shan zhi 攝山志
ZFC lacks ZFSH “Principles of Organization” (凡例),
pp.23-26.
ZFC 045 Wu du fa sheng 吳都法乘
ZFSH 097 Wu du fa sheng 吳都法乘
1. ZFSH Ch.6c lacks ZFC Ch.6c, pp.1020-1021.
2. ZFSH Ch.30 lacks ZFC Ch.30, p. 3769.
ZFC 071 Xu jin shan zhi 續金山志
ZFSH 039 Xu jin shan zhi 續金山志
1. ZFSH Ch.1 lacks ZFC Ch.1, pp.151-154.
2. ZFSH Ch. 2 lacks ZFC Ch. 2, pp.259-260.
3. ZFC Ch. 2 lacks ZFSH Ch. 2, pp.267-270.
ZFC 076 He lin si zhi 鶴林寺志
ZFSH 046 He lin si zhi 鶴林寺志
1. ZFC is a “reprint” dated 1909 done at Helin 鶴林
temple on orders of the monk Fudeng 福登: the
edition in the ZFSH is a Wanli era (1573-1619) print.
2. Lay-out and calligraphy are different, therefore the
woodblocks must have been re-cut.
3. ZFSH lacks one of the prefaces in ZFC, pp.11-16.
4. ZFSH lacks ZFC, pp.205-212.
5. ZFC lacks ZFSH, pp.201-204.
ZFC 087 Jing ci si zhi 凈慈寺志
ZFSH 016 Jing ci si zhi 淨慈寺志
ZFC lacks ZFSH “Principles of organization”
(凡例), pp.17-24.
Buddhist Temple Gazetteers, Their Prefaces and Their Relationship to the Buddhist Canon 83
ZFC 088 Shang tian zhu shan zhi 上天竺山志
ZFSH 024 Hang zhou shang tian zhu jiang si zhi 杭州上天竺講寺志
1. ZFC edition is dated 順治三年 (1646). The ZFSH
edition, a “re-carving” (重刻本), was done in 光緒
二十三年 (1897) at the prolific 嘉惠堂 in Hangzhou.
2. ZFC and ZFSH differ in lay-out.
3. The map of the temple (寺圖) in ZFC, pp.25-40
differs from that in ZFSH, pp.17-24. Probably the
ZFSH reflects the lay-out of the temple as it was
rebuilt after the destruction during the Taiping.
4. ZFSH lacks ZFC, pp.145-152.
ZFC 115 Tian tai shan fang wai zhi 天台山方外志
ZFSH 089 Tian tai shan fang wai zhi 天台山方外志
1. ZFC reproduces a string-bound edition dated 1922,
printed by Jiyunxuan 集雲軒 in movable type in
Shanghai, which in turn was done from the first
edition dated 光緒二十年 (1894) printed at the
Zhenjue temple 真覺寺 in Folong 佛隴 near Mt.
Tiantai. The ZFSH edition is a “reprint” dated 光緒
二十年 (1894) done from the woodblocks of the
Folong edition.
2. ZFSH lacks ZFC addendum, pp.665-670.
ZFC 126 Xue dou si zhi lue 雪竇寺志略
ZFSH 091 Xue dou si zhi lue 雪竇寺志畧
1. ZFSH lacks ZFC, pp.5-14 (山圖).
2. ZFC lacks ZFSH, pp.49-50.
3. ZFC lacks ZFSH, pp.60-62.
4. ZFC lacks ZFSH, pp.85-86.
ZFC 128 / ZFSH 010 A yu wang shan si zhi 阿育王山寺志
ZFSH lacks ZFC, pp.53-58.
ZFC 131 Yue lin si zhi 岳林寺志
ZFSH 014 Ming zhou yue lin si zhi 明州岳林寺志
ZFSH Ch. 6 incomplete. ZFSH lacks ZFC, p.174.
ZFC 138 Gu shan zhi 鼓山志
ZFSH 053 Gu shan zhi 鼓山志
ZFC lacks ZFSH, p.556.
ZFC 145 Huang bo shan si zhi 黃檗山寺志
ZFSH 086 Huang bo shan si zhi 黃檗山寺志
1. ZFSH lacks ZFC (preface), pp.1-10.
2. ZFC, p.319 differs from ZFSH, p.311.
3. ZFC lacks ZFSH, p.460.
ZFC 159 Cao xi tong zhi 曹溪通志
ZFSH 058 Chong xiu cao xi tong zhi 重修曹溪通志
ZFC lacks ZFSH, pp.405-412.
ZFC 160 Guang xiao si zhi 光孝寺志
ZFSH 085 Guang xiao si zhi 光孝寺志
ZFSH lacks ZFC (maps of the temple 寺圖), pp.33-62.
ZFC 163 Ji zu shan zhi 雞足山志
ZFSH 084 Ji zu shan zhi 雞足山志
ZFSH lacks ZFC (preface), pp.24-25.
ZFC 165 E mei shan zhi 峨嵋山志
ZFSH 049 E mei shan zhi 峨眉山志
ZFSH lacks ZFC (overview map of Emei), pp.19-20.
ZFC 177 Xin ban e shan tu zhi 新版峨山圖志
ZFSH 050 Xin ban e shan tu zhi 新版峨山圖志
ZFC lacks ZFSH, pp.453-454. This is so far the only
gazetteer published in Chinese together with an English
translation (by Dryden L. Phelps).
Abbreviations
ZFC
Zhongguo Fosizhi Congkan 中國佛寺志叢刊. Hangzhou: Guangling shushe
廣陵書社. 2006. Compiled by Zhang Zhi 張智 et al., 130 vols.
ZFSH
Zhongguo Fosi Shizhi Huikan 中國佛寺史志彙刊. Taipei: Mingwen shuju
明文書局. 1980-1985. Compiled by Du Jiexiang 杜潔祥, 110 vols.
References
Bingenheimer, Marcus. 2009. Writing History of Buddhist Thought in the 20th century –
Yinshun (1906-2005) in the Context of Chinese Buddhist Historiography. Journal of
Global Buddhism 10:255-290.
Bol, Peter K. 2001. The Rise of Local History: History, Geography, and Culture in Southern
Song and Yuan Wuzhou. Harvard Journal of Asiatic Studies 61(1) :37-76.
Brook, Timothy. 1993. Praying for Power – Buddhism and the Formation of Gentry Society
in Late-Ming China. Cambridge (MA) and London: Council on East Asian Studies
Harvard University.
Brook, Timothy. 2002. Geographical Sources of Ming-Qing History. Ann Arbor: Univ. of
Michigan, Center for Chinese Studies. (Michigan monographs in Chinese Studies 58,
First edition:1988).
Cao, Ganghua 曹剛華. 2011. Mingdai Fojiao Difangzhi Yanjiu 明代佛教地方志研究.
Beijing: Renmin daxue 中國人民大學出版社.
Cao, Shibang 曹仕邦. 1999. Zhongguo Fojiao Shixue Shi – Dongjin zhi Wu dai 中國佛教史
學史─東晉至五代 (History of Chinese Buddhist Historiography – Eastern Jin to Wudai
Period). Taipei: Faguwenhua 法鼓文化.
Dow, Francis D.M. 1969. A Study of Chiang-su and Che-chiang Gazetteers of the Ming
Dynasty. Canberra: Australia National University.
Du, Jiexiang 杜潔祥. 1981. Zhongguo Fosizhi Gaishuo 中國佛寺志概說. Putishu 菩提樹
29 (6) (=No.342) :19-20.
Dudbridge, Glen. 2000. Lost Books of Medieval China. London: The British Library.
Eberhard, Wolfram. 1964. Temple Building Activities in Medieval and Modern China.
Monumenta Serica 23:264-318.
Franke, Otto. 1938. Li Tschi 李 贄 – Ein Beitrag zur Geschichte der Chinesischen
Geisteskämpfe im 16. Jahrhundert. Berlin: De Gruyter. (Abhandlungen der Preussischen
Akademie der Wissenschaften Jahrg.1937. Phil-hist. Klasse Nr. 10).
Franke, Wolfgang. 1968. An Introduction to the Sources of Ming History. Kuala Lumpur:
University of Malaya Press.
Giles, Lionel. 1914 (July). Tun Huang Lu: Notes on the District of Tun-Huang. Journal of the
Royal Asiatic Society of Great Britain and Ireland. 703-728. (cf. Hu, Suh 1915)
Giles, Lionel. 1915 (Jan.). The Tun Huang Lu Re-Translated. Journal of the Royal Asiatic
Society of Great Britain and Ireland. 41-47.
Goodrich, L. Carrington. 1976. A Dictionary of Ming Biography. New York and London:
Columbia University Press.
Gu, Hongyi 顧宏義. 2010. Songchao Fangzhi Kao 宋朝方志考. Shanghai: Shanghai guji
上海古籍出版社.
Hahn, Thomas H. 1997. Formalisierter Wilder Raum—Chinesische Berge und ihre
Beschreibungen (shanzhi 山 志 ). Unpublished PhD-thesis Heidelberg University.
(Accessed online January 2008: http://archiv.ub.uni-heidelberg.de/volltextserver/volltexte/2007
[archived 2007-04-16]).
Hargett, James M. 1996. Song Dynasty Local Gazetteers and Their Place in the History of
Difangzhi Writing. Harvard Journal of Asiatic Studies 56(2):405-442.
Hu, Suh (aka Hu, Shi). 1915 (Jan.). Notes on Dr. Lionel Giles' Article on ‘Tun Huang Lu’.
Journal of the Royal Asiatic Society of Great Britain and Ireland. 35-39.
Jin, Enhui 金恩輝; Hu, Shuzhao 胡述兆. 1996. Zhongguo Difangzhi Zongmu Tiyao 中國地
方志總目提要. Sino-American Publishers 漢美圖書. 3 vols.
Lan, Jifu 藍吉富, ed. 1984. Dazangjing Bubian 大藏經補編. Taipei: Huayu 華宇.
Naquin, Susan; Rawski, Evelyn S. 1987. Chinese Society in the Eighteenth Century. New
Haven: Yale University Press.
Qiu, Jiang 仇江. 2008. Qing chu Lingnan Fomen Shiliao Zhengli Yanjiu 清初嶺南佛門史料
整理研究. Unpublished conference paper. Conference: 沉淪、懺悔與救度:中國文化
的懺悔書寫國際學術研討會 2008.12.4-6 Place: Taibei (Academia Sinica 中央
研究院) and Jinshan (Dharma Drum Buddhist College 法鼓佛教學院). 1-31.
Qiu, Jiang 仇江; Li, Fubiao 李福标, eds. 2003. Danxia Shanzhi 丹霞山志. Beijing:
Zhonghua shuju 中華書局.
Reiter, Florian. 1978. Der “Bericht über den Berg Lu” (Lu-shan chi) von Ch’en Shun-yü; ein
Historiographischer Beitrag aus der Sung Zeit zum Kulturraum des Lu Shan. PhD
dissertation, Munich.
Reiter, Florian. 1980. Bergmonographien als Geographische und Historische Quellen,
Dargestellt an Ch’en Shun-yüs “Bericht über den Berg Lu” (Lu-shan chi) aus dem 11.
Jahrhundert. Zeitschrift der Deutschen Morgenländischen Gesellschaft 130:397–407.
Robson, James. 2009. Power of Place: The Religious Landscape of the Southern Sacred Peak
(Nanyue 南嶽) in Medieval China. Cambridge, MA: Harvard University Asia Center. (Harvard East
Asian Monographs).
Suwa, Gijun 諏訪義純. 1977. ‘Ryankyō jiki’ Shiryō kō 梁京寺記資料考. Indogaku
Bukkyōgaku Kenkyū 印度学仏教学研究 51 (26-1): 91-96.
Suwa, Gijun 諏訪義純. 1980. Nanchō Butsuji kō – Ryandai Kenritsu 南朝仏寺考-梁代建
立. Bukkyō no Rekishi to Bunka 仏教の歴史と文化:仏教史学会30周年記念論集.
157-179.
Wu, Jiang. 2004. Leaving for the Rising Sun - The Historical Background of Yinyuan
Longqi’s Migration to Japan in 1654. Asia Major (Third Series) 17(2): 89-120.
Wu, Jiang. 2006. Building a Dharma Transmission Monastery in Seventeenth-Century China:
The Case of Mount Huangbo. East Asian History 31:29-52.
Yü, Chün-fang. 1998. Ming Buddhism. The Cambridge History of China. 8(2):893-952.
Zhang, Dewei. 2010. A Fragile Revival - Buddhism under the Political Shadow, 1522-1620.
PhD Thesis, University of British Columbia, Vancouver.
Zhuang, Weifeng 莊威鳳, et al., eds. 1985. Zhongguo Difangzhi Lianhe Mulu 中國地方志
聯合目錄. Beijing: Zhonghua Shuju.
Chung-Hwa Buddhist Journal (2012, 25:129-148)
Taipei: Chung-Hwa Institute of Buddhist Studies
中華佛學學報第二十五期 頁 129-148 (民國一百零一年),臺北:中華佛學研究所
ISSN:1017-7132
Verb Semantics and Argument Realization in
Pre-modern Japanese: A Corpus Based Study
Kerri L Russell and Stephen Wright Horn
University of Oxford
Abstract
We are developing a corpus in order to investigate argument realization in detail for pre-modern Japanese, giving a comprehensive account of the basic grammar of each major
stage of the language and allowing for both synchronic and diachronic analyses. When
completed, the corpus will contain texts from the 8th century until the beginning of the
16th century. The results of the project will impact the description and understanding of
pre-modern Japanese and its changes through time, furthering our understanding and
interpretation of earlier texts. The project is also expected to have implications for general
linguistic theory, both with regard to frameworks for understanding verb semantics and
clause structure, and with regard to the application of syntactic theory to 'dead' languages.
This paper focuses on the initial stages of corpus building, including methods for
encoding orthography, morphology, and syntax.
Keywords:
Old Japanese, Argument Realization, Lexicon, TEI markup, Corpus-based Linguistics
前現代日語的動詞語意與論元體現:
以語料庫為基礎的研究
Kerri L Russell and Stephen Wright Horn
牛津大學
摘要
藉由對語言的每一主要階段的基本文法之詳盡說明,並考慮共時與歷時分析,我們
發展一語料庫以詳細研究前現代日語之論元體現。當此完成後,此語料庫將包含從
八世紀到十六世紀初的文本。此計畫的成果將影響前現代日語的描寫與了解,及
其隨著時間所造成的改變,增進我們對於早期文本的理解與解釋。此計畫也期望對
一般語言學理論有所影響,包含在了解動詞語意與句法結構的架構上,以及對於不
再通行的語言之語法理論的應用兩層面。此篇著重在語料庫建立的初始階段,包含
編碼拼字的方法,型態與語法。
關鍵詞:日語、論元體現、詞典、TEI標記、語料庫語言學
Introduction
This paper presents the tagging conventions used in the development of a corpus for a
pre-modern Japanese syntax project at the University of Oxford. The project is entitled
Verb Semantics and Argument Realization in Pre-modern Japanese: A Comprehensive
Study of the Basic Syntax of Pre-modern Japanese (abbreviated as ‘VSARPJ’) and is
funded by a grant of almost £1 million from the Arts and Humanities Research Council in
the UK. An important first phase of the project is the construction of an annotated and
encoded corpus of texts. While the corpus is initially constructed specifically for the
purpose of serving the VSARPJ project, we believe it will eventually become useful for
the investigation of many other aspects of the syntax of pre-modern Japanese.
The primary and immediate goal of the VSARPJ project is to investigate argument
realization in detail for pre-modern Japanese. Argument realization is a fundamental
aspect of the syntax of a language which concerns the ways in which verb meaning
determines the number of arguments (e.g., subjects, objects, goals, etc.) in a clause and
their morpho-syntactic and semantic properties. In essence, the project will contribute to a
comprehensive account of the basic syntax of each of the stages of the pre-modern
Japanese language, from the beginning of its recorded history in the 8th century until the
end of the 16th century, and of the changes in basic syntax that have taken place over
these stages.¹
The VSARPJ project has two parts: Synchronic and Diachronic. In the synchronic
part, we investigate for the main stages of pre-modern Japanese the argument realization
patterns of individual verbs and of verb classes. For each verb attested in the pre-modern
Japanese texts we are using for this investigation, we establish both the syntactic frames
in which it can occur and also its basic argument realization pattern. An important part of
this will be the determination of what counts as an argument, and to what extent a more
finely graded range of categories between argument and adjunct is needed. We will also
look at other grammatical phenomena than argument realization which may be explained
by verb semantics, for example, aspect, auxiliary selection, ellipsis, and case drop.
The diachronic part of the project will build on the results of the synchronic part. In
addition to charting changes affecting individual verbs, we will be able to establish an
inventory of changes through the history of Japanese in argument realization both for
individual verbs and for classes of verbs and thereby be able to investigate general
patterns of change, including possible development pathways for verb meanings and
argument realization.
¹ More detail about the VSARPJ project, including the framework we use for analysis, is
presented on our website: http://vsarpj.orinst.ox.ac.uk/project.html.
Apart from the intrinsic value the results of the project will have to the description
and understanding of Japanese grammar and its history, the project may also be expected
to yield results of more general interest, as this will be the first detailed application of the
type of framework employed here to a language such as Japanese, which frequently drops
case markers, has extensive argument ellipsis (pro-drop), and has fairly free word order.
It will also be the first large-scale investigation of this kind into a ‘dead’ language, which
poses particular challenges to research into syntax.
The initial stage of the VSARPJ project consists of building a digital corpus of texts,
encoded with information about various linguistic properties. Once this stage is
completed, the next stage will involve using the corpus to conduct various types of
linguistic analysis. As we are currently in the initial stage of corpus construction, this
paper will focus on the encoding of the corpus, and in particular, on the oldest stage of
texts in the corpus, Old Japanese (OJ). In this paper we describe the contents of the
VSARPJ corpus (section 2), the initial stage of marking up texts (section 3), and XML
mark-up conventions (section 4).
The VSARPJ Corpus
The corpus will in the initial stage comprise a selection of texts from the three main
periods of pre-modern Japanese (Old, Early Middle, and Late Middle Japanese):
Old Japanese (‘OJ’, approximately 700-800)
Kojiki kayō  古事記歌謡  712
Nihon shoki kayō  日本書紀歌謡  720
Fudoki kayō  風土記歌謡  730s
Bussokuseki-uta  仏足石歌  after 753
Man’yōshū  万葉集  after 759
Shoku nihongi kayō  続日本紀歌謡  797
Shoku nihongi senmyō  続日本紀宣命  697-791
Engishiki norito  延喜式祝詞  (compiled) 927
Early Middle Japanese (‘EMJ’, 800-1200)
Kokin wakashū preface  古今和歌集仮名序  914
Ise monogatari  伊勢物語  early 10th century
Tosa nikki  土佐日記  935
Taketori monogatari  竹取物語  mid 10th century
Kagerō nikki  蜻蛉日記  second half of 10th century
Ochikubo monogatari  落窪物語  late 10th century
Makura no sōshi  枕草子  c. 1000
Genji monogatari  源氏物語  1001-1010
Sarashina nikki  更級日記  1059-1060
Konjaku monogatari-shū  今昔物語集  1120
Late Middle Japanese (‘LMJ’, 1200-1600)
Esopo no fabulas  1593
Feiqe monogatari  1593
The corpus includes all main extant texts from the OJ period. For EMJ, the corpus
focuses on texts from the period 900-1100 which are thought to a large extent to reflect
the (spoken) language of the time. For large texts from this period, e.g., Genji monogatari,
only extensive selections and not the entire texts will be included in the initial phase of
the corpus. From the LMJ period, where most of the textual material is written in
‘classical Japanese’ rather than in the contemporary language and is characterized by a
high degree of fossilization, we use two texts produced by the Jesuit missionaries at the
end of the 16th century, the Esopo no fabulas and the Feiqe monogatari, which both
reflect the contemporary language at the very end of the period, and also have the
additional advantage of being written in alphabetic writing. For all periods, we follow the
readings in the critical edition of Nihon koten bungaku taikei (NKBT), published by
Iwanami Shoten.²
Initial Stage of Markup
The first stage of markup was completed in MS Word. This process involved romanization
of texts and the use of symbols to indicate prefixes, suffixes, compounds, etc.
Romanization of Texts
First, each text was romanized to present a phonemic transcription in accordance
with the phonology of the time the text is thought to have been written, reflecting the
sound changes which had been completed by that time. Take, for example, the word which is
often written 恋, which in Modern Japanese (NJ) has the shape koi and which may be
glossed very roughly as ‘love’. In the historical kana spelling (歴史的仮名遣い) this
word is written こひ, regardless of the time from which the text dates. In a phonemic
transcription,³ however, this word has the shape /kwopwi/ (こ甲ひ乙) in OJ. As a result of
sound changes which have taken place since OJ, the shape of this word has changed as shown in
(1) below with approximate dating, and the corpus uses those shapes in accordance with
the dates of the texts. Thus, in the Tosa nikki (from 935), this word is transcribed kopi, but
in the Genji monogatari (from just after 1000) it will be written kowi. This is a very basic
point, but one which is often ignored in the presentation of pre-modern Japanese texts.
(1) OJ kwopwi > EMJ kwopi (c. 800) > kopi (c. 950) > kowi (c. 1000) > koi (c. 1100)
² At this stage, construction of the OJ corpus is complete. The corpus consists of nearly 5,000
poems of around 90,000 words, 20,000 of which are verbs. We have not yet decided on how
much to include from other periods, so we are not yet certain of the size of the corpora we
will develop for later stages of pre-modern Japanese.
³ We use the Frellesvig & Whitman (2008) transcription system for OJ.
Further, in the process of romanizing texts, we preserved a three-way distinction found in
the texts: phonographic, logographic, and “not in text” for items which are not
orthographically represented in the original text. This distinction is shown in (2) from the
Man’yōshū (MYS 1:1) with phonographically written material in italicized text,
logographically written material in plain text, and items not orthographically represented
in the original text (“not in text”) written in underlined text.
(2) 篭      毛    與    美篭        母乳      布久思  毛    與
    kwo     mo    yo    mi-kwo      moti      pukusi  mo    yo
    basket  ETOP  EMPH  HON-basket  hold.INF  shovel  ETOP  EMPH

    美夫君志    持        此     岳    尓   菜
    mi-bukusi   moti      ko no  woka  ni   na
    HON-shovel  hold.INF  this GEN  hill  DAT  greens

    採須           兒     家    吉閑名
    tuma-su        kwo    ipye  kikana
    pick-RESP.ADN  child  home  ask.OPT

    告紗根
    nora-sane
    tell-RESP.OPT

‘Girl with your basket, with your pretty basket, with your shovel, with your pretty
shovel, picking greens on this hillside, I want to ask your home. Please tell me!’
The interpretation of logographic writing relies on reading tradition and is in many
respects uncertain. This is sometimes reflected in the existence of significantly different
reading traditions of some texts. If a text or crucial parts of it are written logographically,
we cannot, strictly speaking, be certain of which words, or inflected forms, are reflected
in the text. For example, in (2) above, we cannot be certain that the verb written by 持
(in bold face) really is mot- ‘to hold’, nor that its inflected form really is the infinitive
moti, as it is read according to the reading tradition, and not the adnominal motu. Thus,
logographically written text is far less reliable than phonographically written text and can
be used as linguistic evidence only with great caution.
The items which are not orthographically represented in the text are also based
solely on reading tradition. The word 此 ‘this’ in (2) is interpreted as “ko no” but the
genitive particle no is not represented by a character in the text. This issue will become
particularly important when investigating argument structure in contexts where a case
particle marking an argument is understood only from the reading tradition and not from
the written text itself. As there is no way to prove the existence of the case particle, such
examples are less reliable as evidence of case marking than those where a particle is
written phonographically or even logographically.
In the initial stage of markup we indicated this three-way distinction by rendering
phonographically written material in lower case, logographically written material in upper
case, and items not recorded in the original script in upper case with a comment
saying “not in text”.⁴
Symbols Used in Markup
While romanizing the texts in this stage of marking up texts in MS Word, we added
information about certain types of words with the symbols =, -, +, and ~.
The symbol “=” was used to indicate a particle. For example, “ko no” in (2) above
was written as KO=NO to indicate that “no” is a genitive particle. Following the
discussion above, a comment was also attached to “no” to mark it as not having been
represented orthographically in the original text.
Next, “-” was used to indicate
1) inflecting forms following verbs and adjectives and
2) compound verbs.
The last word in (2), norasane, consists of the stem of the verb nor- ‘to tell’ and the
optative inflection of the respect auxiliary -(a)s-. This was marked in our word files as
NORA-sane at this stage, thus simultaneously indicating orthography and morphology.
The “+” symbol was used to indicate
1) nouns in compounds, including noun+noun and noun+verb combinations and
2) nominal and adjectival prefixes.
For example, mikwo in (2) above consists of the honorific prefix mi followed by kwo
‘child’. This was marked as mi+KWO.
Last, the symbol ~ was used for verbal prefixes and circumfixes. There are no
examples of this in (2), but take, for example, sanuru (MYS 14.3504) which consists of
the prefix sa-⁵ and the adnominal form of the verb ne- ‘to sleep’. This word was marked
up as sa~nuru.
⁴ In hindsight, from the point of view of converting Word files to XML format, it would have
been to our advantage had we indicated these three types of orthography using distinct styles
in MS Word.
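The four boundary symbols and the upper/lower case convention described in this section can be read back mechanically. The following is a minimal sketch in Python, assuming tokens have exactly the shape of the examples given above (KO=NO, NORA-sane, mi+KWO, sa~nuru). The function name `split_token` and the label strings are hypothetical illustrations; this is not the project's own conversion script, and the “not in text” comments attached in MS Word are not modelled here.

```python
import re

# Hypothetical labels for the boundary that PRECEDES a segment.
BOUNDARY = {
    "=": "particle boundary",
    "-": "inflection or compound-verb boundary",
    "+": "compound-noun or nominal/adjectival-prefix boundary",
    "~": "verbal-prefix or circumfix boundary",
}


def split_token(token):
    """Split a Word-stage token into (segment, script, boundary_before) triples.

    Upper case marks material written logographically (or not in the text at
    all); lower case marks material written phonographically.
    """
    result = []
    boundary = None
    # The capture group keeps the boundary symbols in the split output.
    for part in re.split(r"([=\-+~])", token):
        if part in BOUNDARY:
            boundary = BOUNDARY[part]
            continue
        if not part:
            continue
        script = "logographic_or_not_in_text" if part.isupper() else "phonographic"
        result.append((part.lower(), script, boundary))
        boundary = None
    return result


for example in ("KO=NO", "NORA-sane", "mi+KWO", "sa~nuru"):
    print(example, split_token(example))
```

Because upper case covers both logographic material and items not in the text, a real conversion would still need the Word comments (or, as the authors suggest in hindsight, distinct styles) to separate those two cases.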
XML Markup Conventions
The next stage in corpus building involves XML markup according to the guidelines of
the Text Encoding Initiative (TEI). The inventory of TEI coding is a small set of tags
which are used to enclose portions of text; text enclosed by tags can further be
characterized by various attributes, such as type, subtype, function, inflection, etc. The
inventory of coding elements and conventions of the TEI are under constant development
and improvement; they may be viewed at http://www.tei-c.org/. A major consideration for
adopting TEI technology and guidelines for the corpus was that such standards ensure
that the corpus we design will be long lasting, non-idiosyncratic, and updateable along
with future changes in technology. We attempted to follow the TEI guidelines as closely
as possible; however, we had to add some attributes for items we felt important for
markup and which were not available in TEI. For example, we felt it important to indicate
the inflection for all forms which can inflect (e.g., verbs, adjectives, copulas, auxiliaries)
and created the ‘inflection’ attribute to allow us to do this. By indicating the inflection,
we can easily compare all forms in any given inflection. The inflected form of the
predicate also indicates clause types, so we can investigate main clauses or subordinate
clauses based on the inflection of the predicate.
Most of the OJ texts were marked up using MS Word, as described above. These
were then converted into XML format.⁶ Our mark-up policies consist of ways to link the
original and romanized version of a text (section 1), to preserve orthographic conventions
(section 2), to encode information about words, morphemes, and parts of speech (section
3), to identify lexemes and morphemes (section 4), and to encode syntactic features
(section 5). As an example, we also present a fully marked up poem (section 6).
1. Original and Transliterated Text
In order to reflect the crucial distinction between logographic and phonographic writing
and to represent information about how words and/or morphemes were written in the
original script,⁷ we have adopted the following policies. First, for OJ texts we preserve
the original script together with the phonemically transcribed text. Thus, reference can be
made to the original script. This is done by placing the original script in an <ab>
(“anonymous block”) tag that is assigned the @type attribute with the value ‘original’. We
use “ojp” as the value for @xml:lang for texts written in Old Japanese and “ojp-Latn” for
the transliterated version of the OJ texts. The romanized version of the script follows in
its own <ab> tag with the @type attribute value ‘transliteration’. Line breaks (<lb>) are
also linked in the original and transliterated version using @xml:id and @corresp
attributes in order to make it easy to see how a line of text was rendered in the original or
how a line of text should be read. The @xml:id value contains the poem and line number;
“MYS.1.1” means that the poem is from the Man’yōshū (MYS), Book 1, poem number 1,
and orig_1 defines this as the first line break in the poem. This is illustrated in (3) using
an excerpt from the poem presented in (2) above.
⁵ The function of this prefix is unclear.
⁶ The scripts for converting our Word files into XML were written by James Cummings.
⁷ By ‘original script’ we mean the script employed in the critical edition upon which a text is
based.
(3)<ab type="original" xml:lang="ojp">
篭毛與
<lb xml:id="MYS.1.1-orig_1" corresp="#MYS.1.1-trans_1"/>
美篭母乳
<!-- … -->
</ab>
<ab type="transliteration" xml:lang="ojp-Latn">
kwo mo yo
<lb xml:id="MYS.1.1-trans_1" corresp="#MYS.1.1-orig_1"/>
mikwo moti
<!-- … -->
</ab>
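The linking of line breaks through @xml:id and @corresp can be followed programmatically. The sketch below, in Python with the standard library's `xml.etree.ElementTree`, pairs the text following each line break in the original with the text following the corresponding line break in the transliteration. It assumes the two <ab> blocks of (3) sit under a common parent (the wrapping <div> is an assumption made for this illustration) and that no default TEI namespace is declared; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<div>
<ab type="original" xml:lang="ojp">
篭毛與
<lb xml:id="MYS.1.1-orig_1" corresp="#MYS.1.1-trans_1"/>
美篭母乳
</ab>
<ab type="transliteration" xml:lang="ojp-Latn">
kwo mo yo
<lb xml:id="MYS.1.1-trans_1" corresp="#MYS.1.1-orig_1"/>
mikwo moti
</ab>
</div>"""


def text_after_breaks(ab):
    """Map each <lb> xml:id to the text that follows that line break."""
    return {lb.get(XML_ID): (lb.tail or "").strip() for lb in ab.iter("lb")}


def paired_lines(root):
    """Return (original, transliteration) pairs linked through @corresp."""
    blocks = {ab.get("type"): ab for ab in root.iter("ab")}
    orig = text_after_breaks(blocks["original"])
    trans = text_after_breaks(blocks["transliteration"])
    pairs = []
    for lb in blocks["original"].iter("lb"):
        target = lb.get("corresp").lstrip("#")  # "#MYS.1.1-trans_1" -> id
        pairs.append((orig[lb.get(XML_ID)], trans[target]))
    return pairs


print(paired_lines(ET.fromstring(SAMPLE)))
```

The text before the first <lb> in each block (篭毛與 / kwo mo yo) is not covered by this pairing, since it precedes any line-break element; it could be read from the `text` property of each <ab> if needed.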
2. Encoding Orthography
To preserve the three-way writing distinction discussed above, we use the character tag
<c> with the @type attribute. The possible values for @type are “phon” for items written
phonographically, “logo” for those written logographically, and “noLogo” for items not
orthographically represented in the original text. This is shown in (4) below with (a)
presenting the original text, the phonemic transcription, and glosses, and (b) showing the
markup.
(4) a. 我       屋戸   乃
       wa  ga   yadwo  no
       I   GEN  hut    GEN
       ‘of my hut’ (MYS 8.1606)
b.
<c type="logo"> wa </c>
<c type="noLogo"> ga </c>
<c type="logo"> yadwo </c>
<c type="phon"> no </c>
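Since the three values of @type carry different evidential weight, it can be useful to separate items by how they were written. The following Python sketch groups the <c> elements of (4b) by their @type value. The wrapping <seg> element and the function name `by_orthography` are assumptions made for this illustration only.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<seg>
<c type="logo"> wa </c>
<c type="noLogo"> ga </c>
<c type="logo"> yadwo </c>
<c type="phon"> no </c>
</seg>"""


def by_orthography(root):
    """Group transcribed items by how they were written in the original text."""
    groups = {"phon": [], "logo": [], "noLogo": []}
    for c in root.iter("c"):
        groups[c.get("type")].append(c.text.strip())
    return groups


groups = by_orthography(ET.fromstring(SAMPLE))
print(groups)
```

A search for case particles that are actually attested in writing could then, for instance, exclude everything in the "noLogo" group, in line with the caution expressed above about items known only from the reading tradition.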
3. Words, Morphemes, and Part of Speech
Words are enclosed in ‘word(-like)’ tags, <w>, and information about part of speech is
supplied by the @type attribute. The main word classes represented in this way are noun,
pronoun, adverb, verb, adjective, copula, adjectival noun, verbal noun and particle.
Complex words can consist of more than one word, forming a compound word, or
of one or more words followed or preceded by one or more morphemes. The
morpheme tag <m> is used for bound forms, and is then categorized by @type attributes with
the possible values of auxiliary,⁸ prefix, suffix, numeral, counter, and adjectival copula.⁹ The
grammatical system and terminology reflected in the coding is that of Frellesvig (2010).
Several of the parts of speech are further subcategorized, notably particles and
auxiliaries, which are given subtypes and functions. For example, ga is a word (<w>) of
the @type value “particle”, @subtype value "case" with the @function value “genitive”;
and -(i)ki is a morpheme (<m>) of the @type value “auxiliary” and with the @function
value “simple past”. A full, current list of the parts of speech, including subcategories,
which are distinguished throughout the corpus is available at the corpus website
(http://vsarpj.orinst.ox.ac.uk/corpus/).
Inflecting parts of speech, such as verbs, auxiliaries, extensions, copulas, and
adjectival copulas are supplied with information about their inflectional forms with the
@inflection attribute. For inflectional forms which are identical in shape, we do not specify
which inflecting form is shown even when the syntax allows us to choose one or the other.
For example, both the adnominal and conclusive forms of the verb yuk- ‘to go’ are yuku; it is
impossible to tell which inflection this is just by the shape of the word. The verb in this case
is assigned the @inflection value “adnconc” and not “adnominal” or “conclusive”.
Similarly, for conjugation classes which do not have a distinction between
conclusive and infinitive, we mark those categories with the @inflection value “infconc”,
see (5). The reason for marking only morphologically distinct categories also at the level
of individual conjugation classes is that it seems likely that there is a correlation between
the inflected form of a clause predicate and the marking of its arguments, and that it
therefore is important to distinguish between forms which are positively identifiable by
their shape and on the other hand forms which on the basis of their shape may be assigned
to either of two syncretic categories.
(5) adnconc  yuku
    infconc  ari
⁸ Auxiliaries are inflecting suffixes, corresponding largely to the jodōshi (助動詞) of traditional
Japanese grammar, e.g., negative -(a)zu or perfective -(i)te- and -(i)n-.
⁹ The adjectival copula is the inflectional morpheme which usually follows adjective stems,
with forms like conclusive -si, adnominal -ki, and infinitive -ku.
In (6) we give an example of markup of part of speech and inflection.
(6) a. 君       之   行    氣   長       成奴
       kimi     ga   yuki  ke   naga-ku  nari-nu
       my.lord  GEN  go    day  long     become-PERF
       ‘My lord, it has been a long time since you left’ (MYS 2.85)
b. <w type="noun"> kimi </w>
<w type="particle" subtype="case" function="gen"> ga </w>
<w type="verb" inflection="infinitive"> yuki </w>
<w type="noun"> ke </w>
<w>
<w type="adjective"> naga </w>
<m type="adjcop" inflection="infinitive"> ku </m>
</w>
<w>
<w type="verb" inflection="stem"> nari </w>
<m type="auxiliary" inflection="conclusive" function="perf"> nu </m>
</w>
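Once part of speech and inflection are recorded as attributes, all forms in a given inflection can be collected with a simple traversal. The sketch below, in Python, does this for the markup in (6b). The wrapping <s> element and the function name `forms_with_inflection` are assumptions made for this illustration; the attribute names and values are those shown in (6b).

```python
import xml.etree.ElementTree as ET

SAMPLE = """<s>
<w type="noun"> kimi </w>
<w type="particle" subtype="case" function="gen"> ga </w>
<w type="verb" inflection="infinitive"> yuki </w>
<w type="noun"> ke </w>
<w>
<w type="adjective"> naga </w>
<m type="adjcop" inflection="infinitive"> ku </m>
</w>
<w>
<w type="verb" inflection="stem"> nari </w>
<m type="auxiliary" inflection="conclusive" function="perf"> nu </m>
</w>
</s>"""


def forms_with_inflection(root, inflection):
    """Return (tag, type, text) for every <w> or <m> with the given @inflection."""
    hits = []
    for el in root.iter():
        if el.tag in ("w", "m") and el.get("inflection") == inflection:
            hits.append((el.tag, el.get("type"), el.text.strip()))
    return hits


root = ET.fromstring(SAMPLE)
print(forms_with_inflection(root, "infinitive"))
print(forms_with_inflection(root, "conclusive"))
```

Because clause type follows from the inflected form of the predicate, the same filter could serve as a first step towards selecting main or subordinate clauses.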
4. Lexeme and Morpheme Identification
Each distinct item (word or morpheme) in the corpus is assigned a unique ID number.
This has a number of advantages, in particular in making it possible to divorce searches in
the corpus from actual strings of text.
● Searches for inflecting words or morphemes in the texts will not be limited to the
actual inflected forms of an item. Thus, a search for the verb sin- ‘die’ will return
all the inflected forms of that verb. However, searches can also be modified to
give only a subset of forms, for example defined by specific inflected forms or
combination with specific auxiliaries.
● Searches across time for items which have changed shape as a result of sound
change will be straightforward. For example, as a result of sound change the verb
OJ kwopwi- has a number of different shapes through time, as outlined above (1),
and appears in texts from different periods in significantly different shapes
(kwopi-, kopi-, kowi-, koi-). With unique ID numbering, it is not necessary to
search for all of these shapes, but it is possible to search for all, or a specific set
of, occurrences of this verb through the corpus, regardless of the actual shape of
the verb at any particular stage.
● Searches are not contaminated by text strings which are identical to the intended
target of a search. For example, the verb ‘request, ask’ OJ kop- has a number of
forms which are segmentally identical with forms of ‘love’ from somewhere in
the first half of the EMJ period (for example infinitive kopi, kowi, koi). With
unique ID numbering, forms of one verb will not show up in searches for the
other verb.
In our current practice, unique ID numbers consist of the letter ‘L’ and a six-digit number.
They are assigned to a word (<w>) or morpheme (<m>) as an @ana attribute. For
example, the form nari-nu (cf. (6) above) is marked as shown in (7).
(7) <w>
<w ana="#L031317"> nari </w>
<m ana="#L000018"> nu </m>
</w>
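The advantages listed above follow from searching on the @ana value rather than on the text string. The Python sketch below contrasts the two kinds of search on a small constructed sample. The sample is not taken from the corpus: the tokens are invented for illustration, the ID L030731 for ‘love’ is the one shown in the lexicon entry in (9), and "L999999" is a placeholder standing in for the ID of kop- ‘request, ask’, which is not given in this paper. The sketch also assumes that each @ana holds a single pointer.

```python
import xml.etree.ElementTree as ET

# Constructed sample: two shapes of 'love' and one look-alike form of
# 'request, ask' carrying a placeholder ID (an assumption, see lead-in).
SAMPLE = """<text>
<w ana="#L030731"> kwopwi </w>
<w ana="#L030731"> kopi </w>
<w ana="#L999999"> kopi </w>
<w>
<w ana="#L031317"> nari </w>
<m ana="#L000018"> nu </m>
</w>
</text>"""


def occurrences(root, lexeme_id):
    """Return the surface shapes of every item pointing to the given lexicon ID."""
    ref = "#" + lexeme_id
    return [el.text.strip() for el in root.iter() if el.get("ana") == ref]


def string_matches(root, shape):
    """Naive search on the surface string, for comparison."""
    return [el.get("ana") for el in root.iter()
            if el.get("ana") and el.text.strip() == shape]


root = ET.fromstring(SAMPLE)
print(occurrences(root, "L030731"))   # both shapes of the same verb
print(string_matches(root, "kopi"))   # two different lexemes share this string
```

The ID-based search returns both historical shapes of one verb, while the string search for kopi mixes two different lexemes, which is the contamination described in the last bullet point above.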
The unique IDs are stored in a separate lexicon file, which is linked to the corpus and which
contains basic information about each word or morpheme, including variant shapes of a
form over time, its part of speech, conjugation class (where relevant), and a simple gloss.
The information currently contained within a simple lexicon entry is as shown in (8).
(8) Shapes: From the 8th century: kwopwi- > From 800: kwopi- >
From before 950: kopi- > From c. 950-1000: kowi- >
From c. 1100: koi-
Part of speech: verb
Conjugation class: upper bigrade (上二段)
Gloss: love
This information in (8) was extracted from an entry presented below in (9).
(9) <superEntry xml:id="L030731">
<entry>
<form type="stem">
<orth stage="I">kwopwi-</orth>
<orth stage="II">kwopi-</orth>
<orth stage="III">kopi-</orth>
<orth stage="V">kowi-</orth>
<orth stage="VII">koi-</orth>
<gramGrp>
<pos>verb</pos>
<iType type="UB"/>
</gramGrp>
</form>
<def>love</def>
</entry>
<entry>
<form type="noun">
<orth stage="I">kwopwi</orth>
<orth stage="II">kwopi</orth>
<orth stage="III">kopi</orth>
<orth stage="V">kowi</orth>
<orth stage="VII">koi</orth>
<gramGrp>
<pos>noun</pos>
</gramGrp>
</form>
</entry>
</superEntry>
Here, the <superEntry> element defines the @xml:id for the lexical item. The <entry>
element is used to indicate one or more related lexical entries. The <form> element can
be further specified with the @type attribute, which we currently only use for verbs to
indicate their “stem” and the derived “noun” form of a verb. Next, <orth> (orthography)
presents the shape of the form (e.g., kwopwi-) and also has the @stage attribute
corresponding to stages of phonological development in the pre-modern period.
Grammatical information is presented in <gramGrp>. This includes part of speech <pos>
and conjugation class <iType>. The example in (9) above is defined as @type="UB"
which stands for “upper bigrade”. Finally, the meaning is presented in the <def> tag;
where more than one meaning is possible, the element <sense> is also used.
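As a rough illustration of how a summary like (8) might be derived from an entry like (9), the following Python sketch reads a shortened copy of the entry with the standard library. The function name summarize and the dictionary layout are assumptions made for this example; the project's own extraction procedure is not described here.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Shortened copy of the entry in (9): only two of the five stages are kept.
LEXICON = """<superEntry xml:id="L030731">
  <entry>
    <form type="stem">
      <orth stage="I">kwopwi-</orth>
      <orth stage="VII">koi-</orth>
      <gramGrp><pos>verb</pos><iType type="UB"/></gramGrp>
    </form>
    <def>love</def>
  </entry>
</superEntry>"""

def summarize(xml_text):
    """Collect ID, shapes by stage, part of speech, conjugation class and gloss."""
    root = ET.fromstring(xml_text)
    form = root.find("./entry/form[@type='stem']")
    return {
        "id": root.get(XML_ID),
        "shapes": {o.get("stage"): o.text for o in form.findall("orth")},
        "pos": form.findtext("gramGrp/pos"),
        "class": form.find("gramGrp/iType").get("type"),
        "gloss": root.findtext("./entry/def"),
    }

summary = summarize(LEXICON)
print(summary)
```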
As the research of the VSARPJ project progresses, additional grammatical
information will also be entered into the lexicon. This will include information about the
possible argument realization patterns of a verb. In this way, the lexicon will also be an
important tool for organizing the results of our research as they appear.
Finally, although outside the scope of the VSARPJ project, it should be mentioned
that a lexicon linked to a text corpus by means of unique ID numbering has enormous
potential for enriching the field of Japanese lexicography.
5. Syntax
Syntactic information is encoded by means of a minimal inventory of constituents,
namely those of clause, <cl>, and phrase, <phr>. The @type attribute can be used to
identify the clause or phrase as being an argument (predicate selected) or adjunct (e.g.,
free adverbials).
Clauses can be embedded within other clauses as subordinate clauses. Adnominal, or
relative, clauses are embedded within phrases. Nominalized clauses are first wrapped as
clauses to show the clausal structure and then wrapped as phrases to put them on the same
level as noun phrases. Predicate-selected clauses (including but not limited to
complement clauses) are categorized by the @type attribute as arguments ("arg").
Phrases can be headed by adverbs and nominalized clauses, in addition to nouns.
Phrases are categorized by the @type attribute as arguments if they are clearly predicate
selected, and as adjuncts ("djunct") if they are clearly free adverbials or sentence adjuncts.
At this stage of markup, a large proportion of phrases are marked neither as arguments
nor as adjuncts, because their status is not entirely clear. Resolving the status of such
phrases, and other important issues such as the determination of whether categories may
be needed which are intermediary between the poles of argument and adjunct, or whether
argumenthood is a scalar property, are parts of the substantive research of the VSARPJ
project. The corpus will eventually reflect the results of this research.
The structure of both clauses and phrases is generally flat.¹⁰ The words which can
form predicates of clauses are verbs, adjectives, or copulas. Within a clause, the word or
words which form its predicate are identifiable by not being enclosed in phrase tags.
Topics and right dislocated elements are located outside of the clauses they relate to. (10)
exemplifies syntactic markup: (10a) shows a complex clause from the poem in (6a); (10b)
shows the topic pito pa; (10c) shows the relative clause a ga kwopuru modifying kimi;
and (10d) shows the right dislocated topic ware pa.
(10)
a. Complex clause
<cl>
<cl>
<phr type="arg"> kimi ga </phr>
yuki
</cl>
<cl type="arg">
<phr type="arg"> ke </phr>
nagaku
</cl>
narinu
</cl>
b. Topic
人 者 待跡 不来家留
pito pa matedo ko-zu-kyeru
person TOP wait.CONC come-NEG.INF-MPAST.ADN
‘Even though I wait for you, you do not come’ (MYS 4.589)
<phr> pito pa </phr>
<cl>
<cl> matedo </cl>
kozukyeru
</cl>
10 Within phrases constituency is usually predictable from the sequence of constituents, but if
not, constituency can be marked as necessary.
c. Relative clause
吾 戀 流君
a ga kwopuru kimi
I GEN love.ADN lord
‘my lord, whom I love’ (MYS 4.485)
<phr type="arg">
<cl>
<phr type="arg"> a ga </phr>
kwopuru
</cl>
kimi
</phr>
d. Right dislocated topic
野嶋 左吉 爾 伊保里 須 和礼 波
Nwosima ga saki ni ipori su ware pa
[place name] GEN cape DAT hut do.CONCL I TOP
‘me, I make a hut on the cape of Noshima’ (MYS 15.3606)
<cl>
<phr> nwosima ga saki ni </phr>
ipori su
</cl>
<phr>
ware pa
</phr>
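The convention that a clause's predicate is whatever is not enclosed in phrase tags lends itself to a simple query. The Python sketch below is an illustration only: it assumes that words are wrapped in <w> elements as in the project's full markup (the <c> layer is omitted for brevity), and the helper name predicate is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified version of the complex clause in (10a), with words wrapped in <w>.
CLAUSE = """<cl>
  <cl>
    <phr type="arg"><w>kimi</w><w>ga</w></phr>
    <w>yuki</w>
  </cl>
  <cl type="arg">
    <phr type="arg"><w>ke</w></phr>
    <w><w>naga</w><m>ku</m></w>
  </cl>
  <w><w>nari</w><m>nu</m></w>
</cl>"""

def predicate(cl):
    """Join the text of the <w> elements that are direct children of this clause.

    Words inside <phr> or inside an embedded <cl> are not direct children,
    so they are skipped automatically.
    """
    return ["".join(w.itertext()).strip() for w in cl.findall("w")]

root = ET.fromstring(CLAUSE)
outer = predicate(root)
inner = [predicate(c) for c in root.findall("cl")]
print(outer, inner)
```

Because the structure is flat, no tree walking is needed: one level of child lookup is enough to separate the predicate from its arguments.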
6. An Example of Full Markup
Finally in this section, we provide as an example the full markup of the text in (6a) above.
(11)
<ab type="original" xml:lang="ojp">
君之行
<lb xml:id="MYS.2.85-orig_1" corresp="#MYS.2.85-trans_1"/>
氣長 奴
<lb xml:id="MYS.2.85-orig_2" corresp="#MYS.2.85-trans_2"/>
山多都祢
<lb xml:id="MYS.2.85-orig_3" corresp="#MYS.2.85-trans_3"/>
迎加将行
<lb xml:id="MYS.2.85-orig_4" corresp="#MYS.2.85-trans_4"/>
待尓可将待
</ab>
<ab type="transliteration" xml:lang="ojp-Latn">
<s>
<cl>
<cl>
<phr type="arg">
<w type="noun" ana="#L042066">
<c type="logo">kimi</c>
</w>
<w type="particle" subtype="case"
function="gen" ana="#L000503">
<c type="logo">ga</c>
</w>
</phr>
<w type="verb" inflection="infinitive"
ana="#L031840">
<c type="logo">yuki</c>
</w>
</cl>
<lb/>
<cl type="arg">
<phr type="arg">
<w type="noun" ana="#L050033">
<c type="phon">ke</c>
</w>
</phr>
<w>
<w type="adjective" ana="#L007007">
<c type="logo">naga</c>
</w>
<m type="adjcop"
inflection="infinitive" ana="#L000033">
<c type="logo">ku</c>
</m>
</w>
</cl>
<w>
<w type="verb" inflection="stem"
ana="#L031317">
<c type="logo">nari</c>
</w>
<m type="auxiliary" function="perf"
inflection="conclusive" ana="#L000018">
<c type="phon">nu</c>
</m>
</w>
</cl>
</s>
<lb/>
<s>
<cl>
<phr>
<cl>
<cl type="djunct">
<phr type="arg">
<w type="noun"
ana="#L050034">
<c
type="logo">yama</c>
</w>
</phr>
<w type="verb"
inflection="infinitive" ana="#L031047">
<c type="phon">tadune</c>
</w>
</cl>
<lb/>
<w type="verb" inflection="infinitive"
ana="#L031722">
<c type="logo">mukape</c>
</w>
</cl>
<w type="particle" subtype="foc"
ana="#L000506">
<c type="phon">ka</c>
</w>
</phr>
<w>
<w type="verb" inflection="stem"
ana="#L031840">
<c type="logo">yuka</c>
</w>
<m type="auxiliary" function="conjectural"
inflection="adnconc"
ana="#L000002">
<c type="logo">mu</c>
</m>
</w>
</cl>
</s>
<lb/>
<s>
<cl>
<phr>
<cl>
<w type="verb" inflection="stem"
ana="#L031644">
<c type="logo">mati</c>
</w>
</cl>
<w type="particle" subtype="case"
function="dat" ana="#L000519">
<c type="phon">ni</c>
</w>
<w type="particle" subtype="foc"
ana="#L000506">
<c type="phon">ka</c>
</w>
</phr>
<w>
<w type="verb" inflection="stem"
ana="#L031644">
<c type="logo">mata</c>
</w>
<m type="auxiliary" function="conjectural"
ana="#L000002"
inflection="adnconc">
<c type="logo">mu</c>
</m>
</w>
</cl>
</s>
</ab>
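Markup of this kind can be queried for the arguments that accompany a given verb. The following Python sketch is a minimal illustration under several assumptions: it works on a shortened copy of the first sentence of (11) with the <c> layer removed, it treats a phrase without a case particle as "zero" marked, and the function names argument_marking and frames are hypothetical rather than part of the project's tooling.

```python
import xml.etree.ElementTree as ET

# First sentence of (11), shortened: the <c> elements are omitted.
SENTENCE = """<s>
  <cl>
    <cl>
      <phr type="arg">
        <w type="noun" ana="#L042066">kimi</w>
        <w type="particle" subtype="case" function="gen" ana="#L000503">ga</w>
      </phr>
      <w type="verb" inflection="infinitive" ana="#L031840">yuki</w>
    </cl>
    <cl type="arg">
      <phr type="arg"><w type="noun" ana="#L050033">ke</w></phr>
      <w>
        <w type="adjective" ana="#L007007">naga</w>
        <m type="adjcop" inflection="infinitive" ana="#L000033">ku</m>
      </w>
    </cl>
    <w>
      <w type="verb" inflection="stem" ana="#L031317">nari</w>
      <m type="auxiliary" function="perf" inflection="conclusive" ana="#L000018">nu</m>
    </w>
  </cl>
</s>"""

def argument_marking(node):
    """Describe an argument: 'clause' for <cl>, else its case function or 'zero'."""
    if node.tag == "cl":
        return "clause"
    for w in node.findall("w"):
        if w.get("type") == "particle" and w.get("subtype") == "case":
            return w.get("function")
    return "zero"

def frames(root):
    """Pair each predicate verb ID with the marking of its clause's arguments."""
    result = []
    for cl in root.iter("cl"):
        # A predicate word is a direct <w> child, or a stem <w> inside a wrapper <w>.
        verbs = [w for top in cl.findall("w")
                 for w in [top, *top.findall("w")] if w.get("type") == "verb"]
        args = [argument_marking(child) for child in cl
                if child.tag in ("phr", "cl") and child.get("type") == "arg"]
        result.extend((v.get("ana").lstrip("#"), args) for v in verbs)
    return result

verb_frames = frames(ET.fromstring(SENTENCE))
print(verb_frames)
```

The clause headed by the adjective naga is skipped here because the sketch looks only for words of type "verb"; widening the test to adjectives and copulas would be a one-line change.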
Conclusion
This small inventory of syntactic elements and conventions for their use, combined with
the material they can contain, will allow unique identification of at least all of these
elements or properties in the corpus: topics, right dislocated elements, focused elements,
noun phrase heads, particle scope, clause predicates (including analytic predicates), zero
marked arguments, topicalized arguments, relative order of case marked and zero marked
arguments (including ordering relative to focused elements), and clause types (main,
subordinate, adnominal, nominalized). Furthermore, all such elements and properties, as
well as combinations of them, and combinations with other items and properties coded in
the corpus will be searchable and extractable from the corpus. For example, we will be
able to use the corpus to extract all attested syntactic frames for individual verbs, within
individual stages of the language as well as across different stages. All of this is highly
relevant, not just to the VSARPJ research project, but also more generally and widely to
investigation of most features of pre-modern Japanese syntax.¹¹
11 Needless to say, these coding conventions easily lend themselves to the creation of equally
powerful corpora of modern Japanese.
Abbreviations
General
TEI      Text Encoding Initiative
VSARPJ   Verb Semantics and Argument Realization in Pre-modern Japanese
Grammatical Terms
ADN      Adnominal
CONC     Concessive
CONCL    Conclusive
DAT      Dative
EMPH     Emphatic
ETOP     Emphatic topic
HON      Honorific
NEG      Negative
OPT      Optative
RESP     Respect
TOP      Topic
Languages
EMJ      Early Middle Japanese
LMJ      Late Middle Japanese
MJ       Middle Japanese
NJ       Modern Japanese
OJ       Old Japanese
Texts
MYS      Man’yōshū
References
Frellesvig, Bjarke and Whitman, John. eds. 2008. Proto-Japanese: Issues and Prospects.
Amsterdam: John Benjamins.
Frellesvig, Bjarke. 2010. A History of the Japanese Language. Cambridge: Cambridge
University Press.
Frellesvig, Bjarke; Horn, Stephen Wright; Russell, Kerri L.; Sells, Peter. The Oxford Corpus of
Old Japanese. http://vsarpj.orinst.ox.ac.uk/corpus/corpus.html.
Levin, Beth and Hovav, Malka Rappaport. 2005. Argument Realization. Cambridge: Cambridge
University Press.
Text Encoding Initiative. (n.d.) P5: Guidelines for Electronic Text Encoding and Interchange.
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index-toc.html.
Chung-Hwa Buddhist Journal (2012, 25:105-128)
Taipei: Chung-Hwa Institute of Buddhist Studies
中華佛學學報第二十五期 頁 105-128 (民國一百零一年),臺北:中華佛學研究所
ISSN:1017-7132
The XML-Based DDB: The DDB Document Structure
and the P5 Dictionary Module; New Developments of
DDB Interoperation and Access
Charles Muller (University of Tokyo)
Kiyonori Nagasaki (International Institute for Digital Humanities, General Incorporated Foundation)
Jean Soulat (Independent scholar)
Abstract
This paper has three parts. The first part, by A. Charles Muller,¹ consists of a comparative
analysis of the DDB structure with that of the Dictionary Module in the current TEI P5
recommendations. The second and third parts are short summaries of the recent
applications offering enhanced access to and usage of the DDB, created by Kiyonori
Nagasaki² and Jean Soulat.³
Keywords:
Lexicons, XML, TEI, Web API, Interoperability
1 A. Charles Muller teaches Buddhism, East Asian thought, and a little bit of XML at the
University of Tokyo. He is the founder and chief editor of the Digital Dictionary of Buddhism and its
companion Chinese-Japanese-Korean-Vietnamese/English Dictionary (CJKV-E). He is also the
founder and managing editor of the scholarly network H-Buddhism
(<http://www.h-net.org/~buddhism>). His primary fields of research are Korean Buddhism and East Asian
Yogācāra/Tathāgatagarbha thought, along with occasional forays into Zen, Confucianism, and
Daoism. A listing of his books and articles on these topics can be accessed through his web site,
Resources for East Asian Language and Thought (<http://www.acmuller.net/index.html>).
2 Kiyonori Nagasaki (永崎研宣) has an M.A. in Buddhist Studies from Tsukuba University,
and is best known for his work as the primary developer handling the <SAT Taishō
Database> and <INBUDS Database> in Tokyo. He has developed a range of support structures
to provide interoperation between SAT and the DDB, as well as INBUDS. He also wrote the
Perl code for our “Feedback” option.
3 Jean Soulat is a telecom engineer with a personal interest in Buddhism and Chinese Culture.
He has worked in the area of computer networks since the early days of the French public
data network and then with different large scale networking and IT programs. He has created
the application tool named Smarthanzi (<http://www.smarthanzi.net>) for looking up Sinitic
words and characters in East Asian texts. Based on Smarthanzi, he has also created a
specialized application for the DDB, called DDB Access
(<http://download.smarthanzi.net/ddbaccess>), which adds extensive functionality to the standard DDB lookup.
以可擴展標記語言(XML)為基礎的電子佛教字典
(DDB): DDB文件結構與P5字典模組; DDB 相互
操作與取用技術的新發展
Charles Muller (東京大學)
永崎研宣 (一般財団法人人文情報學研究所)
Jean Soulat (獨立學者)
摘要
此篇文章為三部分。第一部分由Charles Muller所寫,以現行的TEI P5所建議的字典
模組,對DDB結構之比較分析。第二與第三部分則是由Kiyonori Nagasaki 與 Jean
Soulat 所寫,簡短地概述近來提供加強對於DDB的取用技術與使用。
關鍵詞:詞典、XML、TEI、網路應用程式介面、相互操作
The DDB Document Structure and the P5 Dictionary
Module
Charles Muller
No doubt many of us who began our engagement in the development of
web-based canonical collections, online databases, and various other research tools
related to Buddhist Studies and East Asian studies at the time of the inception of the
WWWeb (circa 1994-95) look back in sheer amazement at the fact that almost fifteen
years have passed since we made our most rudimentary stabs at developing these
materials. At that time, Unicode, XML, Yahoo, Google, Internet Explorer, and scores of
other now-commonplace Internet tools were yet to be heard of. In a short decade and a
half, our way of doing research — and especially textual research — has been radically
transformed.
Because of this radical change, young scholars coming into our field today need an
entirely different set of skills for finding and organizing information. On the other hand,
they no longer need, upon their departure from graduate school, to begin to try to figure
out how they are going to afford to buy their first printed Taishō canon, and all the
dictionaries and other reference tools needed to work with East Asian Buddhist texts.
Most of these are now available digitally, and online, in one format or another. And these
young scholars will have far more than simply the printed Taishō, Zokuzōkyō and other
smaller canonical collections presently available at their disposal, as new, heretofore
unavailable materials are being made searchable and downloadable — a main case in
point being that of the newly developing Chan Texts Database, which will make available
a variety of Chan texts, along with Dunhuang materials which were almost impossible to
get one's hands on before this. And of course, for working with these texts, there is the
DDB.
As has been explained over the years in numerous other presentations and project
reports, I began my compilation of what turned out to be the DDB in my early days in
graduate school (1986), having become aware of the incredible dearth of adequate
lexicographical and other reference works in the English language for the textual scholar of
East Asian Buddhism in particular, and East Asian philosophy and religion in general. I
worked at compiling terms for about ten years, and in 1995, shortly after the birth of the
WWWeb, uploaded the collection that I had gathered up to that time to my first web
site, and the rest is history.⁴
4 For various accounts of the development of the DDB up to its present state, please see the
bibliography, which provides a fairly complete listing of presentations, both published and
unpublished.
Suffice it to say that the DDB has become the de facto choice among reference tools
for young Western scholars doing work involving East Asian Buddhism. It is introduced
as a primary reference work in all major North American universities that have programs
dealing with East Asian Buddhism; it is supported in terms of content and programming
by more than sixty scholars, many of whom are recognized as leading figures in their own
sub-areas of Buddhist Studies or Information Technology; and it is presently subscribed
to by twenty-eight university libraries.⁵ It is also now accessible through online canonical
text databases such as that of the SAT Taishō Database,⁶ and is included in various
Han-character-based lookup tools, including Smarthanzi,⁷ the WWWebJDic Server,⁸ and
Tangorin.⁹
In prior papers dealing with the DDB, I have explained various aspects of the project,
including history, design, collaboration strategies, XML structure, and so forth.¹⁰ Here, I
would like to focus on a specific issue with the present XML structure, paying special
attention to its relation with the TEI P5 Dictionary Module.
At the conference where this paper was originally presented (which is the basis for
the present volume), a significant portion of the presentations dealt with XML in one way
or another. What most of them had in common, however, was their presentation of XML
as a way of marking up pre-existent materials, whether they be pre-existent canonical
collections, lexicons, or whatever. The DDB was unusual among the presented projects at
this venue in that it was one of the very few where XML was shown as a framework for
the development of a new data set from the ground up, and which, working through
XSLT, provides the systematic structure for an online database-reference resource.
Indeed, among online academic reference tools of its kind, the DDB as a fully XML
structured resource is unusual, since most online reference resources tend to be run from a
more traditional database structure.
The original choice of XML to structure the DDB data is basically an accident of
history, related to the background of the people from whom I received my earliest
technical advice. Most important in this regard is Christian Wittern, who discovered my
earliest, hard-linked HTML version of the DDB on the Web sometime in 1995 or 1996.
He applied a basic SGML structure to the data, where the tags referred to elements of the
content and document structure, rather than being the mere style commands of HTML.
Christian sent me a copy of his SGML-restructured data, along with SoftQuad Panorama,
5 See <http://www.buddhism-dict.net/ddb/subscribing_libraries.html>.
6 See <http://21dzk.l.u-tokyo.ac.jp/SAT/ddb-sat2.php>.
7 Also available in the Windows desktop application DDB Access; both of these are available
at <http://www.smarthanzi.net>; to be discussed by its developer below.
8 <http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1C>
9 <http://tangorin.com>
10 See JODI (2002).
a freeware viewer for SGML. I knew nothing of SGML at this time, but could see that it
could be quite useful to mark up the data with content-meaningful tags as opposed to
simple HTML style markers. Before long, the news of the impending release of the new
XML standard to replace SGML had many people excited, and it seemed as if major
software companies intended to support it, so I converted the DDB to XML, and stored it
that way during the next couple of years, running an array of MS-Word macros to
generate new HTML files periodically, uploading these to my web site.
But generating files this way each time was a convoluted and time-consuming
process. Around 2000, I knew that people were beginning to publish XML materials on
the web via XSL, and that more and more major markup and publication projects were
turning to XML. But there were virtually no examples of serious real-world
implementation other than in brief W3C explanatory materials. And without any kind of
precedent available from which to learn, my programming skills were entirely insufficient
for trying to implement raw XML on the web on my own.
At the same time, there were really no major data sets like my own readily available
for testbed purposes, so newly appearing XML software development companies had no
way to thoroughly test their new applications on actual large and complex data sets. On
my website at the time, I had a description of the content and structure of my data (which
was, importantly, multilingual, including Chinese, Korean, and Japanese scripts, and a
fairly large range of diacritical characters), and I was contacted by a few companies,
including Microsoft and Altova (XMLSpy), who asked to use my data for testing of their
XML software then under development.¹¹
After having been contacted by these companies, it occurred to me that there may be
other individual XML developers who could take advantage of the DDB data for their
own purposes, and at the same time help me to begin to take proper advantage of the
XML structure and begin delivering the data on the web in real time, through XSL and
whatever else might be required. I posted a message on the Mulberry XSL list inquiring
as to whether anyone was interested in working with my data in this way. Within a week,
I received a response from Michael Beddow, who, in connection with work he was doing
on the web-based version of the Anglo-Norman Dictionary,¹² expressed a willingness to
try to get the XML-DDB up and running on a server. In a very short time, he
accomplished this to a level far beyond what I could have ever hoped. Since both Michael
11 I agreed to both of these requests. From Microsoft, I never even received so much as a
thank-you note. Altova gave me one free upgrade (to my already-purchased license), but then forgot
about me, demanding that I pay the full price for their next Enterprise version. This turned out
to be a major motivation toward my efforts to learn Emacs. Also, luckily, not too much later,
<oXygen/> made its appearance on the scene, with its much more reasonably priced,
fully-featured XML editor, and much more humane support staff.
12 <http://www.anglo-norman.net/>
and I have recounted the main points of this landmark task in some detail in the past,¹³ I
will not go into great detail retelling this stage of the process, except to say that Michael
is still providing the basic technical support for the project, including security as well as
the basic delivery of the data, for which I, and thousands of researchers of Buddhism
around the world can be eternally grateful.
This simple but elegant XML/Perl/XSL delivery system developed by Michael has
functioned in the same way, basically unchanged for almost a decade, and technically
speaking, there have been no special demands or changes to our system that XML/XSL
can't deal with, so although the suggestion of changing over to a traditional database
system has been made to me from time to time, I have never felt the need to give it
serious consideration. Although a database setup such as MySQL may be a bit faster in
retrieving entries, having the data in XML format allows me to fully integrate it with the
rest of my work on my desktop. Since I do virtually all of my scholarly research and
translation in XML, and maintain various related data sets in XML or plain text format,
having the DDB in XML while using the same basic tag structure for the rest of my
documents makes it very easy to move things back and forth.
Having mentioned the fact that I use the same basic tagging structure in the rest of
my work, I would like, from here, to deal with a technical aspect of the project that I have
touched on briefly from time to time, but have never really worked through in detail: that
is the relationship of the structure of the DDB to the TEI document model. I have been
using TEI for my writing and most of the other phases of my work for about eight years
now. Also, the two major technical contributors to the DDB project, Christian Wittern
and Michael Beddow, are persons well-versed in the development and implementation of
TEI. Since TEI has a subset of tags specifically designed for the structuring of lexical
materials, it might be reasonable to assume that the DDB would be a fully TEI-based
project.
It is to a significant degree. Since I have been using TEI in my work for several
years, it has been the case that when I have needed a new tagging structure for the DDB, I
have always first checked the TEI tag set to search for an appropriate tag. Almost always
finding one, I have done my best to implement new elements in the DDB according to
TEI hierarchical rules and with the recommended attributes. Thus, the content of the
<sense> nodes in the DDB (discussed in further detail below) is fully TEI(P4)-compliant.
This covers many sub-structures, including <list>, <biblStruct> and many other basic
prose structures necessary for writing short dictionary entries, as well as encyclopedic
entries—basically replicating the rules of what would be allowable inside the TEI <p>
element.
13 See the JODI (2002) article, ibid. I have also discussed Michael's role in the project in a few
other articles.
For the nodes above and outside <sense>, however, the structure of DDB entries is
somewhat different from the sort of thing that one would build if one were to start from
the ground up with the present TEI P5 Dictionary Module. When the XML for the DDB
was first set up, there was no special intention to reject the TEI (at that time P4) structure.
Christian Wittern and I sat down at a conference one time and tried to write a tag
structure that best fit that of the DDB at the time. At this time I knew nothing of TEI, and
Christian was just getting seriously involved in this Initiative. Thus, while this initial
structure was informed by TEI concepts, it tended to conform more closely to the actual
structure of the DDB, rather than trying to force a full TEI framework.
The basic structure of a DDB entry is currently like this:
<entry> (one dictionary entry)
<hdwd> (Chinese logographic headword)
<pron_list> (grouping the pronunciations into a separate node)
<pron> (pronunciations of the headword in various East Asian languages, in roman script
as well as native syllabaries)
<pron>
<pron>
...
</pron_list>
<sense_area> (grouping semantic/content information)
<trans> (a short, primary translation or meaning of the head word)
<sense> (explanatory portion of the headword, for which there is usually more than one)
<sense>
...
</sense_area>
<dictref> (list of references to entries for the term in other major reference works)
<dict>
<dict>
...
</dictref>
</entry>
Filled out with attributes and data, a relatively short sample entry looks like this:
<entry ID="b9403" added_by="cmuller" add_date="1993-09-01"
update="2009-11-25" rad="金" radval="08" radno="167" strokes="12">
<hdwd>鐃</hdwd>
<pron_list>
<pron lang="zh" system="py" resp="c.wittern">naó</pron>
<pron lang="zh" system="wg" resp="cmuller">jao</pron>
<pron lang="ko" system="hg" resp="cmuller">요</pron>
<pron lang="ko" system="mc" resp="cmuller">yo</pron>
<pron lang="ko" system="mr" resp="cmuller">yo</pron>
<pron lang="ja" system="kk" resp="cmuller">ドウ</pron>
<pron lang="ja" system="hb" resp="cmuller">nyō</pron>
<pron lang="vi" system="qn" resp="daouyen">nao</pron>
</pron_list>
<sense_area>
<trans resp="cmuller" rend="hide">a <term lang="en">hand-bell</term></trans>
<sense resp="cmuller" ref="Yokoi,Hirakawa">Cymbals.(Skt. <term lang="sa-mw"
n="11740">tūrya</term>) <bibl type="canonlink">法華經 <xref
canonref="http://21dzk.l.u-tokyo.ac.jp/SAT/T0262_,09,0009a11:0262_,09,0009b11.html">T
262.9.9a13</xref></bibl> </sense>
</sense_area>
<dictref>
<dict><title>Zengaku daijiten (Komazawa U.)</title><page>989b</page></dict>
<dict><title>Japanese-English Zen Buddhist Dictionary
(Yokoi)</title><page>512</page></dict>
<dict><title>Ding Fubao</title><page/></dict>
<dict><title>Buddhist Chinese-Sanskrit Dictionary
(Hirakawa)</title><page>1193</page></dict>
<dict><title>Bukkyō daijiten (Mochizuki)</title><page>(v.16)1307b,2595b,4137a</page></dict>
<dict><title>Bukkyō daijiten (Oda)</title><page>1370-1</page></dict>
</dictref>
</entry>
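To give a sense of how an entry of this shape can be consumed by a program, here is a small Python sketch that groups the pronunciations of a shortened copy of the sample entry by language and transcription system. It is an illustration only and not the DDB's own Perl/XSL delivery code; the function name pronunciations is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample entry: only the headword and four readings are kept.
ENTRY = """<entry ID="b9403">
  <hdwd>鐃</hdwd>
  <pron_list>
    <pron lang="zh" system="py" resp="c.wittern">naó</pron>
    <pron lang="zh" system="wg" resp="cmuller">jao</pron>
    <pron lang="ko" system="hg" resp="cmuller">요</pron>
    <pron lang="ja" system="kk" resp="cmuller">ドウ</pron>
  </pron_list>
</entry>"""

def pronunciations(entry_xml):
    """Return the headword and a {lang: {system: reading}} mapping."""
    entry = ET.fromstring(entry_xml)
    by_lang = {}
    for pron in entry.findall("pron_list/pron"):
        by_lang.setdefault(pron.get("lang"), {})[pron.get("system")] = pron.text
    return entry.findtext("hdwd"), by_lang

headword, prons = pronunciations(ENTRY)
print(headword, prons)
```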
We will get into the treatment of comparative issues in detail below, but just to
provide the reader with some context, it is probably useful to have some idea of the basic
P5 recommendation for dictionary structures, which, as provided on the TEI web site,¹⁴ is
like this:
<entry>
<form>
<orth>disproof</orth>
<pron>dis"pru:f</pron>
</form>
<gramGrp>
<pos>n</pos>
</gramGrp>
<sense n="1">
<def>facts that disprove something.</def>
</sense>
<sense n="2">
<def>the act of disproving.</def>
</sense>
</entry>
As we can see, the fundamental elements <entry>, <pron>, and <sense> are used in the
DDB for the same purposes, and with basically the same kind of hierarchical structure.
The most glaring difference is seen in the DDB element <hdwd> (head word), which is
idiosyncratic, an odd tag that I created during a short period in which the DDB was stored
in a mixed structure of XML and HTML tags, and the attempt to use the tag <head> for
head words produced obvious problems in HTML. It would have been better to get rid of
this at an earlier stage, but opportunities were missed, so it remains here, embedded at the
core of the DDB. In P5, the corresponding tag would probably be <orth>. Beyond this,
the other major difference is the presence in the DDB of the node <dictrefs>, within
which information is contained regarding references on the same term in other
dictionaries.
Before we address specific issues of tags and their structure, a word regarding the
nature of the data set itself is in order. That is, the East Asian notion that is translated into
English as “dictionary” — 辭典 (Ch. cidian; K. sajeon; J. jiten) sometimes refers to
14 <http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-entry.html>
something that is basically the equivalent of a Western dictionary. But it is also often
something quite different, in the sense that it may well end up containing entries that are
more like those of an encyclopedia in terms of length and complexity. And there is less
linguistic-oriented information (such as grammatical forms and so forth).
Another distinctive characteristic for East Asian works of this type is that with
Sinitic Buddhism being a pan-East Asian phenomenon, the Chinese logographic head
words have distinct pronunciations in Mandarin, Korean, Japanese, and Vietnamese
(including variant readings within these languages), with these being represented in both
native syllabaries and Western romanization systems. Since the TEI dictionary module is
basically constructed upon the Western model, problems will be evident from the start.
Acknowledging these points, let us try to see what would be involved in bringing the
DDB structure in line with TEI P5. For the moment, we will leave the level of attributes
aside, focusing on elements.
The first obvious change would be that of replacing <hdwd> with <orth>. This would
not be terribly difficult, since a global replacement throughout the data and XSL files
should not result in any major problems. Next, removing the <pron_list> wrapper from
the level above the <pron> elements would not pose major problems at the XML level,
but it would require some degree of rewriting of the style sheet; the same would be true
for removing the <sense_area> wrapper from around the senses. The TEI element
<gramGrp> is not relevant to the DDB.
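The three element-level changes just described (renaming <hdwd>, dropping the <pron_list> wrapper, and dropping the <sense_area> wrapper) can be sketched as a small Python transformation. This is a hedged illustration, not a conversion script used by the DDB: the function name to_tei_like is hypothetical, grouping <orth> and <pron> under <form> is an assumption modelled on the P5 sample entry, and <trans> is simply carried over unchanged.

```python
import xml.etree.ElementTree as ET

# Compact, shortened DDB-style entry used as input for the illustration.
DDB_ENTRY = (
    '<entry ID="b9403"><hdwd>鐃</hdwd>'
    '<pron_list><pron lang="zh" system="py">naó</pron>'
    '<pron lang="ko" system="hg">요</pron></pron_list>'
    '<sense_area><trans>a hand-bell</trans><sense>Cymbals.</sense></sense_area>'
    '</entry>'
)

def to_tei_like(entry_xml):
    """Rename <hdwd> to <orth> and unwrap <pron_list> and <sense_area>."""
    old = ET.fromstring(entry_xml)
    new = ET.Element("entry", dict(old.attrib))
    # Assumption: <orth> and <pron> are grouped under <form>, as in the P5 sample.
    form = ET.SubElement(new, "form")
    ET.SubElement(form, "orth").text = old.findtext("hdwd")
    for pron in old.findall("pron_list/pron"):
        form.append(pron)
    # Children of <sense_area> move up one level; the wrapper itself is dropped.
    for child in list(old.find("sense_area")):
        new.append(child)
    return new

tei = to_tei_like(DDB_ENTRY)
print(ET.tostring(tei, encoding="unicode"))
```

As the text notes, the data side of such a change is simple; the larger cost lies in the style sheets that expect the old wrappers.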
A major consideration would be the conversion of the <dictrefs> area. This is an
idiosyncratic component of the DDB, since it is not customary for dictionaries in
general — whether they be Eastern or Western — to provide a list of references in other
dictionaries or encyclopedias. Among the child nodes offered in TEI entry/dictionaries,
the only thing that appears to come close to this is the element <xr> (x-reference). But if
we used this, it would probably be more appropriate to use it in the place of the <dict>
reference, rather than the wrapper <dictrefs>.
With this kind of list, which consists essentially of bibliographical references, it would be
helpful for styling and other programming purposes to have a wrapper for this list of <xr>
elements, something playing a role similar to <listBibl>. Actually, the inclusion of
<listBibl>, or something like it, as a possible child of <entry> would be very helpful in
this case. Then we could convert the <dictrefs/dict> structure into a basic TEI
<listBibl/bibl> tree (<listBibl> does appear under <xr>, so this would be another possible
route). But again, this is a special dimension of the DDB, and not something that one
would see as needed for dictionary entries in general.¹⁵
15 In this regard, the DDB is often more like an encyclopedia than a dictionary, but the TEI does
not at the moment have an encyclopedia module. A discussion of encyclopedias on TEI-L
(<http://listserv.brown.edu/archives/cgi-bin/wa?A0=TEI-L>) at the end of 2009 concluded with
the recommendation for the encyclopedia maker to structure his data with a series of nested
<div> tags with various attributes making the content distinctions.
A provisional rewrite of the DDB entry structure, based on the above changes,
would now look something like this (shortening some sections for the sake of readability):
<entry ID="b9403" added_by="cmuller" add_date="1993-09-01" update="2009-11-25"
       rad="金" radval="08" radno="167" strokes="12">
  <form>
    <orth>鐃</orth>
    <pron lang="zh" system="py" resp="c.wittern">naó</pron>
    <pron lang="ko" system="hg" resp="cmuller">요</pron>
    <pron lang="ja" system="kk" resp="cmuller">ドウ</pron>
  </form>
  <sense type="brief"><def>a <term lang="en">hand-bell</term></def></sense>
  <sense type="normal" resp="cmuller" ref="Yokoi,Hirakawa">Cymbals. (Skt. <term lang="sa-mw" n="11740">tūrya</term>) <bibl type="canonlink">法華經<xref canonref="http://21dzk.l.u-tokyo.ac.jp/SAT/T0262_,09,0009a11:0262_,09,0009b11.html">T 262.9.9a13</xref></bibl></sense>
  <xr>
    <bibl><title>Buddhist Chinese-Sanskrit Dictionary (Hirakawa)</title><biblScope type="pages">1193</biblScope></bibl>
    <bibl><title>Bukkyō daijiten (Mochizuki)</title><biblScope type="pages">(v.16)1307b,2595b,4137a</biblScope></bibl>
    <bibl><title>Bukkyō daijiten (Oda)</title><biblScope type="pages">13701</biblScope></bibl>
  </xr>
</entry>
The next level of conversion, that of attributes, gets more complicated, as the DDB
utilizes a number of attributes that are not contained either as attributes or elements in the
dictionary module or elsewhere in the TEI P5 tag set, as far as I can determine. The
character of the attributes currently used in the DDB can serve to draw our attention to
some of the distinctive aspects of the DDB mentioned above. That is, rather than being
the markup of some pre-existent lexicon, the DDB is a new work in progress. To properly
embed information related to the development of each entry, the attributes attached to our
<entry> tag contain several pieces of information that provide important history regarding
the entry, as well as categorizing and sorting information. These include, at the entry level,
@added_by, @add_date, and @updated. Interestingly, the TEI has always shown concern
about this kind of documentation, as these kinds of elements have always been part of
TEI document headers. But as far as I can tell, there is no mechanism for recording this
kind of information at the level of entries or entry child nodes in a reference work. So if
we tried to convert to P5, these would need to be added to a customized schema.
Similarly, at the <sense> level, the @resp, @source, and @ref attributes are critical to the
DDB for keeping clear records of sources, contributions, responsibility, and related
references. Unless I have missed some alternative way of dealing with these in the
Guidelines, it seems that the committee that developed the dictionary module had in mind
the markup of pre-existent dictionaries, rather than the collaboration-based creation of a
new dictionary, when they created this attribute structure.
Would it be worth the effort to convert to P5? The thought of going through this
present comparison of the DDB entry structure with that of P5 has been on my mind for
some time. Why would one go through the trouble of making this kind of major
conversion in an XML structure that is working fine as it is?
There would be a few significant advantages to doing this. The first reason is that, as
mentioned above, most of the rest of the academic research and writing that I am doing is
being composed in TEI P5. Having a DDB structure that is fully TEI compatible would
allow me to freely copy data back and forth without generating non-validity problems at
either end. Second, this would allow the usage of the same basic style sheets for all of my
projects. Third, full TEI compatibility would allow me to take advantage of other tools
produced by members of the TEI community, including its schemas, and CSS/XSL sheets.
There are, however, a couple of significant drawbacks. First, it would require a major
reworking of the data and the style sheets. Second, it would entail a
reworking of scores of MS-Word macros that have been the background for the actual
production of the data for more than a decade. So careful consideration is needed before
taking the leap.
Interoperation I: The DDB and SAT
Kiyonori Nagasaki
The digitization of the resources for Buddhist studies—as well as those for other fields of
academic inquiry—has now been in progress for a few decades. As a result of the diligent
efforts of those engaged in various digitization projects, researchers of Buddhism now
have access to a wide range of electronic materials, a state of affairs that serves to
enhance the efficiency, accuracy, and overall quality of their research. The emergence of
the Web environment has been the fundamental catalyst allowing a wide range of new
ways of storing, representing, and sharing resources. Recently, the next evolution of
the Web—known as Web 2.0—has brought about a transformation in the delivery and
handling of digital scholarly resources for all kinds of research. Most important here is
the availability of the AJAX technology and Web API, which have enhanced the ways of
sharing and delivering Web resources by leaps and bounds. The dissemination of cloud
computing technology will further serve to support these kinds of developments.
Even only a decade or so ago it was taken for granted that for complementary digital
resources—such as text and lexicon—to work together effectively, they had to be
integrated one way or another into a single database format. While this may well
still happen in a case where both resources are developed by the same individual or
within a single project, if the resources were developed by separate entities, the
combining of both into a single structure would usually entail the loss of independence or
identity for one party or the other. However, in recent years, the situation has changed
significantly, since, by adopting AJAX, Web API, and similar technologies, those who
have been developing Web-based resources in the Humanities will be able to
cooperate/interoperate between projects while each project maintains full independence.
The prominent example to be offered here is the recent interoperation developed
(starting in 2008) between the SAT Taishō Database¹⁶ and the DDB and INBUDS (Indian
and Buddhist Studies Treatise Database)¹⁷ in the Web environment using AJAX
technology. Since April of 2008, the SAT Web service has provided a function
whereby, when the user selects a portion of kanbun text from the Taishō canon with the mouse,
a list of the terms within that text that are available in the DDB is generated alongside
the text, along with English head words and links into the DDB itself. We are continuing
to enhance various aspects of this function.
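The core of such a term list can be illustrated with a short Python sketch. It is not the SAT implementation, whose internals are not described here; it simply assumes that the DDB headwords are available as an in-memory table and scans every substring of the selection against it. The function name, the max_len limit, and the toy glosses are all hypothetical.

```python
def ddb_terms_in_selection(selection, headwords, max_len=8):
    """Return (term, gloss) pairs for each headword found in the selection,
    in order of first appearance, without duplicates."""
    found = []
    seen = set()
    for start in range(len(selection)):
        last_end = min(len(selection), start + max_len)
        for end in range(start + 1, last_end + 1):
            candidate = selection[start:end]
            if candidate in headwords and candidate not in seen:
                seen.add(candidate)
                found.append((candidate, headwords[candidate]))
    return found


# Toy headword table for illustration only; not an extract of DDB data.
HEADWORDS = {
    "菩薩": "bodhisattva",
    "菩提": "enlightenment",
    "般若": "wisdom",
    "般若波羅蜜多": "perfection of wisdom",
}

terms = ddb_terms_in_selection("菩薩行深般若波羅蜜多", HEADWORDS)
for term, gloss in terms:
    print(term, gloss)
```

Because every matching substring is reported, a short term such as 般若 is listed alongside the longer compound that contains it, which matches the behavior described above of listing all available terms rather than choosing among them.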
16 <http://21dzk.l.u-tokyo.ac.jp/SAT/search.php>
17 <http://21dzk.l.u-tokyo.ac.jp/INBUDS/search.php>
Since the presentation of this application at the Chan texts conference in
Oslo in October 2009, SAT has been providing further new functions implemented with
AJAX and Web API. Previously, users could search related articles from the INBUDS
database (maintained by the Japanese Association of Indian and Buddhist Studies¹⁸), but
were only able to obtain basic bibliographical reference information. Under this new
function, users are also able to obtain PDF files of the articles (when PDFs are available)
by clicking on the PDF icon displayed at the end of each line of the search results.
Clicking on the icon opens up a page within the CiNii¹⁹ service that includes a link to the
PDF file. This PDF file service is provided for the whole academic community, not only for
Buddhist Studies or the Humanities. CiNii distributes its bibliographic data as a PDF
file through its Web API service.
INBUDS has taken optimal advantage of this public service by providing a Web API
that allows other Web services to retrieve the INBUDS search results, including their
PDF file information. The SAT Web service has implemented this, but it is important to
know that every scholarly web service is welcome to enrich itself by taking advantage of
CiNii's offering. Furthermore, the SAT Web service has been contributing to
CiNii's efforts by providing some Web APIs. SAT is also planning to provide more
efficient APIs so that other Buddhist service providers can also offer better
services. By adopting AJAX and Web API, each project/service can enrich other services,
while maintaining its independence as an individual project.
In this kind of Web environment, we will have the opportunity to work together not
only as isolated contributors of data but also as individual and cooperative service
providers so that researchers in our field can benefit from more efficient services. By so
doing, our study and inner space will be greatly enriched.
18 <http://www.jaibs.jp>
19 CiNii (Scholarly and Academic Information Navigator, pronounced like “sigh-knee”) is a database
service maintained as a Japanese government project by the National Institute of Informatics,
which enables searching of information on academic articles published in academic society
journals or university research bulletins, or articles included in the National Diet Library's
Japanese Periodicals Index Database. <http://ci.nii.ac.jp/en>
1. DDB Parsing From the SAT Database
1.1. SAT Text View
The user opens up the desired text by scrolling or by computer search:
1.2. Selecting and Generating a Word List
One then selects a portion of text with the mouse, upon which the DDB words contained
in the selected text will be arranged in a list on the left:
1.3. Lookup in the DDB
Clicking on a term in the list will open up the entry in the DDB:
Interoperation II: DDB Parsing and Lookup with
SmartHanzi.net and DDB Access
Jean Soulat
1. SmartHanzi.net
SmartHanzi.net is a website with a parsing and lookup tool for Chinese, developed by Jean Soulat,
which links to the etymological lessons by Dr. L. Wieger, S.J., in his CHINESE
CHARACTERS – Their origin, etymology, history, classification and signification.²⁰
Finding a given character in this book can be a trying experience, since one is forced to
work through a number of indexes. The introduction of simplified characters in
mainland China has added a further level of complication. This is precisely the sort of
situation where information technology can be of the greatest help: with just a mouse
click, the website points to the relevant etymological lesson (out of 177) and phonetic
series (out of 858).
1.1. Parsing and Lookup
Parsing and lookup rely on various Chinese word lists available on the Internet:
● For basic Chinese, CEDICT MDBG (English), HanDeDict (German); for
Buddhism, the DDB and Soothill & Hodous.
● The companion site Smartkanji.net uses the JMDict multilingual list for
Japanese available on Jim Breen’s Monash Nihongo FTP Archive.²¹ Jim
Breen also kindly provides Japanese specific tables for adjectives and words.
When a text is submitted to the application, the server parses it and displays a first view
of all words found (in the main list) just under the text. Users can then look up words anywhere
in the text, either with a mouse click or by dragging over a passage if that is more convenient. The website
shows all words recognized at the mouse position. It does not try to make a choice.
Users have to select one among the available dictionaries. If one needs to look up words
in several dictionaries, several tabs in the browser window offer a convenient solution.
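The "all words recognized at the mouse position" behavior can be illustrated with a minimal Python sketch. The actual server is written in PHP and its algorithm is not given here, so this is only one plausible reading: it returns every word list entry whose span covers the clicked character, without ranking one reading above another. The function name, the max_len limit, and the toy word list are assumptions made for the example.

```python
def words_at(text, pos, lexicon, max_len=6):
    """Return (start, word) pairs for every lexicon word whose span in text
    covers the character at index pos, longest words first."""
    hits = []
    first_start = max(0, pos - max_len + 1)
    for start in range(first_start, pos + 1):
        last_end = min(len(text), start + max_len)
        for end in range(pos + 1, last_end + 1):
            word = text[start:end]
            if word in lexicon:
                hits.append((start, word))
    # Longest match first; ties are broken by position in the text.
    hits.sort(key=lambda hit: (-len(hit[1]), hit[0]))
    return hits


# Toy word list for illustration only.
LEXICON = {"如是", "如是我聞", "我", "我聞", "一時", "佛"}

hits = words_at("如是我聞一時佛在", 2, LEXICON)
print(hits)
```

Sorting the hits by length is a presentation choice of this sketch; the point is that all candidates are kept and the choice is left to the user.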
20 First published in 1899 (French) and 1915 (English), based on the 2nd century Shuowen Jiezi.
This work contains numerous technical errors, but is a valuable historical document in that it
reflects the understanding that many Chinese had regarding their writing system.
21 <http://www.bcit-broadcast.com/monash/>
1.2. Technology
SmartHanzi.net uses the so-called “Ajax” technology: one HTML page is used as an
application. Further data are then updated through XMLHttpRequest and JavaScript within
the original HTML page. The server is written in PHP and uses flat files (no database) to
keep parsing time acceptable.
Since some large tables need to be loaded for each text, the website works best
when users submit full paragraphs or short texts.
1.3. Limitations
The word lists available on the Internet are convenient for parsing and lookup. But they
do not contain enough detail to navigate from one word to another, as many people love
to do with paper dictionaries. This is where the DDB XML access provides a great
opportunity.
1.4. The DDB Access Application
Both SmartHanzi (<http://www.smarthanzi.net>) and DDB Access (<http://download.smarthanzi.net/ddbaccess>) work off of the same DDB data extract, which includes
headwords, pronunciations, and basic definitions in a public file published monthly by
Charles Muller.²² It does not include the full set of data accessible through the DDB
website (<http://buddhism-dict.net/ddb>). Since the full data displayed on the DDB
website are also available in XML format on per-word requests, there was an opportunity
to combine the parsing and lookup facility of SmartHanzi.net with the complete DDB
data.
22 This file, <http://acmuller.net/download/buddhdic.txt.gz>, is the same as that which is
published on Jim Breen's WWWJDic Server (<http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1C>). It is also used by the SAT database to access terms in the DDB, as
well as by the developer of Tangorin.
1.5. Views of DDB Access
1.5.1. Step One: Paste in Text
The user pastes in some East Asian text containing Chinese characters:
1.5.2. Step Two: Parse Text
The text is then parsed, separating compound words on the left and single characters on
the right:
1.5.3. Step Three: Select Word for Lookup
The user can then select a character or compound word from the generated list for lookup
in the DDB:
1.6. Meeting the DDB Password Policy
Both for security purposes, and in order to encourage users to contribute to the DDB,
Muller has implemented a tiered password policy. This imposes a maximum number of
calls per day for users with the “guest” login. In order to meet the DDB password policy,
DDB data have to be requested from the user's PC. Since making XML calls from the
SmartHanzi.net server would have infringed the DDB password policy, it was not
considered.
One option might have been to include XML data in the SmartHanzi.net HTML / Ajax
pages, subject of course to agreement from the DDB team. However, for security reasons,
JavaScript does not allow a web page from one site (SmartHanzi.net) to access XML data
from another site (DDB). Cross-domain calls may be allowed in the latest generation of
browsers, but not all users have a recent browser.
1.7. The DDB Access Application
The chosen solution was to develop a PC application, called “DDB Access,” which is not
subject to the cross-domain limitation. The application has the same look and feel as the
website and uses the same DDB extract to parse the text. When the user clicks on a word,
the application makes a request to the DDB server and gets the full XML data. Each user
needs to provide his or her DDB password.
The XML data are presented with different views in separate tabs:
● Standard view: similar to the DDB website.
● Text view: no formatting, convenient for copying and pasting into a word processor.
● XML: formatted XML view.
● XML (raw): unformatted XML view, for copying and pasting into an XML editor.
To make sure that parsing and lookup maintain consistency, it is recommended to
download monthly updates.
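As a rough illustration of how the text, formatted XML, and raw XML views of section 1.7 might be derived from one XML response, consider the Python sketch below. DDB Access itself is a .NET application, so this is a swapped-in technique and not the application's code; the sample entry and the function name build_views are hypothetical, and the styled "Standard view" is left out because it depends on a style sheet.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET


def build_views(raw_xml):
    """Derive three of the four views from one XML string."""
    root = ET.fromstring(raw_xml)
    # Text view: element content only, with runs of whitespace collapsed.
    text_view = " ".join("".join(root.itertext()).split())
    # Formatted view: re-indented by the standard-library pretty-printer.
    formatted = xml.dom.minidom.parseString(raw_xml).toprettyxml(indent="  ")
    return {"text": text_view, "xml": formatted, "xml_raw": raw_xml}


# Hypothetical, heavily shortened entry used only for this example.
RAW = "<entry>\n<orth>鐃</orth>\n<sense>a <term>hand-bell</term></sense>\n</entry>"

views = build_views(RAW)
print(views["text"])
```

Keeping the raw string untouched alongside the derived views is what allows the unformatted XML to be pasted into an editor exactly as the server sent it.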
1.8. Technology
The application is developed with Microsoft Windows Presentation Foundation
(WPF, .NET 3.5). It embeds a lightweight SQLite database, which makes it easy, for
instance, to add the “Also contained in” function.
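To show why an embedded SQLite database makes an "Also contained in" function straightforward, here is a minimal sketch using Python's sqlite3 module in place of the .NET bindings. The table name, the column name, and the function also_contained_in are assumptions made for the example; the schema actually used by DDB Access is not documented here.

```python
import sqlite3


def also_contained_in(conn, term):
    """List headwords that contain term as a substring, excluding term itself."""
    rows = conn.execute(
        "SELECT headword FROM headwords "
        "WHERE instr(headword, ?) > 0 AND headword <> ? "
        "ORDER BY length(headword), headword",
        (term, term),
    )
    return [row[0] for row in rows]


# In-memory database with a hypothetical one-column schema and toy data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE headwords (headword TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO headwords VALUES (?)",
    [("般若",), ("般若經",), ("大般若經",), ("菩薩",)],
)

longer = also_contained_in(conn, "般若")
print(longer)
```

A substring query of this kind scans the whole table, which is acceptable for a local word list of this size but would call for an index on character positions in a larger setting.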
1.9. Soothill and Hodous
Both SmartHanzi.net and DDB Access also include Soothill and Hodous entries, as digitized
and published by A. Charles Muller (<http://www.acmuller.net/soothill/index.html>).
References
Muller, A. Charles. 2009. The Digital Dictionary of Buddhism [DDB] as a Model for Web
Collaboration. Symposium of the Information Processing Society of Japan, University
of Tokyo.
----. 2009. The Digital Dictionary of Buddhism [DDB]: Present Status and Future
Developments. Scholars of Buddhism in Japan: Buddhist Studies in the 21st Century.
Kyoto: International Research Center for Japanese Studies. 87–100.
http://acmuller.net/articles/ddb-nichibunken-200803.html.
----. 2005. A Model for Scholarly Collaboration in the Development of On-line Reference
Works: The Digital Dictionary of Buddhism. Conference on New Technology in the
Handling of East Asian Documents; Chinese National Library, Beijing.
http://acmuller.net/articles/ddb-beijing-conference.pdf.
----. 2002. Moving into XML Functionality: The Combined Digital Dictionaries of
Buddhism and East Asian Literary Terms. Journal of Digital Information: Special
Issue on Chinese Collections in the Digital Library 3(2).
http://journals.tdl.org/jodi/article/view/83/82.
----. 2001. ディジタル媒体を使用して 仏教データの調査と普及: 仏教学のディジタ
ル辞書. (Using the Digital Medium for Research and the Dissemination of Buddhist
Studies Data: The Digital Dictionary of Buddhism). Annual Conference of the Japanese
Association for Indian and Buddhist Studies, Tokyo University.
http://acmuller.net/articles/jaibs2001.html.
----. 1999. Developments of the Web Dictionaries of East Asian Thought: Stepping up to XML.
Seminar on Computing in East Asian Studies, Kyoto University Computing Center.
----. 1999. Update on the Development of the Digital Dictionary of East Asian Buddhist
Terms. Fifth meeting of the EBTI, Academia Sinica, Taipei.
http://acmuller.net/articles/report1999ebti.htm.
----. 1998. The Structure and Function of the Interlinked Electronic CJK-English and
Buddhist CJK-English Dictionaries. International Conference of Asian Scholars (ICAS),
Leiden University. http://www.acmuller.net/articles/dictionaries1.htm.
----. 1998. The Structure and Function of the Interlinked Electronic CJK-English and
Buddhist CJK-English Dictionaries. Meeting of the Pacific Neighborhood Consortium
(PNC), Taiwan.
----. 1997. The Usage and Development of Digital Reference Tools in Working with CJK
Buddhist Texts: Interlinked CJK and Buddhist CJK Dictionaries. Fourth Meeting of the
Electronic Buddhist Text Initiative, Ōtani University, Japan.
----. 1996. Introducing the Web Dictionary of East Asian Buddhist Terms. Third Meeting of
the Electronic Buddhist Text Initiative, Foguang Shan, Taiwan.
Nagasaki, Kiyonori; Muller, A. Charles; Shimoda, Masahiro. 2009. Aspects of the
Interoperability in the Digital Humanities. Digital Humanities. 375-377.