Chung-Hwa Buddhist Journal (2012, 25: 149-166)
Taipei: Chung-Hwa Institute of Buddhist Studies
ISSN: 1017-7132

A Relational Database for Text-Critical Studies (經文鑑別研究之關係資料庫)

Wojciech Simson
Univ. of Zürich

Abstract

After a brief introduction to the scope of a project digitizing the Confucian Analects and a short explanation of the working principles of relational databases in general, the architecture of the relational database actually used in the project is outlined. The database was designed to store, classify, compare and sort textual variants and to handle a considerable number of textual witnesses in such a way that strains of transmission can be compared with one another. Some attention is also paid to problems typical of manuscripts, such as illegible or doubtful characters, lacunae and non-standard characters that are not included in the Unicode standard. The database is further enhanced by a tagging system that allows different types of variants to be classified and analyzed. Finally, an evaluation of the whole system and suggestions for its further development are given.

Keywords: Digitization, Relational Database, Textual Criticism, Stemmatology, Lunyu

Introduction

Whereas most papers in this volume deal with the digitization of East Asian texts by means of mark-up languages, the following project differs in both the digitization method and the kind of text that has been digitized.
The text in question is not Buddhist but Confucian: it is none other than the well-known Analects or Sayings of Confucius (論語), and it was digitized not in a mark-up language but in a relational database. I think, however, that there is quite a bit of common ground. Among the copious textual witnesses of the Analects incorporated into the database there were more than 70 fragments of Tang-period manuscripts, stemming mostly from Dunhuang and partly from the ancient city of Gaochang near the modern village of Astana in Xinjiang province. They are very similar in age and provenance to the Chan texts. The problems with digitization are therefore similar: we regularly have to deal with variant characters, some of which are not to be found even in the largest dictionaries, and we have many textual variants, hardly legible or even illegible passages, and large lacunae. These features are even more prevalent in the Analects manuscripts than in Buddhist texts, because the copies of the Analects were produced not by accomplished scribes but by children receiving an elementary education in a more or less public school that must have been integrated into the monastery of Dunhuang. The Analects manuscripts were never intended to be treated as holy scripture to be preserved for future generations in a library. They look rather like the wastepaper left over from the school's daily practice. Because of the very frequent scribal errors committed by the young students, the Analects manuscripts have to be treated with great caution as witnesses of the ancient text. They are nevertheless very important witnesses, because they antedate by several centuries the earliest extant printed editions on which the textus receptus is based. Moreover, the Dunhuang manuscripts represent strains of transmission that are clearly distinct from the printed editions and are therefore of great text-critical interest.
The very frequent corruptions in the manuscripts were not regarded as a deficiency of this material but, on the contrary, became a special focus of interest.

Scope of the Project

The primary goal of the project was not to produce a critical edition of the Analects, as might be expected from what has been said so far, but to provide the necessary material and methods for such a task. The scope of the project was therefore:

1) To gather the relevant material and to arrange it as flexibly as possible for further investigation.

2) To study the mechanisms of textual corruption, i.e. to determine the conditions under which certain corruptions occur and, where possible, to establish rules that would enable a textual critic to tell original readings from errors. For this purpose it turned out to be a great advantage that the elementary students in Dunhuang and Astana had produced a great number of very obvious scribal errors, which could not be regarded as valid textual variants but made it possible to determine with great certainty which reading is original and which is a corruption. Without a clear and reliable identification of the original readings and the errors it would have been simply impossible to develop an adequate understanding of the corruption process.

3) To test the applicability of stemmatology [1] to Chinese manuscripts. Stemmatology has been a major and, in certain cases, extremely efficient and reliable text-critical tool in Bible studies and classical European philology.

4) Finally, the project resulted in a comprehensive textual history of the Analects [2], separating and describing the main strains of textual transmission and distinguishing within these strains the dependent from the independent textual witnesses.

The Relational Database Approach

Most contributors to the present volume are concerned with the digitization of manuscripts, i.e. they produce digital representations of manuscripts that can be reproduced and read in standard Chinese characters and can be electronically searched or otherwise processed on a computer. Features of the manuscript that might be of interest but cannot be represented in standard Chinese characters, such as lacunae, uncertain readings, emendations and many others, are usually represented by means of a mark-up language, a versatile and extensible code that has been designed to describe such features. The present project was concerned not so much with the manuscripts themselves as with their differences. From the beginning it seemed to be a detour to digitize every single textual witness of the Analects. Individual digital representations of the many textual witnesses could easily have been produced by introducing their variants into an already digitized version of the text. These variants, however, would then have had to be sifted out again from the digitized versions by collating them with specialized software. The whole procedure would have been very susceptible to input and processing errors, compatibility problems and so on. It therefore seemed more straightforward to store only the variants right from the start. Storing them in a relational database made it possible to keep the data in a relatively flexible form that allowed further modifications and, most importantly, could be searched and sorted according to various criteria.

[1] Cf. Simson (2002).
[2] Simson (2006).

Relational Databases

For those not familiar with relational databases, this section gives a short outline of their underlying working principles; it may be skipped by those who are well acquainted with them. Relational databases were not developed to store whole texts of unlimited length. They are mainly designed to store and handle different types of standardized information with a well-defined length.
Each type of information is assigned a so-called field, and types of information that belong inseparably together are stored together in the same table. In a bibliographical database, for example, we would group the title of the book together with its year of publication in the same table, and for each physical book we would always have one data set with the same structure.

Figure 1: Books table (fields: Title, Year of Publication)

There is, of course, other very important information about the book, such as the author or the publishing house, which is, however, better stored in separate tables. The reason for splitting up the tables is that one and the same author may write several books, just as one and the same publisher will publish many different books. By separating the authors and publishers from the book titles we need to store each piece of information only once. This saves a lot of input time and avoids typing errors, because we do not have to retype the name of the author each time one of his books is entered into the database. Moreover, when the data of the author or publisher have to be updated or corrected, we have to change them only once. Ideally the database is built in such a way that it contains no redundant or contradictory data sets; this state is also called data consistency. The different tables into which the information about the books in our bibliographic database has been subdivided must be related to one another, otherwise we will never find out which book belongs to which author or publishing house. This is achieved by keys.

Figure 2: Authors, Books and Publishers tables. Authors (AuthorKey, Author's First Name, Author's Surname) relates 1:m to Books (AuthorKey, PublisherKey, Title, Year of Publication); Publishers (PublisherKey, Publishing House, Place of Publication) relates 1:n to Books.

Keys are, in most cases, integer numbers generated automatically by the database system.
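The bibliographic example of Figure 2 can be sketched with Python's built-in sqlite3 module. This is only an illustration: the column names are simplified, the BookKey column and the sample rows are invented, and SQLite is assumed here as a stand-in for whatever database system is at hand.

```python
import sqlite3

# Table and column names are illustrative, loosely following Figure 2.
SCHEMA = """
CREATE TABLE Authors (
    AuthorKey INTEGER PRIMARY KEY,  -- integer key assigned by SQLite
    FirstName TEXT,
    Surname   TEXT
);
CREATE TABLE Publishers (
    PublisherKey INTEGER PRIMARY KEY,
    House TEXT,
    Place TEXT
);
CREATE TABLE Books (
    BookKey      INTEGER PRIMARY KEY,
    AuthorKey    INTEGER REFERENCES Authors(AuthorKey),
    PublisherKey INTEGER REFERENCES Publishers(PublisherKey),
    Title TEXT,
    Year  INTEGER
);
"""


def build_demo():
    """Create an in-memory database with one author, one publisher, two books."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    author_key = con.execute(
        "INSERT INTO Authors (FirstName, Surname) VALUES (?, ?)",
        ("Ann", "Example"),  # invented sample data
    ).lastrowid
    publisher_key = con.execute(
        "INSERT INTO Publishers (House, Place) VALUES (?, ?)",
        ("Sample Press", "Zürich"),  # invented sample data
    ).lastrowid
    con.executemany(
        "INSERT INTO Books (AuthorKey, PublisherKey, Title, Year) "
        "VALUES (?, ?, ?, ?)",
        [
            (author_key, publisher_key, "First Book", 2001),
            (author_key, publisher_key, "Second Book", 2005),
        ],
    )
    return con


def titles_by_surname(con, surname):
    """Follow the AuthorKey stored in Books back to the Authors table."""
    rows = con.execute(
        "SELECT b.Title FROM Books b "
        "JOIN Authors a ON a.AuthorKey = b.AuthorKey "
        "WHERE a.Surname = ? ORDER BY b.Year",
        (surname,),
    )
    return [title for (title,) in rows]
```

The author's name is stored once in Authors, while its key appears in every book row, so a correction to the name has to be made in one place only.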
Each author, for example, is assigned a number, and this number is stored not only in the respective author's data set but also in the Books table to indicate the author of a book. While the AuthorKey is unique within the Authors table, it can be stored in the Books table an unlimited number of times. This is called a one-to-many relation and is the mathematical representation of the fact that in real life one author may write several books. The Publishers are linked to the Books table accordingly. The relation between the Authors and the Publishers is called a many-to-many relation and represents the real-life situation where the same publishing house publishes books by various authors and the same author publishes his books with different publishing houses. Such many-to-many relations always imply the use of an intermediate or pivot table (here the Books table) and can never be established between two tables directly. A relational database system not only provides the keys and safeguards their consistency, it also maintains a powerful indexing system which allows one to search millions of data sets within fractions of a second.

Chopping up the Analects

After this short introduction to relational databases it is time to ask how the text of the Analects, which comprises around 15,000 Chinese characters, can be stored in the limited fields of such a relational data structure. As already mentioned, it need not be stored there at all, because the focus of interest is not the text as a whole but its variants. Collecting the variants alone, however, makes little sense without knowing to which place in the text they belong. It is therefore necessary to refer to text passages in an unambiguous way, and to do this a grid of coordinates has to be laid over the text.
The Reference System

The grid follows the conventional way of referring to passages of the Lunyu but has to be refined further so that it can point unambiguously to short passages or even single characters within the chapters. Traditionally the Lunyu is divided into 20 books (篇), and each book is further subdivided into chapters (章). Most of these chapters contain one saying of Confucius and comprise no more than a few dozen characters. Both the books and the chapters have a conventional numbering generally accepted among Western scholars. This reference system is taken over in the database and further refined by numbering the characters within each chapter. Because different versions of the text differ slightly in the total number of characters they contain, it is necessary to stick to one particular version of the text to keep the reference system unambiguous. The version chosen is an easily available electronic version of the textus receptus. This leads to the following data structure:

Figure 3: Chapters and Passages tables. Chapters (ChapterKey, ChapterNumber, Text) relates 1:m to Passages (ChapterKey, Start, End, Commentaries etc.).

The Chapters table contains all the chapter numbers of the Lunyu, 01.01 being the first chapter of the first book and so on. The reference text is included for practical reasons but, strictly speaking, it is only an aid for the user and not an indispensable part of the database. The Passages table contains all the passages for which variants are found. Because a passage very often consists of only one character, and the same character or even short phrase may reappear several times within the same chapter, it is essential to store the beginning and end of the passage in the table. The wording of the passage is stored in the table too but, strictly speaking, could be discarded as redundant information. No overlapping passages are allowed.
Otherwise consistency in the overlapping sections of the passages would be very difficult to maintain. Some other information, such as transmitted commentaries referring to the passage, is also stored in the Passages table. It is skipped here, however, to keep the focus on the essentials. The two tables are connected by a one-to-many relation with the Passages table on the many side. This represents the simple fact that there can be more than one passage within one and the same chapter.

The Variants

Having built a reference system that enables us to localize the variants within the text, we can proceed to collect the variants. For this we have to attach a further table. It is linked to the Passages table by a one-to-many relation, because there is always more than one variant for a given passage. It is, of course, essential to know where each variant was found. Therefore we have to introduce another table storing all the textual witnesses of the text.

Figure 4: Chapters, Passages, Variants, Witnesses. Chapters (ChapterKey, ChapterNumber, Text) relates 1:m to Passages (PassageKey, ChapterKey, Start, End, …); Passages relates 1:m to Variants (PassageKey, WitnessKey, Reading, …); Witnesses (WitnessKey, Description, …) relates 1:m to Variants.

One may ask here why the Witnesses table is related to the Variants table by a one-to-many relation with the Variants table on the many side. One and the same witness usually bears a great number of variants referring to different passages, and the same variant is very often found in several witnesses. This seems to be a typical many-to-many relation. Basically this is true, and it is possible to build the database in such a way. The system would then store each variant only once and ignore the readings that coincide with the textus receptus. This would make the data less redundant and more consistent.
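The four tables of Figure 4 can be sketched as follows, again assuming SQLite via Python's sqlite3 module. The column names StartPos and EndPos stand in for the Start and End fields, the helper functions are hypothetical, and the reference text is the unpunctuated opening of chapter 01.01. The sketch also shows one possible way to enforce the rule that passages must not overlap; it is not a description of how the project's own database does it.

```python
import sqlite3

# Illustrative names; StartPos/EndPos stand for the Start and End fields.
SCHEMA = """
CREATE TABLE Chapters (
    ChapterKey INTEGER PRIMARY KEY, ChapterNumber TEXT, Text TEXT);
CREATE TABLE Passages (
    PassageKey INTEGER PRIMARY KEY,
    ChapterKey INTEGER REFERENCES Chapters(ChapterKey),
    StartPos INTEGER, EndPos INTEGER);
CREATE TABLE Witnesses (
    WitnessKey INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE Variants (
    PassageKey INTEGER REFERENCES Passages(PassageKey),
    WitnessKey INTEGER REFERENCES Witnesses(WitnessKey),
    Reading TEXT);
"""


def build_demo():
    """In-memory database holding one chapter of reference text."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute(
        "INSERT INTO Chapters (ChapterKey, ChapterNumber, Text) "
        "VALUES (1, '01.01', '子曰學而時習之不亦說乎')"
    )
    return con


def add_passage(con, chapter_key, start, end):
    """Insert a passage (1-based, inclusive) unless it overlaps another one."""
    clash = con.execute(
        "SELECT 1 FROM Passages "
        "WHERE ChapterKey = ? AND StartPos <= ? AND EndPos >= ?",
        (chapter_key, end, start),
    ).fetchone()
    if clash:
        raise ValueError("overlapping passage")
    return con.execute(
        "INSERT INTO Passages (ChapterKey, StartPos, EndPos) VALUES (?, ?, ?)",
        (chapter_key, start, end),
    ).lastrowid


def passage_text(con, passage_key):
    """Recover the wording of a passage from the reference text."""
    text, start, end = con.execute(
        "SELECT c.Text, p.StartPos, p.EndPos FROM Passages p "
        "JOIN Chapters c ON c.ChapterKey = p.ChapterKey "
        "WHERE p.PassageKey = ?",
        (passage_key,),
    ).fetchone()
    return text[start - 1:end]
```

Because the wording can be recovered from the chapter text and the two positions, storing it again in the Passages table is a convenience rather than a necessity.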
For text-critical purposes it is, however, more convenient to have all the readings right at hand and not to have to determine first whether a witness that is not listed among the variants has the same reading or no reading at all and therefore has to be counted as a lacuna. The Variants table stores the readings of all the textual witnesses, whether they contain a deviation from the textus receptus or not. This entails a lot of redundant data but is more practical for text-critical comparisons and for the presentation to the user, who gets a good overview of all extant readings (see figure 5). It is, moreover, much easier to write queries with such an arrangement of data than with a mathematically more consistent one. To the user the data structure established so far is presented as follows.

Figure 5: Main window of the user interface, showing book and chapter, variant type, commentaries, notes, the passage in the textus receptus and, for each witness, its variant reading.

The Representation of Lacunae, Doubtful Readings and Rare Characters

Two kinds of lacunae are differentiated. One type is represented by a black upright square (■). This is used when the number of missing characters can be counted. This is the case with block prints, stone steles and Japanese manuscripts, which have a regular number of characters per line. In cases where the number of missing characters cannot be determined, a twisted black square is used. The reason for this differentiation is that in some cases the witnesses show variants that differ in the number of characters. In such cases we can still decide which version was followed by a certain witness if we count the missing characters. Illegible characters are represented by an upright outlined square (□) when they can be counted and by a twisted outlined square when they cannot. In doubtful readings every single character is put in brackets ( ).
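The point of counting missing characters can be illustrated with a small sketch. It assumes that each ■ or □ stands for exactly one character and checks which candidate readings a damaged witness could have followed. The function names and the sample readings are invented for illustration, and the uncountable (twisted) symbols are deliberately left out because they carry no length information.

```python
# Symbols that each stand for exactly one missing or illegible character.
COUNTABLE_GAPS = {"■", "□"}


def compatible(damaged, candidate):
    """True if the damaged reading could be a copy of the candidate.

    Lengths must agree, and every legible character must match.
    """
    if len(damaged) != len(candidate):
        return False
    return all(
        d in COUNTABLE_GAPS or d == c for d, c in zip(damaged, candidate)
    )


def matching_candidates(damaged, candidates):
    """Return the candidate readings that fit the damaged witness."""
    return [c for c in candidates if compatible(damaged, c)]


# Invented example: one character is lost, so only the four-character
# version can have been the model of this witness.
print(matching_candidates("不■說乎", ["不亦說乎", "不說乎"]))
```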
Another symbol, a black circle (●), is used to represent characters that are lacking in one version while they are present in others. Strictly speaking, such a symbol is unnecessary, but it is much more conspicuous than the mere absence of a character. This is of some practical importance when one has to scan tens or hundreds of variants (cf. figure 5). Rare characters that are not included in the Unicode standard pose a very serious difficulty for any attempt to digitize Chinese manuscripts. One rather elegant way of treating them is simply to extend the Unicode standard and to add the new characters to the Chinese font on the computer. The non-standard characters can then be processed by the computer without difficulty. The Lunyu database takes, however, a different and more complicated approach. Each character beyond the Unicode standard is put in braces ({}) which contain the assumed standard form of the character. In many cases this is not enough to identify the character in question unambiguously, because there often exists more than one scribal variant of one and the same character. Therefore a special table was attached to the Variants table, in which all the characters beyond the Unicode standard are stored together with their pronunciation and a description of how they were written. Example: the scribal variant 扵 for 於 is stored as {於} in the reading, together with a description such as “扌 on the left + 仒 on the right hand side” in the separate table. At first glance the user sees only the form in braces, {於}, but by double-clicking on the character can open the auxiliary form and read the description. Though such a complicated treatment of rare characters entails considerable programming effort to process the data correctly, it also has some advantages to offer.
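A minimal sketch of how the brace notation might be processed is given below. The regular expression, the function names and the dictionary standing in for the auxiliary table are assumptions made for illustration; only the example of {於} with its description is taken from the text, and the sample reading is invented.

```python
import re

# Matches one braced standard form, tolerating spaces inside the braces.
BRACED = re.compile(r"\{\s*(.+?)\s*\}")

# Stand-in for the auxiliary table of scribal variants (hypothetical layout).
SCRIBAL_FORMS = {
    "於": ["扌 on the left + 仒 on the right hand side"],
}


def normalise(reading):
    """Replace every braced form by its assumed standard character."""
    return BRACED.sub(r"\1", reading)


def braced_characters(reading):
    """List the standard forms that stand in for non-standard characters."""
    return BRACED.findall(reading)


def same_ignoring_braces(reading_a, reading_b):
    """Compare two readings while disregarding purely scribal variation."""
    return normalise(reading_a) == normalise(reading_b)


reading = "君子{於}其言"  # invented sample reading
for char in braced_characters(reading):
    print(char, SCRIBAL_FORMS.get(char, []))
```

Keeping the standard form inside the braces means that a reading remains searchable as ordinary text once the braces are removed.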
When comparing textual witnesses in order to establish their genealogical tree, one is usually not interested in orthographical variants, because they mostly give no reliable hints about the lineage of manuscripts but can be used by scribes at random. One looks instead for so-called significative errors, because they are, unlike most orthographical variants, irreversible and can therefore be interpreted as traces of the transmission process. By simply ignoring the braces in a query, most of the ubiquitous scribal variants are ignored too, and this helps a great deal to focus on those variant readings that could be useful for stemmatological investigations.

Classifying the Witnesses

We have already touched several times upon the problem of tracing the lineage of textual witnesses. This is necessary in order to apply the stemmatological method, which is able to decide between variant readings on the basis of their position in the pedigree of textual transmission and not on interpretative criteria. In order to establish the pedigree, or the so-called stemma, it is very useful to have a reference system for the textual witnesses that makes it possible to group together witnesses that possibly belong to the same strand of transmission, in order to check them for uniformity or to compare them with other branches of the pedigree. Though the result looks very logical and simple, it took some time and several revisions of the data structure to establish a classification system for the textual witnesses that is simple and efficient enough to satisfy these needs. The witnesses are classified according to three main criteria:

The Main Lines of Descent

The earliest historical report of the transmission of the Analects makes a distinction between three main traditions in which the Lunyu appeared in Early Han times.
These are:

L – Lu (魯論): tradition from the ancient state of Lu, the native state of Confucius.
Q – Qi (齊論): tradition from the ancient state of Qi.
G – Guwen (古文論語): tradition in ancient script found in the wall of Confucius' house during the Former Han.

None of these three has survived as a complete text to our day, but some readings could be identified as definitely belonging to the Guwen tradition, and some fragments of two lost chapters of the Qi tradition have been handed down to us in commentaries and encyclopedias. All transmitted text versions of the Lunyu go back to one major collation:

LQ – A text based on the Lu version into which Qi readings were introduced by Zhang Yu, the Marquis of Anchang (張禹 † 5 BC), during the Former Han.

Another mixed version had to be introduced for the sake of a single but very important textual witness:

LG – The text of a bamboo manuscript found in modern Dingzhou dating back to the first half of the first century BC.

The Commentarial Traditions

It can be expected that the text of the Lunyu was transmitted together with the commentaries, which were written down together with the text on the same paper scroll from the second century AD on. By distinguishing the commentarial traditions we can also separate the lines of transmission. In analogy with the main lines of descent, these commentarial traditions are abbreviated with the initials of the commentator's name.
The earliest extant commentaries stem from the end of the second and the beginning of the third century AD:

ZX – Zheng Xuan (鄭玄 127-200)
HY – He Yan (何晏 190-249)

The He Yan commentary received three subcommentaries:

HK – Huang Kan (皇侃 488-545)
XB – Xing Bing (邢昺 932-1010)
LDM – Lu Deming (陸德明 550?-630)

Two other, later commentaries are included too, because they also contain variant readings that have to be taken into consideration:

HL – Han Yu (韓愈 768-824) and Li Ao (李翺 772-841)
ZZ – Zhu Xi (朱熹 1130-1200)

There is also a handful of manuscript fragments, one print and all the stone steles that do not carry a commentary, though they are all clearly derived from He Yan's version. These are abbreviated as BW for baiwen (白文) or plain text.

Types of Witnesses

Apart from the commentaries a further, rather vague, distinction was introduced to account for differences between strains of transmission due to geographical diversification:

dh – manuscripts from Dunhuang (敦煌)
gc – manuscripts from the ancient city of Gaochang (高昌)
np – for Nippon (日本), prints and manuscripts from Japan

Some other types of witnesses were introduced to distinguish groups of witnesses that show typical problems originating from different methods or circumstances of reproduction:

zj – for zhujian (竹簡) or bamboo slips
kb – for keben (刻本) or Chinese block prints
sj – for shijing (石經) or the stone classics erected by various dynasties in front of the imperial academy

Moreover, quotations of the Lunyu found in early authors are marked by the initials of the respective author. Texts having all three criteria (i.e. descent, commentator and type) in common are further differentiated by adding a consecutive number to the abbreviation. Thus a short but meaningful label is provided for every witness.
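As an illustration, a label such as lqHYdh01 can be split into its four elements with a regular expression. The sketch assumes that the descent and the type are always written in lower case and the commentary in upper case, which is a generalization from that one label; labels for quotations marked by an author's initials may not follow this pattern. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: lower-case descent, upper-case commentary,
# lower-case type, consecutive number.
LABEL = re.compile(r"^([a-z]+)([A-Z]+)([a-z]+)(\d+)$")


class WitnessLabel(NamedTuple):
    descent: str
    commentary: str
    kind: str
    number: int


def parse_label(label: str) -> WitnessLabel:
    """Split a witness label into descent, commentary, type and number."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"not a witness label: {label!r}")
    descent, commentary, kind, number = match.groups()
    return WitnessLabel(descent, commentary, kind, int(number))


def in_group(
    label: str,
    descent: Optional[str] = None,
    commentary: Optional[str] = None,
    kind: Optional[str] = None,
) -> bool:
    """True if the label matches every element that is not skipped."""
    w = parse_label(label)
    return (
        (descent is None or w.descent == descent)
        and (commentary is None or w.commentary == commentary)
        and (kind is None or w.kind == kind)
    )


print(parse_label("lqHYdh01"))
print(in_group("lqHYdh01", kind="dh"))
```

Skipping an element in `in_group` corresponds to addressing a whole group of witnesses, for example all Dunhuang manuscripts regardless of commentary.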
Figure 6: Labeling system. The label lqHYdh01 consists of descent (lq), commentary (HY), type (dh) and consecutive number (01).

“lqHYdh01”, for example, denotes the first manuscript from Dunhuang with the He Yan commentary which goes back to the collation of the Lu and the Qi versions of the Lunyu. In queries single elements of such a label can be skipped. We can refer to the whole group of Dunhuang manuscripts as “dh” or to the subgroup of Dunhuang manuscripts with the Zheng Xuan commentary as “ZXdh” and so on. This system allows us to build very precise queries that are able to pinpoint the data we are looking for. Represented in database terms, this results in a four-table structure:

Figure 7: Witnesses with related Descents, Commentaries, Types. Descents (DescentKey, Descent), Commentaries (CommentaryKey, Commentary) and Types (TypeKey, Type) each relate 1:n to Witnesses (WitnessKey, DescentKey, CommentaryKey, TypeKey, ConsecutiveNo, Description, …).

The Witnesses table is also used to store further information about the witness, such as a description of its physical appearance or an indication of which parts of the text it covers. Organizing the data in this way not only provides a unique label for each textual witness, which can be used to refer to the witness, for example in an apparatus criticus, but also makes it possible to bundle or separate single strands of transmission and to compare them with one another, which is, of course, a necessary procedure when establishing a genealogical tree.

Classifying the Variants

Apart from the stemmatological investigation of the Lunyu tradition, one major aim of the project was to provide a tool that could be used for the study of textual corruption. Understanding the mechanisms of corruption is a precondition for what every textual critic is aiming at, namely emendation.
To understand these mechanisms it is necessary to sort out certain types of variants and to study them in their specific contexts in order to discover similarities that could be the reason for textual corruption, or regularities that could help us to establish rules of emendation. What is needed here is a versatile and handy sieve for textual variants. Because the computer is only a machine that mechanically processes binary data without the slightest idea of what they stand for, it cannot be expected to sort the variants in a useful way unless we build some of our own intelligence into it. This is achieved by attaching at least one label to each passage. This label indicates the type of variant that is found among the various witnesses of the text that cover the passage in question.

Figure 8: Passages, TypeLink, VariantTypes. Passages (PassageKey, ChapterKey, Start, End) relates 1:m to VariantTypeLink (PassageKey, VariantTypeKey); VariantTypes (VariantTypeKey, Description) relates 1:m to VariantTypeLink.

As can be seen from figure 8, the relation between the passages and the variant types is many-to-many. This is because a potentially unlimited number of variants can occur for one and the same passage. Moreover, one and the same variant can be classified in more than one way for different purposes. On the other side of the many-to-many relation, the same type of variant can occur in different passages. The intermediate VariantTypeLink table is necessary to link the Passages with the VariantTypes in a many-to-many relation, because only one-to-one and one-to-many relations are allowed between two tables. What types of variants are distinguished, then? At first a very formal classification system for the variants was devised that does not contain any interpretative criteria, i.e. it does not suggest which variants are to be interpreted as errors and which as original readings.
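The link table of Figure 8 can be sketched in the same way as the earlier tables, once more assuming SQLite through Python's sqlite3 module. StartPos and EndPos stand for the Start and End fields, the helper functions are hypothetical, and the two type descriptions used in the usage example are placeholders.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Passages (
    PassageKey INTEGER PRIMARY KEY,
    ChapterKey INTEGER, StartPos INTEGER, EndPos INTEGER);
CREATE TABLE VariantTypes (
    VariantTypeKey INTEGER PRIMARY KEY,
    Description TEXT UNIQUE);
CREATE TABLE VariantTypeLink (
    PassageKey     INTEGER REFERENCES Passages(PassageKey),
    VariantTypeKey INTEGER REFERENCES VariantTypes(VariantTypeKey),
    PRIMARY KEY (PassageKey, VariantTypeKey));
"""


def build_demo():
    """In-memory database with two invented passages and no tags yet."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany(
        "INSERT INTO Passages VALUES (?, ?, ?, ?)",
        [(1, 1, 10, 10), (2, 1, 3, 7)],
    )
    return con


def tag(con, passage_key, description):
    """Attach a variant type to a passage, creating the type if needed."""
    con.execute(
        "INSERT OR IGNORE INTO VariantTypes (Description) VALUES (?)",
        (description,),
    )
    (type_key,) = con.execute(
        "SELECT VariantTypeKey FROM VariantTypes WHERE Description = ?",
        (description,),
    ).fetchone()
    con.execute(
        "INSERT OR IGNORE INTO VariantTypeLink VALUES (?, ?)",
        (passage_key, type_key),
    )


def types_of(con, passage_key):
    """All variant types attached to one passage."""
    rows = con.execute(
        "SELECT t.Description FROM VariantTypeLink l "
        "JOIN VariantTypes t ON t.VariantTypeKey = l.VariantTypeKey "
        "WHERE l.PassageKey = ? ORDER BY t.Description",
        (passage_key,),
    )
    return [d for (d,) in rows]


def passages_with(con, description):
    """All passages that carry a given variant type."""
    rows = con.execute(
        "SELECT l.PassageKey FROM VariantTypeLink l "
        "JOIN VariantTypes t ON t.VariantTypeKey = l.VariantTypeKey "
        "WHERE t.Description = ? ORDER BY l.PassageKey",
        (description,),
    )
    return [k for (k,) in rows]
```

Each row of the link table records one pairing of passage and type, so a passage can carry several types and a type can be attached to many passages without either table storing a list.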
Figure 9: Hierarchy of variant types. Top level: quantitative, qualitative, transpositions. Labels shown below these: doublings, others, 實詞 (content words), 虛詞 (particles), proper names; visual, phonetic, semantic and kinetic similarity; more subcategories and non-formal criteria.

The main distinction is between quantitative variants, qualitative variants and transpositions. Quantitative means that there are differences in the number of characters between the variant readings of a certain passage. These quantitative variants are subdivided into doublings of one or more characters and others. These others can consist of non-repetitive differences of only one character or of whole phrases. The differences in single characters are further differentiated into those concerning particles and those concerning other words, and so on. The qualitative variants, on the other hand, are variant readings that differ not in the number of characters but in the characters themselves. This category comprises a long series of subcategories, of which I want to mention only the major ones: variants between characters that have a phonetic, visual, logical or kinetic similarity. As the investigation proceeds, the need for more differentiation grows constantly, and new categories and subcategories can easily be introduced into the system. Subcategories are usually defined by extending the label of a generic category. The generic category called “quantitative” (variants) is extended into two subcategories, “quantitative/doubling” and “quantitative/others”. The former is extended into “quantitative/doubling/character” and “quantitative/doubling/phrase”. Of these, the former can be further differentiated into “quantitative/doubling/character/particle”, “quantitative/doubling/character/proper name” and so on. In queries, wildcards can be used to refer to a generic category or to a whole group of subcategories.
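Wildcard selection over such path-like labels can be imitated with Python's fnmatch module. This is a stand-in for whatever wildcard syntax the database itself offers, which is not specified here; the label beginning with “qualitative/” is invented for the sake of the example.

```python
from fnmatch import fnmatchcase

LABELS = [
    "quantitative/doubling/character/particle",
    "quantitative/doubling/character/proper name",
    "quantitative/doubling/phrase",
    "quantitative/others",
    "qualitative/phonetic similarity/particle",  # invented label
]


def select(labels, pattern):
    """Return the labels matching a shell-style wildcard pattern."""
    return [label for label in labels if fnmatchcase(label, pattern)]


# A generic category together with all of its subcategories.
print(select(LABELS, "quantitative/doubling/*"))

# Everything that ends in a given subcategory, whatever its parents are.
print(select(LABELS, "*particle"))
```

Because the generic category is a prefix of every label derived from it, a single pattern reaches a whole branch of the hierarchy.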
For example, we can search the database only for “*particle” variants, and we will get all variants that concern particles, be they quantitative, qualitative or transpositions. Based on the results of this query we can make a statistical analysis of the types of variants that occur with particles, and so on. The whole tagging system is very versatile and easily expandable in order to cope with new questions.

Evaluation of the System

Capabilities

The Lunyu is probably the best-attested secular text in ancient Chinese literature. This means not only that we have a large number of early textual witnesses at our disposal, but also that these witnesses show great diversity in age, geographical distribution and commentarial tradition. Moreover, most of the many thousands of variant readings that could be collected in the database are obvious errors. To put it the other way around: in most cases we know with great certainty what the correct reading has to be. We therefore have a very rich source of material that allows us to study under which conditions errors occur and what shape they take. Such a study would provide an empirical basis for a methodology of textual criticism of ancient Chinese texts. An empirically based methodology of this kind would be of great importance for everyone dealing with ancient Chinese texts. At present, scholars rely instead on traditional Chinese emendation strategies and concepts of textual corruption. Some of their assumptions cannot be corroborated by empirical data at all, or seem at least extremely far-fetched when tested against real-life material. An empirically grounded methodology, though not yet fully realized, was a major goal of the database project [3]. The aforementioned labeling system for variant readings is the main device for such investigations. It allows us to extract variants of a certain type from the database in order to provide the necessary data for the study of textual corruption.
We could focus, for example, on variants that show phonetic similarities and investigate which degrees of phonetic similarity can occur. We can even easily confine our selection of phonetic variants to those coming from Dunhuang in order to investigate the local dialect spoken there in the Middle Ages. The whole classification system of variants can be adapted very easily to satisfy different needs.

The combination of the two classification systems, for textual witnesses and for variant categories respectively, also proves to be a very powerful tool when it comes to tracing the lines of transmission. The database can answer questions like “What variants do the Dunhuang manuscripts have in common with the Japanese tradition that are not shared by other witnesses?” The result of such a query can be further confined to potentially significant variants by applying the variant categories to it. To sieve out and to evaluate such variants is the essential task of the stemmatological approach in textual criticism.

Strictly speaking, the database does not provide us with the desired variants themselves, but sieves out all the passages for which a certain type of variant can be found in a certain group of witnesses. For each passage we therefore get a long list of variant readings that we have to sieve again manually to get at the variants we are really looking for. This may look somewhat painstaking at first glance, but variant readings exist only in contrast with other readings and have to be viewed together with them in order to be understood as variants. We need, however, to accept that there may be several distinct variants for the same passage and that the result of a query may also contain some undesired material that we have to sort out manually. Although more than 130 witnesses were incorporated into the database, this has never really become a problem.
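The stemmatological question quoted above can be phrased as a relational query. The following is only a sketch under stated assumptions: the two-table schema (`witnesses`, `readings`), the witness sigla and the placeholder readings "A" to "E" are all hypothetical, and the query returns readings directly, whereas the system described here returns whole passages that are then sieved by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE witnesses (witness TEXT PRIMARY KEY, grp TEXT);
CREATE TABLE readings  (passage TEXT, witness TEXT, reading TEXT);
""")
conn.executemany("INSERT INTO witnesses VALUES (?, ?)", [
    ("DH1", "Dunhuang"), ("DH2", "Dunhuang"),
    ("JP1", "Japan"), ("PR1", "Print"),
])
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
    ("1.1", "DH1", "A"), ("1.1", "JP1", "A"), ("1.1", "PR1", "B"),
    ("1.2", "DH1", "C"), ("1.2", "JP1", "C"), ("1.2", "PR1", "C"),
    ("1.3", "DH2", "D"), ("1.3", "JP1", "E"), ("1.3", "PR1", "E"),
])

# A reading qualifies if it is attested in both groups and in no other group.
# SQLite evaluates comparisons to 0 or 1, so SUM() counts matching witnesses.
SHARED = """
SELECT r.passage, r.reading
FROM readings r JOIN witnesses w ON w.witness = r.witness
GROUP BY r.passage, r.reading
HAVING SUM(w.grp = ?) > 0
   AND SUM(w.grp = ?) > 0
   AND SUM(w.grp NOT IN (?, ?)) = 0
ORDER BY r.passage
"""
shared = conn.execute(
    SHARED, ("Dunhuang", "Japan", "Dunhuang", "Japan")
).fetchall()
```

Joining a further table of category labels to this result would correspond to the additional confinement to potentially significant variants mentioned above.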
To sum up, we can say that the relational organization of the variants provides a capable tool for the study of textual corruption and for stemmatological investigations. It also stores all the data necessary for a critical edition. The text-critical work can even be supported by adding notes and commentaries to the passages.

[3] Partial results will be published in Simson (2013).

Problems

At this point it must first be mentioned that the Lunyu database was never intended to be published or to be used by anyone other than the author himself. It has always remained a never-ending construction site that was modified gradually in order to cope with newly arising questions and a shifting focus of interest. As with all software, a long list of known problems has to be added:

● A major inconvenience when processing Chinese characters with computers is that computers do not provide a useful sorting order for Chinese characters. At best they can be arranged according to their position in the Unicode code table, which follows radical and stroke count only within each block, so that characters added in later extension blocks fall outside the sequence. A consistent arrangement according to pronunciation or radical, for example, would be much more useful for most practical purposes. This is, however, not a problem of the database approach but of computing in general.
● As some readers may have noticed already, the database also makes use of mark-up. Handling rare, doubtful or illegible characters with their brackets and braces requires special treatment of such mark-up, which has to be implemented in the database. This means a lot of programming effort and a slowdown of the whole database, because each time characters are processed the character string has to be checked for mark-up. Moreover, the routines handling the mark-up are compiled at run-time, and this makes them much slower than precompiled programming code.
● Most variants of the text cover only one character, and such one-character variants can be handled easily by the system. Variants spanning whole phrases are sometimes more cumbersome, especially when there is more than one variant for the passage and each has developed its own subvariants. The representation in the database becomes rather intricate, and the results of queries contain a lot of redundant data that have to be sorted out manually. This has never become a real problem in the project, but the system would have serious difficulty handling texts that regularly differ in longer passages of several sentences in length.
● A special difficulty was the handling of lacunae. They were treated as a special sort of variant and required an even larger amount of programming than the mark-up mentioned before.
● As already mentioned, the database is organized in such a way that the variant readings for a certain passage are stored for each witness separately. This involves a lot of redundant data, because the same variant reading is usually found in several witnesses. Moreover, it takes a lot of programming and computing time to maintain consistency among these redundant data when they are manipulated by introducing new witnesses or passages. A more consistent data structure should therefore be considered for the further development of the database.

References

Maas, Paul. 1950. Textkritik (2. verbesserte Auflage). Leipzig: B.G. Teubner Verlagsgesellschaft.
Pasquali, Giorgio. 1988. Storia della Tradizione e Critica del Testo. Firenze: Casa Editrice Le Lettere.
Reenen, Pieter van; Mulken, Margot van, ed. 1996. Studies in Stemmatology. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Reenen, Pieter van; Hollander, August den; Mulken, Margot van, ed. 2004. Studies in Stemmatology II. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Simson, Wojciech Jan.
Applying Stemmatology to Chinese Textual Traditions. Textual Scholarship in Chinese Studies. Ed. Vogelsang, Kai. Papers from the Munich Conference 2000; Asiatische Studien/Études Asiatiques 2002/3. 587–608.
Simson, Wojciech Jan. 2006. Die Geschichte der Aussprüche des Konfuzius. Bern: Peter Lang.
Simson, Wojciech Jan. 2013 (forthcoming). Contaminations in Chinese Manuscripts. The Idea of Writing – Lapses, Glitches and Blunders in Writing Systems. Ed. Behr, Wolfgang; Voogt, Alex de. Leiden: E. J. Brill.
Chung-Hwa Buddhist Journal (2012, 25:167-194)
Taipei: Chung-Hwa Institute of Buddhist Studies
中華佛學學報第二十五期 頁 167-194 (民國一百零一年),臺北:中華佛學研究所
ISSN:1017-7132

Digital Editions of Premodern Chinese Texts: Methods and Problems – Exemplified Using the Daozang Jiyao 道藏輯要 [1]

Christian Wittern
Kyoto University

Abstract

Digital editions have great potential to open new avenues of research, but they also pose vexing research questions that have to be resolved adequately in order to make the resulting edition useful in the long run. One of the many differences between printed editions of texts and digital editions is the open-endedness of the latter, which means that they can be produced incrementally and updated without incurring substantial expenses. The medium of digital editions requires the creator to make many assumptions about the texts explicit and to record them in a way that can be processed automatically. This is a new concept, which seems foreign to the agenda of a scholar whose ultimate aim is to engage with the text. This article demonstrates that what seems like a detour actually advances the understanding of the text, and that the need to objectify a text in this way gives access to new dimensions of a text. It then goes on to provide details of a conceptual model for describing a premodern text digitally, which has been developed while working on a digital edition of the early Qing Daoist collection Daozang jiyao.

Keywords: Text Encoding, Digital Editions, Character Encoding, XML, Daoist Studies

[1] I would like to thank the anonymous reviewers for this journal for their very helpful suggestions for clarifications and general improvement of the article.
前現 漢語文本的數位版本 方法與問題— 道藏輯要 為例

維習安
京都大學

摘要

數位版本對於研究的新方向有著極大的潛力,但它們也引起 人困擾的研究問題, 而這些問題必須適當地解決, 使得 版本的成 對長期來說是有所助益的 紙本 與數位版本的許多差異之一是後者的開放性,也就是能夠 需要實質上的花費而增 加或更新 使用數位版本工 需要建立者建構許多有關文本的清晰假設,且能夠 自動執行的方式紀錄 這是一個新的概念,似乎對目標為參與文本的學者之預設立 場 同 篇文章說明那些看起來像是繞道而行,但 實上卻是增進文本理解與客 觀 文本需求的情形,依 開啟進入了解文本的新面向 同時並詳盡地提供說明數 位 前現 文本的概念模式,而 工 是建立在清初的 道藏輯要 之數位版本上 而發展的

關鍵詞 文本編碼 數位版本 字元編碼 XML 道教研究

Introduction

Text transmitted on traditional written surfaces is immediately available and transparent to the reader, without any additional steps involved. In contrast to this, any text stored digitally, in whatever format, has to be rendered to the screen (or paper) by correctly interpreting (decoding) the values of 0 and 1 that have been used to prepare (encode) the text. Without this correct interpretation, the result of the decoding will be just illegible garbage that does not make any sense whatsoever. In order to make this decoding successful, the model according to which the encoding was done has to be known at the time of the decoding. Even more importantly, as is true for any digital format, the encoding of text into digital format cannot be done without a model of the text. The activity of developing and enhancing a model of the text thus becomes a crucial, foundational activity, laying the groundwork for the actual digitization of the texts themselves.

The first fundamental decision that has to be made when devising such a model is whether to treat the text as a series of symbols or as a two-dimensional array of spots of different color spread out over a flat surface. Descendants of the first type of model lead to a transcribed version of a text (an example of a page is shown in Figure 1), while descendants of the second type lead to some kind of facsimile representation of the text, which will be called a digital facsimile (see Figure 2).
Neither of these representations is intrinsically superior to the other; in fact they complement each other very nicely.

Figure 1: An example of a transcribed text
Figure 2: An example of a digital facsimile

If a text is to be used for information retrieval or any other purpose that requires access to its symbolic content, such as text analysis or even the creation of a new version with a different layout, it has to be encoded in a way that somehow represents the symbols used to write the text. This requires a reading of the text and is thus always also an interpretation of the text. While the transcription of a text as a series of symbols is comparatively straightforward in most alphabetical languages, the logographic languages of East Asia pose specific problems, since exactly this transcription is not a given, but is open to various interpretations and in fact has to be considered part of the research question. What is needed is thus a model that makes these interpretations transparent instead of hiding them in the transcription process, which takes place before the text even gets to the reader. This paper will discuss models used for such a representation and proposes a new working model specific to premodern Chinese text.

It might be tempting to try to avoid the whole issue of legacy character encoding and to come up with a completely different way to encode characters. One such attempt is the CHISE project [2], which tries to build a whole ontology of characters and character information. In the model discussed here, the encoding is based on Unicode, but an intermediate layer of indirection is introduced, as explained below.

In the practice of transcribing primary sources, there is an additional complication: there might be more than one witness for a text, and therefore a collation and analysis of textual variants in other text witnesses might be required.
The model will have to be able to account for this. One last requirement is that it has to be possible to establish and maintain a normalized version of the text in addition to establishing a copy text faithful to the original.

Preliminaries and Prerequisites

Before starting to describe the proposed new model, some preliminaries and basic assumptions have to be discussed. This involves a very brief description of the model most widely used for transcribing primary sources, but also a brief discussion of the writing system for Chinese and of how its basic properties have been reflected in today's most widely used character encoding, Unicode.

[2] See the CHISE (Character Information Service Environment) project (http://www.kanji.zinbun.kyoto-u.ac.jp/projects/chise/).

The TEI/XML Text Model

Text encoding according to the recommendations of the Text Encoding Initiative (TEI) is today the most widely used format for the creation and processing of texts for research in the Humanities [3]. In XML, which is the technical basis for the TEI text format, a text is basically seen as a hierarchy of textual content objects, expressed as a hierarchy of XML elements and attributes [4]; this is the so-called OHCO (Ordered Hierarchy of Content Objects) view of a text. While this provides a powerful model to deal with many aspects of a text and allows the definition of sophisticated vocabularies, there are a few problems that are hard to solve using this model.

One of these problems is that digital texts do in fact require different hierarchical views, depending on the purpose of the creation and the intended processing of the text. There are several ways in which the TEI attempts to solve this problem, one of them being to consider one of the hierarchies in a document as the primary hierarchy (Guidelines, 20.3 Fragmentation and Reconstitution of Virtual Elements).
Textual features that do not nest cleanly into this hierarchy are then arbitrarily split into two (or more) parts, and additional mechanisms are introduced that can be used, for example, to virtually join together elements that have been arbitrarily split within the primary hierarchy. Another way to overcome this problem is to use elements without text content to indicate the points in a text at which features of the 'other' hierarchy start. A classic example of this is the use of milestones in TEI. Since the main hierarchy of a TEI document is constructed using elements that describe the semantic content of the document (e.g. <body>, <div>, <p>), elements that hold the content of pages and lines cannot exist in the same hierarchy [5]. Pages (and columns and lines; these are all generalized into the concept of 'milestones') are thus only indicated by marking the point in the text flow where a new page begins. This makes it possible to work with both hierarchies at the same time, but there is a tradeoff: it prioritizes one hierarchy, thus making it considerably more difficult to retrieve the content of a page than the content of, e.g., a paragraph.

[3] It goes without saying that TEI can be used to encode premodern Chinese texts, which is amply demonstrated for example by the texts produced by the Chinese Buddhist Electronic Text Association (CBETA), whose latest release had to be put on a DVD, since even in compressed form a CD-ROM could no longer hold the amount of material. The earliest of these texts are nearly 2000 years old.
[4] See for example Renear, Mylonas and Durand (1996).
[5] Earlier versions of the TEI contained elements <page>, <col> and <line> etc., which could be used to construct a concurrent hierarchy that reflects how the text was laid down on the text-bearing surface, but these have been removed in the latest release, P5.
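The tradeoff described above can be made concrete with a small sketch. It assumes a heavily simplified, namespace-free fragment in which the TEI page-break milestone <pb/> carries a page number in its n attribute; the sample text and the helper `text_by_page` are hypothetical and serve only to contrast the two kinds of retrieval.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<body><div>'
    '<p>first paragraph starts<pb n="2"/> and ends on page two.</p>'
    '<p>second paragraph, still on page two<pb n="3"/>, '
    'continues on page three.</p>'
    '</div></body>'
)


def text_by_page(root, first_page="1"):
    """Collect the running text of each page by walking in document order."""
    pages = {first_page: []}
    current = first_page

    def walk(elem):
        nonlocal current
        if elem.tag == "pb":            # milestone: a new page starts here
            current = elem.get("n")
            pages.setdefault(current, [])
        if elem.text:
            pages[current].append(elem.text)
        for child in elem:
            walk(child)
            if child.tail:              # text that follows the child element
                pages[current].append(child.tail)

    walk(root)
    return {n: "".join(parts) for n, parts in pages.items()}


root = ET.fromstring(SAMPLE)
# The primary hierarchy is directly addressable: one expression per paragraph.
paragraphs = ["".join(p.itertext()) for p in root.iter("p")]
# The page hierarchy has to be reconstructed by traversing the whole document.
pages = text_by_page(root)
```

Retrieving a paragraph is a direct lookup, whereas retrieving a page requires a full traversal that keeps track of the most recent milestone.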
There is also another difficulty of a more practical nature, namely the procedure through which the encoded text is created. If text encoding is seen as a process of gaining insight and enhancing the understanding of a text, this will be a circular process that adds more information in several passes through the text. What this means is that the sophistication of the TEI model, while serving the needs of text encoders well by providing the expressive power to encode the features observed in a text, puts an enormous burden on text encoders wishing to employ the system for their texts. This seems to be especially true for premodern Chinese texts, where not only does the writing system pose additional difficulties, but there is also usually no indication of paragraph or sentence boundaries and no punctuation; the only given is the text as it is divided into 'scrolls', pages and lines.

For the purpose of this model, then, the main hierarchy in the document is that of the physical representation of the text on the text-bearing surface of the witness that serves as the source for digitization. As the encoding of the text progresses, markers of the points of change in the content hierarchy are inserted, thus gradually bringing this other hierarchy into existence. In some ways this is thus an inversion of the relationship between these hierarchies as they exist in the TEI model.

The following discussion is targeted at the requirements of Chinese text, and no claims are made about usefulness in other areas. The model described in this paper is not intended as a replacement for the TEI text model, but rather as a heuristic, methodological model that allows the creation of a sophisticated text, most likely as the childhood of a text that will prepare it to spend its adult life in a TEI environment.

Writing System

The main difficulty with encoding Chinese texts lies in the writing system.
Over thousands of years, the script used to write Chinese texts has evolved and has seen many changes in conventions, styles and character usage. The result is a rich and deep cultural heritage, which engraves in the writing system the memories of a people that values history and memory in a way few others do, resulting in a writing system that contains an open-ended, unknown number of distinct characters [6]. Since the beginning of the 20th century, there have been attempts to deal with this problem from a practical side, by limiting the use of characters in daily life and thus making it possible for the first time for more than a tiny elite to acquire enough knowledge of the writing system to participate in a modern society based on the written word, be it application forms, contracts, newspapers or novels.

[6] The largest dictionary known to this writer is the Zhonghua zihai, which contains 85000 characters, but the difficulty here is not really the number of distinct characters, but the question of what has to be seen as a character as opposed to a mere variant of another character. We will return to this question.

The latest version of the Unicode character set provides almost 75000 Chinese characters [7]. In this case, too, the definition of what has to be considered a separate character changed significantly during the process of defining them, which has been going on for more than 20 years [8].

Although there are now assigned code-points for all characters in daily use and even for most rare characters that appear in historical sources, there are still problems with the character encoding that are intrinsic to the way it was defined and evolved over the years of its development: unwanted unification and unwanted separation of characters [9].
● Unwanted unification: Especially in the early phase of development, when insufficient code space had been set aside and processing memory was limited, efforts were made to unify similar-looking character shapes into one code-point value. This makes it impossible to refer in a universal way to just one of the character shapes as opposed to the other character shapes also defined with a given code-point [10].
● Unwanted separation: On the other hand, certain code-points encode characters of only slightly different shape separately, the most famous being 説 (U+8AAC) and 說 (U+8AAA). In many fonts the character shapes for the members of this group do indeed look identical, which makes it extremely difficult to consistently use only one of them and to avoid the unwanted other member of each pair [11]. Another reason for such separation is the 'code separation rule' (known in the Unicode Standard as the source separation rule), which meant that characters already encoded separately in one of the character encodings that formed the sources of Unicode had to be kept separate.
● Inconsistencies, duplications, wrong assignments [12]: These also exist, but they are not by design and are much less disruptive.

While these are annoying problems when dealing with Unicode, it is clear that the advantages of using a universal encoding for all texts far outweigh the problems mentioned here. The strategy adopted here is thus not the development or use of a

[7] With the release of Unicode 6.1 the total count of CJK characters is 74617.
[8] Development of Unicode started with a document (http://www.unicode.org/history/unicode88.pdf) by Joe Becker of Xerox Corporation, published in August 1988.
[9] It would be more precise to talk about glyphs here, but what it comes down to in digital text is code-points.
[10] In practice, this can be done by specifying one specific font to be used to represent a character. Modern font technology also allows fonts to contain several character shapes for one code-point and allows a rendering program to select them as needed.
There is however no standardized way to do so across applications.
[11] In practice, the only way to deal with this is to preprocess a document with a table that changes the unwanted member of such a pair into the desired one.
[12] See Kawabata (2006) for some examples.

different encoding system, but rather a strategy to deal with these problems within and on top of Unicode. This will be achieved through a character database and the definition of additional private characters where necessary.

The Process of Encoding a Character

It might be useful here to look a bit more carefully at what exactly happens in the process of encoding a character, that is, of transcribing a character from a source text to its digital equivalent. In an encoded character set, each character that has been assigned to a code-point can be seen as a kind of platonic, ideal character that stands for any number of real-world, existing character shapes (glyphs), as we see them on a text-bearing surface. However, it is impossible to design such an encoded character set in a way that each platonic character is represented only once, since it is in many cases impossible to assign one specific glyph shape unambiguously to only one character: not only the shape, but also meaning and sound contribute to this assignment, and all of these might depend on the specifics of area and era as additional conditions. In the case of the Unicode/ISO 10646 character set, this has led to a development where more and more glyphs that had already been represented as members of the set of glyphs represented by a given character are now also encoded separately. The result is thus that a given glyph can be logically represented in several sets.
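Because the same glyph can thus be reached through more than one code-point, a project typically settles on one member of each separately encoded pair. The preprocessing table mentioned in footnote 11 might be sketched as follows; the choice of 說 (U+8AAA) as the preferred member is an arbitrary assumption made for illustration, as are the names `PREFERRED`, `normalize` and `report_unwanted`.

```python
# Hypothetical project policy: map the unwanted member of each separately
# encoded pair to the preferred one. Only the pair named in the text is listed.
PREFERRED = {"\u8aac": "\u8aaa"}          # 説 (U+8AAC) -> 說 (U+8AAA)
TABLE = str.maketrans(PREFERRED)


def normalize(text):
    """Replace every unwanted pair member by its preferred counterpart."""
    return text.translate(TABLE)


def report_unwanted(text):
    """List (offset, code point) for each unwanted member found in text."""
    return [(i, f"U+{ord(ch):04X}")
            for i, ch in enumerate(text) if ch in PREFERRED]
```

Running `report_unwanted` before `normalize` keeps a record of what the source transcription actually contained, so that the substitution does not silently hide a distinction that may later turn out to matter.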
In such a situation, the process of assigning a character code to a given glyph has to look for the set of glyphs that as a whole most closely resembles the given glyph or, to put it differently, to look for the most specific representation of a given character. If that cannot be found, there are in principle two choices:

● To add this glyph (G) to an existing set, encoded by an existing character code (C), and thus in fact to extend the set to accommodate this new glyph.
● To add a new character code (N) to the system, with this glyph as the most representative of the set of glyphs represented by this character code.

The first option makes the assumption that G has been recognized as in principle belonging to the set of glyphs represented by C, which assumes knowledge of G and of the set of allowed representatives for C. Since the set of allowed representatives for C is an open set, which is not defined exhaustively in the relevant standards but only by giving a sample of such representatives, this decision has to be made case by case and cannot be generalized [13]. The second option does not require any knowledge of the character beyond this glyph and is the only one available if nothing more is known about this character. The downside is of course that this new character is not integrated into the network of implicit knowledge that is already in the system, through system-level character properties and/or a database.

[13] Text encoding is in this respect more of an art than an exact science, in that many decisions depend on the encoder. This can and should be made less arbitrary than it sounds by recognizing this fact and defining a policy as to what exactly the set of represented glyphs should be. The first step towards this could be, for example, to use a specific reference font and to define what kinds of deviation from the glyphs used in this font are allowable. Such
It would therefore be wise also to provide a way to add such information together with the character.

Figure 3: The semantic fields around the character 保 according to the HYDCD

Given this situation, information about the relationships between the characters in the character set has to be maintained. Different types of such relations have to be distinguished. On the one hand, characters can be seen as mere variants of each other, serving essentially as replacements for each other. More often, however, such a relationship covers only part of the semantic field of a given character, which makes it necessary to allow a character to belong to different groups of variant characters, depending on which aspect of its meaning is called upon [14]. In other cases, the relationship might be due to a phonetic replacement or even an error. Dictionaries and commentaries have for a long time collected such information, which has to be taken into account. This type of relationship could be called a generic relationship, which is true for all characters in this set; it is thus a relationship (to use a technical term) on the level of the class of characters, not of the instances.

[13, continued] definitions should go into the project documentation.
[14] The historical dimension of the development of the writing system towards more specific characters also plays a role here; what had been written with the same character in earlier texts might be delegated to different characters later on.

On the other hand, out of all the possible relationships that exist on a class level, or sometimes even in addition to these, the corresponding modern character form needs to be established for every instance of a character that is not identical with the character in modern usage. While this might not seem necessary for a purely diplomatic transcription of a text, it is necessary for proper searches and other text-analytic tasks.
Without this, the value of a transcribed version is not much greater than that of a digital facsimile.

Between these two types of relationships, the one completely generic and the other completely tied to the specific instance, it might well be useful to generalize from the instance-specific relationships to relationships that are relevant for the whole text, text corpus or text collection, thus forming a third type of relationship (of which a number of sub-types could exist, depending on the scope).

A New Model for Encoding Chinese Primary Sources

In this paper, a new model is presented, together with a description of an implementation that acts on the model. The model is described in two complementary parts: (1) a representation of the text and (2) a database of characters.

Representation of the Text

With respect to character encoding, the main problem for premodern Chinese texts is the friction between modern usage, as reflected in the encoding systems available for digital texts, and the characters as they are used in a source text. In order to learn more about the writing system and to better understand the development of character forms and usages, one ideally should not have to rely on modern encoding systems for premodern texts, since they tend to hide exactly the differences that are the object of such a study; but if we are to transcribe the texts digitally, there is in fact hardly any other way than to use such a modern encoding system. The only realistic way out is to give up on using character encoding as the only trace of the characters from the written source. This is however not easily achieved, since, given the way text encoding is done at the moment, the character encoding is a given on which the layer of markup is built.
Although there is some support, for example in TEI P5, for reaching down into the encoding layer and introducing additional characters through markup, this mechanism is not flexible enough for cases where the research questions involve investigation of the writing system itself.

The reason character encoding is performed is that it opens the way to dealing with the encoded symbols in a computationally simple manner and to abstracting from the idiosyncrasies of the actual written characters. In alphabetical languages this is very seldom problematic, and even for logographic languages it is problematic only where fundamental questions about the characters themselves need to be answered. On the other hand, if character encoding does not provide the stable framework on which the following interpretative layers can be built, something else has to take its place.

The fundamental difference with respect to character encoding in the model proposed here is that first and foremost the location of a position in the text is recorded. Only in a second step is this position then associated with an encoded character that might provisionally serve to represent it [15].

The model proposed here takes one representative edition of a text as a reference edition for digital encoding. For the purpose of this model, this text is seen as a sequence of pages (or scrolls or other writing surfaces), which contain a sequence of lines, the lines in turn containing a sequence of characters. While there is a provisional transcription into encoded characters, these encoded characters are considered to be preliminary and serve mainly as placeholders to mark slots for the positions in the text they fill. The characters used might be replaced by others or further annotated and linked to. The encoding is considered to be mainly positional (that is, identifying a character at a specific position in a text), rather than mainly symbolic (i.e.
identifying the symbol that will be used for all such characters in this text).

In addition to the transcribed text of the reference edition, there are additional layers of text, which might contain characters as they are found in other witnesses of the text or, for example, a regularized form that reflects modern usage. These layers are considered to be linked positionally through the sequential numbers of the pages, lines and characters (see Figure 5). The number of layers is unlimited, but for practical purposes they are assigned to different categories:

● The new edition to be created
● The reference edition
● Editions used for collation
● Other editions [16]

By convention, any character position left empty will be filled by the character in the reference edition, which has to be present for all characters.

In addition to these transcribed layers, a digital facsimile of the reference edition is linked to each page. If necessary, a cutout from this digital facsimile can be linked to the characters at this position, thus providing a connection between these two different representations of the text. The model also allows for the possibility of linking a digital facsimile of other editions (with a possibly different page arrangement) to the reference edition [17].

[15] This idea is of course not new; it has been used implicitly in previous work, for example Yasuoka (2005).
[16] This category includes for example other electronic transcriptions of the text that are linked to the reference edition to improve the proofreading, but are not in themselves witnesses of the text.
[17] This can become rather complex and may in practice be difficult to realize if there are big differences in the arrangement of text in different sources.

Figure 4: Representation of the different editions
Figure 5: Attempt to visualize the connection between two layers.
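The positional linking of layers and the fallback convention can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation behind the model: the class `Edition`, the layer names and the four-character sample line are hypothetical, and positions are taken to be 1-based (page, line, character) triples.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Position = Tuple[int, int, int]          # (page, line, character), 1-based


@dataclass
class Edition:
    reference: Dict[Position, str]       # must be filled for every position
    layers: Dict[str, Dict[Position, str]] = field(default_factory=dict)

    def char_at(self, layer: str, pos: Position) -> str:
        """Reading of a layer at pos; empty slots fall back to the reference."""
        return self.layers.get(layer, {}).get(pos, self.reference[pos])

    def line(self, layer: str, page: int, line: int) -> str:
        """Assemble one line of a layer from its character positions."""
        positions = sorted(p for p in self.reference if p[:2] == (page, line))
        return "".join(self.char_at(layer, p) for p in positions)


# Hypothetical data: the reference edition writes 説 in the third slot,
# a regularized layer records 說 for that slot only.
reference = {(1, 1, i + 1): ch for i, ch in enumerate("不亦説乎")}
edition = Edition(reference, {"regularized": {(1, 1, 3): "說"}})
```

A layer therefore stores only the positions where it differs from the reference edition, and `edition.line("regularized", 1, 1)` is assembled from both.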
The provisional encoding is by no means the only or final encoding that should be used; its main purpose is simply to occupy the position and show a representative that might stand for the character used at that position. Closer examination of this and other similar characters might bring up other possible candidates.

The transcription of the text is not seen just as a precondition for dealing computationally with the text, but is in itself a means to acquire a better understanding of the writing system used to write the text and ultimately of the content of the text. To gain an increasingly detailed understanding of the text, a kind of hermeneutical circle has to be performed, consisting of several steps carried out in sequence.

● For every character that seems doubtful, unintelligible or a non-standard representation, the word intended by this character needs to be established. This can be done by
  ● Looking at the context of the occurrence of this character and comparing it with other, similar contexts
  ● Looking at characters that are similar in visual, phonetic or semantic respects
● The result of this research is registered in the database and thus provides context for future lookups.
● Information about context and registered variants only becomes available as the processing of the text progresses; therefore several loops of this activity have to be performed.
● Like a hermeneutical circle, this activity is in principle open-ended and holds the potential for ever new discoveries and observations.

Through several loops of proofreading and digesting different representations of characters, a new understanding of the text and of the conventions and idiosyncrasies used to write it is gained.
Quite separate from these layers of textual representation there is an interpretative layer that might be thought to hover over the positional layer; in this layer connections or disconnections between similar or different characters are established and investigations of characters and their contexts are conducted.

Character Database

The model developed here relies on a database of characters. In this database, relations between characters, their occurrences within the text and among groups of characters are registered. The groupings of the characters can be organized according to different properties of the characters, thus allowing the researcher to build sets of characters similar in their phonetic, semantic or visual properties. Since the relation to the occurrence of the character in the text is maintained, these relations are never thought to be abstract and generic, but are specific to the text under investigation.

Information in the database is held in two parts. One holds generalized relations as they are recorded in dictionaries; here the table of variant characters of the Hanyu dacidian 漢語大詞典 (HYDZD) and the Dictionary of Variant Characters compiled by the Taiwanese Ministry of Education are used, as these are the most comprehensive tables of this kind. This serves as a backdrop for a specific database, which records the relations as they are observed in the text. This information is thus specific to the text it was developed with, and the records of the database are always tied to the context the information was abstracted from. Nevertheless, as the number of texts processed with this system increases, and the information held for these texts in the databases is aggregated, it is hoped that more general information on the Chinese writing system and its development can be gained,18 which is not available at the moment.
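As a rough illustration of how groupings by property and text-specific occurrences might sit side by side, the following sketch keeps both in one small in-memory structure. The class `CharDB`, its method names, the group identifiers and the recorded positions are assumptions for illustration and are not taken from the project's code.

```python
from collections import defaultdict


class CharDB:
    """Toy character database: typed groups plus occurrences tied to text positions."""

    def __init__(self):
        self.groups = {}                      # group id -> {"kind": str, "members": set}
        self.occurrences = defaultdict(list)  # character -> list of text positions

    def add_group(self, group_id, kind, members):
        self.groups[group_id] = {"kind": kind, "members": set(members)}

    def record(self, char, position):
        """Register one occurrence of `char` at a (page, line, character) position."""
        self.occurrences[char].append(position)

    def related(self, char, kind=None):
        """Characters sharing a group with `char`, optionally limited to one kind of group."""
        found = set()
        for group in self.groups.values():
            if char in group["members"] and kind in (None, group["kind"]):
                found |= group["members"]
        found.discard(char)
        return found

    def group_frequency(self, group_id):
        """Total number of recorded occurrences for all members of a group."""
        members = self.groups[group_id]["members"]
        return sum(len(self.occurrences.get(c, ())) for c in members)


db = CharDB()
db.add_group("g1", "visual", "己已巳")   # easily confused shapes
db.add_group("g2", "variant", "為爲")    # variant forms of one character
db.record("己", (1, 1, 4))               # invented positions, for illustration only
db.record("已", (1, 2, 7))
db.record("已", (2, 1, 1))

print(db.related("己", "visual"))
print(db.group_frequency("g1"))
```

Keeping the occurrence list next to the group membership is what ties a relation to the text at hand rather than to the writing system in general.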
18 It should be noted here that the development of encoded character sets by necessity predates the creation of textual material using these character sets. This of course precludes any statistical base that might be used as guidance in developing such encoded character sets. The results of work using systems such as the one developed here could serve as guidance for the future development of such character sets.

The database connects the specific instance of the character, which is registered not with a character code but with the location of the character within the text, with a generic identifier, that is, an encoded representation of the character, if such a representation is available in the encoded character set. If no such representation is available, a private character will be created in order to allow computational processing and representation of this character. In such cases, structural information about the character, as well as an image cut from the digital facsimile, is added to the record for this character. If a suitable representation can be found within the almost 75000 character codes registered in Unicode, there might still be slight differences in appearance that cannot be accounted for using the standard glyphs present in the operating system of the computer used. In such cases, and whenever a doubt about this character arises, an image cut from the facsimile representation of the text will be added to the record. The database can thus also be seen as connecting the digital facsimile representation and the transcribed representation of the text.

The Daozang Jiyao and its Editing Environment

The Daozang Jiyao

After the Daoist Canon of the Ming period (正統道藏 Zhengtong Daozang, 1445), the Daozang jiyao (Essentials of the Daoist Canon) is the most important collection of Daoist texts.
It is by far the largest anthology of premodern Daoist texts and an indispensable source for research on Daoism in the Ming and Qing period (fourteenth to late nineteenth century). Although the collection is chiefly derived from the Ming Canon, it contains more than 100 texts that are not included there and thus is undoubtedly the most valuable collection of Daoist literature of the late imperial period. It features texts on neidan or inner alchemy, cosmology, philosophy, ritual, precepts, commentaries on Buddhist, Confucian and Daoist classics, hagiographic, topographic, epigraphic and literary works, and much else.

At the Institute for Research in Humanities in Kyoto, a research project on the DZJY is being conducted. This was started by the late Monica Esposito with the help of Mugitani Kunio and Christian Wittern, with the aim of investigating the origin of the collection, but also of creating a new critical electronic edition and developing the tools for exploring all aspects of its content.19

19 More on the history of the Daozang jiyao and the projects sponsored by the Chiang Ching-kuo Foundation (CCK) and the Japan Society for the Promotion of Science (JSPS) can be found at http://www.daozangjiyao.org. Due to the untimely passing away of Dr. Monica Esposito in March 2011, the project has seen a reassessment and will be continued under the leadership of Lai Chi-tim and in close collaboration with the Centre of Daoist Studies at the Chinese University in Hong Kong.

The genesis of this collection is still hardly explored.
According to the most common account, often presented even in recent articles and primarily based on the hypothesis of Zhao Zongcheng (1995),20 it is believed that there are at least three different editions of the Daozang jiyao:

● by 彭定求 Peng Dingqiu (1645-1719), compiled around 1700 and containing 200 titles from the Ming Canon;
● by 蔣元庭 Jiang Yuanting (予蒲 Yupu, 1755-1819), who reportedly added 79 texts not contained in the Ming Canon (Weng Dujian, 1935) during the Jiaqing era (1796-1820);
● by 賀龍驤 He Longxiang and 彭瀚然 Peng Hanran, published in 1906 at the 二仙菴 Erxian'an of Chengdu (Sichuan) under the name of Chongkan Daozang jiyao 重刊道藏輯要 (New Edition of the Essentials of the Daoist Canon), and (according to this hypothesis) containing a total of 319 titles.

However, as early as 1955, 吉岡義豊 Yoshioka Yoshitoyo, in his work entitled Dōkyō kyōten shiron 道教教 史論 (Historical Studies on Daoist Scriptures), cast doubt on this belief and affirmed that there were only two editions of the Daozang jiyao (the second and third of those listed above). One avenue that might shed new light on this controversy is the establishing of a stemma of the existing textual witnesses, which should provide an answer to this question. This requires, however, a close reading and comparison of the existing witnesses, as well as a method to compare these versions computationally and calculate the respective closeness of individual witnesses.

Editing Environment

The editing environment has been realized as a Web application that can be used from any compatible browser, anywhere on the Internet. One of the reasons for choosing this platform was to allow collaborative editing in a distributed environment; another was the hope of using this interface, either directly or at least in large part, for a web-based publication of the texts.
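The text does not specify how the closeness of witnesses is to be calculated. One simple possibility, given that all witnesses are aligned position by position to a reference edition, is the share of positions at which two witnesses disagree. The sketch below assumes exactly that measure; the function names and the three placeholder witnesses `A`, `B` and `C` (arbitrary Latin strings, not real readings) are invented for illustration.

```python
from itertools import combinations


def distance(a: str, b: str) -> float:
    """Share of aligned positions at which two witnesses read differently."""
    if len(a) != len(b):
        raise ValueError("witnesses must be aligned to the same positions")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(witnesses: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise distances for all witnesses, keyed by a sorted pair of sigla."""
    return {(s, t): distance(witnesses[s], witnesses[t])
            for s, t in combinations(sorted(witnesses), 2)}


# Placeholder witnesses: each letter stands for the character at one position.
witnesses = {"A": "abcdefgh", "B": "abcdefgx", "C": "abxdefgx"}

for pair, d in distance_matrix(witnesses).items():
    print(pair, d)
```

A matrix of this kind could serve as input to a clustering or tree-building step when a stemma is attempted; that step is outside the scope of this sketch.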
Mapping to a Relational Database

A relational database management system (in this case, PostgreSQL 8.3) has been used to hold the data, while the user interface was developed with the Python-based web application framework Django (post 1.0 SVN version) and the JavaScript framework ExtJS. In Django terms, there are two applications, 'textcoll' for holding the textual content and 'chardb' for the character database; these two are glued together with a frontend called 'md'. One of the difficult tasks at the outset was to model the text collection, which has been done in the following tables:21

Tablename | Kind of Information | Relations
Work | Title of the work, date and other information |
Edition | Information about the edition, editor, publication details | Work
TextPage | Page number, graphical image of the page, serial number of the first character, number of characters | Edition, TextChar
TextLine | Line number, serial number of the first character, number of characters | TextPage, TextChar
TextChar | Serial number of character, associated extra information,22 Unicode value of the character, serial number of previous and next character | TextLine, Edition, TextChar, Interpunction

20 Zhao (1995).

As can be seen, there is in principle a hierarchical relationship from the Work through Edition, TextPage and TextLine down to the TextChar table, which holds all the information related to the character at this position. It goes without saying that this incurs a tremendous overhead for the storage and processing of a simple text, but it should be kept in mind that this is the equivalent of a scanning electron microscope, which tries to study the atomic units of a text, so some effort is needed to isolate and handle these atomic units.
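To make the hierarchy concrete, here is a minimal relational sketch. It uses Python's built-in `sqlite3` module rather than PostgreSQL and Django so that it runs on its own; the column names, the helper functions and the five sample characters are assumptions for illustration and do not reproduce the project's actual schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE work     (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE edition  (id INTEGER PRIMARY KEY, work_id INTEGER REFERENCES work(id),
                       siglum TEXT);
CREATE TABLE textpage (id INTEGER PRIMARY KEY, edition_id INTEGER REFERENCES edition(id),
                       page_no INTEGER, first_serial INTEGER, n_chars INTEGER);
CREATE TABLE textline (id INTEGER PRIMARY KEY, page_id INTEGER REFERENCES textpage(id),
                       line_no INTEGER, first_serial INTEGER, n_chars INTEGER);
CREATE TABLE textchar (serial INTEGER PRIMARY KEY, line_id INTEGER REFERENCES textline(id),
                       codepoint INTEGER, prev_serial INTEGER, next_serial INTEGER);
"""


def build_demo():
    """Create an in-memory database holding one page with two short sample lines."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO work VALUES (1, 'Demo work')")
    con.execute("INSERT INTO edition VALUES (1, 1, 'REF')")
    lines = [(1, "道藏輯"), (2, "要序")]  # sample text for illustration only
    total = sum(len(text) for _, text in lines)
    con.execute("INSERT INTO textpage VALUES (1, 1, 1, 1, ?)", (total,))
    serial = 1
    for line_no, text in lines:
        con.execute("INSERT INTO textline VALUES (?, 1, ?, ?, ?)",
                    (line_no, line_no, serial, len(text)))
        for ch in text:
            prev_s = serial - 1 if serial > 1 else None
            next_s = serial + 1 if serial < total else None
            con.execute("INSERT INTO textchar VALUES (?, ?, ?, ?, ?)",
                        (serial, line_no, ord(ch), prev_s, next_s))
            serial += 1
    return con


def line_text(con, line_id):
    """Read a line via its serial-number shortcut into the textchar table."""
    first, n = con.execute(
        "SELECT first_serial, n_chars FROM textline WHERE id = ?", (line_id,)).fetchone()
    rows = con.execute(
        "SELECT codepoint FROM textchar WHERE serial >= ? AND serial < ? ORDER BY serial",
        (first, first + n))
    return "".join(chr(cp) for (cp,) in rows)


con = build_demo()
print(line_text(con, 1))
print(line_text(con, 2))
```

The `first_serial` and `n_chars` columns correspond to the direct links from pages and lines to character positions, which let a line be read with a single range query instead of a join through every level.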
There are some anomalies in the hierarchy, introduced for convenience of processing: through the serial numbers of the first character on pages and lines, the TextPage and TextLine tables are also linked directly to the TextChar table, which in turn has internal links to the previous and following character positions. Any character that spans more than one position in the grid, as well as talismans, outlines of movements in rituals or similar material that falls outside this simplistic model for the layout of a text, is treated as a graphic outside of the textual flow.

In addition to these tables representing the text and allowing the modeling of its digital representation, there are a few other tables necessary for holding information about the text structure and content, as follows:

Tablename | Kind of Information | Relations
Attribute | key, value, note | TextChar (start), TextChar (end), Mark
Mark | tag, name, gloss, scope, note, color |
Interpunction | position, category |

The Mark table provides the tags that can be associated with locations in the text, whereas the Attribute table provides the actual connection between an instance of a mark and a specific text location, given by its start and end TextChar. Interpunction, except for space that is already present in the source text, is held in a separate table, linked to the text from the TextChar; besides the character used to represent the interpunction, the position23 relative to the character and a category24 are recorded.

21 Only tables and information relevant to this discussion are shown; implementation details are ignored to keep the table simple.
22 Information about interpunction or other extra characters attached to this character is held here. This includes the possibility of adding additional information, for example in the case of space characters that are used honorifically before names.
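The division of labour between Mark and Attribute amounts to stand-off annotation: tags are defined once and attached to spans of character positions. The following sketch shows the idea with plain dataclasses; the field selection, the tag names `note` and `persName`, and the serial numbers are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    tag: str   # short identifier of the kind of markup
    name: str  # human-readable name


@dataclass(frozen=True)
class Attribute:
    mark: Mark
    start: int  # serial number of the first TextChar covered
    end: int    # serial number of the last TextChar covered (inclusive)
    note: str = ""


def marks_at(attributes, serial):
    """Tags of all marks whose span covers the given character position."""
    return [a.mark.tag for a in attributes if a.start <= serial <= a.end]


def spans_for(attributes, tag):
    """All (start, end) spans annotated with the given tag."""
    return [(a.start, a.end) for a in attributes if a.mark.tag == tag]


NOTE = Mark("note", "inline note")
NAME = Mark("persName", "personal name")

# Two invented spans that overlap without nesting.
attributes = [Attribute(NOTE, 3, 6), Attribute(NAME, 5, 8)]

print(marks_at(attributes, 5))
print(spans_for(attributes, "note"))
```

Since spans are stored by start and end position rather than as nested elements, two annotations may overlap freely, something that a single XML hierarchy cannot express directly.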
Here is an overview of the tables in 'chardb', the part of the application that maintains the character database:

Tablename | Kind of Information | Relations
Char | unicode codepoint, character, types | external link to TextChar
Unihan | key, value | Char
CharGroup | members, type | Char through Variant
Variant | type, character, note | Char, CharGroup
Pinyin | pinyin reading | Char
IDS | IDS (Ideographic Description Sequence)25 | Char

Groups of characters are built by linking the characters through the Variant table to a CharGroup, thereby declaring membership of that group. Additional properties can be set on Variant and CharGroup. The modeling of semantics is currently done through the definition in the Unihan table; the sound is modeled through the Pinyin table. This is provisional and is awaiting a more thorough solution.

23 This is given as one of eight compass positions with the character in question at the center, numbered clockwise and starting in the 'East', that is, after the character.
24 At the moment, the categories are phrase-end, sentence-end and phrase/sentence-start.
25 The IDS is a sequence of operators and character parts that together describe how a character is composed.

User Interface

The user interface is accessed by opening the URL. It requires an account in the web application. Upon login, the user will be presented with the last page visited before leaving the system, as in Figure 6. The initially visible screen space is divided into three parts: at the right is a page of the text as digital facsimile, in the center pane is a transcribed version of this same page, while the left pane holds some administrative functions. There is information about the current page, the user (including a logout button and a possibility to
look at a change log), in the second part is a panel for navigating the text collection, and finally the bottom left has a multifunction panel for showing additional information and performing other tasks on this text page.

Figure 6: The web application interface for establishing the source text

The main functions for interacting with the text, however, are not visible here. Most editing actions are performed by clicking or selecting text and through the dialog boxes that pop up following such an action. Figure 7 shows an example of this popup window; in this case the fourth character position in the second line has been clicked. As visual feedback showing which character position is the target of the actions taken in this dialog, the character in this position is highlighted. The new window gives in the top line the TextLine of this position, the character, and then a number of input boxes. The first input box has the current character for the edition [CK-KZ],26 which is given in the second box. By providing a different character and selecting a different edition, the user can associate a new reading for another witness of the text, or give a different character to be used in the JYE edition. If the correction or replacement occurs several times, the scope for this action can be set in the third selection box to be valid for the current character, for the whole page, or even for the remaining part of the text.27 Below this line, there are four tabs for further action or inspection; by default it opens to the second tab, which provides a glimpse into the information in the character database for the character at this position. Among other things, the number of occurrences of the character (here 464) is given, along with images of the character as it has been cut from the text. The main part gives additional information about the character, including pronunciation and definition according to the Unihan database.28 More important for the present context, however, is the ability to maintain character relations here. The information about character variants that is held for the character is shown in Figure 8. In this case, the Hanyu da zidian 漢語大字典,29 on which the initial information is based, has assigned this character to five different groups of characters. For all characters in these groups, the Unicode code point, the number of occurrences in the DZJY, as well as definition and pronunciation are given. Characters can be added to or deleted from groups, or new groups created as necessary, thus making it possible to model this information exactly as needed for this text collection. In addition, to assist the user in distinguishing characters that might be mistaken for each other, it is also possible to register in the system characters that are not cognates of the current character.

Figure 7: The dialog box that opens when a character position in the transcribed text is clicked

26 The conventions for identifying the edition are as follows: Currently, there are two edition groups, indicated by CK and YP. The actual edition within the group is then indicated in the second part of the siglum; in this case it is the Kaozheng reprint of the Chongkan edition, CK-KZ. An exception to this scheme is the new regularized edition created here, which will be indicated as JYE.
27 This is mainly to make the editorial process more efficient, under the assumption that only text not yet seen will be touched.
28 This is a database of basic character properties, maintained by the Unicode Consortium.
29 Hanyu da zidian weiyuanhui (1986-1989).
Figure 8: Information held in the character database

The first tab of this window allows the user to cut an image from the digital facsimile and associate it with the current position in the transcribed text. In addition, this image is also associated with the corresponding character in the character database.

Figure 9: Cutting a character from the text

The next tab of this window allows the user to see all information associated with a character, as shown in Figure 10. Here, a regularized version of the character has been registered for the JYE edition. It is also possible to add further notes on the character in the textbox to the right. The last tab (not shown) allows larger chunks of text to be added or deleted.

Figure 10: Detailed information about this text location

Another way to interact with the text is to select a string of characters. The action following a selection can be configured either to copy the selected string to the search box or to apply markup to the selection, as shown in Figure 11. Currently, this is mostly used to record characters that have been printed smaller as inline notes, but it will also be used for titles, personal names and other items of interest in the text. To record structural elements in the text, like paragraphs, verse lines or section headings, yet another dialog can be used, which pops up when the horizontal bar at the top of a text line is clicked (see Figure 12); this assumes, however, that the feature starts at the beginning of the line.

Figure 11

Figure 12: Applying markup to a line

Context

The discussion here stands in the context of practical experience and theoretical considerations with digital text in Chinese. Some ideas have been pursued and discussed in earlier presentations and articles.
In particular, over the last several years I have been developing an ontological model30 for understanding text from a perspective quite different from the one taken here. The model presented here is meant to complement this from a different perspective, filling some of the gaps in the earlier model.

The work here can also be seen as a continuation of an earlier line of thought, which was concerned with a 'scholarly workbench', the last incarnation of which was a Filemaker-based application called KanDoku that supported annotation, translation and markup of digital texts. When I tried to implement support for more flexible handling of character representation and variant readings for different text witnesses, I quickly ran into the limitations inherent in that platform. The present work should be seen as aiming in a similar direction, except that this time an attempt has been made to start with a firm foundation. It is planned, however, to gradually add more of the possibilities of the earlier KanDoku. Another difference between the present work and KanDoku is that the latter took as its input a completed TEI P5 compatible digital version of a text, while the former will attempt to produce such a version as its output (among other things). In fact, one of its design goals is to improve the workflow of creating high-quality digital editions of texts, but hopefully its usefulness will extend beyond that and allow the user to gain new insights into the text itself.

In the Daozang jiyao project, the work was initially done by editing TEI conformant XML files with the XML editor oXygen. This was considered cumbersome and time-consuming by the researchers involved, so this editing application has been developed to provide a more convenient interface that makes specific tasks on the text easier to perform than would otherwise be possible.
It should be noted, however, that such a specialization also involves an enormous limitation on what can be done while editing the text; there will therefore be many cases where such a solution cannot be applied. It is planned to add a routine to export the texts edited using this interface into TEI conformant XML documents.

As it stands at the moment, the system is very much a work in progress, and much of the necessary functionality is still missing, for example a way to visualize textual context that takes into account the several different layers of characters that might be available at a given point in the text. The results that have been achieved so far in the context of work on the Daozang jiyao seem to suggest that the work is going in the right direction and will indeed be able to open up new avenues for digital texts. It will be interesting to see how well this approach could also be applied to earlier stages of the development of the Chinese writing system, such as bronze inscriptions or texts on bamboo slips. The model presented here implicitly assumes a regular grid for the layout of a text, so it would require some extension, but it will have to be actually tried with such a text to see in what way such extensions should be implemented.

30 In English, this is presented in most detail in Wittern (2007), but more references can be found here http://kanji.zinbun.kyoto-u.ac.jp/~wittern/publications articles/index.html.

Abbreviations

CBETA Chinese Buddhist Electronic Text Association 中華電子佛典協會 (see http://www.cbeta.org)
CJK Chinese, Japanese and Korean Characters
CK Chongkan 重刊 (reprint) edition of the DZJY, Sichuan 1906ff.
CK-KZ Facsimile edition of CK published by the Kaozheng publishing company
DZJY Daozang Jiyao 道藏輯要
HYDCD Hanyu Dacidian 漢語大詞典
IDS Ideographic Description Sequences
JYE New Electronic edition of the DZJY
TEI Text Encoding Initiative (see http://www.tei-c.org)
UCS Universal Character Set, also known as Unicode (see http://www.unicode.org)
XML eXtensible Markup Language (see http://www.w3.org/XML/)
YP DZJY original edition by Jiang Yupu 蔣予蒲 (1755-1819)

References

Becker, Joe. 1988. Unicode 88. (http://www.unicode.org/history/unicode88.pdf. Accessed 2012-03-23).
Hanyu da zidian weiyuanhui 漢語大字典委員會, eds. 1986-1989. Hanyu da Zidian 漢語大字典. 8 vols. Wuhan: Hubei cishu chubanshe and Sichuan cishu chubanshe.
Kawabata, Taichi 川幡太一. 2005. Possible Multiple-encoded Ideographs in the UCS. (http://www.cse.cuhk.edu.hk/~irg/irg/irg25/IRGN1155_Possible_Duplicates.pdf. Accessed 2012-03-23).
Kawabata, Taichi 川幡太一. 2006. IDS による UCS 漢字の同一性の判定手法 (Methods to Assert 'Sameness' of a Character in UCS Kanji Through IDS). 東洋学へのコンピュータ利用第 17 回研究セミナー. Kyoto: Institute of Research in Humanities.
Leng, Yulong 冷玉龍 and Wei, Yixin 韋一心, eds. 1994. Zhonghua Zihai 中華字海. Beijing: Zhonghua shuju.
Luo, Zhufeng 羅竹風, ed. 1987-1994. Hanyu Dacidian 漢語大詞典. Shanghai: Dictionary Publishing House.
Morioka, Tomohiko 守岡知彦. The CHISE (Character Information Service Environment) project. (http://www.kanji.zinbun.kyoto-u.ac.jp/projects/chise/. Accessed 2012-03-23).
Renear, Alan; Mylonas, Elli; Durand, David. 1996. Refining Our Notion of What Text Really is: the Problem of Overlapping Hierarchies. Research in Humanities Computing. Ed. Ide, Nancy and Hockey, Susan. Oxford: Oxford University Press.
Wittern, Christian. 2007. Digital Text, Meaning and the World: Preliminary Considerations for a Knowledgebase of Oriental Studies. Higashi Ajia ni Okeru Reigi to Keibatsu 東アジアにおける儀礼と刑罰 (Ritual and Punishment in East Asia). Ed. Tomiya, Itaru 冨谷至. Kyoto: Institute for Research in Humanities. 41-58.
Yasuoka, Koichi 安岡孝一. 2005. Text-Searchable Image and Its Applications. (http://kanji.zinbun.kyoto-u.ac.jp/~yasuoka/publications/2005-01-22.pdf. Accessed 2012-03-23).
Yoshioka, Yoshitoyo 吉岡義豊. 1955. Dōkyō Kyōten Shiron 道教教 史論 (Historical Studies on Daoist Scriptures). Tokyo: Gogatsu shobo.
Zhao, Zongcheng 趙宗誠. 1995. Daozang Jiyao de Bianzuan yu Zengbu 道蔵輯要的編纂與增補 (The Compilation of the Daozang Jiyao and its Enlarged Editions). Sichuan Wenwu 四川文物 2:27-31.
Chung-Hwa Buddhist Journal (2012, 25:87-104) Taipei: Chung-Hwa Institute of Buddhist Studies
中華佛學學報第二十五期 頁 87-104(民國一百零一年),臺北:中華佛學研究所
ISSN:1017-7132

The Corpus Search and Results Handling System Glossa – a Description

Janne Bondi Johannessen
The Text Laboratory, Department of linguistics and Nordic studies, University of Oslo

Abstract

The paper presents and describes Glossa, a corpus search and results handling system that has two main characteristics: It is advanced with respect to search and handling options, and it is very user-friendly. Also, it is freely downloadable. The system is suitable for monolingual and parallel corpora, and for combining different kinds of information in the search results. In the paper I show how sound, video and maps, as well as sets of double transcriptions, are presented to the Glossa user.

Keywords: Corpus Search System, User-friendly Interface, Advanced Search, Parallel Corpora, Speech Corpora

語料庫搜尋與結果處理系統 Glossa 之說明

Janne Bondi Johannessen 奧斯陸大學

摘要

此篇文章介紹並說明語料庫搜尋與結果處理系統—Glossa,此系統有兩個主要的特 性:它具有搜尋與處理選擇 的優越性,並相當考慮使用者的需要 另外,此系統 可免費 載,且適用於單一語言及 行語料庫,同時可以在搜尋結果中結合不同的 資訊 在此文,我將說明聲音,影像及地圖,及一套雙重抄寫如何呈現於 Glossa 的使用者

關鍵詞:語料庫搜尋系統 人性 用戶界面 進階搜尋 行語料庫 口語語料庫

Introduction 1

The paper presents and describes Glossa, a corpus search and results handling system that has two main characteristics: It is advanced with respect to search and handling options, and it is very user-friendly. Also, it is freely downloadable, which means that those who have a corpus and would like it to be available on the web in a nice interface can use Glossa. Many corpora are used with Glossa both at the University of Oslo and elsewhere.

The paper is structured as follows. In section 2, I briefly describe the importance of user-friendliness. Section 3 illustrates querying with Glossa, showing options with texts as different as parallel corpora and speech corpora.
That section concludes with an illustration of the indispensability of Glossa for certain types of research. The illustration shows how finding isoglosses for variation in noun morphology depends on the Glossa options of maps and parallel transcription search. Section 4 gives the technical details, including requirements on input data and a short discussion of the use of Google APIs. Section 5 concludes the paper.

Importance of User-Friendliness

There are several corpus interfaces available, see e.g. Johannessen et al. (2000), Bick (2004), Hoffmann and Evert (2006). However, they often have limitations: some are not network-enabled (i.e. each user has to download and manage corpora), some lack flexibility with regard to queries, results display and post-processing, many are tied to a specific corpus, and few are completely GUI-driven.

Typically, corpus applications require queries to be formed as regular expressions in some formal language. Many corpus users find it difficult to learn such query languages, with their requirements for accurate use of parentheses, asterisks, percentage signs etc. Furthermore, applications often require the users to know the full tag set before querying the corpus. Many corpus users find it hard to have to know the tag inventory, tag names and necessary tag abbreviations, as well as abbreviations for source texts, etc. For many potential users, these issues act as an obstacle, preventing them from making easy or efficient use of corpus tools.

1 I would like to thank the two anonymous reviewers for very good advice. I also thank my colleagues at the Text Laboratory, University of Oslo, especially Joel Priestley, Anders Nøklestad and Kristin Hagen, who are vital for the Glossa development, and Lars Nygaard for his important contributions in the early development phase.

We believe that an easy-to-use, flexible graphic user interface is important for maximizing the potential of corpora in research, development and teaching. Furthermore,
the interface should not presuppose full-text access to the corpora, as licence conditions may prohibit free redistribution, even if they often do allow web-based querying. Glossa satisfies these criteria.

Querying with Glossa

The corpus user can query the corpus by linguistic features or by non-linguistic features, or by a combination. The most common linguistic queries involve specifying a token by given attributes: word, lemma, affix or part of word (start, middle, end of word), part of speech, morphological features, syntactic functions, sentence position. These queries can always be done in a user-friendly way. In (1) we exemplify what a search for a plural noun starting with the letter sequence jump would look like when written in a search language of regular expressions. In figure 1 we see the same query in Glossa, with its pull-down menus. (The latter search is translated by Glossa into regular expressions.)

(1) (word="jump.*"%c&(number="pl")&(pos="n"))

Figure 1: Querying Glossa using linguistic specifications. All searches are done using checkboxes, pull-down menus, or writing simple letters to make words or other strings.

The querying in figure 1 is a monolingual search. In figure 2 we see how a query can address more than one language in a parallel translation corpus. The user has indicated that (s)he wishes to get all hits where the English text contains jump followed by a preposition, and the Norwegian translation equivalent contains hopp.

Figure 2: Querying for a parallel search.

The parallel search in figure 2 is translated to a regular expression by the system, presented in (2), and the search results are presented in figure 3. Without the interface, the users would themselves have to write this regular expression.
(2) "([((word="jump" %c))][((pos="prep"))]) :OMC4_NO ([((word="hopp.*"%c))]) ;"

Figure 3: Some results from a parallel corpus query.

The examples we have seen up to this point are ones that query linguistic features. The corpus user can also filter the searches by non-linguistic features, on the same query page. Here the choices are hidden in clickable boxes that appear when the small plus sign is clicked. This is shown in figure 4.

Figure 4: The non-linguistic features that can be used for filtering the search.

The Nordic Dialect Corpus (Johannessen et al. 2009) contains a great deal of information: speech from five languages and hundreds of dialects, tagging, at least one and sometimes two transcriptions for each dialect, information on informants such as age and sex, type of recording, recording year etc. But querying the corpus is no more complicated than with simpler corpora. Figure 5 illustrates how to search for a suffix. The suffix –um in the Övdalian dialect of Sweden has two functions: dative plural on nouns and 1st person plural on verbs. It is an interesting search target since the Övdalian dialect is rapidly changing, and one does not know whether the various inflections are still used. (They are no longer used in standard Swedish.) Use of the dative suffix in particular is rapidly losing ground. Since –um is a non-standard suffix, there is no standard orthography for handling it, and consequently a standard orthographic search is not viable. The user must specify that the search should be performed in the phonetic transcription (the option at the bottom of the pull-down menu). The standard orthography simply leaves out the suffix altogether, so there is no way this could be used (see figure 7 for the results of the search, illustrating also the difference between the two types of transcription).
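The translation from menu choices to a query expression, as in (1) and (2), can be illustrated with a small sketch. It is not Glossa's actual code: the function name, its parameters and the attribute name phon for a phonetic transcription layer are hypothetical, and the output follows the general square-bracket token syntax of the CQP query language rather than the exact bracketing printed in (1).

```python
import re


def token_expr(text=None, position="whole", attribute="word",
               ignore_case=False, **features):
    """Build one CQP-style token expression from menu-like choices.

    position:  "whole", "start", "middle" or "end" (which part of the word
               the text must match).
    attribute: the token attribute to search; "word" follows example (1),
               any other name (e.g. "phon") is a hypothetical placeholder.
    features:  further attribute="value" constraints, e.g. pos="n".
    """
    parts = []
    if text is not None:
        # Approximation: uses Python's escaping rules, not CQP's own.
        core = re.escape(text)
        pattern = {
            "whole": core,
            "start": core + ".*",
            "middle": ".*" + core + ".*",
            "end": ".*" + core,
        }[position]
        flag = " %c" if ignore_case else ""  # %c = ignore case, as in (1)
        parts.append(f'{attribute}="{pattern}"{flag}')
    for name, value in sorted(features.items()):
        parts.append(f'{name}="{value}"')
    return "[" + " & ".join(parts) + "]"


# A plural noun starting with "jump", as in (1).
plural_jump = token_expr("jump", position="start", ignore_case=True,
                         number="pl", pos="n")
# The suffix -um, searched in an assumed phonetic attribute.
suffix_um = token_expr("um", position="end", attribute="phon")
print(plural_jump)
print(suffix_um)
```

The point of the sketch is only that each menu choice contributes one constraint, so the user never has to balance brackets or remember flags by hand.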
Figure 5: A simple query for the suffix –um.

Figure 6 illustrates many of the non-linguistic variables that can also be used to limit the search. In addition to those regarding informants, there are also some other choices that deal with the presentation of the result (top of figure 6). I will mention particularly the option of displaying one or both types of transcription. This option is available irrespective of whether the researcher originally searched in the phonetic or the orthographic transcription.

Figure 6: Non-linguistic search options in the Nordic Dialect Corpus.

In figure 7 some of the search results for the suffix query for –um are displayed, and we see how the two transcriptions complement each other.

Figure 7: Three different displays of search results: two types of transcription (orthographic and phonetic) plus an English translation, in that order.

Without being able to search in the phonetic transcription we would not have been able to find these suffixes. Without the orthographic transcription, a reader who is not an expert in the dialect would not have been able to understand the phonetic transcription, given how far it is from the standard. We would also like to point out that the displayed results are translated into English using a Google Translate API. This has to be done for each concordance line separately, and is a service to less proficient speakers of the Nordic languages.

Figure 8 shows what the results window looks like when the film icon button next to a result line is pushed. The video and audio give exactly the same segment as the text line in the results list.

Figure 8: Search results with audio and video.

In addition to the many search options, there are also various options for handling the results.
The Action menu visible in figures 7 and 8 gives a large selection of choices, for example: sorting on matching phrases, bibliographic information or arbitrary points in the context, counting matched phrases, downloading result sets in various formats (e.g. tab separated values and Excel spreadsheets), collocation analysis, co-occurrence analysis, user-defined annotation, singling out individual hits or whole results file for saving or deletion, viewing with regards to metadata distribution, frequency count of all hits.

In figure 9, we have simply asked for a count of the results from a search on jump as first part of a word. This option gives the researcher a convenient overview of the words in the resulting search concordance. Here, the case-sensitive option has been chosen, thereby distinguishing jump from Jump. This is a choice the user has to make before displaying the result of the count.

Figure 9: Word count

The word count can also be represented by a pie chart or a histogram, among other things, as illustrated in figures 10 and 11:

Figure 10: Frequency displayed as a pie chart.

Figure 11: Frequency displayed as a histogram.

The Action menu also gives the possibility of showing collocation data, as in figure 12.

Figure 12: Collocations

As mentioned, Glossa is continuously being developed and is getting new features. I have not shown all the options that can be had with this corpus search and results handling system, but I would like to mention one of the newest additions to the system: that of showing maps for each concordance line. Thus, if we make a search for some feature that is distributed geographically, a map display is very useful.
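The effect of the case-sensitive option on the word count described above can be sketched in a few lines. This is only an illustration of the idea, not Glossa's implementation; the function name is hypothetical and the list of hits is made up for the example.

```python
from collections import Counter


def count_matches(matches, case_sensitive=True):
    """Return (form, frequency) pairs, most frequent first.

    With case_sensitive=False, forms are lower-cased before counting,
    so that e.g. "jump" and "Jump" fall together.
    """
    if not case_sensitive:
        matches = [m.lower() for m in matches]
    return Counter(matches).most_common()


# Invented matched forms for a search on "jump" as first part of a word.
hits = ["jump", "Jump", "jumped", "jump", "jumping"]

sensitive = count_matches(hits)
insensitive = count_matches(hits, case_sensitive=False)
print(sensitive[0])
print(insensitive[0])
```

Because the choice changes which forms are merged, it has to be made before the count is computed, which matches the behaviour described for figure 9.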
I choose to present a final example that illustrates how useful the Glossa options are for linguistic research, taking as the overall research question the isoglosses for noun morphology. A topic that has interested Norwegian dialectologists over many years is the distribution of the various noun suffixes. While detailed maps were drawn for the noun morphology in the mid 1900s, it is expected that the situation differs now, but it is costly to do a full dialect survey only for this topic. With the Nordic Dialect Corpus available in Glossa, a simple search for a specific, common noun such as ungene ‘the children.MASCULINE’ will give the desired results, revealing the geographical distribution of this plural definite suffix within seconds. There are 568 hits, and the results page shows each form of the noun as in figure 13, and the geographical distribution on a Google map, as in figure 14. It should be mentioned that I could also have chosen to search for just the string –ene ‘plural definite suffix’, but have chosen not to do so here, since that would have given hits for all three genders (neuter and feminine as well). Since many dialects distinguish the plural definite suffix according to gender, I would have obtained many more forms, which I do not find useful for this illustrative example.

Figure 13: The full range of pronunciations of the word ungene ‘the children’ in the Nordic Dialect Corpus, transcribed in a traditional Norwegian system.

The corpus users themselves choose which words to group together by way of a colour code. Here I have chosen to distinguish between four types: the full two-syllable suffix –ane (green [editor's note: download PDF for color reproductions]), the apocopated one-syllable suffix –an (black), the short non-nasal suffix –a (orange), and the dative suffix consisting of a rounded vowel and a bilabial consonant –om (yellow).
In figure 14 the geographical distribution of these types is clearly displayed, and the isoglosses are easy to see.

Figure 14: Map with the distribution of the plural definite suffix.

The map shows that the full suffix –ane (green markers) is commonly used in the south and west parts of southern Norway. The apocopated suffix –an (black) is used in all of north Norway and the middle part of south Norway down to the coast. The short suffix –a (orange) is mainly found in the eastern part of south Norway and in one place in north Norway. The latter could be evidence of an immigrant group that came to this area in the 1700s–1800s from the eastern valleys of south Norway, a fact that, even today, is clearly reflected in the language. The dative suffix –om (yellow) is only found in a few places in the northern part of south Norway. Dative case is slowly dying in Norway, just as it is on the other side of the border (recall the discussion on the similar Övdalian dative case suffix –um).

This last suffix search with the resulting map illustrates two important features of Glossa. The first is the possibility of searching for aligned phonetic transcription variants via an orthographic search; without it, finding so many versions of the suffix would have been nearly impossible. With access only to orthographic transcription, no variation would have been found, and conversely, with access only to phonetic transcription, a comprehensive search would have required detailed knowledge of all the dialect forms, a near-impossible requirement. The second important feature for this search is the map. Without the visual illustration, the isoglosses would have been hard to spot, with so many places and so many linguistic forms.

Technical Details

Glossa (Nygaard 2007, Johannessen et al. 2008) is implemented partly through new programming and partly with other reusable resources.
The corpus search part is performed with the IMS Corpus Workbench (CWB, Christ 1994), and the meta information is put into a relational MySQL database. Although the web interface is simple, it allows users to create complex queries in very simple ways, and to browse, process and download result sets etc. Glossa supports all types of corpora, both multilingual and multimodal, with various amounts and kinds of annotation. The statistics options are implemented with the Ngram package (Pedersen 2008). Google Translate and Google Maps are used to enhance the display of search results. All in all, as indicated, Glossa combines several features and functions, and makes them all available in the same user interface. We know of no other interface that combines so many options. We have used existing APIs for the programs mentioned here, but have not developed additional ones.

The use of Google APIs for translation and map functions deserves a comment. Google is a commercial company with whom one does not communicate directly. They offer good quality programs via APIs for free and have therefore been a valuable choice for us. For example, as a university institution we could have obtained Norwegian electronic maps free of charge from a different company. However, we needed maps for many countries in Northern Europe, while our institution only had an agreement with one company, for Norwegian maps. Thus, Google’s free service turned out to be our only option. Their API covers a lot and their functionality is good. A problem with Google, however, is that as a commercial company they change their terms of service along the way. Thus, the translation option that we have described here was provided free of charge, but now has to be paid for. Using Google thus makes some of the modules less predictable in the long run.

When it comes to formats, Glossa needs texts to be in the format required by the CWB, i.e., tab-separated text with XML tags.
Glossa uses the XML tags for structural information (i.e., information not about individual words) such as sentence ID and time codes (for audio and video files). If input texts come with TEI or other XML markup, information from these tags will be extracted and inserted into the MySQL database. For Glossa to be able to communicate with third-party services (for example maps) and to link the corpus text directly to audio and video, the corpus must have markup that includes latitude/longitude coordinates and time codes, respectively. Grammatical tags are part of the input tab-separated text, and must be mapped to the menu structure of the search interface. Mapping for TreeTagger for some languages is included in the system, but any tag set and values can be imported.

Glossa itself requires simply text in tab-separated format and a MySQL database for metadata (extra-linguistic information on informants or text sources). The programming languages used are Perl, Ruby, PHP and JavaScript. CWB allows Unicode. Configuration of the interface and the mapping from corpus data to menus and search options is achieved using a set of corpus-specific configuration files. Search results can be exported to several formats, such as tab-separated and comma-separated text, Excel etc.

Glossa is freely downloadable under a GPL licence from GitHub, and is undergoing regular development and improvement in close contact between users and developers. Some installation support can be given upon request. The Glossa package includes scripts that convert written texts in TEI formats, as well as spoken language in Transcriber-XML, into a full corpus and database. There are many types of corpora that use Glossa (speech corpora, parallel written corpora, and monolingual written corpora). For a list, please consult the end of the paper.
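The input format described above, one token per line with tab-separated annotation columns and XML tags carrying structural information, can be made concrete with a small reading sketch. The sample data, the column names (word, pos, lemma) and the attribute names on the s element (id, start, end) are assumptions made for illustration; they are not taken from any actual Glossa corpus, and the function is not part of Glossa or the CWB.

```python
import re

# Matches name="value" pairs inside an opening XML tag.
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')

# Invented sample: one sentence with an ID and time codes, two tokens.
SAMPLE = """<s id="s1" start="0.00" end="1.25">
the\tDET\tthe
children\tN\tchild
</s>
"""


def read_vertical(text, columns=("word", "pos", "lemma")):
    """Read tab-separated token lines grouped by <s ...> ... </s> tags.

    Returns a list of sentences, each a dict with the structural
    attributes of the <s> tag and a list of token dicts.
    """
    sentences, current = [], None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.strip() == "</s>":
            if current is not None:
                sentences.append(current)
            current = None
        elif re.match(r"<s[\s>]", line):
            current = {"attrs": dict(ATTRIBUTE.findall(line)), "tokens": []}
        elif line.startswith("<"):
            # Other structural tags are ignored in this sketch
            # (a real reader must also handle tokens that begin with "<").
            continue
        elif current is not None:
            current["tokens"].append(dict(zip(columns, line.split("\t"))))
    return sentences


sentences = read_vertical(SAMPLE)
print(sentences[0]["attrs"])
print(sentences[0]["tokens"])
```

Keeping the structural attributes apart from the token columns mirrors the division described in the text: token-level annotation stays in the tab-separated columns, while sentence-level information such as time codes can be passed on to a metadata store.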
Conclusion

The paper describes some features of the corpus search and results handling system Glossa, developed at the Text Laboratory, UiO. We have seen that the basic search system is the same for any kind of corpus, but that specific features (audio, video or translated texts) add to the usability in various ways. Glossa is currently used for monolingual and multilingual, parallel written corpora and for speech corpora with audio and video. The Glossa system is freely downloadable (see web site below) and some support can be given for corpus installation.

References

Bick, Eckhard. 2004. Corpuseye: Et Brugervenligt Webinterface for Grammatisk Opmærkede Korpora. Møde om Udforskningen af Dansk Sprog, Proceedings. Ed. Peter Widell and Mette Kunøe. Denmark: Århus University. 46-57.

Christ, Oli. 1994. A Modular and Flexible Architecture for an Integrated Corpus Query System. Complex ’94. Budapest: Research Institute for Linguistics, Hungarian Academy of Sciences.

Evert, Stefan. 2005. The CQP Query Language Tutorial. Germany: Institute for Natural Language Processing, University of Stuttgart. (http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPTutorial)

Hoffmann, Sebastian and Evert, Stefan. 2006. Bncweb (cqp-edition): The Marriage of two Corpus Tools. Corpus Technology and Language Pedagogy: New Resources, New Tools, New Methods, volume 3 of English Corpus Linguistics. Eds. S. Braun, K. Kohn, and J. Mukherjee. Frankfurt am Main: Peter Lang. 177-195.

Johannessen, Janne Bondi; Nøklestad, Anders; Hagen, Kristin. 2000. A Web-Based Advanced and User-Friendly System: The Oslo Corpus of Tagged Norwegian Texts. Second International Conference on Language Resources and Evaluation. Proceedings.

Johannessen, Janne Bondi; Nygaard, Lars; Priestley, Joel; Nøklestad, Anders. 2008. Glossa: a Multilingual, Multimodal, Configurable User Interface. Proceedings of the Sixth International Language Resources and Evaluation (LREC'08). Paris: European Language Resources Association (ELRA).

Johannessen, Janne Bondi; Priestley, Joel; Hagen, Kristin; Åfarli, Tor Anders; Vangsnes, Øystein Alexander. 2009. The Nordic Dialect Corpus - an Advanced Research Tool. Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. NEALT Proceedings Series Volume 4. Eds. Kristiina Jokinen and Eckhard Bick. Denmark: Northern European Association for Language Technology.

Nygaard, Lars. 2007. The Glossa Manual. Norway: The Text Laboratory.

Pedersen, Ted. 2008. Ngram Statistics Package. (http://www.d.umn.edu/~tpederse)

Corpora that use Glossa

Big Brother Corpus (Speech), Norwegian: http://www.tekstlab.uio.no/nota/bigbrother/
The European Parliamentary Comparable and Parallel Corpora (ECPC) (under development): http://www.ecpc.uji.es/EN/home.php?language=en
Lexicographical Bokmål Corpus: http://www.hf.uio.no/iln/forskning/samlingene/bokmal/index.html#bokmalskorpus
Lule Sámi Corpus: http://giellatekno.uit.no/doc/lang/corp/corpus-smj.html
Macedonian Text Corpus: http://www.tekstlab.uio.no/glossa/html/index_dev.php?corpus=mak
Mörkuð íslensk málheild (Icelandic Corpus): http://mim.hi.is/
Nordic Dialect Corpus (Speech): http://www.tekstlab.uio.no/nota/scandiasyn/
North Sámi Corpus: http://giellatekno.uit.no/doc/lang/corp/corpus-sme.html
NoTa Oslo Speech Corpus: http://www.tekstlab.uio.no/nota/oslo/
Oslo Multilingual Corpus: http://www.hf.uio.no/ilos/OMC/
Ruija Speech Corpus of Kven: http://www.hf.uio.no/iln/tjenester/kunnskap/sprak/korpus/talesprakskorpus/ruija/index.html
RUN Parallel Corpus: http://www.hf.uio.no/ilos/forskning/forskningsprosjekter/run/corpus/
TAUS Speech Corpus of Norwegian: http://www.tekstlab.uio.no/nota/taus/index.html
UPUS Speech Corpus Multiethnic Norwegian: http://www.hf.uio.no/iln/forskning/prosjekter/upus/

Other Web Sites

GitHub: https://github.com/
Glossa: http://www.hf.uio.no/tekstlab/glossa.html
Google Translate: http://translate.google.com
IMS Corpus Workbench: http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/
MySQL: http://www.mysql.com
Open Source: http://www.opensource.org
Text Laboratory: http://www.hf.uio.no/tekstlab/
Chung-Hwa Buddhist Journal (2012, 25:3-6) Taipei: Chung-Hwa Institute of Buddhist Studies 中華佛學學報第二十五期 頁 3-6 (民國一百零一年),臺北:中華佛學研究所 ISSN:1017-7132

Introduction

Christoph Anderl

The papers collected in this volume are originally based on lectures presented at the workshop “Resources in the Mark-up and Digitization of Historical Texts”, organized at the University of Oslo, bringing together scholars from a variety of backgrounds and with different research interests.¹ The workshop was organized in connection with a conference on various aspects of Dūnhuáng manuscripts and early Chán Buddhism, and also included a TEI meeting.² Whereas the Dūnhuáng Conference was primarily concerned with early Chán in the Dūnhuáng area and its relation to ritual practices, esoteric Buddhism and Daoism, as reflected in Dūnhuáng manuscripts and early Chán historiographical materials, the main objectives of the Resources Conference included questions concerning the analysis of text/manuscript materials and methods of presenting them to scholarly communities and to the general public. Many aspects of text/manuscript digitization and text mark-up have undergone tremendous progress in recent years and it is sometimes difficult to follow the new developments and approaches in this field. Originally, the workshops focused on East Asian materials and the scripts used in these texts and manuscripts; however, the interdisciplinary approach of including a variety of other languages and locations greatly enriched the discussion. The workshop featured both descriptive presentations of existing database projects (providing an overview of current projects),³ lectures dealing with specific analytical research questions, in addition to approaches focusing on the technological aspects of digitization and mark-up. In the context of this volume, research-focused presentations primarily concerned with text analysis and the development of analytical tools were selected. Through the presentation of a variety of analytical approaches to texts and manuscripts, as well as methods of (visual) presentations, we hope to stimulate further interdisciplinary work and collaboration in this field. Digital Humanities have become an indispensable aspect of continuously increasing significance in historical, religious, and linguistic studies with all their sub-fields, and it is a research area where technology and philology (in the broadest sense) interact in a dynamic and fascinating way.

¹ For the conference website, see http://folk.uio.no/christoa/ZenManus_Front.html; see also http://www.hf.uio.no/ikos/english/research/projects/zen/.

² The conference (September 27th – October 1st, 2009) was a joint project with the Institute of Research in Humanities (Kyōto University), and was co-organized by Christian Wittern. The conference and workshops were part of a larger interdisciplinary project on Chán/Zen Buddhist culture, literature, and language at the Department of Culture Studies and Oriental Languages (IKOS, University of Oslo). The results of the 2008 conference on Chán/Sǒn/Zen Buddhist rhetoric were recently published in the form of an edited volume (Anderl, Christoph. ed. 2012. Zen Buddhist Rhetoric in China, Korea, and Japan. Leiden/Boston: Brill).

³ This included presentations concerning the Text Encoding Initiative (TEI), CBETA, the Digital Dictionary of Buddhism (DDB), Thesaurus Literaturae Buddhicae (Oslo Univ.), the Thesaurus Linguae Sericae Project (Oslo/Heidelberg Univ.), The International Dunhuang Project (British Library), digitization projects at the Dharma Drum College (Taiwan), a project on the Daoist Canon (Kyōto University), the PROIEL parallel corpus (Oslo Univ.), the Text Laboratory (Oslo Univ.), digitization projects in the context of the Turfan Collection in Berlin, a project on Old Japanese syntax at Oxford University, a relational database on Confucius' Analects, as well as the Organon Knowledge Editor (AnaCypher, Oslo). The presentations gave an overview of important current projects within the field of Digital Humanities. For more information and a pdf of the abstracts, see http://folk.uio.no/christoa/Zen%20conference_2009_abstracts_03.pdf. The meetings were complemented by workshops on text mark-up, in addition to a course on text mark-up for Master and Ph.D. students at Oslo University.

The workshops also focused on some problems arising from the enormous speed with which aspects of Digital Humanities have been developing in recent times. Many philologically inclined scholars (including myself!) were not originally trained in the arts of mark-up and programming, or the magic skills of transformations, or have only been confronted with these disciplines during a later stage of their scholarly development. My interest in XML-based methods of structuring and analyzing texts was generated by concrete research questions (and – I have to admit – by a certain fascination concerning the flexibility of XML on the one hand, and the rigid structure which has to be imposed on the material, on the other hand). Often it is difficult for the non-specialist to make the right decisions concerning the adoption of appropriate methods in the framework of specific research projects. In addition, it is difficult to maintain an overview of ongoing projects and the technological and mark-up strategies chosen and developed. As such, collaboration and interdisciplinary communication are crucial for coping with these challenges, and in order to avoid redundant work. With the publication of this volume, we hope to make a small contribution to these ongoing developments.

I want to extend my special thanks to the co-organizer of the conference and the workshops, Christian Wittern. Without him and his expertise in the field of Digital Humanities, neither the conference nor the subsequent publications would have materialized. We want to thank all the participants who contributed during the workshops and discussions, as well as the staff of IKOS who helped with the organization of the conference, especially Arne Bugge Amundsen (Head of Department), Rune Svarverud (Head of Research), Mona Bjørbæk (Head of Academics), Cecilie Wingerei Lilleheil (Research Advisor), and Sathya Sritharan (Economy Officer). Our gratitude also extends to the conference assistants Bori Kim, Therese Sollien, Øystein Krogh Visted, and Kevin Dippner. Special thanks to Daniel Paul O’Donnell (TEI chair in 2009) for co-organizing the TEI meeting. We are also greatly indebted to Marcus Bingenheimer, Bill Magee, and Su-an Lin for providing the opportunity to publish the articles in this journal, and for their great efforts in editing and proofreading the papers. Our thanks also extend to the readers/reviewers who provided many good suggestions for improving the quality of the papers.

The conference for which these papers were written was generously funded by The Research Council of Norway (NFR), The Chiang Ching-kuo Foundation (Taiwan), and The Department of Culture Studies and Oriental Languages (IKOS) at the University of Oslo.
An editorial note on the illustrations and photographs used in the articles: Originally, several of the articles contained high-resolution color illustrations, as well as sections of text marked with different colors. Since these features could not be integrated in the black-and-white printing of the journal, we kindly ask the reader to refer to the following web site containing these photographs and pictures: http://www.chibs.edu.tw/ch_html/index_ch00_07.html
Chung-Hwa Buddhist Journal (2012, 25:7-50) Taipei: Chung-Hwa Institute of Buddhist Studies 中華佛學學報第二十五期 頁 7-50 (民國一百零一年),臺北:中華佛學研究所 ISSN:1017-7132

Some Reflections on the Mark-up and Analysis of Dūnhuáng Manuscripts: Exemplified by the Platform Sūtra

Christoph Anderl (University of Ghent)
Kevin Dippner (Malakoff High School)
Øystein Krogh Visted (Jiao Xie Center for Chinese Culture and Language)

Abstract

This paper deals with several questions and problems related to the editing, digitization and analysis of Buddhist Dūnhuáng texts. The Dūnhuáng corpus of Chán (Zen) manuscripts is the most important source for the study of the early history of this Chinese Buddhist school. The authors discuss paleographic and textual features of the manuscripts and investigate several possibilities of TEI-compatible mark-up concerning the collation, translation, annotation, and semantic and syntactic analysis of this type of manuscript literature, in addition to methods of transformations into visual media. The approaches are exemplified by an experimental mark-up of the Dūnhuáng versions of the Platform Sūtra. In the second part of the paper, the newly initiated Chan Database Project is introduced and collaborative methods of dealing with Chán literature are discussed. In the appendix to the paper, the system of phonetic loans, as well as scribal conventions and errors in the manuscript versions of the Platform Sūtra are described and compared.
Keywords: Platform Sūtra of the Sixth Patriarch (Liùzǔ Tánjīng), Dūnhuáng Manuscripts, Phonetic Loan Characters, Analytic Mark-up, Zen Buddhism

檢視敦煌寫 的標記 析— 祖壇經為例

Christoph Anderl (根特大學)
Kevin Dippner (馬拉科 高中)
Øystein Krogh Visted (交 中國文 語 中心)

摘要

篇文 處理有關 教敦煌文獻的編輯,數位 及 析 的問題,而其中有關 禪的文集是研究中國 教宗派 期歷史的重要資源 者討論寫 的 文 學及文 性質並探討許多 文獻編碼協定(TEI)可相容性之標記的各種可能性,而除了影 像的轉 外, 些是有關 類文獻的校對 翻 註解 語意和 法之 析 其方 法是 壇經 的敦煌文 之實驗性的標記為例 而 文的第 部 則是 紹新近 開始的禪學資料專案(Chan Database Project) 及討論處理有關禪學文集的協 方 式 在附錄,則 述並比較 壇經 不 版 的形聲系 ,抄寫慣例及錯誤

關鍵字: 祖壇經 敦煌寫 借字 析性標記 禪學

Introduction: The Significance of Dūnhuáng Manuscripts¹

Around 1900, thousands of manuscripts were found behind a wall of the Mògāo 莫高 Cave 16/17 (Dūnhuáng, Gānsù Province, China). Soon after, most of the manuscripts were removed from China by several expeditions from Great Britain, France, Russia, and Japan. Today, the majority of the Dūnhuáng manuscripts are stored at various institutions such as the British Library (Stein Collection) and the Bibliothèque Nationale (Pelliot Collection), as well as collections in Russia (The Institute of Oriental Manuscripts), Japan (e.g., Ryūkoku Univ., Kyoto), and China (e.g., The Dūnhuáng Academy, The National Library of China in Běijīng; Běijīng University Library; there are also collections in Tiānjīn, Shànghăi, and other places in China).² Especially since World War II, ‘Dūnhuáng studies’ have developed into a major field of research, and today numerous individual scholars and institutions are investigating the textual and iconographic materials from a variety of perspectives.

The manuscripts are one of our most important sources for the study of medieval Chinese religion and culture. Whereas most of the Chinese manuscripts consist of copies of canonical Buddhist scriptures, there is also a significant number of texts on popular religion, as well as sectarian texts. Many of these non-canonical texts were not transmitted after the Táng Dynasty, and the Dūnhuáng materials give us a unique window for studying Buddhist history, doctrine and practice from ca. the 7th to the 10th centuries. Texts of the early Chán 禪 Schools, Esoteric Buddhism, Buddho-Daoist texts, ‘popular’ Chinese religion and related topics (including devotional and ritual texts, almanacs, prognostication and astronomical texts, talisman manuals, etc.) have received special attention among scholars.

Until the discovery of the Dūnhuáng texts, our understanding of the early history of Chán was to a great degree based on much later Sòng Dynasty materials and the retrospective understanding of Táng Chán during that period.³ The study of the Dūnhuáng Chán texts revolutionized the study of the early period in the evolution of Chán. However, despite the immense progress of Chán studies from the 1970s to the 1990s, there are still many texts which have not been properly edited, analyzed or translated, and many problems pertaining to the texts have not been solved.⁴

¹ We want to thank the two anonymous reviewers of the article for their many helpful comments.

² For a very good introduction to Dūnhuáng studies and the history of the manuscripts, see the following webpage (‘The International Dunhuang Project’): http://idp.bl.uk/. Tens of thousands of manuscripts and manuscript fragments are digitized in high quality and freely downloadable (the digitization of the Pelliot and Stein collections is nearly complete, whereas only parts of the Russian and Chinese collections are included so far). The digitized manuscripts are most conveniently found by manuscript number; other search functions of the webpage are unfortunately only at a rudimentary stage.

³ The Sòng versions of Táng materials were often heavily revised and altered, and, retrospectively, a Sòng Dynasty understanding of the development of the Chán School(s) was imposed on earlier materials. Táng texts which did not fit the doctrinal or historiographic standards of the Sòng Dynasty were often not transmitted at all (on “text sanitation” during the transition period from Táng to Sòng, see for example Anderl 2012a, 16-26).

⁴ E.g., the interdependence between texts; there are also few properly collated and annotated texts at this point, and many textual and philological problems have only been touched upon.

The Scholarly Value of Dūnhuáng Manuscripts

The manuscripts are not only an important source for the study of medieval Chinese Buddhism but also for research in the development of the semantics and syntax of medieval Chinese, including colloquial grammatical constructions (classifier constructions, plural formation, coverb constructions, sentence finals, etc.). There are certain types of Dūnhuáng manuscripts which contain a considerable amount of vernacular elements, most importantly the so-called Transformation Texts (biànwén 變文) and related genres.⁵ Certain types of Chán treatises also contain important information on the development of medieval vernacular Chinese (e.g., the treatises attributed to Shénhuì and his disciples, and the Lìdài fǎbǎo jì 歷代法寶記).⁶ As such, these materials are important sources for the study of the transition from treatises written in Buddhist Hybrid Chinese to more vernacular types of narratives (many of these texts are characterized by containing a considerable portion of passages with direct speech).⁷

Copied by hand, the manuscripts are equally important for the study of palaeography during the Táng period, in addition to scribal conventions and errors, the study of phonetic loans, dialects, and vernacularisms. Medieval manuscripts are a significant source for reconstructing the development of Middle Chinese with its colloquial vocabulary and vernacular grammatical constructions. Many grammaticalized function words still current in Modern Mandarin and other modern varieties of Chinese originated during the late Táng (or, more precisely, surfaced in texts during that time). Thus, some manuscripts contain many early written forms of function words used in spoken Chinese. Since many of these function words were representations of words used in the spoken language, Chinese characters were borrowed in order to represent their phonetic value. It was usually not before the Sòng period that specific characters were created to represent these colloquial words. A good example is the pronoun shénme 什麼 (什么), which was written in various forms in Dūnhuáng manuscripts, e.g., 是沒 (Dūnbó 77), 是摩 / 甚摩 (Stein 2503), 甚謨 (Stein 2669), 甚物 / 甚沒 (Bǎolín zhuàn 寶林傳, 801 AD), 甚麼 (10th cent.). Dūnhuáng Chán materials reflect different degrees of colloquialism, depending on the period they were written in and the genre they belong to.

⁵ On the genre of Transformation Texts, see for example Mair (1989).

⁶ For a recent excellent study of that text, see Adamek (2007).

⁷ Naturally, vernacular elements appear in passages recording direct speech and as such reflect the spoken word to some degree. This can also be observed in another early vernacular text dating from the middle of the 10th century, the Zǔtáng jí 祖堂集 (ZTJ). In this text, the frame narratives usually use a more conservative language, whereas many of the passages in direct speech are written in the vernacular (on aspects of the language of ZTJ, see Anderl 2004; more generally, on the features of vernacular Chán texts, see Anderl 2012a).
The Chan Database Project (CDP)

The recently initiated CDP8 aims at electronically publishing Chán texts with a critical apparatus and a set of analytical modules. In this paper, certain strategies and problems concerning this aim will be discussed. Although a variety of Chán texts (including the printed editions starting from the Sòng Dynasty) are included in this project, one of the major challenges will be the technical and analytical framework for the publication of the corpus of the Dūnhuáng Chán manuscripts. In this paper, only a few problems will be addressed and illustrated by an experimental edition of the Dūnhuáng manuscripts of the famous Platform sūtra.9 The aim was the production of a collated and annotated version of the Dūnhuáng Platform sūtra which allowed annotations and comments on several aspects of the text.

One of the motivations for the initiation of such a project was the realization that, despite the above-described importance of the manuscripts in terms of Buddhist and linguistic studies, there are frequently no authoritative and collated editions of many important manuscript texts, and often the philological and linguistic aspects have been somewhat neglected in the study of the materials. In many studies of Chinese Buddhist texts in the West, there seems to be an overall contrast to the approach taken in the research on Sanskrit Buddhist texts and Gāndhārī manuscripts, for example (which shows a strong emphasis on thoroughly edited texts and philological studies).10 Not only being a purpose in itself, thorough philological research on the texts will reflect back on our understanding of their contents, as well as being helpful in contextualizing them historically and intertextually.11

Some Important Features of the Manuscript Texts

Variant Characters

The study of character variants has developed into a significant subfield in the study of Dūnhuáng manuscripts, and the materials are important sources for the study of the orthography and writing conventions of the Táng period. The history of many ‘nonstandard’ characters is extremely complex and important for deciphering the texts. Historically, many Chinese characters which served as models for establishing the abbreviated characters in the process of the language reforms in 20th century China were actually based on ‘vulgar’ (and other) forms of Táng and Sòng characters, in addition to ‘ancient’ forms of characters which were revived during these periods. After the Táng, Dūnhuáng texts gradually ceased to circulate in China, and many forms of characters typical for Dūnhuáng writing conventions were forgotten or became obsolete. On the other hand, many character forms were transmitted to Japan and continued to circulate there until modern times.12 By recording the palaeographic features of the manuscripts and collecting them in a database, the development of the Chinese characters during these periods can be studied in a more systematic way.13 In addition, orthography and calligraphy can be an important factor in dating the copies of the manuscripts. In many Dūnhuáng materials, multiple forms of the same character can appear in the very same text. Below, there are a few examples of character forms appearing in the beginning section of the Stein (left) and Dūnbó (right) versions of the Platform sūtra:14

Scribal Errors and Conventions

By contrast to the often heavily edited and revised printed Chán scriptures of the Sòng period (many of them eventually being integrated in the official Buddhist canon sanctioned by the imperial court), Dūnhuáng Chán manuscripts were copied by hand and, besides giving us information about the early stages of a text’s formation, are a rich source for studying scribal conventions during different periods of the Táng dynasty, in addition to errors and inaccuracies typical for the process of copying. The study and identification of these typical errors and misreadings (for a few examples, see below) facilitate the reading of handwritten manuscripts and the identification of corrupt passages. Dūnhuáng manuscripts are also a rich source for studying conventions of adding diacritics and markers in the texts. During the process of editing texts during the Sòng dynasty, these markers (including section markers) were usually removed. Thus, Táng dynasty manuscripts give us important information not only on the process of copying but also on the conventions of reading the texts15 (often, markers are inserted by the reader or monastery librarian rather than the copyist).16 A rich source for errors is the similarity of characters in their handwritten forms which, in the process of copying, are confused with each other. Dūnhuáng manuscripts are also a very important source for the oral features of texts and the phonetic loans used in them (for a list of phonetic loans in the Platform sūtra, see the Appendix to the article).

8 This project was originally initiated by the late John McRae, Christian Wittern, and Christoph Anderl, and aims at creating and applying tools for editing and analyzing Chán/Zen Buddhist texts, as well as organizing collaboration within the field of Chán/Zen Buddhist text studies.
9 This work on the Platform sūtra edition was originally started as a master class on Buddhist Dūnhuáng texts at Oslo University taught by Christoph Anderl, with Christian Wittern (Kyoto University) supervising the work on TEI compatibility and programming. The basic programming and transformation of the XML mark-up was done by Kevin Dippner. The mark-up and analysis was done by Anderl and Visted. We want to thank all participants of the course for their helpful comments.
10 An exception to this tendency is the study of (early) Buddhist translation literature in China; these studies are deeply influenced by the philological approach of Sanskrit/Pāli studies.
11 Specifically, modern Chán Buddhist studies in the West often seem somewhat reluctant to approach texts also from a linguistic and philological angle, occasionally resulting in interpretations and translations based on a fragmentary understanding of the language they are written in. Part of the problem is perhaps the fact that there is hardly any systematic training in the semantics and syntax of Buddhist Hybrid or Medieval Vernacular Chinese at Western universities. These types of texts are in many respects fundamentally different from texts written in ‘Literary Chinese’ (for a good contrastive case study, see for example Harbsmeier 2012; for a grammar of the vernacular language of the 10th century, see Anderl 2004).
12 Interesting examples are the contractions for púsà 菩薩 ‘bodhisattva’, nièpán 涅槃 ‘nirvāṇa’, and pútí 菩提 ‘bodhi’, which were widely used in Dūnhuáng texts but eventually ceased to be used in China. However, these characters continued to circulate in Japan and are nowadays even frequently recognized by non-specialists! For a list of special characters used in Japanese Buddhist manuscripts, see Ui (1983). The history of many Dūnhuáng variants needs further investigation. Dictionaries such as the Lóngkān shǒujìng 龍龕手鏡 (10th century) were criticized by scholars of subsequent periods for containing unusual Chinese character forms. However, after the discovery of the Dūnhuáng manuscripts in 1900 it became clear that the compilation of this dictionary aimed at providing the reader with the correct pronunciation of characters, as well as providing reference to non-standard characters widely circulating in handwritten manuscripts and inscriptions. Even for early Sòng Buddhists themselves, it had become difficult to understand texts written in countless different forms of characters. Establishing the ‘correct’ (zhèng 正) pronunciation and form was of great concern for the Buddhist scholars during the Táng and later periods; on the one hand for reasons of philological concerns (there was an amazingly high level of insight by many Buddhist scholars concerning the phonological, palaeographic, and semantic aspects of texts), on the other hand based on the assumption that only correctly pronounced characters/words were soteriologically efficient (especially in the dhāraṇī and mantra texts which became greatly popular among all Buddhists from the 8th century onwards).
13 For a discussion of character databases, see the article by Christian Wittern in this volume.
14 There are differences in character shapes both internally (i.e., within the same text) and compared to the other manuscripts.
An important subtype are dialect phonetic loans, which appear in a number of manuscripts and usually reflect the language of the Northwestern regions during the Táng Dynasty.

Some Important Aspects in the Digitization of Buddhist Manuscripts

The digitization of Buddhist texts and the availability of manuscript facsimiles have progressed immensely in recent years. This opens up the possibility of developing tools for enhancing our understanding of these texts and manuscripts through an analytical ‘fine-reading’.

Analytical Modules

The multi-faceted features (paleography, orthography, linguistic and Buddhological aspects, etc.) of manuscript study call for flexible approaches in the study of the materials.17 The development and implementation of XML-based mark-up seems to accommodate many needs in this respect, including analytic ‘modules’ for different purposes, the possibility for constant revision, multiple transformations and visualizations, as well as entering into an interactive dialogue with the ‘text consumer’ or fellow researcher.18

Some Objectives for the Study of Chán Texts

- Web-based editions of important Chán manuscripts and texts can be permanently updated, extended, and revised.
- Once developed, the edited texts can be analyzed by a set of analytical tools (e.g., syntactic analysis, terminology/dictionary tools, ‘text dependency’ analysis, character analysis).
- Chán materials in non-Chinese languages (e.g., Tibetan, Uighur, Tangut, etc.), which are of great importance for the development of this branch of Buddhism in the East Asian context, have so far been rather neglected in Chán studies.
- Manuscripts give us a unique insight into the processes of text production and reproduction (as opposed to extant printed texts edited and ‘sanitized’ during the Sòng period, for example). A thorough documentation of these features is the basis of a better understanding of these processes. A documentation of textual features is not only important for palaeographic and linguistic studies but also in the framework of religious studies; e.g., the textual build-up and structure can give us important information on the development of a text, which in turn might reflect the evolution of doctrines or lineage systems, for example. In addition, the study of textual features can be important for the dating of texts, as well as for linking and ‘contextualizing’ them within a corpus/group of texts.19
- The analysis of Chinese characters: The Táng Dynasty witnessed the emergence of numerous new character forms (specifically vulgar and abbreviated forms of Chinese characters).
- Syntactic analysis (see below).
- The development of Chán terminology: The mark-up and registration of Chán terminology in the relevant texts can provide researchers with important information on the evolution of terms.
- A ‘text dependency’ module will enable the mark-up of relationships between texts and parallel passages. This will facilitate the study of the often complex relations between texts or text portions and also aid in the dating of the manuscript texts. Such a tool would also help researchers to retrace the origin, development, and interdependence of themes, topics, ideas, and concepts as they appear in texts from various periods. Ideally, instead of marking up text portions or narrative sections by hand, dependent texts could be automatically identified by sets of overlapping items.
- Dictionary module (e.g., the linking with internal referential databases or external databases such as the DDB).20

Illustration 1: Library building at Haein-sa 海印寺 where the Tripiṭaka Koreana is stored (Second Kǒryo 高麗 edition; also referred to as Chaejo Taejanggyǒng 再調大藏經). The project was initiated in 1236 by King Kojong 高宗 in order to secure help from Buddhas and Bodhisattvas against a pending invasion of Korea by foreign armies (i.e., a project in the context of ‘state-protecting Buddhism’). The work of carving the 81,258 woodblocks (most of them carved on both sides, amounting to 162,516 surfaces) lasted until 1251. One woodblock measures ca. 67 × 23 cm and is ca. 3 cm thick, weighing around 3.5 kg. There are typically 23 lines carved on each surface, each line consisting of 14 Chinese characters (ca. 322 per surface), totaling about 52,330,000 characters. After having disappeared from China during the Sòng dynasty, the text of the Zǔtáng jí (ZTJ) survived in Korea and was carved in the 15th century as part of the ‘supplementary canon’ of the Tripiṭaka Koreana. However, the text was never printed before the printing blocks were rediscovered in the beginning of the 20th century in Korea. ZTJ (which is one of our main sources of early Chán historiography) was carved on 386 surfaces (ca. 190,000 characters). Today, the canon is still stored in the library building, which dates back to the 15th century. There was an attempt to move the printing blocks to a modern library facility, but within weeks the woodblocks started to decay and had to be returned to the old building. The original building appears to have been designed intuitively to provide ideal storage conditions (e.g., windows of different size ensure natural ventilation; a special kind of moisture-absorbing clay covers the floor; the way the woodblocks are arranged on shelves; etc.).21

15 E.g., there are ‘performance markers’ (text portions usually inserted with smaller characters) in the manuscripts, suggesting that the scripture was used in ritual contexts related to the bestowal of the precepts/commandments. The inserted passage informs the reader how often sets of precepts have to be recited in unison during the ceremony. These markers are usually not extant in the Sòng editions.
16 For an interesting study of these markers, see Galambos (forthcoming). For a more thorough forthcoming study on these features of the Platform sūtra, see Anderl (2012b). In this paper, I also try to show that a thorough philological approach can unravel new aspects of a text. Concretely, a study of the textual features, internal structure, and intertextual relations (i.e., certain features typical for ‘esoteric’ texts can be found) of the Platform manuscripts suggests certain re-evaluations of the text, for example, the possibility that the title Tánjīng 壇經 (Platform sūtra) originally did not refer to the text itself at all, but rather to the Diamond sūtra, a text which was especially important in the Platform rituals of conferring the Mahāyāna precepts at large congregations. As such, the text itself originated possibly as a commentary to the Diamond sūtra, and the Platform sūtra only gradually developed an ‘internal’ reference to itself (for a detailed forthcoming study, see Anderl 2012b).
17 A similar approach was taken in a recently initiated database project on Buddhist narratives at the Ruhr University Bochum (The Mercator/Ceres Database of Buddhist Narratives; edited by Christoph Anderl and Jessie Pons). Based on the diversity of the materials (both textual and iconographic materials, in addition to information on locations), a system of dynamically interconnected sets of sub-collections was used in the XML database. According to specific needs arising during the concrete work with the iconographic and textual materials, custom-tailored tools and modules are developed and implemented (e.g., input masks for subsets of data, analytical tools, visualizations, etc.). The ca. 20 sub-databases are held together by a system of ‘labels’ for narratives, texts/manuscripts, and places (which can be interconnected to each other). The internal research database has been online since 2011, whereas a public version will be published in November 2012.
18 As is also pointed out in other contributions, the XML approach also contains certain difficulties, such as the necessity to follow a strictly hierarchical build-up and nesting. Thus, multiple mark-up of the same text might overlap and offend against this rule. A ‘module’ approach could facilitate the work on the text, i.e., different aspects of the same text are analyzed and marked up separately (“stand-off” mark-up; as a by-product, the reader can activate or deactivate specific modules when reading the text). Another problem is naturally the time-consuming aspect of implementing analytical mark-up to texts. As such, questions of quantity versus analytical quality have to be constantly considered and balanced.
19 See also the Appendix to the paper: the study of manuscript features can give us important information on the actual function of texts, e.g., the emphasis on ‘orality’ and ritual functions (as indicated by ‘performance markers’ which were often removed in edited and printed versions of texts).
20 On the Digital Dictionary of Buddhism (DDB), see Charles Muller’s article in this volume.
21 Photograph by C. Anderl; on the background of the printing of ZTJ, see Anderl (2004, 1:2-52).
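The ‘text dependency’ objective listed above suggests that dependent texts could be identified automatically by sets of overlapping items. The following is only a minimal sketch of that idea, assuming that the ‘overlapping items’ are character n-grams and that overlap is scored with Jaccard similarity; both choices, as well as the function names, are illustrative assumptions and not part of the project described here. The sample strings are short variant wordings used as toy input, not collated manuscript readings.

```python
def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Return the set of character n-grams of a text, ignoring whitespace."""
    chars = "".join(text.split())
    return {chars[i:i + n] for i in range(len(chars) - n + 1)}


def overlap_score(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity between the character n-gram sets of two texts."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Toy input (hypothetical labels): two close variants and one more distant line.
passage_a = "菩提本無樹 明鏡亦非臺"
passage_b = "菩提本無樹 明鏡亦無臺"
passage_c = "身是菩提樹 心如明鏡臺"

close = overlap_score(passage_a, passage_b)    # 7 shared bigrams out of 11
distant = overlap_score(passage_a, passage_c)  # 2 shared bigrams out of 16
```

A score of this kind could only pre-select candidate parallels; whether two passages really depend on each other would still have to be judged by the editor.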
Illustration 2: Detail of a printing-block of ZTJ; scribes outlined each character on the woodblock in mirror-writing and afterwards the wood surrounding each character was chiseled out; the tool marks are still recognizable on the blocks; the wood (birch tree) is of exceptional hardness and was especially prepared for carving during a process lasting several years (photograph by C. Anderl).

Work-steps in the Establishment of a Chán Database:

- Determining the text corpus22
- Input and text collation
- Linking of facsimiles with digital editions
- Basic mark-up and linking the text with reference materials (e.g. information on proper names, Buddhist terms, etc.)
- Development and implementation of analytical modules (terminology, syntactic analysis, text dependency, …)
- Collaboration, development of (multiple-user) ‘interfaces’,23 specific projects, etc.

Illustration 3: Experimental transformation of a Zǔtáng jí mark-up into an edited text parallel to the woodblock facsimile. Circled items mark place and personal names, respectively, and can be connected to referential databases on proper names. In addition, the edited text was linked with an XML version of Anderl’s grammar on ZTJ. Entries in the grammar are automatically matched with the text, and the grey dots on top make the grammatical annotations by Anderl visible (the initial mark-up of ZTJ and the transformation/programming was done by Christian Wittern; this version of ZTJ is currently off-line).

Illustration 4: This diagram shows the complex interrelation between the manuscript and printed versions of the Platform sūtra (the diagram is drawn based on Yáng Zēngwén’s reconstruction of the genealogy of the text).24

The Mark-up of the Platform Sūtra: Collations

Many Chán texts exist in several versions with varying textual features. An important issue for analytical web editions will be the collation of these manuscripts and the inclusion of other important witnesses (on the Platform sūtra versions, see ill. 4; for a short description, see the bibliography).25 In the concrete work on the Platform scripture, one of the specific problems was related to the question of how the label <lem> should be applied. All manuscripts of the Dūnhuáng text contain a great number of errors, phonetic loans, and corrupt passages. The <lem> label was, somewhat atypically, used for marking an ‘ideal’ reading of the text; thus it is the ‘reconstruction’ of an ideal textual version according to the view of the editors. The differing readings of the other witnesses are added with the <rdg> label. In future versions of the web publication there will be the choice to read the text according to one specific manuscript version or to read an ‘ideal’ text with notes on the readings of the differing versions.

22 The most important groups of materials consist of (1) Dūnhuáng texts; (2) the printed texts of ‘classical’ Sòng Dynasty Chán (including primarily historical transmission texts (chuándēng lù 傳燈錄), recorded sayings texts (yǔlù 語錄), and collections (gōngàn 公案)); (3) materials which complement and contextualize the above materials, e.g., letter-exchanges between monks and officials, descriptions of Chán Buddhism in non-Buddhist materials, funeral and pagoda inscriptions, imperial edicts, Neo-Confucian yǔlù, ritual texts, texts on monastic rules, iconographic materials, lineage charts and other diagrams, etc. Another important aspect is the inclusion of non-Chinese materials (e.g., in Tibetan, Tangut, Uighur). Whereas the corpus of (2) is relatively easy to determine, it is considerably more difficult to pinpoint the relevant Dūnhuáng manuscript materials. The point of departure are the texts listed in Yanagida Seizan’s Zenseki kaidai 禪籍解題 (Nishitani, Keiji 西谷啟治/Yanagida, Seizan 柳田聖山 1974, 445-514). This list was recently expanded by Tanaka, Ryoshū; see also Sørensen (1989) for a discussion of early Chán materials (with an emphasis on the esoteric texts). More research needs to be done concerning the manuscripts stored in the minor collections (e.g., the collections of the Peking University and the Peking National Library, and those in Shànghǎi, Tiānjīn, Dūnhuáng, etc.).
23 The implementation of input- and analysis-interfaces for specific tasks can facilitate the work on the mark-up considerably, as compared to the time-consuming work in programs such as Oxygen.
24 Yáng (1993, 297) and Lǐ (1999a, 19).
25 In the work on the text, it was attempted to include all extant manuscript witnesses (Or.8210/S.5475, Dūnbó 77, BD.48; the Lǚshùn manuscript was recently ‘rediscovered’ in China; however, no facsimile reproductions were accessible during the work on the text), in addition to occasional references to Sòng printed versions. For a description of the manuscripts, see Anderl (2012b); for the Sòng editions, see Schlütter (2007, 394-405).

Illustration 5: Portion of the Platform sūtra mark-up and manuscript collation in Oxygen. Note that sentence and phrase borders are generated with the <s> and <phr> tags. The basic mark-up contains references to personal names (‘persName’, subdivided into several categories), title (‘roleName’, with subdivisions), place names (‘placeName’), and terms (‘term’, with subdivisions). The collation within the apparatus <app> includes references to an ‘ideal’ reading according to the editors and mostly based on a manuscript witness.
If all manuscripts have ‘corrupt’ readings, then a <lem> reading according to a later Sòng edition and/or the editors is established (e.g., <lem wit="#Editor">). Notes on the collation and the witnesses are inserted with <witDetail>, including references to the secondary literature. Additions, notes, deletions, etc. are also recorded in the manuscript description.

Example of Recording and Commenting Different Readings:

<app>
  <lem wit="#Stein_5475">業</lem>
  <rdg wit="#Dunbo_77" type="errShape" xml:id="w093-02">葉</rdg>
  <witDetail target="#w093-02" wit="#Dunbo_77">The characters 葉 and 業 are frequently confused with each other in Dūnhuáng treatises. Note that they have the same pronunciation and at the same time are similar in shape. As such, this is a “mixture” of errShape and phonLoan, or a case where characters are habitually interchanged with each other although they do not have a direct connection with each other.</witDetail>
</app>

Within the apparatus (<app>) the lemma (<lem>) establishes the ‘correct’ reading according to the witness "#Stein_5475", whereas the ‘corrupt’ reading in the Dunbo_77 manuscript (wit="#Dunbo_77") is cited within <rdg>, with references to the type of corruption (type="errShape", i.e. based on a confusion of handwritten characters). Details on the type of corruption are provided in <witDetail>.
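How such an apparatus entry can be read back by a program may be sketched as follows. This is a minimal illustration using Python's standard xml.etree.ElementTree, not the project's own tooling: the helper names read_apparatus and text_for are hypothetical, the sample is shortened, and it is assumed that the fragment is placed in the TEI namespace (the fragment as printed carries no namespace declaration).

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Shortened version of the entry above; the namespace declaration is an assumption.
SAMPLE = """<app xmlns="http://www.tei-c.org/ns/1.0">
  <lem wit="#Stein_5475">業</lem>
  <rdg wit="#Dunbo_77" type="errShape" xml:id="w093-02">葉</rdg>
  <witDetail target="#w093-02" wit="#Dunbo_77">Frequently confused characters.</witDetail>
</app>"""


def read_apparatus(app: ET.Element) -> dict:
    """Collect the lemma and all variant readings of one <app> element."""
    lem = app.find(f"{TEI}lem")
    entry = {
        "lemma": "".join(lem.itertext()).strip(),
        "lemma_wit": lem.get("wit", "").split(),
        "readings": [],
    }
    for rdg in app.findall(f"{TEI}rdg"):
        entry["readings"].append({
            "wit": rdg.get("wit", "").split(),
            "type": rdg.get("type"),
            "text": "".join(rdg.itertext()).strip(),
        })
    return entry


def text_for(app: ET.Element, witness: str) -> str:
    """Return the reading of one witness, falling back to the lemma."""
    entry = read_apparatus(app)
    for reading in entry["readings"]:
        if witness in reading["wit"]:
            return reading["text"]
    return entry["lemma"]


app = ET.fromstring(SAMPLE)
entry = read_apparatus(app)
```

With such a helper, text_for(app, "#Dunbo_77") yields the reading of that manuscript, while any witness without its own <rdg> receives the lemma; this corresponds to the choice between reading one manuscript version and reading the ‘ideal’ text.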
Example of Recording a Scribal Intervention:

<app>
  <lem wit="#Stein_5475 #Huixin"></lem>
  <rdg wit="#Dunbo_77" type="annotation" hand="reader" rend="small"><add place="right"> </add></rdg>
</app>

In this example the ‘correct’ reading (<lem>) is indicated as the absence of a character (by the lack of any information between the <lem></lem> tags) which is incorrectly inserted in the Dunbo_77 manuscript on the right side (place="right") by an unidentified ‘reader’ of the manuscript (this can be, for example, the copyist himself, a later reader, or a temple librarian who archived the manuscript; hand="reader"), rendered in small characters (rend="small").

XSL defining the transformation into HTML for the <app> element (including <lem>, <rdg>, <witDetail>, etc.), with inserted programming commands in Javascript:

<!-- Each apparatus entry: a hidden box with the readings, and a link showing the lemma -->
<xsl:template match="tei:app">
  <div class="balloonstyle" id="{generate-id(.)}">
    <xsl:text>Reading(s):</xsl:text><br/>
    <xsl:apply-templates select="tei:rdg"/>
    <xsl:apply-templates select="tei:witDetail"/>
  </div>
  <a rel="{generate-id(.)}" onclick="right_side('{generate-id((preceding::tei:pb[@ed='#Stein_5475'])[last()])}','{generate-id(.)}');"><xsl:apply-templates select="tei:lem"/></a>
</xsl:template>

<!-- The lemma is displayed in green -->
<xsl:template match="tei:lem">
  <font color="00bb00"><xsl:apply-templates/></font>
</xsl:template>

<!-- A variant reading: witness name, error type, then the reading itself -->
<xsl:template match="tei:rdg">
  <script type="text/javascript">document.write(getWitName("<xsl:value-of select="@wit"/>"));</script>
  <xsl:text>:</xsl:text><br/>
  <script type="text/javascript">document.write(getRdgErrorType("<xsl:value-of select="@type"/>"));</script>
  <xsl:text>: </xsl:text>
  <xsl:apply-templates/>
  <br />
</xsl:template>

<xsl:template match="tei:witDetail">
  <p/><xsl:text>Details:</xsl:text><br/>
  <xsl:apply-templates/>
</xsl:template>

<!-- The header yields a link that opens the witness descriptions in a new window -->
<xsl:template match="tei:teiHeader">
  <xsl:variable name="witnesstext"><xsl:apply-templates select="//tei:witness"/></xsl:variable>
  <script type="text/javascript">
    function newWindow() {
      var generator=window.open('','vindu','height=500,width=600,scrollbars=1');
      generator.moveTo("300","150");
      generator.document.write('&lt;html>&lt;head>&lt;title>Witness details&lt;/title>&lt;/head>');
      generator.document.write('&lt;body bgcolor="#aaaaaa"><h2>Witness details</h2><br/><xsl:value-of select="normalize-space($witnesstext)"/>');
      generator.document.write('&lt;/body>&lt;/html>');
    }
  </script>
  <a href="javascript:newWindow();"><div align="center"><b>View witness details</b></div></a>
</xsl:template>

<!-- Single quotes in the witness description are replaced so that they do not end the Javascript string -->
<xsl:template match="tei:witness">
  <xsl:text disable-output-escaping="yes">&lt;h3></xsl:text><xsl:value-of select="@xml:id"/><xsl:text>&lt;/h3></xsl:text>
  <xsl:variable name="a">'</xsl:variable>
  <xsl:variable name="b">"</xsl:variable>
  <xsl:value-of select="translate(., $a, $b)"/>
</xsl:template>

Illustration 6: A ‘tripartite’ visualization of the marked-up text. On the left is the facsimile reproduction of the manuscript passage; in the middle is the collated version of the text, where circled passages indicate parts in which the manuscripts have different readings. The ‘ideal’ reading (<lem>) of the text can be chosen, or one of the readings recorded in the <rdg> section. By clicking on the green text portions, the information on different readings is projected to the right column. Proper names are underlined. Translations and notes in the middle can be shown or hidden. In upcoming versions, the digitized text will be arranged vertically. Mark-up and text collation by C. Anderl and Ø. K. Visted; transformation/programming by K. Dippner (with support by C. Wittern).

In order to encourage scholarly collaboration and permanent revision of the entries, future versions envisage a ‘comment box’ (concretely, the above entry could be modified by noting that wú 吾 actually did not become “obsolete” after the Hàn but that the usage of the pronoun decreased until the Middle Táng period).
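The passages in which the witnesses differ are marked by hand in the edition described here. As a rough illustration of how candidate differences between two electronic transcriptions could be located automatically, the following sketch uses Python's standard difflib; this is an assumption about one possible approach, not the method used in the project, and the function name collate and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def collate(base: str, other: str) -> list[tuple[str, str, str]]:
    """List the differences between two transcriptions.

    Each item is (operation, segment in base, segment in other), where the
    operation is 'replace', 'insert', or 'delete'; identical stretches are
    skipped.
    """
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    return [
        (tag, base[i1:i2], other[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical transcriptions, built around the 業/葉 confusion discussed above.
substituted = collate("一切業障", "一切葉障")   # one character replaced
added = collate("一切業障", "一切諸業障")        # one character inserted
```

Such a list only records where two transcriptions diverge. Deciding whether a difference is a phonetic loan, a confusion of shapes, or a scribal intervention remains a manual step.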
- As part of the collation process, the differences between the witnesses were analyzed and categorized (phonetic loans; erroneous characters because of similar shapes; added characters; scribal interventions, etc.). Since this type of mark-up is very timeconsuming other possibilities for collating texts should be considered, e.g., the digitization of electronic versions of different manuscripts which successively are ‘overlapped’ and a record of the differences automatically generated. As a second step, these differences have to be ‘manually’ analyzed. In addition, specific interfaces for mark-up work could be developed. Some Reflections on the Mark-up and Analysis of Dūnhuáng Manuscripts 25 Typology of Textual Features in Manuscript Collations: - General ‘visual’ features, i.e., information about paper features, writing tools, text arrangement, general character size, characters per column/line, alignment of columns/lines, features of the title section, calligraphic/paleographic information: the description of these important features are difficult to integrate in the formalized collation itself; alternatively, more ‘narrative’ descriptions of manuscript sections could be useful, or an integration in the ‘head’ section of the mark-up. As a useful aspect of the ‘tripartite’ visual presentation of the material, these features can be directly viewed in the facsimile reproduction represented to the left. - Markers and scribal interventions 26 (punctuation, repetition markers, markers for reversing reading sequence (e.g. ), markers for superfluous characters (e.g. ), 27 scratched out characters ( ), empty spaces, inserted characters, small-sized characters): information on these features is integrated in the ‘collation’ part of the manuscripts. 
Example of a passage with characters inserted to the right side of the column/line:

As an interesting feature, the text in small characters also includes repetition markers (rm) which do not mark the repetition of a single character but of the group of characters preceding them (and, in addition, this group extends beyond sentence borders). This being the case, the passage must be analyzed in the following way:

… 祖[弘忍 rm 和尚 rm]問 能… > … 祖弘忍和尚 弘忍和尚問 能…

- Textual variations and ‘deviations’: this includes information on ‘missing’ characters, superfluous characters, corrupted characters,28 phonetic loans, and the wrong sequence of characters. An important aspect here is not only the recording of these deviations but also reflection on their typology and causes.29 Other variations encountered consist of the replacement of characters by (near-)synonyms or the replacement of a term/concept by a related term/concept.

26 It is sometimes difficult to decide by which ‘hand’ these interventions were inserted: by the copyist himself (who read through his copy of the manuscript), by an owner/reader, or by a temple librarian. Sometimes, manuscripts have layers of interventions and annotations.
27 Stein 5475:03.01; Stein 5475:20.04.03.
28 Corruptions are often caused by the speed of the copying process and by the copyist’s decreasing concentration in the course of copying a text. Many of the corruptions are inherited from one copy to the next and in some cases even become fixed parts of a text. One special type of corruption concerns ‘miscopying by context’, i.e., the copyist copies a character which appears in the columns/lines to the right or left. Another corruption could be called ‘miscopying based on conventionalized sequences’ and often appears in disyllabic terms/words: the copyist replaces a somewhat unusual character combination with one which is ‘fixed’ in his mind, e.g., frequently used Buddhist terms.
29 For a typology of phonetic loan characters and the miscopying based on vernacular, handwritten forms of the characters, see the Appendix.

Examples for Frequently Miscopied Characters, Based on Their Handwritten Forms

 > 伐 (Stein 5475: 04-01-09) 30
 > 特 (Stein 5475: 05-03-02; etc.)
 > 持 (Stein 5475: 04-02-05)
 > 自 (Stein 5475: 05-02-10; 05-04-02)
 > 但 (Stein 5475: 09-01)
記 > 訖 (Stein 5475: 04-11-17) 31

Some of the Many Handwritten ‘Vulgar’ Forms of Characters Found in the Platform Manuscripts:32

zuì 最 (modification/replacement of the determinative and right part of the phoneticum)
bān 般 (modification of the upper right part of the phoneticum, typical for handwritten/inscribed forms during that period)
jīng 經 (abbreviation of the phonetic part)
xiàng 相 (replacement of the determinative and modification of the phoneticum)33
jiān 兼 (modification/replacement of the lower part of the character)
shēng 昇
zuò 座 (modification of the upper left part of the phoneticum, 人 > ; typically, the same modification appears in other characters containing the phoneticum 坐; compare also the upper right part of bān above)
xué 學 (a typical way of writing 學 in certain Dūnhuáng manuscripts; it is no coincidence that wén 文 ‘pattern; Chinese character; literature’ is chosen as the replacement in the character meaning ‘to study’; this is actually an ancient form of this character)
zōng 宗 (an odd variant form of this character, replacing the determinative and modifying the phonetic part)
zhǐ (‘slight’ modification of the upper part)
dì 遞 (a radical abbreviation of the phonetic part)

30 This error can be found throughout the manuscript! For a thorough list of this type of error, see the table in the Appendix.
31 Note that the error is also motivated by the fact that the compound 集記 appeared earlier in the manuscript (‘error generated by the context’).
32 Recently, many good reference works on Dūnhuáng variant characters have been published in the PRC. A very good resource is also ‘The Dictionary of Chinese Character Variants’ (http://140.111.1.40/main.htm), recording more than 100,000 different variants and providing references to dozens of historical dictionaries (of major importance in this respect is the 10th-century Lóngkān shǒujìng 龍龕手鏡).
33 In the handwriting of many Dūnhuáng manuscripts, the number of strokes within ‘boxes’ is often modified, and structural elements such as 目 and 日 become indistinguishable.

- The edition should be flexible enough to allow annotations and comments on several levels (multiple translations; multiple comments; linguistic analysis, …). These modules can be made visible or excluded, according to the interests of the reader.

Tripartite Structure

An important question is how to ideally structure and visualize the edition of such a text. In this respect, too, the flexibility of XML is convenient, since different types of visualization can be generated according to specific purposes (e.g., printed editions, different types of web editions, ‘working’ editions, etc.). For our project, the following solution was chosen: on the left side, a reproduction of the original (subject to copyright limitations; in the text version only the Stein version is visible); in the middle, the edited and collated text; on the right side, the annotations to the textual features (see ill. 6).

Some Notes on Syntactic Analysis

One of the challenges of the CDP is to find proper methods for recording the textual and linguistic features of Dūnhuáng texts, in addition to providing other analytical tools.
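One small example of such an analytical tool, taken from the typology of scribal markers above rather than from syntax proper, is the mechanical expansion of group repetition markers. The sketch below is a minimal Python illustration under stated assumptions: the token `rm`, the list-of-units input format and the function name `expand_group_repetition` are hypothetical conventions chosen for the example, not part of the project's mark-up. It treats a run of consecutively marked units as one group that is read once and then repeated as a whole.

```python
RM = "rm"  # hypothetical token standing for a repetition marker


def expand_group_repetition(tokens):
    """Expand group repetition markers.

    `tokens` is a list of text units; a unit followed by RM belongs to
    the group to be repeated. A maximal run of marked units is emitted
    once and then repeated as a whole.
    """
    out = []
    group = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == RM:
            raise ValueError("repetition marker without a preceding unit")
        marked = i + 1 < len(tokens) and tokens[i + 1] == RM
        if marked:
            group.append(tok)
            i += 2  # skip the unit and its marker
            continue
        if group:  # the run of marked units has ended: emit it twice
            out.extend(group * 2)
            group = []
        out.append(tok)
        i += 1
    if group:  # the input ended inside a marked run
        out.extend(group * 2)
    return "".join(out)


print(expand_group_repetition(["祖", "弘忍", "rm", "和尚", "rm", "問"]))
```

For the passage analyzed above this yields 祖弘忍和尚弘忍和尚問, i.e., the group reading. A unit-by-unit reading of the same markers would instead produce 弘忍弘忍和尚和尚, which is why the distinction has to be recorded in the collation.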
Many manuscripts pose great problems in terms of linguistic analysis, also due to the fact that many texts have heterogeneous (hybrid) features, i.e., they integrate a variety of syntactic and semantic features based on a variety of styles, genres, and periods of language development. The section on grammatical mark-up in the TEI manuals is in this respect not yet fully developed and may also have to be better adapted to non-European languages.34 For consistent syntactic mark-up it would also be necessary to develop visual aids and interfaces for specific analytical purposes. Ideally, there should be the possibility of a layered analysis which covers different features of a text, e.g., the mark-up of syntactic units and the relationship between them, the identification and analysis of grammatical function words, the marking of modal and style features, etc. These reflections on useful grammatical analysis are still at a very tentative stage since considerable technical problems are involved. In terms of Literary Chinese/Buddhist Chinese, an ‘immediate constituent’ approach to the analysis of sentences seems to be useful since the sentence structure fits well with the hierarchical structure of XML mark-up. In this approach, the syntactic units are identified and the relationships between them determined. This kind of approach could be enormously useful as an aid for producing more analytical approaches to Buddhist texts and eventually more reliable translations. Another promising approach is the implementation of an underlying narrative grammar in XML format which is linked to the texts (as described in the example above, where in a collaborative effort a marked-up version of ZTJ by Wittern was linked to an XML version of Anderl’s grammar of the text).35 In the course of the work on the Platform sūtra, several possibilities concerning the linguistic mark-up were considered.
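The fit between an immediate constituent analysis and the hierarchy of XML can be made concrete with a small sketch. The following Python example is only an illustration under stated assumptions: the segmentation of the sample phrase, the attribute values (`NP`, `VP`, `noun`, `verb`, `proper`, `title`) and the helper names `build`, `words` and `constituents` are hypothetical and do not reproduce the project's tag set; only the element names `<s>`, `<phr>` and `<w>` and the attributes ‘type’ and ‘subtype’ are taken from the mark-up discussed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical analysis: (tag, attributes, children or surface text).
SENTENCE = ("s", {}, [
    ("phr", {"type": "NP"}, [
        ("w", {"type": "noun", "subtype": "proper"}, "弘忍"),
        ("w", {"type": "noun", "subtype": "title"}, "和尚"),
    ]),
    ("phr", {"type": "VP"}, [
        ("w", {"type": "verb"}, "問"),
    ]),
])


def build(node):
    """Turn a nested constituent tuple into an XML element."""
    tag, attrs, content = node
    elem = ET.Element(tag, attrs)
    if isinstance(content, str):
        elem.text = content
    else:
        for child in content:
            elem.append(build(child))
    return elem


def words(elem):
    """Word-level units in document order."""
    return [w.text for w in elem.iter("w")]


def constituents(elem, depth=0):
    """Yield (depth, tag, surface text) for every constituent."""
    yield depth, elem.tag, "".join(elem.itertext())
    for child in elem:
        yield from constituents(child, depth + 1)


s = build(SENTENCE)
print(ET.tostring(s, encoding="unicode"))
for depth, tag, text in constituents(s):
    print("  " * depth + f"{tag}: {text}")
```

Because every constituent is an element containing its sub-constituents, the successive ‘break-down’ from sentence level to phrase level to word level is simply a walk down the element tree; relations that cross this hierarchy would need separate linking, as the text notes.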
However, these considerations are still at an experimental stage (one problem is also the time-consuming nature of this mark-up).

34 For a very interesting approach to the mark-up of Old Japanese, see the article by Kerri L Russell and Stephen Wright Horn in this volume.
35 After the transformation, the XML file of the grammatical notes still has to be ‘cleaned up’ for the next version.

Illustration 7: Mark-up of a sentence in the Platform sūtra; <s> and <phr> are used to indicate the phrase structure, and constituents are broken down to word level (<w>), specified with ‘type’ and ‘subtype’; further specification by ‘function’ and ‘ana’ elements; ‘next’ and ‘prev’ are used untypically (in terms of their definition in the TEI manual) to indicate relations between immediate constituents; in future versions, this will be replaced by ‘links’ (which will be used to define the relations between the phrases).

Illustration 8: Possible ‘visualization’ of a grammatical mark-up based on the immediate constituent analysis; successive analytical ‘break-down’: sentence level, phrase level, word level, etc. The relationship between the constituents is indicated by a set of symbols.

Appendix: A Comparison of Some Textual Features of the Platform Manuscripts

Conventions Used in the Table with Notes on the ‘Northwestern’ Dialect

In the table below, the variations in the use of Chinese characters in the four manuscripts are compared.36 The addition and deletion of characters and other important differences between the manuscripts are not taken into account here.37 The focus is on phonetic loans, alterations of parts of the characters (such as the determinative or phonetic parts of the Chinese characters) and on mistakes made by the copyists based on similar (and often ‘vernacular’) shapes of the characters in the handwriting.
There is also a minor category marked with ‘c’, indicating mistakes based on the context in which the characters appear.38 In addition to registering the ‘dialect phonetic loans’, an attempt was made to analyze the system of ‘regular phonetic loans’ as well. Occasionally, it was difficult to determine whether a character variation was caused by an alteration of the determinative part (a very common phenomenon in Dūnhuáng manuscripts) or should rather be interpreted as a phonetic substitution. It can be shown that, apart from the rather high number of dialect loans and a small number of other uncommon phonetic loans, the manuscripts of the Platform sūtra generally use a system of more or less established phonetic substitutions, some of which have a very long tradition. As such, the use of phonetic loan characters is by no means arbitrary in the manuscripts.39

Attention has been given to the uncommon phonetic loans based on the dialect of the Northwestern region during the late Táng period. These loans are marked with ‘*’, and references to explanations in Dèng and Róng (1999) are provided. These loans are of great importance for determining the regional character of the manuscript copies and the differences among them in the use of this kind of loan. Although the Stein, Dūnbó and Běijīng manuscripts all use dialect loans, it is very obvious that they are most commonly used in the Stein manuscript (i.e., the ‘*’ appears most frequently in the ‘S.’ column of the table). The abundant use of regular and dialect loans also shows the important role of ‘orality’ in this type of manuscript, i.e., recording the ‘sound’ of these texts was more important than focusing on orthography and finding the ‘standardized’ characters. This phenomenon can be observed in many Dūnhuáng manuscripts but seems to be especially current in texts originating during the Táng period (as, for example, the Chán treatises).40 As such, there is an abundant use of phonetic loans in this rather short text, in addition to exchanges of parts of the characters such as the determinatives (for example, in Dūnhuáng manuscripts the exchange between the ‘tree’ 木 and ‘hand’ 扌 determinatives is frequently encountered), the many passages where characters are mistakenly left out or added, and the many corrupt passages based on the copyists’ misreading of the handwritten characters. These are all factors which make parts of the Dūnhuáng versions of the Platform sūtra difficult to decipher and understand.

The corrupt characters based on copyists’ errors are marked with ‘#’ in the table. Although it is clear that the Stein manuscript has a larger number of corrupt characters in this category, the Dūnbó manuscript nevertheless also contains plenty of mistakes based on misreadings and wrong interpretations of character forms.41 A comparison of the use of phonetic loans and the number and type of corrupt characters also shows that the Dūnbó and Běijīng manuscripts are clearly closer to each other concerning their textual features (although by no means identical!).42 Many confusions concerning the copying of characters are caused by the use of ‘vernacular’ forms of characters and the structural similarities between them. A thorough analysis of the orthographic and paleographic features cannot be included within the scope of this paper. Generally, it can be observed that there are major differences concerning the calligraphy and choice of character forms between the Stein and Běijīng manuscripts. In addition to the differences between the individual manuscripts, there are also significant internal differences, i.e., several forms of the same character are used in the same manuscript. The calligraphy of the Dūnbó manuscript (and also the Běijīng manuscript) is without doubt more ‘tidy’ and somewhat less ‘vernacular’ than that of the Stein manuscript.

36 In the table, the Dūnbó 77 manuscript is abbreviated to ‘D.’, Stein 5475 to ‘S.’, the Běijīng manuscript to ‘B.’, the Lǚshùn manuscript to ‘L.’ (for a discussion of these manuscript copies, see Anderl 2012b). To the left, the assumed ‘correct’ character is listed. References to the later Kōshōji (‘K.’, reflecting the Huìxīn version, based on Yampolsky’s edition) and Zōngbǎo (‘Z.’) editions are only provided occasionally for purposes of comparison. They also nicely illustrate how loans and mistakes were ‘normalized’ or ‘sanitized’ in the Sòng versions of the Platform sūtra (on this issue, see also Schlütter 1989 and Anderl 2012a, 16-26). The characters are usually listed according to their first appearance in the manuscripts; however, phenomena such as phonetic loans which are related to each other are grouped together (the characters taken out of their order of appearance are marked with ‘/’). This method aims at allowing a more direct comparison and at illustrating ‘clusters’ of phonetic loans, for example.
37 Concerning this aspect of the manuscripts, see Anderl (2012b).
38 E.g., the case when the copyist mistakenly inserts a character which also appears in the right or left line/column.
39 References to two large dictionaries on phonetic loans have been used in the analysis of the system of loan characters (Loan 1 and Loan 2, see the bibliography).
40 Luó Chángpéi 羅常培 (1933) was one of the first who tried to reconstruct the Northwestern dialect based on a selection of Buddhist scriptures. However, the sources he had available for this purpose were rather limited. Later on, these dialect studies were expanded based on the identification of an ever-growing number of Dūnhuáng manuscripts in which dialect loans were detected. The most important scholar in this respect is Takata Tokio (e.g., Takata 1987 and 1988). He discerns two specific types of dialect which can be detected in Dūnhuáng materials. The first is the dialect based on the language of Cháng’ān, the capital of Táng China. The ‘standard’ colloquial language of that time was based on this dialect, and it was also current in Dūnhuáng until the area came under the control of Tibet (787 AD). The other one is the Héxī 河西 dialect. This dialect is also referred to as the Northwestern (Xīběi 西北) dialect, which started to prosper after the relations to the central government of China were cut. According to Takata, the dialect was also influenced by elements of the Tibetan language (e.g., zhū 諸 was pronounced ‘ci’). The usage of the dialect was at its height after 851, when Dūnhuáng became a quasi-independent area. Typical for the dialect loans used in the Dūnhuáng Platform sūtra, especially the Stein version, is the feature that syllables with a nasal final ‘-ng’ are not distinguished from those without, resulting in homophones such as mí 迷 – míng , tǐ 體 – tīng 聽, dì 第 – dìng 定, xī 西 – xīng 星, lǐ 禮 – lìng , etc. In addition, the initial consonants (shēngmǔ 聲母) of the 端 – 定 and the 審 – 心 categories are not differentiated, nor are the finals (rhymes) of the 侵 and 庚 groups (see Dèng and Róng 1999, 25-26; for other studies concerning the Northwestern dialect, see for example Shào Róngfēn 1963; for more bibliographic references, see Dèng and Róng 1999, 39-40). More recently, Takata (2000) has drawn attention to the heavy influence of the Tibetan language during the period of the Tibetan occupation of Dūnhuáng and during the 10th century, when Dūnhuáng was quasi-independent and communication with Central China was reduced to a minimum. Large copying projects were initiated by the Tibetans (especially during 815-841, ibid.:7) and bilingual communities (Chinese-Tibetan) were prospering. Eventually, many Chinese would even use the Tibetan writing system for writing Chinese! “What is important here is the fact that the tradition of writing Chinese and the Tibetan script established during the period of Tibetan rule was still maintained in the tenth century under Return-to-Allegiance Army of the Cáo.” (ibid.:9). The developments outlined by Takata might well be one of the factors reflected in the complex textual features of the late copies of the Platform sūtra, which include many oral and dialect features, a particular system of phonetic loans, vernacular and often faulty orthography, and all kinds of textual corruptions.
41 Especially in Chinese secondary literature, the Stein manuscript is referred to as a ‘bad copy’ (èběn 惡本), as opposed to the ‘good’ Dūnbó and Běijīng manuscripts. Another aspect of this judgment is the fact that the number of mistakenly added or deleted characters is somewhat smaller in the Dūnbó manuscript, in addition to its much more even style of writing and text arrangement and its use of less distorted character forms as compared to the Stein manuscript. The Stein manuscript, on the other hand, often gives the impression that it was copied in a hasty and sloppy way.
42 A quantitative analysis is also difficult in this respect since only ca. one third of the text is extant in the Běijīng manuscript.

Table 'CORRECT' S. D. 般 授 般 波 B. 淨 官 陽 官 陽 K. Z. COMMENTS/REFERENCES 般 授 /授 授 靜 L. 凈 Traditionally not distinguished (Loan 1:#1529) Several occurrences; frequently interchanged in Dūnhuáng texts Loan 1:#2914 # 小 小 少 /小 /小 亦 /亦 少 小 小c 無(无#?) 少 小 亦 亦 乏 賣 之# 買 乏 賣 客 明 客 容# 明 小/少 (which are originally two forms of the same character) are frequently interchanged 少 少 又 亦 明 問 明 聞^ /問 /聞 /(聞) 聞 問^ 門 聞 聞 問 聞 /問 縣 (見=)現 門 問 縣 見 # Mistake in S. (deriving from structural similarities of the abbreviated version of 無?)
which transforms by negation the meaning to its opposite 賣 /明 問 見 小 Deletion of the upper part of the character; traditionally, 買 is also a loan for 賣 (Loan 1:#0464) Many occurrences, but does not seem to be a regular phonetic loan Several occurrences 問 問 聞 Note that S. often incorrectly interchanges 聞 and 問; this is not a regular phonetic loan; note the cluster of these interchanges in all manuscripts Loan 1:#4591 Deletion of the inner part of the character in S.; however, 門 can function a phonetic loan for both 問 and 聞 (Loan 1:#4588, #4589, #4590) 縣 Phonetic loan (Loan 1:#4909) In the Sòng editions, 見 and 現 are usually differentiated Note the mistake in all mss.! 34 Chung-Hwa Buddhist Journal Volume 25 (2012) 'CORRECT' S. / / / / D. B. # # # L. K. 特# 持 持 /待 持# 待 待 業 業 葉# 業 /業 葉# 等# 業 等# 嶺 性 /性 /世 /性 /聖 領 嶺 世* 性* 聖* 性* 性 世 性 聖 語 * 差 記# 訖 * * 着 記# 說# 汝 外 汝 汝 * 如* 汝* 汝* 汝* 汝* * 汝 汝 汝 (汝) /汝 /汝 / / / / 汝 / /性 COMMENTS/REFERENCES Note that in this series the confusion of the two characters appear in all mss.! Often confused in Dūnhuáng texts; several occurrences ( ‘hand’ – 牜 ‘ox’) Typical substitution / confusion of determinatives ( ‘hand’ – 彳 ‘step’) This is probably not a phonetic loan. The replacement based on structural similarities occurs several times in D. (and in many other Dūnhuáng mss.)43 # 持 訖 訖 汝 Z. # # # 葉# In 善業; Dèng and Róng 1999:398, n.1 嶺 性 性 性 性 Often interchanged Dèng and Róng 1999:327, n.13 Dèng and Róng 1999:421, n.1 Dèng and Róng 1999:371, n.7 聖 Several occurrences; Dèng and Róng 1999:250, n.6; 390, n.2 Dèng and Róng 1999:223, n.3 Synonym 說# * 汝 Many occurrences; Dèng and Róng 1999:226, n.5; 397, n.19, n.21; 400, n.9; 411, n.4 汝 Dèng and Róng 1999:244, n.4 Dèng and Róng 1999:383, n.1 Dèng and Róng 1999:399, n.7 Dèng and Róng 1999:371, n.10 Dèng and Róng 1999:278, n.1 已* 汝 Dèng and Róng 1999:369, n.12 44 * Dèng and Róng 1999:371, n.9 Dèng and Róng 1999:313, n.3 43 Very similar shape in vernacular writing! 
44 Can be interpreted as reversed sequence or as (twofold) dialect phonetic loan. Some Reflections on the Mark-up and Analysis of Dūnhuáng Manuscripts 35 'CORRECT' S. D. 求 /求 汝等 救 求 汝汝 求 救 汝汝 B. L. K. Z. 智慧/知 慧 之知 知 智 知之 之知 知 之 知 之 /之 /之 /之 悟 /悟 知 知 智 諸* 吾 悟 之 之 智 之 悟 吾 /悟 /悟 /悟 /悟 /吾 急 澄 息 吾 吾 伍 俉 悟 急 呈 息 吾 伍 悟 悟 吾 呈 識*(?) 澄 息 識 衣 息* 於* 識 衣 衣 衣 /依 於* 於* 衣 依 Dèng and Róng 1999:229, n.9; several on S.; note this cluster of phonetic dialect loans Dèng and Róng 1999:324, n.8 依 /依 於* 於* /依 /於 /於 衣 於 放# 依 衣* 放# 依 Several dialect replacements on S.; e.g., Dèng and Róng 1999: 400, n.22 Several occurrences; Dèng and Róng 1999:407, n.11; 421, n.7; both mss. use the dialect loan! Here, of course, a ‘regular’ loan! /於 衣* 於 於 救 COMMENTS/REFERENCES Probably not a phonetic loan? Compare above! 汝等 Plural by reduplication (rare with pronouns!) as opposed to plural by suffixing Reversed sequence (or ‘reversed loans’!) Often interchanged (as demonstrated by the clusters below); but probably not a regular phonetic loan. 之知 之 之 智 Dèng and Róng 1999:423, n.9 悟 Note this cluster of interchanges! 吾 for 悟 is a traditional phonetic loan (Loan 1:#0598) 悟 悟 吾 吾 Several occurrences in S. c 衣 依 放# 依 澄 Dèng and Róng 1999:229, n.7 interprets this as dialect form Making this cluster of interchanges even more complicated, this corruption by structural similarity is intermixed with the above Dèng and Róng 1999:278, n.3; 279, n.11 36 Chung-Hwa Buddhist Journal Volume 25 (2012) 'CORRECT' S. D. /於 壁 衣* 壁 於 糪 教 /教 /教 求法即 善 終 間 /鄣 +教 故# 敬# 即善求 法 修# 問# 教 教 教 即善求 法 修# 間 鄣 / /鄣 / /鄣 秉 知 拂 喚 讀訖 留 問 祖 不/ B. 請#記# 流 門# 祖 K. Z. This is probably not a loan but a copying mistake A rare case of an added radical 教 教 求法 即善 終 間 求法 即善 終 鄣 Mistake in both manuscripts! # 知 拂 喚 請#記# 拂 喚 留 門# 但# 不 Loan 2:54 Confusion of determinatives Changed by modern editors; 請 記 makes sense in the original context Loan 2:653 留 問 (Near-)synonym Corruption? 
Several occurrences; Dèng and Róng 1999:238, n.13 for further examples Dèng and Róng 1999:340, n.8 是* 青 從# 但(#) 知 Reversed sequence Usually no differentiation in Dūnhuáng manuscript texts Deleted determinative in S. 唱 是* / 題 清 徒 COMMENTS/REFERENCES Dèng and Róng 1999:401, n.1 鄣 鄣 秉 和# L. 題 清 徒 Loan 1:#405 Loan 1:#2665 This does not seem to be a regular phonetic loan Missing determinative 法 法 氣如 氣如 去(#) 氣如茲# 命如 起 於 去* 生# 起 起 起 去 去* 起* 起 起* Several occurrences! Dèng and Róng 1999:247, n.1 Dèng and Róng 1999:272, n.9 去 Dèng and Róng 1999:264, n.12; 266, n.1 廋 庚# 庚# 命如 起 廋 廋 Some Reflections on the Mark-up and Analysis of Dūnhuáng Manuscripts 37 'CORRECT' S. D. 捉/害 僚 除 餘 頭# 奪# 餘 除 捉 寮 除 餘 如 智 /智 /智 /知 /智 遇 於* 知 智 諸* 諸* 遇 如 智 知 智 知 智 愚 /遇 /愚 /愚 /過 定 等 遇 遇 愚 愚# 等 遇 愚 遇 遇# 等 坐 坐 (直心) 曲 情 /情 /情 /情 須 被 /被 座 座 真(#)心 典# 清 親* 性 性 順# 彼 被 置 盤 盤 故 明* * 坐 座 真(#)心 曲 情 親* 性 情 須 彼 彼 置 般 槃 故 迷 迷 無# 念#念 讀 為 念#念 續 般 /槃 (固) 迷 迷 為 念 續 B. L. K. Z. COMMENTS/REFERENCES 僚 除 Loan 2:1051 This ‘direction’ (除 > 餘) of loaning is unusual! Dèng and Róng 1999:251, n.9 如 智 Commonly interchanged Dèng and Róng 1999:383, n.5 知 Dèng and Róng 1999:267, n.7 Dèng and Róng 1999:365, n.7 Often interchanged; see Dèng and Róng 1999:251, n.11 Loan 2:611 (愚 > 遇) 愚 愚 愚 Loan 2:917 (遇 > 愚) 過 座 坐 坐 直心 曲 情 情 過 定 等 Often interchanged 直心 Also similar semantics Several occurrences Loan 2:460 情 Dèng and Róng 1999:390, n.11 Dèng and Róng 1999:401, n.2 Dèng and Róng 1999:402, n.3 須 Loan 2:249 彼 Loan 2:726 Loan 2:663 (entry #1) Loan 2:663 Loan 2:410 迷 迷 Dèng and Róng 1999:259, n.10 Several occurrences; Dèng and Róng 1999:264, n.7; 277, n. 20; 282, n.8; 325, n.4; 383, n.2; 407, n.6 為 念 Mistake in both manuscripts What looks like a change or confusion of determinatives (糹‘silk’ – ‘speech’) is 38 Chung-Hwa Buddhist Journal Volume 25 (2012) 'CORRECT' S. 是 為 K. Z. 
為 為 是 無 住為 是 無 住為 離 雜見 境 雜# 雜見 鏡 離 離#境 境 境 /境 /境 /境 邪 /邪 /(耶) 須 /雖 敬 境 境 境 邪 耶 那# 雖* 須* # 邪 那# 雖* 雖 /雖 第 雖 弟 須* 第 着 體 /體 起心 看# 體 聽* 心起 看# 凈 起心 起心 不見 人過患 見 人過患 見 人過患 見 人過患 既 /記 記*[?] 既*[?] 記*[?] 記 記*[?] 記 見 見 見 自# 是*(?) 須# 願 體* 在自 在自性 思量 西* 參(#?) 妄 億 唱 曰 時 原 源 德 自在 自性在 思 星 森 妄 意 唱 曰 時 原 源 德 自在 無住 D. 見... 曰 時 源/原 /源 德 在自 在自性 思 /思量 星 森 妄 意 唱 無住 B. 無住 L. COMMENTS/REFERENCES actually an ‘established’ loan (Loan 2:966) I did not find any precedence to this exchange No precedence found 境 境 Loan 2:689 Loan 2:689 邪 邪 Common replacement Dèng and Róng 1999:266, n.2 雖 Several occurrences; Dèng and Róng 1999:347, n.11; 429, n.3 Dèng and Róng 1999:407, n.9 Several occurrences; originally identical characters (Loan 2:98) Several occurrences 着 Corruption in D. Dèng and Róng 1999:399, n.5 Reversed sequence 不見 人過 患 不見 人過 患 Missing negation in all manuscripts (generating the opposite meaning of the passage) Dèng and Róng 1999:271, n.6 記 見 時 見 Dèng and Róng 1999:298, n.5 classified as phonetic loan and not as dialect loan [?] More precise reference in later (K. and Z.) editions Dèng and Róng 1999:273, n.18 Not an established phonetic loan Dèng and Róng 1999:275, n.8 ‘Conventionalized sequence’ 在自性 Sequence 思 星 森 Synonymous Dèng and Róng 1999:280, n.17 Loan 2:594 意 昌 Loan 2:90 Loan 2:420 (entry #4.2) Some Reflections on the Mark-up and Analysis of Dūnhuáng Manuscripts 39 'CORRECT' S. D. 各各 既 /即/則 /即/則 前 各各 既 則 即 前 各各 即 即 則 何 矯誑 矯誰#? 妬 垢# 證 如 如 如如 B. 冬冬# L. K. Z. 既 COMMENTS/REFERENCES Loan 2:129 則 則 何 Synonymous Two occurrences of this corruption 西# 矯雜# 疫# 垢# 西# 證 證 如如 如如 矯雜# 疫# 矯誑 矯誑 垢# 如 如 如 如 Shapes very similar in vernacular writing! 
Change / confusion of determinatives ‘conventionalized sequence’ 如 如 which is a frequently used Buddhist term Loan 2:647 猶 /猶 河 愚 彼 無 到 猶 何 愚 彼 無 到 何 思# 波# 不 到 河 思# 彼 不 倒 /倒 憶 倒 億 到 億 到 億 增 般若 增 般若 曾 增 譬 辟 譬 / 盡 心 深 /心 聞 是故 示 示 謂 見性 譬 來# # 盡 深* 心* 身* 聞 是故 亦# 是* 為 見 Replacement by conceptually / terminologically related items Loan 2:969 (#10) 盡 心 深 身* 聞 是 示 示 為 見 # 心 Dèng and Róng 1999:315, n.1 脫 說 脫 脫 Loan 2:48 2 occurrences 到 > 倒 seems to be more common than the ‘reverse’ loan 性 憶 憶 性 億> 憶 does not seem to be an established phonetic loan Loan 2:434 Dèng and Róng 1999:426, n.11 心 聞 是 示 示 為 見 心 Dèng and Róng 1999:421, n.4 Reversed sequence 示 Dèng and Róng 1999:319, n.6 謂 見性 脫 Loan 2:537(#2) 見性 Substitution by a term of related semantics There is a long history of the replacement of 脫 with 說 40 Chung-Hwa Buddhist Journal Volume 25 (2012) 'CORRECT' S. D. B. 縛 傳 傳# 縛# 縛 傳 縛 傳 縛 傳 /傳 轉 傳 謗 頌 謾# 訟 謗 頌 傍(#) 頌 頌 /頌 而 造 但 而 在 但 如 在 如 在 造 造 在 造 造 在 在 在 在 元 /元 出 無# 元 在 元 無# (无) /元 悔 大 裏 願 海# 大 中 自# 元 悔 疑 摩 磨 * 磨 摩 L. K. 頌 # 裏 Z. COMMENTS/REFERENCES (Loan 2:948) Confusion of determinative in S. ‘Complementary’ confusion (縛 < > 傳) The somewhat more usual direction of loaning is 傳 > 轉 and not, as here, 轉 > 傳 This is a rather common replacement 頌 45 Dèng and Róng 1999:326, n.7 是 Confusion by context (see also below)? Confusion by context (see also above)? 46 Several occurrences47 元 Note that the confusions appears both in S. (above) and D., based on the abbreviated version of 無! No precedence found 悔 # 裏 悔 Two occurrences Near-synonym Frequently confused in S. 自# 疑 摩 磨 疑 摩 Dèng and Róng 1999:329, n.11 Near-synonym and homophone! See above, but in reverse! 45 但是頓教 vs. 頌是頓教. 46 Róng and Dèng (1999, 350, n.1) consider chū 出 as mistake; however, this is not clear since the passage reads 邪見出 (在/是) 世間, 見出世間,邪 悉打卻,(菩 性宛然) (the last phrase is inserted according to K sh ji and is lacking in the manuscripts). 
It could be considered a ‘mistake by context’ since 出 appears in the second phrase and the copyist may have sensed a parallel construction. In addition, 出 can have several meanings which fit the contexts, either ‘to emerge from’ (first phrase) or ‘to transcend’ (second phrase); Kōshōji has the copula shì 是 instead of 出 (Stein) or zài 在 (Dūnbó: ‘be located in’). Possible translations, all of which make sense: “Wrong views emerge from the mundane (or: “Wrong views are located in the mundane”), right views emerge from the mundane (or: “right views transcend the mundane”), if ‘wrong’ and ‘right’ are all smashed, (the nature of bodhi is just as such).” The whole passage must have posed problems to the copyist/reader since the last phrase (the ‘conclusion’) was missing in the manuscripts.
47 The abbreviated form of 無 (无) is easy to confuse with 元.

'CORRECT' S. D. B. /魔 摩 伐# 魔 伐# 花 伐# 帝 帝 已 大* 德 /花 / 帝 48 L. K. Z. Several occurrences Omitted determinative (or loan?); several occurrences, but no precedence found for a ‘loan’ See above!
花 已 陀 得 陀 得 已 大* 德 得 不 種 /種 方 彈 德 不 重* 眾* 者 禪# 得 不 種 種 者 禪# 遠 遙(#) 帝 These are frequently interchanged in Dūnhuáng texts; Dèng and Róng 1999:334, n.12 否 種 Common interchange Dèng and Róng 1999:402, n.5 者 禪# 彈 /但 目 坦* 日# 壞 壞 壞 海 /大 海 除人 大海 海 害 害 肉# 破 波 破 破 了 西 人# # 了 西 了 西 無 This does not seem to be a common replacement (confusion by ‘convention’, maybe, since has a much higher frequency than 帝 in Buddhist texts) Dèng and Róng 1999:334, n.10 遠 但* 但 目 無人 COMMENTS/REFERENCES 彈 遠 但* High-frequency character 禪 in Buddhist texts; easily confused in the copying process Synonymous and similar in shape Dèng and Róng 1999:340, n.4 48 目 Dèng and Róng 1999:423, n.5 Frequently interchanged in Dūnhuáng texts (compare and 自) > 壞 is a ‘common’ replacement (Loan 2:625) Similar meaning 海 除人 人 Corruption or misunderstaning of this passage in the manuscripts The vernacular character for 肉: 宍 is similar in shape to 害 No precedence of the replacement of these (phonetically distinct) characters found; thus, rather a confusion or exchange of determinatives Could that also be interpreted as modification or confusion of the determinative instead of a dialect loan? 42 Chung-Hwa Buddhist Journal Volume 25 (2012) 'CORRECT' 處 理 S. D. K. Z. COMMENTS/REFERENCES 處 理 處 理 Corruption in S. 離 處 理 悉 俱 俱 悉 B. 悉 若欲 覓 破彼 得悟自 迷(#?) 若欲覓 真 破彼 得悟自 疑 疑 城 請 當(#?) 漕 疑 癡 誠 清 當(#?) 漕 癡 癡 除 喻 時 喻 除 如* /悉 彼有 得悟自 性 誠 請 常 L. No historical precedence for this replacement found Corruption in S.; replaced by a synonym ‘all’ (悉 > 俱) in the Sòng editions 欲得見 者 彼有 得悟 自性 誠 彼有 得悟 自性 In the phrase 無 彼有疑 Missing character in both manuscripts! Loan 2:164 Loan 2:461 (#5) 常 常 49 All occurrences in the mss. 疑 疑 Mistake in both manuscripts! Maybe motivated by the structural similarity and the somewhat related semantics in the Buddhist context (‘doubt’ vs. ‘ignorance’) [?] Dèng and Róng 1999:370, n.1 (>於) 文 聞 文 字 覺 # 家(#) 覺 字 文見 覺 覺 覺 覺 華 曾 Dèng and Róng 1999:374, n.1 ‘decomposed’ character > 斍50 斍 # (#?) 僧 即 Loan 2:1028 (#2) (#?) 
曾 即 華 曾 Note this series of mistakes/ alternations on the D. manuscript involving the same character! Here motivated by the resemblance of the abbreviated form 斍 (覺) with . Mistake in both manuscripts! Added determinative in S. Mistake in both manuscripts! Probably not a confusion triggered by similar shape after all: there is a history of 當 replacing words of the ‘陽禪 ’ phonetic group (such as 嘗 and 償); however, no concrete precedence for the replacement 當 > 常 was found. 50 斍 is a vernacular form of 覺 misread by the copyist as two characters. 49 Some Reflections on the Mark-up and Analysis of Dūnhuáng Manuscripts 43 'CORRECT' S. D. B. L. K. Z. 幸 人 把 想 入# 把 相 人 犯# 想 味 觸 含 獨# 含 味 觸 合# /含 含 舍# (用?) # No precedence found (usually, none of the two characters are loaned or have phonetic loans) Several occurrences Missing determinative in S. (no precedence for a phonetic loan found) Loan 2:333 味 觸 Altered determinative Dèng and Róng 1999:387, n.4 # 定 定 空# /定 弟* 定 Dèng and Róng 1999:404, n.5 油 火 能/解 遞 壇 十/拾 大# 解 迎(#?) 檀 拾 COMMENTS/REFERENCES Added determinative in S. (same phonetic value, ‘ 喻’, however, no concrete precedence found) (憂) 義 禮 淨 有 語* * 諍 火 能 遞 壇 十 因 有 義 禮 淨 久/永 撩 若 鞠 永 遼 共(#) 因# 掬 久 遼 共(#) 因# 鞠 四十 十四 四十 Reversed sequence 嶮 劍 嶮 中 眾 中 No precedence as phonetic loan found No precedence found but probably an unusual phonetic loan; both characters can have the Synonymous Several occurrences Several occurrences Synonymous Omitted determinative in D. 義 憂 義 Not phonetically identical Dèng and Róng 1999:402, n.8 Dèng and Róng 1999:402, n.9 久 撩 久 撩 Can be loaned for ‘靜從 ’ phonetics, such as 靖, 靜, etc. As such, this should be regarded as phonetic loan Near-synonym Loan 2:926 Altered determinative or phonetic loan? 44 Chung-Hwa Buddhist Journal Volume 25 (2012) 'CORRECT' S. D. B. L. K. Z. 
pronunciation ‘東知 ’ (Loan 2:10 and 744#4); both characters are sometimes loaned for 終 (which has the same pronunciation; see Loan 1:3352 and 3354) Very common loan (Loan 2:218) 員 覓/求 求 覓 報 保 保 遂 遂 Synonymous 報 日 日 日 處(#) 根 材 (建立) No precedence found and probably not a phonetic loan ( tone vs. 去 tone) One character is ‘decomposed’ into two in the process of copying 香 氛氛 崩 朋 據 報 # 香 氛 崩 COMMENTS/REFERENCES #立 Loan 2:546 Two characters misread (‘composed’) as one 據 根 林# 氛 Confusion of determinatives 林# #立 Some Reflections on the Mark-up and Analysis of Dūnhuáng Manuscripts 45 Manuscripts, Editions, Bibliography Manuscript Witnesses Dunbo_77: The manuscript Dūnbó 77 is preserved at the Dūnhuáng Museum (Dūnhuáng bówùguǎn 敦煌博物館) as a booklet with 93 pages (‘butterfly binding), containing 4 texts, three claiming to be authored by Shénhuì 神會 and/or disciples, the Platform sūtra, and a Commentary to the Heart sūtra by the Northern School master Jìngjué 淨覺. Jorgensen (2008, 596) assumes that the texts were combined into a book in Dūnhuáng, since at the end of the 8th century a disciple of Shénhuì by name of Móhēyán 摩訶 (‘Mah y na’) tried to harmonize the teachings of the ‘Northern’ and ‘Southern’ Schools. P. 2045 contains the three Shénhuì texts in the same order and one can assume that the texts were written about the same time (during the period when Dūnhuáng was under the administration of Tibet; see Jorgensen 2002, 399-404 and Jorgensen 2005, 597). In Anderl (2012b), it is argued that the reason for combining the texts could have been motivated by the fact that they all deal with the teachings of prajñāpāramitā thought. The page reference of the digital edition follows the edition in Dèng and Róng (1999) who counts each side (and not full pages) of the butterfly binding. In the facsimile edition of Gansu (1999), there is an alternative way of counting the pages. 
The manuscript is complete, contains somewhat fewer variants and corruptions than the Stein manuscript, and has a more even and visually appealing calligraphic style.

Stein_5475: The British Library manuscript Or.8210/S.5475 is nearly complete; only three lines in the middle are missing. It is a booklet of 52 pages (including six blank pages: pp. 1, 44, 49-52; and two half-blank pages: pp. 2, 48). A facsimile reproduction with very good resolution is accessible at the IDP (International Dunhuang Project; http://idp.bl.uk/database/). The first facsimile reproduction appeared in Yabuki 1933, 102-103, and is also the source of the edition in T 48/2007, 337a01-345b17 (many mistakes!). The manuscript is the source of the critical edition and translation of Yampolsky 1967, as well as of the translation of Chan 1963. The edited text was also published by Suzuki/Kudo 1934 (divided into 57 sections, a structure which Yampolsky adopted in his translation) and Ui 1939-1943, vol. 2:117-172. In this edition, each ‘page’ of the booklet is counted separately; each page thus usually consists of 6 lines/columns (the page with the title consists of 4 lines).

Beijing_48: Manuscript BD.48 (8024) is preserved at the Běijīng National Library. Parts of the beginning and the end are missing, and only ca. one third of the text is extant. The text is written on the back of an apocryphal sūtra, the Wúliàng shòu zōngyào jīng 無量壽宗要經. This version of the text was probably copied somewhat later than the Dunbo 77 copy.51

51 There is another manuscript fragment of the Platform sūtra stored at the same institution. However, BD.79 (8958) contains only four and a half lines of the text. For a facsimile reproduction, see Lǐ Shēn and Fāng Guǎngchāng (1999, 232).

Lushun: This manuscript is preserved at the Lǚshùn 旅順 Museum (Lǚshùn bówùguǎn 旅順博物館) near Dàlián 大連 (Liáoníng Province) and has a complicated history. It was previously part of the Ōtani Collection (which was scattered among public and private collections throughout Asia in 1914). In 1954, 620 Dūnhuáng manuscripts were removed and incorporated into the Běijīng National Library collection. Only 9 Dūnhuáng manuscripts remained at the museum, together with the bulk of ca. 20,000 manuscript fragments from Central Asia (Turfan, Kharakhoto). The manuscript of the Platform sūtra (no number) originally consisted of 45 folios (a booklet with butterfly binding), folded into 90 pages (dated 959 AD). Its whereabouts were long unknown, and until recently only two photographs, of the beginning and the end, were extant (Ryūkoku Library in Japan). Recently, however, the manuscript was ‘rediscovered’ and seems to be complete (the discovery was celebrated as a sensation in the Chinese press, and an exhibition was organized at the Lǚshùn Museum). During the work on this paper, no facsimile reproduction was yet available. We want to express our gratitude to John Jorgensen, who has just informed us of a recent publication of the rediscovered manuscript. This version will be considered in our future work on the Platform sūtra.

Printed Editions as Witnesses52

Huixin: This refers to the ‘reconstructed’ early Sòng Dynasty edition by Huìxīn 惠昕 (967). Huìxīn introduced the title Liù-zǔ tánjīng 六祖壇經. In contrast to the extremely lengthy title of the Dūnhuáng manuscripts, in which the referent of the appellation ‘sūtra’ is unclear, the title by Huìxīn leaves no doubt that the text itself is regarded as a ‘sūtra’ (see Yanagida 1976 on this edition).

Koshoji: The edition preserved at the Kōshō-ji temple (Kyoto, discovered in the 1930s) is based on this text.
This version of the sūtra is much longer than the Dūnhuáng manuscript versions discussed above, and includes materials appended during the Sòng dynasty (in addition to being heavily revised). The Qisong, Zongbao and Deyi versions consist of ca. 20,000 graphs. On the Koshoji, see Ui 1939-1943, vol. 2:113; it was reproduced photolithographically by Suzuki 1938; for a printed version, see Suzuki/Kudo 1934.

Qisong: The edition by Qìsōng 契嵩 dates from 1056; he changed the title to Liùzǔ dàshī fǎbǎo tánjīng cáoqī yuánběn 漕溪大師法寶壇經曹溪原本 (The Platform sūtra of the dharma treasure of the great master Cáoqī: the original Cáoqī edition), usually referred to as Cáoqī yuánběn 曹溪原本 (Yanagida 1976). The text consists of 20,000 characters, as compared to ca. 12,000 characters in the Dūnhuáng manuscript versions and ca. 14,000 in the Huixin version.

52 For more extensive information on the manuscripts, see Anderl (2012b, forthcoming); on the Sòng editions, see Schlütter (1989). For an extensive and exquisite study of the formation of the hagiography of Huineng, see Jorgensen (2005). The study also includes useful materials on the manuscripts and editions, as well as a discussion of ZTJ in the context of Platform sūtra studies. Jorgensen’s work will be the foundation of subsequent studies in this field for many years to come.

Zongbao: The Zōngbǎo edition dates from 1291 and has the title Liù-zǔ dàshī fǎbǎo tánjīng 六祖大師法寶壇經. This edition became the ‘canonical’ version of the text and is the source of T 48/2008, 245-265.

Deyi: The Déyì 德異 edition is another edition from the Yuán period, edited in: Gen en’yū kōrai kokubon rokuso daishi hōbō dankyō 元延祐高麗刻本六祖大師法寶壇經 (Zengaku kenkyū 禪學研究 23 [1935]:1-63).

Xixia: The extant parts of the Xīxià 西夏 edition can be found in Shǐ (1993).
In 1929 Beiping (Peking) University obtained more than 100 manuscripts from the Xīxià Buddhist canon; among these were 5 pages of the Platform sūtra (a translation into Chinese and reproductions of photographs were published in Luó 1932).

Yampolsky_1967: This version, for a long time the authoritative edition and translation in the West, is based on Stein 5475, compared and supplemented with the Koshoji edition.

Bibliography of Modern Editions and Secondary Literature

Adamek, Wendi. 2007. The Mystique of Transmission: On an Early Chan History and Its Context. New York: Columbia University Press.

Anderl, Christoph. 2012a. Zen Rhetoric: An Introduction. Zen Rhetoric in China, Korea, and Japan. Ed. Christoph Anderl. Leiden/Boston: Brill. 1-94.

Anderl, Christoph. 2012b (forthcoming). Was the Platform Sūtra Always a Sūtra? Studies in the Textual Features of the Platform Scripture Manuscripts from Dūnhuáng. Chinese Manuscripts: Copies and Originals. Ed. Imre Galambos. Budapest: Eötvös Loránd University.

Anderl, Christoph. 2004. Studies in the Language of Zǔ-táng jí 祖堂集. 2 vols. Oslo: Unipub.

App, Urs. 1993. Rokuso Dankyō Ichiji Sakuin 六祖壇經一字索引. Kyoto: Hanazono daigaku kokusai zengaku kenkyūjo 花園大學國際禪學研究所.

Dèng, Wénkuān 鄧文寬 and Róng, Xīnjiāng 榮新江. 1998. Dūnbó běn Chán-jí Lùjiào 敦博本禪籍錄校. Nánjīng: Jiāngsū gǔjí chūbǎnshè 江蘇古籍出版社.

DDB. Digital Dictionary of Buddhism (general editor: Charles Muller). http://www.buddhismdict.net/ddb/.

Dīng, Zhòngyòu 丁仲祜. 2000. Liùzǔ Tánjīng 六祖壇經. Hong Kong: Xiānggǎng fójīng liútōng chù 香港佛經流通處.

Fāng, Guǎngchāng 方廣錩. 1999. Tán Dūnhuáng běn Tánjīng Biāotí de Géshì 談敦煌本壇經標題的格式. Dūnhuáng Tánjīng Héjiào Jiǎnzhù 敦煌壇經合校簡注. 139-144.

Fāng, Guǎngchāng 方廣錩. 2001. Guānyú Dūnhuáng běn Tánjīng 關於敦煌本壇經. Dūnhuáng Wénxiàn Lùnjí 壇經文獻論集. Ed. Hǎo, Chūnwén 郝春文. Shěnyáng: Liáoníng rénmín chūbǎnshè 遼寧人民出版社.

Féng, Qíyōng 馮其庸 and Dèng, Ānshēng 鄧安生. 2006. Tōngjiǎ Zìhuìshì 通假字彙釋 (An Explanation of Phonetic Loan Characters).
Běijīng: Běijīng chūbǎnshè 京出版 社 2006. (Abbreviated reference in the table: Loan 2) Galambos, Imre. (forthcoming). Punctuation Marks in Medieval Chinese Manuscripts. Forthcoming. Manuscript Cultures: Mapping the Field. Ed. Jan-Ulrich Söbisch and Jörg B. Quenzer. Berlin: de Gruyter. (Page numbers according to the draft version) G nsù cáng Dūnhuáng wénxiàn bi nwěihuì 甘肅藏敦煌文獻編 會, ed. 1999. Gānsù Cáng Dūnhuáng Wénxiàn 甘肅藏敦煌文獻. 6 vols. Lánzh u: G nsù rénmín chūbǎnshè 甘肅 人民出版社. Guó, Péng 郭朋, ed. 1981. Tánjīng Duìkān 壇經 對勘. Jìnán: Qílǔ shūdiàn 齊魯 社. Guó, Péng 郭朋, ed. 1983. Tánjīng Jiàoshì 壇經校釋. Bĕijīng: Zh nghuá shūjú 中華 局. Guó, Péng 郭朋. 1987. Tánjīng dǎo dú 壇經導讀. Chéngdū: B shŭ chūbǎnshè 巴蜀 社. Harbsmeier, Christoph. 2012. Reading the One Hundred Parables Sūtra: The Dialogue Preface and the G th Postface. Zen Rhetoric in China, Korea, and Japan. Ed. Christoph Anderl. Leiden/Boston: Brill. 163-204. Jorgensen, John. 2002. The Platform Sutra and the Corpus of Shen-hui: Recent Critical Text Editions and Studies. Revue Bibliographique de Sinologie. 399-438. Jorgensen, John. 2005. Inventing Hui-neng, the Sixth Patriarch – Hagiography and Biography in Early Ch’an. Leiden: Brill. Lǐ, Shēn 李申 and F ng, Guǎngch ng 方廣錩, eds. 1999. Dūnhuáng Tánjīng Héjiào Jiǎnzhù 敦煌壇經合校簡注. Tàiyuán: Sh nxī gǔjí chūbǎnshè 山西 籍出版社. Lǐ, Shēn 李申. 1999a. Tánjīng Bànběn Chúyì 壇經 版 芻 . Dūnhuáng Tánjīng Héjiào Jiǎnzhù 敦煌壇經合校簡注. 12-26. Lǐ, Shēn 李申. 1999b. S nbù Dūnhuáng Tánjīng Jiàoběn dú hòu 部敦煌 壇經 校 讀 後. Dūnhuáng Tánjīng Héjiào Jiǎnzhù 敦煌壇經合校簡注. 109-138. Luó, Chángpéi 羅常培. 1933. Táng-Wǔdài Xīběi Fāngyīn 唐 西 方音. Shànghǎi: Academia Sinica. Luó, Fúchéng 羅福 . 1932. Liùzǔ Dàshī Făbăo Tánjīng Cánběn Shìwén 祖大師法寶壇 經殘 釋文. Guólì Běipíng Túshūguănkān 國立 圖 館刊 4/3 (Xīxià wén zhu nhào 西夏文專號). Mair, Victor. 1989. T’ang Transformation Texts: A Study of the Buddhist Contributions to the Rise of Vernacular Fiction and Drama in China. Cambridge (Mass.) and London: Harvard University Press. 
(Harvard-Yenching Institute Monograph Series 28) McRae, John R. 1986. The Northern School and the Formation of Early Ch’an Buddhism. Honolulu: University of Hawaii Press. McRae, John R., tr. 2000. The Platform Sutra of the Sixth Patriarch. Berkeley: Numata Center for Buddhist Translation and Research. Some Reflections on the Mark-up and Analysis of Dūnhuáng Manuscripts 49 Nakagawa, Taka 中 孝. 1953. Rokuso Danky no Ihon ni Tsuite いて. Indogaku Bukkyōgaku Kenkyū 3:155-156. 祖壇經の異 に就 Nakagawa, Taka 中 孝. 1954. Danky no Shis shikiteki Kenkyū 壇經の思想史的研究. Indogaku Bukkyōgaku Kenkyū 5:281-284. Nakagawa, Taka 中 孝. 1976. Rokuso Dangy 祖壇経. Zen no Goroku 禅の語録 4. T ky : Chikuma shob 筑摩 . Nishitani, Keiji 西谷啟治 and Yanagida Seizan 柳 聖山, eds. 1972. Zenke Goroku 禪家 語錄, vol. 2. Sekai Koten Bungaku Zenshû 世界 典文學 集, no. 36B. T ky : Chikuma shob 筑摩 . P n, Zhòngguī 潘重規. 1996. Dūnhuáng běn Liùzǔ Tánjīng dú hòu Guǎnjiàn 敦煌 祖 壇經 讀後管見. Dūnhuáng Tǔlǔfān xué Yánjiū Lùnjí 敦煌吐魯 學研究論集. Shūmù wénxiàn chūbǎnshè . Shào, Róngfēn 邵榮芬. 1963. Dūnhuáng Súwénxué Zuòpĭn Zh ng de Biézì Yìwén hé Táng Wǔdài Xīběi F ngyīn 敦煌俗文學 品中的別字異文和唐 西 方音. Zhōngguó Yǔwén 中國語文 3:193-217. Shǐ, Jīnb 史金波. 1993. Xīxià wén Liùzǔ Tánjīng cán yè Yìshì 西夏文 祖壇經 殘頁 釋. Shìjiè Zōngjiào Yánjiū 世界宗教研究 3:90-100. Schlütter, Morten. 1989. A Study of the Genealogy of the Platform Sutra. Studies in Central & East Asian Religions 2:53-114. Sørensen, H. Henrik. 1989. Observations on the Characteristics of the Chinese Chan Manuscripts from Dunhuang. Studies in Central & East Asian Religions 2:115-139. Suzuki, Daisetsu (Teitar ) 鈴木大拙 貞 郎 and Kuda, Rentar 連 郎. 1934. Tonkō Shutsudo Jinne Zenji Goroku Kaisetsu Oyobi Mokuji; Tonkō Shutsudo Rokuso Dankyō Kaisetsu Oyobi Mokuji; Kōshōji bon Rokuso Dankyō Kaisetsu Oyobi Mokuji. 煌出土神會禪師語錄解說及目次; 煌出土 祖壇經解說及目次; 聖寺 祖壇經解說及目次. T ky : Morie shoten 森江 店. Suzuki, Daisetsu (Teitar ) 鈴木大拙 (貞 郎), ed. 1942. Jōshū Sōkei-zan Roku Soshi Dangyō 韶 溪山 祖師壇經. T ky : Iwanami shoten 岩波 店. Takata, Tokio 高 時雄. 1987. 
Le Dialecte Chinois de la Region du Hexi. Cahiers d’Extrême-Asie 3:93-102. Takata, Tokio. 1988. Tonkō Shiryō ni Yoru Chūgokugo shi no Kenkyū: Kyū, Jusseiki no Kasei Hōgen 敦煌資料 見中國語 史之研究: 九・十世紀の河西方 . T ky : S bunsha 創文社. Takata, Tokio. 2000. Multilingualism in Tun-huang. Acta Asiatica 78:49-70. (page number references in this paper according to a digitized draft version of the article) Tanaka, Ry shū 中良昭. 1983. Tonkō Zenshū Bunken no Kenkyū 敦煌禪宗文獻の研究. T ky : Dait shuppansha 大東出版社. 50 Chung-Hwa Buddhist Journal Volume 25 (2012) Ui, Hakuju 宇 伯壽. 1966. Zengaku shi Kenkyū 禪學史研究. 3 vols. T ky : Iwanami shoten 岩波 店. Wáng, Huī 王輝, ed. 2008. Gǔwénzì Tōngjiǎ Zìdiǎn 文字 字典 (A Dictionary of Ancient Phonetic Loan Characters). Běijīng: Zh nghuá shūjú 中華 局. (Abbreviated reference in the table: Loan 1) Xiàndài fójiào xuéshù cóngk n bi njí wěiyuánhuì 現 教學術 刊編輯 員會, ed. 1976. Liùzǔ Ttánjīng Yánjiū Lùnjí 祖壇經研究論集. Táiběi: Dàshèng wénhuà chūbǎnshè 大乘文 出版社. Yabuki, Keiki 矢吹慶輝. 1930. Meisha Yoin – Tonkō Shutsudo Miden Koitsu Butten Kaihō 鳴沙餘韻﹣敦煌出土 傳 逸 典開寶 (Rare and Unknown Chinese Manuscript Remains of Buddhist Literature Discovered in Tun-huang Collected by Sir Aurel Stein and Preserved in the British Museum). T ky : Iwanami shoten 岩波 店. Yampolsky, Philip. 1967. The Platform Sūtra of the Sixth Patriarch. New York: Columbia University Press. Yanagida, Seizan 柳 聖山. 1972. Zenseki Kaidai 禪籍解題. Nishitani and Yanagida. 445-514. Yanagida, Seizan 柳 聖山, ed. 1976. Rokuso Dangyō Shohon Shūsei 祖壇經諸 集 . Ky to: Chūmon shuppansha 中文出版社. Yáng, Zēngwén 楊曾文. 1993. Dūnhuáng Xīnběn Liùzǔ Tánjīng 敦煌新 Shànghǎi: Shànghǎi gǔjí chūbǎnshè 海 籍出版社. 祖壇經 . Yáng, Zēngwén 楊曾文. 1996. Shénhuì Héshàng Chán-huà lù 神會和尚禪話錄. Běijīng: Zh nghuá shūjú 中華 局. Yè, G ngchuò 葉恭綽 1926. Lǚshùn Gu nd ng-tíng Bówùguăn suŏ cún Dūnhuáng Chūtŭ zhī Fójiào Jīngdiăn 旅順關東廳博物館 存敦煌出土之 教經典. Túshūguăn xué Jìkān 圖 館學季刊 1/4. Zh u, Shàoliáng 周紹良. 1997. Dūnhuáng Xiěběn Tánjīng Yuánbĕn 敦煌寫 壇經原 . Běijīng: Wénwù chūbǎnshè 文物出版社. Zh u, Shàoliáng 周紹良. 1998. 
Xù èr 序 . Dūnbó běn Chán-jí Lùjiào 敦博 禪籍錄校. 1-26.
Chung-Hwa Buddhist Journal (2012, 25:51-86)
Taipei: Chung-Hwa Institute of Buddhist Studies
中華佛學學報第二十五期 頁 51-86 (民國一百零一年),臺北:中華佛學研究所
ISSN:1017-7132

Bibliographical Notes on Buddhist Temple Gazetteers, Their Prefaces and Their Relationship to the Buddhist Canon

Marcus Bingenheimer
Temple University

Abstract

This article is part of the Buddhist Temple Gazetteer Project funded by the Chung-hwa Institute of Buddhist Studies. The project resulted in the digitization of more than 230 gazetteers (zhi 志) of Chinese Buddhist sites.1 The task of compiling a high-quality digital archive involves making both academic and technological decisions, which in turn necessitate research. In order to visualize gazetteer literature in various ways according to temporal or geographic parameters, we need first to understand the provenance of the texts, which often have complex edition histories. The aim of this paper is to summarize some of the bibliographical data for the more than 230 mountain and temple gazetteers of which the archive is comprised, to compare the two available print collections, to illustrate the importance of prefaces for understanding these texts and to outline the relationship between texts on Buddhist religious sites and the Buddhist canon.

Keywords: Buddhist History, Temple Gazetteers, Chinese Temples, Digital Archive, Digitization Project

1 The project is conducted at the Dharma Drum Buddhist College and the archive is currently hosted at http://buddhistinformatics.ddbc.edu.tw/fosizhi/ (July 2009). The data presented here is largely the result of sustained team work. The catalog data was produced in 2008-2009 by Lin Zhimiao 林智妙, Ke April 柯春玉, Peng Chuanqin 彭 芩, Lin Xiuli 林綉麗 and myself. Many of the texts cited here were first examined in a reading group led by Lin Zhimiao 林智妙, whose explanations solved many difficult passages. I am grateful to Simon Wiles for improving the English, and to Peter Bol, John Kieschnick and Wu Jiang for their valuable comments.
The text also profited from the helpful suggestions made by two anonymous reviewers.

書目註記:佛寺志及 序言 之於佛教藏經的關係

馬德偉
天普大學

摘要

篇文章是中華佛學研究所佛教寺廟志計 的一部分, 計 包含超過230個中國佛教寺廟志 計 之任務在於匯編一高品質的數位 藏,而 藏涉及一必要的 研究,也就是學 技 的 斷 根據時間 地理 的參考點,為了從多方面檢視地方志文本,需要 了解文本的來源,而 些文本 常 複雜的版本 史 篇文章旨在針對 藏所包含的超過230個山岳 寺院的地方志,總集部分的 目 資料,比較 個刻版的收藏,說明 序文對於了解 些文本的重要性,並闡述存於 佛教據點的文本 藏經的關係

關鍵字:佛教歷史、寺廟志、中國寺院、數位典藏、數位化計畫

Introduction

Among the most precious sources for the study of later Chinese Buddhist history are the large number of gazetteers on Buddhist sites and institutions. Gazetteers, as Sinology has come to translate zhi 志 (or its variant 誌), are composite works compiled from texts belonging to different genres (topographic descriptions, biographies, essays, poems, epigraphy, maps, portraits etc.). The contribution of the compiler was to select, collect and arrange the texts; his own additions ranged from merely adding a preface to writing or rewriting a substantial part of the volume. The Temple Gazetteer Project, of which this paper is a part, aims at collecting and digitally editing all available temple gazetteers of Buddhist sites with the goal of making them available to a wider audience.

Filled to the brim with facts and legends about a location, the vast majority of these gazetteers were published between the 16th and the 20th centuries and offer valuable information about the history of Buddhism.2 The mature form of the gazetteer, which attempts to provide a comprehensive, cultural description of a site, was widely adopted only after the Northern Song. At that time too it became common practice to include the term zhi 志 in the title (Hargett 1996, 419). The first work on Buddhist chorography to use the term fangzhi 方志 was Daoxuan’s 道宣 Shijia fangzhi 釋迦方志 (T 2088) of 650, in which he lists places in India and Central Asia.
In the corpus of texts discussed here only a few important ‘proto-gazetteers’, such as the Luoyang qielan ji 洛陽伽藍記 (ZFSH 001) or the Tiantai shengji lu 天台勝蹟錄 (ZFSH 064), were published prior to the mid-16th century.3

As in the case of paper editions, digital editions need to carefully record the provenance of their content and describe its relationship with other texts. In the following, therefore, we will give a bibliographic overview of the corpus at hand.

2 See Hargett (1996), Hahn (1997), and Bol (2001) on the antecedents of the gazetteer genre before the Ming and for bibliographic references to more extensive discussions of the topic in Chinese. Gu (2010) is the most comprehensive reference work for Song gazetteers. For an analysis of the often mixed Buddhist and Daoist character of “sacred mountains” see Robson (2009). For a periodization and overview of gazetteer production on Buddhist sites see Cao (2011, 235-243).

3 Zhongguo fosi shizhi huikan 中國佛寺史志彙刊.

Bibliographic Research

A considerable amount of bibliographic research has been done on gazetteers in general.4 More still remains to be done. This is a rather dull but indispensable task, both because of the large quantity of gazetteers and because of the complicated edition histories many of them have. Most of the more than 8500 gazetteers that are known to us are on governmental administrative divisions such as counties (xian 縣), subprefectures (zhou 州), prefectures (fu 府) or provinces (sheng 省).5 However, as Brook (2002, 31) has remarked, there are other types of gazetteers, and he provides valuable bibliographic information on 860 “topographical and institutional gazetteers” which take landscape features and individual institutions as their subjects.
While the gazetteers on administrative divisions usually include information on Buddhist and Daoist temples and monastics for the region, this information is generally terse and cannot compare with the breadth of cultural information that gazetteers dedicated to a site and compiled or commissioned with religious intent can offer.6 Hahn (1997), in his dissertation on “mountain gazetteers” (shanzhi 山志), pays attention to the importance of gazetteer literature for the understanding of religious space; his focus, however, is exclusively on the category of mountain gazetteers, many of which treat Daoist sites. Our project, on the other hand, includes both mountain and temple gazetteers (sizhi 寺志), but we are interested only in records of Buddhist sites. Among such gazetteers two subgroups can be distinguished: gazetteers that relate information on a number of Buddhist sites and institutions within a certain region, and those concerned with only one temple and its adjacent sites. The former subsumes many of the mountain gazetteers that Hahn (1997) has described, but also includes gazetteers that describe the Buddhist sites of a city or region (e.g. ZFSH 1, ZFSH 7, ZFSH 57).

4 In English see Brook (2002), Dow (1969), and Franke (1968), and in Chinese Zhuang et al. (1985), Jin & Hu (1996), and Gu (2010), to name only a few.

5 The most comprehensive catalog so far, the Zhongguo difangzhi zongmu tiyao 中國地方志總目提要 (Jin & Hu 1996), lists 8577 gazetteers. Even this catalog, however, is not exhaustive, because it includes only gazetteers on administrative regions published before 1949. According to the editorial policy statement, “Mountain-, river-, temple-gazetteers and the like were not included” (Jin & Hu 1996, 凡例 1). This means that none of the temple gazetteers discussed here are listed.

6 Nevertheless valuable quantitative information can be culled from these sources.
Eberhard (1964), in one of the first projects that made use of computers to digitize information, analyzed temple building activity in Chinese history on the basis of the founding dates of temples as included in a significant number of entries. To my knowledge his dataset (encoded with punchcards) was never migrated to a newer format.

In the following, we will outline some basic bibliographic parameters which describe what is known about available gazetteers of Buddhist sites.7 First, the paper record: the last three decades saw the appearance of two collections of reprints of Buddhist temple and mountain gazetteers:8

Zhongguo Fosi Shizhi Huikan 中國佛寺史志彙刊. Taipei: Mingwen shuju 明文書局. 1980-1985. Compiled by Du Jiexiang 杜潔祥. (= ZFSH) 110 vols.

Zhongguo Fosizhi Congkan 中國佛寺志叢刊. Hangzhou: Guangling shushe 廣陵書社. 2006. Compiled by Zhang Zhi 張智. (= ZFC) 130 vols.

The 110 volumes of the ZFSH contain 100 gazetteers and the 130 volumes of the ZFC contain 197 gazetteers. Although the ZFC is the larger and newer collection, the ZFSH is the better edited. Its editor, Du Jiexiang, has compiled detailed and helpful tables of contents for each gazetteer, and gazetteers that appear in both collections are more often complete in the ZFSH. Most gazetteers in each collection are from Ming and Qing dynasty woodblock prints, while some are copies of manuscripts, and still others are from newer printed editions set in movable type. In order to build an electronic edition it is necessary to understand the overlap between these two collections. This has not been attempted before, because it is only now that we have the data available to answer some important questions.

How Many Buddhist Gazetteers do we Have in Hand?

Of the 100 gazetteers in the ZFSH and the 197 gazetteers of the ZFC, 78 have an overlap with a gazetteer in the other set, forming 78 gazetteer pairs, i.e.
two gazetteers, one from the ZFC and one from the ZFSH, that describe the same location and might have the same or a similar name. In 39 of these 78 pairs the gazetteers are identical, i.e. the reprints in ZFSH and ZFC were made from identical editions.9 The relationships, or rather the types of relationship, that exist between the remaining 39 pairs are more complex and can be grouped broadly into the following categories:

7 After the work on this paper was concluded, Cao (2011) published his seminal work on Buddhist temple gazetteers in the Ming dynasty. Had it been available earlier, this paper would have looked different, though its main task, to document the printed and digital gazetteer corpus of ZFSH and ZFC, would have remained the same. Unfortunately, there was no time to include all of Cao’s important results in this article.

8 In 2009 the Beijing National Library announced the planned publication of a collection named Quanguo difangzhi fodaojiao wenxian huibian 全國地方志佛道教文獻匯編. This collection will contain only excerpted passages pertaining to Buddhist and Daoist sites from more general gazetteers, much like the data Eberhard (1964) studied. It stands to become an important new resource and hopefully will enable us to follow Eberhard’s early lead in performing quantitative research on the history of religious geography.

9 Here we include re-prints (chongkan 重刊) from the same woodblocks.

- In 22 pairs the reprints were made from essentially the same work, but in one edition some content has been omitted or added. These omissions and additions are usually short, but sometimes significant. Omissions often reflect the fact that the original from which the ZFSH or ZFC reprint was taken was already incomplete. Sometimes one edition has been expanded: ZFC 63, for instance, includes two additional chapters (外 篇 卷) which are not found in its counterpart ZFSH 72.
  At other times the situation is even more complicated: in the pair ZFSH 39/ZFC 71, for instance, the first chapter of ZFSH 39 lacks pp. 151-154 and 259-260 of ZFC 71. In the second chapter, on the other hand, ZFC 71 lacks the material on pp. 267-270 of ZFSH 39.
- In two cases we have different works on the same location with a similar title.
- In five cases we find that, on top of omissions and additions, the chapter order or organization differs.
- With ten pairs the relationship is that of print and manuscript, i.e. one edition is a manuscript copy of the other.

This typology does not cover all cases, but gives a sufficient overview of the field of similarities and differences.10 For a more detailed survey of the differences between gazetteers in these sets see Appendix B.

For the temple gazetteer archive we have digitized all the gazetteers from the ZFSH and the ZFC, except those 39 in the ZFC which are completely identical with a gazetteer already found in the ZFSH and another 21 that exhibit only minimal differences, such as a few missing pages, a different set of maps etc. All in all, 237 gazetteers have been digitized, and fifteen will be made available as digital full text for the first time. These fifteen will benefit from new punctuation and XML/TEI mark-up identifying person and place names as well as dates. Of these fifteen, twelve have been selected for a follow-up project for a printed re-edition of the texts, with new punctuation, person and place name indices and annotation.11

10 In one case we have two different manuscripts (ZFC 45/ZFSH 97) of the same text. There are also a few rare instances where a gazetteer was reprinted with a different layout, i.e. not from the original woodblocks. The Hangzhou shangtianzhu jiangsi zhi 杭州上天竺講寺志 was re-carved in 1897 (ZFSH 24); the ZFC (ZFC 88) preserves an older woodblock print of 1646.
The Nanchao si kao 南朝寺考, of which the ZFC contains a 1907 woodblock print, was re-set in movable type in 1944 for inclusion in the (never completed) Puhui Canon 普慧大藏經 (ZFSH 56). In the case of the Tiantaishan fangwai zhi 天台山方外志, the ZFC includes a reprint made from the original woodblocks (ZFC 115) and the ZFSH contains a movable type edition made in Shanghai in 1922 (ZFSH 89). The ZFSH (ZFSH 46) preserves a Wanli 萬曆-era print of the Helinsi zhi 鶴林寺志, while ZFC 76 is an edition with a different layout from 1909.

How Many Gazetteers of Buddhist Sites are There?

Before taking stock of what we know, let us briefly assess what we cannot know. Exactly how many temple gazetteers have been compiled in total is impossible to know for sure. As with most of Chinese literature, many gazetteers are lost forever.12 As we will see below, descriptions of sacred sites were only recently included in the Buddhist canon. Nor did they command the same esteem as gazetteers on administrative divisions, which had a role in the administration of the realm and therefore the attention of the state apparatus. As a result, throughout the Ming and Qing neither Buddhist nor Confucian communities were strongly committed to the preservation of gazetteers of Buddhist sites.

Another reason why gazetteers were lost is that they were superseded by newer ones. Gazetteers on administrative regions needed to be updated to stay useful, and the same need was perceived for other kinds of gazetteers as well.13 The woodblocks for older editions were sometimes lost, and once the woodblocks were gone, it was often more practical to compile a new, updated gazetteer than to re-cut the woodblocks from an old paper copy.

11 The series will be published by Xinwenfeng publishers 新文豐, Taipei, starting in 2013. It will comprise ZFSH 8: Chongxiu putuoshan zhi 重修普陀山志, ZFSH 9: Putuoluojia xinzhi
普陀洛迦新志, ZFSH 10: Mingzhou ayuwangshan zhi 明州阿育王山志, ZFSH 11: Mingzhou ayuwangshan xuzhi 明州阿育王山續志, ZFSH 17: Yucenshan huiyin gaoli huayanjiaosi zhi 玉岑山慧因高麗華嚴教寺志, ZFSH 43: Hanshansi zhi 寒山寺志, ZFSH 49: Emeishan zhi 峨眉山志, ZFSH 62: Fujian quanzhou kaiyuansi zhi 福建泉州開元寺志, ZFSH 77: Jiuhuashan zhi 九華山志, ZFSH 81: Qingliangshan zhi 清涼山志, ZFSH 84: Jizushan zhi 雞足山志, ZFSH 86: Huangboshansi zhi 黃檗山寺志, ZFSH 89: Tiantaishan fangwai zhi 天台山方外志.

12 Dudbridge (2000, 8) cites an estimate from the 17th century to the effect that less than forty or fifty percent of the books that had been available in the Song survived. Hahn (1997, 17) cites estimates that only 10% of the works listed in the Jingji zhi 經籍誌 chapter of the Suishu 隋書 survived until the Qing.

13 In his postscript (dated 1589) to the first Ming edition of the Putuoshan gazetteer, Hou Jigao 侯繼高 writes: “It was no longer possible, in the end, to obtain a copy [of the previous edition] for one’s armchair travels.... Since [Sheng] Ximing [盛]熙明 wrote the [previous] gazetteer more than 230 years have passed. What is contained in the four parts [of his gazetteer] can hardly be all there is [to tell]. When it comes to our Ming, with the increasing incense fires the [literary] writings about the place also increased. Until now no one like Sheng Ximing came and turned them into a chronicle. I sighed and said: ‘These famous mountains, these great temples have to be made known to the world, they should not go without description.’” (ZFSH 9: 594).

Even the more popular gazetteers only had print-runs of a few hundred copies (Brook 2002, 38). This explains why 51 of the gazetteers in our archive have survived only as manuscripts taken from a print copy. Often the print copies perished together with their woodblocks in fires and wars, especially during the fall of the Ming (ca. 1640-1660) and during the Taiping rebellion (1850-1864). The Taiping were especially destructive in the Lower Yangzi area where, as
58 Chung-Hwa Buddhist Journal Volume 25 (2012) we will see below, most gazetteers were produced. The rebels sacked Nanjing, Hangzhou, Suzhou, and Ningbo, singling out temples and religious sites for destruction. Having acknowledged these losses, we must proceed to assess the extent of the corpus that is still available. Beyond the gazetteers digitized in this project, how many gazetteers on Buddhist sites do we know of ? How many are still available in libraries? Our database contains bibliographical references from several other works especially Hahn (1997), Brook (2002) and unpublished notes by Du Jiexiang (2009), who kindly shared this material with us. Next to the 219 distinct gazetteers from the ZFSH and the ZFC, this data yields bibliographic data on 59 additional temple gazetteers, most of which are still available in libraries, adding to a total of 278 in our database. It is unlikely that more than a few pre-Ming gazetteers on Buddhist sites have escaped the attention of bibliographers, as the overall number was so much smaller. For the Ming dynasty Cao (2011, 71-75), against a list of 87 extant temple gazetteers, gives a list of 65 “lost” gazetteers, which are mentioned in catalogs or cited in other works. Although Cao has mainly used library holdings in China, and some of the titles might eventually be found elsewhere, this means that ca. 40%. of known Buddhist gazetteers from the Ming are now lost. For the Qing, which saw the largest number of gazetteers produced in the 17th and again in the 19th century after the Taiping rebellion, the situation is less clear. Our database lists 131 existing gazetteers for the Qing (1644-1911) and 59 published during the Republican period (1912-1949), the relatively high figure for the latter reflecting both increased publication numbers for the book market in general as well 14 as for the publication of Buddhist material in particular. 
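The proportions quoted above can be checked with a few lines of arithmetic. The following is only a sketch using the figures given in the text (Cao's lists for the Ming and the database totals); the variable names are ours and do not correspond to fields in the project database.

```python
# Figures quoted in the text; variable names are illustrative only.
ming_extant = 87   # extant Ming temple gazetteers listed by Cao (2011)
ming_lost = 65     # Ming titles known only from catalogs or citations

ming_known = ming_extant + ming_lost
lost_share = ming_lost / ming_known
print(f"Ming: {ming_lost} of {ming_known} known titles lost ({lost_share:.1%})")

digitized = 219    # distinct gazetteers in the ZFSH and the ZFC
additional = 59    # further titles known from bibliographies
print(f"Titles in the database: {digitized + additional}")
```

The share of lost Ming titles comes to roughly 43%, which is consistent with the rounded "ca. 40%" given above.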
Judging from the available bibliographic information, it would be surprising if the final number of known gazetteers on Buddhist sites published before 1950 were to exceed 500, and the final tally of extant gazetteers is likely to be between 300 and 400.

[14] The latter is documented in Gregory Scott’s database Bibliography of Modern Chinese Buddhism (http://bib.buddhiststudies.net/ [Nov. 2011]), which is part of his forthcoming PhD dissertation “Conversion by the Book - Buddhist Print Culture in Republican China” (Columbia University).

How Many Locations do the Gazetteers in our Collection Describe?

Although temple gazetteers by definition tend to focus on one location, there are a number of gazetteers which describe several Buddhist sites on a mountain range or in a metropolitan area, such as the proto-gazetteer Luoyang qielan ji 洛陽伽藍記 (ZFSH 001), which describes the temples of Luoyang in the early 6th century; the Qingliangshan zhi 清涼山志 (ZFSH 081) on the temples on Mt. Wutai in the 16th century; the huge Jinling fancha zhi 金陵梵剎志 (ZFSH 006), a collection of material on the temples of Nanjing; or the Wulin fan zhi 武林梵志 (ZFSH 007), a guide to the more than four hundred temples in the Wulin hills near Hangzhou’s famous West Lake. Since one gazetteer may describe multiple sites, while a single temple may be the subject of more than one gazetteer, the analysis must be performed with care.[15] One visualization of the sites described in the one hundred gazetteers of the ZFSH references 116 temples. As the map shows, most of these are in Zhejiang and Jiangsu provinces:

Fig.1: Location of Buddhist sites described in the 100 gazetteers contained in the ZFSH.

Clearly recognizable are the centers of some “macro-regions” often used to discuss later imperial China.[16]
This correlation tells us more about the economics of publishing than about the level of Buddhist activity in the region. The production of a gazetteer required considerable resources. Moreover, once printed, a gazetteer had to be sold to an audience of interested literati, a market that was not available outside these centers. We therefore see a cluster of sites in Guangdong (Lingnan), one along the coast of Fujian (Southeast), the many sites around Ningbo, Hangzhou, Suzhou and Nanjing (Lower Yangzi), fewer in the area around Jiujiang, Wuhan and Nanchang (Middle Yangzi) and a cluster in the north around Beijing and Mt. Wutai. Interesting too are the absences of gazetteers in certain regions. In the ZFSH, which has no regional bias, there are no gazetteers describing sites in Shandong,[17] none in the vast region comprising Hunan, Guangxi, Guizhou, and eastern Sichuan (today Chongqing Municipality), and none north of Mt. Wutai.

[15] Current visualizations of the archive, which plot the referenced temples on a map, can be found on our website in KML format. At the time of writing, the visualization includes all the main sites described in the ZFSH and ZFC.

[16] See a discussion of these macro-regions during the 18th century (when many of our gazetteers were compiled) in Naquin and Rawski (1987).

[17] For the low level of Buddhist activity in Shandong, see Brook (1993, 238-240). The only gazetteer from Shandong in our archive is the Lingyan zhi 靈巖志 (ZFC 18).

Somewhat remarkable is the absence of temple gazetteers for the old heartland of Chinese Buddhism around Chang’an, in the area of today’s Xi’an in Shaanxi. There in the northwest we find various editions and continuations of Yang Xuanzhi’s 楊衒之 Luoyang qielan ji 洛陽伽藍記, and the famous Songshan shaolinsi jizhi 嵩山少林寺輯志 (ZFSH 78) of 1612, but on the whole surprisingly few gazetteers were produced in this region.
This reflects the fact that during the Ming and Qing Chinese Buddhism in the Northwest was much weaker than during its heyday in the Tang. Though there were still many temples, some of considerable antiquity, Chinese Buddhism faced cultural competition in this region from both Islam and Tibetan Buddhism. Moreover, Xi’an was not exactly a hotbed of literary activity. According to Naquin and Rawski, “the elite of the northwest played now [in the 18-19th cent.] only a minor role in national literati culture. There were few academies, and the region took a negligible part in the scholarly projects so typical of the Qing period.”[18] Gazetteer writing became popular during the Song in the lower Yangzi region. It was a product of later Chinese literati culture, the tastes and sensibilities of which were not universally accepted on the northwestern border of the empire, where Chinese, Muslims, Tibetans, Mongols and Manchus co-existed uneasily.

Even more sites could be added to the visualization above if all the temples mentioned in, e.g., the Nanchao fosi zhi 南朝佛寺志 (ZFSH 5) or the Jiangnan fancha zhi 江南梵剎志 (ZFSH 57) were included. Moreover, information on one temple can be found in several gazetteers. The site of the Jinshan si 金山寺 in Zhenjiang, for instance, is associated with at least four gazetteers (ZFSH 37, ZFSH 38, ZFSH 39, ZFSH 57).

Fig.2: Gazetteers in the ZFSH that contain descriptions of the Jinshan si temple.

As a result of the many-to-many relationship between gazetteers and the temples described in them, the archive contains descriptions of at least 400-500 temples. About 50% of these were or are located in the lower Yangzi region (Jiangsu, Zhejiang and Anhui).[19]

[18] Naquin and Rawski (1987, 192).

[19] A geo-referenced dataset of the sites described in the 234 gazetteers has been built and is available from the author on request.
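The many-to-many relationship between gazetteers and temples can be illustrated with a small sketch. This is not the schema of the project database: the structure and names below are assumptions made for illustration, and only the four Jinshan si rows are taken from the text. "Temple A" and "Temple B" are placeholders standing in for other sites covered by ZFSH 57.

```python
from collections import defaultdict

# (gazetteer, temple) pairs. Only the Jinshan si rows come from the text;
# "Temple A" and "Temple B" are placeholders, not real entries.
describes = [
    ("ZFSH 37", "Jinshan si"),
    ("ZFSH 38", "Jinshan si"),
    ("ZFSH 39", "Jinshan si"),
    ("ZFSH 57", "Jinshan si"),
    ("ZFSH 57", "Temple A"),
    ("ZFSH 57", "Temple B"),
]

# Index the same pairs in both directions.
temples_by_gazetteer = defaultdict(set)
gazetteers_by_temple = defaultdict(set)
for gazetteer, temple in describes:
    temples_by_gazetteer[gazetteer].add(temple)
    gazetteers_by_temple[temple].add(gazetteer)

# One temple, several gazetteers.
print(sorted(gazetteers_by_temple["Jinshan si"]))
# One gazetteer, several temples.
print(sorted(temples_by_gazetteer["ZFSH 57"]))
# Counting temples means counting distinct keys, not rows.
print(len(gazetteers_by_temple), "temples in", len(temples_by_gazetteer), "gazetteers")
```

Because the count of temples is taken over distinct temple names rather than over pairs, a site described in several gazetteers is counted once, which is why the number of temples cannot simply be read off the number of gazetteers.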
Prefaces

When a gazetteer for a location had become unavailable, or an update was in order, new editions were produced.[20] Usually, the prefaces or, more rarely, the postscripts of previous editions were included in later ones and from these an outline of the gazetteer’s evolution can be traced. This is true for gazetteers in general as well as for the gazetteers on Buddhist sites. Prefaces and postscripts therefore play an important role in understanding the genre itself and they are also one of the best places to look for information about the author-compilers, their motivation and the history of the compilation. This provides a useful angle for understanding how literati culture interacted with Buddhism during the Ming and Qing. Moreover, prefaces are often the only place where the voice of the compiler appears at all. As compiled works, gazetteers contain for the most part texts (e.g. biographies, poems, epigraphy) that were collected from earlier sources; the preface, postscript or, in later times, a section on ‘edition policy’ (fanli 凡例) is therefore essential for understanding the selection criteria.

With the gazetteer, as with other historiographical forms, a traditional genre was adapted to record the history of Buddhism.[21] Most of the earlier gazetteers of Buddhist sites were not compiled by Buddhist scholar-monks, but by literati scholar-officials or members of the local gentry.[22] Often Buddhist monks or lay believers commissioned a gazetteer for a site from a literati writer who was perceived as sympathetic or at least indifferent to Buddhism. The commissioned compilers were, however, rarely purely religiously motivated. Sometimes a Buddhist monk would later re-edit or rewrite the gazetteer from a more Buddhist perspective. This was especially common during the Chinese Buddhist revival of the late Qing and the Republican era.[23]
[20] The relationship between a gazetteer and its previous editions is complicated at best (see Qiu 2008 for the edition history of some temple gazetteers from Guangdong and Guangxi). Only sometimes would new editions be marked in the title as such. Generally text from the older edition would be reused in varying amounts, while it was mainly up to the (re-)compiler what to add.

[21] The genres used by Chinese Buddhist historiographers are without exception drawn from already existing precedents. Although Buddhism strongly influenced Chinese language and literature, it did not develop a distinct way of writing history. (On the use of genre in Buddhist historiography, see Bingenheimer 2009).

[22] Of the 87 Ming gazetteers listed in Cao (2011) only 21 were compiled by monks.

[23] During the 18th and 19th centuries the Qing emperors had generally favored Tibetan over Chinese Buddhism. Moreover, the Taiping rebellion (1850-1864) had destroyed much of the Buddhist infrastructure in the lower Yangzi region, the heartland of Chinese Buddhism. The founding of the Jinling Scriptural Press 金陵刻經處 in Nanjing by Yang Wenhui 楊文會 (1837-1911) in 1866 is therefore widely seen as the beginning of a new chapter for Chinese Buddhism.

In the 1938 edition of the Jiuhuashan zhi 九華山志 (ZFSH 77), for instance, the eminent monk Yinguang 印光[24] (1861-1940) compares the new edition with previous ones:

The earlier editions of this gazetteer were written by literati, who would not even dream of the Buddhist teachings. To them, to believe or to doubt the miraculous stories about [the Bodhisattva] Dizang 地藏 was all the same, and they included his biography among those of [ordinary] humans, which were placed after the chapters with literary texts and biographies of Daoist immortals [my emphasis, M.B.]. In our new edition of the gazetteer the first chapter is dedicated to the saintly traces [of Dizang’s deeds].
[…] The earlier editions gave pride of place to the temples that were established by imperial decree or had received the inscription above their gate from the court. Those temples that were built by private donations, or for which the funds were collected [by the clergy] were called hermitages, chapels, groves, or halls, and placed after the former. [...] From the Tang to our days more than a thousand years have passed. There have been many upheavals, [dynasties] rose and fell. Only a few monks might [nowadays] live in what was designated a “temple” in the past, and what was called a “hermitage” or a “chapel” now houses many. Society too has changed and no longer follows the will of a king. In this gazetteer we therefore put the large [public] conglin 叢林 monasteries, where monks from all directions gather, first. After that we include the smaller family temples [where the monks from one ordination lineage reside].[25]

Yinguang seems to have relished the freedom gained after the fall of the empire. During the Republican era it was possible for Buddhists to claim superiority for their religious sites in an unprecedented way. Liberated from the need for rhetorical tributes to the greatness of imperial power, Yinguang wryly comments on the lack of devotion the five marchmounts (wuyue 五嶽) now inspired:

When talking about Jiuhua mountain people often used to regret that it was not included in the five marchmounts where the imperial court makes offerings. Did they not know that at the marchmounts’ temples no one but the local government officials in charge make two offerings per year, one in spring and one in autumn? At Jiuhua mountain, however, devotees from all over the country offer their sincere respects, and the burning of incense and the prayers do not cease from dawn to dusk. How could the five marchmounts ever hope to compare?[26]

[24] Next to Taixu 太虛, Hongyi 弘一 and Xuyun 虛雲, Yinguang was one of the most influential monks of the Republican era. Before the Jiuhuashan zhi (1938) he had organized the re-edition of the gazetteers of the three other ‘great mountains’ of Buddhism: the Putuoshan zhi (ZFSH 9) (1924), the Qingliangshan zhi 清涼山志 (ZFSH 81) (1933), and the Emeishan zhi 峨眉山志 (ZFSH 49) (1934).

[25] ZFSH 77: 32.

That temple gazetteers were indeed compiled with the secular attitude criticized by Yinguang can be seen in the three prefaces of an edition of the Putuoshan gazetteer (Chongxiu putuoshan zhi, ZFSH 8). These were clearly not written from a Buddhist perspective and the three authors, all of them jinshi 進士 scholars writing in the early 17th century, mainly praise the emperor and the landscape, and emphasize the role secular officials played in reconstructing the site. Clerics and the Bodhisattva Guanyin 觀音, to whom Mount Putuo was dedicated, are mentioned only in passing.[27]

The literati rhetoric, which downplays both any possible religious motivation on the part of the authors and the religious context of the site, was not limited to literati authors. Consider Yuanxian’s 元賢 (1578-1657) preface to the gazetteer of the Kaiyuan temple 開元寺 in Quanzhou written in 1643, at a time when Confucian hegemony was still unchallenged. Compared to Yinguang, Yuanxian had to couch his critique of Confucian literati writing on Buddhist sites in more careful language:

The first records [about the Kaiyuan temple of Quanzhou] were composed in the Song, when Xu Lie 許列 wrote the “Biographies of Eminent Monks of the Kaiyuan Temple”. The Yuan dynasty master Mengguan 夢觀[28] accused Xu’s work of being unreliable and based on hearsay, its explanations being unfounded and labored, coarse and unrefined, not worthy of being read. Master Mengguan then wrote the “Biographies of Bodhisattvas”; his work was erudite and knowledgeable.
[…] Since then more than 300 years have passed and today’s chan 禪 practice cannot compare to that of yesteryear. In these days of decline there hardly seems anything worth reporting. Nevertheless, the ups and downs, the continuities and changes should be recorded somehow. In 1596 Master Chen “Zhizhi” 陳[29] first produced a gazetteer, but his research was superficial and people felt he did not do a very good job of it. Then in the winter of 1635-1636 some gentlemen of Wenling asked me to teach at the Kaiyuan temple. [...(Yuanxian is asked several times to write a history of the Kaiyuan temple)]. Though I do not have the ability to write a gazetteer – me being just a rustic from Nanzhou, who, not successful in studying Confucianism, gave up and studied Buddhism instead [!] – I have followed the wishes of these gentlemen. […] I have just tried to fill a gap. Someday a better writer will come and this gazetteer may be replaced.[30]

[26] ZFSH 77: 31.

[27] This attitude in the Chongxiu putuoshan zhi (ZFSH 8) of 1607 is very different from, and in fact a reaction to, the first full-fledged gazetteer of the site, which was produced by the Admiral Hou Jigao 侯繼高 and the poet Tu Long 屠隆 only some years earlier in 1589. Hou and his friends were on the Buddhist side of the Confucian-Buddhist syncretist spectrum and broadly sympathetic to Buddhism.

[28] This is probably the monk Dagui 大圭 (14th century).

[29] Otherwise unknown. Zhizhi was almost certainly a style name.

Two things should be noted here. Firstly, the overview of previous gazetteers of the site – a standard constituent of gazetteer prefaces – illustrates the change in genre: while in the Song and Yuan dynasties the history of a temple was written in the (by then well-known) form of collected biographies (zhuan 傳) (i.e. the works of Xu Lie and Mengguan), in the late Ming Yuanxian is asked to write a gazetteer.
The gazetteer as a genre continues the historiographical tradition of earlier times. Secondly, Yuanxian, in spite of his humble rhetoric, deftly disparages previous attempts by non-clerical writers to write about the Kaiyuan temple. And yet, that the monk Yuanxian, during the last days of the Ming, wrote passages like “not successful in studying Confucianism, gave up and studied Buddhism instead” 學儒不成棄而學佛 testifies to the hegemony of the Confucian discourse, from which Yinguang, three hundred years later, was newly freed. Obviously, prefaces are the first place to look for the compilers’ intentions, but their evaluation must take account of context and allow for semantic and rhetorical polyvalence.

When looking for prefaces one should bear in mind that they are not always found at the beginning of a gazetteer; sometimes they are prefixed only to certain chapters, while older prefaces might be collected in a special section somewhere within the body of the text.[31] Then again there are different types of texts called “preface” xu 序. The Huangbo gazetteer (ZFSH 86) preserves, attached as “prefaces,” two interesting endorsements of fund-raising appeals.[32] The first, titled Preface to the Fund-raising Efforts for the Reconstruction of Huangbo 重 黃 檗 募 緣 序, was written by Ye Xianggao 葉向高 (1559-1627) sometime between October 1614 and 1620.[33] Ye, who was a Fujian native, rose through the ranks to become one of the most important grand secretaries during the Ming. He was a gifted writer and starts his preface in literary fashion with a line from the Liang dynasty poet Jiang Yan 江淹 (444-505), who described the mountain scenery in his Journey to Mount Huangbo 游黃檗山: “The dazzling Luan birds glide by sunlit peaks, in shaded brooks gush dragon springs; (...) On the crimson cliffs the cries of birds, monkeys shout in clear and empty spaces.”[34]

[30] ZFSH 62: 4-8.

[31] The prefaces of previous editions of the 1607 gazetteer of Mt. Putuo, for instance, are found in Ch.4 (ZFSH 8: 312-389).

[32] Endorsements gave the monastery a quasi-legal backing to approach prospective donors and presumably were helpful in raising money from among the gentry. See Brook (1993, 196-213) for a discussion of some other examples of these fund-raising appeals.

[33] ZFSH 86: 240-242. For Ye Xianggao 葉向高 see his entry in Goodrich (1976, sub voc.).

The reference to Jiang Yan, who went to the Huangbo mountains before Buddhist activity is recorded for the area, sets the tone for a secular recommendation in support of a religious institution. Ye keeps his text largely devoid of Buddhist imagery. He recounts how the Wanli emperor, on the occasion of the death of his mother, the Empress Dowager Cisheng 慈聖 (1546-1614), donated a set of the Tripitaka to the monastery, and uses this and earlier land donations by the Hongwu 洪武 emperor as precedents to justify his own support for the fund-raiser.[35] He alludes to the fact that official support may not be taken for granted:

Some say the Buddhist teachings are sheer nonsense, are to be avoided by Confucians, and do not merit respect. [These people] do not realize that in this universe this way does exist after all, and cannot just be abolished. [Nevertheless, in spite of the example] of his majesty Emperor Gao [the founder of the Ming] himself, there are [still people] saying this. When I stayed in the capital I saw how in its vicinity everywhere there were landholdings that temples had received from emperor Gao. Huangbo [Monastery] is more than a thousand years old, and again our Emperor has given orders [to support it]. How can one not admire this?[36]

Ye supports the rebuilding of the temple, which had been destroyed by a fire in the Jiajing 嘉靖 period (1522-1566), and urges the “believers of the four directions” to assist the monks in this task.
[34] 陽岫飛鸞彩,陰 噴龍泉,鳥 丹壁 ,猿嘯清虛間. As quoted (rather freely) by Ye in ZFSH 86: 240.

[35] Huangbo, under its abbot Zhongtian Zhengyuan 中天正圓 (1537-1610), received one of only six sets that were given to various monasteries on this occasion. The late Empress Dowager had been an important supporter of Buddhism. In 1602 it was due to her influence that abbot Xinkong Mingkai 心空明開 (1568-1641) received a Tripitaka set for his Guangming monastery (Brook 1993, 241; also 206 and 262). Recently, a comprehensive study of these events, especially the promotion of Buddhism by the Empress Dowager and her son the Wanli emperor, has been completed (Zhang 2010).

[36] ZFSH 86: 241.

About two hundred years later the temple was again in dire straits and the monks approached Zhang Jinyun 張縉雲, an official posted in the area. His argument is similar to that of Ye. In his Preface to Donation Records 黃檗寺緣簿序 (c. 1823-1826) Zhang writes:

When I came to this area in 1823, I visited first Lingshi 靈石 [monastery], then Huangbo. Both temples had fields that had been appropriated by someone. I sent a messenger to make inquiries, and the people returned the fields to the temples, without charging for it. [...] [A while ago] the monks from the [neighboring] Lingshi monastery asked me to write an endorsement [for a fund-raiser]. I consented and less than one year later, the monks from Huangbo too asked me to write an endorsement to raise funds. Huangbo’s buildings are even more numerous than those of Lingshi, the repair costs are huge and the monks have no choice but to ask for help. The teachings of the two masters [Buddhism and Daoism] are not greatly admired by [us] Confucians, but I felt that as the local official I would be at fault if I would not see to the repair of the famous sites of the area that have been continued for centuries.[37]

Both Ye and Zhang are hedging here against possible criticism from conservative Confucians.
Timothy Brook, in his study of the relationship of the late-Ming gentry with Buddhism, outlines the attempt of Neo-Confucians to integrate Confucianism and Buddhism as well as the conservative backlash against this trend. The conservative reaction against members of the gentry assimilating Buddhist practices had teeth. In 1602 Li Zhi 李贄 (1527-1602), the radical champion of a synthesis of Confucian and Buddhist ideals, committed suicide in prison after being impeached for heterodoxy. Ye, who as a fellow Fujianese would have known Li personally, certainly remembered the case. Even Zhang two hundred years later probably would have known about the incident, as the indictment was widely circulated in later times.[38]

This is one of the reasons why, although both Ye and Zhang were supportive of Buddhism on other occasions as well, it is difficult to gauge the depth of their interest in Buddhism. Belonging as they did to “Neo-Confucianism’s captive audience”,[39] they had to frame their support as part of their administrative duties and put a certain rhetorical distance between themselves and their Buddhist subjects.

Gazetteers and the Canon

What is the relationship between the corpus of Buddhist temple gazetteers and the corpus of religious texts preserved in canonical editions? Catalogs and editions of the Buddhist canon existed before the gazetteer emerged as a genre. The Buddhist canon was never closed, however, and new material was included in every new edition. Although by late imperial times some of the proto-gazetteers were already several hundreds of years old,[40] neither they nor the newer temple or mountain gazetteers were included in canonical editions during the Ming and Qing.[41]

[37] ZFSH 86: 264-265.

[38] The indictment (first translated by Franke 1938, 23-24) is fiercely critical of literati families practicing Buddhism.

[39] Brook (1993, 90).

[40] The Luoyang qielan ji, written in 547 CE, even neared the 1000th anniversary of its publication.

There are no a priori reasons why temple gazetteers should not be included. The canon contains many works from secular genres such as catalogs, biographies or dictionaries. However, many of the early gazetteers were not written by monks and even the Taishō, one of the most liberal and inclusive editions, contains very few texts that were not written by monks. Another issue for the incorporation into the canon of post-Tang Chinese Buddhist literature was timing: the older a text, the more likely its inclusion. The annals of Song and Yuan Buddhist historiography, for instance, appear only infrequently in canonical editions of the Ming and Qing, and many of them are first included only in a Japanese edition, the Man[ji] zokuzōkyō 卍續藏經 supplement to the canon proper (ed. 1905-1912).

Of the approximately 230 different gazetteers in our archive fewer than ten were produced before 1600. This is in line with what Brook (1993, 64) proposed about the adoption of Buddhism among the gentry during the Ming: that the popularity of Buddhism among the gentry became visible again only in the latter half of the sixteenth century. This perception, that the time between roughly 1350 and 1550 saw less Buddhist activity in China than either before or after, is corroborated by the data Eberhard (1964, 280 and 298) has assembled from gazetteers.[42] The decline of Buddhism in terms of personnel and real estate during the early and mid-Ming was apparently due to the restrictive regulations that the Hongwu and Yongle emperors placed on Buddhist activities. It is therefore no coincidence that, modeled on gazetteers on administrative units, the first sizhi 寺志 and shanzhi 山志 gazetteers on Buddhist sites appear only after this period of relative decline.
It was, however, too late for their inclusion in the Ming canonical editions. The three official Ming editions (Hongwu nanzang 洪武南藏, Yongle nanzang 永樂南藏, Yongle beizang 永樂北藏) were all carved before 1440 and only the privately funded Jiaxing 嘉興 canon would have been late enough to accommodate the then brand-new gazetteer literature. Understandably, the editors of that edition decided that the gazetteers did not merit inclusion, as many of them were compiled by lay-men and there was no precedent for the inclusion of gazetteers. The major canonical edition of the Qing, the Long zang 龍藏 created 1733-1738, was conservative with regard to inclusion and contained fewer texts than the Jiaxing zang.[43] Only with the Taishō edition, created in the early 20th century by Japanese scholars rather than government officials or lay Buddhists, were twelve proto-gazetteers included.[44] All of them were first published before the Ming.

T 2092 (5 juan)
Title: Luoyang qielan ji 洛陽伽藍記
Content: On the temples of Luoyang in ca. 500.
Author / Editor: Yang Xuanzhi 楊衒之 (active around 547)
Date: after 534

T 2088 (2 juan)
Title: Shijia fangzhi 釋迦方志
Content: On place names of India and Central Asia related to Buddhism. Last four sections deal with the introduction and establishment of Buddhism in China.
Author / Editor: Daoxuan 道宣 (596-667)
Date: dated 650

T 2091 (1 juan)
Title: Dunhuang lu 敦煌錄[45]
Content: Dunhuang fragment S.5448 (893 characters) describing Buddhist sites in and around Dunhuang.
Author / Editor: Author unknown
Date: after 756

T 2093 (1 juan)
Title: Si ta ji 寺塔記
Content: Short descriptions of some temples in Luoyang (esp. the 大 善寺) ca. 843.
Author / Editor: Duan Chengshi 段成式 (c. 803-863)
Date: after 843

T 2094 (1 juan)
Title: Liangjing si ji 梁京寺記
Content: Short notes on nine temples in Nanjing during the first half of the 6th century.
Author / Editor: Compiler unknown
Date: after 1160[46]

T 2095 (5 juan)
Title: Lushan ji 廬山記
Content: Describes the Buddhist sites on Mt. Lu. Recording biographies of eminent monks, poems and inscriptions. First work resembling the mature gazetteer genre in containing geographic and historical information as well as belles lettres.[47]
Author / Editor: Chen Shunyu 陳舜俞 (d. 1075)
Date: 1072

T 2096 (1 juan)
Title: Tiantaishan ji 天台山記
Content: The earliest account of sacred sites around the Tiantai mountains. Written by a Daoist associated with the Shangqing 上清 school and mentioning Buddhist influence only in passing.
Author / Editor: Xu Lingfu 徐靈府 (c. 760-841)
Date: after 815

T 2097 (3 juan)
Title: Nanyue zongsheng ji 南嶽總勝集
Content: On the sites around Nanyue. Nanyue, the “Southern Marchmount” in the system of five sacred peaks, is commonly called Hengshan 衡山. The text is also included in the Daoist canon.[48]
Author / Editor: Chen Tianfu 陳田夫; preface by Sun Xingyan 孫星衍
Date: 1131-1163; preface dated: 嘉慶六 六 朔日 (1801-07-11 CE)

T 2098-2100: Three short proto-gazetteers on the sacred sites at Mt. Wutai 五台.

T 2098 (2 juan)
Title: Gu Qingliang zhuan 古清涼傳
Author / Editor: Huixiang 慧祥; preface by Guangying 廣英
Date: after 680;[49] preface dated: 大定辛丑歲 十 七日 (1181-03-04 CE)

T 2099 (3 juan)
Title: Guang Qingliang zhuan 廣清涼傳
Author / Editor: Que Jichuan 郄濟川
Date: dated: 嘉祐紀號龍集庚子 (1060-02-05 to 1060-03-04 CE)

T 2100 (2 juan)
Title: Xu Qingliang zhuan 續清涼傳
Author / Editor: Zhang Shangying 張商英
Date: dated: 大定四 九 十七日 (1164-10-04 CE)

T 2101 (1 juan)
Title: Butuoluojiashan zhuan 補陀洛迦山傳
Content: Contains descriptions and scriptural sources for the Guanyin cult at Mt. Putuo.
Author / Editor: Sheng Ximing 盛熙明 (1323-1363)
Date: 1349-1359[50]

[41] The Ming especially saw the production of several canonical editions, both by the court and at private hands. The Jiaxing zang 嘉興藏 in particular added many scriptures that were produced during the Song, Yuan and early Ming dynasties and were being included in a canonical edition for the first time.

[42] On the decline of Buddhism in the middle period of the Ming see Yü (1998, 918). See also the dissertation of Zhang Dewei (2010), who clearly traces the impact of the imperial support for Buddhism by the Wanli emperor and his mother.

[43] Including all supplements the Jiaxing zang contains 2090 texts and the Long zang only 1669.

[44] I use the term proto-gazetteers for the chorographical works that do not yet have the size, the self-awareness and the attitude of later gazetteers, but already exhibit the combined interest in history, literary and topographical description that is found in the mature form. Proto-gazetteers generally do not yet use zhi 志 in the title, but ji 記 or zhuan 傳.

[45] Translated by L. Giles (Giles 1914, Giles 1915, cf. Hu 1915).

[46] See Suwa (1977, 91) for the complicated history of the short text, which was compiled from several earlier sources.

[47] As a student of Ouyang Xiu 歐陽脩, the author Chen Shunyu was well versed in historiography. The Lushan ji has been studied by Reiter (1978 & 1980). Lushan is one of the sites which have a large number of gazetteers: next to T 2095, there are ZFSH 75 and ZFC 28, 29, and 118, which remain unstudied.

[48] Robson makes ample use of this text and translated parts of the preface (2009, 2-3).

[49] Cao (1999, 195).

[50] Based on the remark by Hou Jigao in June 1589 to the effect that Sheng’s work predates the gazetteer commissioned by Hou by “more than 230 years” (ZFSH 8: 334).

The editors of the Taishō made the decision to include only topographical descriptions that were written before the Ming. This was innovative as most of these texts had not been part of canonical editions before. Although the Taishō has never been superseded as the authoritative edition, there have been a number of attempts to re-edit or supplement the canon. Some of these editions also noticed the value of topographical literature. The uncompleted Puhui Canon 普慧大藏經 of 1944 (an unsuccessful, and in the end aborted, attempt to create a new Chinese Buddhist canon in the 1930s and 40s) does include the Nanchao si kao 南朝寺考.[51]
The Dazangjing bubian 大藏經補編 (Lan 1984), a little-known recent supplement to the canon, shows a growing concern with topographical sources and includes for the first time works such as the Jinling fancha zhi (ZFSH 6) and the Wulin fan zhi (ZFSH 7). As the editing principles for canonical editions of Chinese Buddhist texts grow more comprehensive, it is likely that the trend to include topographic descriptions will continue and that the (digital) Buddhist canons of the 21st century will eventually include gazetteers.

Gazetteers of Buddhist sites are valuable sources for researchers trying to understand the actual practice of Buddhism in a certain place at a certain time. We hope that the digital archive of Chinese temple gazetteers will make these sources more accessible to all.

51 The Nanchao si kao (ZFSH 56) is not a gazetteer in the narrow sense, but a Qing dynasty attempt to gather or reconstruct information on temples during the Southern dynasties. For an analysis of the information concerning the temples constructed in the Liang dynasty see Suwa (1980).

Appendix A

These are the tables of contents for the ZFC and the ZFSH. To my knowledge these lists are not available elsewhere, not even within the collections themselves. They constitute a finding aid by indexing the location of each gazetteer in the collections. The tables also cross-reference the collections: “=” indicates that the gazetteer in the other collection is for all practical purposes identical; “~” indicates that the other collection contains another, similar edition of this gazetteer. In the latter case the user might want to consult Appendix B for more information. An alphabetical listing of the titles according to the pinyin romanization is available on the web [52].
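The cross-reference notation just described lends itself to mechanical checking. The following is a minimal Python sketch, not part of the original finding aid, that parses a single list entry into a record and extracts the “=” or “~” relation. The regular expression, the `Entry` type and the function name `parse_entry` are assumptions made for illustration, and the sketch presumes that each entry has been placed on its own line in the layout `ZFC 009 (Vol.007): title ( = ZFSH 004)`.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    collection: str             # "ZFC" or "ZFSH"
    number: int                 # running number within the collection
    location: str               # volume information, e.g. "Vol.007" or "Part1 Vol.01"
    title: str                  # romanized title followed by the Chinese title
    relation: Optional[str]     # "=" (practically identical), "~" (similar edition), or None
    counterpart: Optional[str]  # e.g. "ZFSH 004"; None if there is no cross-reference


# Hypothetical pattern for one entry; the optional tail captures "( = ZFSH 004)".
ENTRY_RE = re.compile(
    r"^(ZFC|ZFSH)\s+(\d{3})\s+\(([^)]*)\):\s+(.*?)"
    r"(?:\s*\(\s*([=~])\s*(ZFC|ZFSH)\s+(\d{3})[^)]*\))?\s*$"
)


def parse_entry(line: str) -> Entry:
    """Split one finding-aid entry into its fields."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a recognizable entry: {line!r}")
    coll, num, loc, title, rel, other_coll, other_num = match.groups()
    counterpart = f"{other_coll} {other_num}" if rel else None
    return Entry(coll, int(num), loc, title.strip(), rel, counterpart)


if __name__ == "__main__":
    samples = [
        "ZFC 009 (Vol.007): Luo yang qie lan ji he jiao ben 洛陽伽藍記合校本 ( = ZFSH 004)",
        "ZFC 015 (Vol.009): Qing liang shan zhi 清涼山志 ( ~ ZFSH 081)",
        "ZFC 008 (Vol.006): Shao lin si zhi 少林寺志",
    ]
    for sample in samples:
        print(parse_entry(sample))
```

Records of this kind could be used, for instance, to confirm that every “=” or “~” reference in one collection is mirrored by an entry in the other; that check is left out here to keep the sketch short.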
ZFC: Zhongguo Fosizhi Congkan 中國佛寺志叢刊: ZFC 001 (Vol.001): Huang ming si guan zhi 皇明寺觀志 ZFC 002 (Vol.001): Bei ping miao yu tong jian 廟宇 檢 ZFC 003 (Vol.002): Bei jing miao yu zheng cun lu 廟宇征存錄 ZFC 004 (Vol.002): Shang fang shan zhi 方山志 ( ~ ZFSH 099) ZFC 005 (Vol.003): Fa yuan si zhi gao 法源寺志稿 ZFC 006 (Vol.003): Tan zhe shan xiu yun si zhi 潭柘山岫雲寺志 ( ~ ZFSH 047) ZFC 007 (Vol.004-005): Pan shan zhi 盤山志 ( ~ ZFSH 080) ZFC 008 (Vol.006): Shao lin si zhi 少林寺志 ZFC 009 (Vol.007): Luo yang qie lan ji he jiao ben 洛陽伽藍記合校本 ( = ZFSH 004) ZFC 010 (Vol.007): Luo yang qie lan ji gou chen 洛陽伽藍記鉤沉 ( = ZFSH 003) ZFC 011 (Vol.008): Luo yang long men zhi 洛陽龍門志 ZFC 012 (Vol.008): Long men zhi xu zuan 龍門志續纂 ZFC 013 (Vol.008): Mai ji shan shi ku zhi 麥積山石窟志 ZFC 014 (Vol.008): Da tong wu zhou shi ku si ji 大同武 石窟寺記 ZFC 015 (Vol.009): Qing liang shan zhi 清涼山志 ( ~ ZFSH 081) ZFC 016 (Vol.009): Bi shan xiao zhi 碧山小志 ZFC 017 (Vol.009): Qi yan shan zhi 七岩山志 ZFC 018 (Vol.010): Ling yan zhi 靈岩志 ZFC 019 (Vol.010): Zi peng shan zhi 紫蓬山志 ZFC 020 (Vol.011): Lang ye shan zhi 瑯琊山志 ZFC 021 (Vol.012): Ye fu shan zhi 冶父山志 ZFC 022 (Vol.012): Yun ling zhi 雲嶺志 ZFC 023 (Vol.013): Huang shan cui wei si zhi 黃山翠微寺志 ZFC 024 (Vol.013): Jiu hua shan zhi 九華山志 ( ~ ZFSH 077) ZFC 025 (Vol.014-015): Yu quan si zhi 玉泉寺志 ( ~ ZFSH 096) ZFC 026 (Vol.015): Hong shan bao tong chan si zhi 洪山寶 禪寺志 ( ~ ZFSH 095) 52 http://buddhistinformatics.ddbc.edu.tw/fosizhi/ (August 2011). 
72 Chung-Hwa Buddhist Journal Volume 25 (2012) ZFC 027 (Vol.016): Lian feng zhi 蓮 志 ZFC 028 (Vol.016): Lu shan gui zong si zhi 廬山 宗寺志 ZFC 029 (Vol.017): Lu shan xiu feng si zhi 廬山秀 寺志 ZFC 030 (Vol.018-019): Qing yuan zhi lue 青原志略 ( ~ ZFSH 094) ZFC 031 (Vol.020): E hu feng ding zhi 鵝湖 志 ZFC 032 (Vol.020): Hui li si zhi 慧力寺志 ZFC 033 (Vol.021): Yun ju shan zhi 雲居山志 ( ~ ZFSH 074) ZFC 034 (Vol.022-025): Jin ling fan cha zhi 金陵梵剎志 ( = ZFSH 006) ZFC 035 (Vol.026): Zhe yi fan cha zhi 折疑梵剎志 ZFC 036 (Vol.027): Jin ling da bao en si ta zhi 金陵大報恩寺塔志 ( = ZFSH 068) ZFC 037 (Vol.027): Nan chao si kao 南朝寺考 ( ~ ZFSH 056) ZFC 038 (Vol.028): Nan chao fo si zhi 南朝佛寺志 ( = ZFSH 005) ZFC 039 (Vol.028): Xian hua yan zhi 獻花岩志 ( ~ ZFSH 070) ZFC 040 (Vol.029): Ling gu chan lin zhi 靈谷禪林志 ( ~ ZFSH 067) ZFC 041 (Vol.030): Niu shou shan zhi 牛首山志 ( ~ ZFSH 069) ZFC 042 (Vol.030-031): She shan zhi 攝山志 ( ~ ZFSH 034) ZFC 043 (Vol.031): Qi xia xiao zhi 栖霞小志 ZFC 044 (Vol.031): Wei mo si zhi 維摩寺志 ZFC 045 (Vol.032-038): Wu du fa cheng 吳都法乘 ( ~ ZFSH 097) ZFC 046 (Vol.039): Cang hai si zhi 藏海寺志 ZFC 047 (Vol.039): Chang shu xing fu si zhi 常熟 福寺志 ( ~ ZFSH 036) ZFC 048 (Vol.039): San feng qing liang chan si zhi 清涼禪寺志 ZFC 049 (Vol.040-041): San feng qing liang si zhi 清涼寺志 ZFC 050 (Vol.041): Su zhou fu bao en si zhi 府報恩寺志 ZFC 051 (Vol.041): Kai yuan si zhi 開元寺志 ZFC 052 (Vol.042): Han shan si zhi 寒山寺志 ( = ZFSH 043) ZFC 053 (Vol.042): Han shan zi shi ji 寒山子詩集 ZFC 054 (Vol.042): Han shan si han tong fo xiang ti yong 寒山寺漢銅佛像題詠 ZFC 055 (Vol.042): Han shan si xiao zhi 寒山寺小志 ZFC 056 (Vol.043): Yao feng shan zhi 堯 山志 ( ~ ZFSH 066) ZFC 057 (Vol.043): Feng huang shan yong qing si zhi 凰山永慶寺志 ZFC 058 (Vol.043): Zhu tang si zhi 堂寺志 ZFC 059 (Vol.043): Zhu tang si zhi bu 堂寺志補 ZFC 060 (Vol.044-045): Deng wei shan sheng en si zhi 鄧尉山聖恩寺志 ( = ZFSH 042) ZFC 061 (Vol.045): Wu jin tian ning si zhi 武進 寧寺志 ( ~ ZFSH 035) ZFC 062 (Vol.046): Ling yan shan zhi 靈岩山志 ZFC 063 (Vol.046): Ling yan ji lue 
靈岩紀略 ( ~ ZFSH 072) ZFC 064 (Vol.047): Ling yan zhi lue 靈岩志略 ( = ZFSH 073) ZFC 065 (Vol.047): Ling yan xiao zhi 靈岩小志 ZFC 066 (Vol.047): Wu xi nan chan si zhi 無錫南禪寺志 ZFC 067 (Vol.047): Ren cao an zhi 忍草庵志 ( = ZFSH 098) ZFC 068 (Vol.047): Guan hua cong lu 貫華叢錄 ZFC 069 (Vol.047): Fu hui shuang xiu an xiao ji 福慧雙修庵小記 Buddhist Temple Gazetteers, Their Prefaces and Their Relationship to the Buddhist Canon 73 ZFC 070 (Vol.048): Jin shan zhi 金山志 ( = ZFSH 038) ZFC 071 (Vol.049): Xu jin shan zhi 續金山志 ( ~ ZFSH 039) ZFC 072 (Vol.049-050): Jin shan long you chan si zhi 金山龍游禪寺志 ( = ZFSH 037) ZFC 073 (Vol.050): Jin shan jiang tian si xiao zhi 金山江 寺小志 ZFC 074 (Vol.051): Jing kou jia shan zhu lin si zhi 口夾山 林寺志 ZFC 075 (Vol.051-052): Zhao yin shan zhi 招隱山志 ZFC 076 (Vol.052): He lin si zhi 鶴林寺志 ( ~ ZFSH 046) ZFC 077 (Vol.053-054): Bao hua shan zhi 寶華山志 ( = ZFSH 041) ZFC 078 (Vol.054): Jian long si zhi lue 建隆寺志略 ZFC 079 (Vol.055): Ping shan tang tu zhi 山堂圖志 ( = ZFSH 040) ZFC 080 (Vol.056): Yuan jin chan yuan xiao zhi 圓津禪院小志 ZFC 081 (Vol.056): Hui yin si zhi 慧因寺志 ( = ZFSH 017) ZFC 082 (Vol.057-058): Wu lin fan zhi 武林梵志 ( ~ ZFSH 007) ZFC 083 (Vol.059): Long jing jian wen lu 龍 見聞錄 ( = ZFSH 020) ZFC 084 (Vol.060): Ling yin si zhi 靈隱寺志 ( = ZFSH 021) ZFC 085 (Vol.061): Yun lin si zhi 雲林寺志 ( = ZFSH 022) ZFC 086 (Vol.062): Yun lin si xu zhi 雲林寺續志 ( = ZFSH 023) ZFC 087 (Vol.063-066): Jing ci si zhi 凈慈寺志 ( ~ ZFSH 016) ZFC 088 (Vol.067): Shang tian zhu shan zhi 山志 ( ~ ZFSH 024) ZFC 089 (Vol.068): Fa jing si zhi 法凈寺志 ZFC 090 (Vol.068): Lin ping an yin si zhi 臨 安隱寺志 ZFC 091 (Vol.068): Chong fu si zhi 崇福寺志 ( = ZFSH 030) ZFC 092 (Vol.068): Xu chong fu si zhi 續崇福寺志 ( = ZFSH 031) ZFC 093 (Vol.069): Xi xi fan yin zhi 西 梵隱志 ( = ZFSH 029) ZFC 094 (Vol.069): Xi xi qiu xue an zhi 西 秋雪庵志 ZFC 095 (Vol.070): Lian ju an zhi 蓮居庵志 ZFC 096 (Vol.070): Xiao ci an ji 孝慈庵集 ZFC 097 (Vol.070): Bian li yuan zhi 辯利院志 ( = ZFSH 092) ZFC 098 (Vol.071): Da zhao qing lu si zhi 大昭慶律寺志 ( = 
ZFSH 015) ZFC 099 (Vol.071): Zhao xian si lue ji 招賢寺略記 ZFC 100 (Vol.072): Hu pao fo zu cang dian zhi 虎跑佛祖藏殿志 ZFC 101 (Vol.072): Sheng guo si zhi 聖果寺志 ( = ZFSH 018) ZFC 102 (Vol.073): Long xing xiang fu jie tan si zhi 龍 祥符戒壇寺志 ( = ZFSH 028) ZFC 103 (Vol.074): Yun ju sheng shui si zhi 雲居聖水寺志 ( = ZFSH 025) ZFC 104 (Vol.074): Sheng yin jie dai si zhi 聖因接待寺志 ( ~ ZFSH 088) ZFC 105 (Vol.075): Yun qi zhi 雲栖志 ZFC 106 (Vol.076): Yun qi ji shi 雲栖紀 ( = ZFSH 027) ZFC 107 (Vol.076): Guang shou hui yun si zhi 廣壽慧雲寺志 ZFC 108 (Vol.077): Li an si zhi 理安寺志 ( = ZFSH 019) ZFC 109 (Vol.078): Jing shan ji 山集 ZFC 110 (Vol.078): Yun he xian da qing si zhi 雲和縣大慶寺志 ZFC 111 (Vol.078): Cheng shan cheng xin si zhi 偁山偁心寺志 ZFC 112 (Vol.079): Jin su si zhi 金粟寺志 74 Chung-Hwa Buddhist Journal Volume 25 (2012) ZFC 113 (Vol.079): Yun men zhi lue 雲門志略 ZFC 114 (Vol.080): Yun men xian sheng si zhi 雲門顯聖寺志 ZFC 115 (Vol.081): Tian tai shan fang wai zhi 台山方外志 ( ~ ZFSH 089) ZFC 116 (Vol.082): Pu tuo luo jia xin zhi 洛迦新志 ( = ZFSH 009) ZFC 117 (Vol.083): Bao guo si zhi 國寺志 ZFC 118 (Vol.083): Lu shan si zhi 山寺志 ZFC 119 (Vol.083): Wu lei si zhi 磊寺志 ZFC 120 (Vol.083): Xian jue si zhi lue 覺寺志略 ZFC 121 (Vol.083): Chan yue si zhi 禪悅寺志 ZFC 122 (Vol.084): San mao pu an si zhi 茅 安寺志 ZFC 123 (Vol.084-085): Tian tong si zhi 童寺志 ( = ZFSH 012) ZFC 124 (Vol.086): Tian tong si xu zhi 童寺續志 ZFC 125 (Vol.087-088): Xue dou si zhi 雪竇寺志 ZFC 126 (Vol.088): Xue dou si zhi lue 雪竇寺志略 ( ~ ZFSH 091) ZFC 127 (Vol.088): Xue dou xiao zhi 雪竇小志 ZFC 128 (Vol.089-090): A yu wang shan si zhi 育王山寺志 ( ~ ZFSH 010 #g011) ZFC 129 (Vol.091): Qi ta si zhi 七塔寺志 ( ~ ZFSH 013) ZFC 130 (Vol.091): Yong shan he bai que si zhi 甬山和 寺志 ZFC 131 (Vol.091): Yue lin si zhi 岳林寺志 ( ~ ZFSH 014) ZFC 132 (Vol.092): Jiang xin zhi 江心志 ZFC 133 (Vol.093): Xian yan si zhi 仙岩寺志 ZFC 134 (Vol.094): Xian yan shan zhi 仙岩山志 ZFC 135 (Vol.095): Xi tian mu zu shan zhi 西 目祖山志 ( ~ ZFSH 033) ZFC 136 (Vol.096): Dong tian mu zhao ming chan si zhi 東 
目昭明禪寺志 ZFC 137 (Vol.096): Bei tian mu ling feng si zhi 目靈 寺志 ZFC 138 (Vol.097-098): Gu shan zhi 鼓山志 ( ~ ZFSH 053) ZFC 139 (Vol.099): Xu xiu gu shan zhi gao 續修鼓山志稿 ZFC 140 (Vol.099): He shan ji le si zhi 鶴山極樂寺志 ZFC 141 (Vol.100): Xi chan chang qing si zhi 西禪長慶寺志 ZFC 142 (Vol.100): Xi chan xiao ji 西禪小記 ZFC 143 (Vol.100): Nan shan lue ji 南山略紀 ZFC 144 (Vol.101): An xi qing shui yan zhi 安 清水岩志 ZFC 145 (Vol.102): Huang bo shan si zhi 黃檗山寺志 ( ~ ZFSH 086) ZFC 146 (Vol.103): Xue feng zhi 雪 志 ( = ZFSH 061) ZFC 147 (Vol.103): Jiu feng zhi 九 志 ZFC 148 (Vol.104): Ling shi si zhi 靈石寺志 ZFC 149 (Vol.104): Long hua si zhi 龍華寺志 ZFC 150 (Vol.104): Sha jing long quan si zhi 沙 龍泉寺志 ZFC 151 (Vol.105): Xia men nan pu tuo si zhi 廈門南 寺志 ( = ZFSH 063) ZFC 152 (Vol.105): Zhi ti si zhi 支提寺志 ZFC 153 (Vol.106): Wen ling kai yuan si zhi 陵開元寺志 ( = ZFSH 062) ZFC 154 (Vol.106): Pu tian guang hua si zhi 莆田廣 寺志 ZFC 155 (Vol.106): Ling guang bei chan shi ji he ke 靈 禪 跡合刻 Buddhist Temple Gazetteers, Their Prefaces and Their Relationship to the Buddhist Canon 75 ZFC 156 (Vol.107~108): Dan xia shan zhi 丹霞山志 ZFC 157 (Vol.108~109): Yu xia shan zhi 禺峽山志 ZFC 158 (Vol.110): Ding hu shan qing yun si zhi 鼎湖山慶雲寺志 ( = ZFSH 051) ZFC 159 (Vol.111~112): Cao xi tong zhi 志 ( ~ ZFSH 058) ZFC 160 (Vol.113): Guang xiao si zhi 孝寺志 ( ~ ZFSH 085) ZFC 161 (Vol.113): Qi xia si zhi 栖霞寺志 ZFC 162 (Vol.114): Xiang shan zhi 湘山志 ZFC 163 (Vol.115~116): Ji zu shan zhi 雞足山志 ( ~ ZFSH 084) ZFC 164 (Vol.117): Ji zu shan zhi bu 雞足山志補 ZFC 165 (Vol.117): E mei shan zhi 峨嵋山志 ( ~ ZFSH 049) ZFC 166 (Vol.118): Jin yun shan zhi 縉雲山志 ZFC 167 (Vol.118): Hua yan bei zhi 華岩備志 ZFC 168 (Vol.118): Hua yan si xu zhi 華岩寺續志 ZFC 169 (Vol.118): Shi lin ji jing 石林即 ZFC 170 (Vol.119): Hua yin shan zhi 華銀山志 ZFC 171 (Vol.120): Chong xiu zhao jue si zhi 重修昭覺寺志 ( = ZFSH 087) ZFC 172 (Vol.121): Guang ji si xin zhi 廣濟寺新志 ( = ZFSH 048) ZFC 173 (Vol.121): Xian shou shan zhi 賢首山志 ZFC 174 (Vol.121): Pu du si ling qiu zhi 渡寺靈湫志 ZFC 
175 (Vol.121): Da xing shan si ji lue 大 善寺紀略 ZFC 176 (Vol.121): Qu jiang ci en si jin xi zhuang kuang ji 曲江慈恩寺今昔狀況記 ZFC 177 (Vol.122): Xin ban e shan tu zhi 新版峨山圖志 ( ~ ZFSH 050) ZFC 178 (Vol.123): Chong xiu ma ji shan zhi 重修馬跡山志 ZFC 179 (Vol.124): Lu quan si zhi 鹿泉寺志 ( = ZFSH 044) ZFC 180 (Vol.124): Huang mei lao si zhong shan zhi 黃梅老寺中山志 ZFC 181 (Vol.124): Zhu ming si chong xiu ji 珠明寺重修記 ZFC 182 (Vol.124): Cui shan si zhi 翠山寺志 ( = ZFSH 093) ZFC 183 (Vol.124): Qing hua guang li chan si zhi 清 廣利禪寺志 ZFC 184 (Vol.125): Bao yan si zhi 寶嚴寺志 ZFC 185 (Vol.125): Liu ting an zhi 柳 庵志 ZFC 186 (Vol.126): Hu pao quan ding hui si zhi 虎跑泉定慧寺志 ( ~ ZFSH 026) ZFC 187 (Vol.126): Ji shi ta yuan zhi 濟師塔院志 ZFC 188 (Vol.127): Hu yin chan yuan ji shi 湖隱禪院記 ZFC 189 (Vol.127): Chang shui ta yuan ji 長水塔院紀 ZFC 190 (Vol.127): Bao qing si zhi 慶寺志 ZFC 191 (Vol.127): Ming en si zhi 明恩寺志 ZFC 192 (Vol.128): Shou feng xian jue si zhi lue 壽 覺寺志略 ZFC 193 (Vol.128): Jin e si zhi 金峨寺志 ZFC 194 (Vol.129): You xi bie zhi 幽 志 ( = ZFSH 090) ZFC 195 (Vol.130): Shang hai ming xin si zhi 海明心寺志 ZFC 196 (Vol.130): Ming xin si zhi 明心寺志 ZFC 197 (Vol.130): Long hua zhi 龍華志 76 Chung-Hwa Buddhist Journal Volume 25 (2012) ZFSH: Zhongguo Fosi Shizhi Huikan 中國佛寺史志彙刊 ZFSH 001 (Part1 Vol.01): Luo yang qie lan ji 洛陽伽藍記 ZFSH 002 (Part1 Vol.01): Luo yang qie lan ji ji zheng 洛陽伽藍記集證 ZFSH 003 (Part1 Vol.01): Luo yang qie lan ji gou chen 洛陽伽藍記鉤沉 ( = ZFC 010) ZF SH 004 (Part1 Vol.01): Luo yang qie lan ji he jiao ben 洛陽伽藍記合校本 ( = ZFC 009) ZFSH 005 (Part1 Vol.02): Nan chao fo si zhi 南朝佛寺志 ( = ZFC 038) ZFSH 006 (Part1 Vol.03-06): Jin ling fan cha zhi 金陵梵剎志 ( = ZFC 034) ZFSH 007 (Part1 Vol.07-08): Wu lin fan zhi 武林梵志 ( ~ ZFC 082) ZFSH 008 (Part1 Vol.09): Chong xiu pu tuo shan zhi 重修 山志 ZFSH 009 (Part1 Vol.10): Pu tuo luo jia xin zhi 洛迦新志 ( = ZFC 116) ZFSH 010 (Part1 Vol.11): Ming zhou a yu wang shan zhi 明 育王山志 ( ~ ZFC 128) ZFSH 011 (Part1 Vol.12): Ming zhou a yu wang shan xu zhi 明 育王山續志 ( ~ ZFC 128) 
ZFSH 012 (Part1 Vol.13-14): Tian tong si zhi 童寺志 ( = ZFC 123) ZFSH 013 (Part1 Vol.15): Qi ta si zhi 七塔寺志 ( ~ ZFC 129) ZFSH 014 (Part1 Vol.15): Ming zhou yue lin si zhi 明 岳林寺志 ( ~ ZFC 131) ZFSH 015 (Part1 Vol.16): Da zhao qing lu si zhi 大昭慶律寺志 ( = ZFC 098) ZFSH 016 (Part1 Vol.17-19): Jing ci si zhi 淨慈寺志 ( ~ ZFC 087) ZFSH 017 (Part1 Vol.20): Yu cen shan hui yin gao li hua yan jiao si zhi 玉岑山慧因高麗華嚴教 寺志 ( = ZFC 081) Z FSH 018 (Part1 Vol.20): Feng huang shan sheng guo si zhi 凰山聖果寺志 ( = ZFC 101) ZFSH 019 (Part1 Vol.21): Wu lin li an si zhi 武林理安寺志 ( = ZFC 108) ZFSH 020 (Part1 Vol.12): Long jing jian wen lu 龍 見聞錄 ( = ZFC 083) ZFSH 021 (Part1 Vol.23): Wu lin ling yin si zhi 武林靈隱寺志 ( = ZFC 084) ZFSH 022 (Part1 Vol.24): Zeng xiu yun lin si zhi 增修雲林寺志 ( = ZFC 085) ZFSH 023 (Part1 Vol.25): Yun lin si xu zhi 雲林寺續志 ( = ZFC 086) Z FSH 024 (Part1 Vol.26): Hang zhou shang tian zhu jiang si zhi 杭 講寺志~ ZFC 088) ZFSH 025 (Part1 Vol.27): Yun ju sheng shui si zhi 雲居聖水寺志 ( = ZFC 103) ZFSH 026 (Part1 Vol.28): Hu pao ding hui si zhi 虎跑定慧寺志 ( ~ ZFC 186) ZFSH 027 (Part1 Vol.28): Yun qi ji shi 雲棲紀 ( = ZFC 106) Z FSH 028 (Part1 Vol.29): Long xing xiang fu jie tan si zhi 龍 祥符戒壇寺志 ( = ZFC 102) ZFSH 029 (Part1 Vol.30): Xi xi fan yin zhi 西谿梵隱志 ( = ZFC 093) ZFSH 030 (Part1 Vol.30): Chong fu si zhi 崇福寺志 ( = ZFC 091) ZFSH 031 (Part1 Vol.30): Xu chong fu si zhi 續崇福寺志 ( = ZFC 092) ZFSH 032 (Part1 Vol.31-32): Jing shan zhi 山志 ZFSH 033 (Part1 Vol.33): Xi tian mu zu shan zhi 西 目祖山志 ( ~ ZFC 135) ZFSH 034 (Part1 Vol.34): She shan zhi 攝山志 ( ~ ZFC 042) ZFSH 035 (Part1 Vol.35): Wu jin tian ning si zhi 武進 寧寺志 ( ~ ZFC 061) ZFSH 036 (Part1 Vol.35): Po shan xing fu si zhi 破山 福寺志 ( ~ ZFC 047) ZFSH 037 (Part1 Vol.36-37): Jin shan long you chan si zhi lue 金山龍游禪寺志略 ( = ZFC 072) ZFSH 038 (Part1 Vol.38-39): Jin shan zhi 金山志 ( = ZFC 070) ZFSH 039 (Part1 Vol.39): Xu jin shan zhi 續金山志 ( ~ ZFC 071) ZFSH 040 (Part1 Vol.40): Ping shan tang tu zhi 山堂圖志 ( = ZFC 079) ZFSH 041 (Part1 Vol.41): Bao hua 
shan zhi 寶華山志 ( = ZFC 077) Buddhist Temple Gazetteers, Their Prefaces and Their Relationship to the Buddhist Canon 77 ZFSH 042 (Part1 Vol.42): Deng wei shan sheng en si zhi 鄧尉山聖恩寺志 ( = ZFC 060) ZFSH 043 (Part1 Vol.43): Han shan si zhi 寒山寺志 ( = ZFC 052) ZFSH 044 (Part1 Vol.43): Lu quan si zhi 鹿泉寺志 ( = ZFC 179) ZFSH 045 (Part1 Vol.43): He lin si zhi (jing kou san shan quan zhi) 鶴林寺志( 口 山 志) ZFSH 046 (Part1 Vol.43): He lin si zhi (shi ming xian ben ) 鶴林寺志(釋明賢本) ( ~ ZFC 076) ZFSH 047 (Part1 Vol.44): Tan zhe shan xiu yun si zhi 潭柘山岫雲寺志 ( ~ ZFC 006) ZFSH 048 (Part1 Vol.44): Hong ci guang ji si xin zhi 弘慈廣濟寺新志 ( = ZFC 172) ZFSH 049 (Part1 Vol.45): E mei shan zhi 峨眉山志 ( ~ ZFC 165) ZFSH 050 (Part1 Vol.46): Xin ban e shan tu zhi 新版峨山圖志 ( ~ ZFC 177) ZFSH 051 (Part1 Vol.47-48): Ding hu shan zhi 鼎湖山志 ( = ZFC 158) ZFSH 052 (Part1 Vol.48): Hua feng shan zhi 華 山志 ZFSH 053 (Part1 Vol.49-50): Gu shan zhi 鼓山志 ( ~ ZFC 138) ZFSH 054 (Part2 Vol.01): Luo yang qie lan ji jiao zhu 洛陽伽藍記校注 ZFSH 055 (Part2 Vol.02): Chong kan luo yang qie lan ji 重刊洛陽伽藍記 ZFSH 056 (Part2 Vol.02): Nan chao si kao 南朝寺考 ( ~ ZFC 037) ZFSH 057 (Part2 Vol.03): Jiang nan fan cha zhi 江南梵剎志 ZFSH 058 (Part2 Vol.04-05): Chong xiu cao xi tong zhi 重修 志 ( ~ ZFC 159) ZFSH 059 (Part2 Vol.06): Yun men shan zhi 雲門山志 ZFSH 060 (Part2 Vol.06): Da yu shan zhi 大 山志 ZFSH 061 (Part2 Vol.07): Xue feng zhi 雪 志 ( = ZFC 146) ZFSH 062 (Part2 Vol.08): Quan zhou kai yuan si zhi 泉 開元寺志 ( = ZFC 153) ZFSH 063 (Part2 Vol.08): Xia men nan pu tuo si zhi 廈門南 寺志 ( = ZFC 151) ZFSH 064 (Part2 Vol.09): Tian tai sheng ji lu 台勝蹟錄 ZFSH 065 (Part2 Vol.10): Yan shan zhi 山志 ZFSH 066 (Part2 Vol.11): Yao feng shan zhi 堯 山志 ( ~ ZFC 056) ZFSH 067 (Part2 Vol.12): Ling gu chan lin zhi 靈谷禪林志 ( ~ ZFC 040) ZFSH 068 (Part2 Vol.13): Jin ling da bao en si ta zhi 金陵大報恩寺塔志 ( = ZFC 036) ZFSH 069 (Part2 Vol.13): Niu shou shan zhi 牛首山志 ( ~ ZFC 041) ZFSH 070 (Part2 Vol.13): Xian hua yan zhi 獻花巖志 ( ~ ZFC 039) ZFSH 071 (Part2 Vol.14): Qi xia shan 
zhi 棲霞山志 ZFSH 072 (Part2 Vol.14): Ling yan ji lue 靈巖記略 ( ~ ZFC 063) ZFSH 073 (Part2 Vol.14): Ling yan zhi lue 靈巖志略 ( = ZFC 064) ZFSH 074 (Part2 Vol.15): Yun ju shan zhi 雲居山志 ( ~ ZFC 033) ZFSH 075 (Part2 Vol.16-20): Lu shan zhi 盧山志 ZFSH 076 (Part2 Vol.21): Yang shan sheng 仰山乘 ZFSH 077 (Part2 Vol.22): Jiu hua shan zhi 九華山志 ( ~ ZFC 024) ZFSH 078 (Part2 Vol.23~24): Song shan shao lin si ji zhi 嵩山少林寺輯志 ZFSH 079 (Part2 Vol.25): Ji fu fan cha zhi 畿輔梵剎志 ZFSH 080 (Part2 Vol.26-28): Qin ding pan shan zhi 欽定盤山志 ( ~ ZFC 007) ZFSH 081 (Part2 Vol.29): Qing liang shan zhi 清涼山志 ( ~ ZFC 015) ZFSH 082 (Part2 Vol.29): Yun gang shi ku si zhi 雲岡石窟寺志 ZFSH 083 (Part2 Vol.30): E mei shan zhi bu 峨眉山志補 ZFSH 084 (Part3 Vol.01-02): Ji zu shan zhi 雞足山志 ( ~ ZFC 163) 78 Chung-Hwa Buddhist Journal Volume 25 (2012) ZFSH 085 (Part3 Vol.03): Guang xiao si zhi 孝寺志 ( ~ ZFC 160) ZFSH 086 (Part3 Vol.04): Huang bo shan si zhi 黃檗山寺志 ( ~ ZFC 145) ZFSH 087 (Part3 Vol.05-06): Chong xiu zhao jue si zhi 重修昭覺寺志 ( = ZFC 171) ZFSH 088 (Part3 Vol.07): Sheng yin jie dai si zhi 聖因接待寺志 ( ~ ZFC 104) ZFSH 089 (Part3 Vol.08~10): Tian tai shan fang wai zhi 台山方外志 ( ~ ZFC 115) ZFSH 090 (Part3 Vol.11~12): You xi bie zhi 幽 志 ( = ZFC 194) ZFSH 091 (Part3 Vol.13): Xue dou si zhi lue 雪竇寺志畧 ( ~ ZFC 126) ZFSH 092 (Part3 Vol.13): Bian li yuan zhi 辯利院志 ( = ZFC 097) ZFSH 093 (Part3 Vol.13): Cui shan si zhi 翠山寺志 ( = ZFC 182) ZFSH 094 (Part3 Vol.14~15): Qing yuan zhi lue 青原志略 ( ~ ZFC 030) ZFSH 095 (Part3 Vol.16): Hong shan bao tong si zhi 洪山寶 寺志 ( ~ ZFC 026) ZFSH 096 (Part3 Vol.17~18): Yu quan si zhi 玉泉寺志 ( ~ ZFC 025) ZFSH 097 (Part3 Vol.19~28): Wu du fa sheng 吳都法乘 ( ~ ZFC 045) ZFSH 098 (Part3 Vol.29): Ren cao an zhi 忍草庵志 ( = ZFC 067) ZFSH 099 (Part3 Vol.29): Shang fang shan zhi 方山志 ( ~ ZFC 004) ZFSH 100 (Part3 Vol.30): Qing liang shan xin zhi 清涼山新志 Buddhist Temple Gazetteers, Their Prefaces and Their Relationship to the Buddhist Canon 79 Appendix B 53 Both ZFC (197 gazetteers) and ZFSH (100 gazetteers) 
consist of facsimiles of manuscripts, woodblock prints or movable-type prints. Of these, 78 gazetteers have a counterpart in the other collection. This appendix describes the relationship between the gazetteers in these 78 pairs, in the hope that it will enable researchers to decide quickly which edition to consult first and alert them to differences early in the course of their study.

In the following 39 cases the gazetteers in ZFC and ZFSH are for all practical purposes identical [54]: [ZFC 009 - ZFSH 004], [ZFC 010 - ZFSH 003], [ZFC 034 - ZFSH 006], [ZFC 036 - ZFSH 068], [ZFC 038 - ZFSH 005], [ZFC 052 - ZFSH 043], [ZFC 060 - ZFSH 042], [ZFC 064 - ZFSH 073], [ZFC 067 - ZFSH 098], [ZFC 070 - ZFSH 038], [ZFC 072 - ZFSH 037], [ZFC 077 - ZFSH 041], [ZFC 079 - ZFSH 040], [ZFC 081 - ZFSH 017], [ZFC 083 - ZFSH 020], [ZFC 084 - ZFSH 021], [ZFC 085 - ZFSH 022], [ZFC 086 - ZFSH 023], [ZFC 091 - ZFSH 030], [ZFC 092 - ZFSH 031], [ZFC 093 - ZFSH 029], [ZFC 097 - ZFSH 092], [ZFC 098 - ZFSH 015], [ZFC 101 - ZFSH 018], [ZFC 102 - ZFSH 028], [ZFC 103 - ZFSH 025], [ZFC 106 - ZFSH 027], [ZFC 108 - ZFSH 019], [ZFC 116 - ZFSH 009], [ZFC 123 - ZFSH 012], [ZFC 146 - ZFSH 061], [ZFC 151 - ZFSH 063], [ZFC 153 - ZFSH 062], [ZFC 158 - ZFSH 051], [ZFC 171 - ZFSH 087], [ZFC 172 - ZFSH 048], [ZFC 179 - ZFSH 044], [ZFC 182 - ZFSH 093], [ZFC 194 - ZFSH 090].

In two pairs the gazetteers are different works, written independently about the same location.

1. [ZFC 004 - ZFSH 099] Both prints are titled Shangfangshan zhi 上方山志. ZFSH 99 was printed by the famous Sanshan tang 三善堂 publishers in 1892. The work, in five chapters with an introduction, was originally compiled by the monk Ziru 自如 (1706-1796). ZFC 4, on the other hand, is a copy of a work printed in 1933. It was compiled in 1930 in ten chapters by the famous and reclusive artist Pu Xinyu 溥心畬 (aka Puru 溥儒) (1896-1963), who almost became the last emperor of China.

2. [ZFC 186 - ZFSH 026] The Hupaoquan dinghuisi zhi 虎跑泉定慧寺志 (ZFC 186) and the Hupao dinghuisi zhi 虎跑定慧寺志 (ZFSH 26) are both reproductions of manuscripts. ZFC 186 consists of an introduction followed by six chapters. The original is preserved in the Shanghai Library and was composed by the monk Changren 常仁 (aka Anren 安忍). ZFSH 26 is a manuscript by the monk Shengguang 聖光 dated 1900. It is not a complete gazetteer, but the draft for a later, probably never realized, edition. It is not divided into chapters.

53 Much of the detailed comparison between the collections was carried out in spring 2009 by Mrs. Lin Xiuli: her help is acknowledged and deeply appreciated.
54 In a few cases (e.g. ZFC 70/ZFSH 38) one of the facsimiles was taken from a reprint, whereas the other was done from the original.

In ten gazetteer pairs, one of the two is a manuscript copy, usually a transcription from a print, and the other is a printed edition. The text is often identical, allowing for minor mistakes and omissions (usually in the manuscript). The date given is usually taken from the preface. Where the same date is given for manuscript and print, the date in the manuscript might simply be copied from the print: it is not to be confused with the actual date of the transcription. Further research on the relationship between the two editions is needed in almost every case.
Here only the general results are given:

Panshan zhi 盤山志: ZFC 007 (Ms dated 1755); ZFSH 080 (Siku quanshu 四庫全書 edition dated 1755)
Qingyuan zhi lue 青原志略: ZFC 030 (1669); ZFSH 094 (Ms)
Yunjushan zhi 雲居山志: ZFC 033 (Ms dated 1727); ZFSH 074 (printed in Hongkong 1959)
Xianhuayan zhi 獻花岩(巖)志: ZFC 039 (Ms); ZFSH 070 (dated 1603)
Niushoushan zhi 牛首山志: ZFC 041 (Ms); ZFSH 069 (print dated 1579, handwritten preface added 1639)
Poshan xingfusi zhi 破山(常熟)興福寺志: ZFC 047 (movable-type print 1919); ZFSH 036 (Ms dated 1643)
Yaofengshan zhi 堯峰山志: ZFC 056 (Ms (chapters 4-6) dated 1943) [55]; ZFSH 066 (print dated 1638)
Lingyan ji lue 靈岩紀(記)略: ZFC 063 (early Qing); ZFSH 072 (Ms, early Qing)
Wulin fan zhi 武林梵志: ZFC 082 (Ms dated 1864); ZFSH 007 (Siku quanshu edition dated 1780)
Shengyin jiedaisi zhi 聖因接待寺志: ZFC 104 (Ms); ZFSH 088 (print dated 1748)

55 ZFC 056 was done from a copy in which three missing chapters (ch.4-6) were supplied in manuscript in 1943. Chapters 1-3 and the introduction are identical with ZFSH 066.

In the Chinese textual universe, print copies are preferred over manuscripts. There are good reasons for this: usually the print copy is better proofread and provides a more reliable and readable text. When the woodblocks had been lost and no new print copies could be ordered, a scholar might transcribe or excerpt a gazetteer, or hire someone to do so. Transcription, however, almost always introduces errors. A typical example is a date in the manuscript copy of the Wulin fan zhi 武林梵志 (ZFC 082, p.15), which is given as 宋紹興二十二年 (1152 CE). The correct print version (ZFSH 007, p.7) has 宋紹興三十二年 (1162 CE). Generally, in the case of the ten gazetteer pairs above the print versions are to be preferred, but there are exceptions. In the pair ZFC 047 and ZFSH 036, the ZFSH manuscript precedes the print by almost 300 years and is more complete (ZFC lacks the text on p.127-132 in ZFSH).
Five gazetteers that appear in both collections differ in chapter number or arrangement: ZFC 025 ZFSH 096 Yuquansi zhi 玉泉寺志 While the text in ZFSH has 6 chapters and an introduction, the ZFC edition has a seventh chapter (added later). Moreover, ZFSH lacks three pages of the second chapter (ZFC, p.207-208, 252). ZFC 037 ZFSH 056 Nanchaosi kao The ZFSH text was printed for inclusion in the (never finished) 南朝寺考 Puhui Canon. It contains two additional chapters (the 梁 寺志, and the 寺塔記, in themselves small gazetteers). While the ZFC edition of 1907 is divided in six juan-chapters, the ZFSH is arranged according to temple sections. ZFC 061 ZFSH 035 Wujin Two different editions, each completed at around the same time. The tianningsi zhi ZFSH version contains an addendum (pp. 383-340). The ZFC also 武進 寧寺志 lacks the introduction and the maps that are preserved in ZFSH, pp.1-6. ZFC 129 ZFSH 013 Qitasi zhi 七塔寺志 ZFC 135 ZFSH 033 Xi tianmu The edition preserved in the ZFSH is about a third more voluminous zushan zhi than the ZFC: It has eight chapters, plus an introduction and two 西 目祖山志 addenda. Against this the ZFC edition consists of only six chapters. The editions contain different maps. The ZFSH edition contains an addendum (pp.235-242). 82 Chung-Hwa Buddhist Journal Volume 25 (2012) For the following 22 gazetteer pairs, the editions contained in ZFC and ZFSH show various minor differences, omissions and additions. Tan zhe shan xiu yun si zhi ZFC 006 潭柘山岫雲寺志 ZFSH 047 Tan zhe shan xiu yun si zhi 潭柘山岫雲寺志 The chapter 名勝古蹟 in ZFSH, pp.139-170 was moved into the addendum (續刊) of the ZFC, pp.149180. ZFC lacks ZFSH, pp.181-188 (再集唐句十首). ZFC 015 Qing liang shan zhi 清涼山志 ZFSH 081 Qing liang shan zhi 清涼山志 1.Responsiblility statement in ZFC is given as 釋鎮澄纂, in the ZFSH as 釋印 重修 (經查原著者為釋鎮澄) (s. Preface). 2. ZFC and ZFSH contain a different map. ZFC 024 Jiu hua shan zhi 九華山志 ZFSH 077 Jiu hua shan zhi 九華山志 ZFC lacks ZFSH, pp.3-4 (Dizang Image). Hong shan bao tong chan si zhi 1. 
ZFC, pp.139, 145 and 151 are illegible. ZFC 026 洪山寶 禪寺志 2. ZFC Ch. 3 lacks ZFSH, pp.177-178. ZFSH 095 Hong shan bao tong si zhi 洪山寶 寺志 3. ZFSH Ch. 3 lacks ZFC, pp.180-181. ZFC 040 Ling gu chan lin zhi 靈谷禪林志 ZFSH 067 Ling gu chan lin zhi 靈谷禪林志 1. Responsibility statement in ZFC is 謝元福纂輯, in ZFSH as 釋德鎧撰 (The author is indeed 釋德鎧, see preface). 2. ZFC edtion printed in 緒十 (1887), ZFSH is a reprint of the 緒十 (1886) edition. 3. ZFC lacks ZFSH, pp.3-4 (Preface by 青芝老人). 4. ZFC Ch.14, p.414 differs slightly from ZFSH Ch.14, p.420. ZFC 042 She shan zhi 攝山志 ZFSH 034 She shan zhi 攝山志 ZFC lacks ZFSH “Principles of Organization” (凡例), pp.23-26. ZFC 045 Wu du fa sheng 吳都法乘 ZFSH 097 Wu du fa sheng 吳都法乘 1. ZFSH Ch.6c lacks ZFC Ch.6c, pp.1020-1021. 2. ZFSH Ch.30 lacks ZFC Ch.30, p. 3769. ZFC 071 Xu jin shan zhi 續金山志 ZFSH 039 Xu jin shan zhi 續金山志 1. ZFSH Ch.1 lacks ZFC Ch.1, pp.151-154. 2. ZFSH Ch. 2 lacks ZFC Ch. 2, pp.259-260. 3. ZFC Ch. 2 lacks ZFSH Ch. 2, pp.267-270. ZFC 076 He lin si zhi 鶴林寺志 ZFSH 046 He lin si zhi 鶴林寺志 1. ZFC is a “reprint” dated 1909 done at Helin 鶴林 temple on orders of the monk Fudeng 福登: the edition in the ZFSH is a Wanli era (1573-1619) print. 2. Lay-out and calligraphy are different, therefore the woodblocks must have been re-cut. 3. ZFSH lacks one of the prefaces in ZFC, pp.11-16. 4. ZFSH lacks ZFC, pp.205-212. 5. ZFC lacks ZFSH, pp.201-204. ZFC 087 Jing ci si zhi 凈慈寺志 ZFSH 016 Jing ci si zhi 淨慈寺志 ZFC lacks ZFSH “Principles of organization” (凡例), pp.17-24. Buddhist Temple Gazetteers, Their Prefaces and Their Relationship to the Buddhist Canon 83 1. ZFC edition is dated 順治 (1646). The ZFSH edition, a “re-carving” (重刻本), was done in 緒 十 (1897) at the prolific 嘉惠堂 in Hangzhou. 2. ZFC and ZFSH differ in lay-out. 3. The map of the temple (寺圖 ) in ZFC, pp.25-40 differs from that in ZFSH, pp.17-24. Probably the ZFSH reflects the lay-out of the temple as it was rebuilt after the destruction during the Taiping. 4. ZFSH lacks ZFC, pp.145-152. 
Shang tian zhu shan zhi 山志 ZFC 088 Hang zhou shang tian zhu jiang si zhi ZFSH 024 杭 講寺志 ZFC 115 Tian tai shan fang wai zhi ZFSH 089 Tian tai shan fang wai zhi 1. ZFC reproduces a string-bound edition dated 1922, printed by Jiyunxuan 集雲軒 in moveable types in Shanghai, which in turn was done from the the first edition dated 緒 十 (1894) printed at the 台山方外志 Zhenjue temple 真覺寺 in Folong 佛隴 near Mt. 台山方外志 Tiantai. The ZFSH edition is a “reprint” dated 緒 十 (1894) done from the woodblocks of the Folong edition. 2. ZFSH lacks ZFC addendum, pp.665-670. ZFC 126 Xue dou si zhi lue 雪竇寺志略 ZFSH 091 Xue dou si zhi lue 雪竇寺志畧 1. ZFSH lacks ZFC, pp.5-14 (山圖). 2. ZFC lacks ZFSH, pp.49-50. 3. ZFC lacks ZFSH, pp.60-62. 4. ZFC lacks ZFSH, pp.85-86. ZFC 128 A yu wang shan si zhi ZFSH 010 ZFSH lacks ZFC, pp.53-58. 育王山寺志 ZFC 131 Yue lin si zhi 岳林寺志 ZFSH 014 Ming zhou yue lin si zhi 明 岳林寺志 ZFSH Ch. 6 incomplete. ZFSH lacks ZFC, p.174. ZFC 138 Gu shan zhi 鼓山志 ZFSH 053 Gu shan zhi 鼓山志 ZFC lacks ZFSH, p.556. ZFC 145 Huang bo shan si zhi 黃檗山寺志 ZFSH 086 Huang bo shan si zhi 黃檗山寺志 1. ZFSH lacks ZFC (preface), pp.1-10. 2. ZFC, p.319 differs from ZFSH, p.311. 3. ZFC lacks ZFSH, p.460. 志 ZFC 159 Cao xi tong zhi ZFSH 058 Chong xiu cao xi tong zhi 重修 ZFC 160 Guang xiao si zhi ZFSH 085 Guang xiao si zhi 孝寺志 孝寺志 志 ZFC lacks ZFSH, pp.405-412. ZFSH lacks ZFC (maps of the temple 寺圖), pp.33-62. ZFC 163 Ji zu shan zhi 雞足山志 ZFSH 084 Ji zu shan zhi 雞足山志 ZFSH lacks ZFC (preface), pp.24-25. ZFC 165 E mei shan zhi 峨嵋山志 ZFSH 049 E mei shan zhi 峨眉山志 ZFSH lacks ZFC (overview map of Emei), pp.19-20. ZFC 177 Xin ban e shan tu zhi 新版峨山圖志 ZFSH 050 Xin ban e shan tu zhi 新版峨山圖志 ZFC lacks ZFSH, pp.453-454. This is so far the only gazetteer published in Chinese together with English translation (by Dryden L. Phelps) 84 Chung-Hwa Buddhist Journal Volume 25 (2012) Abbreviations ZFC Zhongguo Fosizhi Congkan 中國佛寺志叢刊. Hangzhou: Guangling shushe 廣陵 社. 2006 . Compiled by Zhang Zhi 張智 et. al., 130 vols. 
ZFSH Zhongguo Fosi Shizhi Huikan 中國佛寺史志彙刊. Taipei: Mingwen shuju 明文書局. 1980-1985. Compiled by Du Jiexiang 杜潔祥, 110 vols.

References

Bingenheimer, Marcus. 2009. Writing History of Buddhist Thought in the 20th century – Yinshun (1906-2005) in the Context of Chinese Buddhist Historiography. Journal of Global Buddhism 10: 255-290.
Bol, Peter K. 2001. The Rise of Local History: History, Geography, and Culture in Southern Song and Yuan Wuzhou. Harvard Journal of Asiatic Studies 61(1): 37-76.
Brook, Timothy. 1993. Praying for Power – Buddhism and the Formation of Gentry Society in Late-Ming China. Cambridge (MA) and London: Council on East Asian Studies, Harvard University.
Brook, Timothy. 2002. Geographical Sources of Ming-Qing History. Ann Arbor: Univ. of Michigan, Center for Chinese Studies. (Michigan Monographs in Chinese Studies 58. First edition: 1988).
Cao, Ganghua 曹剛華. 2011. Mingdai Fojiao Difangzhi Yanjiu 明代佛教地方志研究. Beijing: Renmin daxue 中國人民大學出版社.
Cao, Shibang 曹仕邦. 1999. Zhongguo Fojiao Shixue Shi – Dongjin zhi Wu dai 中國佛教史學史─東晉至五代 (History of Chinese Buddhist Historiography – Eastern Jin to Wudai Period). Taipei: Faguwenhua 法鼓文化.
Dow, Francis D.M. 1969. A Study of Chiang-su and Che-chiang Gazetteers of the Ming Dynasty. Canberra: Australian National University.
Du, Jiexiang 杜潔祥. 1981. Zhongguo Fosizhi Gaishuo 中國佛寺志概說. Putishu 菩提樹 29(6) (= No. 342): 19-20.
Dudbridge, Glen. 2000. Lost Books of Medieval China. London: The British Library.
Eberhard, Wolfram. 1964. Temple Building Activities in Medieval and Modern China. Monumenta Serica 23: 264-318.
Franke, Otto. 1938. Li Tschi 李贄 – Ein Beitrag zur Geschichte der Chinesischen Geisteskämpfe im 16. Jahrhundert. Berlin: De Gruyter. (Abhandlungen der Preussischen Akademie der Wissenschaften Jahrg. 1937. Phil-hist. Klasse Nr. 10).
Franke, Wolfgang. 1968. An Introduction to the Sources of Ming History. Kuala Lumpur: University of Malaya Press.
Buddhist Temple Gazetteers, Their Prefaces and Their Relationship to the Buddhist Canon 85 Giles, Lionel. 1914 (July). Tun Huang Lu: Notes on the District of Tun-Huang. Journal of the Royal Asiatic Society of Great Britain and Ireland. 703-728. (cf. Hu, Suh 1915) Giles, Lionel. 1915 (Jan.). The Tun Huang Lu Re-Translated. Journal of the Royal Asiatic Society of Great Britain and Ireland. 41-47. Goodrich, L. Carrington. 1976. A Dictionary of Ming Biography. New York and London: Columbia University Press. Gu, Hongyi 顧宏義. 2010. Songchao Fangzhi Kao 宋朝方志考. Shanghai: Shanghai guiji 海古籍出版社. Hahn, Thomas H. 1997. Formalisierter Wilder Raum—Chinesische Berge und ihre Beschreibungen (shanzhi 山 志 ). Unpublished PhD-thesis Heidelberg University. (Accessed online January 2008: http:/archiv.ub.uni-heidelberg.de/volltextserver/ volltexte/2007 [archived 2007-04-16]). Hargett, James M. 1996. Song Dynasty Local Gazetteers and Their Place in the History of Difangzhi Writing. Harvard Journal of Asiatic Studies 56(2):405-442. Hu, Suh (aka Hu, Shi). 1915 (Jan.). Notes on Dr. Lionel Giles' Article on ‘Tun Huang Lu’. Journal of the Royal Asiatic Society of Great Britain and Ireland. 35-39. Jin, Enhui 金恩輝; Hu, Shuzhao 胡述兆. 1996. Zhongguo Difangzhi Zongmu Tiyao 中國地 方志總目提要. Sino-American Publishers 漢美圖 . 3 vols. Lan, Jifu 藍吉富, ed. 1984. Dazangjing Bubian 大藏經補編. Taipei: Huayu 華宇. Naquin, Susan; Rawski, Evelyn S. 1987. Chinese Society in the Eighteenth Century. New Haven: Yale University Press. Qiu, Jiang 仇江. 2008. Qing chu Lingnan Fomen Shiliao Zhengli Yanjiu 清初嶺南佛門史料 整理研究. Unpublished conference paper. Conference- 沉淪 懺悔 救度:中國文 的懺悔 寫 國際學 研討會 2008.12.4-6 Place: Taibei (Academia Sinica 中央 研究院) and Jinshan (Dharma Drum Buddhist College 法鼓佛教學院). 1-31. Qiu, Jiang 仇江; Li, Fubiao 李福标, eds. 2003. Danxia Shanzhi 丹霞山志 . Beijing: Zhonghua shuju 中華 局. Reiter, Florian. 1978. 
Der “Bericht über den Berg Lu” (Lu-shan chi) von Ch’en Shun-yü; ein Historiographischer Beitrag aus der Sung Zeit zum Kulturraum des Lu Shan. PhD dissertation, Munich, Reiter, Florian. 1980. Bergmonographien als Geographische und Historische Quellen, Dargestellt an Ch’en Shun-yüs “Bericht über den Berg Lu” (Lu-shan chi) aus dem 11. Jahrhundert. Zeitschrift der Deutschen Morgenländischen Gesellschaft 130:397–407. Robson, James. 2009. Power of Place: The Religious Landscape of the Southern Sacred Peak (Nanyue 南 ) in Medieval China. MA: Harvard University Asia Center. (Harvard East Asian Monographs). Suwa, Gijun 諏訪義純. 1977. ‘Ryankyō jiki’ Shiryō kō 梁 寺記 資料考. Indogaku Bukkyōgaku Kenkyū 印度学仏教学研究 51 (26-1): 91-96. 86 Chung-Hwa Buddhist Journal Volume 25 (2012) Suwa, Gijun 諏訪義純. 1980. Nanchō Butsuji kō – Ryandai Kenritsu 南朝仏寺考-梁代建 立. Bukkyō no Rekishi to Bunka 仏教の歴史と文 :仏教史学会30周 記念論集. 157-179. Wu, Jiang. 2004. Leaving for the Rising Sun - The Historical Background of Yinyuan Longqi’s Migration to Japan in 1654. Asia Major (Third Series) 17(2): 89-120. Wu, Jiang. 2006. Building a Dharma Transmission Monastery in Seventeenth-Century China: The Case of Mount Huangbo. East Asian History 31:29-52. Yü, Chün-fang. 1998. Ming Buddhism. The Cambridge History of China. 8(2):893-952. Zhang, Dewei. 2010. A Fragile Revival - Buddhism under the Political Shadow, 1522-1620. PhD Thesis, University of British Columbia, Vancouver. Zhuang, Weifeng 莊威 , et al., eds. 1985. Zhongguo Difangzhi Lianhe Mulu 中國地方志 聯合目錄. Beijing: Zhonghua Shuju.
Chung-Hwa Buddhist Journal (2012, 25:129-148)
Taipei: Chung-Hwa Institute of Buddhist Studies
中華佛學學報第二十五期 頁 129-148 (民國一百零一年),臺北:中華佛學研究所
ISSN:1017-7132

Verb Semantics and Argument Realization in Pre-modern Japanese: A Corpus Based Study

Kerri L Russell and Stephen Wright Horn
University of Oxford

Abstract

We are developing a corpus in order to investigate argument realization in detail for pre-modern Japanese, giving a comprehensive account of the basic grammar of each major stage of the language and allowing for both synchronic and diachronic analyses. When completed, the corpus will contain texts from the 8th century until the beginning of the 16th century. The results of the project will impact the description and understanding of pre-modern Japanese and its changes through time, furthering our understanding and interpretation of earlier texts. The project is also expected to have implications for general linguistic theory, both with regard to frameworks for understanding verb semantics and clause structure, and with regard to the application of syntactic theory to 'dead' languages. This paper focuses on the initial stages of corpus building, including methods for encoding orthography, morphology, and syntax.

Keywords: Old Japanese, Argument Realization, Lexicon, TEI markup, Corpus-based Linguistics

Introduction

This paper presents the tagging conventions used in the development of a corpus for a pre-modern Japanese syntax project at the University of Oxford.
The project is entitled Verb Semantics and Argument Realization in Pre-modern Japanese: A Comprehensive Study of the Basic Syntax of Pre-modern Japanese (abbreviated as ‘VSARPJ’) and is funded by a grant of almost £1 million from the Arts and Humanities Research Council in the UK. An important first phase of the project is the construction of an annotated and encoded corpus of texts. While the corpus is initially constructed specifically for the purpose of serving the VSARPJ project, we believe it will eventually become useful for the investigation of many other aspects of the syntax of pre-modern Japanese.

The primary and immediate goal of the VSARPJ project is to investigate argument realization in detail for pre-modern Japanese. Argument realization is a fundamental aspect of the syntax of a language which concerns the ways in which verb meaning determines the number of arguments (e.g., subjects, objects, goals) in a clause and their morpho-syntactic and semantic properties. In essence, the project will contribute to a comprehensive account of the basic syntax of each of the stages of the pre-modern Japanese language, from the beginning of its recorded history in the 8th century until the end of the 16th century, and of the changes in basic syntax that have taken place over these stages.[1]

The VSARPJ project has two parts: synchronic and diachronic. In the synchronic part, we investigate, for the main stages of pre-modern Japanese, the argument realization patterns of individual verbs and of verb classes. For each verb attested in the pre-modern Japanese texts we are using for this investigation, we establish both the syntactic frames in which it can occur and its basic argument realization pattern. An important part of this will be the determination of what counts as an argument, and to what extent a more finely graded range of categories between argument and adjunct is needed.
We will also look at grammatical phenomena other than argument realization which may be explained by verb semantics, for example, aspect, auxiliary selection, ellipsis, and case drop.

The diachronic part of the project will build on the results of the synchronic part. In addition to charting changes affecting individual verbs, we will be able to establish an inventory of changes in argument realization through the history of Japanese, both for individual verbs and for classes of verbs, and thereby be able to investigate general patterns of change, including possible development pathways for verb meanings and argument realization.

[1] More detail about the VSARPJ project, including the framework we use for analysis, is presented on our website: http://vsarpj.orinst.ox.ac.uk/project.html.

Apart from the intrinsic value the results of the project will have for the description and understanding of Japanese grammar and its history, the project may also be expected to yield results of more general interest, as this will be the first detailed application of the type of framework employed here to a language such as Japanese, which frequently drops case markers, has extensive argument ellipsis (pro-drop), and has fairly free word order. It will also be the first large-scale investigation of this kind of a ‘dead’ language, which poses particular challenges to research into syntax.

The initial stage of the VSARPJ project consists of building a digital corpus of texts, encoded with information about various linguistic properties. Once this stage is completed, the next stage will involve using the corpus to conduct various types of linguistic analysis. As we are currently in the initial stage of corpus construction, this paper will focus on the encoding of the corpus, and in particular, on the oldest stage of texts in the corpus, Old Japanese (OJ).
In this paper we describe the contents of the VSARPJ corpus (section 2), the initial stage of marking up texts (section 3), and XML mark-up conventions (section 4).

The VSARPJ Corpus

The corpus will in the initial stage comprise a selection of texts from the three main periods of pre-modern Japanese (Old, Early Middle, and Late Middle Japanese):

Old Japanese (‘OJ’, approximately 700-800)
  Kojiki kayō               古事記歌謡         712
  Nihon shoki kayō          日本書紀歌謡       720
  Fudoki kayō               風土記歌謡         730s
  Bussokuseki-uta           仏足石歌           after 753
  Man’yōshū                 万葉集             after 759
  Shoku nihongi kayō        続日本紀歌謡       797
  Shoku nihongi senmyō      続日本紀宣命       697-791
  Engishiki norito          延喜式祝詞         (compiled) 927

Early Middle Japanese (‘EMJ’, 800-1200)
  Kokin wakashū preface     古今和歌集仮名序   914
  Ise monogatari            伊勢物語           early 10th century
  Tosa nikki                土佐日記           935
  Taketori monogatari       竹取物語           mid 10th century
  Kagerō nikki              蜻蛉日記           second half of 10th century
  Ochikubo monogatari       落窪物語           late 10th century
  Makura no sōshi           枕草子             c. 1000
  Genji monogatari          源氏物語           1001-1010
  Sarashina nikki           更級日記           1059-1060
  Konjaku monogatari-shū    今昔物語集         1120

Late Middle Japanese (‘LMJ’, 1200-1600)
  Esopo no fabulas                             1593
  Feiqe monogatari                             1593

The corpus includes all main extant texts from the OJ period. For EMJ, the corpus focuses on texts from the period 900-1100 which are thought to a large extent to reflect the (spoken) language of the time. For large texts from this period, e.g., Genji monogatari, only extensive selections and not the entire texts will be included in the initial phase of the corpus.
From the LMJ period, where most of the textual material is written in ‘classical Japanese’ rather than in the contemporary language and is characterized by a high degree of fossilization, we use two texts produced by the Jesuit missionaries at the end of the 16th century, the Esopo no fabulas and the Feiqe monogatari, which both reflect the contemporary language at the very end of the period, and also have the additional advantage of being written in alphabetic writing. For all periods, we follow the readings in the critical edition of Nihon koten bungaku taikei (NKBT), published by Iwanami Shoten.[2]

[2] At this stage, construction of the OJ corpus is complete. The corpus consists of nearly 5,000 poems of around 90,000 words, 20,000 of which are verbs. We have not yet decided on how much to include from other periods, so we are not yet certain of the size of the corpora we will develop for later stages of pre-modern Japanese.

Initial Stage of Markup

The first stage of markup was completed in MS Word. This process involved romanization of texts and the use of symbols to indicate prefixes, suffixes, compounds, etc.

Romanization of Texts

First, each text was romanized to present a phonemic transcription in accordance with the phonology of the time the text is thought to have been written, and reflecting the sound changes which had been completed by that time. Take, for example, the word which is often written 恋, which in Modern Japanese (NJ) has the shape koi and which may be glossed very roughly as ‘love’. In the historical kana spelling (歴史的仮名遣い) this word is written こひ, regardless of the time from which the text dates. In a phonemic transcription, however, this word has the shape /kwopwi/ (こ甲ひ乙) in OJ.[3] As a result of sound changes which took place after OJ, the shape of this word has changed as shown in (1) below with approximate dating, and the corpus uses those shapes in accordance with the dates of the texts. Thus, in the Tosa nikki (from 935), this word is transcribed kopi, but in the Genji monogatari (from just after 1000) it will be written kowi. This is a very basic point, but one which is often ignored in the presentation of pre-modern Japanese texts.

[3] We use the Frellesvig & Whitman (2008) transcription system for OJ.

(1) OJ kwopwi > EMJ 800 kwopi > 950 kopi > 1000 kowi > 1100 koi

Further, in the process of romanizing texts, we preserved a three-way distinction found in the texts: phonographic, logographic, and “not in text” for items which are not orthographically represented in the original text. This distinction is shown in (2) from the Man’yōshū (MYS 1:1) with phonographically written material in italicized text, logographically written material in plain text, and items not orthographically represented in the original text (“not in text”) written in underlined text.

(2) 篭      毛    與    美篭        母乳      布久思  毛    與
    kwo     mo    yo    mi-kwo      moti      pukusi  mo    yo
    basket  ETOP  EMPH  HON-basket  hold.INF  shovel  ETOP  EMPH

    美夫君志    持        此        岳    尓   菜
    mi-bukusi   moti      ko no     woka  ni   na
    HON-shovel  hold.INF  this GEN  hill  DAT  greens

    採須           兒     家    吉閑名   告紗根
    tuma-su        kwo    ipye  kikana   nora-sane
    pick-RESP.ADN  child  home  ask.OPT  tell-RESP.OPT

    ‘Girl with your basket, with your pretty basket, with your shovel, with your pretty shovel, picking greens on this hillside, I want to ask your home. Please tell me!’

The interpretation of logographic writing relies on reading tradition and is in many respects uncertain. This is sometimes reflected in the existence of significantly different reading traditions of some texts. If a text or crucial parts of it are written logographically, we cannot, strictly speaking, be certain of which words, or inflected forms, are reflected in the text.
For example, in (2) above, we cannot be certain that the verb written by 持 (in bold face) really is mot- ‘to hold’, nor that its inflected form really is the infinitive moti, as it is read according to the reading tradition, and not the adnominal motu. Thus, logographically written text is far less reliable than phonographically written text and can be used as linguistic evidence only with great caution.

The items which are not orthographically represented in the text are also based solely on reading tradition. The word 此 ‘this’ in (2) is interpreted as “ko no”, but the genitive particle no is not represented by a character in the text. This issue will become particularly important when investigating argument structure in contexts where a case particle marking an argument is understood only from the reading tradition and not from the written text itself. As there is no way to prove the existence of the case particle, such examples are less reliable as evidence of case marking than those where a particle is written phonographically or even logographically.

In the initial stage of markup we indicated this three-way distinction by rendering phonographically written material in lower case, logographically written material in upper case, and items not recorded in the original script in upper case with a comment saying “not in text”.[4]

Symbols Used in Markup

While romanizing the texts in MS Word at this stage, we added information about certain types of words with the symbols =, -, +, and ~.

The symbol “=” was used to indicate a particle. For example, “ko no” in (2) above was written as KO=NO to indicate that “no” is a genitive particle. Following the discussion above, a comment was also attached to “no” to mark it as not having been represented orthographically in the original text.
Next, “-” was used to indicate 1) inflecting forms following verbs and adjectives and 2) compound verbs. The last word in (2), norasane, consists of the stem of the verb nor- ‘to tell’ and the optative inflection of the respect auxiliary -(a)s-. This was marked in our Word files as NORA-sane at this stage, thus simultaneously indicating orthography and morphology.

The “+” symbol was used to indicate 1) nouns in compounds, including noun+noun and noun+verb combinations and 2) nominal and adjectival prefixes. For example, mikwo in (2) above consists of the honorific prefix mi followed by kwo ‘basket’. This was marked as mi+KWO.

Last, the symbol ~ was used for verbal prefixes and circumfixes. There are no examples of this in (2), but take, for example, sanuru (MYS 14.3504), which consists of the prefix sa-[5] and the adnominal form of the verb ne- ‘to sleep’. This word was marked up as sa~nuru.

[4] In hindsight, from the point of view of converting Word files to XML format, it would have been to our advantage had we indicated these three types of orthography using distinct styles in MS Word.

XML Markup Conventions

The next stage in corpus building involves XML markup according to the guidelines of the Text Encoding Initiative (TEI). The inventory of TEI coding is a small set of tags which are used to enclose portions of text; text enclosed by tags can further be characterized by various attributes, such as type, subtype, function, inflection, etc. The inventory of coding elements and conventions of the TEI are under constant development and improvement; they may be viewed at http://www.tei-c.org/. A major consideration for adopting TEI technology and guidelines for the corpus was that such standards ensure that the corpus we design will be long lasting, non-idiosyncratic, and updateable along with future changes in technology.
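To make the step from the Word-stage conventions to tagged text more concrete, the following is a minimal Python sketch of how a token marked with the symbols =, -, + and ~ might be split into segments before being wrapped in XML elements. It is an illustration only and not the project's conversion script; the names `SYMBOL_ROLES` and `split_token` are hypothetical, and orthography is guessed from letter case alone, since the “not in text” status was recorded in Word comments that a plain string does not carry.

```python
import re

# Meaning of each separator symbol, following the description in the text.
SYMBOL_ROLES = {
    "=": "particle",
    "-": "inflecting form or compound verb",
    "+": "nominal compound or nominal/adjectival prefix",
    "~": "verbal prefix or circumfix",
}


def split_token(token):
    """Split a Word-stage token into (symbol, segment, orthography) triples.

    `symbol` is the separator that precedes the segment (None for the first
    segment). Orthography is inferred from letter case only: all upper case
    is read as logographic ("logo"), anything else as phonographic ("phon").
    Items "not in text" are not detected here (assumption: that flag lived
    in Word comments, which are unavailable in a plain string).
    """
    triples = []
    symbol = None
    # The capture group keeps the separators in the split result.
    for piece in re.split(r"([=\-+~])", token):
        if piece in SYMBOL_ROLES:
            symbol = piece
        elif piece:
            orth = "logo" if piece.isupper() else "phon"
            triples.append((symbol, piece, orth))
            symbol = None
    return triples


# The four examples discussed in the text.
for example in ["KO=NO", "NORA-sane", "mi+KWO", "sa~nuru"]:
    print(example, split_token(example))
```

Keeping the separator with the segment that follows it means each segment carries its own morphological cue, which would make a later mapping to word and morpheme elements a simple per-segment decision.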
We attempted to follow the TEI guidelines as closely as possible; however, we had to add some attributes for items which we felt it important to mark up and which were not available in TEI. For example, we felt it important to indicate the inflection for all forms which can inflect (e.g., verbs, adjectives, copulas, auxiliaries) and created the ‘inflection’ attribute to allow us to do this. By indicating the inflection, we can easily compare all forms in any given inflection. The inflected form of the predicate also indicates clause types, so we can investigate main clauses or subordinate clauses based on the inflection of the predicate.

Most of the OJ texts were marked up using MS Word, as described above. These were then converted into XML format.[6] Our mark-up policies consist of ways to link the original and romanized version of a text (section 1), to preserve orthographic conventions (section 2), to encode information about words, morphemes, and parts of speech (section 3), to identify lexemes and morphemes (section 4), and to encode syntactic features (section 5). As an example, we also present a fully marked up poem (section 6).

1. Original and Transliterated Text

In order to reflect the crucial distinction between logographic and phonographic writing and to represent information about how words and/or morphemes were written in the original script,[7] we have adopted the following policies. First, for OJ texts we preserve the original script together with the phonemically transcribed text. Thus, reference can be made to the original script. This is done by placing the original script in an <ab> (“anonymous block”) element whose @type attribute has the value ‘original’. We use “ojp” as the value for @xml:lang for texts written in Old Japanese and “ojp-Latn” for the transliterated version of the OJ texts. The romanized version of the script follows in its own <ab> tag with the @type attribute value ‘transliteration’. Line breaks (<lb>) are also linked in the original and transliterated version using @xml:id and @corresp attributes in order to make it easy to see how a line of text was rendered in the original or how a line of text should be read. The @xml:id value contains the poem and line number; “MYS.1.1” means that the poem is from the Man’yōshū (MYS), Book 1, poem number 1, and orig_1 defines this as the first line break in the poem. This is illustrated in (3) using an excerpt from the poem presented in (2) above.

[5] The function of this prefix is unclear.
[6] The scripts for converting our Word files into XML were written by James Cummings.
[7] By ‘original script’ we mean the script employed in the critical edition upon which a text is based.

(3) <ab type="original" xml:lang="ojp">
      篭毛與
      <lb xml:id="MYS.1.1-orig_1" corresp="#MYS.1.1-trans_1"/>
      美篭母乳
      <!-- … -->
    </ab>
    <ab type="transliteration" xml:lang="ojp-Latn">
      kwo mo yo
      <lb xml:id="MYS.1.1-trans_1" corresp="#MYS.1.1-orig_1"/>
      mikwo moti
      <!-- … -->
    </ab>

2. Encoding Orthography

To preserve the three-way writing distinction discussed above, we use the character tag <c> with the @type attribute. The possible values for @type are “phon” for items written phonographically, “logo” for those written logographically, and “noLogo” for items not orthographically represented in the original text. This is shown in (4) below with (a) presenting the original text, the phonemic transcription, and glosses, and (b) showing the markup.

(4) a. 我      屋戸   乃
       wa ga   yadwo  no
       I GEN   hut    GEN
       ‘of my hut’ (MYS 8.1606)

    b. <c type="logo"> wa </c>
       <c type="noLogo"> ga </c>
       <c type="logo"> yadwo </c>
       <c type="phon"> no </c>

3. Words, Morphemes, and Part of Speech

Words are enclosed in ‘word(-like)’ tags, <w>, and information about part of speech is supplied by the @type attribute.
The main word classes represented in this way are noun, pronoun, adverb, verb, adjective, copula, adjectival noun, verbal noun and particle. Complex words can consist of more than one word, forming a compound word, or they can consist of one or more words followed or preceded by one or more morphemes. The morpheme tag <m> is used for bound forms, and is then categorized by @type attributes with the possible values of auxiliary,[8] prefix, suffix, numeral, counter, and adjectival copula.[9] The grammatical system and terminology reflected in the coding is that of Frellesvig (2010).

Several of the parts of speech are further subcategorized, notably particles and auxiliaries, which are given subtypes and functions. For example, ga is a word (<w>) of the @type value “particle”, @subtype value "case" with the @function value “genitive”; and -(i)ki is a morpheme (<m>) of the @type value “auxiliary” and with the @function value “simple past”. A full, current list of the parts of speech, including subcategories, which are distinguished throughout the corpus is available at the corpus website (http://vsarpj.orinst.ox.ac.uk/corpus/).

Inflecting parts of speech, such as verbs, auxiliaries, extensions, copulas, and adjectival copulas are supplied with information about their inflectional forms with the @inflection attribute. For inflectional forms which are identical in shape, we do not specify which inflecting form is shown even when the syntax allows us to choose one or the other. For example, both the adnominal and the conclusive form of the verb yuk- ‘to go’ are yuku; it is impossible to tell which inflection this is just by the shape of the word. The verb in this case is assigned the @inflection value “adnconc” and not “adnominal” or “conclusive”. Similarly, for conjugation classes which do not have a distinction between conclusive and infinitive, we mark those categories with the @inflection value “infconc”, see (5).
The reason for marking only morphologically distinct categories also at the level of individual conjugation classes is that it seems likely that there is a correlation between the inflected form of a clause predicate and the marking of its arguments, and that it therefore is important to distinguish between forms which are positively identifiable by their shape and, on the other hand, forms which on the basis of their shape may be assigned to either of two syncretic categories.

(5) adnconc   yuku
    infconc   ari

[8] Auxiliaries are inflecting suffixes, corresponding largely to the jodōshi (助動詞) of traditional Japanese grammar, e.g., negative -(a)zu or perfective -(i)te- and -(i)n-.
[9] The adjectival copula is the inflectional morpheme which usually follows adjective stems, with forms like conclusive -si, adnominal -ki, and infinitive -ku.

In (6) we give an example of markup of part of speech and inflection.

(6) a. 君       之    行    氣    長       成奴
       kimi     ga    yuki  ke    naga-ku  nari-nu
       my.lord  GEN   go    day   long     become-PERF
       ‘My lord, it has been a long time since you left’ (MYS 2.85)

    b. <w type="noun"> kimi </w>
       <w type="particle" subtype="case" function="gen"> ga </w>
       <w type="verb" inflection="infinitive"> yuki </w>
       <w type="noun"> ke </w>
       <w>
         <w type="adjective"> naga </w>
         <m type="adjcop" inflection="infinitive"> ku </m>
       </w>
       <w>
         <w type="verb" inflection="stem"> nari </w>
         <m type="auxiliary" inflection="conclusive" function="perf"> nu </m>
       </w>

4. Lexeme and Morpheme Identification

Each distinct item (word or morpheme) in the corpus is assigned a unique ID number. This has a number of advantages, in particular in making it possible to divorce searches in the corpus from actual strings of text.

● Searches for inflecting words or morphemes in the texts will not be limited to the actual inflected forms of an item. Thus, a search for the verb sin- ‘die’ will return all the inflected forms of that verb.
  However, searches can also be modified to give only a subset of forms, for example defined by specific inflected forms or combination with specific auxiliaries.

● Searches across time for items which have changed shape as a result of sound change will be straightforward. For example, as a result of sound change the verb OJ kwopwi- has a number of different shapes through time, as outlined above (1), and appears in texts from different periods in significantly different shapes (kwopi-, kopi-, kowi-, koi-). With unique ID numbering, it is not necessary to search for all of these shapes, but it is possible to search for all, or a specific set of, occurrences of this verb through the corpus, regardless of the actual shape of the verb at any particular stage.

● Searches are not contaminated by text strings which are identical to the intended target of a search. For example, the verb ‘request, ask’ OJ kop- has a number of forms which are segmentally identical with forms of ‘love’ from somewhere in the first half of the EMJ period (for example infinitive kopi, kowi, koi). With unique ID numbering, forms of one verb will not show up in searches for the other verb.

In our current practice, unique ID numbers consist of the letter ‘L’ and a six-digit number. They are assigned to a word (<w>) or morpheme (<m>) as an @ana attribute. For example, the form nari-nu (cf. (6) above) is marked as shown in (7).

(7) <w>
      <w ana="#L031317"> nari </w>
      <m ana="#L000018"> nu </m>
    </w>

The unique IDs are stored in a separate lexicon file, which is linked to the corpus and which contains basic information about each word or morpheme, including variant shapes of a form over time, its part of speech, conjugation class (where relevant), and a simple gloss. The information currently contained within a simple lexicon entry is as shown in (8).

(8) Shapes: From the 8th century: kwopwi- > From 800: kwopi- > From before 950: kopi- > From c. 950-1000: kowi- > From c. 1100: koi-
    Part of speech: verb
    Conjugation class: upper bigrade (上二段)
    Gloss: love

The information in (8) was extracted from the entry presented below in (9).

(9) <superEntry xml:id="L030731">
      <entry>
        <form type="stem">
          <orth stage="I">kwopwi-</orth>
          <orth stage="II">kwopi-</orth>
          <orth stage="III">kopi-</orth>
          <orth stage="V">kowi-</orth>
          <orth stage="VII">koi-</orth>
          <gramGrp>
            <pos>verb</pos>
            <iType type="UB"/>
          </gramGrp>
        </form>
        <def>love</def>
      </entry>
      <entry>
        <form type="noun">
          <orth stage="I">kwopwi</orth>
          <orth stage="II">kwopi</orth>
          <orth stage="III">kopi</orth>
          <orth stage="V">kowi</orth>
          <orth stage="VII">koi</orth>
          <gramGrp>
            <pos>noun</pos>
          </gramGrp>
        </form>
      </entry>
    </superEntry>

Here, the <superEntry> element defines the @xml:id for the lexical item. The <entry> element is used to indicate one or more related lexical entries. The <form> element can be further specified with the @type attribute, which we currently only use for verbs to indicate their “stem” and the derived “noun” form of a verb. Next, <orth> (orthography) presents the shape of the form (e.g., kwopwi-) and also has the @stage attribute corresponding to stages of phonological development in the pre-modern period. Grammatical information is presented in <gramGrp>. This includes part of speech <pos> and conjugation class <iType>. The example in (9) above is defined as @type="UB", which stands for “upper bigrade”. Finally, the meaning is presented in the <def> tag; where more than one meaning is possible, the element <sense> is also used.

As the research of the VSARPJ project progresses, additional grammatical information will also be entered into the lexicon. This will include information about the possible argument realization patterns of a verb. In this way, the lexicon will also be an important tool for organizing the results of our research as they appear.
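The link between corpus and lexicon described in this section can be sketched in a few lines of Python using the standard library's `xml.etree.ElementTree`. This is only an illustration under stated assumptions and not the project's own tooling: the `<lexicon>` wrapper element, the function names, the toy corpus fragment, and the ID `L999999` for kop- ‘request, ask’ are all hypothetical, and the rule that a stage without a listed shape falls back to the nearest earlier listed stage is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
STAGES = ["I", "II", "III", "IV", "V", "VI", "VII"]

# Stem entry from (9); the <lexicon> wrapper is a hypothetical root element.
LEXICON_XML = """
<lexicon>
  <superEntry xml:id="L030731">
    <entry>
      <form type="stem">
        <orth stage="I">kwopwi-</orth>
        <orth stage="II">kwopi-</orth>
        <orth stage="III">kopi-</orth>
        <orth stage="V">kowi-</orth>
        <orth stage="VII">koi-</orth>
        <gramGrp><pos>verb</pos><iType type="UB"/></gramGrp>
      </form>
      <def>love</def>
    </entry>
  </superEntry>
</lexicon>
"""

# Toy corpus fragment: two shapes of 'love' and one homophonous form of
# kop- 'request, ask' under a made-up ID (L999999 is hypothetical).
CORPUS_XML = """
<ab type="transliteration">
  <w type="verb" ana="#L030731">kopi</w>
  <w type="verb" ana="#L030731">kowi</w>
  <w type="verb" ana="#L999999">kopi</w>
</ab>
"""


def load_lexicon(xml_text):
    """Map each lexeme ID to its forms, keyed by the form's @type."""
    lexicon = {}
    for super_entry in ET.fromstring(xml_text).iter("superEntry"):
        forms = {}
        for entry in super_entry.findall("entry"):
            form = entry.find("form")
            forms[form.get("type")] = {
                "shapes": {o.get("stage"): o.text for o in form.findall("orth")},
                "pos": form.findtext("gramGrp/pos"),
                "gloss": entry.findtext("def"),
            }
        lexicon[super_entry.get(XML_ID)] = forms
    return lexicon


def shape_at(shapes, stage):
    """Return the shape for a stage, falling back to the nearest earlier one."""
    for candidate in reversed(STAGES[: STAGES.index(stage) + 1]):
        if candidate in shapes:
            return shapes[candidate]
    return None


def occurrences(corpus_xml, lexeme_id):
    """Collect the surface text of every element whose @ana points at the ID."""
    root = ET.fromstring(corpus_xml)
    return ["".join(el.itertext()).strip()
            for el in root.iter() if el.get("ana") == "#" + lexeme_id]


lexicon = load_lexicon(LEXICON_XML)
stem = lexicon["L030731"]["stem"]
print(stem["gloss"], shape_at(stem["shapes"], "IV"))
print(occurrences(CORPUS_XML, "L030731"))
```

Because the search matches on the @ana value rather than on the string, both shapes of ‘love’ are found while the homophonous kopi of the other verb is left out, which is the behaviour the three bullet points above describe.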
Finally, although outside the scope of the VSARPJ project, it should be mentioned that a lexicon linked to a text corpus by means of unique ID numbering has enormous potential for enriching the field of Japanese lexicography.

5. Syntax

Syntactic information is encoded by means of a minimal inventory of constituents, namely those of clause, <cl>, and phrase, <phr>. The @type attribute can be used to identify the clause or phrase as being an argument (predicate selected) or adjunct (e.g., free adverbials). Clauses can be embedded within other clauses as subordinate clauses. Adnominal, or relative, clauses are embedded within phrases. Nominalized clauses are first wrapped as clauses to show the clausal structure and then wrapped as phrases to put them on the same level as noun phrases. Predicate-selected clauses (including but not limited to complement clauses) are categorized by the @type attribute as arguments ("arg").

Phrases can be headed by adverbs and nominalized clauses, in addition to nouns. Phrases are categorized by the @type attribute as arguments if they are clearly predicate selected, and as adjuncts ("djunct") if they are clearly free adverbials or sentence adjuncts. At this stage of markup, a large proportion of phrases are marked neither as arguments nor as adjuncts, because their status is not entirely clear. Resolving the status of such phrases, and other important issues such as the determination of whether categories may be needed which are intermediary between the poles of argument and adjunct, or whether argumenthood is a scalar property, are parts of the substantive research of the VSARPJ project. The corpus will eventually reflect the results of this research.

The structure of both clauses and phrases is generally flat.[10] The words which can form predicates of clauses are verbs, adjectives, or copulas.
Within a clause, the word or words which form its predicate are identifiable by not being enclosed in phrase tags. Topics and right dislocated elements are located outside of the clauses they relate to. (10) exemplifies syntactic markup: (10a) shows a complex clause from the poem in (6a); (10b) shows the topic pito pa; (10c) shows the relative clause a ga kwopuru modifying kimi; and (10d) shows the right dislocated topic ware pa.

[10] Within phrases constituency is usually predictable from the sequence of constituents, but if not, constituency can be marked as necessary.

(10) a. Complex clause

        <cl>
          <cl>
            <phr type="arg"> kimi ga </phr>
            yuki
          </cl>
          <cl type="arg">
            <phr type="arg"> ke </phr>
            nagaku
          </cl>
          narinu
        </cl>

     b. Topic

        人       者    待跡       不来家留
        pito     pa    matedo     ko-zu-kyeru
        person   TOP   wait.CONC  come-NEG.INF-MPAST.ADN
        ‘Even though I wait for you, you do not come’ (MYS 4.589)

        <phr> pito pa </phr>
        <cl>
          <cl> matedo </cl>
          kozukyeru
        </cl>

     c. Relative clause

        吾      戀流      君
        a ga    kwopuru   kimi
        I GEN   love.ADN  lord
        ‘my lord, whom I love’ (MYS 4.485)

        <phr type="arg">
          <cl>
            <phr type="arg"> a ga </phr>
            kwopuru
          </cl>
          kimi
        </phr>

     d. Right dislocated topic

        野嶋          我    左吉   爾    伊保里  須        和礼  波
        Nwosima       ga    saki   ni    ipori   su        ware  pa
        [place name]  GEN   cape   DAT   hut     do.CONCL  I     TOP
        ‘me, I make a hut on the cape of Noshima’ (MYS 15.3606)

        <cl>
          <phr> nwosima ga saki ni </phr>
          ipori su
        </cl>
        <phr> ware pa </phr>

6. An Example of Full Markup

Finally in this section, we provide as an example the full markup of the text in (6a) above.
(11)

<ab type="original" xml:lang="ojp">
  君之行
  <lb xml:id="MYS.2.85-orig_1" corresp="#MYS.2.85-trans_1"/>
  氣長 奴
  <lb xml:id="MYS.2.85-orig_2" corresp="#MYS.2.85-trans_2"/>
  山多都祢
  <lb xml:id="MYS.2.85-orig_3" corresp="#MYS.2.85-trans_3"/>
  迎加将行
  <lb xml:id="MYS.2.85-orig_4" corresp="#MYS.2.85-trans_4"/>
  待尓可将待
</ab>
<ab type="transliteration" xml:lang="ojp-Latn">
  <s>
    <cl>
      <cl>
        <phr type="arg">
          <w type="noun" ana="#L042066"> <c type="logo">kimi</c> </w>
          <w type="particle" subtype="case" function="gen" ana="#L000503"> <c type="logo">ga</c> </w>
        </phr>
        <w type="verb" inflection="infinitive" ana="#L031840"> <c type="logo">yuki</c> </w>
      </cl>
      <lb/>
      <cl type="arg">
        <phr type="arg">
          <w type="noun" ana="#L050033"> <c type="phon">ke</c> </w>
        </phr>
        <w>
          <w type="adjective" ana="#L007007"> <c type="logo">naga</c> </w>
          <m type="adjcop" inflection="infinitive" ana="#L000033"> <c type="logo">ku</c> </m>
        </w>
      </cl>
      <w>
        <w type="verb" inflection="stem" ana="#L031317"> <c type="logo">nari</c> </w>
        <m type="auxiliary" function="perf" inflection="conclusive" ana="#L000018"> <c type="phon">nu</c> </m>
      </w>
    </cl>
  </s>
  <lb/>
  <s>
    <cl>
      <phr>
        <cl>
          <cl type="djunct">
            <phr type="arg">
              <w type="noun" ana="#L050034"> <c type="logo">yama</c> </w>
            </phr>
            <w type="verb" inflection="infinitive" ana="#L031047"> <c type="phon">tadune</c> </w>
          </cl>
          <lb/>
          <w type="verb" inflection="infinitive" ana="#L031722"> <c type="logo">mukape</c> </w>
        </cl>
        <w type="particle" subtype="foc" ana="#L000506"> <c type="phon">ka</c> </w>
      </phr>
      <w>
        <w type="verb" inflection="stem" ana="#L031840"> <c type="logo">yuka</c> </w>
        <m type="auxiliary" function="conjectural" inflection="adnconc" ana="#L000002"> <c type="logo">mu</c> </m>
      </w>
    </cl>
  </s>
  <lb/>
  <s>
    <cl>
      <phr>
        <cl>
          <w type="verb" inflection="stem" ana="#L031644"> <c type="logo">mati</c> </w>
        </cl>
        <w type="particle" subtype="case" function="dat" ana="#L000519"> <c type="phon">ni</c> </w>
        <w type="particle" subtype="foc" ana="#L000506"> <c type="phon">ka</c> </w>
      </phr>
      <w>
        <w type="verb" inflection="stem" ana="#L031644"> <c type="logo">mata</c> </w>
        <m type="auxiliary" function="conjectural" ana="#L000002" inflection="adnconc"> <c type="logo">mu</c> </m>
      </w>
    </cl>
  </s>
</ab>

Conclusion

This small inventory of syntactic elements and conventions for their use, combined with the material they can contain, will allow unique identification of at least all of these elements or properties in the corpus: topics, right dislocated elements, focused elements, noun phrase heads, particle scope, clause predicates (including analytic predicates), zero marked arguments, topicalized arguments, relative order of case marked and zero marked arguments (including ordering relative to focused elements), and clause types (main, subordinate, adnominal, nominalized). Furthermore, all such elements and properties, as well as combinations of them, and combinations with other items and properties coded in the corpus will be searchable and extractable from the corpus. For example, we will be able to use the corpus to extract all attested syntactic frames for individual verbs, within individual stages of the language as well as across different stages. All of this is highly relevant, not just to the VSARPJ research project, but also more generally and widely to investigation of most features of pre-modern Japanese syntax.[11]

[11] Needless to say, these coding conventions easily lend themselves to the creation of equally powerful corpora of modern Japanese.
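The kind of extraction described in the Conclusion can be illustrated with a minimal sketch in Python. It is not the project's own tooling: it assumes the simplified markup of example (10a), in which words are plain text rather than <w> elements, and the function names are hypothetical. For each <cl> it takes the text that is not enclosed in a <phr> or a nested <cl> as the predicate, and the direct children marked type="arg" as its arguments.

```python
import xml.etree.ElementTree as ET

# Simplified markup copied from example (10a).
MARKUP = """<cl>
  <cl> <phr type="arg"> kimi ga </phr> yuki </cl>
  <cl type="arg"> <phr type="arg"> ke </phr> nagaku </cl>
  narinu
</cl>"""


def flat_text(elem):
    """All text inside an element, with whitespace normalized."""
    return " ".join("".join(elem.itertext()).split())


def predicate_words(cl):
    """Words directly inside a clause, i.e. not inside <phr> or a nested <cl>."""
    pieces = [cl.text or ""]
    pieces.extend(child.tail or "" for child in cl)
    return " ".join(pieces).split()


def frames(root):
    """Pair each clause's predicate with its type="arg" children (hypothetical helper)."""
    result = []
    for cl in root.iter("cl"):
        args = [flat_text(child) for child in cl if child.get("type") == "arg"]
        result.append((" ".join(predicate_words(cl)), args))
    return result


FRAMES = frames(ET.fromstring(MARKUP))
for predicate, args in FRAMES:
    print(predicate, args)
```

On the fully marked-up text in (11), where every word is wrapped in <w>, the same idea would instead select the <w> children of a clause that are not inside a <phr>.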
Abbreviations

General
TEI      Text Encoding Initiative
VSARPJ   Verb Semantics and Argument Realization in Pre-modern Japanese

Grammatical Terms
ADN      Adnominal
CONC     Concessive
CONCL    Conclusive
DAT      Dative
EMPH     Emphatic
ETOP     Emphatic topic
HON      Honorific
NEG      Negative
OPT      Optative
RESP     Respect
TOP      Topic

Languages
EMJ      Early Middle Japanese
LMJ      Late Middle Japanese
MJ       Middle Japanese
NJ       Modern Japanese
OJ       Old Japanese

Texts
MYS      Man’yōshū

References

Frellesvig, Bjarke and Whitman, John, eds. 2008. Proto-Japanese: Issues and Prospects. Amsterdam: John Benjamins.
Frellesvig, Bjarke. 2010. A History of the Japanese Language. Cambridge: Cambridge University Press.
Frellesvig, Bjarke; Horn, Stephen Wright; Russell, Kerri L.; Sells, Peter. The Oxford Corpus of Old Japanese. http://vsarpj.orinst.ox.ac.uk/corpus/corpus.html.
Levin, Beth and Hovav, Malka Rappaport. 2005. Argument Realization. Cambridge: Cambridge University Press.
Text Encoding Initiative. (n.d.) P5: Guidelines for Electronic Text Encoding and Interchange. http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index-toc.html.
Chung-Hwa Buddhist Journal (2012, 25:105-128) Taipei: Chung-Hwa Institute of Buddhist Studies
中華佛學學報第二十五期 頁 105-128 (民國一百零一年),臺北:中華佛學研究所
ISSN:1017-7132

The XML-Based DDB: The DDB Document Structure and the P5 Dictionary Module; New Developments of DDB Interoperation and Access

Charles Muller (University of Tokyo)
Kiyonori Nagasaki (International Institute for Digital Humanities, General Incorporated Foundation)
Jean Soulat (Independent scholar)

Abstract

This paper has three parts. The first part, by A. Charles Muller,[1] consists of a comparative analysis of the DDB structure with that of the Dictionary Module in the current TEI P5 recommendations. The second and third parts are short summaries of the recent applications offering enhanced access to and usage of the DDB, created by Kiyonori Nagasaki[2] and Jean Soulat.[3]

Keywords: Lexicons, XML, TEI, Web API, Interoperability

[1] A. Charles Muller teaches Buddhism, East Asian thought, and a little bit of XML at the University of Tokyo. He is the founder and chief editor of the Digital Dictionary of Buddhism and its companion Chinese-Japanese-Korean-Vietnamese/English Dictionary (CJKV-E). He is also the founder and managing editor of the scholarly network H-Buddhism (<http://www.h-net.org/~buddhism>). His primary fields of research are Korean Buddhism and East Asian Yogācāra/Tathāgatagarbha thought, along with occasional forays into Zen, Confucianism, and Daoism. A listing of his books and articles on these topics can be accessed through his web site, Resources for East Asian Language and Thought (<http://www.acmuller.net/index.html>).

[2] Kiyonori Nagasaki (永崎研宣) has an M.A. in Buddhist Studies from Tsukuba University, and is best known for his work as the primary developer handling the SAT Taishō Database and the INBUDS Database in Tokyo. He has developed a range of support structures to provide interoperation between SAT and the DDB, as well as INBUDS. He also wrote the Perl code for our “Feedback” option.
[3] Jean Soulat is a telecom engineer with a personal interest in Buddhism and Chinese culture. He has worked in the area of computer networks since the early days of the French public data network and then with different large scale networking and IT programs. He has created the application tool named Smarthanzi (<http://www.smarthanzi.net>) for looking up Sinitic words and characters in East Asian texts. Based on Smarthanzi, he has also created a specialized application for the DDB, called DDB Access (<http://download.smarthanzi.net/ddbaccess>), which adds extensive functionality to the standard DDB lookup.

以可擴展標記語言(XML)為基礎的電子佛教字典 (DDB): DDB文件結構與P5字典模組; DDB 相互操作與取用技術的新發展

Charles Muller (東京大學)
永崎研宣 (一般財団法人人文情報學研究所)
Jean Soulat (獨立學者)

摘要

此篇文章為三部分。第一部分由Charles Muller所寫,以現行的TEI P5所建議的字典模組,對DDB結構之比較分析。第二與第三部分則是由Kiyonori Nagasaki 與 Jean Soulat 所寫,簡短地概述近來提供加強對於DDB的取用技術與使用。

關鍵詞:詞典、XML、TEI、網路應用程式介面、相互操作

The DDB Document Structure and the P5 Dictionary Module

Charles Muller

No doubt many of us who began our engagement in the development of web-based canonical collections, online databases, and various other research tools related to Buddhist Studies and East Asian studies at the time of the inception of the WWWeb (circa 1994-95) look back in sheer amazement at the fact that almost fifteen years have passed since we made our most rudimentary stabs at developing these materials. At that time, Unicode, XML, Yahoo, Google, Internet Explorer, and scores of other now-commonplace Internet tools were yet to be heard of. In a short decade and a half, our way of doing research, and especially textual research, has been radically transformed. Because of this radical change, young scholars coming into our field today need an entirely different set of skills for finding and organizing information.
On the other hand, they no longer need, upon their departure from graduate school, to figure out how they are going to afford their first printed Taishō canon, and all the dictionaries and other reference tools needed to work with East Asian Buddhist texts. Most of these are now available digitally, and online, in one format or another. And these young scholars will have far more than simply the printed Taishō, Zokuzōkyō and other smaller canonical collections at their disposal, as new, heretofore unavailable materials are being made searchable and downloadable. A main case in point is the newly developing Chan Texts Database, which will make available a variety of Chan texts, along with Dunhuang materials which were almost impossible to get one's hands on before this. And of course, for working with these texts, there is the DDB.

As has been explained over the years in numerous other presentations and project reports, I began my compilation of what turned out to be the DDB in my early days in graduate school (1986), having become aware of the incredible dearth of adequate lexicographical and other reference works in the English language for the textual scholar of East Asian Buddhism in particular, and East Asian philosophy and religion in general. I worked at compiling terms for about ten years, and in 1995, shortly after the birth of the WWWeb, uploaded the collection that I had gathered up to that time to my first web site, and the rest is history.[4]

[4] For various accounts of the development of the DDB up to its present state, please see the bibliography, which provides a fairly complete listing of presentations, both published and unpublished.

Suffice it to say that the DDB has become the de facto choice among reference tools for young Western scholars doing work involving East Asian Buddhism.
It is introduced as a primary reference work in all major North American universities that have programs dealing with East Asian Buddhism; it is supported in terms of content and programming by more than sixty scholars, many of whom are recognized as leading figures in their own sub-areas of Buddhist Studies or Information Technology; and it is presently subscribed to by twenty-eight university libraries.[5] It is also now accessible through online canonical text databases such as that of the SAT Taishō Database,[6] and is included in various Han-character-based lookup tools, including Smarthanzi,[7] the WWWebJDic Server,[8] and Tangorin.[9]

In prior papers dealing with the DDB, I have explained various aspects of the project, including history, design, collaboration strategies, XML structure, and so forth.[10] Here, I would like to focus on a specific issue with the present XML structure, paying special attention to its relation with the TEI P5 Dictionary Module.

At the conference where this paper was originally presented (which is the basis for the present volume), a significant portion of the presentations dealt with XML in one way or another. What most of them had in common, however, was their presentation of XML as a way of marking up pre-existent materials, whether they be pre-existent canonical collections, lexicons, or whatever. The DDB was unusual among the presented projects at this venue in that it was one of the very few where XML was shown as a framework for the development of a new data set from the ground up, and which, working through XSLT, provides the systematic structure for an online database-reference resource. Indeed, among online academic reference tools of its kind, the DDB as a fully XML structured resource is unusual, since most online reference resources tend to be run from a more traditional database structure.
[5] See <http://www.buddhism-dict.net/ddb/subscribing_libraries.html>.
[6] See <http://21dzk.l.u-tokyo.ac.jp/SAT/ddb-sat2.php>.
[7] Also available in the Windows desktop application DDB Access; both of these are available at <http://www.smarthanzi.net>; to be discussed by its developer below.
[8] <http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1C>
[9] <http://tangorin.com>
[10] See JODI (2002).

The original choice of XML to structure the DDB data is basically an accident of history, related to the background of the people from whom I received my earliest technical advice. Most important in this regard is Christian Wittern, who discovered my earliest, hard-linked HTML version of the DDB on the Web sometime in 1995 or 1996. He applied a basic SGML structure to the data, where the tags referred to elements of the content and document structure, rather than being the mere style commands of HTML. Christian sent me a copy of his SGML-restructured data, along with SoftQuad Panorama, a freeware viewer for SGML. I knew nothing of SGML at this time, but could see that it could be quite useful to mark up the data with content-meaningful tags as opposed to simple HTML style markers. Before long, the news of the impending release of the new XML standard to replace SGML had many people excited, and it seemed as if major software companies intended to support it, so I converted the DDB to XML, and stored it that way during the next couple of years, running an array of MS-Word macros to generate new HTML files periodically, uploading these to my web site. But generating files this way each time was a convoluted and time-consuming process.

Around 2000, I knew that people were beginning to publish XML materials on the web via XSL, and that more and more major markup and publication projects were turning to XML. But there were virtually no examples of serious real-world implementation other than in brief W3C explanatory materials.
And without any kind of precedent available from which to learn, my programming skills were entirely insufficient for trying to implement raw XML on the web on my own. At the same time, there were really no major data sets like my own readily available for testbed purposes, so newly appearing XML software development companies had no way to thoroughly test their new applications on actual large and complex data sets. On my website at the time, I had a description of the content and structure of my data (which was, importantly, multilingual, including Chinese, Korean, and Japanese scripts, and a fairly large range of diacritical characters), and I was contacted by a few companies, including Microsoft and Altova (XMLSpy), who asked to use my data for testing their XML software currently under development.[11]

After having been contacted by these companies, it occurred to me that there might be other individual XML developers who could take advantage of the DDB data for their own purposes, and at the same time help me to begin to take proper advantage of the XML structure and begin delivering the data on the web in real time, through XSL and whatever else might be required. I posted a message on the Mulberry XSL list inquiring as to whether anyone was interested in working with my data in this way. Within a week, I received a response from Michael Beddow, who, in connection with work he was doing on the web-based version of the Anglo-Norman Dictionary,[12] expressed a willingness to try to get the XML-DDB up and running on a server. In a very short time, he accomplished this to a level far beyond what I could have ever hoped. Since both Michael and I have recounted the main points of this landmark task in some detail in the past,[13] I will not go into great detail retelling this stage of the process, except to say that Michael is still providing the basic technical support for the project, including security as well as the basic delivery of the data, for which I, and thousands of researchers of Buddhism around the world, can be eternally grateful.

[11] I agreed to both of these requests. From Microsoft, I never even received so much as a thank-you note. Altova gave me one free upgrade (to my already-purchased license), but then forgot about me, demanding that I pay the full price for their next Enterprise version. This turned out to be a major motivation toward my efforts to learn Emacs. Also, luckily, not too much later, <oXygen/> made its appearance on the scene, with its much more reasonably priced, fully-featured XML editor, and much more humane support staff.
[12] <http://www.anglo-norman.net/>

This simple but elegant XML/Perl/XSL delivery system developed by Michael has functioned in the same way, basically unchanged, for almost a decade, and technically speaking, there have been no special demands or changes to our system that XML/XSL can't deal with, so although the suggestion of changing over to a traditional database system has been made to me from time to time, I have never felt the need to give it serious consideration. Although a database setup such as MySQL may be a bit faster in retrieving entries, having the data in XML format allows me to fully integrate it with the rest of my work on my desktop. Since I do virtually all of my scholarly research and translation in XML, and maintain various related data sets in XML or plain text format, having the DDB in XML while using the same basic tag structure for the rest of my documents makes it very easy to move things back and forth.

Having mentioned the fact that I use the same basic tagging structure in the rest of my work, I would like, from here, to deal with a technical aspect of the project that I have touched on briefly from time to time, but have never really worked through in detail: that is, the relationship of the structure of the DDB to the TEI document model.
I have been using TEI for my writing and most of the other phases of my work for about eight years now. Also, the two major technical contributors to the DDB project, Christian Wittern and Michael Beddow, are persons well-versed in the development and implementation of TEI. Since TEI has a subset of tags specifically designed for the structuring of lexical materials, it might be reasonable to assume that the DDB would be a fully TEI-based project. It is to a significant degree. Since I have been using TEI in my work for several years, it has been the case that when I have needed a new tagging structure for the DDB, I have always first checked the TEI tag set to search for an appropriate tag. Almost always finding one, I have done my best to implement new elements in the DDB according to TEI hierarchical rules and with the recommended attributes. Thus, the content of the <sense> nodes in the DDB (discussed in further detail below) is fully TEI(P4)-compliant. This covers many sub-structures, including <list>, <biblStruct> and many other basic prose structures necessary for writing short dictionary entries, as well as encyclopedic entries, basically replicating the rules of what would be allowable inside the TEI <p> element.

[13] See the JODI (2002) article, ibid. I have also discussed Michael's role in the project in a few other articles.

For the nodes above, and outside <sense> however, the structure of DDB entries is somewhat different from the sort of thing that one would build if one were to start from the ground up with the present TEI P5 Dictionary Module. When the XML for the DDB was first set up, there was no special intention to reject the TEI (at that time P4) structure. Christian Wittern and I sat down at a conference one time and tried to write a tag structure that best fit that of the DDB at the time. At this time I knew nothing of TEI, and Christian was just getting seriously involved in this Initiative.
Thus, while this initial structure was informed by TEI concepts, it tended to conform more closely to the actual structure of the DDB, rather than trying to force a full TEI framework. The basic structure of a DDB entry is currently like this:

<entry> (one dictionary entry)
  <hdwd> (Chinese logographic headword)
  <pron_list> (grouping the pronunciations into a separate node)
    <pron> (pronunciations of the headword in various East Asian languages, in roman script as well as native syllabaries)
    <pron>
    <pron>
    ...
  </pron_list>
  <sense_area> (grouping semantic/content information)
    <trans> (a short, primary translation or meaning of the head word)
    <sense> (explanatory portion of the headword, for which there is usually more than one)
    <sense>
    ...
  </sense_area>
  <dictref> (list of references to entries for the term in other major reference works)
    <dict>
    <dict>
    ...
  </dictref>
</entry>

Filled out with attributes and data, a relatively short sample entry looks like this:

<entry ID="b9403" added_by="cmuller" add_date="1993-09-01" update="2009-11-25" rad="金" radval="08" radno="167" strokes="12">
  <hdwd>鐃</hdwd>
  <pron_list>
    <pron lang="zh" system="py" resp="c.wittern">naó</pron>
    <pron lang="zh" system="wg" resp="cmuller">jao</pron>
    <pron lang="ko" system="hg" resp="cmuller">요</pron>
    <pron lang="ko" system="mc" resp="cmuller">yo</pron>
    <pron lang="ko" system="mr" resp="cmuller">yo</pron>
    <pron lang="ja" system="kk" resp="cmuller">ドウ</pron>
    <pron lang="ja" system="hb" resp="cmuller">nyō</pron>
    <pron lang="vi" system="qn" resp="daouyen">nao</pron>
  </pron_list>
  <sense_area>
    <trans resp="cmuller" rend="hide">a <term lang="en">hand-bell</term></trans>
    <sense resp="cmuller" ref="Yokoi,Hirakawa">Cymbals.(Skt. <term lang="sa-mw" n="11740">tūrya</term>) <bibl type="canonlink">法華經 <xref canonref="http://21dzk.l.u-tokyo.ac.jp/SAT/T0262_,09,0009a11:0262_,09,0009b11.html">T 262.9.9a13</xref></bibl> </sense>
  </sense_area>
  <dictref>
    <dict><title>Zengaku daijiten (Komazawa U.)</title><page>989b</page></dict>
    <dict><title>Japanese-English Zen Buddhist Dictionary (Yokoi)</title><page>512</page></dict>
    <dict><title>Ding Fubao</title><page/></dict>
    <dict><title>Buddhist Chinese-Sanskrit Dictionary (Hirakawa)</title><page>1193</page></dict>
    <dict><title>Bukkyō daijiten (Mochizuki)</title><page>(v.16)1307b,2595b,4137a</page></dict>
    <dict><title>Bukkyō daijiten (Oda)</title><page>1370-1</page></dict>
  </dictref>
</entry>

We will get into the treatment of comparative issues in detail below, but just to provide the reader with some context, it is probably useful to have some idea of the basic P5 recommendation for dictionary structures,[14] which, as provided on the TEI web site, is like this:

<entry>
  <form>
    <orth>disproof</orth>
    <pron>dis"pru:f</pron>
  </form>
  <gramGrp>
    <pos>n</pos>
  </gramGrp>
  <sense n="1">
    <def>facts that disprove something.</def>
  </sense>
  <sense n="2">
    <def>the act of disproving.</def>
  </sense>
</entry>

As we can see, the fundamental elements <entry>, <pron>, and <sense> are used in the DDB for the same purposes, and with basically the same kind of hierarchical structure. The most glaring difference is seen in the DDB element <hdwd> (head word), which is idiosyncratic, an odd tag that I created during a short period in which the DDB was stored in a mixed structure of XML and HTML tags, and the attempt to use the tag <head> for head words produced obvious problems in HTML. It would have been better to get rid of this at an earlier stage, but opportunities were missed, so it remains here, embedded at the core of the DDB. In P5, the corresponding tag would probably be <orth>.
Beyond this, the other major difference is the presence in the DDB of the node <dictrefs>, within which information is contained regarding references on the same term in other dictionaries.

[14] <http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-entry.html>

Before we address specific issues of tags and their structure, a word regarding the nature of the data set itself is in order. That is, the East Asian notion that is translated into English as “dictionary”, 辭典 (Ch. cidian; K. sajeon; J. jiten), sometimes refers to something that is basically the equivalent of a Western dictionary. But it is also often something quite different, in the sense that it may well end up containing entries that are more like those of an encyclopedia in terms of length and complexity. And there is less linguistically oriented information (such as grammatical forms and so forth). Another distinctive characteristic of East Asian works of this type is that, with Sinitic Buddhism being a pan-East Asian phenomenon, the Chinese logographic head words have distinct pronunciations in Mandarin, Korean, Japanese, and Vietnamese (including variant readings within these languages), with these being represented in both native syllabaries and Western romanization systems. Since the TEI dictionary module is basically constructed upon the Western model, problems will be evident from the start.

Acknowledging these points, let us try to see what would be involved in bringing the DDB structure in line with TEI P5. For the moment, we will leave the level of attributes aside, focusing on elements. The first obvious change would be that of replacing <hdwd> with <orth>. This would not be terribly difficult, since a global replacement throughout the data and XSL files should not result in any major problems.
Next, removing the <pron_list> wrapper from the level above the <pron> elements would not pose major problems at the XML level, but it would require some degree of rewriting of the style sheet; the same would be true for removing the <sense_area> wrapper from around the senses. The TEI element <gramGrp> is not relevant to the DDB.

A major consideration would be the conversion of the <dictrefs> area. This is an idiosyncratic component of the DDB, since it is not customary for dictionaries in general, whether they be Eastern or Western, to provide a list of references in other dictionaries or encyclopedias. Among the child nodes offered in TEI entry/dictionaries, the only thing that appears to come close to this is the element <xr> (x-reference). But if we used this, it would probably be more appropriate to use it in the place of the <dict> reference, rather than the wrapper <dictrefs>. With this kind of list, including essentially bibliographical references, it would be helpful for styling and other programming purposes to have a wrapper for this list of <xr> elements, something playing a role similar to <listBibl>. Actually, the inclusion of <listBibl> (or something like it) as a possible child of <entry> would be very helpful in this case. Then we could convert the <dictrefs/dict> structure into a basic TEI <listBibl/bibl> tree (<listBibl> does appear under <xr>, so this would be another possible route). But again, this is a special dimension of the DDB, and not something that one would see needed for dictionary entries in general.[15]

[15] In this regard, the DDB is often more like an encyclopedia than a dictionary, but the TEI does not at the moment have an encyclopedia module. A discussion of encyclopedias on TEI-L (<http://listserv.brown.edu/archives/cgi-bin/wa?A0=TEI-L>) at the end of 2009 concluded in the recommendation for the encyclopedia maker to structure his data with a series of nested <div> tags with various attributes making the content distinctions.

A provisional rewrite of the DDB entry structure, based on the above changes, would now look something like this (shortening some sections for the sake of readability):

<entry ID="b9403" added_by="cmuller" add_date="1993-09-01" update="2009-11-25" rad="金" radval="08" radno="167" strokes="12">
  <form>
    <orth>鐃</orth>
    <pron lang="zh" system="py" resp="c.wittern">naó</pron>
    <pron lang="ko" system="hg" resp="cmuller">요</pron>
    <pron lang="ja" system="kk" resp="cmuller">ドウ</pron>
  </form>
  <sense type="brief"><def>a <term lang="en">hand-bell</term></def></sense>
  <sense type="normal" resp="cmuller" ref="Yokoi,Hirakawa">Cymbals.(Skt. <term lang="sa-mw" n="11740">tūrya</term>) <bibl type="canonlink">法華經<xref canonref="http://21dzk.l.u-tokyo.ac.jp/SAT/T0262_,09,0009a11:0262_,09,0009b11.html">T 262.9.9a13</xref></bibl> </sense>
  <xr>
    <bibl><title>Buddhist Chinese-Sanskrit Dictionary (Hirakawa)</title><biblScope type="pages">1193</biblScope></bibl>
    <bibl><title>Bukkyō daijiten (Mochizuki)</title><biblScope type="pages">(v.16)1307b,2595b,4137a</biblScope></bibl>
    <bibl><title>Bukkyō daijiten (Oda)</title><biblScope type="pages">1370-1</biblScope></bibl>
  </xr>
</entry>

The next level of conversion, that of attributes, gets more complicated, as the DDB utilizes a number of attributes that are not contained either as attributes or elements in the dictionary module or elsewhere in the TEI P5 tag set, as far as I can determine. The character of the attributes currently used in the DDB can serve to draw our attention to some of the distinctive aspects of the DDB mentioned above.
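The element-level changes discussed above (renaming <hdwd> to <orth>, dropping the <pron_list> and <sense_area> wrappers, and recasting the dictionary references as <xr>/<bibl>) can be sketched as a small transformation. This is only an illustration in Python under stated assumptions, not the project's XSL or macro tooling: the function name to_p5_like is hypothetical, the input is a shortened copy of the sample entry, and the output is not validated against any TEI schema.

```python
import copy
import xml.etree.ElementTree as ET

# Shortened copy of the sample DDB entry, for illustration only.
DDB_ENTRY = """<entry ID="b9403">
  <hdwd>鐃</hdwd>
  <pron_list>
    <pron lang="zh" system="py">naó</pron>
    <pron lang="ko" system="mc">yo</pron>
  </pron_list>
  <sense_area>
    <trans>a <term lang="en">hand-bell</term></trans>
    <sense>Cymbals.</sense>
  </sense_area>
  <dictref>
    <dict><title>Ding Fubao</title><page/></dict>
    <dict><title>Buddhist Chinese-Sanskrit Dictionary (Hirakawa)</title><page>1193</page></dict>
  </dictref>
</entry>"""


def to_p5_like(entry):
    """Build a P5-style <entry> from a DDB-style one (hypothetical helper)."""
    new = ET.Element("entry", dict(entry.attrib))
    # <hdwd> becomes <orth>; the <pron> elements join it inside <form>.
    form = ET.SubElement(new, "form")
    ET.SubElement(form, "orth").text = entry.findtext("hdwd")
    for pron in entry.iterfind("pron_list/pron"):
        form.append(copy.deepcopy(pron))
    # The <sense_area> wrapper is dropped; <trans> becomes a brief <sense>/<def>.
    for child in entry.iterfind("sense_area/*"):
        item = copy.deepcopy(child)
        if item.tag == "trans":
            item.tag = "def"
            ET.SubElement(new, "sense", {"type": "brief"}).append(item)
        else:
            item.set("type", "normal")
            new.append(item)
    # Dictionary references become <bibl> entries under a single <xr>.
    xr = ET.SubElement(new, "xr")
    for ref in entry.iterfind("dictref/dict"):
        bibl = ET.SubElement(xr, "bibl")
        ET.SubElement(bibl, "title").text = ref.findtext("title")
        pages = ref.findtext("page")
        if pages:  # an empty <page/> yields no <biblScope>
            ET.SubElement(bibl, "biblScope", {"type": "pages"}).text = pages
    return new


P5 = to_p5_like(ET.fromstring(DDB_ENTRY))
print(ET.tostring(P5, encoding="unicode"))
```

The sketch copies attributes through unchanged, so the entry-level and sense-level attributes that have no P5 counterpart would still need a customized schema.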
That is, rather than being the markup of some pre-existent lexicon, the DDB is a new work in progress. To properly embed information related to the development of each entry, the attributes attached to our <entry> tag contain several pieces of information that provide important history regarding the entry, as well as categorizing and sorting information. These include, at the entry level, @added_by, @add_date, and @updated. Interestingly, the TEI has always shown concern about this kind of documentation, as these kinds of elements have always been part of TEI document headers. But as far as I can tell, there is no mechanism for recording this kind of information at the level of entries or entry child nodes in a reference work. So if we tried to convert to P5, these would need to be added to a customized schema. Similarly, at the <sense> level, the @resp, @source, and @ref attributes are critical to the DDB for keeping clear records of sources, contributions, responsibility, and related references. Unless I have missed some alternative way of dealing with these in the Guidelines, it seems that the committee that developed the dictionary module had in mind the markup of pre-existent dictionaries, rather than the collaboration-based creation of a new dictionary, when they created this attribute structure.

Would it be worth the effort to convert to P5?

The thought of going through this present comparison of the DDB entry structure with that of P5 has been on my mind for some time. Why would one go through the trouble of making this kind of major conversion in an XML structure that is working fine as it is? There would be a few significant advantages to doing this. The first reason is that, as mentioned above, most of the rest of the academic research and writing that I am doing is being composed in TEI P5. Having a DDB structure that is fully TEI compatible would allow me to freely copy data back and forth without generating non-validity problems at either end.
Second, this would allow the usage of the same basic style sheets for all of my projects. Third, full TEI compatibility would allow me to take advantage of other tools produced by members of the TEI community, including its schemas, and CSS/XSL sheets.

There are, however, a couple of significant drawbacks. First, it would require not only a major reworking of the data and the style sheets, but also a reworking of scores of MS-Word macros that have been the background for the actual production of the data for more than a decade. So careful consideration is needed before taking the leap.

Interoperation I: The DDB and SAT

Kiyonori Nagasaki

The digitization of the resources for Buddhist studies, as well as those for other fields of academic inquiry, has now been in progress for a few decades. As a result of the diligent efforts of those engaged in various digitization projects, researchers of Buddhism now have access to a wide range of electronic materials, a state of affairs that serves to enhance the efficiency, accuracy, and overall quality of their research. The emergence of the Web environment has been the fundamental catalyst allowing a wide range of new ways of storing, representing, and sharing resources. Recently, the next evolution of the Web, known as Web 2.0, has brought about a transformation in the delivery and handling of digital scholarly resources for all kinds of research. Most important here is the availability of the AJAX technology and Web API, which have enhanced the ways of sharing and delivering Web resources by leaps and bounds. The dissemination of cloud computing technology will further serve to support these kinds of developments.

Even only a decade or so ago it was taken for granted that for complementary digital resources, such as text and lexicon, to work together effectively, they had to be integrated one way or another into a single database format.
While this may well still happen where both resources are developed by the same individual or within a single project, if the resources were developed by separate entities, combining both into a single structure would usually entail the loss of independence or identity for one party or the other. In recent years, however, the situation has changed significantly: by adopting AJAX, Web APIs, and similar technologies, those who have been developing Web-based resources in the Humanities are able to interoperate between projects while each project maintains full independence. The prominent example to be offered here is the interoperation developed (starting in 2008) between the SAT Taishō Database [16] and the DDB and INBUDS (Indian and Buddhist Studies Treatise Database) [17] on the Web using AJAX technology. Since April of 2008, the SAT Web service has provided a function whereby, if the user selects a portion of kanbun text from the Taishō canon with the mouse, a list of the terms within that text that are available in the DDB is generated alongside the text, along with English headwords and links into the DDB itself. We are continuing to enhance various aspects of this function.

[16] <http://21dzk.l.u-tokyo.ac.jp/SAT/search.php>
[17] <http://21dzk.l.u-tokyo.ac.jp/INBUDS/search.php>

Since the presentation of this application at the Chan texts conference in Oslo in October 2009, SAT has been providing further new functions implemented with AJAX and Web APIs. Previously, users could search for related articles in the INBUDS database (maintained by the Japanese Association of Indian and Buddhist Studies [18]), but were only able to elicit basic bibliographical reference information.
Under this new function, users are also able to obtain PDF files of the articles (when PDFs are available) by clicking on the PDF icon displayed at the end of each line of the search results. Clicking on the icon opens a page within the CiNii [19] service that includes a link to the PDF file. This PDF file service is provided for the whole academic community, not only for Buddhist Studies or the Humanities. CiNii distributes its bibliographic data, together with the PDF file information, through its Web API service. INBUDS has taken full advantage of this public service by providing a Web API that allows other Web services to retrieve the INBUDS search results, including their PDF file information. The SAT Web service has implemented this, but it is important to know that every scholarly Web service is welcome to enrich itself by taking advantage of CiNii's offering. Furthermore, the SAT Web service has been contributing to CiNii's efforts by providing some Web APIs of its own. SAT is also planning to provide more efficient APIs so that other Buddhist service providers can also deliver better services.

By adopting AJAX and Web APIs, each project or service can enrich other services while maintaining its independence as an individual project. In this kind of Web environment, we will have the opportunity to work together not only as isolated contributors of data but also as individual and cooperative service providers, so that researchers in our field can benefit from more efficient services. By so doing, our study and inner space will be greatly enriched.

[18] <http://www.jaibs.jp>
[19] CiNii (Scholarly and Academic Information Navigator, pronounced like “sigh-knee”) is a database service maintained as a Japanese government project by the National Institute of Informatics, which enables searching of information on academic articles published in academic society journals or university research bulletins, or articles included in the National Diet Library's Japanese Periodicals Index Database.
<http://ci.nii.ac.jp/en>

1. DDB Parsing From the SAT Database

1.1. SAT Text View

The user opens the desired text by scrolling or by computer search.

1.2. Selecting and Generating a Word List

One then selects a portion of text with the mouse, upon which the DDB words contained in the selected text are arranged in a list on the left.

1.3. Lookup in the DDB

Clicking on a term in the list opens the entry in the DDB.

Interoperation II: DDB Parsing and Lookup with SmartHanzi.net and DDB Access

Jean Soulat

1. SmartHanzi.net

SmartHanzi.net is a website with a parsing and lookup tool for Chinese, developed by Jean Soulat, with links to the etymological lessons by Dr. L. Wieger, S.J., in his Chinese Characters: Their Origin, Etymology, History, Classification and Signification [20]. Finding a given character in this book can be a trying experience, since one is forced to work through a number of indexes. The introduction of simplified characters in mainland China has added a further level of complication. This is precisely the sort of situation where information technology can be of the greatest help: with just a mouse click, the website points to the relevant etymological lesson (out of 177) and phonetic series (out of 858).

1.1. Parsing and Lookup

Parsing and lookup rely on various Chinese word lists available on the Internet:

● For basic Chinese, CEDICT MDBG (English) and HanDeDict (German);
● for Buddhism, the DDB and Soothill & Hodous.

The companion site Smartkanji.net uses the JMDict multilingual list for Japanese, available on Jim Breen’s Monash Nihongo FTP Archive [21]. Jim Breen also kindly provides Japanese-specific tables for adjectives and words.

When a text is submitted to the application, the server parses it and displays a first view of all words found (in the main list) just under the text.
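The word-list parsing just described can be illustrated with a minimal sketch. This is not SmartHanzi.net's actual code (the server is written in PHP and its internals are not given here); it only assumes that the word list is available as a plain list of headwords. The names `SAMPLE_HEADWORDS`, `build_index`, and `find_all_words` are hypothetical, and the sample headwords stand in for a real word list. The sketch reports every headword found at every position, overlaps included, rather than committing to a single segmentation.

```python
from collections import defaultdict

# Hypothetical stand-in for a real word list (headwords only).
SAMPLE_HEADWORDS = ["般若", "波羅蜜", "般若波羅蜜", "心", "經"]


def build_index(headwords):
    """Group headwords by first character so that, at each text position,
    only words that could possibly start there are tested."""
    index = defaultdict(list)
    for word in headwords:
        if word:
            index[word[0]].append(word)
    return index


def find_all_words(text, index):
    """Return (start, word) for every headword occurring in the text.

    Overlapping and nested matches are all kept; nothing is discarded
    in favour of a single "best" segmentation.
    """
    matches = []
    for start, char in enumerate(text):
        for word in index.get(char, ()):
            if text.startswith(word, start):
                matches.append((start, word))
    # Order by position; at the same position, longer words come first.
    matches.sort(key=lambda match: (match[0], -len(match[1])))
    return matches


if __name__ == "__main__":
    index = build_index(SAMPLE_HEADWORDS)
    for start, word in find_all_words("般若波羅蜜多心經", index):
        print(start, word)
```

Indexing by first character is one simple way to keep the scan cheap when the word list is large; a trie or a sorted table would serve the same purpose.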
Users can then look up any point in the text, either with a mouse click or, if more convenient, by dragging over it. The website shows all words recognized at the mouse position; it does not try to choose among them. Users have to select one of the available dictionaries. If one needs to look up words in several dictionaries, opening several tabs in the browser window offers a convenient solution.

[20] First published in 1899 (French) and 1915 (English), based on the 2nd century Shuowen Jiezi. This work contains numerous technical errors, but is a valuable historical document in that it reflects the understanding that many Chinese had regarding their writing system.
[21] <http://www.bcit-broadcast.com/monash/>

1.2. Technology

SmartHanzi.net uses the so-called “Ajax” technology: one HTML page is used as an application, and further data are then loaded through XMLHttpRequest and JavaScript within the original HTML page. The server is written in PHP and uses flat files (no database) to keep parsing time acceptable. Since some large tables need to be loaded for each text, the website works best when users submit full paragraphs or short texts.

1.3. Limitations

The word lists available on the Internet are convenient for parsing and lookup. But they do not contain enough detail to navigate from one word to another, as many people love to do with paper dictionaries. This is where the DDB XML access provides a great opportunity.

1.4. The DDB Access Application

Both SmartHanzi (<http://www.smarthanzi.net>) and DDB Access (<http://download.smarthanzi.net/ddbaccess>) work off the same DDB data extract, which includes headwords, pronunciations, and basic definitions, in a public file published monthly by Charles Muller [22]. It does not include the full set of data accessible through the DDB website (<http://buddhism-dict.net/ddb>).
Since the full data displayed on the DDB website are also available in XML format through per-word requests, it became possible to combine the parsing and lookup facility of SmartHanzi.net with the complete DDB data.

[22] This file, <http://acmuller.net/download/buddhdic.txt.gz>, is the same as that published on Jim Breen's WWWJDic Server (<http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1C>). It is also used by the SAT database to access terms in the DDB, as well as by the developer of Tangorin.

1.5. Views of DDB Access

1.5.1. Step One: Paste in Text

The user pastes in some East Asian text containing Chinese characters.

1.5.2. Step Two: Parse Text

The text is then parsed, with compound words listed on the left and single characters on the right.

1.5.3. Step Three: Select Word for Lookup

The user can then select a character or compound word from the generated list for lookup in the DDB.

1.6. Meeting the DDB Password Policy

Both for security purposes and in order to encourage users to contribute to the DDB, Muller has implemented a tiered password policy, under which users with the “guest” login are limited to a maximum number of calls per day. To comply with this policy, DDB data have to be requested from the user's PC. Making XML calls from the SmartHanzi.net server would have infringed the policy, so this was not considered. One option might have been to load the XML data into the SmartHanzi.net HTML/Ajax pages, subject of course to agreement from the DDB team. However, for security reasons, JavaScript does not allow a web page from one site (SmartHanzi.net) to access XML data from another site (DDB). Cross-domain calls may be allowed in the latest generation of browsers, but not all users have a recent browser.

1.7.
The DDB Access Application

The chosen solution was to develop a PC application, called “DDB Access,” which is not subject to the cross-domain limitation. The application has the same look and feel as the website and uses the same DDB extract to parse the text. When the user clicks on a word, the application makes a request to the DDB server and gets the full XML data. Each user needs to provide his or her DDB password. The XML data are presented with different views in separate tabs:

● Standard view: similar to the DDB website.
● Text view: no formatting, convenient for copying and pasting into a word processor.
● XML: formatted XML view.
● XML (raw): unformatted XML view, for copying and pasting into an XML editor.

To make sure that parsing and lookup remain consistent, it is recommended to download the monthly updates.

1.8. Technology

The application is developed with Microsoft Windows Presentation Foundation (WPF, .NET 3.5). It embeds a lightweight SQLite database, which makes it easy, for instance, to add the “Also contained in” function.

1.9. Soothill and Hodous

Both SmartHanzi.net and DDB Access also include Soothill and Hodous entries, as digitized and published by A. Charles Muller (<http://www.acmuller.net/soothill/index.html>).

References

Muller, A. Charles. 2009. The Digital Dictionary of Buddhism [DDB] as a Model for Web Collaboration. Symposium of the Information Processing Society of Japan, University of Tokyo.
----. 2009. The Digital Dictionary of Buddhism [DDB]: Present Status and Future Developments. Scholars of Buddhism in Japan: Buddhist Studies in the 21st Century. Kyoto: International Research Center for Japanese Studies. 87–100. http://acmuller.net/articles/ddb-nichibunken-200803.html.
----. 2005. A Model for Scholarly Collaboration in the Development of On-line Reference Works: The Digital Dictionary of Buddhism.
Conference on New Technology in the Handling of East Asian Documents; Chinese National Library, Beijing. http://acmuller.net/articles/ddb-beijing-conference.pdf.
----. 2002. Moving into XML Functionality: The Combined Digital Dictionaries of Buddhism and East Asian Literary Terms. Journal of Digital Information: Special Issue on Chinese Collections in the Digital Library 3(2). http://journals.tdl.org/jodi/article/view/83/82.
----. 2001. ディジタル媒体を使用して 仏教データの調査と普及: 仏教学のディジタル辞書. (Using the Digital Medium for Research and the Dissemination of Buddhist Studies Data: The Digital Dictionary of Buddhism). Annual Conference of the Japanese Association for Indian and Buddhist Studies, Tokyo University. http://acmuller.net/articles/jaibs2001.html.
----. 1999. Developments of the Web Dictionaries of East Asian Thought: Stepping up to XML. Seminar on Computing in East Asian Studies, Kyoto University Computing Center.
----. 1999. Update on the Development of the Digital Dictionary of East Asian Buddhist Terms. Fifth meeting of the EBTI, Academia Sinica, Taipei. http://acmuller.net/articles/report1999ebti.htm.
----. 1998. The Structure and Function of the Interlinked Electronic CJK-English and Buddhist CJK-English Dictionaries. International Conference of Asian Scholars (ICAS), Leiden University. http://www.acmuller.net/articles/dictionaries1.htm.
----. 1998. The Structure and Function of the Interlinked Electronic CJK-English and Buddhist CJK-English Dictionaries. Meeting of the Pacific Neighborhood Consortium (PNC), Taiwan.
----. 1997. The Usage and Development of Digital Reference Tools in Working with CJK Buddhist Texts: Interlinked CJK and Buddhist CJK Dictionaries. Fourth Meeting of the Electronic Buddhist Text Initiative, Ōtani University, Japan.
----. 1996. Introducing the Web Dictionary of East Asian Buddhist Terms. Third Meeting of the Electronic Buddhist Text Initiative, Foguang Shan, Taiwan.
Nagasaki, Kiyonori; Muller, A. Charles; Shimoda, Masahiro. 2009.
Aspects of the Interoperability in the Digital Humanities. Digital Humanities. 375-377.