This presentation introduces RISE and SHINE, a digital research infrastructure developed by the Max Planck Institute for the History of Science (MPIWG). For more information, please see https://rise.mpiwg-berlin.mpg.de/ and https://www.mpiwg-berlin.mpg.de/research/projects/rise-and-shine-research-infrastructure-study-eurasia.
This paper introduces a semi-automatic text tagging interface that helps historians efficiently collect posting records from the Chinese local gazetteers (difangzhi 地方志) in the format “who, when, which posting”. By turning texts into tabular data, the interface aims to lay the basis for analyzing Chinese local gazetteers on a large scale. Although local gazetteers from different locations all follow a general pattern when recording posting data, the sheer number of gazetteers means that they differ in the details. It is therefore infeasible to ask programmers to extract the posting data with a one-size-fits-all computer program. The tagging interface instead provides a simple user interface with built-in patterns for extracting the subjects’ names, posting titles, dynasties, posting times, basic addresses, and entry methods. This lets users tag most of the data in a text quickly and then proofread the tagging results themselves to correct mistakes. Users can also adjust the extraction patterns for each text, so that posting data can be extracted accurately from local gazetteers with distinct patterns.
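The pattern-based extraction described above can be sketched roughly as follows. This is a minimal illustration in Python, not the interface's actual implementation: the record layout, the regular expression, the field names, the function `extract_postings`, and the sample sentence are all assumptions invented for the example.

```python
import re

# Hypothetical "built-in" pattern for one assumed record layout:
#   <name>，<native place>人，<entry method>，<reign period and year>年任。
# Real gazetteers vary, which is why the pattern is a replaceable argument.
DEFAULT_PATTERN = (
    r"(?P<name>[^，。]{2,4})，"
    r"(?P<address>[^，。]+?)人，"
    r"(?P<entry_method>進士|舉人|貢生|監生)，"
    r"(?P<posting_time>[^，。]+?年)任"
)


def extract_postings(text, title, dynasty, pattern=DEFAULT_PATTERN):
    """Turn running text into table rows, one row per matched posting record.

    The posting title and dynasty are passed in by the user because, in this
    toy layout, they come from the section heading rather than each record.
    """
    rows = []
    for match in re.finditer(pattern, text):
        row = {"title": title, "dynasty": dynasty}
        row.update(match.groupdict())
        rows.append(row)
    return rows


# Invented sample text with placeholder names, for illustration only.
sample = "王某，餘姚人，進士，正德五年任。李某，吳縣人，舉人，嘉靖二年任。"
rows = extract_postings(sample, title="知縣", dynasty="明")
for row in rows:
    print(row)
```

Keeping the pattern as an argument mirrors the idea that users adjust the extraction pattern per text: a gazetteer with a different layout would be handled by passing a different expression, and the resulting rows would still need proofreading.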
This paper introduces a digital humanities project at the Max Planck Institute for the History of Science (Max-Planck-Institut für Wissenschaftsgeschichte, MPIWG) that aims to unlock the treasure chest of local knowledge written in the genre of Chinese local gazetteers for computer-assisted analyses. In the past two decades, a great number of historical documents have been digitized and put on the Web to give scholars around the globe easy access. In parallel, the number of searchable full-text versions of historical texts has increased, which opens the possibility of text mining their contents for large-scale analyses. Many works in this direction have been proposed and have gained recognition, but they have also been criticized for drawing conclusions from seemingly imprecise results, because their algorithms have no knowledge of what the different pieces of a text mean (Jockers, 2013; Google Ngram Viewer; Chen et al., 2007). An alternative approach is thus to first "teach" computers what each piece of a text means before asking them to run automatic analyses. Such "teaching" is done by tagging, also called markup. Many digital humanities projects use TEI, an XML-based standard for text encoding, to tag their research materials (TEI; Flanders). In the Local Gazetteers Project, since the genre organizes knowledge in a highly structured way, we also use tagging to teach computers the meanings of texts, so that the texts can be turned into data tables for computer-assisted analyses, including GIS mapping. Because the volume of texts is huge, we also propose a research data repository in which scholars can collaborate on the project and aggregate their results.
What are the Chinese Local Gazetteers?
The Chinese local gazetteers are a genre of texts that has been produced consistently in China from the 10th century to the present day. Most of them are compiled by local officials as a major means to collect and aggregate historical, social, and geographical knowledge of an administrative region for governing purposes. At least 8,000 titles of pre-1949 local gazetteers are still extant today. They cover almost every well-populated region in historical China. Despite being compiled by different officials for different regions, the local gazetteers have developed a fairly consistent structure for "describing" local knowledge. Most gazetteers contain the following chapters: history,
This article gives an overview of the Local Gazetteers Research Tools (LoGaRT), including its development, technical features, methodology, and examples of research applications by members of the Tu 圖 working group. The use of LoGaRT is illustrated with four brief introductions to projects that draw on visual materials from the local gazetteers, including ritual-related illustrations, city layout maps, and maps with western cartographic features. See the websites for more detailed information on LoGaRT and other research projects using it.
Designing a protocol for the interoperability of digital textual resources (or, more simply, a “IIIF for texts”) remains a challenge, as such a protocol must cater to their vastly heterogeneous formats, structures, languages, text encodings and metadata. There have been many attempts to propose a standard for textual resource interoperability, from the ubiquitous Text Encoding Initiative (TEI) format to more recent proposals like the Distributed Text Services (DTS) protocol. In this paper, we critically survey these attempts and introduce our proposal, SHINE, which aims to escape from TEI’s legacy and instead prioritize the ease with which software developers can represent and exchange textual resources and their associated metadata. We do so by combining a hierarchical model of textual structure with a flexible metadata scheme in SHINE, and we continue to define and develop it based on user-centered and iterative design principles. Therefore, we argue that SHINE is a protocol for t...
Digital humanities (DH) is a burgeoning field of research in Sinology and Asian studies more broadly, and its diversity and maturity necessitate a cyberinfrastructure fit for DH-focused Sinologists’ specific needs. “Asia Network” is our solution. It is a pioneering approach to resource dissemination and emerging data analytics (such as text mining and other fair-use, consumptive research techniques) in the humanities. It is language-agnostic software that facilitates the secure linkage of third-party research tools to different third-party textual collections (both licensed and open-access ones) via application programming interfaces (APIs). It revolutionizes how scholars can work with textual sources by promoting a flexible, networked approach to e-infrastructure development. Crucially, Asia Network is loosely coupled software with flexible topologies; it can enable either federated or centralized linkages, and it can even “disappear” as long as its API standards remain in ...
As digital humanities research relies on the digitization of sources, many of its applications are based on access to data on a huge scale that makes quantitative analyses and distant reading (or a bird's-eye view) possible. Based on this assumption, we show how the genre of Chinese local gazetteers, with its volume, consistent structures, and broad geographic and temporal range, provides an ideal case to benefit from the digital approach. This paper introduces the Local Gazetteers Research Tools (LoGaRT), a suite of research tools designed for studying Chinese local gazetteers. LoGaRT is based on the philosophy that any comprehensive genre, such as Chinese local gazetteers, when accompanied by tools that aim to bring a collective lens to the genre, can greatly enrich the ways that scholars approach it and can be transformed into a research infrastructure that enables new types of research. We report on how LoGaRT opens up new perspectives for researching Chinese history by showing case ...
This panel discusses how digitization and digital tools help to bring new insights to a well-studied genre, in this case the Chinese local gazetteers, by supporting research inquiries that treat the whole genre as a conceptual "database". Such inquiries can answer especially large-scale questions that take into account gazetteers from multiple geographic regions and across long time spans. The Chinese local gazetteers have been a long-established genre of writing in China since the twelfth century for recording local knowledge about a region. Local gentry and officials compiled information about a region, ranging from landscape, flora and fauna, officials and celebrities to temples and schools, local culture and customs, and taxes and census, and kept it in this genre. Although local gazetteers have been major sources for scholars seeking specific information about a place, it turns out to be very difficult for a scholar to study the gazetteers on larger scales due to the vast amount o...
This paper focuses on the historical politics of disaster records in Chinese local gazetteers (difangzhi 地方志). Using records of mulberry crop failures as examples, the authors ask how gazetteer editors collated Yuan disaster records (initially collected to help prevent disasters and authorize the legitimacy of dynastic rule) in gazetteers and, in so doing, made them into ‘local’ knowledge. Digital humanities methods allow for both qualitative and quantitative analyses, and the authors deploy them to demonstrate how, in structured texts like the Chinese local gazetteers, such methods can help combine close reading of specific sections with larger-scale analysis of regional patterns. In the first part, the authors show how disasters were recorded in a Yuan Zhenjiang gazetteer to facilitate taxation and disaster prevention locally, a strategy rarely traceable in subsequent gazetteers until the Qing. In the second part, the authors shift their perspective to the historical accumulation of data ...
It is increasingly common for text-based projects in digital humanities to incorporate GIS and other geovisualization techniques for the purpose of data exploration and search-result displays. On the other hand, image-based projects, drawing from fields such as digital art history, often require text-based finding aids (such as metadata and keywords) to facilitate data discovery. Working at the intersection between spatial humanities and geohumanities, we believe that techniques found in historical GIS could well integrate these two approaches for specific exploratory purposes. In this paper, we introduce a web GIS platform created expressly for exploring and researching a set of 63,497 historical figures and illustrations, based on content and source locations. These images are extracted from a larger set of 4 million scanned pages from 4,000 titles of Chinese local gazetteers (difangzhi), which is a genre of Chinese local history produced between the 8th and the 19th centuries. I...
The Qing Imperial Court documents are a major source of primary research material for studying Qing-era China, since they provide the most direct, first-hand details of how national affairs were handled. However, the way the Qing archived these documents has made it cumbersome to collect documents covering the same event and rebuild their original contexts. In this paper, we describe information technology methods that we have developed to discover two important and useful relations among these documents. The first is the citation relation among the Imperial Edicts and the Memorials. We discovered 6,801 pairs from the 37,831 Taiwan-related Imperial Court documents in the Taiwan History Digital Library (THDL) and produced 1,101 graphs of successive citations, which we call IE-M diagrams. The second relation is a template relation, which indicates groups of documents that were created following a specific format. Numerical data can also be tabulated from these documents and used for further analysis. Our studies show how information technology can be used to discover useful contexts from seemingly unrelated historical documents.
This thesis proposes two IT methods to help historians utilize digitized historical documents. The availability of large quantities of historical documents that can be searched and retrieved has become a challenge for historians, since the traditional way of carefully going through a small number of documents is no longer sufficient.
In this thesis we first give an overview of THDL, the Taiwan History Digital Library, a full-text digital library of primary historical documents about Taiwan. The documents in THDL, currently numbering 73,287 documents and over 54,000,000 words, are the major experimental materials in this thesis. We then introduce the feature analysis method, which puts a collection of historical documents in an observation environment so that they can be studied collectively rather than as individual documents. Feature analysis takes as its input a sub-collection, meaning a set of documents related to a research topic that the user is currently interested in, and analyzes the features shared by these documents. By calculating the support for each feature (the number of documents that are evidence of the occurrence of the feature), this method discovers features that are highly related to a sub-collection. We have developed a mathematical model for this method. We have also applied it to two of the corpora in THDL and made unexpected and interesting observations.
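To make the notion of support concrete, the following Python sketch counts, for each feature, how many documents in a sub-collection contain it. The thesis's mathematical model is not given here, so the ranking rule (support first, then the share of the feature's occurrences in the whole collection that fall inside the sub-collection) is an assumption chosen for illustration, as are the function names and the toy feature labels.

```python
from collections import Counter


def feature_support(docs):
    """Count, for each feature, the number of documents that contain it."""
    support = Counter()
    for features in docs:
        support.update(set(features))  # each document counts at most once
    return support


def rank_features(sub_collection, collection):
    """Rank features of a sub-collection (assumed to be a subset of collection).

    Assumed scoring, not the thesis's model: sort by support within the
    sub-collection, then by the share of the feature's overall support
    that falls inside the sub-collection.
    """
    sub = feature_support(sub_collection)
    full = feature_support(collection)
    scored = [(feature, s, s / full[feature]) for feature, s in sub.items()]
    return sorted(scored, key=lambda item: (-item[1], -item[2], item[0]))


# Toy data: each document is represented by a set of made-up feature labels.
collection = [
    {"year:Y1", "person:A", "place:X"},
    {"year:Y1", "person:A"},
    {"year:Y2", "person:B"},
    {"year:Y1", "place:Z"},
]
sub_collection = collection[:2]  # e.g. the documents returned by a query

ranked = rank_features(sub_collection, collection)
for feature, support, share in ranked:
    print(feature, support, round(share, 2))
```

In this toy run, `person:A` ranks above `year:Y1` even though both have support 2, because every document mentioning `person:A` lies inside the sub-collection, whereas `year:Y1` also occurs outside it.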
We then present several relation discovery methods that try to find relationships among historical documents in a large collection. We give three examples of relation discovery carried out on the Imperial Court documents and Taiwanese land deeds: citation relations, land transaction relations, and the template relation. Through our methods, we have discovered 6,802 citation relations among the 37,836 Imperial Court documents selected from 280 sources, 3,910 transaction relations among the 35,451 land deeds from 117 sources, and 105 templates, each of which groups documents that were created following a specific format. We argue that relation discovery not only helps historians consider more angles while reading the documents, but can also lead to new findings. The citation relations found have been transformed into 1,101 successive citation graphs, each of which reveals how a historical event evolved through the correspondence between a Qing emperor and his officials. The transaction relations have likewise been transformed into 2,219 land transitivity graphs, some of which indicate land development activities that have never been studied before.
Land deeds were the only proof of ownership in pre-1900 Taiwan. They are indispensable for the study of Taiwan’s social, anthropological, and economic evolution. We have built a full-text digital library that contains more than 30,000 land deeds. The deeds in our collection range over 250 years and are collected from over 100 sources. The unprecedented volume and diversity of the sources provide an exciting source of primary documents for historians. But they also pose an interesting challenge: how to tell if two land deeds are related.
In this paper we describe an approach to discovering one of the most important relations: successive transactions involving the same property. Our method enabled us to construct over 3,300 such transaction pairs. We also introduce the notion of a land transitivity graph to capture the transitivity embedded in these transactions. We discovered 2,219 such graphs, the largest of which includes 103 deeds. Some of these graphs involve land behavior that had never been studied before.
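One way to picture how transaction pairs give rise to graphs is the sketch below, which groups deeds into connected components. It assumes that a land transitivity graph corresponds to one connected group of deeds linked by successive-transaction pairs; the abstract does not spell out the construction, and the function name and deed identifiers are hypothetical.

```python
from collections import defaultdict


def transitivity_graphs(pairs):
    """Group deeds into connected components of the pair relation.

    `pairs` is an iterable of (earlier_deed, later_deed) identifiers.
    Direction is ignored when grouping; each component is returned as a
    sorted list of deed identifiers.
    """
    adjacency = defaultdict(set)
    for earlier, later in pairs:
        adjacency[earlier].add(later)
        adjacency[later].add(earlier)

    seen = set()
    graphs = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:  # iterative depth-first traversal
            deed = stack.pop()
            if deed in component:
                continue
            component.add(deed)
            stack.extend(adjacency[deed] - component)
        seen |= component
        graphs.append(sorted(component))
    return graphs


# Made-up identifiers: deed_01 -> deed_02 -> deed_03 form one chain.
pairs = [("deed_01", "deed_02"), ("deed_02", "deed_03"), ("deed_10", "deed_11")]
graphs = transitivity_graphs(pairs)
print(graphs)
print(max(len(g) for g in graphs))  # size of the largest graph
```

Under this reading, the number of graphs and the size of the largest one fall out directly from the list of pairs, which is how figures such as "2,219 graphs" and "103 deeds" could be reported.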
In this paper we present a full-text digital library of Taiwanese land deeds. Land deeds were the only proof of land activities, such as transfers of ownership and leasing, in Taiwan before 1900. They form a major part of the primary documents at the grassroots level in pre-1900 Taiwan, and are extremely valuable for studying the evolution of Taiwanese society.
Land deeds are, however, difficult to study because they are handwritten and hard to read. Furthermore, they are scattered across many different locations and, in some cases, are in the hands of families and private collectors.
In order to make the land deeds more accessible to researchers and educators, the Council for Cultural Affairs of Taiwan embarked on a major effort to organize the available land deeds and type them up as machine-readable full text. Based on this collection and collections from other sources, the National Taiwan University built a full-text digital library of Taiwanese land deeds. The collection currently holds over 23,000 deeds, which, according to one estimate, cover about 50% of all existing land deeds. The collection will be expanded to around 30,000 by the end of the year.
Our digital library is built with the goal of providing an electronic research environment in which historians can conduct research using land deeds. Thus, in addition to providing full-text search and retrieval, we developed the concept of regarding the query return as a sub-collection and built tools to help the user find meaning and relationships at the collection level. Post-processing presentation, term frequency and co-occurrence analysis, and relation graphs are some of the tools described in this paper. We believe that our digital library will bring Taiwanese historical research using land deeds to a new horizon.
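The idea of treating a query return as a sub-collection and analyzing it as a whole can be illustrated with a small Python sketch. It is not the library's implementation: the substring search, the fixed term list, the function names, and the English placeholder documents are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def search(collection, keyword):
    """Return the sub-collection: every document containing the keyword."""
    return [doc for doc in collection if keyword in doc]


def term_frequency(sub_collection, terms):
    """Total number of occurrences of each term across the sub-collection."""
    counts = Counter()
    for doc in sub_collection:
        for term in terms:
            counts[term] += doc.count(term)
    return counts


def co_occurrence(sub_collection, terms):
    """Number of documents in which each pair of terms appears together."""
    pairs = Counter()
    for doc in sub_collection:
        present = sorted(term for term in terms if term in doc)
        pairs.update(combinations(present, 2))
    return pairs


# Placeholder documents standing in for full-text land deeds.
collection = [
    "seller Chen sells paddy field to buyer Lin",
    "buyer Lin leases paddy field to tenant Wu",
    "seller Wu sells dry field",
]
terms = ["Chen", "Lin", "Wu"]

sub_collection = search(collection, "paddy")
tf = term_frequency(sub_collection, terms)
co = co_occurrence(sub_collection, terms)
print(tf)
print(co)
```

The point of the design is that the statistics are computed over the query result rather than over single documents, so a user sees which names dominate a topic and which names tend to appear in the same deeds.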
The National Taiwan University Library has built a digital library of historical documents about Taiwan. The content is unique in that it covers about 80% of all primary Chinese historical materials about Taiwan before 1895, and in that all of it is available in searchable full text, in addition to metadata. To make these materials more accessible to the research community, we have developed, in addition to full-text search and retrieval, the concept of regarding the set of documents retrieved by a query as a sub-collection, and have designed post-query classification methods to help users find the inter-relationships among documents and the collective meaning of a sub-collection. We have also developed techniques for term extraction for old Chinese and a data format for representing governmental structures. We hope that our system will help advance research in Taiwanese history and will set a model for other similar endeavors.
These are the slides (in Chinese) that I used for the Digital Humanities workshop organized by Peking University on May 23, 2020. In this online seminar, I talked about information technologies and the analysis of historical documents in general, introduced LoGaRT (Local Gazetteers Research Tools), and showed how these methods were applied to the Chinese local gazetteers (difangzhi 地方志).