System for Markup and Retrieval of Texts (SMART):

A tool for electronic markup of premodern Chinese texts

[Note: This project was conceived in 1997 and first presented publicly at the Kyoto EBTI meeting in fall 1997, but it has developed rather slowly due to other commitments. Since July 1999 it has been funded for a two-year period by the German Science Foundation (DFG), and it is expected to make some progress during this period :-)]

Objectives of the project

The usefulness of and need for electronic markup, preferably using the conventions laid down in the "Guidelines for Electronic Text Encoding and Interchange" [TEIP3], are now widely acknowledged in the academic world. However, using these guidelines still requires an in-depth understanding of SGML/XML markup and of computing in general and, when it comes to East Asian languages, knowledge of how these languages are handled by a computer. Since acquiring these skills leaves almost no time for the research they were meant to serve, this situation is not satisfactory. The aim of this project is to provide a tool that helps remedy it.

The research into and development of such a system will be accompanied by a case study that aims to clarify developments, interrelationships and borrowings in the compilation of the chronicles of the Chan Buddhist school in Song China.

Technical requirements

The "System for Markup and Retrieval of Texts" (SMART), which is now under development, will need to have at least the following features:

1. Direct support for the markup as recommended in the TEI guidelines.

2. Direct display of all Chinese characters and possibly of other writing systems as well. Transparent adaptation to the various encoding schemes used today in East Asia.

3. The program shall also be suitable for the publication of electronic texts on CD-ROM and possibly on the Internet.

4. Support for the building of text-corpora that allows for index-based text-retrieval, with a provision for making use of the existing markup.

5. Support for searching the corpus for parts that correspond to the current division.

6. Support for the interactive encoding of quotes and pointers to other locations of a text (hyperlinks) and the traversing of these links.

7. Support for the maintenance of databases of proper nouns, terminology etc.

8. Support for the alignment of multiple versions of a text (possibly in different languages).

9. The program should allow the user to edit and revise the text incrementally. These changes will be tagged with an identification of the user and logged. They should be individually reversible.
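Requirement 9 can be illustrated with a small sketch. It is not part of the SMART design; the names Change and RevisedText are hypothetical, and the approach shown (an unchanged base text plus a log of non-overlapping replacements, each carrying a user identification) is only one assumed way to make every change reversible on its own.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Change:
    """One logged edit: replace `old` with `new` at offset `pos` of the base text."""
    user: str            # identification of the user who made the change
    pos: int             # character offset into the unchanged base text
    old: str
    new: str
    reverted: bool = False


@dataclass
class RevisedText:
    """Hypothetical container: the base text is never modified, only the log grows."""
    base: str
    log: List[Change] = field(default_factory=list)

    def edit(self, user: str, pos: int, old: str, new: str) -> int:
        """Log a replacement and return its id (its index in the log)."""
        if self.base[pos:pos + len(old)] != old:
            raise ValueError("old text does not match the base text at this offset")
        for c in self.log:
            # Assumption of this sketch: active changes may not overlap.
            if not c.reverted and pos < c.pos + len(c.old) and c.pos < pos + len(old):
                raise ValueError("overlaps an existing change")
        self.log.append(Change(user, pos, old, new))
        return len(self.log) - 1

    def revert(self, change_id: int) -> None:
        """Undo a single change; the log entry itself is kept."""
        self.log[change_id].reverted = True

    def current(self) -> str:
        """Apply all active changes, rightmost first so earlier offsets stay valid."""
        text = self.base
        active = [c for c in self.log if not c.reverted]
        for c in sorted(active, key=lambda c: c.pos, reverse=True):
            text = text[:c.pos] + c.new + text[c.pos + len(c.old):]
        return text


t = RevisedText("達磨大師西来")
a = t.edit("editor1", 0, "達磨", "達摩")
b = t.edit("editor2", 4, "西来", "西來")
print(t.current())
t.revert(a)          # undo only the first change
print(t.current())
```

Because every change refers to offsets in the unchanged base text, reverting one entry does not disturb the others. The price is the restriction to non-overlapping changes, which a real tool would probably need to relax.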

The requirements above are from the original plan. Taking recent changes into account, two strands of development are envisioned:

A standalone version, possibly built around MS Word.
A networked version using modern fifth-generation browsers.

Ideally, this would be developed as a set of independent, general tools that can be invoked over the network. More pragmatic considerations and time constraints might, however, lead to a prototype being built as one complex tool that works on a narrowly defined text, and to this being developed further in the desired direction only at a later stage of the project.

Last updated: 1999-10-30