CoPAR Bulletin 8

Dedicated to helping anthropologists, librarians, archivists,
information specialists and others preserve and provide access to the
records of human diversity and the history of the discipline.

The Special Nature of Linguistic Records


The records accumulated by anthropological linguists are often of a special nature, and their preservation and archiving must be addressed in special ways. These records in many cases constitute an important part of the primary documentation of extinct or severely endangered languages, and are thus of value to several constituencies beyond those ordinarily concerned with anthropological records, in particular those with a stake in the retention or revival of the languages concerned. The creators of linguistic records should be sensitive to the uses to which their materials may eventually be put and, where possible, should minimize the difficulties that will confront future users.


In comparison to most anthropological records (see Bulletin No. 2, Taking Stock of Your Records), linguistic records are typically extensive and complexly organized. They tend to accumulate incrementally and are often layered and cross-referenced. This organization is not normally documented in a systematic way, because the creator is fully aware of the material’s complexity and is frequently the only user of the data. An annotated inventory of materials will be essential to future users, and should contain the following information at a minimum:

  • An identification of the body of materials associated with each language or dialect you have worked on.
  • A chronology of your work on each language, giving as much detail as possible about: (1) the academic circumstances of your work (including institutional settings, funding sources, and the names of mentors and assistants); (2) the sources of your linguistic information (consultants, interpreters, other linguists, publications); (3) any resulting manuscripts or publications that may not form a part of the primary collection.
  • An inventory of the major categories of data contained in each body of material, such as original field notes, file slips, tape recordings, secondary manuscripts, and correspondence.
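
For records that are also kept in digital form, the three items above could be captured in a small structured file. The following is only a minimal sketch in Python, not a prescribed format: the class names, field names, and all sample values ("Language A", "Consultant 1", and so on) are hypothetical placeholders chosen for illustration.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class WorkPeriod:
    """One entry in the chronology of work on a language."""
    years: str
    circumstances: str            # institutional setting, funding, mentors, assistants
    sources: List[str]            # consultants, interpreters, other linguists, publications
    outputs_held_elsewhere: List[str] = field(default_factory=list)


@dataclass
class LanguageInventory:
    """Annotated inventory for the materials on one language or dialect."""
    language: str
    chronology: List[WorkPeriod]
    categories: Dict[str, str]    # category of data -> short description


def to_json(inventory: LanguageInventory) -> str:
    # ensure_ascii=False keeps phonetic characters readable in the output
    return json.dumps(asdict(inventory), indent=2, ensure_ascii=False)


# Hypothetical sample entry (placeholder values only)
inventory = LanguageInventory(
    language="Language A",
    chronology=[
        WorkPeriod(
            years="Year 1 to Year 2",
            circumstances="Dissertation fieldwork; University X; Grant Y",
            sources=["Consultant 1", "Consultant 2"],
            outputs_held_elsewhere=["Unpublished grammar sketch"],
        )
    ],
    categories={
        "field notes": "12 bound notebooks, numbered 1-12",
        "file slips": "lexical slip file, about 4,000 slips",
        "tape recordings": "30 reels, logged separately",
    },
)

print(to_json(inventory))
```

A printed copy of such a file, stored with the collection, serves the same purpose as a typed inventory.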

Special Concerns with Linguistic Records

In addition to an overall synopsis of the organization of your records, certain information is often crucial to the understanding of specific types of records. This includes:


Linguists typically create documents in which the order of the materials is a vital part of the record (e.g., the sequence of slips in a lexical file). Wherever possible, you should make such ordering explicit by page-numbering or other devices. If this is not feasible (as in a large slip file) you should attach instructions to the record making clear what order needs to be preserved.


The orthography used in written records of a language must be identified. Since it is normal for linguists to use different orthographic systems and conventions at different times and in different contexts, you should take care to be precise and exhaustive. This concern is particularly acute for field notes transcribed phonetically. You should identify the phonetic orthography you used, and any modifications you introduced to a standard orthography should be fully explained.

If you have arranged a group of materials (such as a lexical file) in an alphabetical order based on a special orthography, that order should be explicitly described.
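
One way to describe such an order explicitly is to list the letters of the orthography in sequence, including any multi-character letters. The Python sketch below shows how an ordered list of this kind fully determines the sort order of a file. The alphabet and the sample words are invented for illustration and do not come from any particular language; the function name is likewise hypothetical.

```python
def make_sort_key(alphabet):
    """Build a sort key from an ordered list of letters.

    Multi-character letters (digraphs, ejectives, etc.) are matched
    before shorter ones, so "ch'" is never read as "ch" followed by "'".
    """
    longest_first = sorted(alphabet, key=len, reverse=True)
    rank = {letter: i for i, letter in enumerate(alphabet)}

    def key(word):
        ranks = []
        i = 0
        while i < len(word):
            for letter in longest_first:
                if word.startswith(letter, i):
                    ranks.append(rank[letter])
                    i += len(letter)
                    break
            else:
                # Symbols not in the alphabet sort after all known letters
                ranks.append(len(alphabet) + ord(word[i]))
                i += 1
        return ranks

    return key


# Hypothetical alphabet: glottal stop first, ejectives after their plain counterparts
ALPHABET = ["ʔ", "a", "ch", "ch'", "e", "k", "k'", "t"]
words = ["ka", "ch'a", "ʔa", "cha", "k'a", "ta", "a"]

ordered = sorted(words, key=make_sort_key(ALPHABET))
print(ordered)
```

Even if no software is ever used, the ordered list of letters is the piece of documentation a future user needs in order to restore or search the file.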

Use of other materials

Anthropological linguists often use word lists, questionnaires, and other materials to elicit data, and in some cases the resulting records cannot be fully understood without this framework. Any records that you have created in this fashion should be identified and the elicitation guides referenced (ideally, copies should be included with the collection). You should also pay careful attention to identifying all abbreviations and other symbols in your data that are keyed to an elicitation guide (e.g., the numbers of color chips).

In some records you may have re-elicited data from published and unpublished work of other linguists, and this may have involved incorporating transcriptions other than your own into your data, or cross-referencing of various sorts. In addition to identifying such external documents, it would be desirable to have a description of your re-elicitation procedures (e.g., was the material read aloud to a consultant?).

Linguists also quite frequently use their own earlier data in later work. For example, you might scour your primary notes for examples to cite in an analytic document, or you might play a tape recording that you had made earlier in order to elicit new data from another speaker. You should make sure that these cross-references are clearly noted in the materials involved.

Audio and video recordings

In addition to general concerns about the preservation and archiving of electronic records (see Bulletin No. 15, Managing Electronic Records), special considerations apply to audio and video recordings of language data. During the past half century electronic records have become a central component in the documentation of unwritten languages, and in the case of languages now extinct these recordings are important cultural documents in their own right.

Unlike written documents, electronic records, particularly audio tapes, are often difficult to identify and to associate with the collection of which they form a part. Labels should always be affixed to the reel or cassette itself as well as to the container, giving the date, language, speaker, collector, the circumstances of collection, and the position of the tape in a sequence. In addition, technical information should be provided about the equipment used to make the recording, the recording speed, sound specifications (such as stereo, mono, Dolby, etc.), and format (beta videotape, four-track stereo cassette, etc.). Where possible, tape recordings should also have introductory remarks directly on the tape.

Logging the contents of audio and video recordings in a simple outline is highly recommended. In addition to the information noted above, a log should also summarize the contents of each recording in the order in which it is recorded on the tape. A consistent numbering scheme should be developed to facilitate cross-referencing.

It is of great importance to know how an audio or video recording is connected to other records in a collection, in particular to know whether full or partial written transcriptions exist. Since practical considerations often lead to the archival separation of audiovisual materials from written documents, you should make the linkages between these as clear as possible.
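
The log, the numbering scheme, and the links to written transcriptions described above can be kept together in one simple outline. The following Python sketch is one possible arrangement, offered only as an illustration: the identifier pattern (tape number, hyphen, two-digit segment number), the field names, and all sample values are assumptions, not an established standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    number: int
    summary: str
    transcription_ref: str = ""   # where a written transcription is filed, if any


@dataclass
class Recording:
    tape_id: str                  # position of the tape in the sequence, e.g. "T03"
    date: str
    language: str
    speaker: str
    collector: str
    segments: List[Segment] = field(default_factory=list)

    def add(self, summary, transcription_ref=""):
        """Append a segment in recorded order and return its identifier."""
        seg = Segment(len(self.segments) + 1, summary, transcription_ref)
        self.segments.append(seg)
        return self.segment_id(seg)

    def segment_id(self, seg):
        # Hypothetical scheme: tape id plus a two-digit segment number
        return f"{self.tape_id}-{seg.number:02d}"

    def outline(self):
        """Render the log as a plain-text outline."""
        lines = [
            f"{self.tape_id}  {self.date}  {self.language}  "
            f"speaker: {self.speaker}  collector: {self.collector}"
        ]
        for seg in self.segments:
            ref = f"  [transcription: {seg.transcription_ref}]" if seg.transcription_ref else ""
            lines.append(f"  {self.segment_id(seg)}  {seg.summary}{ref}")
        return "\n".join(lines)


# Hypothetical sample log (placeholder values only)
tape = Recording("T03", "Date 1", "Language A", "Consultant 1", "Collector 1")
first_id = tape.add("Introductory remarks")
second_id = tape.add("Word list, items 1-50", transcription_ref="Notebook 4, pp. 10-15")
print(tape.outline())
```

Writing the same identifiers (here "T03-02") in the margin of the corresponding notebook pages keeps the link intact even when the tapes and the written documents are stored separately.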

Digital records

The widespread use of computers for word processing and database management has serious implications for the preservation of all anthropological records, but digital linguistic records raise several specific concerns.


Your digital linguistic records may use non-standard character sets that you have specially designed or that you share with a small group of collaborating researchers. Copies of these fonts must always be included with the documents themselves.


You may have created complex secondary records (lexicons, text collections, comparative dictionaries, etc.) in database formats that allow interlinkage of files. If your database software was specially created or tailored for your project, you should provide a copy of that specific database together with documentation.


As a matter of general principle, e-mail correspondence that you have sent or received and that is relevant to your records should be preserved as part of your collection, preferably in paper format. If you have developed any conventions for citing linguistic forms or analysis in e-mail correspondence, these should be made explicit (e.g., the use of @ for schwa).
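
Such conventions are easiest to preserve as an explicit table of substitutions. The Python sketch below illustrates the idea. Only the use of @ for schwa comes from the example above; the second entry ("7" for glottal stop) and the function name are hypothetical additions for illustration. The function is meant to be applied to individual cited forms, not to whole messages, since @ also occurs in e-mail addresses.

```python
# Table of e-mail citation conventions: typed symbol -> intended character
CONVENTIONS = {
    "@": "ə",   # schwa (the example given in the text)
    "7": "ʔ",   # glottal stop (hypothetical additional convention)
}


def restore_form(form, conventions=CONVENTIONS):
    """Replace each typed symbol in a cited form with the character it stands for."""
    return form.translate(str.maketrans(conventions))


print(restore_form("k@t"))
print(restore_form("t@7a"))
```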

Victor Golla
Department of Native American Studies
Humboldt State University
Arcata, California 95521

