Korpus C4 - Sub-projects

Digitales Wörterbuch der deutschen Sprache des 20. Jahrhunderts (DWDS)

DWDS aims to develop a dictionary system based on large, balanced electronic corpora. The project is planned in three consecutive phases: building the text corpora, processing the data with computational-linguistic techniques, and lexicographical work. The publicly available data is linguistically pre-processed and can be accessed online (www.dwds.de). Since 2007, DWDS has been an Academy Project of the BBAW.

Austrian Academy Corpus (AAC)

The Austrian Academy Corpus (AAC) is being compiled in Austria as part of a national corpus programme that pays particular attention to the "preservation of cultural heritage" and "text technology". On completion, the AAC will contain about 1 billion running words.

Korpus Südtirol

The 'Korpus Südtirol' initiative aims to collect, archive and process South Tyrolean texts with corpus-linguistic methods, in order to make them available to the public and to document the use of written German in South Tyrol. It was started in September 2005 by researchers at the Free University of Bolzano, the European Academy Bozen/Bolzano and the University of Innsbruck.

The text corpus developed so far already allows the available authentic language data to be evaluated according to different criteria, so that it can be used for studies in historical linguistics, sociolinguistics, contact linguistics and language varieties. It will also serve as a basis for language teaching and consulting. The texts collected in South Tyrol, as well as further corpora in other languages, can be searched through a user-friendly corpus search interface, which offers an opportunity to strengthen language awareness within South Tyrol's multilingual environment.

Swiss Text Corpus (CHTK)

The Swiss sub-project, the Swiss Text Corpus (CHTK), contains German-language texts written by Swiss authors in the 20th century. This digital collection is structured analogously to the partner projects in Germany, Austria and Italy (using the same formal, temporal and content criteria). It offers a balanced representation of standard German vocabulary in Switzerland and can serve as a base resource for specifically Swiss lexicographical needs.

The Swiss Text Corpus was built by a research group at the Deutsches Seminar of the University of Basel and was funded mainly by the Swiss National Science Foundation. Since 2014 it has been hosted by the Schweizerisches Idiotikon with financial support from the Swiss Academy of Humanities and Social Sciences.
