Korpus C4 - Project

Project structure

The members of the Corpus C4 initiative are the Digitales Wörterbuch der deutschen Sprache des 20. Jahrhunderts (DWDS) from Berlin, the Austrian Academy Corpus (AAC) from Vienna, the Korpus Südtirol from Bolzano/Bozen, and the Swiss Text Corpus (CHTK) from Basel/Zurich.

The project aims to provide access to a balanced corpus of 20th-century Standard German, with particular attention to regional variation. At present, the corpus contains 20 million running words from DWDS, 4.1 million words from AAC, 1.7 million words from Korpus Südtirol, and 20 million words from CHTK.

Corpus query system

A distinctive technical feature of Corpus C4 is its distributed query system. Each sub-project hosts its data on its own servers; only at query time are the results collected from the different servers and combined on a single page. Corpus C4 as a whole is therefore a virtual corpus. Technically, this is achieved mainly with the functions of the linguistic search engine DDC, developed by the Berlin sub-project DWDS.
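
The general idea of querying several independently hosted sub-corpora and merging the hits at query time can be illustrated with a minimal Python sketch. It is not the DDC implementation and does not use the DDC API: the names `Hit`, `make_backend`, `federated_search`, and `BACKENDS` are hypothetical, the sample sentences are invented, and the remote servers are replaced by in-memory stand-ins that do a simple substring match.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hit:
    corpus: str  # sub-corpus that returned the hit
    text: str    # matching sentence


# Invented sample sentences standing in for the four remote servers.
SAMPLE_DATA: Dict[str, List[str]] = {
    "DWDS": ["Das Haus steht am Fluss."],
    "AAC": ["Ein Haus in Wien.", "Der Garten ist klein."],
    "Korpus Südtirol": ["Das Haus liegt im Tal."],
    "CHTK": ["Der See ist tief."],
}


def make_backend(corpus: str) -> Callable[[str], List[Hit]]:
    """Return a search function for one sub-corpus (assumed: substring match)."""
    def search(query: str) -> List[Hit]:
        return [Hit(corpus, s) for s in SAMPLE_DATA[corpus] if query in s]
    return search


def federated_search(
    query: str, backends: Dict[str, Callable[[str], List[Hit]]]
) -> List[Hit]:
    """Send the query to every backend in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in backends.items()}
        merged: List[Hit] = []
        # Collect in a fixed backend order so the merged page is deterministic.
        for name in backends:
            merged.extend(futures[name].result())
    return merged


BACKENDS = {name: make_backend(name) for name in SAMPLE_DATA}

if __name__ == "__main__":
    for hit in federated_search("Haus", BACKENDS):
        print(f"[{hit.corpus}] {hit.text}")
```

In this sketch no backend ever sees another backend's data; only the merged result list exists in one place, which mirrors the sense in which the combined corpus is virtual. A real deployment would additionally need timeouts and error handling for servers that do not respond.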
