Time periods
Middle High German (MHG):
- 1050-1350
- Approx. 35 texts selected from the Referenzkorpus Mittelhochdeutsch (ReM), adapted under a CC BY-SA license.
- Approx. 170,000 words
Early New High German (ENHG):
- 1350-1650
- 64 texts selected mostly from the Referenzkorpus Frühneuhochdeutsch (ReF), adapted under a CC BY-SA license. (Four late-16th- and early-17th-century texts from traditionally Low German areas will be selected from the Deutsches Textarchiv.)
- Approx. 640,000 words
New High German (NHG):
- 1650-1950
- 66 texts selected from the Deutsches Textarchiv (DTA), adapted under a CC BY-SA license.
- Approx. 660,000 words
Dialects (cities and regions)
We have divided the historically High-German-speaking area into 10 dialect regions. After 1550, we include texts from traditionally Low German areas, which we have divided into northwest and northeast.
- Cologne / Ripuarian (or 'Mittelfränkisch' in the Middle High German era)
- Hessian / Rhine Franconian / Moselle Franconian
- Thuringian (incl. 'Thüringisch-Hessisch' in MHG)
- (Upper) Saxon
- Alsatian / Baden (or 'Alemannic' in MHG)
- Swabian (or 'Bairisch-Alemannisch' in MHG)
- Swiss (or 'Alemannic' in MHG)
- East Franconian / North Bavarian (Nuremberg, Bamberg)
- Bavarian in the narrower sense (often simply 'Bairisch' in MHG)
- Viennese, or Austrian more generally (often 'Bairisch' in MHG)
- Northwest (Bremen, Hamburg, Wolfenbüttel)
- Northeast (Berlin, Magdeburg, Rostock)
In the Referenzkorpus Mittelhochdeutsch, many texts are assigned only to broad regions, such as 'Bairisch-Alemannisch' or even just 'Oberdeutsch.' In such cases, we have either assigned texts to a region based on the location of the manuscript (problematic as that may be) or simply selected an appropriate number of texts to represent these broader regions (e.g. choosing two MHG 'Bairisch' texts in lieu of one from Bavaria and one from Austria).
With the emergence of Modern Standard German in the 19th century, texts from a given region no longer necessarily reflect the local dialect in orthography, morphology, etc. Nevertheless, we have tried to select texts that were not only published in that region but also written by a native of it, in the hope of capturing any regional variation in the syntax of Modern Standard German.
Genres
We aim for a balance of genres in order to capture some sociolinguistic variation. However, genres are unevenly distributed over time; e.g. religious texts are over-represented in MHG. Moreover, the split between academic and popular writing that characterizes the NHG period is less clear in earlier times.
Religious genres are prefixed with "Rel-" and literary genres are prefixed with "Lit-", in case the researcher wishes to focus on or exclude such texts.
The genres included in the corpus so far are:
- Chronicle
- Geography
- Law
- Literaryanalysis
- Lit-prose
- Medicine
- Nature
- Occult
- Politics
- Records (incl. Urkunden)
- Report
- Rel-allegory, -cosmography, -devotional, -gospel, -monastic, -narrative, -pamphlet, -sermon, -treatise
- Travel
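Because religious and literary genres carry the "Rel-" and "Lit-" prefixes, they can be selected or excluded with a simple prefix test. The sketch below is a hypothetical illustration only: it assumes each text's genre is available as a plain string label (e.g. "Rel-sermon"), and the names `GENRES`, `exclude_rel_and_lit`, `filter_texts`, and the `"genre"` metadata key are invented here, not part of the corpus distribution.

```python
# Hypothetical sketch: genre labels are assumed to be plain strings.
# The shorthand "Rel-allegory, -cosmography, ..." is expanded into full labels.
BASE_GENRES = [
    "Chronicle", "Geography", "Law", "Literaryanalysis", "Lit-prose",
    "Medicine", "Nature", "Occult", "Politics", "Records", "Report", "Travel",
]
REL_SUFFIXES = [
    "allegory", "cosmography", "devotional", "gospel", "monastic",
    "narrative", "pamphlet", "sermon", "treatise",
]
GENRES = BASE_GENRES + [f"Rel-{suffix}" for suffix in REL_SUFFIXES]

# Matching on the hyphenated prefixes ("Lit-", not "Lit") keeps
# "Literaryanalysis", which is not a literary genre, from being dropped.
DEFAULT_DROP = ("Rel-", "Lit-")


def exclude_rel_and_lit(genres, drop_prefixes=DEFAULT_DROP):
    """Return the genre labels that do not start with any dropped prefix."""
    return [g for g in genres if not g.startswith(drop_prefixes)]


def filter_texts(texts, drop_prefixes=DEFAULT_DROP):
    """Drop texts whose (assumed) "genre" field starts with a dropped prefix."""
    return [t for t in texts if not t["genre"].startswith(drop_prefixes)]
```

For example, `filter_texts(texts, drop_prefixes=("Rel-",))` would keep literary texts and remove only the religious ones, since `str.startswith` accepts a tuple of prefixes.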