We largely follow the general annotation principles of the Penn Parsed Corpora of Historical English, adapted to the specifics of historical German. Our word-level tagset is most similar to the one used for the Old Saxon HeliPaD corpus. Our higher-level labels correspond most closely to those of the CHLG (although the CHLG uses an entirely different tagset for parts of speech).
- Splitting and joining words
- Lemmatization
- Part-of-speech tags (heads)
- Morphological extensions
- Phrasal tags and extensions
- Empty categories
- Treatment of individual words and phrases
- Issues to be resolved