We largely follow the general annotation principles of the Penn Parsed Corpora of Historical English. However, our word-level tags are adapted to the specifics of historical German and are most similar to the tagset used for the Old Saxon HeliPaD corpus. (Our higher-level labels correspond to those of the CHLG, but the CHLG uses an entirely different tagset for parts of speech.)
- Splitting and joining words
- Lemmatization
- Part-of-speech tags (heads)
- Morphological extensions
- Phrasal tags and extensions
- Empty categories
- Treatment of individual words and phrases
- Issues to be resolved