The following annotation conventions aim to capture both the orthography of the original text and its syntactic structure.
Clitics
Clitics and other words that are written together but do not form a constituent are split with "@", e.g. ins schiff: (PP (P in@) (NP (D @s) (N schiff)))
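Because the "@" split is meant to preserve the original orthography, the written form should be recoverable from the leaves of the tree. The following is only a minimal sketch of that idea: the function names are hypothetical, and it assumes that every split is marked by a trailing "@" on the left part and a leading "@" on the right part, as in the example above.

```python
import re

def leaves(tree: str) -> list[str]:
    """Return the terminal words of a bracketed tree, left to right."""
    # A leaf has the shape '(LABEL word)': a label, whitespace, and a word
    # that contains no parentheses.
    return re.findall(r"\(\s*[^\s()]+\s+([^\s()]+)\s*\)", tree)

def restore_orthography(tree: str) -> str:
    """Rejoin leaves split with '@' to recover the original spelling."""
    text = " ".join(leaves(tree))
    # 'in@ @s' becomes 'ins': a trailing '@' followed by a leading '@'
    # marks a point where the source text had no space.
    return text.replace("@ @", "")

example = "(PP (P in@) (NP (D @s) (N schiff)))"
print(restore_orthography(example))  # ins schiff
```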
Compounds
A compound, if written together, is given a complex label that represents its make-up: (NP (N+N sturmwind))
But if a compound is orthographically split (often tagged "TRUNC" in the source corpus), label each part with its part of speech (unless there is evidence that the first part is a genitive NP), and wrap in the compound's part of speech: (NP (N (N sturm) (N wind)))
If the first part of what would be a compound in Modern German has unambiguous genitive marking, treat it as a genitive NP: (NP (NP-POS (N sturmes)) (N wind))
If the first part is adjectival and can be reasonably interpreted as a free-standing adjective, treat it as such: (NP (D die) (ADJ mittel) (N Deutschen))
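The noun-noun cases above can be summarized as a small decision procedure. This is a rough sketch only: the function name and its parameters are hypothetical, it covers only N+N compounds, and it assumes that the genitive analysis applies solely to orthographically split forms, which the guidelines do not state explicitly.

```python
def annotate_nn_compound(first: str, second: str, joined: bool,
                         genitive_first: bool = False) -> str:
    """Build the bracketing for a noun-noun compound inside an NP."""
    if joined:
        # Written as one word: a single leaf with a complex label.
        return f"(NP (N+N {first}{second}))"
    if genitive_first:
        # Split, with unambiguous genitive marking on the first part.
        return f"(NP (NP-POS (N {first})) (N {second}))"
    # Split, no genitive evidence: tag each part and wrap both in N.
    return f"(NP (N (N {first}) (N {second})))"

print(annotate_nn_compound("sturm", "wind", joined=True))
print(annotate_nn_compound("sturm", "wind", joined=False))
print(annotate_nn_compound("sturmes", "wind", joined=False, genitive_first=True))
```

Whether the first part counts as unambiguously genitive remains a judgment for the annotator; the sketch only takes that decision as input.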
If a non-compound is orthographically split, use the Penn notations 21 ('part 1 of 2') and 22 ('part 2 of 2'), as in this made-up example: (VBN (VBN21 des) (VBN22 to))
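The numbering scheme can be illustrated with a short helper. This is a sketch under assumptions: the function name is hypothetical, and extending the pattern beyond two parts (31, 32, 33 and so on) is an extrapolation from the 21/22 case, not something stated here.

```python
def split_word(tag: str, parts: list[str]) -> str:
    """Label the pieces of an orthographically split non-compound.

    Each piece gets the tag plus two digits: the total number of
    pieces, then the position of this piece (21 = part 1 of 2).
    """
    total = len(parts)
    if not 2 <= total <= 9:
        raise ValueError("expected between 2 and 9 parts")
    inner = " ".join(
        f"({tag}{total}{index} {part})"
        for index, part in enumerate(parts, start=1)
    )
    # Wrap the numbered pieces in the plain tag of the whole word.
    return f"({tag} {inner})"

print(split_word("VBN", ["des", "to"]))  # (VBN (VBN21 des) (VBN22 to))
```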
Compounds with elision
These are particularly tricky, as one does not want to imply constituency where there is none.
If the elision is in the second conjunct (menschen wort und menschen lehre), we treat the first part as a modifier of both heads where possible: (NP-OB1 (NP-POS (N^G^PL menschen))
         (N (N wort) (CONJ und) (N lehre)))
If the first part cannot be treated as a separate word, for orthographic (menschenwort und menschenlehre) or other reasons, we have to ignore the fact that the second conjunct has an elided modifier: (NP-OB1 (N+N menschenwort) (CONJ und) (N lehre))
If the first conjunct has an elided head (Einsiedelleben und einwohniges Leben), this cannot be shown in the annotation: (NP-SBJ (NP-SBJ (N Einsiedel))
         (CONJP (CONJ und)
                (NP-SBJ (ADJ einwohniges) (N Leben))))
If coordination involves the elision of a bound morpheme (Vollkommenheit und Heiligkeit), the truncated conjunct is labeled TRUNC: (NP (D der) (ADJ Apostolischen) (N (TRUNC Vollkommen) (CONJ vnnd) (N Heyligkeit)))
Particles
Verbal particles ('separable prefixes') that are orthographically joined to the verb are tagged RP+VB*: (RP+VBN angeritten)
Those written separately are sisters of the verb: (RP an) (TO zu) (VB zeygen)
When an orthographically unjoined prefix is formally ambiguous between a particle (separable prefix) and an inseparable prefix, we do not attempt to distinguish the two: (NP-SBJ (NPR Luther)) (RP vor) (VBPI^3^SG kehret) ...
(This probably corresponds to Modern Standard German 'Luther verkehrt' with V2 order, i.e. an inseparable prefix, but we tag vor as RP because it is orthographically split.)
The clitic ne should be treated similarly to verbal particles: NEG+VB* if orthographically joined, NEG if orthographically separate.
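The shared rule for particles and ne depends only on whether the element is written together with the verb. The following sketch shows one way to express it; the function name and parameters are hypothetical, and the ne example at the end is constructed for illustration rather than taken from the corpus.

```python
def tag_prefixed_verb(prefix: str, verb: str, verb_tag: str,
                      joined: bool, prefix_tag: str = "RP") -> str:
    """Bracket a verb with a particle (RP) or the clitic ne (NEG)."""
    if joined:
        # Written as one word: one leaf with a complex label.
        return f"({prefix_tag}+{verb_tag} {prefix}{verb})"
    # Written separately: two sister nodes, no attempt to decide
    # whether the prefix is separable or inseparable.
    return f"({prefix_tag} {prefix}) ({verb_tag} {verb})"

print(tag_prefixed_verb("an", "geritten", "VBN", joined=True))
print(tag_prefixed_verb("vor", "kehret", "VBPI^3^SG", joined=False))
# Constructed example for the clitic ne:
print(tag_prefixed_verb("ne", "kehret", "VBPI^3^SG", joined=False,
                        prefix_tag="NEG"))
```

Intervening material such as zu in (RP an) (TO zu) (VB zeygen) is not handled by this sketch and would need to be inserted between the two sisters.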