The following annotation conventions aim to preserve both the orthography of the original text and its syntactic structure.
Clitics
Clitics and other words that are written together but do not form a constituent are split with "@", e.g. ins schiff: (PP (P in@) (NP (D @s) (N schiff)))
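The "@" convention above keeps the original spelling recoverable from the leaves of the tree. A minimal sketch of that recovery, assuming the leaves are available as a flat list of strings (the function name `join_clitics` is hypothetical and not part of any corpus tooling):

```python
def join_clitics(leaves):
    """Rebuild the original spelling from leaves that use '@' split markers."""
    words = []
    for leaf in leaves:
        if leaf.startswith("@") and words and words[-1].endswith("@"):
            # Fuse 'in@' + '@s' into 'ins' by dropping both markers.
            words[-1] = words[-1][:-1] + leaf[1:]
        else:
            words.append(leaf)
    return " ".join(words)


print(join_clitics(["in@", "@s", "schiff"]))  # ins schiff
```

The markers sit on both sides of the split, so a leaf is only fused with its left neighbour when both carry an "@" at the join.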
Compounds
A compound, if written together, is given a complex label that represents its make-up: (NP (N+N sturmwind))
But if a compound is orthographically split (often tagged "TRUNC" in the source corpus), label each part with its part of speech (unless there is evidence that the first part is a genitive NP), and wrap in the compound's part of speech: (NP (N (N sturm) (N wind)))
If the first part of what would be a compound in Modern German has unambiguous genitive marking, treat it as a genitive NP: (NP (NP-POS (N sturmes)) (N wind))
If the first part is adjectival and can be reasonably interpreted as a free-standing adjective, treat it as such: (NP (D die) (ADJ mittel) (N Deutschen))
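The noun-noun cases above can be summarised as a small decision procedure. This is only a sketch: the function `label_compound` and its flags are hypothetical, the judgment about genitive marking is left to the annotator and passed in as a flag, and checking the joined spelling first is an assumption about the order of the rules.

```python
def label_compound(first, second, joined, genitive=False):
    """Return the bracketing for a two-part nominal compound."""
    if joined:
        # Written together: one leaf with a complex label.
        return f"(NP (N+N {first}{second}))"
    if genitive:
        # Split, with unambiguous genitive marking on the first part.
        return f"(NP (NP-POS (N {first})) (N {second}))"
    # Split, no genitive evidence: label each part, wrap in the compound's tag.
    return f"(NP (N (N {first}) (N {second})))"


print(label_compound("sturm", "wind", joined=True))
print(label_compound("sturm", "wind", joined=False))
print(label_compound("sturmes", "wind", joined=False, genitive=True))
```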
If a non-compound is orthographically split, use the Penn notations 21 ('part 1 of 2') and 22 ('part 2 of 2'); note that any inflection goes only on the wrapper: (VBPI^3^SG (VBPI21 vn) (VBPI22 tphit)) 'empfängt'
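The 21/22 notation can be generated mechanically from the wrapper tag and the orthographic pieces. The sketch below uses a hypothetical helper `split_word`; extending the pattern beyond two parts (31, 32, 33, ...) is an assumption, since only 21 and 22 are described here.

```python
def split_word(tag, parts):
    """Bracket an orthographically split non-compound.

    The inflection (everything after the first '^') stays on the wrapper;
    the parts carry only the base tag plus 'total, index' digits.
    Assumes fewer than ten parts.
    """
    base = tag.split("^")[0]  # 'VBPI^3^SG' -> 'VBPI'
    total = len(parts)
    inner = " ".join(
        f"({base}{total}{index} {part})"
        for index, part in enumerate(parts, start=1)
    )
    return f"({tag} {inner})"


print(split_word("VBPI^3^SG", ["vn", "tphit"]))
# (VBPI^3^SG (VBPI21 vn) (VBPI22 tphit))
```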
Compounds with elision
These are particularly tricky, as one does not want to imply constituency where there is none.
If elision is in the second conjunct (menschen wort und menschen lehre), we treat the first part as a modifier of both heads, if that's possible: (NP-OB1 (NP-POS (N^G^PL menschen)) (N (N wort) (CONJ und) (N lehre)))
If it's not possible to treat the first part as a separate word, for orthographic (menschenwort und menschenlehre) or other reasons, we have to ignore the fact that the second conjunct has an elided modifier: (NP-OB1 (N+N menschenwort) (CONJ und) (N lehre))
If the first conjunct has an elided head (Einsiedelleben und einwohniges Leben), it's not possible to show this in the annotation. Label the first part of the compound with its part of speech: (NP-SBJ (NP-SBJ (N Einsiedel)) (CONJP (CONJ und) (NP-SBJ (ADJ einwohniges) (N Leben))))
If coordination involves the elision of a bound morpheme (Vollkommenheit und Heiligkeit), the truncated conjunct is labeled TRUNC: (NP (D der) (ADJ Apostolischen) (N (TRUNC Vollkommen) (CONJ vnnd) (N Heyligkeit)))
Particles
Verbal particles ('separable prefixes') orthographically joined to the verb should be tagged as RP+VB*: (RP+VBN angeritten)
Those written separately are tagged as sisters of the verb: (RP an) (TO zu) (VB zeygen)
The particle and the verb are typically daughters of IP with no need to indicate that they form a constituent, even when coordinated. The exception is if they are attributive to a N.
When an orthographically unjoined prefix is formally ambiguous between a particle/separable prefix and an inseparable prefix, we do not attempt to distinguish the two: (NP-SBJ (NPR Luther)) (RP vor) (VBPI^3^SG kehret) ...
(probably MSG 'Luther verkehrt' and V2, i.e. an inseparable prefix, but we tag as RP because it's orthographically split.)
The clitic ne should be treated similarly to verbal particles: NEG+VB* if orthographically joined, NEG if orthographically separate.
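Because the tagging of particles and of ne depends only on the spelling, both rules can be expressed by one helper. This is a sketch under that reading; `tag_prefix_verb` and its parameters are hypothetical names, and the verb form used with ne is an invented placeholder, not a corpus attestation.

```python
def tag_prefix_verb(prefix, verb, verb_tag, joined, prefix_tag="RP"):
    """Tag a particle (RP) or the clitic ne (NEG) together with its verb."""
    if joined:
        # Written as one word: a single leaf with a combined tag.
        return f"({prefix_tag}+{verb_tag} {prefix}{verb})"
    # Written separately: two sister leaves.
    return f"({prefix_tag} {prefix}) ({verb_tag} {verb})"


print(tag_prefix_verb("an", "geritten", "VBN", joined=True))
# (RP+VBN angeritten)
print(tag_prefix_verb("vor", "kehret", "VBPI^3^SG", joined=False))
# (RP vor) (VBPI^3^SG kehret)
print(tag_prefix_verb("ne", "VERB", "VBPI", joined=False, prefix_tag="NEG"))
# (NEG ne) (VBPI VERB)
```

Whether the prefix is separable in the modern language plays no role in the helper, which mirrors the decision not to resolve that ambiguity in the annotation.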
When an etymological particle precedes a N, tag as RP only if that N is an infinitive; otherwise use 21/22:
(NP-OB1 (D^A^SG das) (RP aus) (VB^A^SG laufen))
(NP-OB1 (D^A^SG den) (N^A^SG (N21 aus) (N22 lauf)))
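The two examples above differ only in whether the noun is an infinitive. A minimal sketch of that choice, assuming the annotator supplies the judgment as a flag (`tag_particle_noun` is a hypothetical helper and returns only the part of the NP that follows the determiner):

```python
def tag_particle_noun(particle, noun, inflection, is_infinitive):
    """Tag an etymological particle that precedes a nominal head."""
    if is_infinitive:
        # Nominalised infinitive: the particle keeps its RP tag.
        return f"(RP {particle}) (VB^{inflection} {noun})"
    # Any other noun: treat as one split word, inflection on the wrapper.
    return f"(N^{inflection} (N21 {particle}) (N22 {noun}))"


print(tag_particle_noun("aus", "laufen", "A^SG", is_infinitive=True))
# (RP aus) (VB^A^SG laufen)
print(tag_particle_noun("aus", "lauf", "A^SG", is_infinitive=False))
# (N^A^SG (N21 aus) (N22 lauf))
```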