The IPCHG is structured to represent syntactic variation across time and space and will ultimately consist of 165 texts.
Each text is annotated up to a maximum of approximately 10,000 words, although at least 30 of the texts are shorter than this. In total, the corpus will contain approximately 1.4 million words.
Each parsed text is stored as a UTF-8 text file with the .txt extension. Every word is tagged for part of speech and morphological features (and will eventually be lemmatized). Sentences are syntactically parsed according to the Penn annotation system.
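Because the parses are plain-text labeled bracketing, they can be read with a few lines of code. The following is a minimal Python sketch, not an official tool for this corpus. It assumes the layout common to Penn-style historical corpora: each sentence is wrapped in an unlabeled outer bracket and carries an `(ID ...)` node, and any node of the form `(TAG word)` is treated as a tagged word. The function names, the sample sentence, and the `skip` list are hypothetical. Empty categories or other non-word leaves would also be returned unless their labels are added to `skip`.

```python
import re
from typing import Iterator, List, Tuple, Union

# A parsed node is either a bare leaf string or a (label, children) pair.
Tree = Union[str, Tuple[str, list]]

# Tokens are "(", ")", or any run of characters without whitespace or brackets.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse_trees(text: str) -> List[Tree]:
    """Parse every top-level bracketed tree in a Penn-style string."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse_node() -> Tree:
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        # A node label follows "(" unless the next token opens a child (unlabeled wrapper).
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a word or other leaf string
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    trees = []
    while pos < len(tokens):
        if tokens[pos] == "(":
            trees.append(parse_node())
        else:
            pos += 1  # ignore stray text outside brackets
    return trees


def tagged_words(tree: Tree, skip=("ID",)) -> Iterator[Tuple[str, str]]:
    """Yield (word, tag) pairs for preterminal nodes like (TAG word)."""
    if isinstance(tree, str):
        return
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        if label not in skip:  # e.g. skip sentence-ID nodes
            yield children[0], label
        return
    for child in children:
        yield from tagged_words(child, skip)


def read_parsed_file(path: str) -> List[Tree]:
    """Read one parsed text, decoding it as UTF-8."""
    with open(path, encoding="utf-8") as f:
        return parse_trees(f.read())


# Hypothetical sample sentence in Penn-style bracketing.
sample = "( (IP-MAT (NP-SBJ (PRO He)) (VBD went) (. .)) (ID sample,1.1))"
print(list(tagged_words(parse_trees(sample)[0])))
```

Run on the sample string, this prints `[('He', 'PRO'), ('went', 'VBD'), ('.', '.')]`. For a downloaded file, `read_parsed_file` (with whatever path the file is saved under) returns one tree per sentence. The real tag set, including how morphological features are encoded in the labels, should be checked against the corpus documentation.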
You can download individual texts, or all currently available texts as a single .zip file: