The IPCHG is structured to represent syntactic variation across time and space and will ultimately consist of 165 texts.
Each text is annotated only up to approximately 10,000 words (at least 30 of the texts are shorter than that). When complete, the corpus will contain approximately 1.4 million words.
Each parsed text is stored as a UTF-8 text file with the .txt extension. Every word is tagged for part of speech and morphological features (and will eventually be lemmatized). Sentences are syntactically parsed according to the Penn annotation system.
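As an illustration of working with such files, here is a minimal Python sketch that reads Penn-style bracketed parses and lists the tagged words. It assumes the usual labeled-bracket layout, for example (TAG word) nested inside phrase nodes; the sample sentence, its tags, and the function names are hypothetical and are not taken from the corpus files. It also assumes that words themselves never contain parentheses.

```python
def tokenize(text):
    """Split bracketed text into '(' and ')' tokens plus labels and words."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_trees(text):
    """Parse bracketed text into a list of top-level trees (nested lists)."""
    stack = [[]]  # stack[0] collects the top-level trees
    for tok in tokenize(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) < 2:
                raise ValueError("unbalanced closing bracket")
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced opening bracket")
    return stack[0]


def leaves(tree):
    """Yield (tag, word) pairs for every node of the form [tag, word]."""
    if len(tree) == 2 and all(isinstance(x, str) for x in tree):
        yield tree[0], tree[1]
        return
    for child in tree:
        if isinstance(child, list):
            yield from leaves(child)


def read_trees(path):
    """Read one corpus file (UTF-8, as stated above) and parse its trees."""
    with open(path, encoding="utf-8") as handle:
        return parse_trees(handle.read())


# Hypothetical sample; tags and words are illustrative only.
SAMPLE = "(IP-MAT (NP-SBJ (PRO ich)) (VBPI sage) (NP-OB1 (PRO es)))"

if __name__ == "__main__":
    for tree in parse_trees(SAMPLE):
        print(list(leaves(tree)))
```

If the files carry metadata nodes (for instance sentence identifiers), those would also appear as (tag, word) pairs and may need to be filtered out by tag.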
Texts come from three source corpora and are available under a CC license. An explanation of the version numbers is shown here.
You can download individual texts or download all currently available texts as a .zip file:
Please report any errors in the corpus files by emailing Elliott Evans, evansell at iu dot edu.