A New Generation of Textual Corpora: Mining Corpora from Very Large Collections