|
The corpus was constructed from a selection of existing transcripts of interactions in professional settings and contains two main sub-corpora of one million words each. One sub-corpus consists mainly of academic discussions, such as faculty council meetings and committee meetings related to testing. The second sub-corpus contains transcripts of White House press conferences, which are almost exclusively question-and-answer sessions.
The transcripts making up the spoken American corpus were selected because they are relatively unedited. However, since they were not produced by linguists, the transcripts lack some features one might wish for, such as the marking of back channels, pause length, and overlap.
You can get a good idea of the content of the corpus by carrying out a search at www.monoconc.com.
|