My Science Tutor (MyST) Children’s Speech Corpus
The MyST (My Science Tutor) Children’s Speech Corpus consists of 393 hours of children’s speech collected from 1,371 third-, fourth-, and fifth-grade students. The students engaged in spoken dialogs with a virtual science tutor in 8 areas of science. A total of 10,496 student sessions of 15 to 20 minutes each produced 228,874 utterances, of which 45% have been transcribed at the word level. The MyST Children’s Speech Corpus contains approximately 10 times more English children’s speech data than all other English children’s speech corpora combined.
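For readers sizing up the corpus, the figures above imply a few rough averages. The Python sketch below derives them from the quoted totals only. The function name `derived_stats` and the dictionary keys are illustrative and are not part of the corpus release, and the results are back-of-the-envelope estimates, not official statistics.

```python
# Figures quoted in the corpus description above.
TOTAL_HOURS = 393
STUDENTS = 1_371
SESSIONS = 10_496
UTTERANCES = 228_874
TRANSCRIBED_FRACTION = 0.45  # share of utterances transcribed at the word level


def derived_stats():
    """Return rough averages implied by the quoted totals (illustrative only)."""
    return {
        # Approximate number of utterances with word-level transcripts.
        "transcribed_utterances": TRANSCRIBED_FRACTION * UTTERANCES,
        # Average audio per utterance, in seconds.
        "mean_utterance_seconds": TOTAL_HOURS * 3600 / UTTERANCES,
        # Average number of utterances recorded per session.
        "utterances_per_session": UTTERANCES / SESSIONS,
        # Average number of sessions per student.
        "sessions_per_student": SESSIONS / STUDENTS,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, the corpus works out to roughly 103,000 transcribed utterances, about 6 seconds of audio per utterance, and around 22 utterances per session.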
Click here for a more detailed overview of the corpus. You can also download a copy of the README distributed with the corpus release, which contains much more detail.
Click here for details of the 12-year My Science Tutor project and the MyST Children’s Speech Corpus.
Availability of the MyST Corpus
Boulder Learning is providing the MyST Corpus free of charge to the research community. Please review the terms and conditions of the Research License below. Companies can purchase the MyST Corpus. A commercial license costs $15,000 for companies with annual revenue of less than $2M, and $25,000 for those with larger revenue.
Research License
Please complete the data use agreement, which requests information about your organization and describes the terms of use of the research license. We will contact you promptly with instructions for downloading the corpus.
Commercial License
Please review and sign the commercial license agreement to initiate purchase and download of the corpus.