Contact: nemlar@hum.ku.dk
BLARK Definition and BLARK Content

Resources | BNSC | Desktop/Microphone & High quality microphone data or phone data | Telephony | Audio data with prosodic markers and other emotional features | Annotated written corpus | Unannotated written corpus | Vowelised corpus | Non-vowelised corpus | Phonetic lexicon, general vocabulary | Onomastica (proper names) | Visual data (faces, lips, etc.) |
---|---|---|---|---|---|---|---|---|---|---|---|
Speech modules | |||||||||||
Acoustic models | +++ | +++ | +++ | +++ | |||||||
Language models | ++ | +++ | ++ | +++ | |||||||
Pronunciation lexicon | + | ++ | +++ | +++ | |||||||
Lexicon adaptation | + | + | ++ | + | +++ | +++ | |||||
Phoneme alignment | ++ | ++ | ++ | ++ | ++ | + | +++ | +++ | |||
Prosody recognition | ++ | ++ | ++ | ++ | ++ | ||||||
Speech Units Selection | + | + | +++ | ++ | + | + | |||||
Prosody prediction | ++ | ++ | ++ | ++ | ++ | ||||||
Segmentation Speech / Silence | ++ | ++ | ++ | ++ | |||||||
Sentence boundary detection | ++ | ++ | ++ | ++ | + | + | |||||
Dialect / language identification | ++ | ++ | ++ | + | + | + | |||||
Word boundary identification | + | + | + | + | + | + | |||||
Speech / Non-speech (music) detection | ++ | + | + | ++ | |||||||
Speaker recognition / identification | + | + | + | + | |||||||
Emotion identification | + | + | + | + | + | + | |||||
Speaker adaptation | ++ | ++ | ++ | + | |||||||
Lips movement reading | +++ |

Speech applications and corresponding speech modules, marked with importance:

Applications | Dictation | Telephony speech applications | Embedded speech recognition | Transcription of broadcast News | Transcription of conversational speech | Speaker recognition | Dialect / language identification | Emotion identification | Speaker adaptation | Lips movement reading | Topic detection, segmentation, topic boundaries | Speaker 2 speaker mapping | Emotion / Prosody output | Text to Speech (incl. formatted data e.g. databases) | Customization to different voices | Generation Lips Movement |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Speech modules | ||||||||||||||||
Acoustic models | +++ | +++ | +++ | +++ | +++ | ++ | +++ | +++ | +++ | +++ | +++ | ++ | +++ | +++ | +++ | +++ |
Language models | +++ | ++ | ++ | +++ | +++ | ++ | ++ | +++ | ||||||||
Pronunciation lexicon | +++ | +++ | +++ | +++ | +++ | ++ | +++ | |||||||||
Lexicon adaptation | + | + | + | + | + | ++ | +++ | |||||||||
Phoneme alignment | + | + | + | + | + | + | ++ | ++ | +++ | |||||||
Prosody recognition | + | + | + | + | + | + | + | +++ | + | ++ | ||||||
Speech Units Selection | +++ | +++ | ||||||||||||||
Prosody prediction | +++ | +++ | ||||||||||||||
Segmentation Speech / Silence | ++ | + | ++ | ++ | ++ | + | ++ | ++ | + | + | + | + | ||||
Sentence boundary detection | + | + | + | + | + | + | + | ++ | + | + | + | ++ | +++ | |||
Dialect / language identification | + | + | + | + | + | + | + | + | + | + | + | + | ||||
Word boundary identification | + | + | + | + | + | + | + | + | + | + | + | ++ | ||||
Speech / Non-speech (music) detection | + | + | + | + | + | + | + | ++ | + | + | + | |||||
Speaker recognition / identification | ||||||||||||||||
Emotion identification | + | + | + | + | + | + | + | + | + | + | ++ | ++ | ||||
Speaker adaptation | ++ | + | ++ | + | ++ | + | + | + | + | + | + | ++ | + | |||
Lips movement reading | +++ |
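
The importance marks lend themselves to a simple ordinal reading. The sketch below is only an illustration: it assumes a mapping of `+`, `++`, `+++` to the weights 1, 2, 3 (the source does not define numeric values), and the names `MARK_WEIGHTS`, `parse_marks` and `weights` are hypothetical. It uses the "Acoustic models" row of the applications table, since that row has a mark in every column and can be paired with the header unambiguously.

```python
# Assumed ordinal mapping; the source only gives the marks, not numbers.
MARK_WEIGHTS = {"+": 1, "++": 2, "+++": 3}

# Column headers of the applications table, in order.
APPLICATIONS = [
    "Dictation",
    "Telephony speech applications",
    "Embedded speech recognition",
    "Transcription of broadcast News",
    "Transcription of conversational speech",
    "Speaker recognition",
    "Dialect / language identification",
    "Emotion identification",
    "Speaker adaptation",
    "Lips movement reading",
    "Topic detection, segmentation, topic boundaries",
    "Speaker 2 speaker mapping",
    "Emotion / Prosody output",
    "Text to Speech (incl. formatted data e.g. databases)",
    "Customization to different voices",
    "Generation Lips Movement",
]

# Cells of the "Acoustic models" row, copied from the table.
ACOUSTIC_MODELS_ROW = (
    "+++ | +++ | +++ | +++ | +++ | ++ | +++ | +++ | "
    "+++ | +++ | +++ | ++ | +++ | +++ | +++ | +++"
)


def parse_marks(row: str) -> list[int]:
    """Convert a pipe-separated row of importance marks into integer weights."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    return [MARK_WEIGHTS[cell] for cell in cells]


marks = parse_marks(ACOUSTIC_MODELS_ROW)
if len(marks) != len(APPLICATIONS):
    raise ValueError("row and header have different numbers of columns")

# Pair each application with the weight of acoustic models for it.
weights = dict(zip(APPLICATIONS, marks))

# Applications where acoustic models are marked below the top level.
below_top = [app for app, weight in weights.items() if weight < 3]
print(below_top)
```

The same parsing would apply to any other row once its marks can be assigned to specific columns.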