Spoken Resources
Availability:
3 exists, but company-internal only
2 exists and is freely usable for PreR&D
1 exists and is freely usable for both PreR&D and R&D
Cost:
4 more than 10,000 €
3 1,000-10,000 €
2 100-1,000 €
1 less than 100 €, or free
Adaptability:
3 black box
2 glass box (you can see it but not change it)
1 freely manipulable
R means for research use, C means for commercial use.
The last column of each table gives these codes in the order availability, cost, adaptability. For availability 3 (company-internal), the other features are irrelevant.
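The rating codes in the last table column (for example "1,2,1", "1,4" or "3") can be decoded mechanically. The sketch below is only an illustration, assuming the codes are comma-separated integers in the order availability, cost, adaptability, with trailing values sometimes omitted. The names `decode_rating` and `Rating` are hypothetical and not part of BLARK. Free-text entries such as "$700" or "-" are not handled and raise `ValueError`.

```python
from dataclasses import dataclass
from typing import Optional

# Labels paraphrased from the legend; keys are the numeric codes.
AVAILABILITY = {
    3: "exists, but company-internal only",
    2: "exists and is freely usable for PreR&D",
    1: "exists and is freely usable for both PreR&D and R&D",
}
COST = {
    4: "more than 10,000 €",
    3: "1,000-10,000 €",
    2: "100-1,000 €",
    1: "less than 100 €, or free",
}
ADAPTABILITY = {
    3: "black box",
    2: "glass box",
    1: "freely manipulable",
}


@dataclass
class Rating:
    availability: str
    cost: Optional[str] = None          # None if the code omits it
    adaptability: Optional[str] = None  # None if the code omits it


def decode_rating(code: str) -> Rating:
    """Decode a hypothetical 'availability,cost,adaptability' code string."""
    values = [int(part) for part in code.split(",")]  # ValueError on "$700", "-"
    availability = AVAILABILITY[values[0]]
    if values[0] == 3:
        # Company-internal resources: the other features are irrelevant.
        return Rating(availability)
    cost = COST[values[1]] if len(values) > 1 else None
    adaptability = ADAPTABILITY[values[2]] if len(values) > 2 else None
    return Rating(availability, cost, adaptability)


print(decode_rating("1,2,1"))
print(decode_rating("1,4"))
```

Returning `None` for missing fields keeps two-value codes such as "1,4" (used for the ELRA OrienTel entries) distinguishable from fully rated ones.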
Acoustic Data
Name of Resource | Provider | Size | Other information | Availability, cost, adaptability
SpeechDat-like database | UOB/ENS | | More than 100 speakers; French/Arabic; for speech recognition; Lebanese/Syrian/French | 1,1,1
Arabic digits | UOB | | For speech recognition; Lebanese accent | 1,1,1
Speech database in 4 languages | LibanCell | 10,000 announcements of 10 words each | | 3
Labelled database for TTS | Millenium | | | 3
Arabic Broadcast News Speech Corpus (BNSC) | ELRA/LDC | More than 20 hours of transcribed Arabic news in Modern Standard Arabic | Domain: news | 1,2,1
Arabic acoustic corpus (mono-speaker) | Benabbou, Morocco | | | 3
Arabic Phonetic database | King Abdulaziz City for Science and Technology | | Languages: English, Arabic | 3
Holy Quran multi-speaker | RDI | 60 hours | | 1,4,1
Single male speaker concatenative Arabic TTS database | RDI | 1 hour, 1,300 sentences | | 1,3,1
Single female speaker concatenative Arabic TTS database | RDI | 4 hours, 3,000 sentences | | 1,3,1
Arabic concatenative TTS male recording | Sakhr | 3 hours of MSA | | 3
Arabic ASR recording database | Sakhr | 56 hours of MSA and Colloquial Arabic | | 3
Human Names Language Model | Sakhr | 500,000 names | Egyptian and Saudi human names corpus | 3
Arabic Acoustic Model | Sakhr | | | 3
CALLHOME Egyptian Arabic Speech | LDC | 120 Egyptian Colloquial Arabic telephone conversations | Calls lasting up to 30 minutes | 1,2,1
CALLFRIEND Egyptian Arabic | LDC | 60 telephone conversations between native speakers of the Egyptian dialect of Arabic | Calls lasted between 5 and 30 minutes; all calls are domestic; includes documentation | 1,2,1
CALLHOME Egyptian Arabic Speech Supplement | LDC | 20 telephone conversations; transcripts for 120 Egyptian Colloquial Arabic telephone conversations; 273,681,144 bytes (261 MB), or 8 hours of audio data | 20 data files in SPHERE format, 8 kHz, shorten-compressed, 2-channel mu-law | 1,1,1
GlobalPhone Arabic | ELRA | About 100 adult native speakers were asked to read 100 sentences | The GlobalPhone corpus provides transcribed speech data for the development and evaluation of large-vocabulary continuous speech recognition systems | 1,3
OrienTel United Arab Emirates MSA | ELRA | 500 speakers (254 male, 246 female) | Recorded over the UAE fixed and mobile telephone networks | 1,4
OrienTel Arabic as spoken in Israel | ELRA | 750 Arabic speakers (375 male, 375 female) | Recorded over the Israeli fixed and mobile telephone networks | 1,4
OrienTel Jordan MCA | ELRA | 757 Jordanian speakers (393 male, 364 female) | Recorded over the Jordanian fixed and mobile telephone networks | 1,4
OrienTel Jordan MSA | ELRA | 556 Jordanian speakers (288 male, 268 female) | Recorded over the Jordanian fixed and mobile telephone networks | 1,4
OrienTel Egypt MCA | ELRA | 750 Egyptian speakers (398 male, 352 female) | Recorded over the Egyptian fixed and mobile telephone networks | 1,4
OrienTel Egypt MSA | ELRA | 500 Egyptian speakers (254 male, 246 female) | Recorded over the Egyptian fixed and mobile telephone networks | 1,4
OrienTel Morocco MCA | ELRA | 772 Moroccan speakers (383 male, 389 female) | Recorded over the Moroccan fixed and mobile telephone networks | 1,4
OrienTel Morocco MSA | ELRA | 530 Moroccan speakers (264 male, 266 female) | Recorded over the Moroccan fixed and mobile telephone networks | 1,4
OrienTel Tunisia MCA | ELRA | 792 Tunisian speakers (426 male, 366 female) | Recorded over the Tunisian fixed and mobile telephone networks | 1,4
OrienTel Tunisia MSA | ELRA | 598 Tunisian speakers (359 male, 239 female) | Recorded over the Tunisian fixed and mobile telephone networks | 1,4
OrienTel United Arab Emirates MCA | ELRA | 880 speakers (432 male, 448 female) | Recorded over the UAE fixed and mobile telephone networks | 1,4
Arabic Broadcast News | LDC | | Recordings from several Arabic radio channels | $700
The Corpus of Spoken Palestinian Arabic (CoSPA) | University of Haifa, Israel | 200 hours of recorded speech, collected between 1996 and 1998 | The aim is to collect data covering the whole linguistic area of Palestinian Arabic | -
KACST Arabic Phonetics Database (KAPD) | KACST, Saudi Arabia | More than 46,000 files | A detailed and comprehensive database documenting the articulatory mechanisms of Arabic sounds | Available on 3 CDs for researchers and students of speech
Saudi Accented Arabic Voice Bank | KACST, Saudi Arabia | 1,033 native speakers | Saudi-accented Arabic telephone speech database | Can be licensed for research or for product development once a contract with KACST is signed
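The size listed for the CALLHOME Egyptian Arabic Speech Supplement can be sanity-checked against its audio format. This is a rough sketch, assuming exactly 8 hours of audio (the listed duration is probably rounded), 8,000 samples per second, 1 byte per mu-law sample (mu-law uses 8 bits per sample) and 2 channels; SPHERE file headers are ignored.

```python
SAMPLE_RATE_HZ = 8_000   # 8 kHz, as listed
BYTES_PER_SAMPLE = 1     # mu-law: 8 bits per sample
CHANNELS = 2             # 2-channel, as listed
HOURS = 8                # listed duration, assumed exact here

seconds = HOURS * 3600
# Size the audio would occupy without shorten compression.
uncompressed_bytes = seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS

listed_bytes = 273_681_144
listed_mib = listed_bytes / 2**20          # binary megabytes
ratio = uncompressed_bytes / listed_bytes  # implied compression ratio

print(f"uncompressed: {uncompressed_bytes:,} bytes")
print(f"listed size: {listed_mib:.1f} MiB")
print(f"implied compression ratio: {ratio:.2f}")
```

The listed 261 MB matches 273,681,144 bytes in binary megabytes, and the uncompressed estimate is larger than the listed size, which is consistent with the data being shorten-compressed.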
Written Corpora for Speech Technology
Name of Resource | Provider | Size | Other information | Availability, cost, adaptability
Corpus for di-syllables | Abdelhak Mouradi, Noureddine Chenfour | | Domain: text-to-speech | 1,2,1
CALLHOME Egyptian Arabic Transcripts | LDC | Contiguous 5- or 10-minute segments taken from 120 unscripted telephone conversations | The transcripts are timestamped by speaker turn for alignment with the speech signal and are provided in standard orthography | 1,2,1
Phonetic Lexicon
Name of Resource | Provider | Size | Other information | Availability, cost, adaptability
Special pronunciation dictionary | Sakhr | 20,000 entries | Dictionary for handling pronunciation anomalies | 3
Name master dictionary | Sakhr | 100,000 names | | 3
LC-STAR Standard Arabic Phonetic lexicon | ELRA | 110,271 entries | 52,981 common-word entries, 50,135 proper names, 7,155 special-application words | 1,4
|