

Monolingual lexica


Availability: 3 = existent but only company-internal; 2 = existent and freely usable for PreR&D; 1 = existent and freely usable for both PreR&D and R&D.
Price: 4 = more than 10,000 €; 3 = 1,000 - 10,000 €; 2 = 100 - 1,000 €; 1 = less than 100 € or free.
Manipulability: 3 = black box; 2 = glass box (you can see but not change it); 1 = freely manipulable.

R = for research use; C = for commercial use.
For availability = 3 (company-internal), the other features are irrelevant.
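
The rating scheme above can be read mechanically. The following is a minimal sketch of a decoder, not part of the survey itself: the function name `decode_rating` and the dictionary names are hypothetical, and it assumes a single code such as `3`, `3,4,1` or `1,3,1 R`. A combined entry with both an R and a C rating would need to be split into two codes first.

```python
# Hypothetical decoder for the rating codes used in the tables;
# all names here are illustrative and do not come from the survey.
AVAILABILITY = {
    1: "existent and freely usable for both PreR&D and R&D",
    2: "existent and freely usable for PreR&D",
    3: "existent but only company-internal",
}
PRICE = {
    1: "less than 100 € or free",
    2: "100 - 1,000 €",
    3: "1,000 - 10,000 €",
    4: "more than 10,000 €",
}
MANIPULABILITY = {
    1: "freely manipulable",
    2: "glass box (visible but not changeable)",
    3: "black box",
}
USAGE = {"R": "research", "C": "commercial"}


def decode_rating(code):
    """Decode one rating code such as '3', '3,4,1' or '1,3,1 R'."""
    parts = code.split()
    if not 1 <= len(parts) <= 2:
        raise ValueError(f"unsupported rating code: {code!r}")
    # An optional trailing letter marks research (R) or commercial (C) use.
    usage = USAGE[parts[1]] if len(parts) == 2 else None
    values = [int(v) for v in parts[0].split(",")]
    if len(values) not in (1, 3):
        raise ValueError(f"expected 1 or 3 values, got: {code!r}")
    result = {"usage": usage, "availability": AVAILABILITY[values[0]]}
    # Price and manipulability are decoded only when the code lists them.
    if len(values) == 3:
        result["price"] = PRICE[values[1]]
        result["manipulability"] = MANIPULABILITY[values[2]]
    return result


print(decode_rating("1,3,1 R"))
print(decode_rating("3"))
```

The sketch decodes price and manipulability whenever they are listed, even for availability 3, where the legend treats them as irrelevant.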

| Name of Lexicon | Provider | Size | Other information | Availability, price, manip. |
|---|---|---|---|---|
| Diinar 1 | Lyon2 | 138,766 entries – 129,000 lemmas | | R: 1,3,1; C: 1,4,1 |
| Arabic Lexicon | RDI | 4,500 roots, 4 million stems | See Arabic Natural Language Processing from RDI | 3 |
| Dictionnaire de formes fléchies simples et agglutinées arabes | CNRS | 66 million entries | | 1 (subject to negotiation) |
| Arabic lexicon | Sakhr | 120K MSA & Classic stems | | 3,4,1 |
| Arabic Idiom lexicon | Sakhr | 76K basic idioms | With both lexical and semantic information | 3 |
| Selectional restrictions | Sakhr | 50K frames | Semantic restrictions associated with senses of verbs, nouns and adjectives and imposed on the environment in which they occur | 3 |
| Arabic simple forms lexicon | CEA | 3,164,000 entries - 114,000 lemmas | With grammatical information (POS, Gender and Number) | 3 |
| Arabic proclitics lexicon | CEA | 77 entries | With grammatical information | 3 |
| Arabic enclitics lexicon | CEA | 66 entries | With grammatical information | 3 |

List of conjunctions and other sentence starters/stoppers
(No resources have been surveyed for 'sentence boundary detection')

| Name of Lexicon | Provider | Size | Other information | Availability, price, manip. |
|---|---|---|---|---|
| Arabic word segment model | Sakhr | | MSA & Classic Arabic Language model for Arabic word segment | 3 |

MEDAR is supported by the European Commission's ICT programme and is running from February 1st 2008 until July 31st 2010.
