Datasets
Swadesh Lists linked to the Open Multilingual Wordnet (OMW)
This resource compiles 1212 Swadesh lists and maps them to the Open Multilingual Wordnet through Princeton Wordnet 3.0 synsets.
The canonical citation and description of this resource is:
@inproceedings{MorgadoDaCosta:Bond:Kratochvil:2016,
title = {Linking and Disambiguating Swadesh Lists: Expanding the
{Open Multilingual Wordnet} Using Open Language Resources},
author = {Morgado da Costa, Luis and
Bond, Francis and
Kratochv{\'\i}l, Franti{\v{s}}ek},
booktitle = {Proceedings of GLOBALEX 2016 Lexicographic Resources for
Human Language Technology, 10th edition of the International Conference
on Language Resources and Evaluation (LREC 2016)},
pages = {29--36},
year = {2016}
}
This work and all 1212 original lists it draws on are shared under a CC-BY-3.0 license. The original data and metadata shared by The Rosetta Project can be found as an appendix to this data.
Click here to download this dataset.
Chinese Classifiers linked to the Chinese Open Wordnet (COW)
This resource maps Mandarin Chinese lemmas and their candidate classifiers to the Open Multilingual Wordnet, through Princeton Wordnet 3.0 synsets. Two files are made available:
- (Raw, text-based) Chinese lemma mapping, with frequencies, to a list of 204 sortal classifiers
- Princeton Wordnet synset mapping, with frequencies, to a list of 204 sortal classifiers
The frequencies presented here are raw (i.e. no filtering was applied) and refer to the dataset described as Tau=1 in the publication cited below.
@inproceedings{Morgado:Bond:Gao:2016,
title = {Mapping and Generating Classifiers using an Open Chinese Ontology},
author = {Morgado da Costa, Luís and
Bond, Francis and
Gao, Helena},
booktitle = {Proceedings of the 8th Global WordNet Conference (GWC 2016)},
address = {Bucharest, Romania},
year = {2016}
}
Both files in this dataset (i.e. lemma and wordnet mappings) are released under a CC-BY-4.0 license.
Click here to download this dataset.
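As a rough illustration of how raw classifier frequencies like those described above might be used, the sketch below ranks candidate classifiers for each lemma by frequency. It is not the dataset's own tooling: the row layout (lemma, classifier, frequency), the function name `rank_classifiers`, and the counts are all assumptions made up for this example, and the actual file format should be checked in the downloaded files.

```python
from collections import defaultdict

# Hypothetical rows: (lemma, classifier, raw frequency).
# The counts are invented for illustration and are NOT taken from the dataset.
rows = [
    ("狗", "只", 120),
    ("狗", "条", 45),
    ("书", "本", 300),
    ("书", "册", 12),
]


def rank_classifiers(rows):
    """Group rows by lemma and sort each lemma's classifiers by descending frequency."""
    by_lemma = defaultdict(list)
    for lemma, classifier, freq in rows:
        by_lemma[lemma].append((classifier, freq))
    return {
        lemma: sorted(pairs, key=lambda pair: pair[1], reverse=True)
        for lemma, pairs in by_lemma.items()
    }


ranked = rank_classifiers(rows)

# Most frequent candidate classifier per lemma.
best = {lemma: pairs[0][0] for lemma, pairs in ranked.items()}
```

Because the frequencies are unfiltered, low-count candidates may be noise; a minimum-frequency cutoff could be applied before ranking if needed.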