Frequency Dictionaries

Frekwensiewoordeboek van Afrikaans

The Frequency Dictionaries series aims at producing dictionaries with comparable frequency data for a large number of different languages. For many of the languages featured in this collection, this series is the first comprehensive compilation to use a large-scale empirical base.
The dictionaries are available in both print and electronic versions. Each dictionary provides the most frequent 1,000 word forms in order of frequency and the 10,000 most frequent word forms in alphabetical order. They provide an introductory description of the data and the methodological approach used. In addition, language-specific statistical information is provided with regard to letters, word structure and structural changes.
The enclosed CD-ROM contains a more comprehensive version of the dictionary as an e-book. It includes data on the relative frequency of up to 1,000,000 word forms. For less-resourced languages the lists are shorter due to the reduced size of the corpora. The Afrikaans word list of this volume contains 500,000 word forms. This list of words (with frequency classes) is also available as a plain text file on the CD-ROM, ordered both alphabetically and by frequency. Using this file, word lists for various applications can be generated easily. The word forms in the printed part of the dictionary have been checked carefully by hand to identify incorrect forms. By contrast, the more comprehensive list on the CD-ROM has been inspected by means of automatic plausibility criteria alone.
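As a rough illustration of how a word list for a particular application might be derived from the plain text file, the following Python sketch filters entries by frequency class. The file layout is an assumption, not something stated here: one entry per line in the form word, tab, integer frequency class, with lower classes taken to mean more frequent words. The function names and the three sample lines are invented placeholders, not data from the CD-ROM.

```python
import io


def read_word_list(lines):
    """Parse lines of the ASSUMED form 'word<TAB>frequency_class'."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        word, _, freq_class = line.partition("\t")
        entries.append((word, int(freq_class)))
    return entries


def select_by_class(entries, max_class):
    """Keep words whose frequency class is at most max_class.

    Assumes that a lower class number means a more frequent word.
    """
    return [word for word, freq_class in entries if freq_class <= max_class]


# Invented sample standing in for the real file; the values are placeholders.
sample = io.StringIO("die\t0\nvan\t1\nwoordeboek\t12\n")
entries = read_word_list(sample)
common_words = select_by_class(entries, max_class=1)
print(common_words)
```

With the real file, the `io.StringIO` sample would be replaced by an opened file handle, and the parsing would need to be adjusted to whatever column layout the file actually uses.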
For the compilation, comprehensive electronically available sources of the Leipzig Corpora Collection were used consistently. The corpora on which the individual frequency dictionaries are based include newspaper texts, Wikipedia articles and other randomly collected texts available on the Internet. They can be accessed online at http://corpora.uni-leipzig.de/. This series of dictionaries provides the opportunity to explore comparative linguistic topics and such monolingual issues as studies on word formation and frequency-based examinations of lexical areas for use in dictionaries or language teaching. The statistical results presented here can offer initial suggestions for several areas of research. The title of each frequency dictionary always includes the name of the language in English and in the original language, together with its three-letter ISO 639-3 code.

This series of frequency dictionaries aims to compile dictionaries with comparable frequency data for a variety of languages. For many of the languages covered in this collection, this series is the first comprehensive attempt to use a large-scale empirical database.
The dictionaries are available in both print and electronic formats. Each dictionary provides the 1,000 most frequent word forms in order of frequency and the 10,000 most frequent word forms in alphabetical order. There is also an introductory description of the data and the methodological approach followed. In addition, language-specific statistical data are provided with regard to letters, word structure and structural changes.
The enclosed CD-ROM contains a more comprehensive version of the dictionary as an e-book. It contains data on the relative frequency of up to 500,000 word forms. For languages with fewer resources the lists are shorter because the corpora are more limited. The Afrikaans word list of this volume contains 500,000 word forms. This list of words (with frequency classes) is also available as a plain text file on the CD-ROM and is ordered both alphabetically and by frequency. Using this file, word lists for various applications can be created easily. The word forms in the printed part of the dictionary have been checked carefully by hand to identify incorrect forms. By contrast, the comprehensive list on the CD-ROM has been checked only by means of automatic plausibility criteria.
For the compilation of this dictionary, the comprehensive available sources of the Leipzig Corpora Collection were used throughout. The corpora on which the individual frequency dictionaries are based include newspaper texts as well as Wikipedia articles and other randomly selected texts available on the Internet. They can be accessed online at http://corpora.informatik.uni-leipzig.de/.
This series of dictionaries offers the opportunity to investigate comparative linguistic topics as well as monolingual questions of word formation and frequency-based studies of lexical areas that are relevant to dictionaries and language teaching. The statistical data provided here can also point to various further areas of research. The title of each frequency dictionary always contains the name of the language in English and in the original language, and uses the three-letter ISO 639-3 abbreviation.