April 3, 2017

Here are the slides from the MCU conference presentation that introduced the DV-8k: DV-8k wordlist.MCU.2017.03

Abstract

This paper has four aims: to argue for the need for a ranked wordlist of core and mid-level vocabulary for English language learners (ELLs); to present the methods used to compile a list of 8000 word families; to compare the list with existing wordlists, such as Nation’s BNC word lists, the 1900-word General Service List, the 2800-word New General Service List, and Taiwan’s 6480-word CEEC list; and to provide preliminary validation evidence.

This 8000-word list differs from other wordlists in that it is ranked; other wordlists group words into 1000-word bands (e.g., Nation’s BNC/COCA 25000-word list used in the Range program) or into special functional groupings, like Coxhead’s AWL. Given that many ELLs tend to know only around 2000 words, wordlists based on 1000-word bands are a blunt instrument when used in diagnostic tests like the Vocabulary Levels Test (VLT) to measure vocabulary mastery at these levels.

The first 2000 words of the 8000-word list were drawn from COCA’s SOAP wordlist (a corpus of 100 million words from TV scripts), while the remaining 6000 came from COCA’s wordlist, which is based on a 400-million-word corpus covering a wide and balanced range of genres, including news, academic writing and fiction. Also in contrast to other wordlists, this 8000-word list consists only of the lemma forms with the highest frequency and dispersion scores; a manual elimination process removed lemmas that share the same primary meaning as a higher-frequency form. A second rater applied the elimination criteria to 1000 words, with 86% interrater agreement.
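As a rough illustration of how an agreement figure of this kind can be computed, the sketch below calculates simple percent agreement between two raters. It assumes the reported figure is a raw percentage of matching keep/drop decisions, which the abstract does not specify. The function name and the toy decisions are hypothetical and are not taken from the study.

```python
def percent_agreement(rater_a, rater_b):
    """Return the percentage of items on which two raters made the same decision."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must judge the same number of items")
    if not rater_a:
        raise ValueError("At least one rated item is required")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Made-up decisions for five lemmas (illustration only, not the study's data).
rater_a = ["keep", "drop", "keep", "keep", "drop"]
rater_b = ["keep", "drop", "drop", "keep", "drop"]

agreement = percent_agreement(rater_a, rater_b)
print(agreement)  # 4 of 5 decisions match -> 80.0
```

Raw percent agreement does not correct for chance agreement; a statistic such as Cohen's kappa would be needed for that.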

The contents of the list were compared with those of other wordlists. Initial validation evidence comes from a pilot diagnostic vocabulary test of 180 words, which sampled 3 words from every group of 100 words across 6000 words.
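The sampling arithmetic for the pilot test (60 groups of 100 words, 3 words each, 180 items) can be sketched as follows. This is a minimal sketch that assumes the words were drawn at random within each group, which the abstract does not state. The function name, the seed, and the placeholder word list are hypothetical.

```python
import random


def sample_test_items(ranked_words, band_size=100, per_band=3, seed=0):
    """Pick `per_band` words from each consecutive group of `band_size` ranked words."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    items = []
    for start in range(0, len(ranked_words), band_size):
        band = ranked_words[start:start + band_size]
        # Random choice within the group is an assumption, not the documented method.
        items.extend(rng.sample(band, min(per_band, len(band))))
    return items


# Placeholder entries standing in for 6000 ranked words.
ranked_words = [f"word{rank}" for rank in range(1, 6001)]

test_items = sample_test_items(ranked_words)
print(len(test_items))  # 60 groups x 3 words = 180
```

Because every 100-word group contributes the same number of items, a learner's results can be read against narrower rank ranges than a 1000-word band allows.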