Quantifying English Learning

Site content

This site makes resources available to English language learners and researchers.

DV-8k word-list

The DV-8k is an 8000-word list based on the highest frequency and dispersion scores in the Corpus of Contemporary American English (COCA). It is intended for use in a diagnostic test that measures a learner's mastery of vocabulary and preparedness for reading a wide range of authentic English texts. Click here to see the DV-8k.

About the DV-8k

This diagnostic vocabulary 8000-word list (DV-8k) is made up of a core list (words 1-2000) and a mid-frequency list (words 2001-8000). The core list is based on COCA's 100-million-word SOAP corpus of TV drama scripts, because their language is more informal and deals more with topics of daily life. The mid-frequency list is based on COCA's corpus of over 400 million words balanced across genres (news, fiction, spoken and academic). The DV-8k can be understood as an objective ranking of how likely single words are to be encountered in authentic English texts across a wide range of genres. Unlike other word-lists, the DV-8k consists of word-family headwords (lemmas with different primary meanings and the highest frequency and dispersion scores); a manual elimination process removed lower-frequency lemmas that share the same primary meaning.
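The two-tier construction described above can be sketched in Python. This is a minimal, hypothetical illustration, not the procedure actually used to build the DV-8k. It assumes each source is already a list of headwords sorted by a combined frequency-and-dispersion score, highest first (how the two scores were combined is not stated here), and it does not model the manual elimination of lemmas sharing a primary meaning. The function name `build_dv8k` and its parameters are illustrative only.

```python
def build_dv8k(soap_ranked, coca_ranked, core_size=2000, total_size=8000):
    """Hypothetical sketch of assembling a core + mid-frequency word list.

    Assumption: both inputs are headword lists already ranked by some
    combined frequency/dispersion score, highest first. The manual
    removal of lemmas sharing a primary meaning is NOT modelled here.
    """
    seen = set()

    # Core tier: take the top-ranked headwords from the SOAP-based ranking.
    core = []
    for word in soap_ranked:
        if len(core) == core_size:
            break
        if word not in seen:
            core.append(word)
            seen.add(word)

    # Mid-frequency tier: continue down the balanced COCA ranking,
    # skipping any headword that already made it into the core tier.
    mid = []
    for word in coca_ranked:
        if len(core) + len(mid) == total_size:
            break
        if word not in seen:
            mid.append(word)
            seen.add(word)

    return core, mid


if __name__ == "__main__":
    # Toy rankings with tiny tier sizes, purely for illustration.
    soap = ["hey", "okay", "the", "you"]
    coca = ["the", "of", "government", "you", "policy"]
    core, mid = build_dv8k(soap, coca, core_size=3, total_size=5)
    print(core)  # ['hey', 'okay', 'the']
    print(mid)   # ['of', 'government']
```

In the toy run, "the" is skipped in the mid-frequency tier because it already appears in the core tier, so each headword is counted only once across the whole list.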

Latest News

More resources for English language users, teachers and researchers will be uploaded in the future.


Conference talk on the DV-8k

Here are the slides from the MCU conference talk that introduced the DV-8k: DV-8k wordlist.MCU.2017.03. Abstract: This paper has four aims: to argue for the need for a ranked wordlist of core[…]


The purpose of eq-ls.org

This website is the product of ongoing research into English language learning and testing. The plan is to first publish the DV-8k word-list and later a diagnostic vocabulary test based[…]
