Title: A BERT-based method to develop discipline-specific academic vocabulary lists in large corpora
Authors: Tongxi Gong (宫同喜), Institute of Language Sciences, Shanghai International Studies University, China; Lei Liu (刘磊), School of International Studies, Zhengzhou University, China
Source: English for Specific Purposes, 82 (2026), 1–15.
Abstract: Accurately identifying discipline-specific vocabulary, particularly common words with specialized meanings, remains a critical challenge in vocabulary list development. This study introduces a novel approach that integrates BERT-based semantic annotation with statistically rigorous thresholds to address three limitations of prior methods: (1) the disconnect between overall word frequency and specialized meanings, (2) reliance on arbitrary statistical cutoffs, and (3) the need for manual disambiguation. We demonstrate this approach by constructing the Medical Sense List (MSL), a sense-level inventory of 961 medical terms validated against corpora and dictionaries. The MSL shows a 78.5% overlap with established medical dictionaries, achieves a higher mean coverage per sense (12.25%) than existing medical vocabulary lists, and utilizes BERT with 94% disambiguation accuracy. Crucially, our method establishes objective thresholds through combinatorially symmetric cross-validation (CSCV), significantly reducing reliance on human judgment. This transparently outlined approach can be readily adapted to other disciplines or languages.
Keywords: Academic word list; Semantic annotation; BERT; Automated identification; English for specific purposes
Citation format (GB/T 7714-2015): Gong, T., & Liu, L. (2026). A BERT-based method to develop discipline-specific academic vocabulary lists in large corpora. English for Specific Purposes, 82, 1–15. https://doi.org/10.1016/j.esp.2025.11.002






