宫同喜 (Gong Tongxi), 刘磊 (Liu Lei): A BERT-based method to develop discipline-specific academic vocabulary lists in large corpora

Publisher: 中国外语战略研究中心 (China Foreign Language Strategy Research Center) | Published: 2025-12-25

Title: A BERT-based method to develop discipline-specific academic vocabulary lists in large corpora

Authors: 宫同喜 (Gong Tongxi; Institute of Language Sciences, Shanghai International Studies University, China) and 刘磊 (Liu Lei; School of International Studies, Zhengzhou University, China)

Source: English for Specific Purposes, 82 (2026), 1–15.

Abstract: Accurately identifying discipline-specific vocabulary, particularly common words with specialized meanings, remains a critical challenge in vocabulary list development. This study introduces a novel approach that integrates BERT-based semantic annotation with statistically rigorous thresholds. It addresses three limitations of prior methods: (1) the disconnect between a word's overall frequency and the frequency of its specialized meanings, (2) reliance on arbitrary statistical cutoffs, and (3) the need for manual disambiguation. We demonstrate the approach by constructing the Medical Sense List (MSL), a sense-level inventory of 961 medical terms validated against corpora and dictionaries. The MSL shows a 78.5% overlap with established medical dictionaries and achieves a higher mean coverage per sense (12.25%) than existing medical vocabulary lists. BERT disambiguates senses with 94% accuracy. Crucially, our method sets objective thresholds through combinatorially symmetric cross-validation (CSCV), substantially reducing reliance on human judgment. Because the procedure is outlined transparently, it can be readily adapted to other disciplines or languages.
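The abstract says that thresholds are set objectively through combinatorially symmetric cross-validation (CSCV). It does not say how CSCV is applied to vocabulary cutoffs. The sketch below therefore shows only the generic CSCV procedure as formulated by Bailey, Borwein, López de Prado and Zhu, under these assumptions:

- Rows of the matrix are hypothetical corpus chunks.
- Columns are hypothetical candidate cutoff settings.
- Each cell is a hypothetical quality score for that setting on that chunk.

The procedure works as follows. It splits the rows into S equal blocks and treats every choice of S/2 blocks as a training set, with the remaining blocks as the test set. For each split, it picks the setting that performs best in training and checks where that setting ranks in testing. The returned value estimates how often the in-sample winner lands at or below the out-of-sample median. The function name `cscv_pbo` and the use of the mean as the performance measure are illustrative choices, not the authors' implementation.

```python
import itertools
import math
from statistics import mean


def cscv_pbo(matrix, n_splits):
    """Generic CSCV estimate of the probability of overfitting.

    matrix: T rows (hypothetical corpus chunks), each a list of N scores,
            one per candidate setting (e.g. a hypothetical cutoff value).
    n_splits: even number S of equal-sized row blocks.
    Returns (pbo, logits), one logit per train/test combination.
    """
    t, n = len(matrix), len(matrix[0])
    if n_splits < 2 or n_splits % 2:
        raise ValueError("n_splits must be a positive even number")
    if t < n_splits:
        raise ValueError("need at least one row per block")
    size = t // n_splits  # leftover rows are dropped so blocks are equal
    blocks = [matrix[i * size:(i + 1) * size] for i in range(n_splits)]

    logits = []
    for train_ids in itertools.combinations(range(n_splits), n_splits // 2):
        test_ids = [i for i in range(n_splits) if i not in train_ids]
        train_rows = [row for i in train_ids for row in blocks[i]]
        test_rows = [row for i in test_ids for row in blocks[i]]
        # Performance of each setting = mean score over the selected rows.
        train_perf = [mean(r[j] for r in train_rows) for j in range(n)]
        test_perf = [mean(r[j] for r in test_rows) for j in range(n)]
        best = max(range(n), key=lambda j: train_perf[j])
        # Rank (1..N) of the in-sample winner among out-of-sample scores.
        rank = sum(1 for p in test_perf if p <= test_perf[best])
        omega = rank / (n + 1)  # relative rank, strictly inside (0, 1)
        logits.append(math.log(omega / (1 - omega)))
    # Share of splits where the winner falls to or below the median.
    pbo = sum(1 for lam in logits if lam <= 0) / len(logits)
    return pbo, logits


# A setting that is best on every chunk never overfits: PBO is 0.0.
stable = [[0.1, 0.2, 0.9]] * 8
print(cscv_pbo(stable, 4)[0])
```

A low value suggests that the chosen setting generalizes across chunks rather than reflecting a lucky split. How the authors translate this into a concrete frequency or dispersion threshold is not described here.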

Keywords: Academic word list; Semantic annotation; BERT; Automated identification; English for specific purposes

Citation format (GB/T 7714-2015): Gong, T., & Liu, L. (2026). A BERT-based method to develop discipline-specific academic vocabulary lists in large corpora. English for Specific Purposes, 82, 1–15. https://doi.org/10.1016/j.esp.2025.11.002
