Few-shot Learning for Biomedical NER
Few-shot learning (FSL) is a class of machine learning methods that require only a small number of labeled instances for training. Because annotated text data are limited for many medical topics in practical settings, FSL-based natural language processing (NLP) holds substantial promise. However, no existing study compares the performance of FSL models with that of traditional models (e.g., conditional random fields) on medical text at different training set sizes, or provides a comprehensive review of existing FSL methods for biomedical text, with particular attention to named entity recognition (NER).
Using five health-related annotated NER datasets, we benchmarked three traditional NER models based on BERT: BERT-Linear Classifier (BLC), BERT-CRF (BC), and SANER; and three FSL NER models: StructShot & NNShot, Few-Shot Slot Tagging (FS-ST), and ProtoNER. We also conducted a comprehensive review of few-shot learning for medical text, covering 51 articles. We present benchmarking results for medical NER using several few-shot methods, along with best-practice recommendations for evaluating few-shot methods on medical text. Our work also involves developing a systematic resource of research aims, datasets, evaluation metrics, and methodology.
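To give a sense of how a metric-based FSL tagger such as NNShot operates, the following minimal sketch assigns each query token the tag of its nearest labeled support token in embedding space. It is an illustration under stated assumptions, not the implementation benchmarked in this project: the 2-D vectors are toy stand-ins for BERT token embeddings, and the function names, the `DRUG` entity type, and the example data are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor_tags(
    support: List[Tuple[Vector, str]], query: List[Vector]
) -> List[str]:
    """Tag each query token with the label of its closest support token."""
    if not support:
        raise ValueError("support set must contain at least one labeled token")
    tags = []
    for q in query:
        # Pick the support token with the smallest distance to this query token.
        _, tag = min(support, key=lambda item: squared_distance(item[0], q))
        tags.append(tag)
    return tags


# Hypothetical few-shot support set: (toy embedding, BIO tag) pairs.
support_tokens = [
    ((0.9, 0.1), "B-DRUG"),
    ((0.8, 0.3), "I-DRUG"),
    ((0.1, 0.9), "O"),
    ((0.2, 0.8), "O"),
]

# Toy embeddings for two unlabeled query tokens.
query_tokens = [(0.85, 0.15), (0.15, 0.85)]

predicted = nearest_neighbor_tags(support_tokens, query_tokens)
print(predicted)
```

In a realistic setting the embeddings would come from a pretrained encoder, and the support set would hold the handful of annotated sentences available for the target entity types.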
This project is funded by the National Institute on Drug Abuse (NIDA) of the National Institutes of Health (NIH) under award number R01DA057599. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.