LLM BENCHMARKS
• Gemma3: GEMMA-3-27B-IT
• Llama4: LLAMA4-109B
• DeepSeekV3: DEEPSEEK-V3-0324-UD-Q2_K_XL
• Llama3: LLAMA3-70B
• DeepSeekR1: DEEPSEEK-R1-DISTILL-LLAMA-70B
Task Information
All tasks listed are binary classification tasks. For details, see our paper (https://arxiv.org/pdf/2503.15169) or each task's original publication.
Breast cancer
Al-Garadi MA, Yang YC, Lakamana S, et al. Automatic Breast Cancer Cohort Detection from Social Media for Studying Factors Affecting Patient-Centered Outcomes. Vol 12299 LNAI.; 2020. DOI Link
Changes in medication regimen
Magge A, Klein A, Miranda-Escalada A, et al. Overview of the Sixth Social Media Mining for Health Applications (#SMM4H) Shared Tasks at NAACL 2021. Proceedings of the Sixth SMM4H Workshop. Association for Computational Linguistics; 2021:21-32. DOI Link
Adverse pregnancy outcomes
Klein AZ, Gonzalez-Hernandez G. An annotated data set for identifying women reporting adverse pregnancy outcomes on Twitter. Data Brief. 2020;32:106249. DOI Link
Potential cases of COVID-19
Klein AZ, Magge A, OK, FAJI, WD, GHG. Toward Using Twitter for Tracking COVID-19: A Natural Language Process. DOI Link
Stigma labeling
Walker A, Thorne A, Das S, et al. CARE-SD: classifier-based analysis for recognizing provider stigmatizing and doubt marker labels in electronic health records: model development and validation. JAMIA. 2024;32(2):365–374. DOI Link
Medication change discussion
Mahajan D, Liang JJ, Tsou CH. Toward Understanding Clinical Context of Medication Change Events in Clinical Narratives. AMIA Annu Symp Proc. 2022;2021:833–842. DOI Link
Natal sex
Not yet published.
Contact
For any questions or requests related to benchmarking new models or new tasks, please contact Yuting Guo (yguo262@emory.edu).