LLM Benchmarks

Task Information

All tasks listed are binary classification tasks. For more detail, see our paper (https://arxiv.org/pdf/2503.15169) or the original publication cited under each task.
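Since every task is binary, predictions on any of them can be scored the same way. The snippet below is a minimal sketch, not the benchmark's own evaluation code: it assumes gold labels and predictions are encoded as 0/1 integers and that positive-class precision, recall, and F1 are the metrics of interest. The function name `binary_scores` and the example labels are hypothetical; see the paper for the metrics actually reported.

```python
from typing import Dict, Sequence


def binary_scores(gold: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Positive-class precision, recall, and F1 for 0/1 labels (illustrative helper)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for five examples from one task.
example_gold = [1, 0, 1, 1, 0]
example_pred = [1, 1, 0, 1, 0]
example_scores = binary_scores(example_gold, example_pred)
```

The same function applies unchanged to each task below, as long as the model's output has first been mapped to a 0/1 label.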

Breast cancer

Al-Garadi MA, Yang YC, Lakamana S, et al. Automatic Breast Cancer Cohort Detection from Social Media for Studying Factors Affecting Patient-Centered Outcomes. Vol 12299 LNAI.; 2020. DOI Link

Changes in medication regimen

Magge A, Klein A, Miranda-Escalada A, et al. Overview of the Sixth Social Media Mining for Health Applications (#SMM4H) Shared Tasks at NAACL 2021. Proceedings of the Sixth SMM4H Workshop. Association for Computational Linguistics; 2021:21-32. DOI Link

Adverse pregnancy outcomes

Klein AZ, Gonzalez-Hernandez G. An annotated data set for identifying women reporting adverse pregnancy outcomes on Twitter. Data Brief. 2020;32:106249. DOI Link

Potential cases of COVID-19

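Klein AZ, Magge A, O'Connor K, Flores Amaro JI, Weissenbacher D, Gonzalez Hernandez G. Toward Using Twitter for Tracking COVID-19: A Natural Language Processing Pipeline and Exploratory Data Set. J Med Internet Res. 2021;23(1):e25314. DOI Link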

Stigma labeling

Walker A, Thorne A, Das S, et al. CARE-SD: classifier-based analysis for recognizing provider stigmatizing and doubt marker labels in electronic health records: model development and validation. JAMIA. 2024;32(2):365–374. DOI Link

Medication change discussion

Mahajan D, Liang JJ, Tsou CH. Toward Understanding Clinical Context of Medication Change Events in Clinical Narratives. AMIA Annu Symp Proc. 2022;2021:833–842. DOI Link

Natal sex

Not yet published.

Contact

For any questions or requests related to benchmarking new models or new tasks, please contact Yuting Guo (yguo262@emory.edu).