MEDX-FE: A Robust LLM-Based Framework for Medical Feature Extraction with Matching Gate Hallucination Filtering
Abstract
Licensed medical examination assessment is a resource-intensive task that requires significant expert involvement and time. Automating this process is both critical and challenging due to the need to ensure reliable evaluation while minimizing manual effort. In this work, we introduce MEDX-FE (Medical Exam Feature Extractor), a framework that leverages the semantic understanding capabilities of large language models (LLMs) to extract clinical features from patient notes written by medical students. Central to our framework is the Matching Gate, a novel three-stage validation module designed to mitigate LLM hallucinations and ensure that only text-grounded features are retained. MEDX-FE combines instruction fine-tuning with few-shot in-context learning to produce accurate extractions even under limited supervision. Experiments on the United States Medical Licensing Examination (USMLE) Step 2 Clinical Skills (CS) dataset show that MEDX-FE achieves a micro F1 score of 0.96 using the full training set and maintains strong performance (0.946) even when trained on as few as 10 examples per case. These results highlight the potential of integrating LLMs with lightweight validation modules to enable scalable and trustworthy assessment in medical education.
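To make the hallucination-filtering idea concrete, the sketch below shows a minimal text-grounding check in the spirit of the Matching Gate: an extracted feature is kept only if its claimed evidence span actually occurs in the patient note. This is an illustrative assumption, not the paper's exact three-stage procedure; the function name, data shapes, and the single substring-matching stage are all hypothetical.

```python
import re

def matching_gate(note: str, candidates: list[dict]) -> list[dict]:
    """Illustrative text-grounding filter (a single stage, not the
    paper's full three-stage Matching Gate).

    Keeps an extracted feature only if its claimed evidence span
    occurs in the patient note after whitespace/case normalization,
    discarding hallucinated features that lack textual support.
    """
    def norm(s: str) -> str:
        return re.sub(r"\s+", " ", s.lower()).strip()

    normalized_note = norm(note)
    return [c for c in candidates if norm(c["evidence"]) in normalized_note]

note = "17 yo F with palpitations and shortness of breath; no chest pain."
candidates = [
    {"feature": "palpitations", "evidence": "palpitations"},
    # Hallucinated: this evidence span does not appear in the note.
    {"feature": "chest pain", "evidence": "crushing chest pain"},
]
kept = matching_gate(note, candidates)
print([c["feature"] for c in kept])  # only the grounded feature survives
```

In practice a stage like this would sit after LLM extraction and before scoring, so that only features traceable to the student's own text count toward the rubric.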