Research
Building Safe & Ethical AI For The Healthcare Industry
Sully.ai’s Doctor-Language Model vs SOTA (State-of-the-art LLMs)
Task (Conversation-to-charting) Input: Transcript -> Output: Clinical Note Comparison: GPT4, GPT3.5 vs. Sully.ai’s Doctor-Language ModelBenchmark Dataset: de-identified dataset We prepared this dataset using thousands of real-life physician-signed notes with transcripts as ground truth.
Approach
The goal of this experiment was to compare the performance of different models and prompting techniques for the task of generating a SOAP note from a patient visit transcript. The ground truth SOAP notes were collected by pulling all available physician-signed notes and de-identifying them as per HIPAA Safe Harbor guidelines. For both GPT4 and GPT3.5, we use single-shot prompting where only one exemplar for the JSON output format of the SOAP note was provided along with a preamble (e.g. “You are an expert physician…”). We also fine-tuned a model with transcripts and corresponding SOAP notes (JSON formatted) from hundreds of real medical notes.The following table shows how each method fares (measured as DICE and a custom similarity score)Approach (length truncated) Figure 1.0:
A 10,000-foot view of our infrastructure design and the flow of information.
