
"We've detected that you're visiting from {0}. Would you like to switch languages for tailored content?"

Our team recently ran a generative artificial intelligence (AI) experiment to explore whether generative AI models can map ICD-10-CM and SNOMED-CT codes to high-level categories. Our goal was to determine whether these models could perform this task accurately and reliably, since it is otherwise done manually by our team.

The experiment

We used three generative AI models, ChatGPT 4, Perplexity and Gemini Advanced, to see if they could map ICD-10-CM and SNOMED-CT codes to broader categories. We prompted each model to find the concepts within the terminologies that related to our broad categories. One example of a prompt was, “Return all SNOMED-CT codes including their descriptions related to the following term: Chest trauma.” We also used several follow-up prompts to see how the results changed and whether additional codes were revealed.
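
For readers who want to try this kind of prompting programmatically, here is a minimal sketch using the OpenAI Python client. We ran our prompts through the models' chat interfaces, so the client setup, model name and follow-up wording below are illustrative assumptions rather than our exact workflow.

```python
# Hypothetical sketch: issuing our terminology prompt through the OpenAI
# Python client. Model name and follow-up wording are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt = (
    "Return all SNOMED-CT codes including their descriptions "
    "related to the following term: Chest trauma."
)

first = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(first.choices[0].message.content)

# Follow-up prompt in the same conversation to probe for additional codes
follow_up = "Are there any additional SNOMED-CT codes you did not list above?"
second = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": first.choices[0].message.content},
        {"role": "user", "content": follow_up},
    ],
)
print(second.choices[0].message.content)
```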

Here’s what we discovered

Our findings were a mix of promising results and notable challenges:

  • Relevant codes identified: All three models identified some relevant codes, demonstrating their potential in understanding and processing standard terminology and medical terms.
  • Unreliable performance: Despite some successes, the overall performance of the models was unreliable. We encountered several issues, including:
    • Hallucinated ICD-10-CM codes: The models generated some ICD-10-CM codes that do not exist, highlighting a significant limitation in their current capabilities.
    • Broad code suggestions: Some of the codes suggested by the models were too broad.
    • Incomplete sets: Perplexity and Gemini Advanced often returned only a subset of the appropriate codes. This inconsistency led us to conclude that, given the time required to verify results and check them for completeness (see the verification sketch after this list), we might as well curate the mappings ourselves.
    • Incorrect code types: Perplexity once returned ICD-10-CM codes when asked for SNOMED-CT codes, indicating a misunderstanding of the request.
    • Lack of descendants: Gemini Advanced sometimes returned neither codes nor their descendants; instead, it directed us on how to find them ourselves.
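
Because hallucinated and overly broad codes forced a manual verification pass, an automated first check can save time. Below is a minimal sketch of such a check, assuming a locally downloaded ICD-10-CM code list (for example, an annual CMS release file with one code per line); the file name, regular expression and sample model output are hypothetical.

```python
# Hypothetical sketch of the verification step: compare each code a model
# returned against an official ICD-10-CM code list, flagging codes that
# do not exist in the reference file.
import re

def load_valid_codes(path: str) -> set[str]:
    """Load one ICD-10-CM code per line from a reference file."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().upper() for line in f if line.strip()}

def check_codes(model_output: str, valid_codes: set[str]) -> dict[str, list[str]]:
    """Split model-suggested codes into verified and unrecognized."""
    # Rough ICD-10-CM shape: a letter, two digits, optional dot plus 1-4 characters
    candidates = re.findall(r"\b[A-Z]\d{2}(?:\.[A-Z0-9]{1,4})?\b", model_output.upper())
    verified = [c for c in candidates if c in valid_codes]
    unrecognized = [c for c in candidates if c not in valid_codes]
    return {"verified": verified, "unrecognized": unrecognized}

valid = load_valid_codes("icd10cm_codes_2024.txt")  # placeholder path
sample_output = "S20.20XA Contusion of thorax; S27.9 Injury of thorax"
print(check_codes(sample_output, valid))
```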

Future considerations

Our experiment with ChatGPT 4, Gemini Advanced and Perplexity in mapping ICD-10-CM and SNOMED-CT codes revealed both the potential and the current limitations of generative AI in the standard terminology space. While the models showed some promise, their unreliability underscores the need for further training and refinement.

One of the key takeaways from our experiment is the importance of training and fine-tuning AI models for specific tasks. In our case, we did not train the models, which likely contributed to the unreliable results. We believe that incorporating a retrieval-augmented generation (RAG) approach could enhance the models' performance by providing them with more relevant context and data.
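
To make the RAG idea concrete, here is a rough sketch of what such a pipeline might look like: retrieve candidate entries from a local terminology extract first, then ask the model to choose only from that retrieved context, so every suggestion is grounded in a real code. The CSV layout, keyword-based retrieval and model call are illustrative assumptions; a production version would likely use embeddings and a proper terminology service.

```python
# Minimal RAG sketch: ground the model's suggestions in a local
# SNOMED-CT extract so it can only select codes that actually exist.
import csv
from openai import OpenAI

def retrieve_candidates(term: str, terminology_csv: str, limit: int = 50) -> list[dict]:
    """Naive retrieval: keep rows whose description shares a word with the term."""
    words = set(term.lower().split())
    hits = []
    with open(terminology_csv, encoding="utf-8") as f:
        for row in csv.DictReader(f):  # assumes 'code' and 'description' columns
            if words & set(row["description"].lower().split()):
                hits.append(row)
            if len(hits) >= limit:
                break
    return hits

def map_with_context(term: str, candidates: list[dict]) -> str:
    """Ask the model to pick relevant codes only from the retrieved context."""
    context = "\n".join(f"{c['code']}: {c['description']}" for c in candidates)
    prompt = (
        f"From the SNOMED-CT entries below, list every code relevant to "
        f"'{term}'. Do not add codes that are not in the list.\n\n{context}"
    )
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

candidates = retrieve_candidates("Chest trauma", "snomed_ct_extract.csv")  # placeholder file
print(map_with_context("Chest trauma", candidates))
```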

As we continue to explore these technologies, we remain optimistic about their future applications in assisting with standard terminology mapping.
 

Jacee Robison, RN, MS is a nurse informaticist at Solventum.