Resident Physician, University of Illinois at Chicago, Chicago, IL, US
Introduction: Online patient-targeted reading materials on Chiari malformations are often written at a level that is difficult for the average American to understand. This can especially burden patients with low health literacy. We sought to assess and compare the utility of Large Language Models (LLMs) in improving the readability of existing patient-targeted education materials on Chiari malformations.
Methods: The three LLMs assessed in this study were ChatGPT-3.5, ChatGPT-4, and Google Bard. To assess their effect on readability, each LLM was asked to rewrite 10 reputable resources, identified through a Google search of the keyword “Chiari malformation,” to the AMA-recommended 6th grade reading level. This was accomplished by inputting the following prompt: “Given patient education materials are recommended to be written at a 6th grade reading level, using the SMOG readability formula, can you rewrite the following text to a 6th grade reading level: [insert text]”.
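For readers unfamiliar with the metric used above, the SMOG grade is computed from the number of polysyllabic words (three or more syllables) in a sample of sentences: grade = 3.1291 + 1.0430 × √(polysyllables × 30 / sentences). A minimal sketch of this calculation is shown below; the syllable counter is a simple vowel-group heuristic for illustration only, and published SMOG tools use more careful syllabification, so scores may differ slightly from those reported in this study.

```python
import math
import re

def count_syllables(word: str) -> int:
    # Heuristic: count runs of vowels, subtracting a common silent final 'e'.
    # This is an approximation, not a dictionary-based syllabifier.
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    count = len(groups)
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)

def smog_grade(text: str) -> float:
    # SMOG grade = 3.1291 + 1.0430 * sqrt(polysyllables * 30 / sentences)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 3.1291 + 1.0430 * math.sqrt(polysyllables * 30 / len(sentences))
```

Applied to an original resource and its LLM-rewritten counterpart, a drop in `smog_grade` toward 6 indicates movement toward the AMA-recommended reading level.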
Results: The mean SMOG Readability Score of the original texts (8.7 ± 1.7) was found to be higher (i.e., less readable) on average than the AMA-recommended 6th grade level. Only ChatGPT-4 was capable of rewriting educational resources to be significantly more readable than their originally published form (p < 0.001). ChatGPT-4 also generated significantly lower (i.e., more readable) SMOG scores (5.6 ± 0.7) in comparison to both ChatGPT-3.5 (7.5 ± 1.3) and Bard (8.0 ± 1.7).
Conclusion: ChatGPT-4 can serve as a valuable supplement in improving the baseline readability of patient educational materials, as demonstrated in our study on Chiari malformations. In doing so, it may serve as a useful tool to help address gaps in understanding of complex health topics among patients with low health literacy.