
AI Shows Promise in Early Medical Education

Key Takeaways

  • ChatGPT-4.0 performed comparably to third-year dermatology residents but struggled with complex questions, highlighting its limitations in advanced clinical contexts.
  • Language limitations and the absence of visual aids may have impacted ChatGPT's performance in Turkish-language dermatology exams.

Explore how AI, particularly ChatGPT, impacts dermatology education, revealing strengths and limitations in clinical decision-making and training.

Image Credit: © lucegrafiar - stock.adobe.com

The integration of artificial intelligence (AI), particularly large language models (LLMs), into the medical field has expanded rapidly in recent years. LLMs such as ChatGPT, developed by OpenAI, have demonstrated potential for transforming medical education and clinical decision support.1 ChatGPT’s ability to respond to medical inquiries, summarize vast amounts of clinical information, and assist in diagnostic reasoning highlights its potential as an educational resource. However, evaluating its effectiveness in specific medical domains remains essential to understanding both its capabilities and limitations.2 With this in mind, a recent study provided a critical analysis of ChatGPT's performance in the field of dermatology, particularly in Turkish-language clinical settings.3

Methods

The prospective study, conducted in January 2025, assessed the performance of ChatGPT versions 3.5 and 4.0 on dermatology exam questions written in Turkish. The questions were categorized according to the seniority levels of dermatology resident physicians and developed to test a range of competencies, including basic knowledge, clinical application, and complex reasoning. The participants included 25 dermatology residents at varying stages of training, and their performance was compared to that of ChatGPT under controlled conditions. Importantly, the exam did not include visual aids, which are typically integral to dermatology assessments, emphasizing the models' reliance solely on text-based inputs.

Results

The findings demonstrated that ChatGPT-3.5 consistently underperformed compared to residents in their second, third, and fourth years. Meanwhile, ChatGPT-4.0 showed improved accuracy, performing on par with third-year residents and surpassing first-year residents, although it still lagged behind more experienced physicians. Both models’ scores declined as question complexity increased in step with the seniority level being tested. While ChatGPT-4.0 passed exams up to the third-year level, it failed to meet the threshold for the fourth year. These results suggest that while AI can support learning at earlier stages of training, its utility diminishes in advanced clinical contexts requiring nuanced judgment and deep contextual understanding.

One factor potentially limiting ChatGPT’s performance is its handling of the Turkish language. Prior studies have identified that ChatGPT performs best in English, with notable reductions in accuracy and coherence in other languages. This language gap may have hindered the models' ability to fully interpret and respond to the nuanced, clinically rich Turkish exam questions. Additionally, since dermatology often involves interpretation of visual cues, the absence of image-based questions may have skewed the results, limiting a complete evaluation of the models’ potential in dermatologic diagnosis and training.

Comparisons with prior studies in other languages support the finding that ChatGPT-4.0 generally performs better than its predecessor and can sometimes approach the competency level of medical professionals. For example, studies involving English and Polish-language dermatology exams have found ChatGPT-4.0 to consistently score above passing thresholds. Nonetheless, the current study’s results reinforce that AI models are not yet substitutes for human expertise, particularly in complex medical fields like dermatology.

Ethical and practical concerns remain regarding the deployment of AI in clinical education and practice. While AI tools such as ChatGPT can supplement learning and provide rapid access to information, reliance on them without appropriate oversight may pose risks, particularly when dealing with rare or complex conditions. AI is best positioned as an adjunct to, rather than a replacement for, physician expertise.

Conclusion

Overall, this study contributes valuable insights into the current capabilities and limitations of ChatGPT in medical education, particularly in non-English and visually dependent specialties such as dermatology. While ChatGPT-4.0 demonstrates promising performance in early-stage medical training, it does not yet match the clinical reasoning or diagnostic acumen of senior residents. Ongoing enhancements in language model training, especially in multilingual and clinically nuanced contexts, will be necessary for broader integration into medical education. Researchers suggested future studies should expand upon these results by including visual diagnostic elements, exploring additional languages, and comparing a wider range of AI tools to more accurately assess their potential and safety in clinical settings.

References

  1. Ahmed Y. Utilization of ChatGPT in medical education: Applications and implications for curriculum enhancement. Acta Inform Med. 2023;31(4):300-305. doi:10.5455/aim.2023.31.300-305
  2. Günay AE, Özer A, Yazıcı A, Sayer G. Comparison of ChatGPT versions in informing patients with rotator cuff injuries. JSES Int. 2024;8(5):1016-1018. Published 2024 May 6. doi:10.1016/j.jseint.2024.04.016
  3. Göçer Gürok N, Öztürk S. The performance of AI in dermatology exams: The exam success and limits of ChatGPT. J Cosmet Dermatol. 2025;24(5):e70244. doi:10.1111/jocd.70244
