How synthetic data could support CSU research by enabling smaller sample sizes and improving trial efficiency and inclusivity.
Randomized controlled trials (RCTs) remain the cornerstone of evidence-based medicine, yet their execution can be hindered by cost, resource demands, and participant recruitment challenges.1 In chronic spontaneous urticaria (CSU), a condition characterized by recurrent hives, angioedema, or both for more than 6 weeks, these challenges are pronounced. Although CSU affects approximately 0.5–1% of the population, RCT enrollment is often slowed by strict inclusion criteria, comorbidities, and high dropout rates. Subgroups such as elderly patients, individuals with specific skin types, or non-responders to standard therapy remain underrepresented.2
Synthetic data, generated from validated real-world data (RWD), has emerged as a potential solution to these limitations. Using statistical and machine learning models, synthetic datasets can simulate patient cohorts, replicate clinically relevant variables, and preserve privacy while enabling more comprehensive subgroup analyses. Unlike raw RWD, synthetic data can be freely shared without breaching confidentiality, as it does not contain identifiable patient information.3
Study Methods and Materials
A recent study assessed whether synthetic data could replicate the demographic and clinical characteristics of patients enrolled in the Chronic Urticaria Registry (CURE).4 This international, multi-center registry includes over 4,000 physician-confirmed CSU patients from 54 centers in 30 countries. Nineteen variables—including demographic factors, laboratory data, and patient-reported outcome measures such as the Urticaria Activity Score over 7 days (UAS7) and the Urticaria Control Test (UCT)—were analyzed.
Researchers employed a generative decision tree (GenDT) approach using the Classification and Regression Trees (CART) algorithm to create synthetic datasets from CURE data. The similarity between synthetic and real datasets was evaluated using established statistical metrics, including propensity score mean squared error (pMSE) values, correlation coefficients, and regression analyses.
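To make the method more concrete, the following is a minimal Python sketch of one common way CART is used for synthesis, followed by a simple pMSE check. It is not the study's implementation, which the article does not detail. The sketch assumes a two-variable toy dataset of (age, UAS7) rows with invented values, a single numeric predictor, and a cell-frequency propensity model for pMSE. All function names (`build_tree`, `draw_donor`, `synthesize`, `pmse`) are hypothetical.

```python
import random
from collections import Counter


def sse(values):
    """Sum of squared deviations from the mean (regression-tree impurity)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(pairs, min_leaf=5):
    """Fit a regression tree on (predictor, target) pairs; min_leaf must be >= 1.

    Leaves keep the observed target values so they can serve as donors.
    """
    pairs = sorted(pairs)
    targets = [y for _, y in pairs]
    best_cost, best_i = sse(targets), None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical predictor values
        cost = sse(targets[:i]) + sse(targets[i:])
        if cost < best_cost:
            best_cost, best_i = cost, i
    if best_i is None:  # no split reduces impurity, so this node is a leaf
        return {"donors": targets}
    return {
        "threshold": (pairs[best_i - 1][0] + pairs[best_i][0]) / 2,
        "left": build_tree(pairs[:best_i], min_leaf),
        "right": build_tree(pairs[best_i:], min_leaf),
    }


def draw_donor(tree, x, rng):
    """Drop x down the tree and sample one observed target from its leaf."""
    while "donors" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return rng.choice(tree["donors"])


def synthesize(real_rows, n_synthetic, seed=0):
    """Sequential synthesis: bootstrap age, then draw UAS7 from the tree leaf."""
    rng = random.Random(seed)
    tree = build_tree(real_rows)
    ages = [age for age, _ in real_rows]
    synthetic = []
    for _ in range(n_synthetic):
        age = rng.choice(ages)
        synthetic.append((age, draw_donor(tree, age, rng)))
    return synthetic


def pmse(real_cells, synthetic_cells):
    """pMSE = mean of (propensity - c)^2 over all pooled records.

    Assumption: the propensity of a record is the share of synthetic records
    in its cell, and c is the overall share of synthetic records.
    """
    real_counts, syn_counts = Counter(real_cells), Counter(synthetic_cells)
    n_total = len(real_cells) + len(synthetic_cells)
    c = len(synthetic_cells) / n_total
    total = 0.0
    for cell in set(real_counts) | set(syn_counts):
        cell_n = real_counts[cell] + syn_counts[cell]
        total += cell_n * (syn_counts[cell] / cell_n - c) ** 2
    return total / n_total


# Toy stand-in for registry rows (age, UAS7); values are invented.
real = [(age, 10 + age % 7 if age < 50 else 25 + age % 5) for age in range(20, 80)]
synthetic = synthesize(real, n_synthetic=60)


def band(row):
    """Coarse cell: 20-year age band and 10-point UAS7 band."""
    return (row[0] // 20, row[1] // 10)


score = pmse([band(r) for r in real], [band(r) for r in synthetic])
```

Under this definition, a pMSE near 0 means the pooled records cannot be told apart by the propensity model, while larger values indicate that synthetic and real records are distinguishable. Drawing donors from tree leaves means every synthetic value is one that was actually observed among similar patients.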
Results
Researchers found that synthetic data closely mirrored real patient data for key characteristics such as gender distribution (72.4% women in RWD vs. 71.7% in synthetic data), mean age (44 years in both), and frequency of symptoms. No statistically significant differences were observed across most variables, including comorbidity rates, medication use, and patient-reported outcome measures (PROMs). Subgroup analyses (for example, elderly patients, different BMI categories, and gender-based comparisons) also demonstrated high concordance between synthetic and original datasets. Importantly, the researchers stated that disease-relevant correlations, such as the inverse relationship between UCT and UAS7 scores, were preserved.
A sensitivity analysis determined that high-quality synthetic datasets could be generated from as little as 25% of the original patient sample. In practical terms, enrolling just 38 patients in a clinical trial could yield a synthetic dataset of 150 participants, roughly 4 times the number enrolled, without significant loss of data fidelity. This suggests a potential role for synthetic data in boosting statistical power for underrepresented subgroups or early-phase pilot studies.
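The arithmetic behind these figures can be checked in a few lines of Python. This is only an illustration of the numbers quoted above. The helper name `patients_to_enroll` is hypothetical, and rounding up to a whole patient is an assumption about how 38 was derived.

```python
import math


def patients_to_enroll(target_size, real_fraction):
    """Smallest whole number of real patients covering the given fraction."""
    return math.ceil(target_size * real_fraction)


target_size = 150      # synthetic cohort size quoted in the article
real_fraction = 0.25   # share of the original sample reported as sufficient

enrolled = patients_to_enroll(target_size, real_fraction)
expansion_factor = target_size / enrolled
print(enrolled, round(expansion_factor, 2))  # prints: 38 3.95
```

Because 25% of 150 is 37.5, the enrolled count rounds up to 38, which is why the expansion factor comes out slightly under 4.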
Limitations
Despite these encouraging results, limitations remain. Synthetic data inherits the biases and limitations of its source RWD, and small category sizes in the original data may lead to reduced accuracy for those variables. Additionally, generative models may oversimplify heterogeneity in patient populations. Standards for synthetic data generation and validation are not yet established, and further testing against independent datasets is necessary.
Conclusion
Despite these limitations, the study highlights promising applications for synthetic data in dermatologic research. By reducing the reliance on large control groups, synthetic datasets could lower trial costs, accelerate timelines, and expand inclusion of rare or underrepresented patient subgroups. Compared with current FDA- and EMA-approved digital twin technologies, which typically achieve around a one-third reduction in control arm size, the approach described here suggests that a reduction of up to 75% may be possible while maintaining data quality.
The researchers suggested that future work should focus on prospective validation, broader population inclusion, and the development of regulatory standards for synthetic data in clinical research. If validated, this approach could help overcome long-standing barriers in CSU and other dermatologic trials, enabling more efficient and representative evidence generation.
References