Case Study: WA Department of Health’s Synthetic Data Innovation Project

Learn about the state's bold new approach to overcoming the traditional challenges of health data usage.

Heather Dailey 18 March 2025
With Nasir David, Director Data and Information Systems, WA Department of Health

The WA Department of Health’s Synthetic Data Innovation Project represents a bold new approach to overcoming the traditional challenges of health data usage. By embracing synthetic data, the department has paved the way for faster innovation, stronger privacy protections, and closer collaboration with external industries. As the project expands, it promises to unlock even greater potential for health research, AI integration, and data-driven policy development in Western Australia. 


Quick Facts about the State and Data Linkage Project 

  • Project Name: Western Australia (WA) Department of Health’s Synthetic Data Innovation Project
  • Established: The original Data Linkage Service began in 1995, while the synthetic data initiative launched more recently.
  • Main Objective: To generate synthetic data that maintains key characteristics of real health data while protecting patient privacy, making research and innovation easier to carry out.
  • Key Benefits: Enables innovation, drives faster policy development, enhances cybersecurity, and reduces the ethical burden of handling real patient data.
  • Project Milestones: The WA Department of Health released a data linkage strategy, participated in hackathons, and successfully tested synthetic data in practical settings.
  • Future Vision: Expand the synthetic data offering with more representative and non-representative datasets, create common data models, and leverage AI for further innovation.

Historic Obstacles to Health Innovation Programs 

The WA Department of Health’s journey toward synthetic data innovation has its roots in the Data Linkage Service, which was established in 1995 to support research by linking disparate datasets. This initiative, however, faced several long-standing challenges that hampered the department’s ability to maximise the use of health data, especially for innovation purposes. Key obstacles included: 

  1. Privacy Concerns: Health data is sensitive by nature, and any use for research or innovation is bound by strict privacy regulations. The administrative health data recorded when a patient visits a hospital or clinic is collected primarily for treatment purposes, not research. As a result, the department cannot freely share this data without navigating stringent privacy protocols.
  2. Cybersecurity Threats: One of the most critical obstacles has been ensuring that health data remains secure while in use. High-profile incidents like the Optus data breach underscored the risks of data exposure when systems are improperly secured. The risk of sensitive health data being leaked during testing or development processes posed significant concerns.
  3. Fragmented Infrastructure: WA’s health system, like many others, relies on a mix of legacy systems that are difficult to modernise. Because of these outdated systems, data often had to be reloaded from scratch, which delayed projects and hampered the effective use of data.
  4. Ethical and Legal Limitations: Accessing and using patient data requires navigating a complex web of legal and ethical approvals. Legislation such as the Health Services Act, combined with strict ethics approval processes, slowed the department’s ability to innovate. Every request for data had to pass through a custodian and go through rigorous review, making it nearly impossible to share data for exploratory or what-if purposes.

These barriers led to a lack of participation in health-related hackathons and other innovation programs. WA’s Department of Health found it difficult to collaborate with technology companies, pharmaceutical firms, and other external partners who were eager to leverage health data for research, policy development, and innovation. 


How the Project is Improving Innovation through Data 

In response to these historic obstacles, the WA Department of Health embarked on an ambitious synthetic data project to facilitate innovation while protecting privacy and ensuring compliance with regulatory frameworks. 

Synthetic Data Generation: The cornerstone of the project is synthetic data, which is artificially generated data that mimics the structure and characteristics of real health data without containing identifiable patient information. This allows the department to share data for testing, research, and innovation without the ethical concerns or privacy risks associated with real data. The department works with two types of synthetic data:

  • Representative Synthetic Data: This type of data maintains the same statistical properties as real data, such as patient demographics and medical test results, but cannot be traced back to individuals. It is ideal for use in research or policy development where a close approximation to real-world data is necessary.
  • Non-representative Synthetic Data: This is randomly generated data within certain ranges (e.g., age 1-100), which is particularly useful for testing large-scale systems and models without needing to replicate actual patient data.
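
The difference between the two types can be illustrated with a minimal Python sketch. This is not the department's generation method, which the case study does not describe. The field names (`age`, `triage_category`, `age_band`, `sex`) and both function names are hypothetical, and the "representative" function uses a deliberately simple technique: it samples each field independently from its observed frequencies, so it preserves per-field distributions but not relationships between fields. On its own, this kind of sampling is not a privacy guarantee.

```python
import random
from collections import Counter


def non_representative_records(n, seed=0):
    """Generate n records with random values inside fixed ranges.

    No real data is consulted, so the output has realistic shape
    but no realistic statistics. Field names are hypothetical.
    """
    rng = random.Random(seed)
    return [
        {"age": rng.randint(1, 100), "triage_category": rng.randint(1, 5)}
        for _ in range(n)
    ]


def representative_records(real_records, n, seed=0):
    """Generate n records whose per-field frequencies follow real_records.

    Each field is sampled independently (marginal distributions only);
    this is a simplification chosen for illustration.
    """
    rng = random.Random(seed)
    fields = list(real_records[0].keys())
    synthetic = [dict() for _ in range(n)]
    for field in fields:
        counts = Counter(record[field] for record in real_records)
        values = list(counts.keys())
        weights = list(counts.values())
        sampled = rng.choices(values, weights=weights, k=n)
        for row, value in zip(synthetic, sampled):
            row[field] = value
    return synthetic


if __name__ == "__main__":
    # Made-up stand-in for a real extract; not actual patient data.
    toy_source = [
        {"age_band": "0-17", "sex": "F"},
        {"age_band": "18-64", "sex": "M"},
        {"age_band": "18-64", "sex": "F"},
        {"age_band": "65+", "sex": "M"},
    ]
    print(non_representative_records(3))
    print(representative_records(toy_source, 3))
```

In this sketch the non-representative records would suit load or integration testing, while the representative records would only be a starting point for analysis, since real generators also need to preserve correlations and be assessed for re-identification risk.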

Facilitating Faster Innovation: By using synthetic data, the department can now take part in hackathons, where external innovators can experiment with health data without the need for extensive ethics approvals. At a hackathon held last year, the WA Department of Health provided synthetic data to participants for the first time. The synthetic emergency department (ED) data mirrored real ED presentation patterns, allowing participants to analyse demand and develop practical solutions to issues like hospital ramping within a short timeframe. The event highlighted how synthetic data could drive rapid innovation and generate ideas that the department may not have considered in its own internal processes.

Enhanced Cybersecurity and Privacy: The use of synthetic data also addresses the cybersecurity concerns that have long plagued the department. By testing systems and running simulations on non-sensitive synthetic data, the department can avoid scenarios like the Optus data breach, where real data was accidentally exposed in a development environment. 

Driving Collaboration with Industry: One of the biggest advantages of synthetic data is its potential to bridge the gap between the public sector and industries like pharmaceuticals and technology. Previously, pharmaceutical companies faced difficulties accessing health data for drug development due to privacy concerns. Now, with synthetic data, initial exploratory research can be conducted without sharing sensitive patient information. This fast-tracks negotiations and collaborations, allowing the department to work more efficiently with external partners. 

Impact on Policy Development: The project also facilitates the creation of policies based on accurate, representative data without needing to compromise patient privacy. By using synthetic datasets, government agencies can develop policies informed by health trends, test the potential impact of new initiatives, and validate these models without handling real patient data. 


Future Plans for Synthetic Data 

Looking ahead, the WA Department of Health has ambitious plans to further expand the role of synthetic data in driving innovation and supporting health research. Key elements of the future strategy include: 

Expansion of Synthetic Data Offerings: The department plans to add more layers to its synthetic data portfolio, focusing on both representative and non-representative datasets. These will serve different purposes: 

  • Representative Synthetic Data: Continually refined and expanded to cover more medical conditions, age groups, and service delivery points, providing comprehensive datasets for research and policy development.
  • Non-representative Synthetic Data: Increased use for testing systems and models, as well as for education and training purposes.

Development of Common Data Models: Another major focus will be on creating and sharing common data models, particularly for use by researchers and technology companies. By providing data in standardised formats, the department hopes to reduce the time spent transforming and preparing data, enabling faster and more efficient use of health data in innovation. 

Alignment with Global Standards: The department is considering adopting international standards such as Microsoft’s Common Data Model for health and the OMOP (Observational Medical Outcomes Partnership) Common Data Model, which is used by researchers worldwide. This would allow for easier collaboration and data exchange across jurisdictions and institutions.
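
To show why a common data model reduces preparation time, here is a small Python sketch in which two source systems with different layouts are mapped into one shared structure. Everything in it is an assumption made for illustration: the `CommonPerson` class, the source field names, and the mapping functions are hypothetical, and the structure is far simpler than the actual OMOP or Microsoft schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonPerson:
    """Hypothetical shared record layout (not an official schema)."""
    person_id: int
    year_of_birth: int
    sex: str  # "F", "M", or "U" for unknown


def from_system_a(row):
    """Map a hypothetical system that stores DOB as 'YYYY-MM-DD' and sex as a letter."""
    sex = row["Sex"].strip().upper()
    return CommonPerson(
        person_id=int(row["PatientID"]),
        year_of_birth=int(row["DOB"][:4]),
        sex=sex if sex in {"F", "M"} else "U",
    )


def from_system_b(row):
    """Map a hypothetical system that stores birth year directly and gender as a word."""
    sex_lookup = {"female": "F", "male": "M"}
    return CommonPerson(
        person_id=int(row["id"]),
        year_of_birth=int(row["birth_year"]),
        sex=sex_lookup.get(row["gender"].strip().lower(), "U"),
    )


if __name__ == "__main__":
    # Made-up rows for demonstration only.
    people = [
        from_system_a({"PatientID": "101", "DOB": "1984-06-02", "Sex": "f"}),
        from_system_b({"id": 202, "birth_year": 1957, "gender": "Male"}),
    ]
    for person in people:
        print(person)
```

Once every source is mapped this way, downstream analysis code only needs to understand one layout, which is the time saving the department is aiming for.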

Scaling Innovation: The department’s synthetic data project has already shown how it can support innovation at the hackathon level. Now, the goal is to scale these efforts by integrating synthetic data into more formal innovation programs, such as large-scale clinical trials, AI model development, and drug research. 

AI and Machine Learning Integration: AI will play a critical role in the department’s future data strategy. The department plans to integrate more AI-driven models to generate synthetic data and develop machine learning algorithms capable of extracting new insights from health data. Synthetic data will be instrumental in training these AI models without compromising privacy. 

Government-wide Data Strategy: The department envisions that synthetic data could serve as a model for other government agencies. The ultimate goal is to create a government-wide common data model that can be used across multiple sectors, not just health, for innovation and policy development. 

 

 


Published by

Heather Dailey Content Strategist, Marketing