Data Provenance

In the UK, many types of routinely collected data from the NHS and other government agencies are available for research. To protect privacy, data governance law requires that only the project-specific portion of the data is released, and that it is extracted, filtered and anonymised before release for research.

Currently, researchers receive little information about the methods used to produce their data. This lack of transparency increases the risk that errors propagate undetected, and it leaves the resulting research difficult or impossible to evaluate and reproduce.

We will co-design, pilot, and evaluate methods for recording and reporting provenance in research that uses high-security data. The result will be a method for reporting data provenance that preserves privacy and makes the research more findable, accessible, interoperable, and reproducible.

Our approach recognises that meeting the needs of both data guardians and researchers requires active cooperation. The project is a collaboration between data guardians, computing scientists specialising in provenance and trust, an expert in service evaluation methods, and a specialist in open science practice.