Tools for research with sensitive data

On March 21st, SURF and Utrecht University organized the event “Your secrets are safe with us: Tooling for research with sensitive data” at the Utrecht University Library. The aim of this event was to provide an overview of current projects, tools, and techniques to safely work with sensitive data. Several researchers and support staff presented initiatives that all had a common goal: to make it easier for researchers to work with sensitive research data.  

The event organizers: Freek Dijkstra (SURF), Dorien Huijser (UU) and Annette Langedijk (SURF)

Here you can find a summary of the presentations given at this event:

Data Privacy Project

Dorien Huijser, Utrecht University

The Data Privacy Project is an NWO-funded support initiative from Research Data Management Support at Utrecht University (UU) that runs until May 2023. The project creates products that generate and disseminate practical knowledge about privacy in research: a Data Privacy Handbook, a survey among UU research staff, a self-paced e-learning module “Privacy basics for researchers”, and tools and tool overviews. The latter were the motivation for co-organizing this event!

Tools-to-Data

Joris van Zundert, Huygens Institute

How does writing style influence readers, and what writing styles are appealing? In the Impact & Fiction project, Joris van Zundert and colleagues from the Huygens Institute wanted to analyze fiction on a large scale to answer these questions. The problem: book reviews are publicly available, but the (e-)books they discuss are not, because of intellectual property restrictions. In the Tools-to-Data project, a proof of concept was developed that allowed the project partners to analyze 20,000 full-text e-books from the National Library (Koninklijke Bibliotheek) without having access to the e-books themselves, using the Data Exchange for trusted data sharing offered by SURF. This proof of concept is now being developed further in the Secure Analysis Environment (SANE) project.

Federated learning

Esther Bron, Erasmus Medical Center, Health-RI

Federated learning is a distributed technique in which the algorithm is brought to the local data and the data user only receives the results of the analysis. In this talk, Esther Bron explained how the Netherlands Consortium of Dementia Cohorts applied this technique to Magnetic Resonance Imaging (MRI) scans to predict someone’s “Brain Age”. To do so, the project partners had to prepare not only the technical infrastructure (using vantage6), but also legal measures (Data Protection Impact Assessment, agreements), a common data structure (“OMOP”), and a working script that could be run on the data. The lesson Esther shared: federated learning takes effort to set up, but can prove very valuable once it works.
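To make the idea of bringing the algorithm to the data more concrete, here is a minimal sketch of federated averaging for a tiny linear model. It is an illustration only and not the code or the vantage6 setup used in the project: the function names, the toy data and the learning rate are all assumptions. Each site trains on its own records and returns only model parameters and a sample count; the raw records never leave the site.

```python
from typing import List, Tuple

Record = Tuple[float, float]  # (feature x, target y); toy stand-in for local data


def local_update(data: List[Record], w: float, b: float,
                 lr: float = 0.05, epochs: int = 5) -> Tuple[float, float, int]:
    """Run a few gradient-descent steps on one site's private data.

    Only the updated parameters and the number of records are returned.
    """
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) * 2 / n
        grad_b = sum((w * x + b - y) for x, y in data) * 2 / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, n


def federated_average(updates: List[Tuple[float, float, int]]) -> Tuple[float, float]:
    """Combine site updates, weighting each site by its number of records."""
    total = sum(n for _, _, n in updates)
    w = sum(w_k * n for w_k, _, n in updates) / total
    b = sum(b_k * n for _, b_k, n in updates) / total
    return w, b


def train(sites: List[List[Record]], rounds: int = 300) -> Tuple[float, float]:
    """Central coordinator: sends the model out, gets parameters back."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        updates = [local_update(data, w, b) for data in sites]
        w, b = federated_average(updates)
    return w, b


# Two hypothetical sites whose records follow y = 2x + 1.
sites = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
w, b = train(sites)
print(f"slope={w:.3f}, intercept={b:.3f}")
```

In a real project the call to local_update would run inside each institution's own infrastructure, and, as the talk stressed, the legal agreements and the shared data structure are as much work as the code.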

Synthetic data

Erik-Jan van Kesteren, Utrecht University, ODISSEI SODA Team

Synthetic data is often presented as the solution for sharing data that is too sensitive to share in raw form. It is data generated from a model, which in turn is often based on real data. The more detailed the model that generates the synthetic data, the more closely the synthetic data resembles the real data and the more useful it is, but also the more information it leaks about the people behind the data. Synthetic data is therefore not always a suitable replacement for sharing sensitive data. Erik-Jan presented some alternative ways to use synthetic data, including a tool (MetaSynth) that can help with this.
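The trade-off between detail and privacy can be illustrated with a deliberately coarse model. The sketch below is an assumption-laden example, not MetaSynth's API: it summarizes a made-up column of ages by only its mean and standard deviation and then samples new values from a normal distribution with those parameters. The synthetic values have a similar average but carry no record-level information; a richer model would be more useful and would also leak more.

```python
import random
import statistics
from typing import List, Tuple


def fit_model(values: List[float]) -> Tuple[float, float]:
    """Summarize a column by its mean and standard deviation only."""
    return statistics.mean(values), statistics.stdev(values)


def generate(model: Tuple[float, float], n: int, seed: int = 0) -> List[float]:
    """Draw n synthetic values from a normal distribution with the fitted parameters."""
    mean, sd = model
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    return [rng.gauss(mean, sd) for _ in range(n)]


# Made-up "real" ages, purely for illustration.
real_ages = [23, 35, 41, 29, 52, 38, 47, 31]

model = fit_model(real_ages)
synthetic_ages = generate(model, n=1000)

print("real mean:     ", statistics.mean(real_ages))
print("synthetic mean:", round(statistics.mean(synthetic_ages), 2))
```

Because only two numbers describe the original column, relationships between variables and unusual individuals are lost. That is what makes this version fairly safe and also what limits its use for real analysis.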

Digital data donation

Laura Boeschoten, Utrecht University

Your personal public transport card, social media posts, Google Maps: you leave a digital trace everywhere*. These data are often very interesting for research, because they describe human behavior in an objective way, without relying on self-reported questionnaires. Digital data donation provides a platform where people can request their personal data from the relevant provider(s), analyze the data locally, and then share with the researcher only those results they consent to share. This is done using an open source software package called PORT, but always has to be accompanied by legal, ethical and practical considerations: which data do you want, how will you analyze the data, and what will you tell participants?
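The local-analysis-then-consent flow can be sketched in a few lines. This is a hypothetical illustration and not PORT's actual code or API: the export format, the field names and both functions are assumptions. The raw export is reduced to monthly counts on the participant's own device, and only that summary is handed over, and only if the participant agrees.

```python
from collections import Counter
from typing import Dict, List, Optional


def summarize_locally(export: List[dict]) -> Dict[str, int]:
    """Reduce a raw export to the number of entries per month (YYYY-MM).

    Places and exact timestamps are dropped; only the counts remain.
    """
    return dict(Counter(entry["timestamp"][:7] for entry in export))


def donate(summary: Dict[str, int], consent: bool) -> Optional[Dict[str, int]]:
    """Return the summary for sharing only when the participant consents."""
    return summary if consent else None


# Made-up export, standing in for a file requested from a data provider.
export = [
    {"timestamp": "2023-01-14T08:12:00", "place": "Utrecht Centraal"},
    {"timestamp": "2023-01-20T17:45:00", "place": "Amsterdam Zuid"},
    {"timestamp": "2023-02-03T09:01:00", "place": "Utrecht Centraal"},
]

summary = summarize_locally(export)
shared = donate(summary, consent=True)
print(shared)
```

Which summary is computed is exactly the kind of decision that the questions in the talk are about: what data you want, how you will analyze it, and what you tell participants.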

Overview of privacy-preserving techniques

Freek Dijkstra, SURF

There are many strategies and tools available to deal with sensitive data, such as organizational measures (agreements, access control) and technical measures (pseudonymization, code-to-data approaches, cryptography). Each is useful in specific situations, and SURF can assist with many of them, for example in managing access (SRAM), safe storage and archiving (Secure data archive, Research cloud), or secure data processing (Spider, ODISSEI Secure Supercomputer). Freek Dijkstra briefly talked about the possibilities and considerations around using these solutions.
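As one example of a technical measure, pseudonymization can be sketched with a keyed hash from the Python standard library. This is a minimal illustration under assumptions, not a SURF service or a recommended production setup: the key, the records and the 16-character truncation are made up for the example. The same identifier always maps to the same pseudonym, so records can still be linked, while the original identifier is not stored in the dataset.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability (assumption)


# Hypothetical key; in practice it is stored separately from the data.
key = b"example-key-kept-outside-the-dataset"

# Made-up records, purely for illustration.
records = [
    {"name": "participant-a@example.org", "score": 12},
    {"name": "participant-b@example.org", "score": 17},
    {"name": "participant-a@example.org", "score": 15},
]

pseudonymized = [
    {"pid": pseudonymize(r["name"], key), "score": r["score"]} for r in records
]

for row in pseudonymized:
    print(row)
```

Anyone holding the key can recompute the mapping, so pseudonymized data remains personal data and still needs the organizational measures mentioned above.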

You can find all the slides that were presented here: https://doi.org/10.5281/zenodo.7778158

*Visitors could experience just how much digital trace data they left online in Alejandra Gómez Ortega’s “Data slip” machine. Read more about the machine and Alejandra’s research here

Laura Boeschoten and Alejandra Gómez Ortega collecting a list of personal digital traces from the dataslip machine.
