Here you can find a summary of the presentations given at this event:
Data Privacy Project
Dorien Huijser, Utrecht University
The Data Privacy Project is an NWO-funded support initiative from Research Data Management Support at Utrecht University (UU), running until May 2023. The project is creating products that aim to generate and disseminate practical knowledge about privacy in research: a Data Privacy Handbook, a survey among UU research staff, a self-paced e-learning module “Privacy basics for researchers”, and tools and tool overviews. The latter were the motivation for co-organizing this event!
Joris van Zundert, Huygens Institute
How does writing style influence readers, and what writing styles are appealing? In the Impact & Fiction project, Joris van Zundert and colleagues from the Huygens Institute wanted to analyze fiction literature on a large scale to answer these questions. The problem: book reviews are publicly available, but the (e-)books they are written about are not, because of intellectual property restrictions. In the Tools-to-Data project, a proof of concept was developed that allowed the project partners to analyze 20,000 full-text e-books from the National Library (Koninklijke Bibliotheek) without having access to the e-books themselves, using the Data Exchange for trusted data sharing offered by SURF. This proof of concept is now being further developed in the Secure Analysis Environment (SANE) project.
Esther Bron, Erasmus Medical Center, Health-RI
Federated learning is a distributed technique in which the algorithm is brought to the local data, and the data user only receives the results of the analysis. In this talk, Esther Bron explained how the Netherlands Consortium of Dementia Cohorts applied this technique to Magnetic Resonance Images (MRI) to predict someone’s “Brain Age”. To do so, the project partners had to prepare not only the technical infrastructure (using vantage6), but also legal measures (Data Protection Impact Assessment, agreements), a common data structure (“OMOP”), and a working script that could be run on the data. The lesson Esther shared: federated learning takes effort to set up, but can prove very valuable when it succeeds.
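The code-to-data idea behind federated learning can be sketched in a few lines. Note that this is a generic illustration on simulated data, not the vantage6 API or the consortium's actual pipeline: each site fits a model on its own data and shares only summaries (coefficients and a sample count), which a coordinator combines.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_fit(X, y):
    """Runs at the data site: fit a linear model locally.
    Only the coefficients and the sample count leave the site,
    never the underlying records."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, len(y)

def federated_average(site_results):
    """Runs at the coordinator: combine per-site coefficients,
    weighted by each site's sample size."""
    coefs = np.array([c for c, _ in site_results])
    weights = np.array([n for _, n in site_results], dtype=float)
    weights /= weights.sum()
    return weights @ coefs

# Simulate three cohorts that share the same underlying signal
# (entirely hypothetical data, standing in for e.g. MRI features)
true_coef = np.array([0.5, -0.2, 1.0])
site_results = []
for n in (200, 150, 300):
    X = rng.normal(size=(n, 3))
    y = X @ true_coef + rng.normal(scale=0.1, size=n)
    site_results.append(local_fit(X, y))  # only summaries are shared

global_coef = federated_average(site_results)
print(global_coef)  # close to true_coef, without pooling any raw data
```

The weighting by sample size mirrors the common "federated averaging" design choice: larger cohorts contribute proportionally more to the combined model.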
Erik-Jan van Kesteren, Utrecht University, ODISSEI SODA Team
Synthetic data is often presented as the solution for sharing data that is too sensitive to share in raw form. It is data generated from a model, which in turn is often based on real data. The more detailed the model that created the synthetic data, the more closely the synthetic data will resemble the real data and the more useful it will be; but also, the more information about the people behind the data is leaked. This makes synthetic data not always a suitable replacement for sharing sensitive data. Erik-Jan presented some nice alternative ways in which synthetic data can be used, including a tool (MetaSynth) that can help you generate it.
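The trade-off Erik-Jan described can be made concrete with a toy sketch (plain Python on made-up data, not the MetaSynth API): a deliberately coarse model preserves aggregate statistics while containing no real records, whereas a richer model, for instance one that also captures correlations between columns, would make the synthetic data more useful but leak more about the people behind it.

```python
import numpy as np

rng = np.random.default_rng(42)

# "Real" sensitive data: age and income for 1,000 people (hypothetical)
real = np.column_stack([
    rng.normal(45, 12, size=1000),      # age
    rng.lognormal(10, 0.4, size=1000),  # income
])

def fit_model(data):
    """A deliberately coarse model: only per-column mean and std.
    It cannot reproduce any individual record, but it also discards
    structure (e.g. the age/income relationship) a researcher may need."""
    return data.mean(axis=0), data.std(axis=0)

def generate_synthetic(model, n):
    """Sample synthetic rows from the fitted model, not from real people."""
    mean, std = model
    return rng.normal(mean, std, size=(n, len(mean)))

synthetic = generate_synthetic(fit_model(real), n=1000)

# Aggregates survive; individual rows are purely model-generated
print(real.mean(axis=0).round(1))
print(synthetic.mean(axis=0).round(1))
```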
Digital data donation
Laura Boeschoten, Utrecht University
Your personal public transport card, social media posts, Google Maps: you leave a digital trace everywhere*. These data are often very interesting for research, because they describe human behavior in an objective way, without having to rely on self-reported questionnaires. Digital data donation provides a platform where people can request their personal data from the relevant provider(s), analyze the data locally, and consent to sharing with the researcher only those results of the analysis that they choose. This is done using an open source software package called PORT, but it always has to be accompanied by legal, ethical and practical considerations: which data do you want, how will you analyze the data, and what will you tell participants?
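The analyze-locally-then-consent flow can be sketched as follows. This is a hypothetical illustration rather than the PORT package itself: the analysis step reduces raw traces to aggregates on the participant's own device, and only the results the participant approves ever reach the researcher.

```python
from collections import Counter

# Hypothetical donated data: a participant's exported location history
raw_export = [
    {"place": "Utrecht Centraal", "category": "transit"},
    {"place": "Home", "category": "home"},
    {"place": "Cafe X", "category": "food"},
    {"place": "Utrecht Centraal", "category": "transit"},
]

def local_analysis(records):
    """Runs on the participant's device: reduce raw traces to
    aggregate visit counts per place category."""
    return dict(Counter(r["category"] for r in records))

def consent_filter(results, approved):
    """The participant reviews the aggregates and shares only the
    keys they explicitly approve; everything else is withheld."""
    return {k: v for k, v in results.items() if k in approved}

results = local_analysis(raw_export)
shared = consent_filter(results, approved={"transit", "food"})
print(shared)  # {'transit': 2, 'food': 1}; "home" is never shared
```

Note that the researcher sees only `shared`: neither the raw export nor the withheld categories leave the participant's device.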
Overview of privacy-preserving techniques
Freek Dijkstra, SURF
There are many strategies and tools available for dealing with sensitive data, such as organizational measures (agreements, access control) and technical measures (pseudonymization, code-to-data approaches, cryptography). Each of them is useful in specific situations, and in many of these SURF can be of assistance, for example in managing access (SRAM), safe storage and archiving (Secure data archive, Research cloud), or secure data processing (Spider, ODISSEI Secure Supercomputer). Freek Dijkstra briefly discussed the possibilities and considerations around using these solutions.
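To make one of the technical measures concrete: pseudonymization can be implemented with a keyed hash, so that direct identifiers become unreadable tokens while records remain linkable across datasets. This is a generic sketch with a made-up key, not a SURF service or a recommendation for any specific deployment.

```python
import hashlib
import hmac

# Secret key held only by the data controller (hypothetical value).
# Without it, pseudonyms cannot be brute-forced back to identifiers.
SECRET_KEY = b"keep-this-out-of-the-dataset"

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).
    The same input always yields the same pseudonym, so records can
    still be linked without ever storing the identifier itself."""
    digest = hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

token = pseudonymize("jane.doe@example.org")
print(token)                                      # stable, unreadable token
print(token == pseudonymize("jane.doe@example.org"))  # True: linkable
```

Using an HMAC rather than a plain hash matters: with a plain SHA-256, anyone could hash a list of candidate identifiers and re-identify people; the secret key prevents that.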
You can find all the slides that were presented here: https://doi.org/10.5281/zenodo.7778158.
*Visitors could experience just how much digital trace data they left online in Alejandra Gómez Ortega’s “Data slip” machine. Read more about the machine and Alejandra’s research here.