The Barcelona Declaration: Rise of the data engineer

Author: Nick Veenstra (copied from LinkedIn), Research Information Specialist at the University of Groningen


    My career as a developer started as a junior programmer on a Library Information System, and over the years I landed on a combination of developer and research information specialist. Now that I am working on a BI project involving research information, I have evolved into a Data Engineer (possibly even granting myself the title of Cloud Data Engineer, since we're working mainly in Azure). As I find myself trying to make sense of research-related metadata on a daily basis, there are two main things I've learned:

    • Research information integration on campus is hard
    • We rely too much on software vendors (i.e. publishers) to do the integration for us

    And that's where I am hoping the Barcelona Declaration will come in handy. Even though my LinkedIn timeline is full of posts stating "we signed the Barcelona Declaration and now we're going to think about implementing it!", I am hoping that more people will start to realise that if we want to improve research information openness, we will have to improve our own integration skills. Allow me to illustrate:

    Current (left) and ideal (right) situation of research information on campus

    The left side of the illustration shows a typical example of how research information is currently gathered. The CRIS system sits at the center of the entire process, with a messy integration of all kinds of campus systems. This makes us vulnerable on two sides: the CRIS vendor decides what comes in from external sources, and the local integration is too weak to provide meaningful insights (e.g. a poor integration with AFAS causes issues with local affiliations).

    The right side shows my ideal situation for handling research information on campus. At the center of the system is a middleware solution that makes sense of all incoming data (be it internal or external), and a data warehouse (DWH) is used to gather all the data we need. Because we're collecting the data ourselves and are not reliant on the CRIS system, we can do all the cleanup and connect to any external source we want. I can get metadata from OpenAlex and ROR with amazing open identifiers, and thanks to (some) major funders opening up their databases through an API, we can collect grants directly at the source. The same goes for datasets. No need to buy an expensive CRIS module to do that for us.
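
    To make the middleware step a bit more concrete, here is a minimal sketch of how one incoming record could be flattened into rows for a DWH staging table. It assumes the record follows the general shape of an OpenAlex work (an `authorships` list with nested `author` and `institutions` objects). The function name `flatten_work`, the output column names and the sample record with its identifiers are all made up for illustration and are not part of any existing system.

```python
from typing import Any


def flatten_work(work: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn one OpenAlex-style work record into flat rows for a staging table.

    One row is produced per (author, institution) pair. Authors without a
    listed institution still get a row, with the institution columns set to None.
    """
    rows = []
    for authorship in work.get("authorships") or []:
        author = authorship.get("author") or {}
        # Fall back to a single empty placeholder so that authors without
        # institutions are not silently dropped.
        institutions = authorship.get("institutions") or [{}]
        for inst in institutions:
            rows.append({
                "work_id": work.get("id"),
                "doi": work.get("doi"),
                "title": work.get("title"),
                "publication_year": work.get("publication_year"),
                "author_id": author.get("id"),
                "author_orcid": author.get("orcid"),
                "institution_ror": inst.get("ror"),
                "institution_name": inst.get("display_name"),
            })
    return rows


# Hand-written sample record with made-up identifiers, for illustration only.
sample_work = {
    "id": "https://openalex.org/W0000000001",
    "doi": "https://doi.org/10.1234/example",
    "title": "An example publication",
    "publication_year": 2024,
    "authorships": [
        {
            "author": {
                "id": "https://openalex.org/A0000000001",
                "orcid": "https://orcid.org/0000-0000-0000-0001",
            },
            "institutions": [
                {"ror": "https://ror.org/00example0", "display_name": "Example University"},
            ],
        },
        {
            "author": {"id": "https://openalex.org/A0000000002", "orcid": None},
            "institutions": [],
        },
    ],
}

rows = flatten_work(sample_work)
for row in rows:
    print(row["author_id"], row["institution_ror"])
```

    Keeping the ROR and ORCID identifiers as separate columns is what would let the DWH join these rows against other sources later on, without going through the CRIS system.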

    We can then push all this neatly cleaned data from our DWH into the CRIS system, reducing the workload for researchers. The role of the CRIS system will then change from the primary administration system to a system needed only to collect the remaining information we can't get from external sources (e.g. activities, equipment, project information or local affiliations below university level). Who knows, thanks to the DWH a simple storage and input solution might even suffice as a CRIS system in the future.
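
    As a rough illustration of that push step, the sketch below picks the DWH records that the CRIS system does not know yet, matching on DOI. The function names `normalize_doi` and `records_to_push` and the sample data are hypothetical, and how records would actually be written to a CRIS system depends entirely on the product in use, so that part is left out.

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi):
    """Reduce a DOI to a bare lowercase form so different spellings match."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi or None


def records_to_push(dwh_records, cris_dois):
    """Return the DWH records whose DOI is not yet present in the CRIS.

    Records without a DOI are skipped here, because they cannot be matched
    on this key; they would need a different matching strategy.
    """
    known = {normalize_doi(d) for d in cris_dois}
    known.discard(None)
    to_push = []
    for record in dwh_records:
        doi = normalize_doi(record.get("doi"))
        if doi is not None and doi not in known:
            to_push.append(record)
    return to_push


# Made-up sample data, for illustration only.
dwh_records = [
    {"doi": "https://doi.org/10.1234/ABC", "title": "A"},
    {"doi": "10.1234/def", "title": "B"},
    {"doi": None, "title": "C"},
]
cris_dois = ["10.1234/abc"]

to_push = records_to_push(dwh_records, cris_dois)
print([record["title"] for record in to_push])
```

    Normalizing the DOI before comparing matters because the same identifier tends to arrive as a full URL from one source and as a bare string from another.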

    This ideal scenario requires an understanding of two things:

    • Open is not a synonym for free. Providers of open information also need funding to run their hardware.
    • We require a lot more data engineers.

    Conclusion: we need more data engineers within the IM department on campus, working on improving integrations and open software. We also need to reserve funding to support the providers of open information, such as ROR and OpenAlex. Together with some of my data engineering peers at other universities, we hope to start a few initiatives in the near future to kickstart this evolution.

    I am hoping that the Barcelona Declaration will be an extra push in this direction.



