FAIR data: 'too little, too late', unless …

When people say "make data FAIR", I wonder whether they mean doing extra paperwork after the fact, or changing how data are created so they're FAIR from the start — #bornFAIR. Ideally, adding metadata should feel like a tiny, natural part of researchers' normal workflows and should give them immediate benefits. We should be thinking about FAIR across the whole organisation, not just at the end of a project; that makes it much easier to implement and to work with. I have a number of ideas that could make the FAIR process better for researchers and research support. I share them here and invite anyone to comment or reach out, to see if we can collaborate.

I say researchers (plural) and workflows (plural), because science is a multi-disciplinary team undertaking, even if you look at only one research group and its support. Strangely enough, it sometimes seems as if the complex information system that is a human researcher is reduced to a brain at a screen, overloading colleagues with messages. That makes it hard to get through all phases of work up to deep work, much like a few 'well-timed' short interruptions of your sleep can leave you exhausted.

Wasn't much of the promise of FAIR data to 'automate away' the 'hassle' that frustrates our VALUABLE EXPERTISE FOCUS?

I describe my own role as working on the abstraction-rich spectrum of human cooperation: value, meaning, standardisation and notation on one side, interaction design on the other, and information expertise (often involving technology) in between to connect it all. From that point of view, I have some thoughts and suggestions worth seriously trying (and I could use a hand with them).

Tower of Babel
source: Wikimedia Commons

Factors in misunderstanding

We will never all speak exactly the same language and jargon while meaning the same things, but there is plenty of room to improve the overlap, also in our data and digital tooling. And I claim that, to improve research, the non-research-specific support departments of the wider organisation should also change their habits towards FAIR.

Legacy organisations

Current hierarchies, and what is measured within them, are still surprisingly based on what the US introduced after World War II, when information was mostly a logistics of paper and there were few highly educated 'knowledge workers', let alone workers with the internet in their back pocket. How much of that is still worth disturbing workers for while they are on their way to deep work? Are there organisational layers, meetings, or form-like administration that cost more time and frustration than they help? Do Total-Cost-of-Ownership calculations include the 'end'-users' time, and do they take into account what we know about the high cost of task switching? At the same time, lowering complexity (also by lawmakers) and raising the transparency of data models can help consistency.
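That TCO question can be sketched as back-of-the-envelope arithmetic. All numbers below are illustrative assumptions (headcount, rates, and a commonly cited ~23-minute refocus cost per interruption), not measurements:

```python
# Hypothetical TCO comparison: licence cost alone versus licence cost
# plus the attention end-users lose to a tool that interrupts them.
# Every number here is an assumption for illustration only.

EMPLOYEES = 200
HOURLY_RATE = 50.0          # assumed loaded cost of an employee-hour (euros)
LICENCE_PER_SEAT = 120.0    # assumed yearly licence fee per seat (euros)

REFOCUS_MIN = 23            # refocus time per interruption, often cited
INTERRUPTIONS_PER_DAY = 2   # assumed interruptions per user per workday
WORKDAYS = 220

licence_cost = EMPLOYEES * LICENCE_PER_SEAT
lost_hours = EMPLOYEES * WORKDAYS * INTERRUPTIONS_PER_DAY * REFOCUS_MIN / 60
attention_cost = lost_hours * HOURLY_RATE

print(f"licence only         : {licence_cost:12,.0f}")
print(f"licence + attention  : {licence_cost + attention_cost:12,.0f}")
```

Under these made-up numbers the attention cost dwarfs the licence fee, which is exactly why it should not be left out of the calculation.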
More than a few employees are amazed at how much time they spend not doing their primary work, because of a list of software nobody seems to like, let alone understand how the tools relate. Professor Felienne Hermans, widely known for exposing the best-disguised bullshit in technology and education, wrote about this bureaucracy. Others recognise parts of the CIA's Simple Sabotage Field Manual in management policy :-)
But seriously, some suggestions for change:

  1. Less paper(simulation)-centric. Are you really using your Office suite fundamentally differently from physical paper? Are you aware that hypertext and related web technologies allow for a separation of concerns over time and over colleagues? Is your organisation USING Microsoft Office, or IS it a Microsoft Office? Though NextCloud can help sovereignty, more changes and additions are needed for a healthy digital ecosystem of diversity, and to stimulate different thinking modes and semantics. Some starters: for working on shared documents there are also computational notebooks; for most of what you do in spreadsheets I highly suggest trying Orange Data Mining; and for presentations there is also flipping through browser tabs as 'websheets', to save on copying and gain live interaction.
  2. Less supplier/contract-centricity, so that free, web- or browser-based, or home-grown software (and dependencies) are harder to miss in your so-called 'application landscape', a term that should be less central. It helps not to see data as the thing that makes migrations difficult, but as the meaning-standardised gold to which one grants people, and the tools associated with them, temporary access. This also helps you provide an alternative more quickly when colleagues from security have good reason to suddenly block something others depend on, or when software in use simply becomes too expensive. [Beware: in big organisations quite a few people bring personal devices, and some even 'hack' their corporate devices, to work around limitations that prevent them from doing their daily work.]
  3. More explicit meaning, as opposed to 'paper' and applications, which often rely on implicit meaning that the next person or algorithm cannot easily guess, let alone know for sure. The Data-Centric Manifesto, as a step towards FAIR, helps make meaning more explicit and standardised, and allows organisations to make changes after which their people less often say "I can't find anything in our organisation, unless it's on the public internet". And do know that for certain (sub)domains your own colleagues are already using vocabularies, or creating them with experts elsewhere, often under a SURF umbrella; e.g. the lexicon for education and the Open Research API.
  4. Data quality through awareness, skills and responsibility:
    - Just as it is hard to imagine roles for which a typing speed of only 10 keystrokes a minute is acceptable, there should also be some minimum for data-quality skills, e.g. the base rules from this lesson on how (not) to use a spreadsheet (beta).
    - In the administration of (available) software in your organisation, also record which formats each tool supports and how. What I call FileFAIRy: surface that information under "Open with …" and "Convert to …" in the context menu of any file, subtly nudging towards open and relatively future-proof file formats.
    - Most importantly, meaning is given by an ever-changing context: by relations between a diversity of people working together (so not the neo-liberal idea of standard cogs in a machine that turbo-boosted us into the 'efficient' polycrisis), and by relations between data points (not implied in a document or app, but explicit in the data itself, using vocabularies already available or still to be discussed and defined). Call it Semantic Web, Linked Data, RDF(-star), Enterprise Knowledge Graph, or GraphRAG (if you think you need LLMs), or just part of the implementation of FAIR data.
    - Some of your employees now use so-called Personal Knowledge Management (PKM) tools like Logseq, Notion, Obsidian, Roam, or TiddlyWiki at work. To me this sounds like many are repeating work that should already have been done organisation-wide. Personally extending such a shared base, without copying, is fine (within proper, but not overreaching, security measures of course). And as some rights management can be de-duplicated by moving it to the data layer, maybe some time can be freed to also offer 'schema-only', synthetic, and Frictionless Data.
    - And as some only listen when direct money risks are mentioned: several aspects of information are required by law to be open. The Dutch "Open, tenzij" ("open, unless") principle has mostly been unknown or ignored for years, but government agencies are slowly picking up enforcement, also on the questionable tendering practices in Dutch higher education.
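The 'explicit meaning' point above, relations between data points stated in the data itself rather than implied in a document, can be sketched as subject-predicate-object triples. A minimal sketch, with illustrative (not real) vocabulary identifiers and no RDF library:

```python
# Statements as explicit (subject, predicate, object) triples using
# shared vocabulary terms, instead of meaning locked inside a document.
# All identifiers below are illustrative, not real vocabulary entries.

triples = {
    ("ex:report42",  "dcterms:creator", "orcid:0000-0002-1825-0097"),
    ("ex:report42",  "dcterms:subject", "wd:Q924290"),  # e.g. 3D printing
    ("ex:report42",  "ex:printedWith",  "ex:materialX"),
    ("ex:materialX", "ex:meltingPointC", "220"),
}

def query(s=None, p=None, o=None):
    """Return all triples matching the given pattern (None = wildcard)."""
    return [(ts, tp, to) for ts, tp, to in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# Which material was report42 printed with, and does it survive 200 degrees C,
# even though the report itself never mentions a temperature?
(_, _, material), = query("ex:report42", "ex:printedWith")
(_, _, melt), = query(material, "ex:meltingPointC")
print(material, "melts at", melt, "C ->", int(melt) > 200)
```

The point is not this toy store but the pattern: once relations are explicit and use shared terms, the next person or algorithm can follow them without guessing.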
Usability shown as being central
source: ed-informatics.org

Interaction design:

  1. What I call CCpaste: for the R in FAIR and the uptake of open materials, if only to make them less of a 'hassle' than the non-open ones: can we implement a drag&drop that, in the most common situations, not only gets an openly licensed image into what we are creating, but also fully takes care of the attribution, in a way that is actionable for both humans and algorithms?
  2. My SURF community article on DynamicLand
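A minimal sketch of the CCpaste idea, assuming the source site exposes title, author, and licence metadata for the dropped image (the metadata dict and URLs below are hypothetical placeholders):

```python
# CCpaste sketch (hypothetical): on dropping an openly licensed image,
# emit not just an <img>, but a figure carrying machine-readable
# attribution via RDFa-style property/rel attributes.

def cc_paste(meta):
    """Build an HTML snippet with human- and machine-readable attribution."""
    return (
        f'<figure>\n'
        f'  <img src="{meta["url"]}" alt="{meta["title"]}">\n'
        f'  <figcaption property="dc:title">{meta["title"]}</figcaption>\n'
        f'  <a property="cc:attributionName" href="{meta["author_url"]}">'
        f'{meta["author"]}</a>,\n'
        f'  <a rel="license" href="{meta["license_url"]}">{meta["license"]}</a>\n'
        f'</figure>'
    )

snippet = cc_paste({
    "url": "https://example.org/babel.jpg",
    "title": "Tower of Babel",
    "author": "Pieter Bruegel the Elder",
    "author_url": "https://example.org/bruegel",
    "license": "CC BY-SA 4.0",
    "license_url": "https://creativecommons.org/licenses/by-sa/4.0/",
})
print(snippet)
```

A human reader sees a normal caption; a crawler or reuse-checker can read the licence and attribution from the same markup.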

Fully #bornFAIR:

  1. What I call storyLOD: say you write a little report in which you mention an ambiguous name, and the editor webapp (with some context-specific configuration) offers you a 'Did you mean?' list of options that have machine-readable properties available in domain-specific databases, while NOT breaking your flow. This could of course help the findability of your content, but also some interoperability for machine reasoning. An example: someone is looking for a report on something 3D-printed with a material that doesn't melt below a certain temperature, even though the person who wrote the report never mentioned the temperature; RDF(a) allows making the connection anyway. The principle is not limited to text: if you write "3D printer", the editor could show photos of all the printers locally available. The general principle is not limited to certain domains, and it works better if there is some cross-domain Linked Data as well (similar to how OpenAlex is not only about bibliographics, but also connects to many domains through Wikidata subjects).
  2. What I call swipaLOD: many use a whiteboard as a thinking tool, as a team or personally, often ending up with many words and arrows, after which everybody is sent off to research some parts. However, saving the result usually means taking a photograph and processing it just before the next meeting three weeks later, and even the parts you didn't forget by then turn out, in the actual meeting, to have meant rather different things to different people. Now what if …
    - One could draw an arrow by swiping over a screen, while dictating words simultaneously.
    - Based on a selection of domain databases and vocabularies, "Did you mean?"-suggestions are listed for text at any of the nodes and edges, without blocking interaction flow.
    - Choosing such a suggestion connects the current canvas to the graphs in these databases, which allows graph operations like showing a context of near relations, filtering, wayfinding, etc.
    - As what is depicted on your canvas is obviously Linked Data, selections can be used to enrich connected databases.
    - Even small graphs are often hard to comprehend without an explanation, which is why the narrated process of creating the 'drawing' can be captured on video.
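The 'Did you mean?' step in storyLOD and swipaLOD can be sketched with simple string similarity against vocabulary labels. The vocabulary, its identifiers, and the use of `difflib` as a stand-in for a real vocabulary service are all assumptions for illustration:

```python
# 'Did you mean?' sketch: match free text typed or dictated on the canvas
# against labels from (hypothetical) domain vocabularies, without blocking
# the user's flow. Stdlib difflib stands in for a real lookup service.

from difflib import get_close_matches

# label -> machine-readable identifier (illustrative IDs)
vocabulary = {
    "3D printer": "wd:Q178590",
    "3D printing": "wd:Q924290",
    "PLA (polylactic acid)": "wd:Q413769",
    "fused deposition modeling": "wd:Q1393889",
}

def did_you_mean(text, n=3):
    """Return (label, identifier) suggestions, best match first."""
    hits = get_close_matches(text, vocabulary, n=n, cutoff=0.4)
    return [(label, vocabulary[label]) for label in hits]

for label, uri in did_you_mean("3d printr"):
    print(label, "->", uri)
```

Picking a suggestion is what turns a scribbled node into a node in a shared graph, which is what makes the later graph operations (context, filtering, enrichment) possible at all.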

I hope you found some inspiration.
Build focus together?
