Research Data Management
An overview of the ct.qmat Research Data Management online services is available here
Exploiting ct.qmat’s Wealth of Digital Data to Advance Global Research
Scientists from the ct.qmat Cluster of Excellence generate vast amounts of research data, including numerical data files, plots, pictures, protocols, and software. This research data is a precious asset. If managed efficiently in an open system, it can be made permanently available to the scientific community. We are therefore working with the University of Würzburg’s Information Technology Centre (RZUW) to create a research data management system in which data meets the FAIR principles of findability, accessibility, interoperability, and reusability.
Remote collaboration
In a nutshell, we intend to bring the idea of “collaborative data management” to life. This means setting up an integrated platform that enables ct.qmat researchers to share research data with each other, to store, cite, and analyze it, and also to use it to make new discoveries. Another aim is the user-friendly incorporation of different approaches so that hardware- and vendor-independent accessibility can be ensured for decades.
Harnessing resources
To do so, we are drawing on existing resources such as open-source solutions – software that follows a transparent development model that’s accessible to anyone, meaning it can also be adapted and further developed. In addition, we are combining established web services and the latest storage technology within a system operated at RZUW. All the web services used run in the background so that researchers can be offered a convenient cloud-like platform for data-driven processes (a modern “data mesh”) from a single source.
Modern infrastructure
To build the infrastructure, we’re following in the footsteps of Amazon’s, Google’s and Microsoft’s data centers by combining Kubernetes with an object store.
Kubernetes, an open-source system, is used to connect, manage, and control services that run in “containers” on a cluster of servers. It also monitors the integrity of these resources. As well as being vendor- and hardware-independent, Kubernetes is flexibly extensible.
The Ceph object store has a capacity of 1.5 petabytes, which is easily large enough. The entire processing concept is supported by this object storage system, which is accessed through the Amazon Simple Storage Service (S3) protocol, a widely used standard for object storage. The benefits include secure redundant storage, fast access over HTTP and HTTPS (the Hypertext Transfer Protocol, the basis of data communication on the World Wide Web, and its encrypted variant), and high scalability.
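As a minimal sketch of what HTTP access to an object store looks like, the following Python snippet builds a path-style URL for a stored object, the addressing scheme commonly used with S3-compatible stores. The endpoint, bucket, and key are hypothetical placeholders, not the addresses of the ct.qmat system.

```python
from urllib.parse import quote


def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style URL of the form <endpoint>/<bucket>/<key>."""
    # Percent-encode the key but keep "/" so nested key prefixes stay readable.
    return f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key, safe='/')}"


# Hypothetical endpoint, bucket, and key, for illustration only.
url = object_url("https://s3.example.org", "demo-bucket", "run 01/spectrum.csv")
print(url)  # https://s3.example.org/demo-bucket/run%2001/spectrum.csv
```

A plain request to such a URL only works for publicly readable objects. Anything else requires an authenticated (signed) request, which S3 client libraries normally take care of.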
Established web services
The web services we use build on this infrastructure and include the following.
The cloud service JupyterHub and the associated project BinderHub (both also open-source) are used to reproducibly store entire research environments, including the software used and interactive elements. The storage location can be shared with other researchers and employees via a link. The computing environment provided in the cloud can then be run in a browser. Because no software needs to be installed, collaboration is simplified.
GitLab, a tool that provides ct.qmat’s numerical teams with a platform for collaborative software development, has long been in use.
The NOMAD (Novel Materials Discovery) Laboratory is an open-source data repository for the structured archiving and publication of data from the materials sciences. Unique, permanent addresses known as DOIs (digital object identifiers) are assigned to published data to aid citation and sharing.
Whenever data is added, it can be automatically catalogued and structured by dynamically extending NOMAD with parsers. A parser is a program that converts our data into a format that can be processed in NOMAD. ct.qmat is working closely with the developers of NOMAD – the National Research Data Infrastructure’s FAIRmat consortium (FAIR Data Infrastructure for Condensed-Matter Physics and the Chemical Physics of Solids) – so that new parsers can be developed for our programs and systems.
eLabFTW is an open-source electronic lab notebook that can also be used to store and timestamp measurements.
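The parser idea mentioned for NOMAD above can be illustrated with a minimal sketch. It does not use the NOMAD parser interface; the input layout ("key = value" lines) and all names are assumptions made purely for illustration. The point is only to show how raw program output becomes a structured record that a repository can catalogue.

```python
import re

# Hypothetical raw output of a simulation code; the "key = value" layout is an
# assumption made for this sketch, not a format used by ct.qmat or NOMAD.
RAW_OUTPUT = """\
program = demo_solver
lattice_size = 16
temperature = 0.25
energy = -1.5372
"""

LINE = re.compile(r"^\s*(\w+)\s*=\s*(.+?)\s*$")


def convert(value: str):
    """Turn a raw string into an int or float where possible, else keep the text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def parse(text: str) -> dict:
    """Convert "key = value" lines into a structured record."""
    record = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            record[match.group(1)] = convert(match.group(2))
    return record


record = parse(RAW_OUTPUT)
print(record)
```

A real parser would additionally map such fields onto the metadata schema of the target repository.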
To make data management modern and practicable, and to enable the international scientific community to use the data, various software tools have been merged into a single unit at RZUW. Following the initial pilot installation in Würzburg, implementation (with certain adaptations) is also planned at ct.qmat’s branch in Dresden.
The hardware used is part of the University of Würzburg’s Julia high-performance computing cluster and is available to all members of ct.qmat.
Current status and next steps
The above-described infrastructure is currently up and running on systems in Würzburg. All services should be available worldwide by 2023. The result will be an easily accessible, smart research data management platform which is unique in Germany. Equipped with efficient search tools, it will allow scientists to collaborate remotely on ct.qmat’s research data.
Any questions about ct.qmat’s research data management should be addressed to Jonas Schwab and Florian Goth.