Contributing Data

Become a Data Provider

The Helmholtz Knowledge Graph (HKG) aggregates metadata from diverse Helmholtz data repositories, libraries, publication systems, and research information systems into a coherent, linked, and queryable representation.

To further increase coverage across Helmholtz, we continuously seek to integrate metadata from additional data providers. Our focus is on publicly available metadata from Helmholtz-hosted data and information infrastructures describing:

  • data and datasets
  • scientific instruments and facilities
  • software and source code
  • scientific publications and documents
  • other entities supporting research and infrastructure operations

Integrating your metadata will increase the visibility of your infrastructure as well as the interoperability of your metadata with that of others. This improves the coherence and interoperability of the Helmholtz digital ecosystem and the digital assets within it.

How to Provide Data

The Helmholtz KG supports multiple ingestion methods based on widely used standards and interfaces. Depending on your infrastructure, metadata can be integrated via:

We aim to rely on established and reusable patterns rather than custom integrations wherever possible. See the detailed documentation on our ingestion methods and the semantics used within the Helmholtz KG infrastructure.

Check Existing Data Providers

Before initiating a new integration, we recommend reviewing the list of current data providers to the Helmholtz KG. This gives you an overview of currently harvested and represented sources as well as their integration patterns.

👉 [List of current Data Sources](docs/DataProv/Data Sources.mdx)

Getting in Touch

If you are interested in integrating your data, please contact the Helmholtz KG team. The following options are available:

To help us assess and plan the integration, please provide:

  • a short description of your data source
  • the types of entities covered (e.g. datasets, software, instruments)
  • information about how the metadata can be accessed (API, OAI-PMH endpoint, website, etc.)
  • the metadata schema or structure used

If your metadata already follows Schema.org, it can typically be integrated with minimal effort. If not, you may propose a mapping from your schema to the Helmholtz KG data model. The HKG team will support formalizing this mapping using SSSOM (Simple Standard for Sharing Ontological Mappings) within our infrastructure. A dedicated harvesting pipeline can then be established on the basis of that mapping.
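
To give a feel for what such a mapping proposal could look like, the following minimal sketch writes a few mappings as SSSOM-style TSV rows and reads them back. The `myrepo:` field names are hypothetical placeholders for your own schema, and Schema.org terms are used as the target only for illustration; the actual target terms depend on the HKG data model. The four columns shown are the core SSSOM slots. A complete SSSOM file usually also carries a commented metadata header (for example prefix declarations and a license), which is omitted here.

```python
import csv
import io

# Hypothetical local field names (left) mapped to Schema.org terms (right).
# In a real proposal, subject_id would be a CURIE or IRI from your own schema.
MAPPINGS = [
    ("myrepo:title", "skos:exactMatch", "schema:name"),
    ("myrepo:creator", "skos:exactMatch", "schema:creator"),
    ("myrepo:doi", "skos:closeMatch", "schema:identifier"),
]

SSSOM_COLUMNS = ["subject_id", "predicate_id", "object_id", "mapping_justification"]


def to_sssom_tsv(mappings):
    """Serialize (subject, predicate, object) triples as SSSOM-style TSV rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(SSSOM_COLUMNS)
    for subject, predicate, obj in mappings:
        # Every mapping here is assumed to be curated by hand.
        writer.writerow([subject, predicate, obj, "semapv:ManualMappingCuration"])
    return buffer.getvalue()


def read_sssom_tsv(text):
    """Parse the TSV into a list of dicts, skipping '#' metadata lines."""
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    return list(csv.DictReader(lines, delimiter="\t"))


tsv = to_sssom_tsv(MAPPINGS)
rows = read_sssom_tsv(tsv)
```

A table like this is usually enough as a starting point for discussion; the predicate column (`skos:exactMatch` versus `skos:closeMatch`) is where uncertainty about equivalence can be made explicit.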

Collaborative Integration Process

Data providers can remain closely involved throughout the integration process, if desired. We offer regular exchange points, feedback loops, and review opportunities to ensure that the harvesting, mapping, and representation of your data align with expectations. The process is collaborative by design and allows for adjustments as the integration evolves.

The onboarding of a new data provider typically consists of five phases:

  1. Initial Information Exchange
    After you contact us, we collaboratively collect the information required for integration, usually within an issue in our harvesting repository. This includes technical details such as endpoints, web locations, preferred harvesting methods, rate limits, metadata formats, and semantic structure, as well as operational information such as persistence evaluation and contact persons.
  2. Harvesting Setup
    Based on the provided information, we configure and test harvesting routines within the Helmholtz KG infrastructure. Harvested records are first ingested into the Storage API for validation and inspection.
  3. Semantic Alignment
    The harvested metadata is mapped to the HKG data model. We aim for semantic equivalence mappings so that comparable entities are represented consistently across the graph. These mappings are formalized as SSSOM files and can either be provided by the data provider or developed collaboratively with the HKG team. The resulting mappings are then used for schema validation against the internal data model.
  4. Isolated Data Review
    A dedicated subgraph containing only the harvested data from your infrastructure is generated for review. This stage allows mappings, metadata quality, and semantic representation to be evaluated before integration into the broader graph.
  5. Integrated Data Review
    Finally, the data is integrated into the development graph environment and reviewed in relation to existing graph content. This step helps identify inconsistencies, duplication, or semantic conflicts across sources and may result in additional refinements to mappings or source metadata.
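
The checks performed in phases 3 to 5 can be illustrated with a small, simplified sketch. It is not the HKG implementation: the field mapping, the record contents, and the DOI-like identifiers are made up for illustration, and the real pipeline works on SSSOM mappings and graph data rather than plain dictionaries. The sketch renames source fields according to a mapping, collects unmapped fields for review, and flags identifiers that already exist in other sources.

```python
# Hypothetical mapping (e.g. derived from an SSSOM file): source field -> target property.
FIELD_MAP = {"title": "name", "doi": "identifier", "authors": "creator"}


def apply_mapping(record, field_map):
    """Rename mapped fields; collect unmapped ones so they can be reviewed."""
    mapped, unmapped = {}, []
    for key, value in record.items():
        if key in field_map:
            mapped[field_map[key]] = value
        else:
            unmapped.append(key)
    return mapped, sorted(unmapped)


def find_duplicates(new_records, existing_records, key="identifier"):
    """Return identifiers in the new records that are already present elsewhere."""
    existing = {r[key] for r in existing_records if key in r}
    return sorted({r[key] for r in new_records if r.get(key) in existing})


# Made-up harvested records from a single provider.
harvested = [
    {"title": "Ocean temperature 2020", "doi": "10.1234/example-a",
     "authors": ["A. Example"], "internal_flag": True},
    {"title": "Soil moisture", "doi": "10.1234/example-b"},
]
# Made-up records standing in for content already in the graph.
existing = [{"name": "Ocean temperature 2020 (mirror)", "identifier": "10.1234/example-a"}]

# Isolated review: only this provider's records, plus fields the mapping missed.
subgraph, unmapped_fields = [], set()
for record in harvested:
    mapped, unmapped = apply_mapping(record, FIELD_MAP)
    subgraph.append(mapped)
    unmapped_fields.update(unmapped)

# Integrated review: compare against existing content to spot duplication.
duplicates = find_duplicates(subgraph, existing)
```

In this toy run, `unmapped_fields` would prompt a question about whether `internal_flag` needs a mapping, and `duplicates` would point to a record that another source already provides.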

After successful validation of the complete pipeline, the infrastructure is added as an official data source within the Helmholtz KG.