Storage API

The Storage API provides access to raw metadata records harvested through the Helmholtz KG pipelines. It forms a storage layer that makes harvested data reusable within our internal infrastructure, for example for further processing and injection into the graph, or for providing provenance information. This ensures that each record needs to be harvested from its external source only once.

External users have read access to the storage layer, e.g. to retrieve raw metadata records. Write operations (creating or updating data) are restricted to internal processes.

What is stored?

The API stores raw metadata, referred to as records. A record is a single metadata item returned by a harvester (e.g. all metadata about one specific event from indico).

Since metadata returned by different harvesters is endpoint-specific, i.e. it may vary in format and semantics, the records stored within the Storage API are not standardized. Each record is therefore stored as `content` within the data structure described below:

Data Format

All responses from the Storage API are returned as JSON.

Example response

[
  {
    "harvester_type": "indico",
    "id": "abc123",
    "PID": 1,
    "source": "https://example.com/event/1",
    "harvested_date": "2025-01-01T12:00:00",
    "content": {
      "...": "raw metadata from the source"
    }
  }
]

Field	Description
harvester_type	Name of the harvester that collected the data (e.g. `indico`).
id	Internal identifier of the entity.
PID	Incremental number used for pagination.
source	The original URL from which the data was harvested.
harvested_date	Timestamp of when the data was collected (ISO 8601 string as produced by Python `datetime`, e.g. `2025-01-01T12:00:00`).
content	Raw metadata returned by the harvester. ‼️ This field is not standardized and depends on the source. ‼️
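
As an illustration, a record with the fields above could be modelled on the client side roughly as follows. This is a minimal sketch, not part of the API: the names `RawRecord` and `parse_record` are hypothetical, the timestamp is assumed to be ISO 8601 as in the example response, and `content` is deliberately kept as an opaque dictionary because its structure depends on the source.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict


@dataclass
class RawRecord:
    """Hypothetical client-side model of one stored record."""
    harvester_type: str
    id: str
    pid: int
    source: str
    harvested_date: datetime
    content: Dict[str, Any]  # not standardized, depends on the source


def parse_record(item: Dict[str, Any]) -> RawRecord:
    """Convert one JSON object from a response into a RawRecord."""
    return RawRecord(
        harvester_type=item["harvester_type"],
        id=item["id"],
        pid=item["PID"],
        source=item["source"],
        # Assumption: timestamps are ISO 8601, as in the example response.
        harvested_date=datetime.fromisoformat(item["harvested_date"]),
        content=item["content"],
    )


# The example response from above, parsed into RawRecord objects.
example_response = json.loads("""
[
  {
    "harvester_type": "indico",
    "id": "abc123",
    "PID": 1,
    "source": "https://example.com/event/1",
    "harvested_date": "2025-01-01T12:00:00",
    "content": {"...": "raw metadata from the source"}
  }
]
""")
records = [parse_record(item) for item in example_response]
```

Code that consumes `content` would typically branch on `harvester_type`, since each harvester returns its own structure.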

Endpoint usage

Get a list of records

To retrieve a list of stored records, use:

GET https://data.unhide.helmholtz-metadaten.de/api/v0/raw/entities

Query parameters

Parameter	Description
`offset`	Pagination offset (default: `0`)
`limit`	Number of results (default: `10`, max: `100000`)
`harvester_type`	Filter by harvester (e.g. `indico`)
`source`	Filter by exact source URL

Notes:

Filters are applied independently (no complex query logic).
If fewer results are available than requested, only available results are returned.
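
The query parameters above can be combined into a paginated client along the following lines. This is a sketch under assumptions: the helper names (`build_list_url`, `fetch_json`, `iter_records`) are hypothetical, `offset` is assumed to count the records to skip, and a page shorter than `limit` is taken to mean that no further results exist.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterator, List, Optional
from urllib.parse import urlencode

BASE_URL = "https://data.unhide.helmholtz-metadaten.de"
MAX_LIMIT = 100000  # documented maximum for `limit`


def build_list_url(offset: int = 0, limit: int = 10,
                   harvester_type: Optional[str] = None,
                   source: Optional[str] = None) -> str:
    """Build the URL of the list endpoint; unset filters are omitted."""
    if limit > MAX_LIMIT:
        raise ValueError(f"limit must not exceed {MAX_LIMIT}")
    params: Dict[str, Any] = {"offset": offset, "limit": limit}
    if harvester_type is not None:
        params["harvester_type"] = harvester_type
    if source is not None:
        params["source"] = source
    return f"{BASE_URL}/api/v0/raw/entities?{urlencode(params)}"


def fetch_json(url: str) -> List[Dict[str, Any]]:
    """Perform the GET request and decode the JSON list."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_records(fetch: Callable[[str], List[Any]] = fetch_json,
                 page_size: int = 100, **filters: str) -> Iterator[Any]:
    """Yield all matching records, requesting one page at a time."""
    offset = 0
    while True:
        page = fetch(build_list_url(offset, page_size, **filters))
        yield from page
        # Assumption: a short (or empty) page means the end was reached.
        if len(page) < page_size:
            return
        offset += page_size
```

For example, `for record in iter_records(harvester_type="indico"): ...` would walk through all records of one harvester. The `fetch` argument is injectable so that the pagination logic can be exercised without network access.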

Get a single entity by ID

To retrieve one specific entity, use:

GET https://data.unhide.helmholtz-metadaten.de/api/v0/raw/{entity_id}

Returns

A single JSON object – if the entity exists
404 Not Found – if the entity does not exist
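
A lookup by ID, including handling of the 404 case, might look like the sketch below. The function names `record_url` and `get_record` are hypothetical, and returning `None` for a missing entity is a design choice of this example rather than API behaviour.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, Dict, Optional
from urllib.parse import quote

BASE_URL = "https://data.unhide.helmholtz-metadaten.de"


def record_url(entity_id: str) -> str:
    """Build the URL for one entity; the ID is percent-encoded for safety."""
    return f"{BASE_URL}/api/v0/raw/{quote(entity_id, safe='')}"


def get_record(entity_id: str,
               opener: Callable[[str], Any] = urllib.request.urlopen
               ) -> Optional[Dict[str, Any]]:
    """Return the record as a dict, or None if the API answers 404."""
    try:
        with opener(record_url(entity_id)) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # entity does not exist
        raise  # any other HTTP error is passed on to the caller
```

Calling `get_record("abc123")` performs the request with `urllib.request.urlopen`; the `opener` parameter exists only so that the function can be tested with a stand-in.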

Get multiple records by IDs

To retrieve multiple records in a single request, use:

POST https://data.unhide.helmholtz-metadaten.de/api/v0/raw/by_ids

with the request body:

["id1", "id2"]

Returns a list of records in the same format as the list endpoint.
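
The request could be assembled with the Python standard library as sketched below. The function name `build_by_ids_request` is hypothetical, and sending a `Content-Type: application/json` header is an assumption based on the JSON request body.

```python
import json
import urllib.request
from typing import List

BASE_URL = "https://data.unhide.helmholtz-metadaten.de"


def build_by_ids_request(ids: List[str]) -> urllib.request.Request:
    """Create a POST request whose body is the JSON list of IDs."""
    body = json.dumps(ids).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v0/raw/by_ids",
        data=body,
        # Assumption: the API expects the body to be declared as JSON.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_by_ids_request(["id1", "id2"])
```

The request is sent with `urllib.request.urlopen(request)`, and the response body can be decoded with `json.load` in the same way as for the list endpoint.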

Further Information

Error Handling

The API uses standard HTTP status codes:

404 Not Found – entity does not exist
422 Unprocessable Entity – invalid request parameters

Authentication & Access

Read access (GET endpoints): publicly available
Write access (create/update): restricted and requires authentication

Please see Developer Documentation for further details about authenticated access.

Environment

Current base URL:

https://data.unhide.helmholtz-metadaten.de

Additional Resources

Swagger / OpenAPI documentation: https://data.unhide.helmholtz-metadaten.de/docs

Summary

The Storage API is intended as a central access point for raw harvested metadata. It allows users to:

Explore available data collected from external sources
Retrieve specific metadata entries
Reuse existing harvested data instead of collecting it again

Because the data is stored in its original form, consumers should be prepared to handle varying data structures depending on the source.
