Storage API

The Storage API provides access to raw metadata records harvested through the Helmholtz KG pipelines. It forms a storage layer that makes harvested data reusable within our internal infrastructure, for example for further processing and injection into the graph, or for providing provenance information. This mechanism ensures that every external resource is harvested only once for each record.

External users have read access to the storage layer, e.g. to retrieve raw metadata records. Write operations (creating or updating data) are restricted to internal processes.

What is stored?

The API stores raw metadata, referred to as records. A record is a single metadata item returned by a harvester (e.g. all metadata about one specific event from indico).

Since metadata returned by different harvesters is endpoint-specific, i.e. it may vary in format and semantics, the records stored in the Storage API are not standardized. Records are therefore stored as content within the data structure described below:

Data Format

All responses from the Storage API are returned as JSON.

Example response

[
  {
    "harvester_type": "indico",
    "id": "abc123",
    "PID": 1,
    "source": "https://example.com/event/1",
    "harvested_date": "2025-01-01T12:00:00",
    "content": {
      "...": "raw metadata from the source"
    }
  }
]

Field            Description
harvester_type   Name of the harvester that collected the data (e.g. indico).
id               Internal identifier of the entity.
PID              Incremental number used for pagination.
source           The original URL from which the data was harvested.
harvested_date   Timestamp of when the data was collected (Python datetime format).
content          Raw metadata returned by the harvester.
‼️ This field is not standardized and depends on the source. ‼️
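
As an illustration of the record structure above, the following minimal sketch maps one JSON object onto a typed Python object. The names `Record` and `parse_record` are hypothetical and not part of the API. The sketch assumes that `harvested_date` is always an ISO 8601 string, as in the example response.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass
class Record:
    """One raw metadata record as returned by the Storage API."""
    harvester_type: str
    id: str
    pid: int
    source: str
    harvested_date: datetime
    content: Any  # not standardized; shape depends on the harvester


def parse_record(raw: dict[str, Any]) -> Record:
    """Convert one JSON object from the API into a Record."""
    return Record(
        harvester_type=raw["harvester_type"],
        id=raw["id"],
        pid=raw["PID"],
        source=raw["source"],
        # Assumption: timestamps are ISO 8601, as in the example response.
        harvested_date=datetime.fromisoformat(raw["harvested_date"]),
        content=raw["content"],
    )
```

The `content` field is deliberately left untyped, since its structure depends on the source.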

Endpoint usage

Get a list of records

To retrieve a list of stored records, use:

GET https://data.unhide.helmholtz-metadaten.de/api/v0/raw/entities

Query parameters

Parameter        Description
offset           Pagination offset (default: 0)
limit            Number of results (default: 10, max: 100000)
harvester_type   Filter by harvester (e.g. indico)
source           Filter by exact source URL

Notes:

  • Filters are applied independently (no complex query logic).
  • If fewer results are available than requested, only available results are returned.
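
The list endpoint and its query parameters could be used from Python roughly as follows. This is a sketch using only the standard library. The helper names `build_list_url`, `fetch_json` and `iter_records` are hypothetical. It assumes that `offset` counts records (not pages) and that a page shorter than `limit` marks the end of the results.

```python
import json
from typing import Any, Callable, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

LIST_URL = "https://data.unhide.helmholtz-metadaten.de/api/v0/raw/entities"


def build_list_url(offset: int = 0, limit: int = 10,
                   harvester_type: Optional[str] = None,
                   source: Optional[str] = None) -> str:
    """Build the list URL; optional filters are sent only when set."""
    params: dict[str, Any] = {"offset": offset, "limit": limit}
    if harvester_type is not None:
        params["harvester_type"] = harvester_type
    if source is not None:
        params["source"] = source
    return f"{LIST_URL}?{urlencode(params)}"


def fetch_json(url: str) -> Any:
    """Perform a GET request and decode the JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_records(limit: int = 100,
                 fetch: Callable[[str], Any] = fetch_json,
                 **filters: Optional[str]) -> Iterator[dict]:
    """Yield all matching records, requesting one page at a time."""
    offset = 0
    while True:
        page = fetch(build_list_url(offset, limit, **filters))
        yield from page
        # Assumption: a short page means no further results are available.
        if len(page) < limit:
            return
        offset += limit
```

For example, `for record in iter_records(harvester_type="indico"): ...` would walk through all records of one harvester. The `fetch` argument is injectable so that the paging logic can be tried out without network access.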

Get a single entity by ID

To retrieve one specific entity, use:

GET https://data.unhide.helmholtz-metadaten.de/api/v0/raw/{entity_id}

Returns

  • A single JSON object (one record) if the entity exists
  • 404 Not Found – if the entity does not exist
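
A lookup by ID might be wrapped as in the sketch below. The helpers `entity_url` and `get_entity` are hypothetical names. Mapping a 404 response to `None` is one possible design choice, not behaviour prescribed by the API.

```python
import json
from typing import Any, Optional
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://data.unhide.helmholtz-metadaten.de/api/v0/raw"


def entity_url(entity_id: str) -> str:
    """Build the URL for one entity; the ID is percent-encoded for safety."""
    return f"{BASE_URL}/{quote(entity_id, safe='')}"


def get_entity(entity_id: str) -> Optional[dict[str, Any]]:
    """Return the record, or None when the API answers 404 Not Found."""
    try:
        with urlopen(entity_url(entity_id), timeout=30) as response:
            return json.load(response)
    except HTTPError as err:
        if err.code == 404:
            return None
        raise  # any other HTTP error is left to the caller
```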

Get multiple records by IDs

To retrieve multiple records in a single request, use:

POST https://data.unhide.helmholtz-metadaten.de/api/v0/raw/by_ids

with a request body containing a JSON array of record IDs:

["id1", "id2"]

Returns a list of records in the same format as the list endpoint.
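
The following sketch shows how such a POST request could be built with Python's standard library. The function names are hypothetical. The `Content-Type: application/json` header is an assumption about what the API expects and is not stated in this documentation.

```python
import json
from urllib.request import Request, urlopen

BY_IDS_URL = "https://data.unhide.helmholtz-metadaten.de/api/v0/raw/by_ids"


def build_by_ids_request(ids: list[str]) -> Request:
    """Create a POST request whose body is a JSON array of record IDs."""
    body = json.dumps(ids).encode("utf-8")
    return Request(
        BY_IDS_URL,
        data=body,
        # Assumption: the API expects the usual JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_records_by_ids(ids: list[str]) -> list[dict]:
    """Send the request and return the decoded list of records."""
    with urlopen(build_by_ids_request(ids), timeout=30) as response:
        return json.load(response)
```

Calling `get_records_by_ids(["id1", "id2"])` would then return the matching records as a list of dictionaries.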

Further Information

Error Handling

The API uses standard HTTP status codes:

  • 404 Not Found – entity does not exist
  • 422 Unprocessable Entity – invalid request parameters
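
A client could translate these status codes into readable messages, for instance as sketched below. The name `describe_status` is hypothetical. Only the two codes listed above are covered, and any other code is reported generically.

```python
# Status codes named in this documentation and what they mean.
KNOWN_ERRORS = {
    404: "entity does not exist",
    422: "invalid request parameters",
}


def describe_status(code: int) -> str:
    """Return a short, readable message for an HTTP error status code."""
    reason = KNOWN_ERRORS.get(code, "unexpected error")
    return f"HTTP {code}: {reason}"
```

When using `urllib.request`, the code is available as `err.code` on the raised `urllib.error.HTTPError`.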

Authentication & Access

  • Read access (GET endpoints): publicly available
  • Write access (create/update): restricted and requires authentication

Please see Developer Documentation for further details about authenticated access.

Environment

Current base URL:

https://data.unhide.helmholtz-metadaten.de

Summary

The Storage API is intended as a central access point for raw harvested metadata. It allows users to:

  • Explore available data collected from external sources
  • Retrieve specific metadata entries
  • Reuse existing harvested data instead of collecting it again

Because the data is stored in its original form, consumers should be prepared to handle varying data structures depending on the source.
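
One way to cope with source-dependent content is to dispatch on `harvester_type`, as in this sketch. The extractor `title_from_indico` and the `title` key it reads are purely illustrative assumptions; the actual structure of `content` must be checked per harvester.

```python
from typing import Any, Callable, Optional


def title_from_indico(content: Any) -> Optional[str]:
    """Hypothetical extractor; the "title" key is an illustrative guess."""
    if isinstance(content, dict):
        return content.get("title")
    return None


# One extractor per harvester type; unknown types are simply not listed.
EXTRACTORS: dict[str, Callable[[Any], Optional[str]]] = {
    "indico": title_from_indico,
}


def extract_title(record: dict[str, Any]) -> Optional[str]:
    """Pick the extractor matching the record's harvester, if any."""
    extractor = EXTRACTORS.get(record.get("harvester_type", ""))
    if extractor is None:
        return None  # unknown harvester: content is left uninterpreted
    return extractor(record.get("content"))
```

Records from harvesters without a registered extractor yield `None` instead of raising, so new sources do not break existing consumers.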