Storage API
The Storage API provides access to raw metadata records harvested through the Helmholtz KG pipelines. It forms a storage layer that makes harvested data reusable within our internal infrastructure, for example for further processing and injection into the graph, or for providing provenance information. This mechanism ensures that each external resource only needs to be harvested once per record.
External users have read access to the storage layer and can use it to retrieve raw metadata records. Write operations (creating or updating data) are restricted to internal processes.
What is stored?
The API stores raw metadata items, referred to as records. A record is a single metadata item returned by a harvester (e.g. all metadata about one specific event from Indico).
Since the metadata returned by different harvesters is endpoint-specific, i.e. it may vary in format and semantics, the records stored in the Storage API are not standardized. Instead, each record's raw metadata is stored in the content field of the data structure described below.
Data Format
The Storage API returns all responses as JSON.
Example response
[
{
"harvester_type": "indico",
"id": "abc123",
"PID": 1,
"source": "https://example.com/event/1",
"harvested_date": "2025-01-01T12:00:00",
"content": {
"...": "raw metadata from the source"
}
}
]
| Field | Description |
|---|---|
| harvester_type | Name of the harvester that collected the data (e.g. indico). |
| id | Internal identifier of the entity. |
| PID | Incremental number used for pagination. |
| source | The original URL from which the data was harvested. |
| harvested_date | Timestamp of when the data was collected, as an ISO 8601 string from a Python datetime (e.g. 2025-01-01T12:00:00). |
| content | Raw metadata returned by the harvester. ‼️ This field is not standardized and depends on the source. ‼️ |
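For Python clients, each item in a response can be mapped onto a small typed structure. The following is a minimal sketch that assumes the field names from the example response; Record and record_from_json are hypothetical client-side names, not part of the API. It parses harvested_date with datetime.fromisoformat, which accepts timestamps such as 2025-01-01T12:00:00, and keeps content as a plain dict because its structure depends on the harvester.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass
class Record:
    """Hypothetical client-side view of one stored record."""
    harvester_type: str
    id: str
    pid: int
    source: str
    harvested_date: datetime
    content: dict[str, Any]  # raw, source-specific metadata (not standardized)


def record_from_json(item: dict[str, Any]) -> Record:
    """Build a Record from one JSON object returned by the Storage API."""
    return Record(
        harvester_type=item["harvester_type"],
        id=item["id"],
        pid=item["PID"],  # the API spells this field in upper case
        source=item["source"],
        # Assumes an ISO 8601 timestamp such as "2025-01-01T12:00:00".
        harvested_date=datetime.fromisoformat(item["harvested_date"]),
        # Left untyped on purpose: its layout depends on the harvester.
        content=item.get("content", {}),
    )
```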
Endpoint usage
Get a list of records
To retrieve a list of stored records, use:
GET https://data.unhide.helmholtz-metadaten.de/api/v0/raw/entities
Query parameters
| Parameter | Description |
|---|---|
| offset | Pagination offset (default: 0) |
| limit | Maximum number of results to return (default: 10, max: 100000) |
| harvester_type | Filter by harvester (e.g. indico) |
| source | Filter by exact source URL |
Notes:
- Filters are applied independently (no complex query logic).
- If fewer results are available than requested, only available results are returned.
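A client can walk through all records by increasing offset until a page comes back shorter than the requested limit, relying on the note that only the available results are returned. The sketch below uses only the Python standard library; build_list_url, fetch_page and iter_records are hypothetical helper names, and the page size of 100 and the 30-second timeout are arbitrary choices, not API defaults.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Iterator, Optional

BASE_URL = "https://data.unhide.helmholtz-metadaten.de/api/v0/raw"


def build_list_url(offset: int = 0, limit: int = 10,
                   harvester_type: Optional[str] = None,
                   source: Optional[str] = None) -> str:
    """Build a GET URL for the list endpoint, omitting unset filters."""
    params: dict[str, Any] = {"offset": offset, "limit": limit}
    if harvester_type is not None:
        params["harvester_type"] = harvester_type
    if source is not None:
        params["source"] = source  # percent-encoded by urlencode
    return f"{BASE_URL}/entities?{urllib.parse.urlencode(params)}"


def fetch_page(offset: int, limit: int,
               **filters: Optional[str]) -> list[dict[str, Any]]:
    """Fetch one page of records over HTTP and decode the JSON list."""
    url = build_list_url(offset, limit, **filters)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_records(page_size: int = 100,
                 fetch: Callable[[int, int], list[dict[str, Any]]] = fetch_page,
                 ) -> Iterator[dict[str, Any]]:
    """Yield records page by page until a short (or empty) page is returned."""
    offset = 0
    while True:
        page = fetch(offset, page_size)
        yield from page
        if len(page) < page_size:  # fewer results than requested: last page
            break
        offset += page_size
```

Filters can be bound with functools.partial, e.g. iter_records(fetch=functools.partial(fetch_page, harvester_type="indico")). Injecting the fetch function also makes the pagination loop testable without network access.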
Get a single entity by ID
To retrieve one specific entity, use:
GET https://data.unhide.helmholtz-metadaten.de/api/v0/raw/{entity_id}
Returns a single JSON object (one record, in the format described above). If the entity does not exist, the API responds with 404 Not Found.
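The following sketch fetches one record and turns a 404 Not Found into None, so callers can check for missing records without their own exception handling. It assumes IDs may contain characters that need percent-encoding; entity_url and get_entity are hypothetical helper names.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Optional

BASE_URL = "https://data.unhide.helmholtz-metadaten.de/api/v0/raw"


def entity_url(entity_id: str) -> str:
    """URL of a single record; the ID is percent-encoded as one path segment."""
    return f"{BASE_URL}/{urllib.parse.quote(entity_id, safe='')}"


def get_entity(entity_id: str) -> Optional[dict[str, Any]]:
    """Return the record as a dict, or None if the API answers 404 Not Found."""
    try:
        with urllib.request.urlopen(entity_url(entity_id), timeout=30) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # other HTTP errors are passed on to the caller
```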
Get multiple records by IDs
To retrieve multiple records in a single request, use:
POST https://data.unhide.helmholtz-metadaten.de/api/v0/raw/by_ids
with the following request body, a JSON array of record IDs:
["id1", "id2"]
Returns a list of records in the same format as the list endpoint.
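A sketch of the batch lookup with the Python standard library. It sends the ID list as a JSON body with a Content-Type: application/json header, which is assumed (not documented here) to be what the endpoint expects; build_by_ids_request and get_by_ids are hypothetical names.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "https://data.unhide.helmholtz-metadaten.de/api/v0/raw"


def build_by_ids_request(ids: list[str]) -> urllib.request.Request:
    """Prepare a POST request whose body is a JSON array of record IDs."""
    return urllib.request.Request(
        f"{BASE_URL}/by_ids",
        data=json.dumps(ids).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed to be expected
        method="POST",
    )


def get_by_ids(ids: list[str]) -> list[dict[str, Any]]:
    """Send the request and decode the returned list of records."""
    with urllib.request.urlopen(build_by_ids_request(ids), timeout=30) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the body format inspectable without contacting the server.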
Further Information
Error Handling
The API uses standard HTTP status codes:
- 404 Not Found: entity does not exist
- 422 Unprocessable Entity: invalid request parameters
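One way to surface these status codes in Python client code is to translate them into dedicated exceptions. This is a sketch under our own conventions: RecordNotFound, InvalidRequest and get_json are hypothetical names, and any status other than 404 or 422 is simply re-raised as urllib's HTTPError.

```python
import json
import urllib.error
import urllib.request
from typing import Any


class RecordNotFound(Exception):
    """Raised for 404 Not Found: the requested entity does not exist."""


class InvalidRequest(Exception):
    """Raised for 422 Unprocessable Entity: invalid request parameters."""


def get_json(url: str, timeout: float = 30) -> Any:
    """GET a URL and decode JSON, translating the documented error codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            raise RecordNotFound(url) from err
        if err.code == 422:
            # Keep the response body; it may describe the rejected parameter.
            detail = err.read().decode("utf-8", errors="replace")
            raise InvalidRequest(detail) from err
        raise  # any other status is re-raised unchanged
```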
Authentication & Access
- Read access (GET endpoints): publicly available
- Write access (create/update): restricted and requires authentication
Please see the Developer Documentation for further details about authenticated access.
Environment
Current base URL:
https://data.unhide.helmholtz-metadaten.de
Additional Resources
- Swagger / OpenAPI documentation: https://data.unhide.helmholtz-metadaten.de/docs
Summary
The Storage API is intended as a central access point for raw harvested metadata. It allows users to:
- Explore available data collected from external sources
- Retrieve specific metadata entries
- Reuse existing harvested data instead of collecting it again
Because the data is stored in its original form, consumers should be prepared to handle varying data structures depending on the source.
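Since content differs from harvester to harvester, code reading it should not assume that any particular key exists. A minimal sketch of a defensive accessor follows; dig is a hypothetical helper, and key names such as title, chairs and name are illustrative only, not fields guaranteed by any harvester.

```python
from typing import Any


def dig(content: Any, *path: Any, default: Any = None) -> Any:
    """Follow a path of dict keys / list indices, returning default on any miss."""
    current = content
    for step in path:
        if isinstance(current, dict) and step in current:
            current = current[step]
        elif (isinstance(current, list) and isinstance(step, int)
              and -len(current) <= step < len(current)):
            current = current[step]
        else:
            # Missing key, out-of-range index, or unexpected type: give up.
            return default
    return current
```

For example, dig(record["content"], "title", default="untitled") returns "untitled" instead of raising KeyError when a source provides no title key.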
