Technical Overview of Cloudify

doi:10.35095/WDCC/Overview_Cloudify

Wachsmann, Fabian et al.

Grey Literature
Summary
We present Cloudify, a modular application that serves Earth System Model (ESM) simulation output from a range of storage and format backends as datasets that behave like cloud-optimized ones. Built on an adapted version of Xpublish and extended through its plugin architecture, Cloudify exposes simulation data through RESTful Zarr endpoints using FastAPI, emulating the behavior of truly cloud-native datasets. By abstracting heterogeneous file formats and storage systems into a unified interface, Cloudify enables efficient, scalable and convenient data access. This includes fast random access to individual chunks of complete datasets without additional software such as file-based catalogs, which makes the data well suited to AI and machine learning workflows regardless of the source format.
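
As an illustration of why Zarr endpoints allow random access to chunks, the following minimal sketch computes which chunk holds a given array element and builds the URL a client would request for it. It relies on the Zarr v2 convention that a chunk key is the chunk's grid indices joined by a separator (by default "."). The base URL, the dataset name, the variable name "tas" and the chunk shape are hypothetical, and the URL layout is an assumption, not taken from the Cloudify report.

```python
from typing import Sequence


def chunk_key(index: Sequence[int], chunk_shape: Sequence[int], separator: str = ".") -> str:
    """Return the Zarr v2 chunk key of the chunk that contains the element at `index`."""
    if len(index) != len(chunk_shape):
        raise ValueError("index and chunk_shape must have the same number of dimensions")
    # Integer division maps an element index to its position in the chunk grid.
    return separator.join(str(i // c) for i, c in zip(index, chunk_shape))


def chunk_url(base_url: str, variable: str, index: Sequence[int], chunk_shape: Sequence[int]) -> str:
    """Build the URL of a single chunk below a Zarr store root (assumed layout)."""
    return f"{base_url.rstrip('/')}/{variable}/{chunk_key(index, chunk_shape)}"


# Hypothetical endpoint and chunking, for illustration only.
BASE_URL = "https://example.org/datasets/my_dataset/zarr"
url = chunk_url(BASE_URL, "tas", index=(730, 95, 10), chunk_shape=(365, 96, 192))
print(url)
```

Because the key follows directly from the index and the chunk shape, a client can fetch exactly one chunk with a single HTTP request and does not need to consult a separate catalog file first.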

By introducing a lightweight Kerchunk plugin that streams raw data as is, Cloudify extends Xpublish's data provision and simplifies access while reducing dependencies on server-side resources and client-side software. At the same time, Cloudify provides asynchronous access to Dask-backed Xarray datasets, laying the foundation for Data-as-a-Service workflows. It facilitates server-side computation, making hosted data fit for use even for clients with limited compute or storage capabilities. With its dynamic plugins enabled, Cloudify supports runtime changes to datasets, including online data streaming and diagnostics registration.
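
To picture what streaming raw data through Kerchunk references can look like, the sketch below resolves one entry of a Kerchunk-style reference set into the byte range that would have to be read from the source file. Kerchunk reference entries of the form [path, offset, length] are part of the Kerchunk reference format; the file path, the key and the numbers are hypothetical, and how the Cloudify plugin actually reads and forwards the bytes is not described here, so this is only an assumption-laden illustration.

```python
from typing import Dict, List, Tuple, Union

Reference = List[Union[str, int]]


def byte_range(ref: Reference) -> Tuple[str, Dict[str, str]]:
    """Turn a [path, offset, length] reference into (path, HTTP Range header)."""
    if len(ref) != 3:
        # Kerchunk also allows inline data and whole-file references;
        # this sketch handles only the three-element form.
        raise ValueError("expected a [path, offset, length] reference")
    path, offset, length = ref
    # HTTP byte ranges are inclusive on both ends.
    return str(path), {"Range": f"bytes={offset}-{int(offset) + int(length) - 1}"}


# Hypothetical reference set with a single chunk entry.
references = {
    "version": 1,
    "refs": {"tas/0.0.0": ["/path/to/simulation_output.nc", 4096, 1024]},
}

path, headers = byte_range(references["refs"]["tas/0.0.0"])
print(path, headers)
```

Since the stored bytes are passed on unchanged, the server does not have to decode or re-encode the chunk, which is consistent with the reduced server-side resource use described above.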

A plugin for STAC (SpatioTemporal Asset Catalog) endpoints enhances the FAIRness (Findability, Accessibility, Interoperability, and Reusability) of ESM output hosted through Xpublish, enabling seamless discovery and integration across infrastructures. Cloudify thus acts as a bridge between traditional High-Performance Computing (HPC) environments and modern, cloud-native access paradigms, offering a powerful approach to modernizing and optimizing ESM data services.
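
For readers unfamiliar with STAC, the following sketch builds a minimal STAC Item whose asset points to a Zarr endpoint. The top-level fields follow the STAC Item specification; the identifier, URL, date, bounding box and asset key are hypothetical, the media type is a commonly used convention for Zarr, and the items actually produced by the Cloudify plugin may look different.

```python
import json

# Hypothetical Zarr endpoint of a hosted dataset.
ZARR_URL = "https://example.org/datasets/my_dataset/zarr"

item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "my_dataset",
    # Global extent as a GeoJSON polygon with a matching bounding box.
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-180, -90], [180, -90], [180, 90], [-180, 90], [-180, -90]]],
    },
    "bbox": [-180, -90, 180, 90],
    "properties": {"datetime": "2000-01-01T00:00:00Z"},
    "links": [],
    "assets": {
        "data": {
            "href": ZARR_URL,
            "type": "application/vnd+zarr",  # assumed media type for Zarr
            "roles": ["data"],
        }
    },
}

print(json.dumps(item, indent=2))
```

A catalog client that understands STAC can search such items by space and time and then follow the asset link to the data.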
Project
Literature (Literature)
Use constraints
Creative Commons Attribution 4.0 International (https://creativecommons.org/licenses/by/4.0/)
Data Catalog
World Data Center for Climate
Access constraints
registered users
Size
684.65 KiB (701083 Byte)
Format
pdf
Status
completely archived
Creation Date
Future Review Date
2035-08-17
Download Permission
Yes
Cite as
Wachsmann, Fabian; Heil, Angelika; Wickramage, Chathurika; Polkova, Iuliia; Thiemann, Hannes; Modali, Kameswarrao; Lammert, Andrea; Peters-von Gehlen, Karsten; Kindermann, Stephan (2025). Technical Overview of Cloudify - An Improved Emulator of Cloud-Optimized Earth System Model Output. World Data Center for Climate (WDCC) at DKRZ. https://doi.org/10.35095/WDCC/Overview_Cloudify

Description
Summary:
Findable: 6 of 7 level;
Accessible: 6 of 7 level;
Interoperable: 4 of 6 level;
Reusable: 6 of 6 level
Method
F-UJI WDCC service v3.5.0 metrics_v0.8
Method Description
Checks performed by WDCC. Metrics documentation: https://doi.org/10.5281/zenodo.15045911 Metric Version: metrics_v0.8
Method Url
Result Date
2025-08-21

Parent

Technical Reports

Parent project(s)

Literature

[Entry acronym: Overview_Cloudify] [Entry id: 5311377]