This dataset provides a collection of atmospheric and oceanic variables extracted from ERA5 at daily temporal resolution and 1.0° spatial resolution. The domain covers the North Atlantic region, including the entire tropical belt and part of the mid-latitudes, and extends from the equator (0°) to 51°N and from 99°W to 2°E. The domain bounds were also chosen to meet the architectural requirements of the autoencoder used in downstream applications: the number of grid points along the horizontal dimension (longitude) is divisible by 6 and the number along the vertical dimension (latitude) is divisible by 4, which ensures compatibility with convolutional and pooling operations. As a result, each field has a grid of 52×102 points (latitude × longitude).
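The grid dimensions follow directly from the stated bounds and resolution. The sketch below is only an illustration of that arithmetic, not code shipped with the dataset. It assumes that both end points of each axis are grid points, that western longitudes are written as negative degrees east, and that latitudes are ordered north to south as in the usual ERA5 convention. All variable names are hypothetical.

```python
import numpy as np

# Domain bounds and resolution as stated in the description.
LAT_NORTH, LAT_SOUTH = 51.0, 0.0
LON_WEST, LON_EAST = -99.0, 2.0  # degrees east; 99°W is written as -99 (assumption)
RESOLUTION = 1.0

# Both end points are assumed to be grid points, hence the "+ 1".
n_lat = int(round((LAT_NORTH - LAT_SOUTH) / RESOLUTION)) + 1
n_lon = int(round((LON_EAST - LON_WEST) / RESOLUTION)) + 1

# North-to-south latitude ordering is an assumption (usual ERA5 convention).
lats = np.linspace(LAT_NORTH, LAT_SOUTH, n_lat)
lons = np.linspace(LON_WEST, LON_EAST, n_lon)

grid_shape = (n_lat, n_lon)

# Divisibility constraints quoted in the description.
lat_divisible_by_4 = n_lat % 4 == 0
lon_divisible_by_6 = n_lon % 6 == 0

print(grid_shape, lat_divisible_by_4, lon_divisible_by_6)
```

With these bounds, 52 = 4 × 13 and 102 = 6 × 17, so both constraints hold without padding or cropping.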
The dataset spans the period from January 1, 1975, to December 31, 2022, and includes both original ERA5 variables and variables derived from them. The variables are: absolute vorticity at 850 hPa, mean sea level pressure, relative humidity at 700 hPa, sea surface temperature, total precipitation, the meridional and zonal wind components at 10 m, vertical wind shear between 850 and 250 hPa, and geopotential height at 500 hPa. This dataset has been used to develop a deep learning-based analogue method: it serves as training data for an autoencoder that compresses the selected input variables into latent vectors, which are subsequently used to perform the analogue search.
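To make the analogue search step concrete, here is a minimal sketch of a nearest-neighbour search over latent vectors. It is not the authors' implementation. The Euclidean distance, the latent dimension of 16, the number of analogues, and the random placeholder data are all assumptions made for illustration, and `find_analogues` is a hypothetical helper. In practice the latent vectors would come from the trained autoencoder's encoder.

```python
import numpy as np

def find_analogues(latents, query_index, k=5):
    """Return indices and distances of the k days closest to the query day.

    latents: array of shape (n_days, latent_dim), one latent vector per day.
    Euclidean distance is an assumed similarity measure.
    """
    query = latents[query_index]
    distances = np.linalg.norm(latents - query, axis=1)
    distances[query_index] = np.inf  # exclude the query day itself
    nearest = np.argsort(distances)[:k]
    return nearest, distances[nearest]

# Placeholder latent vectors standing in for encoder output (assumption).
rng = np.random.default_rng(0)
latents = rng.normal(size=(1000, 16))

idx, dist = find_analogues(latents, query_index=10, k=5)
print(idx, dist)
```

A real application would probably also exclude days close in time to the query date, so that the search does not simply return neighbouring days of the same event.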