IPB-MSA&SO4: a daily 0.25° resolution dataset of in situ-produced biogenic methanesulfonic acid and sulfate over the North Atlantic during 1998–2022 based on machine learning
Earth System Science Data (IF 11.2), Pub Date: 2024-06-12, DOI: 10.5194/essd-16-2717-2024
Karam Mansour, Stefano Decesari, Darius Ceburnis, Jurgita Ovadnevaite, Lynn M. Russell, Marco Paglione, Laurent Poulain, Shan Huang, Colin O'Dowd, Matteo Rinaldi

Abstract. Accurate long-term marine-derived biogenic sulfur aerosol concentrations at high spatial and temporal resolutions are critical for a wide range of studies, including climatology, trend analysis, and model evaluation; this information is also imperative for the accurate investigation of the contribution of marine-derived biogenic sulfur aerosol concentrations to the aerosol burden, for the elucidation of their radiative impacts, and to provide boundary conditions for regional models. By applying machine learning algorithms, we constructed the first publicly available daily gridded dataset of in situ-produced biogenic methanesulfonic acid (MSA) and non-sea-salt sulfate (nss-SO4=) concentrations covering the North Atlantic. The dataset is of high spatial resolution (0.25° × 0.25°) and spans 25 years (1998–2022), far exceeding what observations alone could achieve both spatially and temporally. The machine learning models were generated by combining in situ observations of sulfur aerosol data from Mace Head Atmospheric Research Station, located on the west coast of Ireland, and from the North Atlantic Aerosols and Marine Ecosystems Study (NAAMES) cruises in the northwestern Atlantic with the constructed sea-to-air dimethylsulfide flux (FDMS) and ECMWF ERA5 reanalysis datasets. To determine the optimal method for regression, we employed five machine learning model types: support vector machines, decision tree, regression ensemble, Gaussian process regression, and artificial neural networks. A comparison of the mean absolute error (MAE), root-mean-square error (RMSE), and coefficient of determination (R²) revealed that Gaussian process regression (GPR) was the most effective algorithm, outperforming the other models with respect to simulating the biogenic MSA and nss-SO4= concentrations. For predicting daily MSA (nss-SO4=), GPR displayed the highest R² value of 0.86 (0.72) and the lowest MAE of 0.014 (0.10) µg m⁻³.
GPR partial dependence analysis suggests that the relationships between predictors and MSA and nss-SO4= concentrations are complex rather than linear. Using the GPR algorithm, we produced a high-resolution daily dataset of in situ-produced biogenic MSA and nss-SO4= sea-level concentrations over the North Atlantic, which we named “In-situ Produced Biogenic Methanesulfonic Acid and Sulfate over the North Atlantic” (IPB-MSA&SO4). The obtained IPB-MSA&SO4 data allowed us to analyze the spatiotemporal patterns of MSA and nss-SO4= as well as the ratio between them (MSA:nss-SO4=). A comparison with the existing Copernicus Atmosphere Monitoring Service ECMWF Atmospheric Composition Reanalysis 4 (CAMS-EAC4) reanalysis suggested that our high-resolution dataset reproduces the spatial and temporal patterns of the biogenic sulfur aerosol concentration with high accuracy and has high consistency with independent measurements in the Atlantic Ocean. IPB-MSA&SO4 is publicly available at https://doi.org/10.17632/j8bzd5dvpx.1 (Mansour et al., 2023b).
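The model comparison described in the abstract rests on three standard error metrics. The following is a minimal sketch of how MAE, RMSE, and the coefficient of determination could be computed and used to rank candidate models. It is not the authors' code: the function names, the choice to rank by lowest RMSE, and the sample values in the usage lines are all assumptions made for illustration, and the coefficient of determination is taken here as 1 − SS_res/SS_tot, which may differ from the definition used in the paper.

```python
import math


def regression_metrics(observed, predicted):
    """Return (MAE, RMSE, R2) for paired observed and predicted values.

    R2 is computed as 1 - SS_res / SS_tot (an assumed definition).
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]

    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are identical")
    r2 = 1.0 - ss_res / ss_tot

    return mae, rmse, r2


def best_model_by_rmse(observed, predictions_by_model):
    """Return the name of the model with the lowest RMSE (hypothetical ranking rule)."""
    return min(
        predictions_by_model,
        key=lambda name: regression_metrics(observed, predictions_by_model[name])[1],
    )


if __name__ == "__main__":
    # Invented numbers for illustration only; they are not taken from the dataset.
    observed = [0.02, 0.05, 0.10, 0.08, 0.03]
    candidates = {
        "model_a": [0.03, 0.05, 0.09, 0.07, 0.03],
        "model_b": [0.05, 0.02, 0.06, 0.10, 0.06],
    }
    for name, predicted in candidates.items():
        mae, rmse, r2 = regression_metrics(observed, predicted)
        print(f"{name}: MAE={mae:.4f} RMSE={rmse:.4f} R2={r2:.3f}")
    print("lowest RMSE:", best_model_by_rmse(observed, candidates))
```

In practice the metrics would be evaluated on held-out data rather than on the training samples, and a study comparing several metrics at once would weigh them together rather than rank on RMSE alone.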

Updated: 2024-06-12