OCEAN ICE’s ERDDAP querying: griddap#

This notebook will illustrate how to build queries and make requests to https://er1.s4oceanice.eu/erddap/index.html using Python.

For an interactive version of this page please visit the Google Colab:
Open in Google Colab

Alternatively, this notebook can be opened with Binder by following the link: OCEAN ICE's ERDDAP querying: griddap

Setup#

To begin we need to import the necessary libraries.

# If running the code locally, install the required packages by uncommenting and running the line below:
# !pip install requests pandas

import requests
import pandas as pd
import io

Get a list of available datasets#

To check which griddap datasets are available in the ERDDAP and to get their URLs, the first step is to make a request to https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.html, asking only for the datasets' IDs and their griddap URLs. After receiving the data it will be loaded into a pandas DataFrame.

datasets_url = 'https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.csv?datasetID%2Cgriddap'

# request and load into DataFrame
datasets_resp = requests.get(datasets_url)
datasets_df = pd.read_csv(io.StringIO(datasets_resp.text), sep=',')

# drop rows where griddap is NaN (datasets that have no griddap endpoint)
datasets_df = datasets_df.dropna(subset=['griddap'])

# rename the griddap column to url
cleaned_df = datasets_df.rename(columns={'griddap': 'url'})

pd.set_option('display.max_colwidth', None)
cleaned_df = cleaned_df.reset_index(drop=True)
cleaned_df
datasetID url
0 INSITU_GLO_PHY_TS_OA_MY_013_052 https://er1.s4oceanice.eu/erddap/griddap/INSITU_GLO_PHY_TS_OA_MY_013_052
1 seanoe_slev_anomaly_geostrophic_currents https://er1.s4oceanice.eu/erddap/griddap/seanoe_slev_anomaly_geostrophic_currents
2 RSMC_seaice https://er1.s4oceanice.eu/erddap/griddap/RSMC_seaice
3 GLORYS12V1_sea_floor_potential_temp https://er1.s4oceanice.eu/erddap/griddap/GLORYS12V1_sea_floor_potential_temp
4 GLODAPv2_2016b_MappedClimatologies https://er1.s4oceanice.eu/erddap/griddap/GLODAPv2_2016b_MappedClimatologies
5 NOAA_OISST_v2 https://er1.s4oceanice.eu/erddap/griddap/NOAA_OISST_v2
6 SOCATv2024_tracks_gridded_monthly https://er1.s4oceanice.eu/erddap/griddap/SOCATv2024_tracks_gridded_monthly
7 EU_circumpolar_seaice_prod_fluxes_1992_2023 https://er1.s4oceanice.eu/erddap/griddap/EU_circumpolar_seaice_prod_fluxes_1992_2023
8 SSP585_FWF_1990_2300_ZwallyBasins https://er1.s4oceanice.eu/erddap/griddap/SSP585_FWF_1990_2300_ZwallyBasins
9 SSP126_FWF_1990_2300_ZwallyBasins https://er1.s4oceanice.eu/erddap/griddap/SSP126_FWF_1990_2300_ZwallyBasins
10 SSP585_FWF_1990_2300_OceanSectors https://er1.s4oceanice.eu/erddap/griddap/SSP585_FWF_1990_2300_OceanSectors
11 SSP126_FWF_1990_2300_OceanSectors https://er1.s4oceanice.eu/erddap/griddap/SSP126_FWF_1990_2300_OceanSectors
12 SSP585_FWF_1990_2300_AIS https://er1.s4oceanice.eu/erddap/griddap/SSP585_FWF_1990_2300_AIS
13 SSP126_FWF_1990_2300_AIS https://er1.s4oceanice.eu/erddap/griddap/SSP126_FWF_1990_2300_AIS
14 PSMSL_Absolute_sea_level_trend https://er1.s4oceanice.eu/erddap/griddap/PSMSL_Absolute_sea_level_trend
15 SCAR_RAATD https://er1.s4oceanice.eu/erddap/griddap/SCAR_RAATD

Using these URLs we will then be able to get their data.
In this example we will use the seanoe_slev_anomaly_geostrophic_currents dataset, with the URL: https://er1.s4oceanice.eu/erddap/griddap/seanoe_slev_anomaly_geostrophic_currents
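
Instead of copying the URL by hand, it can also be looked up programmatically in the cleaned_df DataFrame built above. A minimal sketch (the dataset_id and dataset_url names are our own):

# look up the griddap URL for a given dataset ID
dataset_id = 'seanoe_slev_anomaly_geostrophic_currents'
dataset_url = cleaned_df.loc[cleaned_df['datasetID'] == dataset_id, 'url'].iloc[0]
print(dataset_url)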

Get a list of variables for the dataset#

Now we can make a request for the dataset's metadata, which will give us a list of all the available variables and their respective data types. These variables can then be used in the following requests.

BASE_URL = 'https://er1.s4oceanice.eu/erddap/griddap/seanoe_slev_anomaly_geostrophic_currents'

# building the full url for the metadata (swapping 'griddap' for 'info' in the dataset URL) and making the request
metadata_url = BASE_URL.replace('griddap', 'info') + '/index.csv'

metadata_resp = requests.get(metadata_url)
metadata_df = pd.read_csv(io.StringIO(metadata_resp.text), sep=',')

# Extract time_coverage_start and time_coverage_end
time_coverage_start = metadata_df.loc[metadata_df['Attribute Name'] == 'time_coverage_start', 'Value'].iloc[0]
time_coverage_end = metadata_df.loc[metadata_df['Attribute Name'] == 'time_coverage_end', 'Value'].iloc[0]
geospatial_lat_max = metadata_df.loc[metadata_df['Attribute Name'] == 'geospatial_lat_max', 'Value'].iloc[0]
geospatial_lat_min = metadata_df.loc[metadata_df['Attribute Name'] == 'geospatial_lat_min', 'Value'].iloc[0]
geospatial_lon_max = metadata_df.loc[metadata_df['Attribute Name'] == 'geospatial_lon_max', 'Value'].iloc[0]
geospatial_lon_min = metadata_df.loc[metadata_df['Attribute Name'] == 'geospatial_lon_min', 'Value'].iloc[0]

# .copy() avoids pandas' SettingWithCopyWarning when modifying the subset
variables_df = metadata_df.loc[metadata_df['Row Type'].isin(['variable', 'dimension'])].copy()
variables_df.reset_index(drop=True, inplace=True)
variables_df.drop(columns=['Row Type', 'Attribute Name', 'Value'], inplace=True)

print(f"Time Coverage Start: {time_coverage_start}")
print(f"Time Coverage End: {time_coverage_end}")
print(f"Geospatial max Lat: {geospatial_lat_max}")
print(f"Geospatial min Lat: {geospatial_lat_min}")
print(f"Geospatial max Lon: {geospatial_lon_max}")
print(f"Geospatial min Lon: {geospatial_lon_min}")

variables_df
Time Coverage Start: 2013-04-01T00:00:00Z
Time Coverage End: 2019-07-31T00:00:00Z
Geospatial max Lat: 349.0
Geospatial min Lat: 0.0
Geospatial max Lon: 349.0
Geospatial min Lon: 0.0
Variable Name Data Type
0 time double
1 longitude short
2 latitude short
3 sla float
4 formal_error float
5 U float
6 V float
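
Griddap can also return the values of an axis (dimension) variable on its own, which is handy for checking the available time steps. A minimal sketch, assuming the standard griddap index syntax (here the first five time values, indices 0 to 4 with stride 1):

# request the first five values of the time axis
time_axis_url = BASE_URL + '.csv?time%5B0:1:4%5D'
time_resp = requests.get(time_axis_url)
time_df = pd.read_csv(io.StringIO(time_resp.text), skiprows=[1])  # skip the units row
print(time_df)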

Get the sla values#

We will then perform another request to retrieve all the sla values within the time range and bounding coordinates we want. In this case we will use the time_coverage_end value as both the start and the end of the time range, and the full extent between geospatial_lat_min/geospatial_lat_max and geospatial_lon_min/geospatial_lon_max (see the output above).

N.B. The wider the range, the longer the loading time will be. Loading could fail if the range is too wide.

Other datasets may have no time or coordinate dimensions at all. In any case, when a variable is indexed by one or more dimensions, the query follows the same structure: .csv? + the variable we want to see (in this case sla), followed by one bracketed range per dimension of the form (min value):stride:(max value), repeated for as many dimensions as the variable has. In the URL the brackets must be percent-encoded, so [ becomes %5B and ] becomes %5D. A helper that builds such a query is sketched below.
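
This is a minimal sketch of such a helper (build_griddap_query and its signature are our own, not part of any library); a stride larger than 1 thins the grid and keeps the response small:

def build_griddap_query(variable, dim_ranges, stride=1, fmt='csv'):
    """Build a percent-encoded griddap query for one variable.

    dim_ranges is a list of (min, max) tuples, one per dimension,
    given in the order the dimensions appear in the dataset
    (here: time, latitude, longitude).
    """
    query = f'.{fmt}?{variable}'
    for dim_min, dim_max in dim_ranges:
        query += f'%5B({dim_min}):{stride}:({dim_max})%5D'
    return query

# equivalent to the sla_query built explicitly below
example_query = build_griddap_query('sla', [(time_coverage_end, time_coverage_end),
                                            (geospatial_lat_min, geospatial_lat_max),
                                            (geospatial_lon_min, geospatial_lon_max)])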

sla_query = f'.csv?sla%5B({time_coverage_end}):1:({time_coverage_end})%5D%5B({geospatial_lat_min}):1:({geospatial_lat_max})%5D%5B({geospatial_lon_min}):1:({geospatial_lon_max})%5D'

# The data format specified is 'csv', in which the first row contains the column names and the second the units of measurement
# (the units row is skipped when loading the DataFrame below).
# Other possibilities are 'csv0', which returns only the data rows, and 'csvp', which returns the column names
# (with their units of measurement) in the first row and data starting from the second.

sla_resp = requests.get(BASE_URL + sla_query)
# skip the units row (row 1) so the header is kept and the data columns get numeric dtypes
sla_df = pd.read_csv(io.StringIO(sla_resp.text), sep=',', skiprows=[1])
sla_df
time longitude latitude sla
0 2019-07-31T00:00:00Z 0 0 9.96921E36
1 2019-07-31T00:00:00Z 0 1 9.96921E36
2 2019-07-31T00:00:00Z 0 2 9.96921E36
3 2019-07-31T00:00:00Z 0 3 9.96921E36
... ... ... ... ...
122495 2019-07-31T00:00:00Z 349 345 9.96921E36
122496 2019-07-31T00:00:00Z 349 346 9.96921E36
122497 2019-07-31T00:00:00Z 349 347 9.96921E36
122498 2019-07-31T00:00:00Z 349 348 9.96921E36
122499 2019-07-31T00:00:00Z 349 349 9.96921E36

122500 rows × 4 columns
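
As noted in the comments above, the same query can also be requested in the csvp format, which keeps each column's unit in the header row itself, so no rows need to be skipped when loading. A minimal sketch:

# same query in 'csvp' format: column names and units share the first row
csvp_resp = requests.get(BASE_URL + sla_query.replace('.csv?', '.csvp?'))
csvp_df = pd.read_csv(io.StringIO(csvp_resp.text), sep=',')
print(csvp_df.columns.tolist())  # e.g. ['time (UTC)', 'longitude (degrees_east)', ...]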

Additional resources#

For additional information about ERDDAP please visit:

https://er1.s4oceanice.eu/erddap/information.html

The webpages for the Python libraries that have been used in this notebook are:

https://requests.readthedocs.io/

https://pandas.pydata.org/