OCEAN ICE’s ERDDAP querying: griddap#
This notebook will illustrate how to build queries and make requests to https://er1.s4oceanice.eu/erddap/index.html using Python.
For an interactive version of this page please open the notebook in Google Colab.
Alternatively this notebook can be opened with Binder.
Setup#
To begin we need to import the necessary libraries.
# !pip install requests pandas
# these packages should be installed with the command above if running the code locally
import requests
import pandas as pd
import io
Get a list of available datasets#
To check which griddap datasets are available in the ERDDAP and obtain their URLs, the first step is to make a request to https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.html, using a URL that selects the datasetID and griddap columns. After receiving the data it will be loaded into a pandas DataFrame.
datasets_url = 'https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.csv?datasetID%2Cgriddap'
# request and load into DataFrame
datasets_resp = requests.get(datasets_url)
datasets_df = pd.read_csv(io.StringIO(datasets_resp.text), sep=',')
# drop rows where griddap is NaN (datasets without a griddap endpoint)
datasets_df = datasets_df.dropna(subset=['griddap'])
# rename the griddap column to url
cleaned_df = datasets_df.rename(columns={'griddap': 'url'})
pd.set_option('display.max_colwidth', None)
cleaned_df = cleaned_df.reset_index(drop=True)
cleaned_df
 | datasetID | url
---|---|---
0 | INSITU_GLO_PHY_TS_OA_MY_013_052 | https://er1.s4oceanice.eu/erddap/griddap/INSITU_GLO_PHY_TS_OA_MY_013_052 |
1 | seanoe_slev_anomaly_geostrophic_currents | https://er1.s4oceanice.eu/erddap/griddap/seanoe_slev_anomaly_geostrophic_currents |
2 | RSMC_seaice | https://er1.s4oceanice.eu/erddap/griddap/RSMC_seaice |
3 | GLORYS12V1_sea_floor_potential_temp | https://er1.s4oceanice.eu/erddap/griddap/GLORYS12V1_sea_floor_potential_temp |
4 | GLODAPv2_2016b_MappedClimatologies | https://er1.s4oceanice.eu/erddap/griddap/GLODAPv2_2016b_MappedClimatologies |
5 | NOAA_OISST_v2 | https://er1.s4oceanice.eu/erddap/griddap/NOAA_OISST_v2 |
6 | SOCATv2024_tracks_gridded_monthly | https://er1.s4oceanice.eu/erddap/griddap/SOCATv2024_tracks_gridded_monthly |
7 | EU_circumpolar_seaice_prod_fluxes_1992_2023 | https://er1.s4oceanice.eu/erddap/griddap/EU_circumpolar_seaice_prod_fluxes_1992_2023 |
8 | SSP585_FWF_1990_2300_ZwallyBasins | https://er1.s4oceanice.eu/erddap/griddap/SSP585_FWF_1990_2300_ZwallyBasins |
9 | SSP126_FWF_1990_2300_ZwallyBasins | https://er1.s4oceanice.eu/erddap/griddap/SSP126_FWF_1990_2300_ZwallyBasins |
10 | SSP585_FWF_1990_2300_OceanSectors | https://er1.s4oceanice.eu/erddap/griddap/SSP585_FWF_1990_2300_OceanSectors |
11 | SSP126_FWF_1990_2300_OceanSectors | https://er1.s4oceanice.eu/erddap/griddap/SSP126_FWF_1990_2300_OceanSectors |
12 | SSP585_FWF_1990_2300_AIS | https://er1.s4oceanice.eu/erddap/griddap/SSP585_FWF_1990_2300_AIS |
13 | SSP126_FWF_1990_2300_AIS | https://er1.s4oceanice.eu/erddap/griddap/SSP126_FWF_1990_2300_AIS |
14 | PSMSL_Absolute_sea_level_trend | https://er1.s4oceanice.eu/erddap/griddap/PSMSL_Absolute_sea_level_trend |
15 | SCAR_RAATD | https://er1.s4oceanice.eu/erddap/griddap/SCAR_RAATD |
Using these URLs we will then be able to request their data.
In this example we will use the seanoe_slev_anomaly_geostrophic_currents dataset, with the URL:
https://er1.s4oceanice.eu/erddap/griddap/seanoe_slev_anomaly_geostrophic_currents
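For convenience, the listing above can also be turned into a dictionary keyed by datasetID. A minimal sketch using the cleaned_df built earlier (the dataset_urls name is introduced here purely for illustration):
# build a lookup from datasetID to griddap base URL
dataset_urls = dict(zip(cleaned_df['datasetID'], cleaned_df['url']))
print(dataset_urls['seanoe_slev_anomaly_geostrophic_currents'])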
Get a list of variables for the dataset#
Now we can request the dataset's metadata, which will give us a list of all the available variables and their respective data types. These variables can then be used in the following requests.
BASE_URL = 'https://er1.s4oceanice.eu/erddap/griddap/seanoe_slev_anomaly_geostrophic_currents'
# building the full url for the metadata and making the request
metadata_url = BASE_URL.replace('tabledap', 'info').replace('griddap', 'info') + '/index.csv'
metadata_resp = requests.get(metadata_url)
metadata_df = pd.read_csv(io.StringIO(metadata_resp.text), sep=',')
# extract the time coverage and the geospatial bounds from the global attributes
time_coverage_start = metadata_df.loc[metadata_df['Attribute Name'] == 'time_coverage_start', 'Value'].iloc[0]
time_coverage_end = metadata_df.loc[metadata_df['Attribute Name'] == 'time_coverage_end', 'Value'].iloc[0]
geospatial_lat_max = metadata_df.loc[metadata_df['Attribute Name'] == 'geospatial_lat_max', 'Value'].iloc[0]
geospatial_lat_min = metadata_df.loc[metadata_df['Attribute Name'] == 'geospatial_lat_min', 'Value'].iloc[0]
geospatial_lon_max = metadata_df.loc[metadata_df['Attribute Name'] == 'geospatial_lon_max', 'Value'].iloc[0]
geospatial_lon_min = metadata_df.loc[metadata_df['Attribute Name'] == 'geospatial_lon_min', 'Value'].iloc[0]
# keep only the variable and dimension rows; .copy() avoids a SettingWithCopyWarning
variables_df = metadata_df.loc[metadata_df['Row Type'].isin(['variable', 'dimension'])].copy()
variables_df.reset_index(drop=True, inplace=True)
variables_df.drop(columns=['Row Type', 'Attribute Name', 'Value'], inplace=True)
print(f"Time Coverage Start: {time_coverage_start}")
print(f"Time Coverage End: {time_coverage_end}")
print(f"Geospatial max Lat: {geospatial_lat_max}")
print(f"Geospatial min Lat: {geospatial_lat_min}")
print(f"Geospatial max Lon: {geospatial_lon_max}")
print(f"Geospatial min Lon: {geospatial_lon_min}")
variables_df
Time Coverage Start: 2013-04-01T00:00:00Z
Time Coverage End: 2019-07-31T00:00:00Z
Geospatial max Lat: 349.0
Geospatial min Lat: 0.0
Geospatial max Lon: 349.0
Geospatial min Lon: 0.0
 | Variable Name | Data Type
---|---|---
0 | time | double |
1 | longitude | short |
2 | latitude | short |
3 | sla | float |
4 | formal_error | float |
5 | U | float |
6 | V | float |
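The order of these dimensions matters: in a griddap query the range selectors must follow the order in which the dimensions are declared. That order can be read from the same metadata; a minimal sketch using the metadata_df loaded above:
# list the dimension names in declaration order
dimensions = metadata_df.loc[metadata_df['Row Type'] == 'dimension', 'Variable Name'].tolist()
print(dimensions)
# for this dataset the table above lists time, longitude and latitude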
Get the sla values#
We will then perform another request to retrieve the sla values within the time range and the bounding coordinates we want. In this case we will use the time_coverage_end value as both the start and the end of the time range, and the full ranges between geospatial_lat_min/geospatial_lat_max and geospatial_lon_min/geospatial_lon_max (see the output above).
N.B. The wider the ranges, the longer the loading time; the request could fail if the ranges are too wide.
Other datasets may not have time or coordinate dimensions. Whenever a variable is indexed by one or more dimension ranges, however, the query follows the same structure: .csv? + the variable we want to retrieve (in this case sla), followed by one (min value):stride:(max value) selector per dimension, each enclosed in square brackets and given in the order in which the dimensions are declared for the dataset (see the variables table above). In the URL the brackets [ and ] must be percent-encoded as %5B and %5D, so each selector appears as %5B(min value):1:(max value)%5D, a stride of 1 returning every grid point in the range.
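To make this structure concrete, the query used below can also be assembled programmatically. This is a minimal sketch under the assumptions above; build_griddap_query is a helper name introduced here, not part of any ERDDAP API:
def build_griddap_query(variable, ranges, stride=1, fmt='csv'):
    # one percent-encoded [(min):stride:(max)] selector per dimension,
    # given in the dataset's dimension order
    selectors = ''.join(f'%5B({lo}):{stride}:({hi})%5D' for lo, hi in ranges)
    return f'.{fmt}?{variable}{selectors}'

example_query = build_griddap_query('sla', [
    (time_coverage_end, time_coverage_end),
    (geospatial_lat_min, geospatial_lat_max),
    (geospatial_lon_min, geospatial_lon_max),
])
Increasing the stride (e.g. :5: instead of :1:) subsamples the grid and is a simple way to keep wide-area requests small.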
sla_query = f'.csv?sla%5B({time_coverage_end}):1:({time_coverage_end})%5D%5B({geospatial_lat_min}):1:({geospatial_lat_max})%5D%5B({geospatial_lon_min}):1:({geospatial_lon_max})%5D'
# The data format specified is 'csv', in which the first row contains the column names
# and the second row the units of measurement.
# Other options are 'csv0', which returns only the data rows, and 'csvp', which returns
# the column names (with their units of measurement) in the first row and the data from the second.
sla_resp = requests.get(BASE_URL + sla_query)
# note: with 'csv' the units row comes back as the first data row;
# it could be dropped by passing skiprows=[1] to read_csv
sla_df = pd.read_csv(io.StringIO(sla_resp.text), sep=',')
sla_df
 | time | longitude | latitude | sla
---|---|---|---|---
0 | UTC | degrees_east | degrees_north | m |
1 | 2019-07-31T00:00:00Z | 0 | 0 | 9.96921E36 |
2 | 2019-07-31T00:00:00Z | 0 | 1 | 9.96921E36 |
3 | 2019-07-31T00:00:00Z | 0 | 2 | 9.96921E36 |
4 | 2019-07-31T00:00:00Z | 0 | 3 | 9.96921E36 |
... | ... | ... | ... | ... |
122496 | 2019-07-31T00:00:00Z | 349 | 345 | 9.96921E36 |
122497 | 2019-07-31T00:00:00Z | 349 | 346 | 9.96921E36 |
122498 | 2019-07-31T00:00:00Z | 349 | 347 | 9.96921E36 |
122499 | 2019-07-31T00:00:00Z | 349 | 348 | 9.96921E36 |
122500 | 2019-07-31T00:00:00Z | 349 | 349 | 9.96921E36 |
122501 rows × 4 columns
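The rows above contain the value 9.96921E36, which is the fill value used where no observation is available (the canonical value is typically stored in the variable's _FillValue attribute in the metadata). A minimal sketch for masking it, assuming the sla_df built above:
import numpy as np

FILL_VALUE = 9.96921e36  # fill value as printed above; check the _FillValue attribute in the metadata
sla_clean = sla_df.iloc[1:].copy()  # drop the units row
sla_clean['sla'] = pd.to_numeric(sla_clean['sla'], errors='coerce')
sla_clean.loc[sla_clean['sla'] > FILL_VALUE / 2, 'sla'] = np.nan
print(f"{sla_clean['sla'].notna().sum()} valid sla values out of {len(sla_clean)}")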
Additional resources#
For additional information about ERDDAP please visit:
https://er1.s4oceanice.eu/erddap/information.html
The webpages for the Python libraries used in this notebook are:
requests: https://requests.readthedocs.io/
pandas: https://pandas.pydata.org/