OCEAN ICE’s ERDDAP querying: tabledap#

This notebook will illustrate how to build queries and make requests to https://er1.s4oceanice.eu/erddap/index.html using Python.

For an interactive version of this page, open the notebook in Google Colab.

Alternatively, this notebook can be opened with Binder by following the link: OCEAN ICE's ERDDAP querying: tabledap

Setup#

To begin we need to import the necessary libraries.

# !pip install requests pandas
# these packages should be installed with the command above if running the code locally

import requests
import pandas as pd
import io

Get a list of available tabledap datasets#

To see which datasets are available on the ERDDAP server and get their URLs, the first step is to make a request to https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.html, using a URL that asks only for the datasets' IDs and their tabledap URLs. After receiving the data, it will be loaded into a pandas DataFrame.

datasets_url = 'https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.csv?datasetID%2Ctabledap'

# request and load into DataFrame
datasets_resp = requests.get(datasets_url)
datasets_df = pd.read_csv(io.StringIO(datasets_resp.text), sep=',')

# drop rows where tabledap is NaN
datasets_df = datasets_df.dropna(subset=['tabledap'])

# rename the tabledap column to url
cleaned_df = datasets_df.rename(columns={'tabledap': 'url'})

pd.set_option('display.max_colwidth', None)
cleaned_df = cleaned_df.reset_index(drop=True)
cleaned_df
datasetID url
0 allDatasets https://er1.s4oceanice.eu/erddap/tabledap/allDatasets
1 AAD_ASPeCt-Bio_historical https://er1.s4oceanice.eu/erddap/tabledap/AAD_ASPeCt-Bio_historical
2 AADC_sea_ice_measurements_database_1985_2007 https://er1.s4oceanice.eu/erddap/tabledap/AADC_sea_ice_measurements_database_1985_2007
3 AMUNDSEN_CRUISES https://er1.s4oceanice.eu/erddap/tabledap/AMUNDSEN_CRUISES
4 ANT_TG_OCEAN_HEIGHT https://er1.s4oceanice.eu/erddap/tabledap/ANT_TG_OCEAN_HEIGHT
5 AWS_3_hourly https://er1.s4oceanice.eu/erddap/tabledap/AWS_3_hourly
6 AWS_Daily_25 https://er1.s4oceanice.eu/erddap/tabledap/AWS_Daily_25
7 AWS_Daily_75 https://er1.s4oceanice.eu/erddap/tabledap/AWS_Daily_75
8 AWS_Monthly_25 https://er1.s4oceanice.eu/erddap/tabledap/AWS_Monthly_25
9 AWS_Monthly_75 https://er1.s4oceanice.eu/erddap/tabledap/AWS_Monthly_75
10 ARCTICNET_CRUISES https://er1.s4oceanice.eu/erddap/tabledap/ARCTICNET_CRUISES
11 Australian_Antarctic_Program https://er1.s4oceanice.eu/erddap/tabledap/Australian_Antarctic_Program
12 British_Antartica_Survey_webcams https://er1.s4oceanice.eu/erddap/tabledap/British_Antartica_Survey_webcams
13 CCHDO_Bottle https://er1.s4oceanice.eu/erddap/tabledap/CCHDO_Bottle
14 CCHDO_CTD https://er1.s4oceanice.eu/erddap/tabledap/CCHDO_CTD
15 SURVOSTRAL https://er1.s4oceanice.eu/erddap/tabledap/SURVOSTRAL
16 IADC_ctd_cassandra_bottle https://er1.s4oceanice.eu/erddap/tabledap/IADC_ctd_cassandra_bottle
17 IADC_ctd_cassandra_downcast https://er1.s4oceanice.eu/erddap/tabledap/IADC_ctd_cassandra_downcast
18 IADC_s1_ctd https://er1.s4oceanice.eu/erddap/tabledap/IADC_s1_ctd
19 PLATFORMS_METADATA https://er1.s4oceanice.eu/erddap/tabledap/PLATFORMS_METADATA
20 DATA_TRAWLER_SST https://er1.s4oceanice.eu/erddap/tabledap/DATA_TRAWLER_SST
21 MEOP_Animal-borne_profiles https://er1.s4oceanice.eu/erddap/tabledap/MEOP_Animal-borne_profiles
22 NECKLACE https://er1.s4oceanice.eu/erddap/tabledap/NECKLACE
23 ARGO_FLOATS_OCEANICE_DE https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE_DE
24 ARGO_FLOATS_OCEANICE https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE
25 ARGO_FLOATS_OCEANICE_UK https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE_UK
26 seanoe_ctemp_asal_profile_90S45S https://er1.s4oceanice.eu/erddap/tabledap/seanoe_ctemp_asal_profile_90S45S
27 seanoe_moored_time_series_south60S https://er1.s4oceanice.eu/erddap/tabledap/seanoe_moored_time_series_south60S
28 Weddell_Sea_water_mass_age_meltwater_fractions https://er1.s4oceanice.eu/erddap/tabledap/Weddell_Sea_water_mass_age_meltwater_fractions
29 ACE_bottle_CTD20200406CURRSGCMR https://er1.s4oceanice.eu/erddap/tabledap/ACE_bottle_CTD20200406CURRSGCMR
30 ACE_ctd_CTD20200406CURRSGCMR https://er1.s4oceanice.eu/erddap/tabledap/ACE_ctd_CTD20200406CURRSGCMR
31 POLARSTERN_CRUISES https://er1.s4oceanice.eu/erddap/tabledap/POLARSTERN_CRUISES
32 NPI_Iceberg_database https://er1.s4oceanice.eu/erddap/tabledap/NPI_Iceberg_database
33 SOCHIC_Cruise_2022_Agulhas_II_met https://er1.s4oceanice.eu/erddap/tabledap/SOCHIC_Cruise_2022_Agulhas_II_met

Using these URLs we will then be able to request their data.
In this example we will use the ARGO_FLOATS_OCEANICE dataset, with the URL: https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE
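As a small sketch, the base URL for a given dataset can be looked up in the DataFrame built above. The helper name `dataset_url` is ours, not part of ERDDAP or pandas; here it is demonstrated on a two-row sample mirroring `cleaned_df`'s structure so the snippet runs without a network request.

```python
import pandas as pd

# hypothetical helper: look up a dataset's tabledap URL by its datasetID
def dataset_url(df, dataset_id):
    match = df.loc[df['datasetID'] == dataset_id, 'url']
    return match.iloc[0] if not match.empty else None

# two-row sample with the same columns as cleaned_df
sample = pd.DataFrame({
    'datasetID': ['allDatasets', 'ARGO_FLOATS_OCEANICE'],
    'url': ['https://er1.s4oceanice.eu/erddap/tabledap/allDatasets',
            'https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE'],
})

print(dataset_url(sample, 'ARGO_FLOATS_OCEANICE'))
```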

Get a list of variables for the dataset#

Now we can make a request for the dataset's metadata, which will give us a list of all the available variables and their respective data types. These variables can then be used in the following requests.

BASE_URL = 'https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE'

# building the full url for the metadata and making the request
metadata_url = BASE_URL.replace('tabledap', 'info').replace('griddap', 'info') + '/index.csv'

metadata_resp = requests.get(metadata_url)
metadata_df = pd.read_csv(io.StringIO(metadata_resp.text), sep=',')
# .copy() avoids pandas' SettingWithCopyWarning on the in-place edits below
variables_df = metadata_df.loc[metadata_df['Row Type'].isin(['variable', 'dimension'])].copy()
variables_df.reset_index(drop=True, inplace=True)
variables_df.drop(columns=['Row Type', 'Attribute Name', 'Value'], inplace=True)
variables_df
Variable Name Data Type
0 PLATFORMCODE String
1 data_type String
2 format_version String
3 handbook_version String
4 reference_date_time double
5 date_creation double
6 date_update double
7 WMO String
8 project_name String
9 pi_name String
10 cycle_number int
11 direction String
12 data_center String
13 dc_reference String
14 data_state_indicator String
15 data_mode String
16 platform_type String
17 float_serial_no String
18 firmware_version String
19 wmo_inst_type String
20 time double
21 time_qc String
22 time_location double
23 latitude double
24 longitude double
25 position_qc String
26 positioning_system String
27 profile_pres_qc String
28 profile_temp_qc String
29 profile_psal_qc String
30 vertical_sampling_scheme String
31 config_mission_number int
32 PRESS float
33 pres_qc String
34 pres_adjusted float
35 pres_adjusted_qc String
36 pres_adjusted_error float
37 TEMP float
38 TEMP_QC String
39 temp_adjusted float
40 TEMP_adjusted_QC String
41 TEMP_adjusted_error float
42 PSAL float
43 PSAL_QC String
44 PSAL_ADJUSTED float
45 PSAL_ADJUSTED_QC String
46 PSAL_ADJUSTED_error float
47 DOXY float
48 DOXY_QC String
49 TEMP_DOXY float
50 TEMP_DOXY_QC String
51 molar_DOXY float
52 molar_DOXY_QC String
53 TURBIDITY float
54 TURBIDITY_QC String
55 CHLA float
56 CHLA_QC String
57 NITRATE float
58 NITRATE_QC String

Get a list of platform codes#

We will then perform another request to retrieve a list of platform codes for the selected dataset, which will be useful in the following queries to the ERDDAP.

platforms_query = '.csv?PLATFORMCODE&distinct()'

# The data format specified is 'csv', in which the first row contains the column names and the second the units of measurement (the units row will be removed from the DataFrame in these examples).
# Other options are 'csv0', which returns only the data rows, and 'csvp', which returns the column names (with their units of measurement) as the first row and data starting from the second.
# the additional parameter &distinct() will ensure we will get only unique rows

platform_resp = requests.get(BASE_URL + platforms_query)
platforms_df = pd.read_csv(io.StringIO(platform_resp.text), sep=',')
platforms_df
PLATFORMCODE
0 1902687
1 3902582
2 4903780
3 4903786
4 5907087
5 5907093
6 6990621
7 6990622

Data gathering#

Following are three examples of data queries:

With PLATFORMCODE and time range#

When building the URL to get the data, a platform code can be inserted in the query to retrieve only that platform's data. In the following example the platform code ‘1902687’ has been chosen and the variables are:

  • PLATFORMCODE

  • time

  • latitude

  • longitude

  • TEMP

The query will look like:

?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP&PLATFORMCODE=%221902687%22&time%3E=2023-04-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z

It can be divided into two main parts:

  1. ?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP

    Where ? indicates the start of the query parameters and the rest is a list of the variables we want as columns in the csv, separated by %2C, an encoded comma (,).

  2. &PLATFORMCODE=%221902687%22&time%3E=2023-04-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z

    After the list of variables we can add filters, separated by &.

    The platform code chosen is 1902687 and it has to be enclosed in encoded double quotes (“), represented by %22.

    The syntax for the time range is:

    time%3E=2023-04-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z

    Here the other encoded characters are %3E (>), %3C (<) and %3A (:).

    The time has to be passed as an ISO string, with the format YYYY-MM-DDThh:mm:ssZ.
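Rather than hand-encoding these characters, the query string can be assembled with Python's standard `urllib.parse.quote`. This is an optional convenience, not something ERDDAP requires; `safe='&='` keeps the filter separators literal while encoding everything else.

```python
from urllib.parse import quote

variables = ['PLATFORMCODE', 'time', 'latitude', 'longitude', 'TEMP']
filters = '&PLATFORMCODE="1902687"&time>=2023-04-29T00:00:00Z&time<=2024-04-29T00:00:00Z'

# encode the comma-separated variable list (',' -> %2C), then the filters,
# keeping '&' and '=' literal so the filter separators survive
query = '.csv?' + quote(','.join(variables), safe='') + quote(filters, safe='&=')
print(query)
```

The resulting string is identical to the hand-built `variables + filters` used in the code cell below.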

platform_code = '1902687'

variables = '.csv?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP'
filters = f'&PLATFORMCODE=%22{platform_code}%22&time%3E=2023-04-29T00%3A00%3A00Z&time%3C=2024-04-29T00%3A00%3A00Z'

data_resp = requests.get(BASE_URL + variables + filters)
data_df = pd.read_csv(io.StringIO(data_resp.text), sep=',')

data_df=data_df.sort_values(by=["time"])
data_df.reset_index(drop=True, inplace=True)
data_df = data_df.dropna(subset=['PLATFORMCODE'])
data_df
PLATFORMCODE time latitude longitude TEMP
0 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 1.107
1 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 1.098
2 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 1.091
3 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 1.087
4 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 1.084
... ... ... ... ... ...
787 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 -0.677
788 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 -0.913
789 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 -1.098
790 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 -1.194
791 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 0.031

792 rows × 5 columns

With multiple platform codes#

It is possible to select multiple platform codes when querying the data. This can be done by using a regex.

In this example the three platform codes used will be ‘4903780’, ‘4903786’ and ‘3902582’.

To build this part of the query the regex will have this syntax:

PLATFORMCODE=~%22(platform_code_1%7Cplatform_code_2%7Cplatform_code_3)%22

Where %7C represents the symbol | (meaning OR).
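The regex filter can likewise be built from a plain list of codes with `'|'.join` and `urllib.parse.quote`, instead of writing the encoded string by hand; the `safe` characters here are kept literal because ERDDAP expects them unencoded.

```python
from urllib.parse import quote

codes = ['3902582', '4903780', '4903786']

# join the codes with '|' (regex OR), then percent-encode the filter;
# '&', '=', '~' and the parentheses are left as-is
regex_filter = quote('&PLATFORMCODE=~"(' + '|'.join(codes) + ')"', safe='&=~()')
print(regex_filter)
```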

regex_platform_code = '(3902582%7C4903780%7C4903786)'

variables = '.csv?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP'
regex_filters = f'&PLATFORMCODE=~%22{regex_platform_code}%22&time%3E=2024-02-20T00%3A00%3A00Z&time%3C=2024-04-29T00%3A00%3A00Z'

regex_data_resp = requests.get(BASE_URL + variables + regex_filters)
regex_data_df = pd.read_csv(io.StringIO(regex_data_resp.text), sep=',')

regex_data_df = regex_data_df.dropna(subset=['PLATFORMCODE'])

unique_platform_codes = regex_data_df['PLATFORMCODE'].unique()
print('\nThis DataFrame contains the platform codes:', unique_platform_codes, '\n')
regex_data_df
This DataFrame contains the platform codes: [4903780. 4903786. 3902582.] 
PLATFORMCODE time latitude longitude TEMP
1 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 80.31254458206934 -1.823
2 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 80.31254458206934 -1.821
3 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 80.31254458206934 -1.82
4 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 80.31254458206934 -1.82
5 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 80.31254458206934 -1.828
... ... ... ... ... ...
1826 3902582.0 2024-02-25T05:42:20Z -78.13948 -174.97683 -1.902
1827 3902582.0 2024-02-25T05:42:20Z -78.13948 -174.97683 -1.901
1828 3902582.0 2024-02-25T05:42:20Z -78.13948 -174.97683 -1.896
1829 3902582.0 2024-02-25T05:42:20Z -78.13948 -174.97683 -1.894
1830 3902582.0 2024-02-25T05:42:20Z -78.13948 -174.97683 -1.894

1830 rows × 5 columns

With coordinates range#

Another possibility when querying the data is to specify a range of coordinates. This can be done by adding the following to the query filters:

latitude%3E=-75&latitude%3C=-30&longitude%3E=-50&longitude%3C=50

This effectively selects platforms inside a bounding box delimited by:

  • latitude greater than or equal to -75 and less than or equal to -30

and

  • longitude greater than or equal to -50 and less than or equal to 50.
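The coordinate filter above can also be generated from numeric limits. The helper name `bbox_filter` is ours, not part of ERDDAP; %3E= and %3C= are simply the encoded >= and <= comparisons.

```python
# hypothetical helper: build the encoded bounding-box filter
# from numeric latitude/longitude limits
def bbox_filter(lat_min, lat_max, lon_min, lon_max):
    return (f'&latitude%3E={lat_min}&latitude%3C={lat_max}'
            f'&longitude%3E={lon_min}&longitude%3C={lon_max}')

print(bbox_filter(-75, -30, -50, 50))
```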

coords_variables = '.csv?PLATFORMCODE%2Clatitude%2Clongitude'
coords_filter = '&latitude%3E=-75&latitude%3C=-30&longitude%3E=-50&longitude%3C=50&distinct()'

coords_data_resp = requests.get(BASE_URL + coords_variables + coords_filter)
coords_data_df = pd.read_csv(io.StringIO(coords_data_resp.text), sep=',')

coords_data_df = coords_data_df.dropna(subset=['PLATFORMCODE'])

coords_data_df
PLATFORMCODE latitude longitude
1 6990622.0 -68.96947443536386 28.087817418426756
2 6990622.0 -68.94769391657476 -21.311241844741982

Additional resources#

For additional information about ERDDAP please visit:

https://er1.s4oceanice.eu/erddap/information.html

The documentation pages for the Python libraries used in this notebook are:

https://requests.readthedocs.io/en/latest/

https://pandas.pydata.org/docs/