OCEAN:ICE’s ERDDAP querying#

For an interactive version of this page please visit the Google Colab at the link:
Open in Google Colab
(To open link in new tab press Ctrl + click)

This notebook will illustrate how to build queries and make requests to https://er1.s4oceanice.eu/erddap/index.html using Python.

Get a list of available datasets#

To check what datasets are available in the ERDDAP and get their URLs the first step is to make a request to https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.html
performing a query that will allow us to get the tabledap datasets’ ids and their URLs based on the data structure. For this example the griddap datasets have been omitted. After receiving the data it will be loaded into a pandas DataFrame.

Hide code cell source
datasets_url = 'https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.csv?datasetID%2Ctabledap'

# building the full url and making the request
datasets_resp = requests.get(datasets_url)
# loadingd the data into a pandas DataFrame
datasets_df = pd.read_csv(io.StringIO(datasets_resp.text), sep=',')
datasets_df['url'] = datasets_df['tabledap']

# dropping rows where all values are NaN
df_cleaned = datasets_df.dropna(how='all')
df_cleaned = df_cleaned.dropna(subset='url')

# removing now obsolete columns and showing the content
datasets_df = df_cleaned.drop(columns=['tabledap'])
pd.set_option('display.max_colwidth', None)
datasets_df
datasetID url
1 allDatasets https://er1.s4oceanice.eu/erddap/tabledap/allDatasets
2 AAD_ASPeCt-Bio_historical https://er1.s4oceanice.eu/erddap/tabledap/AAD_ASPeCt-Bio_historical
3 AMUNDSEN_CRUISES https://er1.s4oceanice.eu/erddap/tabledap/AMUNDSEN_CRUISES
4 ANT_TG_OCEAN_HEIGHT https://er1.s4oceanice.eu/erddap/tabledap/ANT_TG_OCEAN_HEIGHT
5 ARCTICNET_CRUISES https://er1.s4oceanice.eu/erddap/tabledap/ARCTICNET_CRUISES
6 Australian_Antarctic_Program https://er1.s4oceanice.eu/erddap/tabledap/Australian_Antarctic_Program
7 British_Antartica_Survey_webcams https://er1.s4oceanice.eu/erddap/tabledap/British_Antartica_Survey_webcams
8 CCHDO_Bottle https://er1.s4oceanice.eu/erddap/tabledap/CCHDO_Bottle
9 CCHDO_CTD https://er1.s4oceanice.eu/erddap/tabledap/CCHDO_CTD
11 SURVOSTRAL https://er1.s4oceanice.eu/erddap/tabledap/SURVOSTRAL
12 commandant_charcot_a5qvgc https://er1.s4oceanice.eu/erddap/tabledap/commandant_charcot_a5qvgc
13 IADC_ctd_cassandra_bottle https://er1.s4oceanice.eu/erddap/tabledap/IADC_ctd_cassandra_bottle
14 IADC_ctd_cassandra_downcast https://er1.s4oceanice.eu/erddap/tabledap/IADC_ctd_cassandra_downcast
15 IADC_s1_ctd https://er1.s4oceanice.eu/erddap/tabledap/IADC_s1_ctd
16 DomeC_SP02 https://er1.s4oceanice.eu/erddap/tabledap/DomeC_SP02
18 itase_chemistry_synthesis_group_9ivzat https://er1.s4oceanice.eu/erddap/tabledap/itase_chemistry_synthesis_group_9ivzat
19 IADC_Mooring_Aquadopp_35m https://er1.s4oceanice.eu/erddap/tabledap/IADC_Mooring_Aquadopp_35m
20 IADC_Mooring_CTD_30m https://er1.s4oceanice.eu/erddap/tabledap/IADC_Mooring_CTD_30m
21 IADC_Mooring_CTD_90m https://er1.s4oceanice.eu/erddap/tabledap/IADC_Mooring_CTD_90m
22 MEOP_Animal-borne_profiles https://er1.s4oceanice.eu/erddap/tabledap/MEOP_Animal-borne_profiles
23 NECKLACE https://er1.s4oceanice.eu/erddap/tabledap/NECKLACE
27 ARGO_FLOATS_OCEANICE https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE
31 seanoe_moored_time_series_south60S https://er1.s4oceanice.eu/erddap/tabledap/seanoe_moored_time_series_south60S
32 Weddell_Sea_water_mass_age_meltwater_fractions https://er1.s4oceanice.eu/erddap/tabledap/Weddell_Sea_water_mass_age_meltwater_fractions
33 POLARSTERN_CRUISES https://er1.s4oceanice.eu/erddap/tabledap/POLARSTERN_CRUISES
35 NPI_Iceberg_database https://er1.s4oceanice.eu/erddap/tabledap/NPI_Iceberg_database
36 SOCHIC_Cruise_2022_Agulhas_II_met https://er1.s4oceanice.eu/erddap/tabledap/SOCHIC_Cruise_2022_Agulhas_II_met
37 SOCHIC_Cruise_2022_Agulhas_II_CTD https://er1.s4oceanice.eu/erddap/tabledap/SOCHIC_Cruise_2022_Agulhas_II_CTD

Using these URLs we will than be able to get their relative dataset’s data.
In this example we will use the ARGO_FLOATS_OCEANICE dataset, with the URL:
https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE

Get a list of variables for the dataset#

Now we can make a request to the dataset’s metadata, which will give us a list of all the available variables and their relative data type. These variables can be than used in the following requests.

Hide code cell source
BASE_URL = 'https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE'

# building the full url for the metadata and making the request
metadata_url = BASE_URL.replace('tabledap', 'info') + '/index.csv'

metadata_resp = requests.get(metadata_url)
metadata_df = pd.read_csv(io.StringIO(metadata_resp.text), sep=',')
variables_df = metadata_df.loc[metadata_df['Row Type'].isin(['variable', 'dimension'])]
variables_df.reset_index(drop=True, inplace=True)
variables_df.drop(columns=['Row Type', 'Attribute Name', 'Value'], inplace=True)
variables_df
Variable Name Data Type
0 PLATFORMCODE String
1 data_type String
2 format_version String
3 handbook_version String
4 reference_date_time double
5 date_creation double
6 date_update double
7 WMO String
8 project_name String
9 pi_name String
10 cycle_number int
11 direction String
12 data_center String
13 dc_reference String
14 data_state_indicator String
15 data_mode String
16 platform_type String
17 float_serial_no String
18 firmware_version String
19 wmo_inst_type String
20 time double
21 time_qc String
22 time_location double
23 latitude double
24 longitude double
25 position_qc String
26 positioning_system String
27 profile_pres_qc String
28 profile_temp_qc String
29 profile_psal_qc String
30 vertical_sampling_scheme String
31 config_mission_number int
32 PRESS float
33 pres_qc String
34 pres_adjusted float
35 pres_adjusted_qc String
36 pres_adjusted_error float
37 TEMP float
38 TEMP_QC String
39 temp_adjusted float
40 TEMP_adjusted_QC String
41 TEMP_adjusted_error float
42 PSAL float
43 PSAL_QC String
44 PSAL_ADJUSTED float
45 PSAL_ADJUSTED_QC String
46 PSAL_ADJUSTED_error float
47 DOXY float
48 DOXY_QC String
49 TEMP_DOXY float
50 TEMP_DOXY_QC String
51 molar_DOXY float
52 molar_DOXY_QC String
53 TURBIDITY float
54 TURBIDITY_QC String
55 CHLA float
56 CHLA_QC String
57 NITRATE float
58 NITRATE_QC String

Get a list of platform codes#

We will then perform another request to retrieve a list of platform codes for the selected dataset, which will be useful in the following queries to the ERDDAP.

Hide code cell source
platforms_query = '.csv?PLATFORMCODE&distinct()'

# The data format specified is 'csv' (in which the first row contains the column names and the second the units of measurment, which will be removed from the dataframe in these examples).
# Other possibilities are  'csv0' which will return only the data rows and 'csvp', which will return a csv with the column names (and their unit of measurment) as first row and data starting from the second.
# the additional parameter &distinct() will ensure we will get only unique rows

platform_resp = requests.get(BASE_URL + platforms_query)
platforms_df = pd.read_csv(io.StringIO(platform_resp.text), sep=',')
platforms_df
PLATFORMCODE
0 1902687
1 3902582
2 4903780
3 4903786
4 5907087
5 5907093
6 6990621
7 6990622

Data gathering#

Following are three examples of data queries:

With PLATFORMCODE and time range#

When building the URL to get the data a platform code can be inserted in the query to get the data relative to the platform. In the following example the platform code ‘1902687’ has been chosen and the variables are:

  • PLATFORMCODE

  • time

  • latitude

  • longitude

  • TEMP

The query will look like:

?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP&PLATFORMCODE=%221902687%22&time%3E=2024-03-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z

It can be divided into two main parts:

  1. ?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP

    Where ? indicates the start of query parametes and the rest is a list of variables we want as columns in the csv, separated by %2C, an encoded comma(,).

  2. &PLATFORMCODE=%221902687%22&time%3E=2024-03-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z

    After the list of variables we can add filters, separated by &.

    The platform code chosen is 1902687 and it has to be inserted between encoded double quotes(“), represented by %22.

    The syntax for the timerange is:

    time%3E=2024-03-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z

    Here the other encoded characters are %3E (>), %3C (<) and %3A (:).

    The time has to be passed as an ISO string, with the format YYYY-MM-DDThh:mm:ssZ.

Hide code cell source
platform_code = '1902687'

variables = '.csv?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP'
filters = f'&PLATFORMCODE=%22{platform_code}%22&time%3E=2023-04-29T00%3A00%3A00Z&time%3C=2024-04-29T00%3A00%3A00Z'

data_resp = requests.get(BASE_URL + variables + filters)
data_df = pd.read_csv(io.StringIO(data_resp.text), sep=',')

data_df=data_df.sort_values(by=["time"])
data_df.reset_index(drop=True, inplace=True)
data_df = data_df.dropna(subset=['PLATFORMCODE'])
data_df
PLATFORMCODE time latitude longitude TEMP
0 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 1.107
1 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 1.098
2 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 1.091
3 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 1.087
4 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 1.084
... ... ... ... ... ...
787 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 -0.677
788 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 -0.913
789 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 -1.098
790 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 -1.194
791 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 0.031

792 rows × 5 columns

With multiple platform codes#

It is possible to select multiple platform codes when querying the data. This can be done by using a regex.

In this example the three platform codes used will be ‘4903780’, ‘4903786’ and ‘3902582’.

To build these part of the query the regex will have this syntax:

PLATFORMCODE=~%22(platform_code_1%7Cplatform_code_2%7Cplatform_code_3)

Where %7C represents the symbol | (meaning OR).

Hide code cell source
regex_platform_code = '(3902582%7C4903780%7C4903786)'

variables = '.csv?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP'
regex_filters = f'&PLATFORMCODE=~%22{regex_platform_code}%22&time%3E=2024-02-20T00%3A00%3A00Z&time%3C=2024-04-29T00%3A00%3A00Z'

regex_data_resp = requests.get(BASE_URL + variables + regex_filters)
regex_data_df = pd.read_csv(io.StringIO(regex_data_resp.text), sep=',')

regex_data_df = regex_data_df.dropna(subset=['PLATFORMCODE'])

unique_platform_codes = regex_data_df['PLATFORMCODE'].unique()
print('\nThis DataFrame contains the platform codes:', unique_platform_codes, '\n')
regex_data_df
This DataFrame contains the platform codes: [4903780. 4903786. 3902582.] 
PLATFORMCODE time latitude longitude TEMP
1 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 80.31254458206934 -1.823
2 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 80.31254458206934 -1.821
3 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 80.31254458206934 -1.82
4 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 80.31254458206934 -1.82
5 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 80.31254458206934 -1.828
... ... ... ... ... ...
1826 3902582.0 2024-02-25T05:42:20Z -78.13948 -174.97683 -1.902
1827 3902582.0 2024-02-25T05:42:20Z -78.13948 -174.97683 -1.901
1828 3902582.0 2024-02-25T05:42:20Z -78.13948 -174.97683 -1.896
1829 3902582.0 2024-02-25T05:42:20Z -78.13948 -174.97683 -1.894
1830 3902582.0 2024-02-25T05:42:20Z -78.13948 -174.97683 -1.894

1830 rows × 5 columns

With coordinates range#

Another possibility when querying the data is to specify a range of coordinates. This can be done by inserting in the query filters the following:

latitude%3E=-75&latitude%3C=-30&longitude%3E=-50&longitude%3C=50

Effectively selecting platforms inside a square delimited by:

  • latitude equal or greater than -75 and equal or less than -30

and

  • longitude equal or greater than -50 and equal or less than 50.

Hide code cell source
coords_variables = '.csv?PLATFORMCODE%2Clatitude%2Clongitude'
coords_filter = '&latitude%3E=-75&latitude%3C=-30&longitude%3E=-50&longitude%3C=50&distinct()'

coords_data_resp = requests.get(BASE_URL + coords_variables + coords_filter)
coords_data_df = pd.read_csv(io.StringIO(coords_data_resp.text), sep=',')

coords_data_df = coords_data_df.dropna(subset=['PLATFORMCODE'])

coords_data_df
PLATFORMCODE latitude longitude
1 6990622.0 -68.96947443536386 28.087817418426756
2 6990622.0 -68.94769391657476 -21.311241844741982

Additional resources#

For additional information about ERDDAP please visit:

https://er1.s4oceanice.eu/erddap/information.html

The Python libraries that have been used in this notebook are: