OCEAN:ICE’s ERDDAP querying#

For an interactive version of this page, please open the accompanying Google Colab notebook:
Open in Google Colab

This notebook will illustrate how to build queries and make requests to https://er1.s4oceanice.eu/erddap/index.html using Python.
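
All the code cells below rely on the requests and pandas libraries plus the standard-library io module, so a single setup cell at the top of the notebook is enough:

# libraries used throughout this notebook
import io

import pandas as pd
import requests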

Get a list of available datasets#

To check which datasets are available in the ERDDAP and obtain their URLs, the first step is to make a request to https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.html
with a query that returns the tabledap datasets' IDs and their URLs. Griddap datasets have been omitted from this example. The response is then loaded into a pandas DataFrame.

datasets_url = 'https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.csv?datasetID%2Ctabledap'

# making the request to the allDatasets endpoint
datasets_resp = requests.get(datasets_url)
# loading the data into a pandas DataFrame
datasets_df = pd.read_csv(io.StringIO(datasets_resp.text), sep=',')
datasets_df['url'] = datasets_df['tabledap']

# dropping rows where all values are NaN, as well as rows without a URL
df_cleaned = datasets_df.dropna(how='all')
df_cleaned = df_cleaned.dropna(subset=['url'])

# removing the now obsolete column and showing the content
datasets_df = df_cleaned.drop(columns=['tabledap'])
pd.set_option('display.max_colwidth', None)
datasets_df
datasetID url
1 allDatasets https://er1.s4oceanice.eu/erddap/tabledap/allDatasets
2 AAD_ASPeCt-Bio_historical https://er1.s4oceanice.eu/erddap/tabledap/AAD_ASPeCt-Bio_historical
3 AADC_sea_ice_measurements_database_1985_2007 https://er1.s4oceanice.eu/erddap/tabledap/AADC_sea_ice_measurements_database_1985_2007
4 AMUNDSEN_CRUISES https://er1.s4oceanice.eu/erddap/tabledap/AMUNDSEN_CRUISES
5 ANT_TG_OCEAN_HEIGHT https://er1.s4oceanice.eu/erddap/tabledap/ANT_TG_OCEAN_HEIGHT
6 AWS_3_hourly https://er1.s4oceanice.eu/erddap/tabledap/AWS_3_hourly
7 AWS_Daily_25 https://er1.s4oceanice.eu/erddap/tabledap/AWS_Daily_25
8 AWS_Daily_75 https://er1.s4oceanice.eu/erddap/tabledap/AWS_Daily_75
9 AWS_Monthly_25 https://er1.s4oceanice.eu/erddap/tabledap/AWS_Monthly_25
10 AWS_Monthly_75 https://er1.s4oceanice.eu/erddap/tabledap/AWS_Monthly_75
11 ARCTICNET_CRUISES https://er1.s4oceanice.eu/erddap/tabledap/ARCTICNET_CRUISES
12 Australian_Antarctic_Program https://er1.s4oceanice.eu/erddap/tabledap/Australian_Antarctic_Program
13 British_Antartica_Survey_webcams https://er1.s4oceanice.eu/erddap/tabledap/British_Antartica_Survey_webcams
14 CCHDO_Bottle https://er1.s4oceanice.eu/erddap/tabledap/CCHDO_Bottle
15 CCHDO_CTD https://er1.s4oceanice.eu/erddap/tabledap/CCHDO_CTD
17 SURVOSTRAL https://er1.s4oceanice.eu/erddap/tabledap/SURVOSTRAL
18 IADC_ctd_cassandra_bottle https://er1.s4oceanice.eu/erddap/tabledap/IADC_ctd_cassandra_bottle
19 IADC_ctd_cassandra_downcast https://er1.s4oceanice.eu/erddap/tabledap/IADC_ctd_cassandra_downcast
20 IADC_s1_ctd https://er1.s4oceanice.eu/erddap/tabledap/IADC_s1_ctd
22 DATA_TRAWLER_SST https://er1.s4oceanice.eu/erddap/tabledap/DATA_TRAWLER_SST
23 MEOP_Animal-borne_profiles https://er1.s4oceanice.eu/erddap/tabledap/MEOP_Animal-borne_profiles
25 NECKLACE https://er1.s4oceanice.eu/erddap/tabledap/NECKLACE
29 ARGO_FLOATS_OCEANICE_DE https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE_DE
30 ARGO_FLOATS_OCEANICE https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE
31 ARGO_FLOATS_OCEANICE_UK https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE_UK
39 seanoe_ctemp_asal_profile_90S45S https://er1.s4oceanice.eu/erddap/tabledap/seanoe_ctemp_asal_profile_90S45S
40 seanoe_moored_time_series_south60S https://er1.s4oceanice.eu/erddap/tabledap/seanoe_moored_time_series_south60S
41 Weddell_Sea_water_mass_age_meltwater_fractions https://er1.s4oceanice.eu/erddap/tabledap/Weddell_Sea_water_mass_age_meltwater_fractions
42 ACE_bottle_CTD20200406CURRSGCMR https://er1.s4oceanice.eu/erddap/tabledap/ACE_bottle_CTD20200406CURRSGCMR
43 ACE_ctd_CTD20200406CURRSGCMR https://er1.s4oceanice.eu/erddap/tabledap/ACE_ctd_CTD20200406CURRSGCMR
44 POLARSTERN_CRUISES https://er1.s4oceanice.eu/erddap/tabledap/POLARSTERN_CRUISES
46 NPI_Iceberg_database https://er1.s4oceanice.eu/erddap/tabledap/NPI_Iceberg_database
47 SOCHIC_Cruise_2022_Agulhas_II_met https://er1.s4oceanice.eu/erddap/tabledap/SOCHIC_Cruise_2022_Agulhas_II_met

Using these URLs we will then be able to retrieve each dataset's data.
In this example we will use the ARGO_FLOATS_OCEANICE dataset, with the URL:
https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE
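
If you prefer not to copy the URL by hand, it can also be taken directly from the DataFrame built above; a minimal sketch:

# look up the ARGO_FLOATS_OCEANICE row and take its url column
BASE_URL = datasets_df.loc[datasets_df['datasetID'] == 'ARGO_FLOATS_OCEANICE', 'url'].iloc[0]
print(BASE_URL)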

Get a list of variables for the dataset#

Now we can request the dataset's metadata, which gives us a list of all the available variables and their data types. These variables can then be used in the following requests.

BASE_URL = 'https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE'

# building the full url for the metadata and making the request
metadata_url = BASE_URL.replace('tabledap', 'info') + '/index.csv'

metadata_resp = requests.get(metadata_url)
metadata_df = pd.read_csv(io.StringIO(metadata_resp.text), sep=',')
# keeping only the rows that describe variables and dimensions, then dropping the attribute columns
variables_df = metadata_df.loc[metadata_df['Row Type'].isin(['variable', 'dimension'])].copy()
variables_df.reset_index(drop=True, inplace=True)
variables_df.drop(columns=['Row Type', 'Attribute Name', 'Value'], inplace=True)
variables_df
Variable Name Data Type
0 PLATFORMCODE String
1 data_type String
2 format_version String
3 handbook_version String
4 reference_date_time double
5 date_creation double
6 date_update double
7 WMO String
8 project_name String
9 pi_name String
10 cycle_number int
11 direction String
12 data_center String
13 dc_reference String
14 data_state_indicator String
15 data_mode String
16 platform_type String
17 float_serial_no String
18 firmware_version String
19 wmo_inst_type String
20 time double
21 time_qc String
22 time_location double
23 latitude double
24 longitude double
25 position_qc String
26 positioning_system String
27 profile_pres_qc String
28 profile_temp_qc String
29 profile_psal_qc String
30 vertical_sampling_scheme String
31 config_mission_number int
32 PRESS float
33 pres_qc String
34 pres_adjusted float
35 pres_adjusted_qc String
36 pres_adjusted_error float
37 TEMP float
38 TEMP_QC String
39 temp_adjusted float
40 TEMP_adjusted_QC String
41 TEMP_adjusted_error float
42 PSAL float
43 PSAL_QC String
44 PSAL_ADJUSTED float
45 PSAL_ADJUSTED_QC String
46 PSAL_ADJUSTED_error float
47 DOXY float
48 DOXY_QC String
49 TEMP_DOXY float
50 TEMP_DOXY_QC String
51 molar_DOXY float
52 molar_DOXY_QC String
53 TURBIDITY float
54 TURBIDITY_QC String
55 CHLA float
56 CHLA_QC String
57 NITRATE float
58 NITRATE_QC String
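
The resulting table can be queried like any other DataFrame, for instance to check the data type of a single variable before using it in a filter; a short sketch:

# data type of the TEMP variable
variables_df.loc[variables_df['Variable Name'] == 'TEMP', 'Data Type'].iloc[0]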

Get a list of platform codes#

We will then perform another request to retrieve a list of platform codes for the selected dataset, which will be useful in the following queries to the ERDDAP.

platforms_query = '.csv?PLATFORMCODE&distinct()'

# The requested data format is 'csv', in which the first row contains the column names and the second the units of measurement
# (the units row is removed from the DataFrames in these examples).
# Other options are 'csv0', which returns only the data rows, and 'csvp', which returns the column names (with their units of measurement) in the first row and data from the second row onward.
# A short comparison of the three formats is shown after the output below.
# The additional parameter &distinct() ensures that only unique rows are returned.

platform_resp = requests.get(BASE_URL + platforms_query)
platforms_df = pd.read_csv(io.StringIO(platform_resp.text), sep=',')
platforms_df
PLATFORMCODE
0 1902687
1 3902582
2 4903780
3 4903786
4 5907087
5 5907093
6 6990621
7 6990622
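
As mentioned in the comments above, ERDDAP offers three CSV flavours. The sketch below requests the same platform-code list as .csv, .csv0 and .csvp and prints the first two lines of each response, so the difference in header and unit rows is easy to see:

# compare the three CSV flavours for the same query
for fmt in ['.csv', '.csv0', '.csvp']:
    resp = requests.get(BASE_URL + fmt + '?PLATFORMCODE&distinct()')
    print(fmt, '->', resp.text.splitlines()[:2])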

Data gathering#

Following are three examples of data queries:

With PLATFORMCODE and time range#

When building the URL to request the data, a platform code can be inserted in the query to get the data for that platform. In the following example the platform code '1902687' has been chosen and the variables are:

  • PLATFORMCODE

  • time

  • latitude

  • longitude

  • TEMP

The query will look like:

?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP&PLATFORMCODE=%221902687%22&time%3E=2024-03-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z

It can be divided into two main parts:

  1. ?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP

    Where ? indicates the start of the query parameters and the rest is the list of variables we want as columns in the csv, separated by %2C, an encoded comma (,).

  2. &PLATFORMCODE=%221902687%22&time%3E=2024-03-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z

    After the list of variables we can add filters, separated by &.

    The platform code chosen is 1902687 and it has to be enclosed in encoded double quotes ("), represented by %22.

    The syntax for the time range is:

    time%3E=2024-03-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z

    Here the other encoded characters are %3E (>), %3C (<) and %3A (:).

    The time has to be passed as an ISO string, with the format YYYY-MM-DDThh:mm:ssZ.

platform_code = '1902687'

variables = '.csv?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP'
filters = f'&PLATFORMCODE=%22{platform_code}%22&time%3E=2023-04-29T00%3A00%3A00Z&time%3C=2024-04-29T00%3A00%3A00Z'

data_resp = requests.get(BASE_URL + variables + filters)
data_df = pd.read_csv(io.StringIO(data_resp.text), sep=',')

data_df=data_df.sort_values(by=["time"])
data_df.reset_index(drop=True, inplace=True)
data_df = data_df.dropna(subset=['PLATFORMCODE'])
data_df
PLATFORMCODE time latitude longitude TEMP
0 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 1.107
1 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 1.098
2 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 1.091
3 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 1.087
4 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 1.084
... ... ... ... ... ...
787 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 -0.677
788 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 -0.913
789 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 -1.098
790 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 -1.194
791 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 0.031

792 rows × 5 columns
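
Percent-encoding the filter string by hand is error prone. As an alternative (not part of the original notebook), the same query can be assembled from plain strings and encoded with urllib.parse.quote from the standard library; a minimal sketch:

from urllib.parse import quote

# plain-text pieces of the query
columns = ['PLATFORMCODE', 'time', 'latitude', 'longitude', 'TEMP']
constraints = ['PLATFORMCODE="1902687"',
               'time>=2023-04-29T00:00:00Z',
               'time<=2024-04-29T00:00:00Z']

# quote() percent-encodes every special character except those listed in `safe`
query = '.csv?' + quote(','.join(columns) + '&' + '&'.join(constraints), safe='&=')
encoded_query_df = pd.read_csv(io.StringIO(requests.get(BASE_URL + query).text), sep=',')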

With multiple platform codes#

It is possible to select multiple platform codes when querying the data. This can be done by using a regex.

In this example the three platform codes used will be ‘4903780’, ‘4903786’ and ‘3902582’.

To build this part of the query the regex has the following syntax:

PLATFORMCODE=~%22(platform_code_1%7Cplatform_code_2%7Cplatform_code_3)%22

Where %7C represents the symbol | (meaning OR).

regex_platform_code = '(3902582%7C4903780%7C4903786)'

variables = '.csv?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP'
regex_filters = f'&PLATFORMCODE=~%22{regex_platform_code}%22&time%3E=2024-02-20T00%3A00%3A00Z&time%3C=2024-04-29T00%3A00%3A00Z'

regex_data_resp = requests.get(BASE_URL + variables + regex_filters)
regex_data_df = pd.read_csv(io.StringIO(regex_data_resp.text), sep=',')

regex_data_df = regex_data_df.dropna(subset=['PLATFORMCODE'])

unique_platform_codes = regex_data_df['PLATFORMCODE'].unique()
print('\nThis DataFrame contains the platform codes:', unique_platform_codes, '\n')
regex_data_df
This DataFrame contains the platform codes: [4903780. 4903786. 3902582.] 
PLATFORMCODE time latitude longitude TEMP
1 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 80.31254458206934 -1.823
2 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 80.31254458206934 -1.821
3 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 80.31254458206934 -1.82
4 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 80.31254458206934 -1.82
5 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 80.31254458206934 -1.828
... ... ... ... ... ...
1826 3902582.0 2024-02-25T05:42:20Z -78.13948 -174.97683 -1.902
1827 3902582.0 2024-02-25T05:42:20Z -78.13948 -174.97683 -1.901
1828 3902582.0 2024-02-25T05:42:20Z -78.13948 -174.97683 -1.896
1829 3902582.0 2024-02-25T05:42:20Z -78.13948 -174.97683 -1.894
1830 3902582.0 2024-02-25T05:42:20Z -78.13948 -174.97683 -1.894

1830 rows × 5 columns
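
The alternation string used above can also be generated from a plain Python list of platform codes, which avoids typing the encoded | separators by hand; a small sketch:

# build the (code1%7Ccode2%7Ccode3) alternation from a list of codes
platform_codes = ['3902582', '4903780', '4903786']
regex_platform_code = '(' + '%7C'.join(platform_codes) + ')'
print(regex_platform_code)  # (3902582%7C4903780%7C4903786)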

With coordinates range#

Another possibility when querying the data is to specify a range of coordinates. This can be done by adding the following filters to the query:

latitude%3E=-75&latitude%3C=-30&longitude%3E=-50&longitude%3C=50

This effectively selects platforms inside a bounding box delimited by:

  • latitude greater than or equal to -75 and less than or equal to -30

and

  • longitude greater than or equal to -50 and less than or equal to 50.

coords_variables = '.csv?PLATFORMCODE%2Clatitude%2Clongitude'
coords_filter = '&latitude%3E=-75&latitude%3C=-30&longitude%3E=-50&longitude%3C=50&distinct()'

coords_data_resp = requests.get(BASE_URL + coords_variables + coords_filter)
coords_data_df = pd.read_csv(io.StringIO(coords_data_resp.text), sep=',')

coords_data_df = coords_data_df.dropna(subset=['PLATFORMCODE'])

coords_data_df
PLATFORMCODE latitude longitude
1 6990622.0 -68.96947443536386 28.087817418426756
2 6990622.0 -68.94769391657476 -21.311241844741982
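
The coordinate filter can likewise be built from numeric bounds, which makes it easy to reuse the same cell for a different region; a minimal sketch:

# bounding box expressed as plain numbers
lat_min, lat_max = -75, -30
lon_min, lon_max = -50, 50

coords_filter = (f'&latitude%3E={lat_min}&latitude%3C={lat_max}'
                 f'&longitude%3E={lon_min}&longitude%3C={lon_max}&distinct()')
print(coords_filter)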

Additional resources#

For additional information about ERDDAP please visit:

https://er1.s4oceanice.eu/erddap/information.html

The Python libraries that have been used in this notebook are requests and pandas, together with the standard-library io module.