# OCEAN:ICE's ERDDAP querying
For an interactive version of this page, please visit the Google Colab notebook at the link:
Open in Google Colab
This notebook will illustrate how to build queries and make requests to https://er1.s4oceanice.eu/erddap/index.html using Python.
## Get a list of available datasets
To check which datasets are available in the ERDDAP and get their URLs, the first step is to make a request to https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.html with a query that returns the tabledap datasets' IDs and their URLs. For this example the griddap datasets have been omitted. The response is then loaded into a pandas DataFrame.
```python
import io

import pandas as pd
import requests

datasets_url = 'https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.csv?datasetID%2Ctabledap'
# making the request
datasets_resp = requests.get(datasets_url)
# loading the data into a pandas DataFrame
datasets_df = pd.read_csv(io.StringIO(datasets_resp.text), sep=',')
datasets_df['url'] = datasets_df['tabledap']
# dropping rows where all values are NaN, then rows without a tabledap url (the griddap datasets)
df_cleaned = datasets_df.dropna(how='all')
df_cleaned = df_cleaned.dropna(subset=['url'])
# removing the now obsolete column and showing the content
datasets_df = df_cleaned.drop(columns=['tabledap'])
pd.set_option('display.max_colwidth', None)
datasets_df
```
 | datasetID | url |
---|---|---|
1 | allDatasets | https://er1.s4oceanice.eu/erddap/tabledap/allDatasets |
2 | AAD_ASPeCt-Bio_historical | https://er1.s4oceanice.eu/erddap/tabledap/AAD_ASPeCt-Bio_historical |
3 | AMUNDSEN_CRUISES | https://er1.s4oceanice.eu/erddap/tabledap/AMUNDSEN_CRUISES |
4 | ANT_TG_OCEAN_HEIGHT | https://er1.s4oceanice.eu/erddap/tabledap/ANT_TG_OCEAN_HEIGHT |
5 | ARCTICNET_CRUISES | https://er1.s4oceanice.eu/erddap/tabledap/ARCTICNET_CRUISES |
6 | Australian_Antarctic_Program | https://er1.s4oceanice.eu/erddap/tabledap/Australian_Antarctic_Program |
7 | British_Antartica_Survey_webcams | https://er1.s4oceanice.eu/erddap/tabledap/British_Antartica_Survey_webcams |
8 | CCHDO_Bottle | https://er1.s4oceanice.eu/erddap/tabledap/CCHDO_Bottle |
9 | CCHDO_CTD | https://er1.s4oceanice.eu/erddap/tabledap/CCHDO_CTD |
11 | SURVOSTRAL | https://er1.s4oceanice.eu/erddap/tabledap/SURVOSTRAL |
12 | commandant_charcot_a5qvgc | https://er1.s4oceanice.eu/erddap/tabledap/commandant_charcot_a5qvgc |
13 | IADC_ctd_cassandra_bottle | https://er1.s4oceanice.eu/erddap/tabledap/IADC_ctd_cassandra_bottle |
14 | IADC_ctd_cassandra_downcast | https://er1.s4oceanice.eu/erddap/tabledap/IADC_ctd_cassandra_downcast |
15 | IADC_s1_ctd | https://er1.s4oceanice.eu/erddap/tabledap/IADC_s1_ctd |
16 | DomeC_SP02 | https://er1.s4oceanice.eu/erddap/tabledap/DomeC_SP02 |
18 | itase_chemistry_synthesis_group_9ivzat | https://er1.s4oceanice.eu/erddap/tabledap/itase_chemistry_synthesis_group_9ivzat |
19 | IADC_Mooring_Aquadopp_35m | https://er1.s4oceanice.eu/erddap/tabledap/IADC_Mooring_Aquadopp_35m |
20 | IADC_Mooring_CTD_30m | https://er1.s4oceanice.eu/erddap/tabledap/IADC_Mooring_CTD_30m |
21 | IADC_Mooring_CTD_90m | https://er1.s4oceanice.eu/erddap/tabledap/IADC_Mooring_CTD_90m |
22 | MEOP_Animal-borne_profiles | https://er1.s4oceanice.eu/erddap/tabledap/MEOP_Animal-borne_profiles |
23 | NECKLACE | https://er1.s4oceanice.eu/erddap/tabledap/NECKLACE |
27 | ARGO_FLOATS_OCEANICE | https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE |
31 | seanoe_moored_time_series_south60S | https://er1.s4oceanice.eu/erddap/tabledap/seanoe_moored_time_series_south60S |
32 | Weddell_Sea_water_mass_age_meltwater_fractions | https://er1.s4oceanice.eu/erddap/tabledap/Weddell_Sea_water_mass_age_meltwater_fractions |
33 | POLARSTERN_CRUISES | https://er1.s4oceanice.eu/erddap/tabledap/POLARSTERN_CRUISES |
35 | NPI_Iceberg_database | https://er1.s4oceanice.eu/erddap/tabledap/NPI_Iceberg_database |
36 | SOCHIC_Cruise_2022_Agulhas_II_met | https://er1.s4oceanice.eu/erddap/tabledap/SOCHIC_Cruise_2022_Agulhas_II_met |
37 | SOCHIC_Cruise_2022_Agulhas_II_CTD | https://er1.s4oceanice.eu/erddap/tabledap/SOCHIC_Cruise_2022_Agulhas_II_CTD |
Using these URLs we will then be able to retrieve each dataset's data.
In this example we will use the ARGO_FLOATS_OCEANICE dataset, with the URL:
https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE
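The format suffix appended to this base URL selects the response format. As a quick sanity check that the dataset responds, the same kind of query can be requested in ERDDAP's `.json` format, whose response wraps the data in a `table` object; this is a minimal sketch (the rest of this notebook uses `.csv`):

```python
import requests

# requesting the distinct platform codes in .json format (a sketch; the notebook itself uses .csv)
url = 'https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE.json?PLATFORMCODE&distinct()'
table = requests.get(url).json()['table']
print(table['columnNames'])  # column names of the result
print(table['rows'][:3])     # first three rows
```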
## Get a list of variables for the dataset
Now we can request the dataset's metadata, which gives us a list of all the available variables and their respective data types. These variables can then be used in the following requests.
```python
BASE_URL = 'https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE'

# building the full url for the metadata and making the request
metadata_url = BASE_URL.replace('tabledap', 'info') + '/index.csv'
metadata_resp = requests.get(metadata_url)

metadata_df = pd.read_csv(io.StringIO(metadata_resp.text), sep=',')
# keeping only the rows that describe variables and dimensions
variables_df = metadata_df.loc[metadata_df['Row Type'].isin(['variable', 'dimension'])].copy()
variables_df.reset_index(drop=True, inplace=True)
variables_df = variables_df.drop(columns=['Row Type', 'Attribute Name', 'Value'])
variables_df
```
 | Variable Name | Data Type |
---|---|---|
0 | PLATFORMCODE | String |
1 | data_type | String |
2 | format_version | String |
3 | handbook_version | String |
4 | reference_date_time | double |
5 | date_creation | double |
6 | date_update | double |
7 | WMO | String |
8 | project_name | String |
9 | pi_name | String |
10 | cycle_number | int |
11 | direction | String |
12 | data_center | String |
13 | dc_reference | String |
14 | data_state_indicator | String |
15 | data_mode | String |
16 | platform_type | String |
17 | float_serial_no | String |
18 | firmware_version | String |
19 | wmo_inst_type | String |
20 | time | double |
21 | time_qc | String |
22 | time_location | double |
23 | latitude | double |
24 | longitude | double |
25 | position_qc | String |
26 | positioning_system | String |
27 | profile_pres_qc | String |
28 | profile_temp_qc | String |
29 | profile_psal_qc | String |
30 | vertical_sampling_scheme | String |
31 | config_mission_number | int |
32 | PRESS | float |
33 | pres_qc | String |
34 | pres_adjusted | float |
35 | pres_adjusted_qc | String |
36 | pres_adjusted_error | float |
37 | TEMP | float |
38 | TEMP_QC | String |
39 | temp_adjusted | float |
40 | TEMP_adjusted_QC | String |
41 | TEMP_adjusted_error | float |
42 | PSAL | float |
43 | PSAL_QC | String |
44 | PSAL_ADJUSTED | float |
45 | PSAL_ADJUSTED_QC | String |
46 | PSAL_ADJUSTED_error | float |
47 | DOXY | float |
48 | DOXY_QC | String |
49 | TEMP_DOXY | float |
50 | TEMP_DOXY_QC | String |
51 | molar_DOXY | float |
52 | molar_DOXY_QC | String |
53 | TURBIDITY | float |
54 | TURBIDITY_QC | String |
55 | CHLA | float |
56 | CHLA_QC | String |
57 | NITRATE | float |
58 | NITRATE_QC | String |
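The same `metadata_df` also contains one row per variable attribute (rows whose `Row Type` is `attribute`), so details such as units or long names can be looked up without extra requests. A minimal sketch, assuming attributes like `units` and `long_name` are defined for the chosen variable:

```python
# looking up the attributes of a single variable in the metadata already downloaded;
# 'units' and 'long_name' are common ERDDAP attributes but not guaranteed for every variable
temp_attrs = metadata_df[
    (metadata_df['Row Type'] == 'attribute') & (metadata_df['Variable Name'] == 'TEMP')
]
print(temp_attrs[['Attribute Name', 'Value']])
```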
## Get a list of platform codes
We will then perform another request to retrieve a list of platform codes for the selected dataset, which will be useful in the following queries to the ERDDAP.
```python
# The data format specified is 'csv': the first row contains the column names and the
# second the units of measurement (which will be removed from the DataFrame in these examples).
# Other possibilities are 'csv0', which returns only the data rows, and 'csvp', which returns
# the column names (with their units of measurement) in the first row and data from the second.
# The additional parameter &distinct() ensures we get only unique rows.
platforms_query = '.csv?PLATFORMCODE&distinct()'
platform_resp = requests.get(BASE_URL + platforms_query)
platforms_df = pd.read_csv(io.StringIO(platform_resp.text), sep=',')
platforms_df
```
 | PLATFORMCODE |
---|---|
0 | 1902687 |
1 | 3902582 |
2 | 4903780 |
3 | 4903786 |
4 | 5907087 |
5 | 5907093 |
6 | 6990621 |
7 | 6990622 |
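Note that pandas parses the codes as floats (e.g. `1902687.0` in the tables below) because the second row of the `csv` response holds the units and is read as NaN. If the codes are needed as clean strings, one option is to skip the units row and force a string dtype, as in this sketch:

```python
# re-reading the same response while skipping the units row (file row index 1)
# and keeping PLATFORMCODE as a string instead of a float
platforms_str_df = pd.read_csv(
    io.StringIO(platform_resp.text),
    skiprows=[1],                  # drop the units row
    dtype={'PLATFORMCODE': str},
)
platform_codes = platforms_str_df['PLATFORMCODE'].tolist()
```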
## Data gathering
Following are three examples of data queries:
### With PLATFORMCODE and time range
When building the data URL, a platform code can be inserted in the query to retrieve only that platform's data. In the following example the platform code '1902687' has been chosen, and the variables are:
PLATFORMCODE
time
latitude
longitude
TEMP
The query will look like:
?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP&PLATFORMCODE=%221902687%22&time%3E=2024-03-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z
It can be divided into two main parts:

`?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP`

Here `?` marks the start of the query parameters, and the rest is the list of variables we want as columns in the csv, separated by `%2C`, an encoded comma (,).

`&PLATFORMCODE=%221902687%22&time%3E=2024-03-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z`

After the list of variables we can add filters, separated by `&`. The chosen platform code, 1902687, has to be enclosed in encoded double quotes ("), represented by `%22`.

The syntax for the time range is:

`time%3E=2024-03-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z`

Here the other encoded characters are `%3E` (>), `%3C` (<) and `%3A` (:). The time has to be passed as an ISO 8601 string, in the format YYYY-MM-DDThh:mm:ssZ.
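Rather than hand-encoding every character, the filter string can also be built with `urllib.parse.quote` from the Python standard library. A minimal sketch producing exactly the query above (the notebook cell that follows writes the encoded literals out by hand instead):

```python
from urllib.parse import quote

# percent-encoding the raw query once, keeping the characters ERDDAP uses as
# structure (?, &, =) unescaped; everything else (, " > < :) gets encoded
raw_query = ('?PLATFORMCODE,time,latitude,longitude,TEMP'
             '&PLATFORMCODE="1902687"'
             '&time>=2024-03-29T09:45:00Z'
             '&time<=2024-04-29T09:45:00Z')
encoded_query = quote(raw_query, safe='?&=')
data_url = BASE_URL + '.csv' + encoded_query
```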
```python
platform_code = '1902687'
variables = '.csv?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP'
filters = f'&PLATFORMCODE=%22{platform_code}%22&time%3E=2023-04-29T00%3A00%3A00Z&time%3C=2024-04-29T00%3A00%3A00Z'

data_resp = requests.get(BASE_URL + variables + filters)
data_df = pd.read_csv(io.StringIO(data_resp.text), sep=',')
# sorting by time and dropping the units row, which has NaN in PLATFORMCODE
data_df = data_df.sort_values(by=['time'])
data_df.reset_index(drop=True, inplace=True)
data_df = data_df.dropna(subset=['PLATFORMCODE'])
data_df
```
 | PLATFORMCODE | time | latitude | longitude | TEMP |
---|---|---|---|---|---|
0 | 1902687.0 | 2024-01-12T00:33:00Z | -74.85373 | -102.42796666666666 | 1.107 |
1 | 1902687.0 | 2024-01-12T00:33:00Z | -74.85373 | -102.42796666666666 | 1.098 |
2 | 1902687.0 | 2024-01-12T00:33:00Z | -74.85373 | -102.42796666666666 | 1.091 |
3 | 1902687.0 | 2024-01-12T00:33:00Z | -74.85373 | -102.42796666666666 | 1.087 |
4 | 1902687.0 | 2024-01-12T00:33:00Z | -74.85373 | -102.42796666666666 | 1.084 |
... | ... | ... | ... | ... | ... |
787 | 1902687.0 | 2024-02-12T05:31:20Z | -74.89717 | -102.34899666666666 | -0.677 |
788 | 1902687.0 | 2024-02-12T05:31:20Z | -74.89717 | -102.34899666666666 | -0.913 |
789 | 1902687.0 | 2024-02-12T05:31:20Z | -74.89717 | -102.34899666666666 | -1.098 |
790 | 1902687.0 | 2024-02-12T05:31:20Z | -74.89717 | -102.34899666666666 | -1.194 |
791 | 1902687.0 | 2024-02-12T05:31:20Z | -74.89717 | -102.34899666666666 | 0.031 |
792 rows × 5 columns
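Once in a DataFrame, the `time` column is still a string; converting it with `pd.to_datetime` makes time-based operations straightforward. A small usage sketch computing the mean temperature per profile time:

```python
# parsing the ISO timestamps and averaging TEMP over each profile time
data_df['time'] = pd.to_datetime(data_df['time'])
mean_temp_per_profile = data_df.groupby('time')['TEMP'].mean()
print(mean_temp_per_profile.head())
```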
### With multiple platform codes
It is possible to select multiple platform codes when querying the data. This can be done by using a regex.
In this example the three platform codes used will be ‘4903780’, ‘4903786’ and ‘3902582’.
To build this part of the query the regex has the following syntax:

`PLATFORMCODE=~%22(platform_code_1%7Cplatform_code_2%7Cplatform_code_3)%22`

where `%7C` represents the symbol | (meaning OR).
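The alternation can be built from a plain Python list, so the same code works for any number of platforms; a short sketch:

```python
# building the (code1|code2|code3) alternation, with the encoded %7C separator,
# from a list of platform codes
codes = ['3902582', '4903780', '4903786']
regex_platform_code = '(' + '%7C'.join(codes) + ')'
# -> '(3902582%7C4903780%7C4903786)'
```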
```python
regex_platform_code = '(3902582%7C4903780%7C4903786)'
variables = '.csv?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP'
regex_filters = f'&PLATFORMCODE=~%22{regex_platform_code}%22&time%3E=2024-02-20T00%3A00%3A00Z&time%3C=2024-04-29T00%3A00%3A00Z'

regex_data_resp = requests.get(BASE_URL + variables + regex_filters)
regex_data_df = pd.read_csv(io.StringIO(regex_data_resp.text), sep=',')
# dropping the units row and checking which platform codes are present
regex_data_df = regex_data_df.dropna(subset=['PLATFORMCODE'])
unique_platform_codes = regex_data_df['PLATFORMCODE'].unique()
print('\nThis DataFrame contains the platform codes:', unique_platform_codes, '\n')
regex_data_df
```
This DataFrame contains the platform codes: [4903780. 4903786. 3902582.]
 | PLATFORMCODE | time | latitude | longitude | TEMP |
---|---|---|---|---|---|
1 | 4903780.0 | 2024-02-20T05:12:20Z | -67.26258518663079 | 80.31254458206934 | -1.823 |
2 | 4903780.0 | 2024-02-20T05:12:20Z | -67.26258518663079 | 80.31254458206934 | -1.821 |
3 | 4903780.0 | 2024-02-20T05:12:20Z | -67.26258518663079 | 80.31254458206934 | -1.82 |
4 | 4903780.0 | 2024-02-20T05:12:20Z | -67.26258518663079 | 80.31254458206934 | -1.82 |
5 | 4903780.0 | 2024-02-20T05:12:20Z | -67.26258518663079 | 80.31254458206934 | -1.828 |
... | ... | ... | ... | ... | ... |
1826 | 3902582.0 | 2024-02-25T05:42:20Z | -78.13948 | -174.97683 | -1.902 |
1827 | 3902582.0 | 2024-02-25T05:42:20Z | -78.13948 | -174.97683 | -1.901 |
1828 | 3902582.0 | 2024-02-25T05:42:20Z | -78.13948 | -174.97683 | -1.896 |
1829 | 3902582.0 | 2024-02-25T05:42:20Z | -78.13948 | -174.97683 | -1.894 |
1830 | 3902582.0 | 2024-02-25T05:42:20Z | -78.13948 | -174.97683 | -1.894 |
1830 rows × 5 columns
### With coordinates range
Another possibility when querying the data is to specify a range of coordinates. This can be done by adding the following filters to the query:

`latitude%3E=-75&latitude%3C=-30&longitude%3E=-50&longitude%3C=50`

This effectively selects platforms inside a bounding box delimited by:

- latitude greater than or equal to -75 and less than or equal to -30
- longitude greater than or equal to -50 and less than or equal to 50
```python
coords_variables = '.csv?PLATFORMCODE%2Clatitude%2Clongitude'
coords_filter = '&latitude%3E=-75&latitude%3C=-30&longitude%3E=-50&longitude%3C=50&distinct()'

coords_data_resp = requests.get(BASE_URL + coords_variables + coords_filter)
coords_data_df = pd.read_csv(io.StringIO(coords_data_resp.text), sep=',')
# dropping the units row
coords_data_df = coords_data_df.dropna(subset=['PLATFORMCODE'])
coords_data_df
```
 | PLATFORMCODE | latitude | longitude |
---|---|---|---|
1 | 6990622.0 | -68.96947443536386 | 28.087817418426756 |
2 | 6990622.0 | -68.94769391657476 | -21.311241844741982 |
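A small helper can generate this filter for any bounding box, keeping the encoded comparison operators in one place. A minimal sketch under the same conventions as above (the function name is our own):

```python
def bbox_filter(lat_min, lat_max, lon_min, lon_max):
    """Builds an ERDDAP lat/lon bounding-box filter with encoded >= (%3E=) and <= (%3C=)."""
    return (f'&latitude%3E={lat_min}&latitude%3C={lat_max}'
            f'&longitude%3E={lon_min}&longitude%3C={lon_max}')

# the same box as above: latitude in [-75, -30], longitude in [-50, 50]
coords_filter = bbox_filter(-75, -30, -50, 50) + '&distinct()'
```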
## Additional resources
For additional information about ERDDAP please visit:
https://er1.s4oceanice.eu/erddap/information.html
The Python libraries that have been used in this notebook are requests and pandas, together with the standard library's io module.