OCEAN ICE’s ERDDAP querying: tabledap#
This notebook will illustrate how to build queries and make requests to https://er1.s4oceanice.eu/erddap/index.html using Python.
For an interactive version of this page please visit the Google Colab:
Open in Google Colab
(To open link in new tab press Ctrl + click)
Alternatively this notebook can be opened with Binder by following the link: OCEAN ICE’S ERDDAP querying: tabledap
Setup#
To begin we need to import the necessary libraries.
# !pip install requests pandas
# these packages should be installed with the command above if running the code locally
import requests
import pandas as pd
import io
Get a list of available tabledap datasets#
To check which datasets are available in the ERDDAP and to get their URLs, the first step is to make a request to https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.html, using a URL that asks only for the dataset IDs and their tabledap URLs. After receiving the data it will be loaded into a pandas DataFrame.
datasets_url = 'https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.csv?datasetID%2Ctabledap'
# request and load into DataFrame
datasets_resp = requests.get(datasets_url)
datasets_df = pd.read_csv(io.StringIO(datasets_resp.text), sep=',')
# drop rows where tabledap is NaN
datasets_df = datasets_df.dropna(subset=['tabledap'])
# add url column
datasets_df['url'] = datasets_df['tabledap']
cleaned_df = datasets_df.drop(columns=['tabledap'])
pd.set_option('display.max_colwidth', None)
cleaned_df = cleaned_df.reset_index(drop=True)
cleaned_df
 | datasetID | url |
---|---|---|
0 | allDatasets | https://er1.s4oceanice.eu/erddap/tabledap/allDatasets |
1 | AAD_ASPeCt-Bio_historical | https://er1.s4oceanice.eu/erddap/tabledap/AAD_ASPeCt-Bio_historical |
2 | AADC_sea_ice_measurements_database_1985_2007 | https://er1.s4oceanice.eu/erddap/tabledap/AADC_sea_ice_measurements_database_1985_2007 |
3 | AMUNDSEN_CRUISES | https://er1.s4oceanice.eu/erddap/tabledap/AMUNDSEN_CRUISES |
4 | ANT_TG_OCEAN_HEIGHT | https://er1.s4oceanice.eu/erddap/tabledap/ANT_TG_OCEAN_HEIGHT |
5 | AWS_3_hourly | https://er1.s4oceanice.eu/erddap/tabledap/AWS_3_hourly |
6 | AWS_Daily_25 | https://er1.s4oceanice.eu/erddap/tabledap/AWS_Daily_25 |
7 | AWS_Daily_75 | https://er1.s4oceanice.eu/erddap/tabledap/AWS_Daily_75 |
8 | AWS_Monthly_25 | https://er1.s4oceanice.eu/erddap/tabledap/AWS_Monthly_25 |
9 | AWS_Monthly_75 | https://er1.s4oceanice.eu/erddap/tabledap/AWS_Monthly_75 |
10 | ARCTICNET_CRUISES | https://er1.s4oceanice.eu/erddap/tabledap/ARCTICNET_CRUISES |
11 | Australian_Antarctic_Program | https://er1.s4oceanice.eu/erddap/tabledap/Australian_Antarctic_Program |
12 | British_Antartica_Survey_webcams | https://er1.s4oceanice.eu/erddap/tabledap/British_Antartica_Survey_webcams |
13 | CCHDO_Bottle | https://er1.s4oceanice.eu/erddap/tabledap/CCHDO_Bottle |
14 | CCHDO_CTD | https://er1.s4oceanice.eu/erddap/tabledap/CCHDO_CTD |
15 | SURVOSTRAL | https://er1.s4oceanice.eu/erddap/tabledap/SURVOSTRAL |
16 | IADC_ctd_cassandra_bottle | https://er1.s4oceanice.eu/erddap/tabledap/IADC_ctd_cassandra_bottle |
17 | IADC_ctd_cassandra_downcast | https://er1.s4oceanice.eu/erddap/tabledap/IADC_ctd_cassandra_downcast |
18 | IADC_s1_ctd | https://er1.s4oceanice.eu/erddap/tabledap/IADC_s1_ctd |
19 | PLATFORMS_METADATA | https://er1.s4oceanice.eu/erddap/tabledap/PLATFORMS_METADATA |
20 | DATA_TRAWLER_SST | https://er1.s4oceanice.eu/erddap/tabledap/DATA_TRAWLER_SST |
21 | MEOP_Animal-borne_profiles | https://er1.s4oceanice.eu/erddap/tabledap/MEOP_Animal-borne_profiles |
22 | NECKLACE | https://er1.s4oceanice.eu/erddap/tabledap/NECKLACE |
23 | ARGO_FLOATS_OCEANICE_DE | https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE_DE |
24 | ARGO_FLOATS_OCEANICE | https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE |
25 | ARGO_FLOATS_OCEANICE_UK | https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE_UK |
26 | seanoe_ctemp_asal_profile_90S45S | https://er1.s4oceanice.eu/erddap/tabledap/seanoe_ctemp_asal_profile_90S45S |
27 | seanoe_moored_time_series_south60S | https://er1.s4oceanice.eu/erddap/tabledap/seanoe_moored_time_series_south60S |
28 | Weddell_Sea_water_mass_age_meltwater_fractions | https://er1.s4oceanice.eu/erddap/tabledap/Weddell_Sea_water_mass_age_meltwater_fractions |
29 | ACE_bottle_CTD20200406CURRSGCMR | https://er1.s4oceanice.eu/erddap/tabledap/ACE_bottle_CTD20200406CURRSGCMR |
30 | ACE_ctd_CTD20200406CURRSGCMR | https://er1.s4oceanice.eu/erddap/tabledap/ACE_ctd_CTD20200406CURRSGCMR |
31 | POLARSTERN_CRUISES | https://er1.s4oceanice.eu/erddap/tabledap/POLARSTERN_CRUISES |
32 | NPI_Iceberg_database | https://er1.s4oceanice.eu/erddap/tabledap/NPI_Iceberg_database |
33 | SOCHIC_Cruise_2022_Agulhas_II_met | https://er1.s4oceanice.eu/erddap/tabledap/SOCHIC_Cruise_2022_Agulhas_II_met |
Using these URLs we will then be able to get their data.
In this example we will use the ARGO_FLOATS_OCEANICE dataset, with the URL:
https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE
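Since each tabledap URL is simply the ERDDAP base followed by the dataset ID, the same URL can also be built directly, without the table lookup (a minimal sketch; the constant names are ours):

```python
# Build the dataset URL from the ERDDAP tabledap base and the dataset ID.
ERDDAP_TABLEDAP = 'https://er1.s4oceanice.eu/erddap/tabledap'
DATASET_ID = 'ARGO_FLOATS_OCEANICE'

BASE_URL = f'{ERDDAP_TABLEDAP}/{DATASET_ID}'
print(BASE_URL)  # https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE
```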
Get a list of variables for the dataset#
Now we can make a request for the dataset’s metadata, which will give us a list of all the available variables and their respective data types. These variables can then be used in the following requests.
BASE_URL = 'https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE'
# building the full url for the metadata and making the request
metadata_url = BASE_URL.replace('tabledap', 'info').replace('griddap', 'info') + '/index.csv'
metadata_resp = requests.get(metadata_url)
metadata_df = pd.read_csv(io.StringIO(metadata_resp.text), sep=',')
variables_df = metadata_df.loc[metadata_df['Row Type'].isin(['variable', 'dimension'])].copy()  # .copy() avoids a SettingWithCopyWarning on the edits below
variables_df.reset_index(drop=True, inplace=True)
variables_df.drop(columns=['Row Type', 'Attribute Name', 'Value'], inplace=True)
variables_df
variables_df
 | Variable Name | Data Type |
---|---|---|
0 | PLATFORMCODE | String |
1 | data_type | String |
2 | format_version | String |
3 | handbook_version | String |
4 | reference_date_time | double |
5 | date_creation | double |
6 | date_update | double |
7 | WMO | String |
8 | project_name | String |
9 | pi_name | String |
10 | cycle_number | int |
11 | direction | String |
12 | data_center | String |
13 | dc_reference | String |
14 | data_state_indicator | String |
15 | data_mode | String |
16 | platform_type | String |
17 | float_serial_no | String |
18 | firmware_version | String |
19 | wmo_inst_type | String |
20 | time | double |
21 | time_qc | String |
22 | time_location | double |
23 | latitude | double |
24 | longitude | double |
25 | position_qc | String |
26 | positioning_system | String |
27 | profile_pres_qc | String |
28 | profile_temp_qc | String |
29 | profile_psal_qc | String |
30 | vertical_sampling_scheme | String |
31 | config_mission_number | int |
32 | PRESS | float |
33 | pres_qc | String |
34 | pres_adjusted | float |
35 | pres_adjusted_qc | String |
36 | pres_adjusted_error | float |
37 | TEMP | float |
38 | TEMP_QC | String |
39 | temp_adjusted | float |
40 | TEMP_adjusted_QC | String |
41 | TEMP_adjusted_error | float |
42 | PSAL | float |
43 | PSAL_QC | String |
44 | PSAL_ADJUSTED | float |
45 | PSAL_ADJUSTED_QC | String |
46 | PSAL_ADJUSTED_error | float |
47 | DOXY | float |
48 | DOXY_QC | String |
49 | TEMP_DOXY | float |
50 | TEMP_DOXY_QC | String |
51 | molar_DOXY | float |
52 | molar_DOXY_QC | String |
53 | TURBIDITY | float |
54 | TURBIDITY_QC | String |
55 | CHLA | float |
56 | CHLA_QC | String |
57 | NITRATE | float |
58 | NITRATE_QC | String |
Get a list of platform codes#
We will then perform another request to retrieve a list of platform codes for the selected dataset, which will be useful in the following queries to the ERDDAP.
platforms_query = '.csv?PLATFORMCODE&distinct()'
# The requested data format is 'csv' (the first row contains the column names and the second the units of measurement; that units row is removed from the DataFrame in these examples).
# Other options are 'csv0', which returns only the data rows, and 'csvp', which returns the column names (with their units of measurement) in the first row and data starting from the second.
# the additional parameter &distinct() ensures we get only unique rows
platform_resp = requests.get(BASE_URL + platforms_query)
platforms_df = pd.read_csv(io.StringIO(platform_resp.text), sep=',')
platforms_df
 | PLATFORMCODE |
---|---|
0 | 1902687 |
1 | 3902582 |
2 | 4903780 |
3 | 4903786 |
4 | 5907087 |
5 | 5907093 |
6 | 6990621 |
7 | 6990622 |
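To see how the plain ‘csv’ format’s units row behaves, here is a small offline sketch with a made-up two-row payload (the values are hypothetical, not real ERDDAP output); `skiprows=[1]` is an alternative to the `dropna` approach used in the cells below:

```python
import io
import pandas as pd

# Hypothetical 'csv' response: column names, units row, then data rows.
raw = (
    'PLATFORMCODE,time,TEMP\n'
    ',UTC,degree_Celsius\n'
    '1902687,2024-01-12T00:33:00Z,1.107\n'
)

# Skipping the units row lets pandas parse TEMP as a float right away.
df = pd.read_csv(io.StringIO(raw), skiprows=[1])
print(df.dtypes['TEMP'])  # float64
```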
Data gathering#
Following are three examples of data queries:
With PLATFORMCODE and time range#
When building the URL to get the data, a platform code can be inserted in the query to retrieve only that platform’s data. In the following example the platform code ‘1902687’ has been chosen and the variables are:
PLATFORMCODE
time
latitude
longitude
TEMP
The query will look like:
?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP&PLATFORMCODE=%221902687%22&time%3E=2023-04-29T00%3A00%3A00Z&time%3C=2024-04-29T00%3A00%3A00Z
It can be divided into two main parts:
?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP
Here ? marks the start of the query parameters; the rest is the list of variables we want as columns in the csv, separated by %2C, an encoded comma (,).
&PLATFORMCODE=%221902687%22&time%3E=2023-04-29T00%3A00%3A00Z&time%3C=2024-04-29T00%3A00%3A00Z
After the list of variables we can add filters, separated by &. The chosen platform code, 1902687, has to be enclosed between encoded double quotes ("), represented by %22. The syntax for the time range is:
time%3E=2023-04-29T00%3A00%3A00Z&time%3C=2024-04-29T00%3A00%3A00Z
The other encoded characters are %3E (>), %3C (<) and %3A (:). The time has to be passed as an ISO 8601 string, in the format YYYY-MM-DDThh:mm:ssZ.
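Rather than typing the percent-encodings by hand, the query string can be assembled with Python’s standard urllib.parse.quote; `build_query` below is a helper name introduced here for illustration:

```python
from urllib.parse import quote

def build_query(variables, filters):
    # Variables are joined with %2C (encoded commas); each filter keeps its
    # '=' sign while >, <, : and " are percent-encoded, as ERDDAP expects.
    var_part = '%2C'.join(variables)
    filter_part = ''.join('&' + quote(f, safe='=') for f in filters)
    return '.csv?' + var_part + filter_part

query = build_query(
    ['PLATFORMCODE', 'time', 'latitude', 'longitude', 'TEMP'],
    ['PLATFORMCODE="1902687"',
     'time>=2023-04-29T00:00:00Z',
     'time<=2024-04-29T00:00:00Z'])
print(query)
```

The printed string is the same as the hand-encoded variables and filters used in the next cell.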
platform_code = '1902687'
variables = '.csv?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP'
filters = f'&PLATFORMCODE=%22{platform_code}%22&time%3E=2023-04-29T00%3A00%3A00Z&time%3C=2024-04-29T00%3A00%3A00Z'
data_resp = requests.get(BASE_URL + variables + filters)
data_df = pd.read_csv(io.StringIO(data_resp.text), sep=',')
data_df = data_df.sort_values(by=['time'])
data_df.reset_index(drop=True, inplace=True)
data_df = data_df.dropna(subset=['PLATFORMCODE'])
data_df
 | PLATFORMCODE | time | latitude | longitude | TEMP |
---|---|---|---|---|---|
0 | 1902687.0 | 2024-01-12T00:33:00Z | -74.85373 | -102.42796666666666 | 1.107 |
1 | 1902687.0 | 2024-01-12T00:33:00Z | -74.85373 | -102.42796666666666 | 1.098 |
2 | 1902687.0 | 2024-01-12T00:33:00Z | -74.85373 | -102.42796666666666 | 1.091 |
3 | 1902687.0 | 2024-01-12T00:33:00Z | -74.85373 | -102.42796666666666 | 1.087 |
4 | 1902687.0 | 2024-01-12T00:33:00Z | -74.85373 | -102.42796666666666 | 1.084 |
... | ... | ... | ... | ... | ... |
787 | 1902687.0 | 2024-02-12T05:31:20Z | -74.89717 | -102.34899666666666 | -0.677 |
788 | 1902687.0 | 2024-02-12T05:31:20Z | -74.89717 | -102.34899666666666 | -0.913 |
789 | 1902687.0 | 2024-02-12T05:31:20Z | -74.89717 | -102.34899666666666 | -1.098 |
790 | 1902687.0 | 2024-02-12T05:31:20Z | -74.89717 | -102.34899666666666 | -1.194 |
791 | 1902687.0 | 2024-02-12T05:31:20Z | -74.89717 | -102.34899666666666 | 0.031 |
792 rows × 5 columns
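Once loaded, the rows can be summarised with ordinary pandas operations, for example a per-profile mean temperature, since each profile shares a single timestamp (an offline sketch with made-up values, not real float data):

```python
import pandas as pd

# Hypothetical subset of the columns returned by the query above.
df = pd.DataFrame({
    'time': ['2024-01-12T00:33:00Z'] * 2 + ['2024-02-12T05:31:20Z'] * 2,
    'TEMP': [1.107, 1.098, -0.677, -0.913],
})

# Mean TEMP per profile, grouping on the shared timestamp.
mean_temp = df.groupby('time')['TEMP'].mean()
print(mean_temp)
```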
With multiple platform codes#
It is possible to select multiple platform codes when querying the data. This can be done by using a regex.
In this example the three platform codes used will be ‘4903780’, ‘4903786’ and ‘3902582’.
To build this part of the query the regex will have the syntax:
PLATFORMCODE=~%22(platform_code_1%7Cplatform_code_2%7Cplatform_code_3)%22
where %22 are again encoded double quotes, %7C represents the symbol | (meaning OR), and the =~ operator tells ERDDAP to match the column against a regular expression.
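The regex group can be assembled from a plain list of codes; a minimal sketch:

```python
# Join the platform codes with %7C (encoded |) and wrap them in parentheses.
codes = ['3902582', '4903780', '4903786']
regex_platform_code = '(' + '%7C'.join(codes) + ')'
print(regex_platform_code)  # (3902582%7C4903780%7C4903786)
```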
regex_platform_code = '(3902582%7C4903780%7C4903786)'
variables = '.csv?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP'
regex_filters = f'&PLATFORMCODE=~%22{regex_platform_code}%22&time%3E=2024-02-20T00%3A00%3A00Z&time%3C=2024-04-29T00%3A00%3A00Z'
regex_data_resp = requests.get(BASE_URL + variables + regex_filters)
regex_data_df = pd.read_csv(io.StringIO(regex_data_resp.text), sep=',')
regex_data_df = regex_data_df.dropna(subset=['PLATFORMCODE'])
unique_platform_codes = regex_data_df['PLATFORMCODE'].unique()
print('\nThis DataFrame contains the platform codes:', unique_platform_codes, '\n')
regex_data_df
This DataFrame contains the platform codes: [4903780. 4903786. 3902582.]
 | PLATFORMCODE | time | latitude | longitude | TEMP |
---|---|---|---|---|---|
1 | 4903780.0 | 2024-02-20T05:12:20Z | -67.26258518663079 | 80.31254458206934 | -1.823 |
2 | 4903780.0 | 2024-02-20T05:12:20Z | -67.26258518663079 | 80.31254458206934 | -1.821 |
3 | 4903780.0 | 2024-02-20T05:12:20Z | -67.26258518663079 | 80.31254458206934 | -1.82 |
4 | 4903780.0 | 2024-02-20T05:12:20Z | -67.26258518663079 | 80.31254458206934 | -1.82 |
5 | 4903780.0 | 2024-02-20T05:12:20Z | -67.26258518663079 | 80.31254458206934 | -1.828 |
... | ... | ... | ... | ... | ... |
1826 | 3902582.0 | 2024-02-25T05:42:20Z | -78.13948 | -174.97683 | -1.902 |
1827 | 3902582.0 | 2024-02-25T05:42:20Z | -78.13948 | -174.97683 | -1.901 |
1828 | 3902582.0 | 2024-02-25T05:42:20Z | -78.13948 | -174.97683 | -1.896 |
1829 | 3902582.0 | 2024-02-25T05:42:20Z | -78.13948 | -174.97683 | -1.894 |
1830 | 3902582.0 | 2024-02-25T05:42:20Z | -78.13948 | -174.97683 | -1.894 |
1830 rows × 5 columns
With coordinates range#
Another possibility when querying the data is to specify a range of coordinates. This can be done by inserting in the query filters the following:
latitude%3E=-75&latitude%3C=-30&longitude%3E=-50&longitude%3C=50
Effectively selecting platforms inside a bounding box delimited by:
latitude greater than or equal to -75 and less than or equal to -30
and
longitude greater than or equal to -50 and less than or equal to 50.
coords_variables = '.csv?PLATFORMCODE%2Clatitude%2Clongitude'
coords_filter = '&latitude%3E=-75&latitude%3C=-30&longitude%3E=-50&longitude%3C=50&distinct()'
coords_data_resp = requests.get(BASE_URL + coords_variables + coords_filter)
coords_data_df = pd.read_csv(io.StringIO(coords_data_resp.text), sep=',')
coords_data_df = coords_data_df.dropna(subset=['PLATFORMCODE'])
coords_data_df
 | PLATFORMCODE | latitude | longitude |
---|---|---|---|
1 | 6990622.0 | -68.96947443536386 | 28.087817418426756 |
2 | 6990622.0 | -68.94769391657476 | -21.311241844741982 |
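One practical note for all of the queries above: ERDDAP signals problems (a mistyped variable name, a query matching no rows) with an HTTP error status and a text error page rather than csv, so checking the status before parsing gives clearer failures. `fetch_erddap_csv` below is a helper name introduced here as a sketch:

```python
import io

import pandas as pd
import requests

def fetch_erddap_csv(url, timeout=60):
    """Fetch an ERDDAP csv URL and return a DataFrame.

    raise_for_status() converts ERDDAP's HTTP error responses into
    exceptions instead of letting pd.read_csv fail on an error page.
    """
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()
    return pd.read_csv(io.StringIO(resp.text))
```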
Additional resources#
For additional information about ERDDAP please visit:
https://er1.s4oceanice.eu/erddap/information.html
The web pages for the Python libraries used in this notebook are: