{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "7ydeifHy_KZ0" }, "source": [ "# OCEAN:ICE's ERDDAP querying" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For an interactive version of this page please visit the Google Colab at the link: \n", "[ Open in Google Colab ](https://colab.research.google.com/drive/1-PUqnk8Oa6uq-7_4uw6kLMMdCNSEb7QA)
\n", "(To open link in new tab press Ctrl + click)" ] }, { "cell_type": "markdown", "metadata": { "id": "W2Lk2xzd_KZ2" }, "source": [ "This notebook will illustrate how to build queries and make requests to [https://er1.s4oceanice.eu/erddap/index.html](https://er1.s4oceanice.eu/erddap/index.html) using Python." ] }, { "cell_type": "markdown", "metadata": { "id": "HI6J1aGWwOFq" }, "source": [ "## **Get a list of available datasets**" ] }, { "cell_type": "markdown", "metadata": { "id": "x0XY0JDFwTRv" }, "source": [ "To check what datasets are available in the ERDDAP and get their URLs the first step is to make a request to [https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.html](https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.html) \n", "performing a query that will allow us to get the tabledap datasets' ids and their URLs based on the data structure. For this example the griddap datasets have been omitted. After receiving the data it will be loaded into a pandas DataFrame." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "%%capture\n", "# !pip install requests pandas\n", "# these packages should be installed with the command above if running the code outside the Colab\n", "\n", "import requests\n", "import pandas as pd\n", "import io\n", "import warnings\n", "\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 677 }, "id": "Wdmc1ee3xNGp", "outputId": "d4e7cfca-ddda-413c-c370-5142e0807469", "tags": [ "hide-input" ] }, "outputs": [ { "data": { "application/vnd.google.colaboratory.intrinsic+json": { "summary": "{\n \"name\": \"datasets_df\",\n \"rows\": 20,\n \"fields\": [\n {\n \"column\": \"datasetID\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 20,\n \"samples\": [\n \"allDatasets\",\n \"NPI_Iceberg_database\",\n \"NECKLACE\"\n ],\n 
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"url\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 20,\n \"samples\": [\n \"https://er1.s4oceanice.eu/erddap/tabledap/allDatasets\",\n \"https://er1.s4oceanice.eu/erddap/tabledap/NPI_Iceberg_database\",\n \"https://er1.s4oceanice.eu/erddap/tabledap/NECKLACE\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}", "type": "dataframe", "variable_name": "datasets_df" }, "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datasetIDurl
1allDatasetshttps://er1.s4oceanice.eu/erddap/tabledap/allDatasets
2AAD_ASPeCt-Bio_historicalhttps://er1.s4oceanice.eu/erddap/tabledap/AAD_ASPeCt-Bio_historical
3AMUNDSEN_CRUISEShttps://er1.s4oceanice.eu/erddap/tabledap/AMUNDSEN_CRUISES
4ANT_TG_OCEAN_HEIGHThttps://er1.s4oceanice.eu/erddap/tabledap/ANT_TG_OCEAN_HEIGHT
5ARCTICNET_CRUISEShttps://er1.s4oceanice.eu/erddap/tabledap/ARCTICNET_CRUISES
6Australian_Antarctic_Programhttps://er1.s4oceanice.eu/erddap/tabledap/Australian_Antarctic_Program
7British_Antartica_Survey_webcamshttps://er1.s4oceanice.eu/erddap/tabledap/British_Antartica_Survey_webcams
8CCHDO_Bottlehttps://er1.s4oceanice.eu/erddap/tabledap/CCHDO_Bottle
9CCHDO_CTDhttps://er1.s4oceanice.eu/erddap/tabledap/CCHDO_CTD
11ARGO_FLOATS_OCEANICEhttps://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE
12SURVOSTRALhttps://er1.s4oceanice.eu/erddap/tabledap/SURVOSTRAL
13commandant_charcot_a5qvgchttps://er1.s4oceanice.eu/erddap/tabledap/commandant_charcot_a5qvgc
14DomeC_SP02https://er1.s4oceanice.eu/erddap/tabledap/DomeC_SP02
16itase_chemistry_synthesis_group_9ivzathttps://er1.s4oceanice.eu/erddap/tabledap/itase_chemistry_synthesis_group_9ivzat
17MEOP_Animal-borne_profileshttps://er1.s4oceanice.eu/erddap/tabledap/MEOP_Animal-borne_profiles
18NECKLACEhttps://er1.s4oceanice.eu/erddap/tabledap/NECKLACE
24seanoe_moored_time_series_south60Shttps://er1.s4oceanice.eu/erddap/tabledap/seanoe_moored_time_series_south60S
26NPI_Iceberg_databasehttps://er1.s4oceanice.eu/erddap/tabledap/NPI_Iceberg_database
27SOCHIC_Cruise_2022_Agulhas_II_methttps://er1.s4oceanice.eu/erddap/tabledap/SOCHIC_Cruise_2022_Agulhas_II_met
28SOCHIC_Cruise_2022_Agulhas_II_CTDhttps://er1.s4oceanice.eu/erddap/tabledap/SOCHIC_Cruise_2022_Agulhas_II_CTD
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "\n", "
\n", " \n", " \n", " \n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ " datasetID \\\n", "1 allDatasets \n", "2 AAD_ASPeCt-Bio_historical \n", "3 AMUNDSEN_CRUISES \n", "4 ANT_TG_OCEAN_HEIGHT \n", "5 ARCTICNET_CRUISES \n", "6 Australian_Antarctic_Program \n", "7 British_Antartica_Survey_webcams \n", "8 CCHDO_Bottle \n", "9 CCHDO_CTD \n", "11 ARGO_FLOATS_OCEANICE \n", "12 SURVOSTRAL \n", "13 commandant_charcot_a5qvgc \n", "14 DomeC_SP02 \n", "16 itase_chemistry_synthesis_group_9ivzat \n", "17 MEOP_Animal-borne_profiles \n", "18 NECKLACE \n", "24 seanoe_moored_time_series_south60S \n", "26 NPI_Iceberg_database \n", "27 SOCHIC_Cruise_2022_Agulhas_II_met \n", "28 SOCHIC_Cruise_2022_Agulhas_II_CTD \n", "\n", " url \n", "1 https://er1.s4oceanice.eu/erddap/tabledap/allDatasets \n", "2 https://er1.s4oceanice.eu/erddap/tabledap/AAD_ASPeCt-Bio_historical \n", "3 https://er1.s4oceanice.eu/erddap/tabledap/AMUNDSEN_CRUISES \n", "4 https://er1.s4oceanice.eu/erddap/tabledap/ANT_TG_OCEAN_HEIGHT \n", "5 https://er1.s4oceanice.eu/erddap/tabledap/ARCTICNET_CRUISES \n", "6 https://er1.s4oceanice.eu/erddap/tabledap/Australian_Antarctic_Program \n", "7 https://er1.s4oceanice.eu/erddap/tabledap/British_Antartica_Survey_webcams \n", "8 https://er1.s4oceanice.eu/erddap/tabledap/CCHDO_Bottle \n", "9 https://er1.s4oceanice.eu/erddap/tabledap/CCHDO_CTD \n", "11 https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE \n", "12 https://er1.s4oceanice.eu/erddap/tabledap/SURVOSTRAL \n", "13 https://er1.s4oceanice.eu/erddap/tabledap/commandant_charcot_a5qvgc \n", "14 https://er1.s4oceanice.eu/erddap/tabledap/DomeC_SP02 \n", "16 https://er1.s4oceanice.eu/erddap/tabledap/itase_chemistry_synthesis_group_9ivzat \n", "17 https://er1.s4oceanice.eu/erddap/tabledap/MEOP_Animal-borne_profiles \n", "18 https://er1.s4oceanice.eu/erddap/tabledap/NECKLACE \n", "24 https://er1.s4oceanice.eu/erddap/tabledap/seanoe_moored_time_series_south60S \n", "26 https://er1.s4oceanice.eu/erddap/tabledap/NPI_Iceberg_database \n", "27 
https://er1.s4oceanice.eu/erddap/tabledap/SOCHIC_Cruise_2022_Agulhas_II_met \n", "28 https://er1.s4oceanice.eu/erddap/tabledap/SOCHIC_Cruise_2022_Agulhas_II_CTD " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datasets_url = 'https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.csv?datasetID%2Ctabledap'\n", "\n", "# building the full url and making the request\n", "datasets_resp = requests.get(datasets_url)\n", "# loading the data into a pandas DataFrame\n", "datasets_df = pd.read_csv(io.StringIO(datasets_resp.text), sep=',')\n", "datasets_df['url'] = datasets_df['tabledap']\n", "\n", "# dropping rows where all values are NaN\n", "df_cleaned = datasets_df.dropna(how='all')\n", "df_cleaned = df_cleaned.dropna(subset=['url'])\n", "\n", "# removing the now obsolete column and showing the content\n", "datasets_df = df_cleaned.drop(columns=['tabledap'])\n", "pd.set_option('display.max_colwidth', None)\n", "datasets_df" ] }, { "cell_type": "markdown", "metadata": { "id": "NGuJYRDI5X-Q" }, "source": [ "Using these URLs we will then be able to retrieve each dataset's data. \n", "In this example we will use the ARGO_FLOATS_OCEANICE dataset, with the URL: \n", "[https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE](https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE)" ] }, { "cell_type": "markdown", "metadata": { "id": "2GI1pHT8_KZ3" }, "source": [ "## **Get a list of variables for the dataset**" ] }, { "cell_type": "markdown", "metadata": { "id": "iHEskNpp_KZ3" }, "source": [ "Now we can make a request to the dataset's metadata, which will give us a list of all the available variables and their data types.\n", "These variables can then be used in the following requests."
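, "\n", "\n",
"The requests in the following sections use hand-written percent-encoded query strings (for example ```%2C``` for a comma). As a convenience, a helper such as the hypothetical ```build_query``` below can produce the same encoding with the standard library's ```urllib.parse.quote``` (this is an illustrative sketch, not part of ERDDAP or of this notebook's workflow):\n",
"\n",
"```python\n",
"from urllib.parse import quote\n",
"\n",
"def build_query(base_url, variables, constraints=()):\n",
"    # the variables become the comma-separated projection (',' is encoded as %2C)\n",
"    query = '.csv?' + quote(','.join(variables), safe='')\n",
"    # '=' and '~' stay readable; '>', '<', ':' and double quotes get encoded\n",
"    for c in constraints:\n",
"        query += '&' + quote(c, safe='=~')\n",
"    return base_url + query\n",
"\n",
"build_query('https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE',\n",
"            ['PLATFORMCODE', 'time', 'TEMP'],\n",
"            ['PLATFORMCODE=\"1902687\"', 'time>=2024-03-29T09:45:00Z'])\n",
"```\n",
"\n",
"The result matches the style of the hand-built query strings shown later in this notebook."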
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "Y0gjus3u_KZ3", "outputId": "b291f8b4-32e7-4967-d09c-d5f0b3ee5dde", "tags": [ "hide-input" ] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ ":10: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " variables_df.drop(columns=['Row Type', 'Attribute Name', 'Value'], inplace=True)\n" ] }, { "data": { "application/vnd.google.colaboratory.intrinsic+json": { "summary": "{\n \"name\": \"variables_df\",\n \"rows\": 59,\n \"fields\": [\n {\n \"column\": \"Variable Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 59,\n \"samples\": [\n \"PLATFORMCODE\",\n \"date_creation\",\n \"pres_adjusted\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Data Type\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"double\",\n \"float\",\n \"String\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}", "type": "dataframe", "variable_name": "variables_df" }, "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Variable NameData Type
0PLATFORMCODEString
1data_typeString
2format_versionString
3handbook_versionString
4reference_date_timedouble
5date_creationdouble
6date_updatedouble
7WMOString
8project_nameString
9pi_nameString
10cycle_numberint
11directionString
12data_centerString
13dc_referenceString
14data_state_indicatorString
15data_modeString
16platform_typeString
17float_serial_noString
18firmware_versionString
19wmo_inst_typeString
20timedouble
21time_qcString
22time_locationdouble
23latitudedouble
24longitudedouble
25position_qcString
26positioning_systemString
27profile_pres_qcString
28profile_temp_qcString
29profile_psal_qcString
30vertical_sampling_schemeString
31config_mission_numberint
32PRESSfloat
33pres_qcString
34pres_adjustedfloat
35pres_adjusted_qcString
36pres_adjusted_errorfloat
37TEMPfloat
38TEMP_QCString
39temp_adjustedfloat
40TEMP_adjusted_QCString
41TEMP_adjusted_errorfloat
42PSALfloat
43PSAL_QCString
44PSAL_ADJUSTEDfloat
45PSAL_ADJUSTED_QCString
46PSAL_ADJUSTED_errorfloat
47DOXYfloat
48DOXY_QCString
49TEMP_DOXYfloat
50TEMP_DOXY_QCString
51molar_DOXYfloat
52molar_DOXY_QCString
53TURBIDITYfloat
54TURBIDITY_QCString
55CHLAfloat
56CHLA_QCString
57NITRATEfloat
58NITRATE_QCString
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "\n", "
\n", " \n", " \n", " \n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ " Variable Name Data Type\n", "0 PLATFORMCODE String\n", "1 data_type String\n", "2 format_version String\n", "3 handbook_version String\n", "4 reference_date_time double\n", "5 date_creation double\n", "6 date_update double\n", "7 WMO String\n", "8 project_name String\n", "9 pi_name String\n", "10 cycle_number int\n", "11 direction String\n", "12 data_center String\n", "13 dc_reference String\n", "14 data_state_indicator String\n", "15 data_mode String\n", "16 platform_type String\n", "17 float_serial_no String\n", "18 firmware_version String\n", "19 wmo_inst_type String\n", "20 time double\n", "21 time_qc String\n", "22 time_location double\n", "23 latitude double\n", "24 longitude double\n", "25 position_qc String\n", "26 positioning_system String\n", "27 profile_pres_qc String\n", "28 profile_temp_qc String\n", "29 profile_psal_qc String\n", "30 vertical_sampling_scheme String\n", "31 config_mission_number int\n", "32 PRESS float\n", "33 pres_qc String\n", "34 pres_adjusted float\n", "35 pres_adjusted_qc String\n", "36 pres_adjusted_error float\n", "37 TEMP float\n", "38 TEMP_QC String\n", "39 temp_adjusted float\n", "40 TEMP_adjusted_QC String\n", "41 TEMP_adjusted_error float\n", "42 PSAL float\n", "43 PSAL_QC String\n", "44 PSAL_ADJUSTED float\n", "45 PSAL_ADJUSTED_QC String\n", "46 PSAL_ADJUSTED_error float\n", "47 DOXY float\n", "48 DOXY_QC String\n", "49 TEMP_DOXY float\n", "50 TEMP_DOXY_QC String\n", "51 molar_DOXY float\n", "52 molar_DOXY_QC String\n", "53 TURBIDITY float\n", "54 TURBIDITY_QC String\n", "55 CHLA float\n", "56 CHLA_QC String\n", "57 NITRATE float\n", "58 NITRATE_QC String" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "BASE_URL = 'https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE'\n", "\n", "# building the full url for the metadata and making the request\n", "metadata_url = BASE_URL.replace('tabledap', 'info') + '/index.csv'\n", "\n", "metadata_resp = 
requests.get(metadata_url)\n", "metadata_df = pd.read_csv(io.StringIO(metadata_resp.text), sep=',')\n", "# .copy() gives an independent DataFrame, avoiding pandas' SettingWithCopyWarning below\n", "variables_df = metadata_df.loc[metadata_df['Row Type'].isin(['variable', 'dimension'])].copy()\n", "variables_df.reset_index(drop=True, inplace=True)\n", "variables_df.drop(columns=['Row Type', 'Attribute Name', 'Value'], inplace=True)\n", "variables_df" ] }, { "cell_type": "markdown", "metadata": { "id": "WsM4KzezErso" }, "source": [ "## **Get a list of platform codes**" ] }, { "cell_type": "markdown", "metadata": { "id": "G0gdlsevEgQB" }, "source": [ "We will then perform another request to retrieve a list of platform codes for the selected dataset, which will be useful in the following queries to the ERDDAP." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 300 }, "id": "_xPd_j6bET-p", "outputId": "b86251d9-cff1-40a7-a6fc-bb2f4789fb3e", "tags": [ "hide-input" ] }, "outputs": [ { "data": { "application/vnd.google.colaboratory.intrinsic+json": { "summary": "{\n \"name\": \"platforms_df\",\n \"rows\": 8,\n \"fields\": [\n {\n \"column\": \"PLATFORMCODE\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1696174,\n \"min\": 1902687,\n \"max\": 6990622,\n \"num_unique_values\": 8,\n \"samples\": [\n 3902582,\n 5907093,\n 1902687\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}", "type": "dataframe", "variable_name": "platforms_df" }, "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLATFORMCODE
01902687
13902582
24903780
34903786
45907087
55907093
66990621
76990622
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "\n", "
\n", " \n", " \n", " \n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ " PLATFORMCODE\n", "0 1902687\n", "1 3902582\n", "2 4903780\n", "3 4903786\n", "4 5907087\n", "5 5907093\n", "6 6990621\n", "7 6990622" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "platforms_query = '.csv?PLATFORMCODE&distinct()'\n", "\n", "# The data format specified is 'csv' (in which the first row contains the column names and the second the units of measurment, which will be removed from the dataframe in these examples).\n", "# Other possibilities are 'csv0' which will return only the data rows and 'csvp', which will return a csv with the column names (and their unit of measurment) as first row and data starting from the second.\n", "# the additional parameter &distinct() will ensure we will get only unique rows\n", "\n", "platform_resp = requests.get(BASE_URL + platforms_query)\n", "platforms_df = pd.read_csv(io.StringIO(platform_resp.text), sep=',')\n", "platforms_df" ] }, { "cell_type": "markdown", "metadata": { "id": "g4Qqknls_KZ3" }, "source": [ "## **Data gathering**" ] }, { "cell_type": "markdown", "metadata": { "id": "IzKOfHrT_KZ3" }, "source": [ "Following are three examples of data queries:\n", "\n", "### With PLATFORMCODE and time range\n", "\n", " When building the URL to get the data a platform code can be inserted in the query to get the data relative to the platform.\n", " In the following example the platform code '1902687' has been chosen and the variables are:\n", " - PLATFORMCODE\n", " - time\n", " - latitude\n", " - longitude\n", " - TEMP\n", "\n", " The query will look like:\n", "\n", " ```?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP&PLATFORMCODE=%221902687%22&time%3E=2024-03-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z```\n", "\n", " It can be divided into two main parts:\n", "\n", "1. 
```?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP```\n", "\n", " Where ```?``` indicates the start of the query parameters and the rest is the list of variables we want as columns in the csv, separated by ```%2C```, an encoded comma (,).\n", "\n", "2. ```&PLATFORMCODE=%221902687%22&time%3E=2024-03-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z```\n", "\n", " After the list of variables we can add filters, separated by ```&```.\n", "\n", " The platform code chosen is 1902687 and it has to be inserted between encoded double quotes (\"), represented by ```%22```.\n", "\n", " The syntax for the time range is:\n", "\n", " ```time%3E=2024-03-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z```\n", "\n", " Here the other encoded characters are ```%3E``` (>), ```%3C``` (<) and ```%3A``` (:).\n", " \n", " The time has to be passed as an ISO 8601 string, with the format YYYY-MM-DDThh:mm:ssZ." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "id": "Y_k4utuW_KZ4", "outputId": "8c79c64b-f0fc-4a1c-a430-aeacd5023680", "tags": [ "hide-input" ] }, "outputs": [ { "data": { "application/vnd.google.colaboratory.intrinsic+json": { "summary": "{\n \"name\": \"data_df\",\n \"rows\": 792,\n \"fields\": [\n {\n \"column\": \"PLATFORMCODE\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.0,\n \"min\": 1902687.0,\n \"max\": 1902687.0,\n \"num_unique_values\": 1,\n \"samples\": [\n 1902687.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"time\",\n \"properties\": {\n \"dtype\": \"object\",\n \"num_unique_values\": 8,\n \"samples\": [\n \"2024-01-13T05:40:20Z\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"latitude\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 8,\n \"samples\": [\n \"-74.83251166666666\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"longitude\",\n 
\"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 8,\n \"samples\": [\n \"-102.37396\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"TEMP\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 536,\n \"samples\": [\n \"1.127\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}", "type": "dataframe", "variable_name": "data_df" }, "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLATFORMCODEtimelatitudelongitudeTEMP
01902687.02024-01-12T00:33:00Z-74.85373-102.427966666666661.107
11902687.02024-01-12T00:33:00Z-74.85373-102.427966666666661.098
21902687.02024-01-12T00:33:00Z-74.85373-102.427966666666661.091
31902687.02024-01-12T00:33:00Z-74.85373-102.427966666666661.087
41902687.02024-01-12T00:33:00Z-74.85373-102.427966666666661.084
..................
7871902687.02024-02-12T05:31:20Z-74.89717-102.34899666666666-0.677
7881902687.02024-02-12T05:31:20Z-74.89717-102.34899666666666-0.913
7891902687.02024-02-12T05:31:20Z-74.89717-102.34899666666666-1.098
7901902687.02024-02-12T05:31:20Z-74.89717-102.34899666666666-1.194
7911902687.02024-02-12T05:31:20Z-74.89717-102.348996666666660.031
\n", "

792 rows × 5 columns

\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "\n", "
\n", " \n", " \n", " \n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ " PLATFORMCODE time latitude longitude \\\n", "0 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 \n", "1 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 \n", "2 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 \n", "3 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 \n", "4 1902687.0 2024-01-12T00:33:00Z -74.85373 -102.42796666666666 \n", ".. ... ... ... ... \n", "787 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 \n", "788 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 \n", "789 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 \n", "790 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 \n", "791 1902687.0 2024-02-12T05:31:20Z -74.89717 -102.34899666666666 \n", "\n", " TEMP \n", "0 1.107 \n", "1 1.098 \n", "2 1.091 \n", "3 1.087 \n", "4 1.084 \n", ".. ... \n", "787 -0.677 \n", "788 -0.913 \n", "789 -1.098 \n", "790 -1.194 \n", "791 0.031 \n", "\n", "[792 rows x 5 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "platform_code = '1902687'\n", "\n", "variables = '.csv?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP'\n", "filters = f'&PLATFORMCODE=%22{platform_code}%22&time%3E=2023-04-29T00%3A00%3A00Z&time%3C=2024-04-29T00%3A00%3A00Z'\n", "\n", "data_resp = requests.get(BASE_URL + variables + filters)\n", "data_df = pd.read_csv(io.StringIO(data_resp.text), sep=',')\n", "\n", "data_df=data_df.sort_values(by=[\"time\"])\n", "data_df.reset_index(drop=True, inplace=True)\n", "data_df = data_df.dropna(subset=['PLATFORMCODE'])\n", "data_df" ] }, { "cell_type": "markdown", "metadata": { "id": "MnlSGZrmRzyz" }, "source": [ "### With multiple platform codes" ] }, { "cell_type": "markdown", "metadata": { "id": "yufHeR6NR8-z" }, "source": [ "It is possible to select multiple platform codes when querying the data. 
This can be done by using a regex.\n", "\n", "In this example the three platform codes used will be '4903780', '4903786' and '3902582'.\n", "\n", "To build this part of the query, the regex filter will have the following syntax:\n", "\n", "```PLATFORMCODE=~%22(platform_code_1%7Cplatform_code_2%7Cplatform_code_3)%22```\n", "\n", "Where ```=~``` selects regular-expression matching and ```%7C``` represents the symbol ```|``` (meaning OR).\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 476 }, "id": "oeew627AVIqu", "outputId": "287671fe-b20c-4ecd-e7db-ad5cfed77412", "tags": [ "hide-input" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "This DataFrame contains the platform codes: [4903780. 4903786. 3902582.] \n", "\n" ] }, { "data": { "application/vnd.google.colaboratory.intrinsic+json": { "repr_error": "0", "type": "dataframe", "variable_name": "regex_data_df" }, "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLATFORMCODEtimelatitudelongitudeTEMP
14903780.02024-02-20T05:12:20Z-67.2625851866307980.31254458206934-1.823
24903780.02024-02-20T05:12:20Z-67.2625851866307980.31254458206934-1.821
34903780.02024-02-20T05:12:20Z-67.2625851866307980.31254458206934-1.82
44903780.02024-02-20T05:12:20Z-67.2625851866307980.31254458206934-1.82
54903780.02024-02-20T05:12:20Z-67.2625851866307980.31254458206934-1.828
..................
18263902582.02024-02-25T05:42:20Z-78.13948-174.97683-1.902
18273902582.02024-02-25T05:42:20Z-78.13948-174.97683-1.901
18283902582.02024-02-25T05:42:20Z-78.13948-174.97683-1.896
18293902582.02024-02-25T05:42:20Z-78.13948-174.97683-1.894
18303902582.02024-02-25T05:42:20Z-78.13948-174.97683-1.894
\n", "

1830 rows × 5 columns

\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "\n", "
\n", " \n", " \n", " \n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ " PLATFORMCODE time latitude \\\n", "1 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 \n", "2 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 \n", "3 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 \n", "4 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 \n", "5 4903780.0 2024-02-20T05:12:20Z -67.26258518663079 \n", "... ... ... ... \n", "1826 3902582.0 2024-02-25T05:42:20Z -78.13948 \n", "1827 3902582.0 2024-02-25T05:42:20Z -78.13948 \n", "1828 3902582.0 2024-02-25T05:42:20Z -78.13948 \n", "1829 3902582.0 2024-02-25T05:42:20Z -78.13948 \n", "1830 3902582.0 2024-02-25T05:42:20Z -78.13948 \n", "\n", " longitude TEMP \n", "1 80.31254458206934 -1.823 \n", "2 80.31254458206934 -1.821 \n", "3 80.31254458206934 -1.82 \n", "4 80.31254458206934 -1.82 \n", "5 80.31254458206934 -1.828 \n", "... ... ... \n", "1826 -174.97683 -1.902 \n", "1827 -174.97683 -1.901 \n", "1828 -174.97683 -1.896 \n", "1829 -174.97683 -1.894 \n", "1830 -174.97683 -1.894 \n", "\n", "[1830 rows x 5 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "regex_platform_code = '(3902582%7C4903780%7C4903786)'\n", "\n", "variables = '.csv?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP'\n", "regex_filters = f'&PLATFORMCODE=~%22{regex_platform_code}%22&time%3E=2024-02-20T00%3A00%3A00Z&time%3C=2024-04-29T00%3A00%3A00Z'\n", "\n", "regex_data_resp = requests.get(BASE_URL + variables + regex_filters)\n", "regex_data_df = pd.read_csv(io.StringIO(regex_data_resp.text), sep=',')\n", "\n", "regex_data_df = regex_data_df.dropna(subset=['PLATFORMCODE'])\n", "\n", "unique_platform_codes = regex_data_df['PLATFORMCODE'].unique()\n", "print('\\nThis DataFrame contains the platform codes:', unique_platform_codes, '\\n')\n", "regex_data_df" ] }, { "cell_type": "markdown", "metadata": { "id": "qaWqgS56bnS3" }, "source": [ "### With coordinates range" ] }, { "cell_type": "markdown", "metadata": { "id": "R3Ua9HhUbtTE" }, "source": [ "Another 
possibility when querying the data is to specify a range of coordinates.\n", "This can be done by adding the following filters to the query:\n", "\n", "```latitude%3E=-75&latitude%3C=-30&longitude%3E=-50&longitude%3C=50```\n", "\n", "This effectively selects platforms inside a bounding box delimited by:\n", "\n", "- latitude greater than or equal to -75 and less than or equal to -30\n", "\n", "and\n", "\n", "- longitude greater than or equal to -50 and less than or equal to 50." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 125 }, "id": "03XyfGE8fQ4e", "outputId": "c02044ec-0567-45b1-db5f-604adbbcdde8", "tags": [ "hide-input" ] }, "outputs": [ { "data": { "application/vnd.google.colaboratory.intrinsic+json": { "summary": "{\n \"name\": \"coords_data_df\",\n \"rows\": 2,\n \"fields\": [\n {\n \"column\": \"PLATFORMCODE\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.0,\n \"min\": 6990622.0,\n \"max\": 6990622.0,\n \"num_unique_values\": 1,\n \"samples\": [\n 6990622.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"latitude\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"-68.94769391657476\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"longitude\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"-21.311241844741982\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}", "type": "dataframe", "variable_name": "coords_data_df" }, "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLATFORMCODElatitudelongitude
16990622.0-68.9694744353638628.087817418426756
26990622.0-68.94769391657476-21.311241844741982
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "\n", "
\n", " \n", " \n", " \n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ " PLATFORMCODE latitude longitude\n", "1 6990622.0 -68.96947443536386 28.087817418426756\n", "2 6990622.0 -68.94769391657476 -21.311241844741982" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coords_variables = '.csv?PLATFORMCODE%2Clatitude%2Clongitude'\n", "coords_filter = '&latitude%3E=-75&latitude%3C=-30&longitude%3E=-50&longitude%3C=50&distinct()'\n", "\n", "coords_data_resp = requests.get(BASE_URL + coords_variables + coords_filter)\n", "coords_data_df = pd.read_csv(io.StringIO(coords_data_resp.text), sep=',')\n", "\n", "coords_data_df = coords_data_df.dropna(subset=['PLATFORMCODE'])\n", "\n", "coords_data_df" ] }, { "cell_type": "markdown", "metadata": { "id": "h8u9WXnHLpS5" }, "source": [ "### **Additional resources**" ] }, { "cell_type": "markdown", "metadata": { "id": "KiKTNro9LHFz" }, "source": [ "For additional information about ERDDAP please visit: \n", "\n", " [https://er1.s4oceanice.eu/erddap/information.html](https://er1.s4oceanice.eu/erddap/information.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Python libraries that have been used in this notebook are:\n", "- [requests](https://requests.readthedocs.io/en/latest/)\n", "- [pandas](https://pandas.pydata.org/)\n", "- [io](https://docs.python.org/3/library/io.html)" ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" } }, "nbformat": 4, "nbformat_minor": 0 }