{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "7ydeifHy_KZ0" }, "source": [ "# **OCEAN ICE's ERDDAP querying: griddap** #" ] }, { "cell_type": "markdown", "metadata": { "id": "W2Lk2xzd_KZ2" }, "source": [ "This notebook will illustrate how to build queries and make requests to [https://er1.s4oceanice.eu/erddap/index.html](https://er1.s4oceanice.eu/erddap/index.html) using Python." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For an interactive version of this page please visit the Google Colab: \n", "[ Open in Google Colab ](https://colab.research.google.com/drive/1AVW9G1iuSB_WfFoPf8luOp-py1mO67YX)
\n", "(To open link in new tab press Ctrl + click)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively this notebook can be opened with Binder by following the link:\n", "[OCEAN ICE'S ERDDAP querying: griddap](https://mybinder.org/v2/gh/s4oceanice/literacy.s4oceanice/main?urlpath=%2Fdoc%2Ftree%2Fnotebooks_binder%2Foceanice_erddap_querying_griddap.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "id": "OQSLQ574_KZ2", "vscode": { "languageId": "plaintext" } }, "source": [ "## **Setup**\n", "\n", "To begin we need to import the necessary libraries." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6FfRJZc2_KZ2" }, "outputs": [], "source": [ "# !pip install requests pandas\n", "# these packages should be installed with the command above if running the code locally\n", "\n", "import requests\n", "import pandas as pd\n", "import io" ] }, { "cell_type": "markdown", "metadata": { "id": "HI6J1aGWwOFq" }, "source": [ "## **Get a list of available datasets**" ] }, { "cell_type": "markdown", "metadata": { "id": "x0XY0JDFwTRv" }, "source": [ "To check what griddap datasets are available in the ERDDAP and get their URLs the first step is to make a request to [https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.html](https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.html) using the URL that will allow us to get the datasets' ids and their URLs based on the data structure. After receiving the data it will be loaded into a pandas DataFrame." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 519 }, "id": "Wdmc1ee3xNGp", "outputId": "1f6643f3-6d4d-41cc-e5a8-e2779cc4db6c" }, "outputs": [ { "data": { "application/vnd.google.colaboratory.intrinsic+json": { "summary": "{\n \"name\": \"cleaned_df\",\n \"rows\": 15,\n \"fields\": [\n {\n \"column\": \"datasetID\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 15,\n \"samples\": [\n \"SSP585_FWF_1990_2300_OceanSectors\",\n \"SSP585_FWF_1990_2300_AIS\",\n \"INSITU_GLO_PHY_TS_OA_MY_013_052\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"url\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 15,\n \"samples\": [\n \"https://er1.s4oceanice.eu/erddap/griddap/SSP585_FWF_1990_2300_OceanSectors\",\n \"https://er1.s4oceanice.eu/erddap/griddap/SSP585_FWF_1990_2300_AIS\",\n \"https://er1.s4oceanice.eu/erddap/griddap/INSITU_GLO_PHY_TS_OA_MY_013_052\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}", "type": "dataframe", "variable_name": "cleaned_df" }, "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datasetIDurl
0INSITU_GLO_PHY_TS_OA_MY_013_052https://er1.s4oceanice.eu/erddap/griddap/INSITU_GLO_PHY_TS_OA_MY_013_052
1seanoe_slev_anomaly_geostrophic_currentshttps://er1.s4oceanice.eu/erddap/griddap/seanoe_slev_anomaly_geostrophic_currents
2GLORYS12V1_sea_floor_potential_temphttps://er1.s4oceanice.eu/erddap/griddap/GLORYS12V1_sea_floor_potential_temp
3GLODAPv2_2016b_MappedClimatologieshttps://er1.s4oceanice.eu/erddap/griddap/GLODAPv2_2016b_MappedClimatologies
4NOAA_OISST_v2https://er1.s4oceanice.eu/erddap/griddap/NOAA_OISST_v2
5SOCATv2024_tracks_gridded_monthlyhttps://er1.s4oceanice.eu/erddap/griddap/SOCATv2024_tracks_gridded_monthly
6EU_circumpolar_seaice_prod_fluxes_1992_2023https://er1.s4oceanice.eu/erddap/griddap/EU_circumpolar_seaice_prod_fluxes_1992_2023
7SSP585_FWF_1990_2300_ZwallyBasinshttps://er1.s4oceanice.eu/erddap/griddap/SSP585_FWF_1990_2300_ZwallyBasins
8SSP126_FWF_1990_2300_ZwallyBasinshttps://er1.s4oceanice.eu/erddap/griddap/SSP126_FWF_1990_2300_ZwallyBasins
9SSP585_FWF_1990_2300_OceanSectorshttps://er1.s4oceanice.eu/erddap/griddap/SSP585_FWF_1990_2300_OceanSectors
10SSP126_FWF_1990_2300_OceanSectorshttps://er1.s4oceanice.eu/erddap/griddap/SSP126_FWF_1990_2300_OceanSectors
11SSP585_FWF_1990_2300_AIShttps://er1.s4oceanice.eu/erddap/griddap/SSP585_FWF_1990_2300_AIS
12SSP126_FWF_1990_2300_AIShttps://er1.s4oceanice.eu/erddap/griddap/SSP126_FWF_1990_2300_AIS
13PSMSL_Absolute_sea_level_trendhttps://er1.s4oceanice.eu/erddap/griddap/PSMSL_Absolute_sea_level_trend
14SCAR_RAATDhttps://er1.s4oceanice.eu/erddap/griddap/SCAR_RAATD
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "\n", "
\n", " \n", " \n", " \n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ " datasetID \\\n", "0 INSITU_GLO_PHY_TS_OA_MY_013_052 \n", "1 seanoe_slev_anomaly_geostrophic_currents \n", "2 GLORYS12V1_sea_floor_potential_temp \n", "3 GLODAPv2_2016b_MappedClimatologies \n", "4 NOAA_OISST_v2 \n", "5 SOCATv2024_tracks_gridded_monthly \n", "6 EU_circumpolar_seaice_prod_fluxes_1992_2023 \n", "7 SSP585_FWF_1990_2300_ZwallyBasins \n", "8 SSP126_FWF_1990_2300_ZwallyBasins \n", "9 SSP585_FWF_1990_2300_OceanSectors \n", "10 SSP126_FWF_1990_2300_OceanSectors \n", "11 SSP585_FWF_1990_2300_AIS \n", "12 SSP126_FWF_1990_2300_AIS \n", "13 PSMSL_Absolute_sea_level_trend \n", "14 SCAR_RAATD \n", "\n", " url \n", "0 https://er1.s4oceanice.eu/erddap/griddap/INSITU_GLO_PHY_TS_OA_MY_013_052 \n", "1 https://er1.s4oceanice.eu/erddap/griddap/seanoe_slev_anomaly_geostrophic_currents \n", "2 https://er1.s4oceanice.eu/erddap/griddap/GLORYS12V1_sea_floor_potential_temp \n", "3 https://er1.s4oceanice.eu/erddap/griddap/GLODAPv2_2016b_MappedClimatologies \n", "4 https://er1.s4oceanice.eu/erddap/griddap/NOAA_OISST_v2 \n", "5 https://er1.s4oceanice.eu/erddap/griddap/SOCATv2024_tracks_gridded_monthly \n", "6 https://er1.s4oceanice.eu/erddap/griddap/EU_circumpolar_seaice_prod_fluxes_1992_2023 \n", "7 https://er1.s4oceanice.eu/erddap/griddap/SSP585_FWF_1990_2300_ZwallyBasins \n", "8 https://er1.s4oceanice.eu/erddap/griddap/SSP126_FWF_1990_2300_ZwallyBasins \n", "9 https://er1.s4oceanice.eu/erddap/griddap/SSP585_FWF_1990_2300_OceanSectors \n", "10 https://er1.s4oceanice.eu/erddap/griddap/SSP126_FWF_1990_2300_OceanSectors \n", "11 https://er1.s4oceanice.eu/erddap/griddap/SSP585_FWF_1990_2300_AIS \n", "12 https://er1.s4oceanice.eu/erddap/griddap/SSP126_FWF_1990_2300_AIS \n", "13 https://er1.s4oceanice.eu/erddap/griddap/PSMSL_Absolute_sea_level_trend \n", "14 https://er1.s4oceanice.eu/erddap/griddap/SCAR_RAATD " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datasets_url = 
'https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.csv?datasetID%2Cgriddap'\n", "\n", "# request and load into a DataFrame\n", "datasets_resp = requests.get(datasets_url)\n", "datasets_df = pd.read_csv(io.StringIO(datasets_resp.text), sep=',')\n", "\n", "# drop rows where griddap is NaN (datasets without griddap access)\n", "datasets_df = datasets_df.dropna(subset=['griddap'])\n", "\n", "# copy griddap into a url column and drop the original\n", "datasets_df['url'] = datasets_df['griddap']\n", "cleaned_df = datasets_df.drop(columns=['griddap'])\n", "\n", "pd.set_option('display.max_colwidth', None)\n", "cleaned_df = cleaned_df.reset_index(drop=True)\n", "cleaned_df" ] }, { "cell_type": "markdown", "metadata": { "id": "NGuJYRDI5X-Q" }, "source": [ "Using these URLs we will then be able to request their data. \n", "In this example we will use the seanoe_slev_anomaly_geostrophic_currents dataset, with the URL:\n", "[https://er1.s4oceanice.eu/erddap/griddap/seanoe_slev_anomaly_geostrophic_currents](https://er1.s4oceanice.eu/erddap/griddap/seanoe_slev_anomaly_geostrophic_currents)" ] }, { "cell_type": "markdown", "metadata": { "id": "2GI1pHT8_KZ3" }, "source": [ "## **Get a list of variables for the dataset**" ] }, { "cell_type": "markdown", "metadata": { "id": "iHEskNpp_KZ3" }, "source": [ "Now we can make a request for the dataset's metadata, which will give us a list of all the available variables and their respective data types.\n", "These variables can then be used in the following requests."
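, "\n",
"As a sketch of the URL handling involved (pure string manipulation, no request is made here; the example URL is the one used in the next code cell), the metadata endpoint is derived by swapping the `griddap` path segment for `info` and appending `/index.csv`:\n",
"\n",
"```python\n",
"griddap_url = 'https://er1.s4oceanice.eu/erddap/griddap/seanoe_slev_anomaly_geostrophic_currents'\n",
"# ERDDAP serves each dataset's metadata under /erddap/info/<datasetID>/index.csv\n",
"info_url = griddap_url.replace('/griddap/', '/info/') + '/index.csv'\n",
"print(info_url)\n",
"```"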
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 459 }, "id": "Y0gjus3u_KZ3", "outputId": "bd53a1c0-2887-4d0c-8f59-cfac0c9f3c79" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Time Coverage Start: 2013-04-01T00:00:00Z\n", "Time Coverage End: 2019-07-31T00:00:00Z\n", "Geospatial max Lat: 349.0\n", "Geospatial min Lat: 0.0\n", "Geospatial max Lon: 349.0\n", "Geospatial min Lon: 0.0\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipython-input-810593471.py:19: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " variables_df.drop(columns=['Row Type', 'Attribute Name', 'Value'], inplace=True)\n" ] }, { "data": { "application/vnd.google.colaboratory.intrinsic+json": { "summary": "{\n \"name\": \"variables_df\",\n \"rows\": 7,\n \"fields\": [\n {\n \"column\": \"Variable Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 7,\n \"samples\": [\n \"time\",\n \"longitude\",\n \"U\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Data Type\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"double\",\n \"short\",\n \"float\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}", "type": "dataframe", "variable_name": "variables_df" }, "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Variable NameData Type
0timedouble
1longitudeshort
2latitudeshort
3slafloat
4formal_errorfloat
5Ufloat
6Vfloat
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "\n", "
\n", " \n", " \n", " \n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ " Variable Name Data Type\n", "0 time double\n", "1 longitude short\n", "2 latitude short\n", "3 sla float\n", "4 formal_error float\n", "5 U float\n", "6 V float" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "BASE_URL = 'https://er1.s4oceanice.eu/erddap/griddap/seanoe_slev_anomaly_geostrophic_currents'\n", "\n", "# building the full url for the metadata and making the request\n", "metadata_url = BASE_URL.replace('tabledap', 'info').replace('griddap', 'info') + '/index.csv'\n", "\n", "metadata_resp = requests.get(metadata_url)\n", "metadata_df = pd.read_csv(io.StringIO(metadata_resp.text), sep=',')\n", "\n", "# Extract time_coverage_start and time_coverage_end\n", "time_coverage_start = metadata_df.loc[metadata_df['Attribute Name'] == 'time_coverage_start', 'Value'].iloc[0]\n", "time_coverage_end = metadata_df.loc[metadata_df['Attribute Name'] == 'time_coverage_end', 'Value'].iloc[0]\n", "geospatial_lat_max = metadata_df.loc[metadata_df['Attribute Name'] == 'geospatial_lat_max', 'Value'].iloc[0]\n", "geospatial_lat_min = metadata_df.loc[metadata_df['Attribute Name'] == 'geospatial_lat_min', 'Value'].iloc[0]\n", "geospatial_lon_max = metadata_df.loc[metadata_df['Attribute Name'] == 'geospatial_lon_max', 'Value'].iloc[0]\n", "geospatial_lon_min = metadata_df.loc[metadata_df['Attribute Name'] == 'geospatial_lon_min', 'Value'].iloc[0]\n", "\n", "variables_df = metadata_df.loc[metadata_df['Row Type'].isin(['variable', 'dimension'])]\n", "variables_df.reset_index(drop=True, inplace=True)\n", "variables_df.drop(columns=['Row Type', 'Attribute Name', 'Value'], inplace=True)\n", "\n", "print(f\"Time Coverage Start: {time_coverage_start}\")\n", "print(f\"Time Coverage End: {time_coverage_end}\")\n", "print(f\"Geospatial max Lat: {geospatial_lat_max}\")\n", "print(f\"Geospatial min Lat: {geospatial_lat_min}\")\n", "print(f\"Geospatial max Lon: {geospatial_lon_max}\")\n", "print(f\"Geospatial min Lon: 
{geospatial_lon_min}\")\n", "\n", "variables_df" ] }, { "cell_type": "markdown", "metadata": { "id": "WsM4KzezErso" }, "source": [ "## **Get a list of platform codes**" ] }, { "cell_type": "markdown", "metadata": { "id": "G0gdlsevEgQB" }, "source": [ "We will then perform another request to retrieve all the sla values in the time range and the bounding coordinates we want (in this case we will use the time_coverage_end value and the maximum range between geospatial_lat_min/geospatial_lat_max and geospatial_lon_min/geospatial_lon_max available values, see the output above).\n", "\n", "**N.B. The wider the range the more the loading time will be. Loading could fail if the range is too wide.**\n", "\n", "In other datasets it is possible that there are no time or coordinate ranges. Anyway when there is a variable with a range of values, the query follows the same structure: `.csv?` + `variable_we_want_to_see` (in this case sla) + `%5B` + `eventually_another_variable` + `%5B` + `(min value):1:(max value)` + `%5D%5B` + `(min value 2):1:(max value 2)` and so on if another value range is available." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 423 }, "id": "_xPd_j6bET-p", "outputId": "673a2f96-5de7-4523-a026-4c4a1832d4c6" }, "outputs": [ { "data": { "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "platforms_df" }, "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
timelongitudelatitudesla
0UTCdegrees_eastdegrees_northm
12019-07-31T00:00:00Z009.96921E36
22019-07-31T00:00:00Z019.96921E36
32019-07-31T00:00:00Z029.96921E36
42019-07-31T00:00:00Z039.96921E36
...............
1224962019-07-31T00:00:00Z3493459.96921E36
1224972019-07-31T00:00:00Z3493469.96921E36
1224982019-07-31T00:00:00Z3493479.96921E36
1224992019-07-31T00:00:00Z3493489.96921E36
1225002019-07-31T00:00:00Z3493499.96921E36
\n", "

122501 rows × 4 columns

\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "\n", "
\n", " \n", " \n", " \n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ " time longitude latitude sla\n", "0 UTC degrees_east degrees_north m\n", "1 2019-07-31T00:00:00Z 0 0 9.96921E36\n", "2 2019-07-31T00:00:00Z 0 1 9.96921E36\n", "3 2019-07-31T00:00:00Z 0 2 9.96921E36\n", "4 2019-07-31T00:00:00Z 0 3 9.96921E36\n", "... ... ... ... ...\n", "122496 2019-07-31T00:00:00Z 349 345 9.96921E36\n", "122497 2019-07-31T00:00:00Z 349 346 9.96921E36\n", "122498 2019-07-31T00:00:00Z 349 347 9.96921E36\n", "122499 2019-07-31T00:00:00Z 349 348 9.96921E36\n", "122500 2019-07-31T00:00:00Z 349 349 9.96921E36\n", "\n", "[122501 rows x 4 columns]" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "platforms_query = f'.csv?sla%5B({time_coverage_end}):1:({time_coverage_end})%5D%5B({geospatial_lat_min}):1:({geospatial_lat_max})%5D%5B({geospatial_lon_min}):1:({geospatial_lon_max})%5D'\n", "\n", "# The data format specified is 'csv' (in which the first row contains the column names and the second the units of measurment, which will be removed from the dataframe in these examples).\n", "# Other possibilities are 'csv0' which will return only the data rows and 'csvp', which will return a csv with the column names (and their unit of measurment) as first row and data starting from the second.\n", "# the additional parameter &distinct() will ensure we will get only unique rows\n", "\n", "platform_resp = requests.get(BASE_URL + platforms_query)\n", "# Skip the first two rows (header and units)\n", "platforms_df = pd.read_csv(io.StringIO(platform_resp.text), sep=',')\n", "platforms_df" ] }, { "cell_type": "markdown", "metadata": { "id": "h8u9WXnHLpS5" }, "source": [ "### **Additional resources**" ] }, { "cell_type": "markdown", "metadata": { "id": "KiKTNro9LHFz" }, "source": [ "For additional information about ERDDAP please visit: \n", " \n", " [https://er1.s4oceanice.eu/erddap/information.html](https://er1.s4oceanice.eu/erddap/information.html)\n", "\n", "The webpages for the Python's libraries that 
have been used in this notebook are:\n", "- REQUESTS: https://requests.readthedocs.io/en/latest/\n", "- PANDAS: https://pandas.pydata.org/\n", "- IO: https://docs.python.org/3/library/io.html" ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" } }, "nbformat": 4, "nbformat_minor": 0 }