{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "RBM4f40pCHl1"
},
"source": [
"# **Interactive Exploration of In-Situ Temperature Datasets with WMS Overlay** #"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LyTR8-41CNq5"
},
"source": [
"**Purpose**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For an interactive version of this page, please open it in Google Colab:\n",
"[Open in Google Colab](https://colab.research.google.com/drive/1Yr5fVtIdYTf7WBZG_6adw6603NRgj4Ur)\n",
"(To open the link in a new tab, press Ctrl + click)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, this notebook can be opened in Binder via the following link:\n",
"[Interactive Exploration of In-Situ Temperature Datasets with WMS Overlay](https://mybinder.org/v2/gh/s4oceanice/literacy.s4oceanice/main?urlpath=%2Fdoc%2Ftree%2Fnotebooks_binder%2Foceanice_cora_overlay.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RRXnDNhDCRcy"
},
"source": [
"This notebook provides an **interactive map interface** to explore in-situ oceanographic temperature datasets served through the **OCEAN ICE ERDDAP server**.\n",
"\n",
"It combines two key elements:\n",
"\n",
"* **In-situ observational data** (ARGO floats, cruises, trawlers, etc.), filtered by dataset and time.\n",
"\n",
"* **Coriolis Ocean Dataset for Reanalysis (CORA) [INSITU_GLO_PHY_TS_OA_MY_013_052](https://www.seanoe.org/data/00351/46219/)**, displayed via **WMS overlay** for spatial context.\n",
"\n",
"Users can:\n",
"\n",
"* Select a dataset and browse its time coverage (start → end).\n",
"\n",
"* Choose a specific month and load available measurements.\n",
"\n",
"* Visualize the data on a map, with points color-coded by temperature.\n",
"\n",
"* Overlay WMS climatologies for comparison.\n",
"\n",
"* Inspect values interactively by clicking markers.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4FCRezPlDEBR"
},
"source": [
"**Data sources**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JH5ePb9sDHsR"
},
"source": [
"**In-situ datasets** (OCEAN ICE ERDDAP):\n",
"\n",
"* [SURVOSTRAL](https://er1.s4oceanice.eu/erddap/tabledap/SURVOSTRAL.html) (ship of opportunity line in the Southern Ocean)\n",
"\n",
"* [DATA_TRAWLER_SST](https://er1.s4oceanice.eu/erddap/tabledap/DATA_TRAWLER_SST.html) (surface trawler measurements)\n",
"\n",
"* ARGO_FLOATS_OCEANICE ([DE](https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE_DE.html), [UK](https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE_UK.html), [global](https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE.html))\n",
"\n",
"* [SOCHIC Cruise 2022](https://er1.s4oceanice.eu/erddap/tabledap/SOCHIC_Cruise_2022_Agulhas_II_met.html) (Agulhas II)\n",
"\n",
"\n",
"**Global gridded climatology** (Copernicus Marine):\n",
"\n",
"* Product: **Coriolis Ocean Dataset for Reanalysis (CORA)** [INSITU_GLO_PHY_TS_OA_MY_013_052](https://www.seanoe.org/data/00351/46219/)\n",
"\n",
"* Content: Objective analysis of temperature (TEMP) and salinity (PSAL) from 1960 → present.\n",
"\n",
"* Accessed through ncWMS for visual overlays."
]
},
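{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal illustration of how an ncWMS endpoint like this can be queried directly, the sketch below assembles a `GetMap` URL by hand. The layer name `INSITU_GLO_PHY_TS_OA_MY_013_052/TEMP` and the request parameters are assumptions based on common ncWMS conventions, not values verified against this server:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch: assemble an ncWMS GetMap URL by hand.\n",
"# The layer name below is an assumption and may differ on the real server.\n",
"from urllib.parse import urlencode\n",
"\n",
"wms_base = 'https://prod-erddap.emodnet-physics.eu/ncWMS/wms'\n",
"params = {\n",
"    'SERVICE': 'WMS',\n",
"    'VERSION': '1.3.0',\n",
"    'REQUEST': 'GetMap',\n",
"    'LAYERS': 'INSITU_GLO_PHY_TS_OA_MY_013_052/TEMP',  # assumed layer name\n",
"    'CRS': 'CRS:84',\n",
"    'BBOX': '-180,-90,180,90',\n",
"    'WIDTH': '1024',\n",
"    'HEIGHT': '512',\n",
"    'FORMAT': 'image/png',\n",
"    'TRANSPARENT': 'true'\n",
"}\n",
"print(wms_base + '?' + urlencode(params))"
]
},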
{
"cell_type": "markdown",
"metadata": {
"id": "bLmO6c9VFV3E"
},
"source": [
"**Instructions for using this notebook**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nEOssUJcFgWT"
},
"source": [
"Run each code cell by clicking the Play button (▶️) on the left side of each grey code block. This will execute the code in order and allow all features to work properly."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mtIXesFbFmZ9"
},
"source": [
"**Explaining the code**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ap6Ot9UHFoHl"
},
"source": [
"**1. Import required libraries and define data sources**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IeZfFlQ4FsZB"
},
"source": [
"The following libraries are used in this notebook:\n",
"* **Data handling**: `pandas`, `numpy`\n",
"* **Requests & parsing**: `requests`, `urllib.error`, `xml.etree.ElementTree`\n",
"* **Visualization**: `matplotlib`, `branca.colormap`\n",
"* **Mapping**: `ipyleaflet` (Map, WMSLayer, CircleMarker, Popup)\n",
"* **Interactivity**: `ipywidgets` (Dropdown, SelectionSlider, VBox, etc.)\n",
"* **Notebook display**: `IPython.display`\n",
"\n",
"\n",
"URLs are defined for:\n",
"* ERDDAP metadata (`allDatasets.csv`)\n",
"* Dataset access (`BASE_URL`)\n",
"* WMS capabilities for Copernicus climatology."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "kqQFZMP-T7GW"
},
"outputs": [],
"source": [
"# @title\n",
"import pandas as pd\n",
"from branca.colormap import LinearColormap\n",
"import matplotlib.colors\n",
"import urllib.error\n",
"import requests\n",
"import xml.etree.ElementTree as ET\n",
"from IPython.display import display\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from ipyleaflet import (\n",
" Map,\n",
" WMSLayer,\n",
" Marker,\n",
" CircleMarker,\n",
" Popup\n",
" )\n",
"from ipywidgets import (\n",
" interact,\n",
" Output,\n",
" Dropdown,\n",
" SelectionSlider,\n",
" FloatSlider,\n",
" SelectionRangeSlider,\n",
" VBox,\n",
" Label,\n",
" HTML\n",
" )\n",
"\n",
"datasets = [\n",
" 'SURVOSTRAL',\n",
" 'DATA_TRAWLER_SST',\n",
" #'MEOP_Animal-borne_profiles',\n",
" 'ARGO_FLOATS_OCEANICE_DE',\n",
" 'ARGO_FLOATS_OCEANICE',\n",
" 'ARGO_FLOATS_OCEANICE_UK',\n",
" 'SOCHIC_Cruise_2022_Agulhas_II_met'\n",
" ]\n",
"\n",
"temp_vars = [\n",
" 'temperature',\n",
" 'Temperature',\n",
" 'temperature_ctd',\n",
" 'temp',\n",
" 'TEMP',\n",
" 'sea_surface_temperature',\n",
" 'Temperature_oC'\n",
" ]\n",
"\n",
"# Endpoints: ERDDAP dataset listing, tabledap base URL, and the CORA ncWMS GetCapabilities request\n",
"METADATA_URL = 'https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.csv?metadata'\n",
"BASE_URL = 'https://er1.s4oceanice.eu/erddap/tabledap/'\n",
"SLA_URL = 'https://prod-erddap.emodnet-physics.eu/ncWMS/wms?SERVICE=WMS&REQUEST=GetCapabilities&VERSION=1.3.0&DATASET=INSITU_GLO_PHY_TS_OA_MY_013_052'"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "awvzugumGi3t"
},
"source": [
"**2. Retrieve and parse dataset metadata**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JO0ICoo3Gohp"
},
"source": [
"This code block reads `allDatasets.csv` from ERDDAP and filters for relevant dataset names (ARGO, SURVOSTRAL, trawler, cruise).\n",
"\n",
"Then, it extracts:\n",
"\n",
"* **time_coverage_start** / **time_coverage_end**\n",
"\n",
"* **temperature variable name**\n",
"\n",
"* **pressure variable name (if available)**.\n",
"\n",
"Finally, it stores metadata in a structured dictionary (`dataset_info`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "rLZqy-mVUMnA"
},
"outputs": [],
"source": [
"# @title\n",
"try:\n",
" # Use pandas to read the CSV directly from the URL\n",
" df_metadata = pd.read_csv(METADATA_URL)\n",
"\n",
" # Filter the dataframe to keep rows where the 'metadata' column contains any string from the datasets list\n",
" filtered_df = df_metadata[df_metadata['metadata'].str.contains('|'.join(datasets), na=False)].copy()\n",
" #display(filtered_df)\n",
" # Add '.csv' to the 'metadata' column\n",
" filtered_df['metadata'] = filtered_df['metadata'] + '.csv'\n",
"\n",
" dataset_info = {}\n",
"\n",
" for index, row in filtered_df.iterrows():\n",
" url = row['metadata']\n",
" dataset_name = None\n",
" # Find the dataset name from the original datasets list using a more precise match\n",
" for ds in datasets:\n",
" # Use a more specific check to ensure the full dataset name is in the URL\n",
" if f'/{ds}/' in url:\n",
" dataset_name = ds\n",
" break\n",
"\n",
" if dataset_name:\n",
" #print(f\"Attempting to process dataset: {dataset_name} from {url}\")\n",
" try:\n",
" df_dataset = pd.read_csv(url)\n",
"\n",
" time_coverage_start = None\n",
" time_coverage_end = None\n",
" temp_variable = None\n",
" pressure_variable = None\n",
"\n",
"\n",
" for idx, row in df_dataset.iterrows():\n",
" if row['Attribute Name'] == 'time_coverage_start':\n",
" time_coverage_start = row['Value']\n",
" elif row['Attribute Name'] == 'time_coverage_end':\n",
" time_coverage_end = row['Value']\n",
" elif row['Row Type'] == 'variable':\n",
" if row['Variable Name'] in temp_vars:\n",
" temp_variable = row['Variable Name']\n",
" elif row['Variable Name'] in ['PRESS', 'depth']:\n",
" pressure_variable = row['Variable Name']\n",
"\n",
" dataset_info[dataset_name] = {\n",
" 'time_coverage_start': time_coverage_start,\n",
" 'time_coverage_end': time_coverage_end,\n",
" 'temperature_variable': temp_variable,\n",
" 'pressure_variable': pressure_variable\n",
" }\n",
" #print(f\"Successfully processed dataset: {dataset_name}\")\n",
" except Exception as e:\n",
" print(f\"Error processing dataset from {url}: {e}\")\n",
"\n",
" if not dataset_info:\n",
" print(\"No information extracted for the specified datasets.\")\n",
"\n",
"except Exception as e:\n",
" print(f\"An error occurred: {e}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ttr0IjG2HM3w"
},
"source": [
"**3. Build dataset and month selection controls**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "62VEcrJFHSAY"
},
"source": [
"The following two code cells create two dropdowns:\n",
"* **Dropdown 1**: selects the dataset.\n",
"\n",
"* **Dropdown 2**: is dynamically populated with the available months (between the dataset's start and end coverage).\n",
"\n",
"The month options update automatically whenever the selected dataset changes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "eZmEB48UbFL8"
},
"outputs": [],
"source": [
"# @title\n",
"dataset_dropdown = Dropdown(\n",
" options=dataset_info.keys(),\n",
" description='Select Dataset:',\n",
")\n",
"#display(dataset_dropdown)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 81,
"referenced_widgets": [
"ab349ddd7d1148c191de9e60be8db88a",
"da9001e3165a423fb379f42341cc03c0",
"25accc4ec9db4cc8aadcd47287af4fdc",
"fbbb51f43ceb4cd5a20d3c08e082db42",
"6c245a6c3ddd4be581500ad9bcdc9418",
"03d54fe31709477eabeb4d0456f1df4c",
"893860f84a26479ea1ca1905a8b5c5f4",
"67a3035b1b8345fe8ee329dba6163e74"
]
},
"id": "bf4971d4",
"outputId": "0aa47823-4dc8-47d6-a30a-23e7b256b9b8"
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "ab349ddd7d1148c191de9e60be8db88a",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(Dropdown(description='Select Dataset:', options=('SURVOSTRAL', 'DATA_TRAWLER_SST', 'ARGO_FLOATS…"
]
},
"metadata": {
"application/vnd.jupyter.widget-view+json": {
"colab": {
"custom_widget_manager": {
"url": "https://ssl.gstatic.com/colaboratory-static/widgets/colab-cdn-widget-manager/2b70e893a8ba7c0f/manager.min.js"
}
}
}
},
"output_type": "display_data"
}
],
"source": [
"# @title\n",
"month_dropdown = Dropdown(description='Select Month:')\n",
"\n",
"def on_dataset_change(change):\n",
" dataset_name = change['new']\n",
" if dataset_name in dataset_info:\n",
" start_date_str = dataset_info[dataset_name]['time_coverage_start']\n",
" end_date_str = dataset_info[dataset_name]['time_coverage_end']\n",
"\n",
" if start_date_str and end_date_str:\n",
" try:\n",
" start_date = pd.to_datetime(start_date_str)\n",
" end_date = pd.to_datetime(end_date_str)\n",
"\n",
" # Generate a list of month-year strings\n",
" months = pd.date_range(start_date.replace(day=1), end_date.replace(day=1), freq='MS')\n",
" month_options = [month.strftime('%Y-%m-%d') for month in months]\n",
"\n",
" month_dropdown.options = month_options\n",
" month_dropdown.value = month_options[0] if month_options else None # Select the first month by default\n",
"\n",
" except Exception as e:\n",
" print(f\"Error processing dates for {dataset_name}: {e}\")\n",
" month_dropdown.options = []\n",
" month_dropdown.value = None\n",
" else:\n",
" print(f\"Time coverage information not available for {dataset_name}\")\n",
" month_dropdown.options = []\n",
" month_dropdown.value = None\n",
" else:\n",
" print(f\"Dataset '{dataset_name}' not found in dataset_info.\")\n",
" month_dropdown.options = []\n",
" month_dropdown.value = None\n",
"\n",
"dataset_dropdown.observe(on_dataset_change, names='value')\n",
"\n",
"# Display both dropdowns\n",
"display(VBox([dataset_dropdown, month_dropdown]))\n",
"\n",
"# Trigger the update for the initial value\n",
"if dataset_dropdown.value:\n",
" on_dataset_change({'new': dataset_dropdown.value})"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0hsEn5SjIgrR"
},
"source": [
"**4. Load data for selected dataset & month**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "olvRZ0YPIlSm"
},
"source": [
"This code block:\n",
"* Builds the ERDDAP query URL for `time`, `latitude`, `longitude`, the temperature variable and, if available, the pressure variable.\n",
"\n",
"* Restricts the query to the chosen month.\n",
"\n",
"* Converts temperature from Kelvin to Celsius where needed.\n",
"\n",
"* Drops missing values and displays the first rows."
]
},
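{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the resulting request follows the standard ERDDAP tabledap URL pattern. The sketch below is illustrative only: the dataset and variable names are example values, and `%2C`, `%3E=` and `%3C=` are the URL-encoded forms of `,`, `>=` and `<=`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch of the tabledap query pattern; dataset and\n",
"# variable names here are example values, not a verified query.\n",
"example_url = (\n",
"    'https://er1.s4oceanice.eu/erddap/tabledap/SURVOSTRAL.csv'\n",
"    '?time%2Clatitude%2Clongitude%2Ctemperature'\n",
"    '&time%3E=2020-01-01T00:00:00Z&time%3C=2020-02-01T00:00:00Z'\n",
")\n",
"print(example_url)"
]
},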
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 241,
"referenced_widgets": [
"78582aedfb8b423f8469028f733ac09d",
"d66103b561984216b0e06c768b943e71"
]
},
"id": "bPSG35uIdaaI",
"outputId": "1bd800e3-c0b5-435d-9c63-0876518c890a"
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "78582aedfb8b423f8469028f733ac09d",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {
"application/vnd.jupyter.widget-view+json": {
"colab": {
"custom_widget_manager": {
"url": "https://ssl.gstatic.com/colaboratory-static/widgets/colab-cdn-widget-manager/2b70e893a8ba7c0f/manager.min.js"
}
}
}
},
"output_type": "display_data"
}
],
"source": [
"# @title\n",
"#global df_data\n",
"def load_data(dataset_name, selected_month):\n",
" global df_data # Declare df_data as global here\n",
" if dataset_name in dataset_info:\n",
" temp_variable = dataset_info[dataset_name]['temperature_variable']\n",
" pressure_variable = dataset_info[dataset_name]['pressure_variable']\n",
"\n",
" # Construct the base URL\n",
" url = f\"{BASE_URL}{dataset_name}.csv?time%2Clatitude%2Clongitude%2C{temp_variable}\"\n",
"\n",
" # Add pressure variable if available\n",
" if pressure_variable:\n",
" url += f\"%2C{pressure_variable}\"\n",
"\n",
" # Calculate start and end dates for the selected month\n",
" start_date = pd.to_datetime(selected_month)\n",
" end_date = start_date + pd.offsets.MonthBegin(1)\n",
"\n",
" # Add time constraints to the URL\n",
" url += f\"&time%3E={start_date.strftime('%Y-%m-%dT%H:%M:%SZ')}&time%3C={end_date.strftime('%Y-%m-%dT%H:%M:%SZ')}\"\n",
"\n",
" try:\n",
" print(f\"Fetching data from: {url}\")\n",
" df_data = pd.read_csv(url)\n",
" # Drop rows with any NaN values\n",
" df_data.dropna(inplace=True)\n",
"\n",
" # Check if temperature is in Kelvin and convert to Celsius if necessary\n",
" if not df_data.empty and df_data.iloc[0][temp_variable].lower() == 'kelvin':\n",
" # Convert temperature column to object type to allow mixed types\n",
" df_data[temp_variable] = df_data[temp_variable].astype(object)\n",
" # Use .loc for assignment to avoid chained assignment warning\n",
" df_data.loc[1:, temp_variable] = pd.to_numeric(df_data.loc[1:, temp_variable], errors='coerce') - 273.15\n",
" # Replace the first row with 'celsius'\n",
" df_data.loc[0, temp_variable] = 'celsius'\n",
"\n",
"\n",
" display(df_data.head())\n",
" except urllib.error.HTTPError as e:\n",
" if e.getcode() == 404:\n",
" print(f\"Data not available for the selected date: {selected_month}\")\n",
" else:\n",
" print(f\"HTTP error fetching data from {url}: {e}\")\n",
" df_data = pd.DataFrame() # Assign an empty DataFrame on error\n",
" except Exception as e:\n",
" print(f\"Error fetching data from {url}: {e}\")\n",
" df_data = pd.DataFrame() # Assign an empty DataFrame on error\n",
"\n",
"# Create an output widget to display results\n",
"output_widget = Output()\n",
"display(output_widget)\n",
"\n",
"# Observe changes in the month dropdown and load data\n",
"def on_month_change(change):\n",
" with output_widget:\n",
" output_widget.clear_output()\n",
" dataset_name = dataset_dropdown.value\n",
" selected_month = change['new']\n",
" if dataset_name and selected_month:\n",
" load_data(dataset_name, selected_month)\n",
"\n",
"month_dropdown.observe(on_month_change, names='value')\n",
"\n",
"# Trigger data loading for the initially selected dataset and month\n",
"if dataset_dropdown.value and month_dropdown.value:\n",
" with output_widget:\n",
" output_widget.clear_output()\n",
" load_data(dataset_dropdown.value, month_dropdown.value)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XcPh1DkhIRkJ"
},
"source": [
"**5. Define custom temperature colormap & legend**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8TTIonYNI6OK"
},
"source": [
"This code block defines a **LinearColormap** (−15°C → +35°C) with a rainbow-style gradient and creates a **colorbar legend** (HTML widget).\n",
"\n",
"It is used later to color markers by temperature."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "jVb9Y7VEr1O5"
},
"outputs": [],
"source": [
"# @title\n",
"# Define a custom colormap and normalization\n",
"custom_min_temp = -15 # Example min temperature\n",
"custom_max_temp = 35 # Example max temperature\n",
"colormap = LinearColormap(colors=['#9400D3', '#4B0082', '#0000FF', '#00FF00', '#FFFF00', '#FF7F00', '#FF0000'], vmin=custom_min_temp, vmax=custom_max_temp)\n",
"\n",
"# Generate temperature values for the legend\n",
"num_steps = 7 # Number of steps in the legend\n",
"step_size = (colormap.vmax - colormap.vmin) / (num_steps - 1)\n",
"temp_values = [colormap.vmin + i * step_size for i in range(num_steps)]\n",
"\n",
"# Create a list of colors for the gradient\n",
"colors = [matplotlib.colors.to_hex(colormap(temp)) for temp in temp_values]\n",
"gradient_css = f\"linear-gradient(to right, {', '.join(colors)})\"\n",
"\n",
"# Create HTML for the continuous color bar\n",
"color_bar_html = f'<div style=\"width: 300px; height: 20px; background: {gradient_css};\"></div>'"
]
}