Interactive Exploration of In-Situ Temperature Datasets with WMS Overlay

Purpose

For an interactive version of this page, open the notebook in Google Colab:
Open in Google Colab
(Ctrl + click to open the link in a new tab)

Alternatively, this notebook can be opened in Binder by following the link: Interactive Exploration of In-Situ Temperature Datasets with WMS Overlay

This notebook provides an interactive map interface to explore in-situ oceanographic temperature datasets served through the OCEAN ICE ERDDAP server.

It combines two key elements:

  • In-situ observational data (ARGO floats, cruises, trawlers, etc.), filtered by dataset and time.

  • Coriolis Ocean Dataset for Reanalysis (CORA) INSITU_GLO_PHY_TS_OA_MY_013_052, displayed via WMS overlay for spatial context.

Users can:

  • Select a dataset and browse its time coverage (start → end).

  • Choose a specific month and load available measurements.

  • Visualize the data on a map, with points color-coded by temperature.

  • Overlay WMS climatologies for comparison.

  • Inspect values interactively by clicking markers.

Data sources

In-situ datasets (OCEAN ICE ERDDAP):
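These datasets (listed by identifier in the code cells below, e.g. SURVOSTRAL and the ARGO_FLOATS_OCEANICE collections) are exposed as ERDDAP tabledap endpoints. As a rough sketch of how such a CSV request is assembled, assuming the variable list and time window shown here are purely illustrative (the notebook builds the same URL pattern inline later):

```python
from urllib.parse import quote

BASE_URL = 'https://er1.s4oceanice.eu/erddap/tabledap/'

def tabledap_csv_url(dataset_id, variables, start_iso, end_iso):
    """Assemble an ERDDAP tabledap CSV request with a time-range constraint.

    Hypothetical helper: commas in the variable list become %2C, and the
    comparison operators in the time constraints become %3E / %3C.
    """
    var_part = quote(','.join(variables))
    constraints = quote(f'&time>={start_iso}&time<={end_iso}', safe='&=:')
    return f'{BASE_URL}{dataset_id}.csv?{var_part}{constraints}'

url = tabledap_csv_url(
    'SURVOSTRAL',
    ['time', 'latitude', 'longitude', 'temperature'],
    '2018-01-01T00:00:00Z', '2018-02-01T00:00:00Z',
)
print(url)
```

Pointing `pd.read_csv` at a URL of this shape returns the matching rows, with ERDDAP's units row as the first data row.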

Global gridded climatology (Copernicus Marine):

  • Product: Coriolis Ocean Dataset for Reanalysis (CORA) INSITU_GLO_PHY_TS_OA_MY_013_052

  • Content: Objective analysis of temperature (TEMP) and salinity (PSAL) from 1960 → present.

  • Accessed through ncWMS for visual overlays.
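The layers an ncWMS endpoint serves can be discovered from its GetCapabilities response. A minimal parsing sketch on a trimmed, illustrative capabilities fragment (a live check would instead fetch SLA_URL with `requests.get` and pass the response text to the same function):

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative GetCapabilities fragment; the live response from the
# ncWMS endpoint is much larger
sample_xml = """<WMS_Capabilities xmlns="http://www.opengis.net/wms" version="1.3.0">
  <Capability>
    <Layer>
      <Layer><Name>INSITU_GLO_PHY_TS_OA_MY_013_052/TEMP</Name></Layer>
      <Layer><Name>INSITU_GLO_PHY_TS_OA_MY_013_052/PSAL</Name></Layer>
    </Layer>
  </Capability>
</WMS_Capabilities>"""

def list_wms_layers(capabilities_xml):
    """Extract all named layers from a WMS 1.3.0 GetCapabilities document."""
    ns = {'wms': 'http://www.opengis.net/wms'}
    root = ET.fromstring(capabilities_xml)
    return [el.text for el in root.findall('.//wms:Layer/wms:Name', ns)]

print(list_wms_layers(sample_xml))
```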

Instructions for using this notebook

Run each code cell by clicking the Play button (▶️) on the left side of each grey code block. This will execute the code in order and allow all features to work properly.

Explaining the code

1. Import required libraries and define data sources

The following libraries are used in this notebook:

  • Data handling: pandas, numpy

  • Requests & parsing: requests, urllib.error, xml.etree.ElementTree

  • Visualization: matplotlib, branca.colormap

  • Mapping: ipyleaflet (Map, WMSLayer, CircleMarker, Popup)

  • Interactivity: ipywidgets (Dropdown, SelectionSlider, VBox, etc.)

  • Notebook display: IPython.display

URLs are defined for:

  • ERDDAP metadata (allDatasets.csv)

  • Dataset access (BASE_URL)

  • WMS capabilities for Copernicus climatology.

# @title
import pandas as pd
from branca.colormap import LinearColormap
import matplotlib.colors
import urllib.error
import requests
import xml.etree.ElementTree as ET
from IPython.display import display
import matplotlib.pyplot as plt
import numpy as np
from ipyleaflet import (
                        Map,
                        WMSLayer,
                        Marker,
                        CircleMarker,
                        Popup
                       )
from ipywidgets import (
                        interact,
                        Output,
                        Dropdown,
                        SelectionSlider,
                        FloatSlider,
                        SelectionRangeSlider,
                        VBox,
                        Label,
                        HTML
                       )

datasets = [
    'SURVOSTRAL',
    'DATA_TRAWLER_SST',
    #'MEOP_Animal-borne_profiles',
    'ARGO_FLOATS_OCEANICE_DE',
    'ARGO_FLOATS_OCEANICE',
    'ARGO_FLOATS_OCEANICE_UK',
    'SOCHIC_Cruise_2022_Agulhas_II_met'
    ]

temp_vars = [
    'temperature',
    'Temperature',
    'temperature_ctd',
    'temp',
    'TEMP',
    'sea_surface_temperature',
    'Temperature_oC'
    ]

METADATA_URL = 'https://er1.s4oceanice.eu/erddap/tabledap/allDatasets.csv?metadata'
BASE_URL = 'https://er1.s4oceanice.eu/erddap/tabledap/'
SLA_URL = 'https://prod-erddap.emodnet-physics.eu/ncWMS/wms?SERVICE=WMS&REQUEST=GetCapabilities&VERSION=1.3.0&DATASET=INSITU_GLO_PHY_TS_OA_MY_013_052'

2. Retrieve and parse dataset metadata

This code block reads allDatasets.csv from ERDDAP and filters for relevant dataset names (ARGO, SURVOSTRAL, trawler, cruise).

Then, it extracts:

  • time_coverage_start / time_coverage_end

  • temperature variable name

  • pressure variable name (if available).

Finally, it stores metadata in a structured dictionary (dataset_info).

# @title
try:
    # Use pandas to read the CSV directly from the URL
    df_metadata = pd.read_csv(METADATA_URL)

    # Filter the dataframe to keep rows where the 'metadata' column contains any string from the datasets list
    filtered_df = df_metadata[df_metadata['metadata'].str.contains('|'.join(datasets), na=False)].copy()
    #display(filtered_df)
    # Add '.csv' to the 'metadata' column
    filtered_df['metadata'] = filtered_df['metadata'] + '.csv'

    dataset_info = {}

    for index, row in filtered_df.iterrows():
        url = row['metadata']
        dataset_name = None
        # Find the dataset name from the original datasets list using a more precise match
        for ds in datasets:
            # Use a more specific check to ensure the full dataset name is in the URL
            if f'/{ds}/' in url:
                dataset_name = ds
                break

        if dataset_name:
            #print(f"Attempting to process dataset: {dataset_name} from {url}")
            try:
                df_dataset = pd.read_csv(url)

                time_coverage_start = None
                time_coverage_end = None
                temp_variable = None
                pressure_variable = None


                # Use a distinct name for the inner loop variable to avoid
                # shadowing the outer `row`
                for _, attr_row in df_dataset.iterrows():
                    if attr_row['Attribute Name'] == 'time_coverage_start':
                        time_coverage_start = attr_row['Value']
                    elif attr_row['Attribute Name'] == 'time_coverage_end':
                        time_coverage_end = attr_row['Value']
                    elif attr_row['Row Type'] == 'variable':
                        if attr_row['Variable Name'] in temp_vars:
                            temp_variable = attr_row['Variable Name']
                        elif attr_row['Variable Name'] in ['PRESS', 'depth']:
                            pressure_variable = attr_row['Variable Name']

                dataset_info[dataset_name] = {
                    'time_coverage_start': time_coverage_start,
                    'time_coverage_end': time_coverage_end,
                    'temperature_variable': temp_variable,
                    'pressure_variable': pressure_variable
                }
                #print(f"Successfully processed dataset: {dataset_name}")
            except Exception as e:
                print(f"Error processing dataset from {url}: {e}")

    if not dataset_info:
        print("No information extracted for the specified datasets.")

except Exception as e:
    print(f"An error occurred: {e}")

3. Build dataset and month selection controls

In the next two code cells, two dropdowns are created:

  • Dropdown 1: selects the dataset.

  • Dropdown 2: is dynamically populated with the months available between the dataset's start and end coverage.

The month options update automatically whenever the dataset selection changes.
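The month list itself comes from a pandas month-start date range over the coverage interval. A standalone sketch of that step (the helper name and example coverage dates are illustrative):

```python
import pandas as pd

def month_options(start_iso, end_iso):
    """First-of-month date strings spanning a dataset's time coverage."""
    # Normalize to midnight and snap both ends to the first of the month
    start = pd.to_datetime(start_iso).normalize().replace(day=1)
    end = pd.to_datetime(end_iso).normalize().replace(day=1)
    # freq='MS' generates month-start timestamps, inclusive of both ends
    return [m.strftime('%Y-%m-%d') for m in pd.date_range(start, end, freq='MS')]

print(month_options('2021-11-15T06:00:00Z', '2022-02-03T12:00:00Z'))
```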

# @title
dataset_dropdown = Dropdown(
    options=dataset_info.keys(),
    description='Select Dataset:',
)
#display(dataset_dropdown)
# @title
month_dropdown = Dropdown(description='Select Month:')

def on_dataset_change(change):
    dataset_name = change['new']
    if dataset_name in dataset_info:
        start_date_str = dataset_info[dataset_name]['time_coverage_start']
        end_date_str = dataset_info[dataset_name]['time_coverage_end']

        if start_date_str and end_date_str:
            try:
                start_date = pd.to_datetime(start_date_str)
                end_date = pd.to_datetime(end_date_str)

                # Generate a list of month-year strings
                months = pd.date_range(start_date.replace(day=1), end_date.replace(day=1), freq='MS')
                month_options = [month.strftime('%Y-%m-%d') for month in months]

                month_dropdown.options = month_options
                month_dropdown.value = month_options[0] if month_options else None # Select the first month by default

            except Exception as e:
                print(f"Error processing dates for {dataset_name}: {e}")
                month_dropdown.options = []
                month_dropdown.value = None
        else:
            print(f"Time coverage information not available for {dataset_name}")
            month_dropdown.options = []
            month_dropdown.value = None
    else:
        print(f"Dataset '{dataset_name}' not found in dataset_info.")
        month_dropdown.options = []
        month_dropdown.value = None

dataset_dropdown.observe(on_dataset_change, names='value')

# Display both dropdowns
display(VBox([dataset_dropdown, month_dropdown]))

# Trigger the update for the initial value
if dataset_dropdown.value:
    on_dataset_change({'new': dataset_dropdown.value})

4. Load data for selected dataset & month

This code block:

  • Builds the ERDDAP query URL for time, latitude, longitude, temperature (and pressure, if available).

  • Restricts the query to the chosen month.

  • Converts temperature from Kelvin to Celsius if needed.

  • Drops missing values and displays the first rows.
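One subtlety worth isolating: ERDDAP CSV responses carry a units row as the first data row, so the Kelvin check inspects row 0 and then converts only the remaining rows. A miniature, hypothetical frame shows the idea:

```python
import pandas as pd

# Hypothetical miniature of an ERDDAP CSV response: row 0 holds the units
df = pd.DataFrame({
    'time': ['UTC', '2022-01-05T00:00:00Z', '2022-01-06T00:00:00Z'],
    'temperature': ['kelvin', '271.30', '274.85'],
})

temp_var = 'temperature'
if isinstance(df.iloc[0][temp_var], str) and df.iloc[0][temp_var].lower() == 'kelvin':
    # Keep the column object-typed so the units label and floats can coexist
    df[temp_var] = df[temp_var].astype(object)
    # Convert every row after the units row from Kelvin to Celsius
    df.loc[1:, temp_var] = pd.to_numeric(df.loc[1:, temp_var], errors='coerce') - 273.15
    # Relabel the units row
    df.loc[0, temp_var] = 'celsius'

print(df[temp_var].tolist())
```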

# @title
#global df_data
def load_data(dataset_name, selected_month):
    global df_data # Declare df_data as global here
    if dataset_name in dataset_info:
        temp_variable = dataset_info[dataset_name]['temperature_variable']
        pressure_variable = dataset_info[dataset_name]['pressure_variable']

        # Construct the base URL
        url = f"{BASE_URL}{dataset_name}.csv?time%2Clatitude%2Clongitude%2C{temp_variable}"

        # Add pressure variable if available
        if pressure_variable:
            url += f"%2C{pressure_variable}"

        # Calculate start and end dates for the selected month
        start_date = pd.to_datetime(selected_month)
        end_date = start_date + pd.offsets.MonthBegin(1)

        # Add time constraints to the URL
        url += f"&time%3E={start_date.strftime('%Y-%m-%dT%H:%M:%SZ')}&time%3C={end_date.strftime('%Y-%m-%dT%H:%M:%SZ')}"

        try:
            print(f"Fetching data from: {url}")
            df_data = pd.read_csv(url)
            # Drop rows with any NaN values
            df_data.dropna(inplace=True)

            # Check if temperature is in Kelvin and convert to Celsius if necessary
            if not df_data.empty and isinstance(df_data.iloc[0][temp_variable], str) and df_data.iloc[0][temp_variable].lower() == 'kelvin':
                 # Convert temperature column to object type to allow mixed types
                 df_data[temp_variable] = df_data[temp_variable].astype(object)
                 # Use .loc for assignment to avoid chained assignment warning
                 df_data.loc[1:, temp_variable] = pd.to_numeric(df_data.loc[1:, temp_variable], errors='coerce') - 273.15
                 # Replace the first row with 'celsius'
                 df_data.loc[0, temp_variable] = 'celsius'


            display(df_data.head())
        except urllib.error.HTTPError as e:
            if e.getcode() == 404:
                print(f"Data not available for the selected date: {selected_month}")
            else:
                print(f"HTTP error fetching data from {url}: {e}")
            df_data = pd.DataFrame() # Assign an empty DataFrame on error
        except Exception as e:
            print(f"Error fetching data from {url}: {e}")
            df_data = pd.DataFrame() # Assign an empty DataFrame on error

# Create an output widget to display results
output_widget = Output()
display(output_widget)

# Observe changes in the month dropdown and load data
def on_month_change(change):
    with output_widget:
        output_widget.clear_output()
        dataset_name = dataset_dropdown.value
        selected_month = change['new']
        if dataset_name and selected_month:
            load_data(dataset_name, selected_month)

month_dropdown.observe(on_month_change, names='value')

# Trigger data loading for the initially selected dataset and month
if dataset_dropdown.value and month_dropdown.value:
     with output_widget:
        output_widget.clear_output()
        load_data(dataset_dropdown.value, month_dropdown.value)

5. Define custom temperature colormap & legend

This code block uses LinearColormap (−15 °C → +35 °C) with a rainbow-style gradient and creates a colorbar legend (an HTML widget).

It is used later to color markers by temperature.

# @title
# Define a custom colormap and normalization
custom_min_temp = -15 # Example min temperature
custom_max_temp = 35 # Example max temperature
colormap = LinearColormap(colors=['#9400D3', '#4B0082', '#0000FF', '#00FF00', '#FFFF00', '#FF7F00', '#FF0000'], vmin=custom_min_temp, vmax=custom_max_temp)

# Generate temperature values for the legend
num_steps = 7  # Number of steps in the legend
step_size = (colormap.vmax - colormap.vmin) / (num_steps - 1)
temp_values = [colormap.vmin + i * step_size for i in range(num_steps)]

# Create a list of colors for the gradient
colors = [matplotlib.colors.to_hex(colormap(temp)) for temp in temp_values]
gradient_css = f"linear-gradient(to right, {', '.join(colors)})"

# Create HTML for the continuous color bar
color_bar_html = f'<div style="width: 100%; height: 20px; background: {gradient_css};"></div>'

# Create HTML for the temperature labels below the color bar
label_html = '<div style="display: flex; justify-content: space-between; width: 100%;">'
for temp in temp_values:
    label_html += f'<span>{temp:.1f}°C</span>'
label_html += '</div>'

# Combine the color bar and labels
legend_html_content = f"<b>Temperature Legend (°C)</b><br>{color_bar_html}{label_html}"

# Create and display the HTML widget
legend_html = HTML(value=legend_html_content)
#display(legend_html)

6. Render interactive map with in-situ data + WMS overlay

This final section:

  • Creates a global map centered at (0°, 0°).

  • Filters measurements to the near-surface (≤ 5 m depth, or the equivalent pressure).

  • Adds a circle marker for each in-situ measurement, colored by temperature.

  • Attaches a popup with the exact value to each marker.

  • Overlays the Copernicus CORA TEMP WMS layer for the same month.

  • Displays the map and legend together.

Note: Re-run the final map cell whenever you change dataset or month to refresh markers and overlays.
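The near-surface rule can be sketched in isolation on hypothetical sample rows (column names mirror the datasets; the 50 dbar pressure cut-off corresponds to the code's `5 * 10`):

```python
import pandas as pd

# Hypothetical sample rows: depth in metres for one dataset, pressure in dbar for another
survostral = pd.DataFrame({'depth': ['3m', '10m', '5m'], 'temperature': [1.2, 0.4, 0.9]})
argo = pd.DataFrame({'PRESS': ['2.0', '120.5', '49.9'], 'temperature': [0.1, -0.3, 0.2]})

# SURVOSTRAL-style: strip the trailing 'm', then keep depths of 5 m or less
survostral['depth'] = pd.to_numeric(
    survostral['depth'].astype(str).str.replace('m', '', regex=False), errors='coerce')
near_surface_depth = survostral[survostral['depth'] <= 5]

# PRESS-style: pressure in dbar, with 50 dbar as the cut-off (roughly near-surface)
argo['PRESS'] = pd.to_numeric(argo['PRESS'], errors='coerce')
near_surface_press = argo[argo['PRESS'] <= 5 * 10]

print(len(near_surface_depth), len(near_surface_press))
```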

# @title
# Create a map centered on a specific location with a zoom level
m = Map(center=(0, 0), zoom=1)

# Format the selected month to the required time format
time_str = pd.to_datetime(month_dropdown.value).strftime('%Y-%m-%dT00:00:00Z')

# Remove existing WMS layers and markers
m.layers = [layer for layer in m.layers if not isinstance(layer, (WMSLayer, Marker, CircleMarker))]

# Add scatter plot from df_data if available and filtered
if 'df_data' in globals() and not df_data.empty:
    # Start with a copy of the data excluding the unit row
    filtered_df_data = df_data.iloc[1:].copy()

    dataset_name = dataset_dropdown.value
    pressure_variable = dataset_info[dataset_name].get('pressure_variable')
    temp_variable = dataset_info[dataset_name]['temperature_variable']

    # Apply pressure filtering if the pressure variable exists and is in the DataFrame
    if pressure_variable and pressure_variable in filtered_df_data.columns:
          # Convert pressure variable to numeric, handling potential errors
          if dataset_name == 'SURVOSTRAL' and pressure_variable == 'depth':
              # Remove 'm' and convert to numeric
              filtered_df_data[pressure_variable] = filtered_df_data[pressure_variable].astype(str).str.replace('m', '', regex=False)
              filtered_df_data[pressure_variable] = pd.to_numeric(filtered_df_data[pressure_variable], errors='coerce')
              filtered_df_data.dropna(subset=[pressure_variable], inplace=True) # Drop rows where conversion failed
              # Filter based on the numeric pressure variable
              if pd.api.types.is_numeric_dtype(filtered_df_data[pressure_variable]):
                filtered_df_data = filtered_df_data[filtered_df_data[pressure_variable].astype(float) <= 5]

          elif pressure_variable == 'PRESS':
              # Explicitly convert 'PRESS' to numeric
              filtered_df_data[pressure_variable] = pd.to_numeric(filtered_df_data[pressure_variable], errors='coerce')
              filtered_df_data.dropna(subset=[pressure_variable], inplace=True) # Drop rows where conversion failed
              # Filter based on the numeric pressure variable for 'PRESS'
              if pd.api.types.is_numeric_dtype(filtered_df_data[pressure_variable]):
                    filtered_df_data = filtered_df_data[filtered_df_data[pressure_variable].astype(float) <= 5 * 10]
          else:
            print(f"Warning: Pressure variable '{pressure_variable}' is not numeric after conversion attempts or does not have a defined filtering rule.")


    # Check if the filtered DataFrame is empty
    if filtered_df_data.empty:
        print("Filtered DataFrame is empty. No markers to add.")


    # Assuming 'latitude' and 'longitude' columns exist
    if 'latitude' in filtered_df_data.columns and 'longitude' in filtered_df_data.columns and temp_variable in filtered_df_data.columns:
        #print(f"Filtered DataFrame shape: {filtered_df_data.shape}")
        #print("First 5 rows of filtered_df_data:")
        #display(filtered_df_data.head())

        # Convert temperature column to numeric, handling potential errors
        filtered_df_data[temp_variable] = pd.to_numeric(filtered_df_data[temp_variable], errors='coerce')
        filtered_df_data.dropna(subset=[temp_variable], inplace=True) # Drop rows where conversion failed

        # Add circles for each data point, colored by temperature using the custom colormap
        for index, row in filtered_df_data.iterrows():
            temp_value = row[temp_variable]
            # Use the colormap to get the color for the temperature value
            color = matplotlib.colors.to_hex(colormap(temp_value))

            circle_marker = CircleMarker(
                location=(row['latitude'], row['longitude']),
                radius=5, # Adjust size as needed
                color=color,
                fill_color=color,
                fill_opacity=0.8
            )

            # Create an HTML popup with the temperature information
            popup_html = HTML()
            popup_html.value = f"Temperature: {temp_value:.2f} °C"
            popup = Popup(
                location=(row['latitude'], row['longitude']),
                child=popup_html,
                close_button=False,
                auto_close=False,
                close_on_escape_key=False
            )

            # Add the popup to the marker
            circle_marker.popup = popup

            m.add_layer(circle_marker)

    else:
        print("Latitude, Longitude, or Temperature column not found in filtered_df_data.")
else:
    print("df_data is not available or is empty. No markers to add.")

# Reference only: an equivalent GetMap request URL for the overlay (the WMSLayer
# below issues the actual tile requests); the custom colormap's vmin/vmax set
# COLORSCALERANGE
wms_url = (
    f"{SLA_URL.split('?')[0]}?"
    f"SERVICE=WMS&VERSION=1.3.0&REQUEST=GetMap&FORMAT=image/png&TRANSPARENT=true&"
    f"LAYERS=INSITU_GLO_PHY_TS_OA_MY_013_052/TEMP&"
    f"ELEVATION=1&"
    f"TIME={time_str}&"
    f"CRS=EPSG:4326&STYLES=default-scalar/x-Rainbow&COLORSCALERANGE={colormap.vmin},{colormap.vmax}&WIDTH=256&HEIGHT=256&BBOX=-180,-90,180,90"
)

wms_layer = WMSLayer(
    url='https://prod-erddap.emodnet-physics.eu/ncWMS/wms',
    layers='INSITU_GLO_PHY_TS_OA_MY_013_052/TEMP',
    time=time_str,
    styles='default-scalar/x-Rainbow',
    transparent=True,
    format='image/png',
    # Add color scale range to WMS layer
    other_options={'COLORSCALERANGE': f'{colormap.vmin},{colormap.vmax}'}
)
m.add_layer(wms_layer)

# Initial map display and layer update based on the current month selection
display(m)
display(legend_html)