Exploring ARGO Float Observation: Interactive Maps and Profiles

For an interactive version of this page, please visit Google Colab:
Open in Google Colab
(To open the link in a new tab, press Ctrl + click)

Alternatively, this notebook can be opened in Binder by following the link: Exploring ARGO Float Observation: Interactive Maps and Profiles

Purpose

This notebook provides an interactive environment to explore and visualize global oceanographic data collected by ARGO floats, using data served through the ERDDAP service hosted by OCEAN ICE.

Users can:

  • Select key ocean variables (e.g., Temperature, Salinity) for analysis.

  • Explore temporal and vertical distributions using scatter plots and heatmaps.

  • Visualize float trajectories on an interactive map, with detailed in-situ measurement popups.

  • Interactively filter data by ARGO platform and variable for in-depth exploration.

Data sources

The notebook uses the following dataset: https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE.html

This dataset is hosted by the OCEAN ICE platform via ERDDAP and provides global, near-real-time observations collected by Argo floats (i.e., free-drifting ocean sensors that profile the upper ~2000 meters of the ocean).

The Argo network provides over 100,000 temperature and salinity profiles annually, ensuring continuous monitoring of upper ocean conditions with rapid data availability (within hours) and delayed-mode, quality-controlled data.
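ERDDAP serves this dataset through plain "tabledap" URLs: the dataset endpoint, a comma-separated list of requested variables, and percent-encoded constraints. As a minimal sketch of how the query URLs used throughout this notebook are structured (the `build_query` helper and the platform code `1902687` are illustrative, not part of the notebook):

```python
from urllib.parse import quote

# Base tabledap endpoint for the dataset used in this notebook
BASE = 'https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE.csv'

def build_query(variables, constraints):
    # Variables are comma-separated; constraints are appended with '&'.
    # ERDDAP expects reserved characters percent-encoded (e.g. '>' -> %3E,
    # '"' -> %22), while '=' and ':' stay literal.
    var_part = quote(','.join(variables), safe='')
    cons_part = ''.join('&' + quote(c, safe='=:') for c in constraints)
    return f'{BASE}?{var_part}{cons_part}'

# Hypothetical example: the platform code below is illustrative only.
url = build_query(['time', 'PRESS', 'TEMP'],
                  ['time>=2023-01-01T00:00:00Z', 'PLATFORMCODE="1902687"'])
print(url)
```

Appending `&distinct()` to such a URL, as done below for the platform list, asks ERDDAP to return only unique rows.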

Instructions to use this Notebook

To use this interactive notebook:

  • Run each code cell by clicking the Play button (▶️) on the left side of each grey code block.

  • Follow the interactive widgets to select variables and platforms and to visualize the data.

  • All interactive plots and maps will update automatically based on your selection.

Explaining the code

1. Install and import required libraries

This section loads all Python packages needed for interactive mapping, plotting, and widgets in the notebook environment.

The following block of code loads all the Python modules the notebook will need to fetch data from online sources, process it into tables and time series, display it on interactive maps, and plot it in customizable charts.

# @title
import requests
import xml.etree.ElementTree as ET
import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import math
from io import BytesIO
import matplotlib.dates as mdates
from traitlets import Unicode
from ipyleaflet import (
    Map,
    Marker,
    basemaps,
    projections,
    WMSLayer,
    Popup,
    Icon,
    ImageOverlay,
    Polyline
)
from ipywidgets import (
    FloatSlider,
    Text,
    HBox,
    Layout,
    Output,
    VBox,
    HTML,
    Label,
    Dropdown
)
from IPython.display import display, clear_output

2. Define data source URLs and load platform metadata

The following block pulls the list of available ARGO platform IDs and the dataset metadata from the ERDDAP server.

This ensures the notebook knows which floats are available and their spatial/temporal coverage before generating plots or maps.

# @title
PLATFORMCODE_URL = 'https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE.csv?PLATFORMCODE&distinct()'
METADATA_URL = 'https://er1.s4oceanice.eu/erddap/info/ARGO_FLOATS_OCEANICE/index.csv'

variables = ['TEMP', 'PSAL'] #'DOXY', 'TURBIDITY', 'CHLA', 'NITRATE'

platformcode = []
try:
  df = pd.read_csv(PLATFORMCODE_URL)
  platformcode = df['PLATFORMCODE'].tolist()
except Exception as e:
  print(f'ERROR, could not load platformcode list: {e}')

try:
  df_metadata = pd.read_csv(METADATA_URL)

  time_coverage_end = df_metadata[df_metadata['Attribute Name'] == 'time_coverage_end']['Value'].iloc[0]
  time_coverage_start = df_metadata[df_metadata['Attribute Name'] == 'time_coverage_start']['Value'].iloc[0]
  geospatial_lat_min = df_metadata[df_metadata['Attribute Name'] == 'geospatial_lat_min']['Value'].iloc[0]
  geospatial_lat_max = df_metadata[df_metadata['Attribute Name'] == 'geospatial_lat_max']['Value'].iloc[0]
  geospatial_lon_min = df_metadata[df_metadata['Attribute Name'] == 'geospatial_lon_min']['Value'].iloc[0]
  geospatial_lon_max = df_metadata[df_metadata['Attribute Name'] == 'geospatial_lon_max']['Value'].iloc[0]

except Exception as e:
  print(f'ERROR: could not extract metadata values: {e}')

3. Create dropdown menu for variable selection

A dropdown menu lets users select which oceanographic variable (i.e., temperature or salinity) they want to explore.

Note: changing the selection automatically updates plots and maps. The currently selected variable is stored in a global variable for use in subsequent calculations.

# @title
# Create Dropdown for variables
variables_dropdown = Dropdown(
    options=variables,
    description='Variable:',
    disabled=False,
)

display(variables_dropdown)

# Function to update the variable when the dropdown value changes
def update_variable(change):
    global variable
    variable = change['new']

# Observe changes in the dropdown
variables_dropdown.observe(update_variable, names='value')

# Initialize the variable with the default dropdown value
variable = variables_dropdown.value

4. Generate scatter plot of variable vs. pressure over time

This code block generates an interactive scatter plot showing how the selected variable varies with pressure (depth) over time.

# @title
# Output widget to display the scatter plot
scatterplot_output = Output()

def generate_and_display_scatterplot(variable_name):
    with scatterplot_output:
        clear_output(wait=True)  # Clear previous plot
        try:
            global_url = f'https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE.csv?time%2CPRESS%2C{variable_name}&time%3E={time_coverage_start}&time%3C={time_coverage_end}&latitude%3E={geospatial_lat_min}&latitude%3C={geospatial_lat_max}&longitude%3E={geospatial_lon_min}&longitude%3C={geospatial_lon_max}'

            # Load data from global_url
            df_global_scatter = pd.read_csv(global_url, skiprows=[1]) # Skip the units row
            df_global_scatter_units = pd.read_csv(global_url, nrows=1).iloc[0].to_dict() # Read units from the first row

            # Remove duplicate rows
            df_global_scatter = df_global_scatter.drop_duplicates()

            # display(df_global_scatter.head()) # Optional: display head for debugging

            # Convert 'time' column to datetime objects
            df_global_scatter['time'] = pd.to_datetime(df_global_scatter['time'])

            # Sort by time to ensure correct plotting order
            df_global_scatter = df_global_scatter.sort_values(by='time').reset_index(drop=True) # Reset index after sorting

            # Generate the scatter plot - disable the legend
            plt.figure(figsize=(15, 8))

            scatter_plot_ax = sns.scatterplot(data=df_global_scatter, x=variable_name, y='PRESS', hue='time', palette='viridis', s=10, legend=False) # s is marker size, changed palette, added legend=False
            plt.title(f'Scatter Plot of {variable_name} vs. Pressure colored by Time')
            plt.xlabel(f"{variable_name} ({df_global_scatter_units.get(variable_name, '')})") # Updated xlabel with units
            plt.ylabel(f"Pressure ({df_global_scatter_units.get('PRESS', '')})") # Updated ylabel with units
            plt.gca().invert_yaxis() # Invert y-axis for pressure (higher pressure is deeper)

            # Add a colorbar for time
            # Use matplotlib's internal numeric representation of dates for normalization
            norm = plt.Normalize(mdates.date2num(df_global_scatter['time'].min()), mdates.date2num(df_global_scatter['time'].max()))
            sm = plt.cm.ScalarMappable(cmap="viridis", norm=norm)
            # Set the array with matplotlib's numeric representation of dates
            sm.set_array(mdates.date2num(df_global_scatter['time']))

            # Create a colorbar
            cbar = plt.colorbar(sm, ax=scatter_plot_ax, label='Time')

            # Use AutoDateLocator and DateFormatter directly on the colorbar axes
            cbar.ax.yaxis.set_major_locator(mdates.AutoDateLocator())
            cbar.ax.yaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))


            plt.tight_layout() # Adjust layout to prevent labels overlapping
            plt.show()

        except Exception as e:
            print(f"Error loading data or generating scatter plot: {e}")

# Observe changes in the variables_dropdown and update the scatter plot
variables_dropdown.observe(lambda change: generate_and_display_scatterplot(change['new']), names='value')

# Display the initial scatter plot
generate_and_display_scatterplot(variables_dropdown.value)

# Display the output widget
display(scatterplot_output)
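The `skiprows=[1]` / `nrows=1` pattern above exploits ERDDAP's CSV layout, in which the second line of the response holds the units for each column. A self-contained sketch with synthetic data (the values are invented for illustration):

```python
import pandas as pd
from io import StringIO

# ERDDAP CSV responses carry a second header line with the units of each
# column, so the data is read twice: once skipping that line to get clean
# numeric columns, once with nrows=1 to capture the units.
sample = StringIO(
    "time,PRESS,TEMP\n"
    "UTC,decibar,degree_Celsius\n"
    "2023-01-01T00:00:00Z,10.0,1.5\n"
    "2023-01-01T06:00:00Z,20.0,1.2\n"
)
df = pd.read_csv(sample, skiprows=[1])                   # numeric data only
sample.seek(0)
units = pd.read_csv(sample, nrows=1).iloc[0].to_dict()   # units row

print(units['TEMP'])      # degree_Celsius
print(df['PRESS'].max())  # 20.0
```

Without `skiprows=[1]`, the units row would be parsed as data and force every column to the `object` dtype, breaking the numeric plots.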

5. Create dropdown for platform selection

A dropdown menu lets users select a specific ARGO float platform.

  • Selecting a platform updates the URL used to query that float's data.

  • Platform selection affects both the heatmap and the interactive map.

# @title
# Create Dropdown for platform codes
platformcode_dropdown = Dropdown(
    options=platformcode,
    description='', # Remove description from dropdown itself
    disabled=False,
)

# Create Label for the dropdown
platformcode_label = Label('Platform Code:')

# Initialize the variable with the default dropdown value
selected_plat = platformcode_dropdown.value

# Function to update the selected_plat variable and platform_url when the dropdown value changes
def update_selected_plat(change):
    global selected_plat, platform_url
    selected_plat = change['new']
    # Update platform_url with the selected platform code and current variable
    platform_url = f'https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE.csv?PRESS%2C{variable}&PLATFORMCODE=%22{selected_plat}%22&time%3E={time_coverage_start}&time%3C={time_coverage_end}&latitude%3E={geospatial_lat_min}&latitude%3C={geospatial_lat_max}&longitude%3E={geospatial_lon_min}&longitude%3C={geospatial_lon_max}'

# Observe changes in the dropdown
platformcode_dropdown.observe(update_selected_plat, names='value')

# Display in an HBox
display(HBox([platformcode_label, platformcode_dropdown]))

6. Generate heatmap and interactive map

This block dynamically displays:

  • Heatmap: shows the selected variable vs. time and pressure bins for the chosen platform.

  • Interactive map: shows the float trajectory with markers. Popups display the date, latitude, longitude, variable value, and pressure. The trajectory is drawn as a polyline connecting successive float locations.

Both the heatmap and interactive map update automatically when the variable or platform selection changes.

# @title
# Output widget to display the platform heatmap
platform_heatmap_output = Output()
# Define map_output here so it's available in this cell
map_output = Output()


def generate_and_display_platform_heatmap_and_map(selected_platform_code, selected_variable):
    # Update the platform heatmap
    with platform_heatmap_output:
        clear_output(wait=True)  # Clear previous heatmap
        try:
            platform_url = f'https://er1.s4oceanice.eu/erddap/tabledap/ARGO_FLOATS_OCEANICE.csv?time%2CPRESS%2C{selected_variable}%2Clatitude%2Clongitude&PLATFORMCODE=%22{selected_platform_code}%22&time%3E={time_coverage_start}&time%3C={time_coverage_end}&latitude%3E={geospatial_lat_min}&latitude%3C={geospatial_lat_max}&longitude%3E={geospatial_lon_min}&longitude%3C={geospatial_lon_max}'

            # Load data from platform_url - now includes lat and lon to simplify lookup
            df_platform = pd.read_csv(platform_url, skiprows=[1]) # Skip the units row
            df_platform_units = pd.read_csv(platform_url, nrows=1).iloc[0].to_dict() # Read units from the first row


            # Remove duplicate rows
            df_platform = df_platform.drop_duplicates()

            print(f"Data loaded for platform: {selected_platform_code}")
            # display(df_platform.head()) # Optional: display head for debugging

            # Convert 'time' column to datetime objects
            if 'time' in df_platform.columns:
                df_platform['time'] = pd.to_datetime(df_platform['time'])
            else:
                print("Error: 'time' column not found in data from platform_url.")
                raise ValueError("'time' column is required for this functionality.")

            # Ensure latitude and longitude are numeric in df_platform
            if 'latitude' in df_platform.columns and 'longitude' in df_platform.columns:
                df_platform['latitude'] = pd.to_numeric(df_platform['latitude'], errors='coerce')
                df_platform['longitude'] = pd.to_numeric(df_platform['longitude'], errors='coerce')
            else:
                 print("Error: 'latitude' or 'longitude' column not found in platform data.")
                 raise ValueError("'latitude' and 'longitude' columns are required for map functionality.")


            # Group pressure values into bins of 10 (for heatmap)
            if 'PRESS' in df_platform.columns:
                min_press = df_platform['PRESS'].min()
                max_press = df_platform['PRESS'].max()
                # Create bins starting from a multiple of 10 less than or equal to min_press
                bins = np.arange(math.floor(min_press / 10) * 10, math.ceil(max_press / 10) * 10 + 10, 10)
                df_platform['PRESS_BIN'] = pd.cut(df_platform['PRESS'], bins=bins, right=False, labels=bins[:-1])

                # Pivot the DataFrame for heatmap - time on x-axis, PRESS_BIN on y-axis, average of variable for values
                # Use only date for heatmap x-axis
                heatmap_data_platform = df_platform.copy()
                heatmap_data_platform['time_date'] = heatmap_data_platform['time'].dt.date
                heatmap_data_platform = heatmap_data_platform.pivot_table(index='PRESS_BIN', columns='time_date', values=selected_variable, aggfunc='mean', observed=True)

                # Drop rows (pressure bins) from the pivoted data that are all NaN
                heatmap_data_platform = heatmap_data_platform.dropna(axis=0, how='all')

                # Generate the heatmap with a thin, light gray grid
                plt.figure(figsize=(12, 8))
                sns.heatmap(heatmap_data_platform, cmap='coolwarm', linewidths=.5, linecolor='lightgray')
                plt.title(f'Heatmap of {selected_variable} ({df_platform_units.get(selected_variable, "")}) vs. Time and Pressure Bins for Platform {selected_platform_code}') # Added units to title
                plt.xlabel('Time')
                plt.ylabel(f"Pressure ({df_platform_units.get('PRESS', '')})") # Added units to ylabel
                # Removed plt.gca().invert_yaxis() as bin order might be handled by pandas.cut

                plt.show()

            else:
                print("Warning: 'PRESS' column not found in platform data. Heatmap will not be generated.")

        except Exception as e:
            print(f"Error loading data or generating heatmap for platform {selected_platform_code}: {e}")

    # Update the map
    with map_output:
        clear_output(wait=True) # Clear previous map output
        try:
            # Now we use df_platform for map data as it contains all necessary info
            df_map_data = df_platform.copy()

            # Drop rows with invalid coordinates or time
            df_map_data.dropna(subset=['latitude', 'longitude', 'time'], inplace=True)

            # Add debugging print: check how many rows are in df_map_data
            print(f"\nNumber of valid data points for map: {len(df_map_data)}")


            # If there are data points, create and display the map
            if not df_map_data.empty:
                # Get the mean latitude and longitude for centering the map
                center_lat = df_map_data['latitude'].mean()
                center_lon = df_map_data['longitude'].mean()

                # Create a map
                m = Map(center=(center_lat, center_lon), zoom=4)

                # Create a list of (latitude, longitude) tuples for the polyline
                route_coords = list(zip(df_map_data['latitude'], df_map_data['longitude']))

                # Add a polyline to the map to show the route
                if route_coords:
                    route_line = Polyline(
                        locations=route_coords,
                        color="blue",
                        fill=False
                    )
                    m.add_layer(route_line)

                # Add markers for each coordinate with a popup showing the date, latitude, longitude, and variable measurement
                for index, row in df_map_data.iterrows():
                    date_str = row['time'].strftime('%Y-%m-%d %H:%M:%S')
                    lat_str = f"{row['latitude']:.2f}"
                    lon_str = f"{row['longitude']:.2f}"

                    # Get the variable measurement, handle potential NaN values
                    variable_measurement = row.get(selected_variable)
                    pressure_value = row.get('PRESS')

                    variable_measurement_str = "N/A" if pd.isna(variable_measurement) else f"{variable_measurement:.2f} {df_platform_units.get(selected_variable, '')}" # Added units to popup
                    pressure_value_str = "N/A" if pd.isna(pressure_value) else f"{pressure_value:.2f} {df_platform_units.get('PRESS', '')}" # Added units to popup


                    # Add style to make the text black and include lat/lon/measurement
                    popup_html_content = f"""
                    <span style='color:black;'>
                        <b>Date:</b> {date_str}<br>
                        <b>Latitude:</b> {lat_str}<br>
                        <b>Longitude:</b> {lon_str}<br>
                        <b>{selected_variable}:</b> {variable_measurement_str}<br>
                        <b>Pressure:</b> {pressure_value_str}
                    </span>
                    """
                    popup_html = HTML(popup_html_content)
                    popup_html.layout.width = '200px'

                    # Create and add the marker with the popup
                    marker = Marker(location=(row['latitude'], row['longitude']), draggable=False)
                    marker.popup = popup_html
                    m.add_layer(marker)

                # Display the map within the output widget
                display(m)

            else:
                print(f"No valid coordinate data available for platform {selected_platform_code} in the specified time and geographical range to display on the map.")


        except Exception as e:
            print(f"Error loading data or generating map: {e}")


# Observe changes in the platformcode_dropdown/variables_dropdown and update both the heatmap and map
def update_platform_heatmap_and_map(change):
    generate_and_display_platform_heatmap_and_map(platformcode_dropdown.value, variables_dropdown.value)

platformcode_dropdown.observe(update_platform_heatmap_and_map, names='value')
variables_dropdown.observe(update_platform_heatmap_and_map, names='value') # Also observe variable changes

# Display the initial platform heatmap and map based on default selections
generate_and_display_platform_heatmap_and_map(platformcode_dropdown.value, variables_dropdown.value)

display(HBox([platform_heatmap_output, map_output], layout=Layout(width='100%', justify_content='space-around')))
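The 10-dbar pressure binning used for the heatmap above can be illustrated in isolation. A small sketch with synthetic pressure values (invented for illustration):

```python
import math
import numpy as np
import pandas as pd

# Synthetic pressures binned into 10-dbar intervals, as in the heatmap
# step: the bin edges run from the nearest multiple of 10 at or below the
# minimum pressure to the one above the maximum.
press = pd.Series([3.2, 12.7, 18.1, 25.0, 47.9])

bins = np.arange(math.floor(press.min() / 10) * 10,
                 math.ceil(press.max() / 10) * 10 + 10, 10)  # 0, 10, ..., 50
# right=False makes each bin half-open, [0, 10), [10, 20), ...;
# labelling with bins[:-1] names each bin by its lower edge.
binned = pd.cut(press, bins=bins, right=False, labels=bins[:-1])

print(list(binned))  # [0, 10, 10, 20, 40]
```

These bin labels become the y-axis of the pivot table, with one column per observation date and the mean of the selected variable in each cell.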