Sentinel image segmentation using a CNN (U-Net)¶

Overview of Neural Network Architectures and Encoders¶

Neural Network Architectures¶

There are four widely used types of neural network architectures, each suited to different types of data and tasks:

1. Feedforward Neural Networks (FNNs)¶

  • Structure: Data flows in one direction, from input to output.
  • Use Cases: Simple classification or regression (e.g., predicting house prices); a minimal sketch follows below.
  • Limitation: Cannot handle spatial or sequential relationships in the data.
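
To make the idea concrete, here is a minimal feedforward network sketched with PyTorch (the layer sizes are arbitrary):

In [ ]:
import torch
import torch.nn as nn

# A tiny feedforward network: 3 input features -> 1 output (e.g., a price estimate)
fnn = nn.Sequential(
    nn.Linear(3, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

x = torch.randn(8, 3)   # batch of 8 samples, 3 features each
print(fnn(x).shape)     # torch.Size([8, 1])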

2. Convolutional Neural Networks (CNNs)¶

  • Structure: Uses convolutional layers with filters that scan across the input (e.g., images).
  • Use Cases: Image classification, object detection, medical imaging.
  • Key Feature: Captures spatial patterns and hierarchies (edges, shapes); see the sketch below.
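
A minimal convolutional layer in PyTorch, showing how filters scan an image and produce feature maps (sizes are arbitrary):

In [ ]:
import torch
import torch.nn as nn

# One convolutional layer scanning a 3-band image with 3x3 filters
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

img = torch.randn(1, 3, 64, 64)   # [batch, channels, height, width]
features = conv(img)
print(features.shape)             # torch.Size([1, 16, 64, 64]): 16 feature maps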

3. Recurrent Neural Networks (RNNs)¶

  • Structure: Designed for sequential data, with outputs feeding back into the network (sketched below).
  • Use Cases: Text generation, speech recognition, time series forecasting.
  • Limitation: Struggles with long-term dependencies (addressed by LSTM/GRU variants).
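
A minimal LSTM in PyTorch, reading a short sequence step by step while carrying a hidden state (sizes are arbitrary):

In [ ]:
import torch
import torch.nn as nn

# An LSTM reading sequences of 10 time steps with 4 features each
lstm = nn.LSTM(input_size=4, hidden_size=32, batch_first=True)

seq = torch.randn(8, 10, 4)      # [batch, time steps, features]
outputs, (h_n, c_n) = lstm(seq)  # h_n/c_n carry the memory of past inputs
print(outputs.shape)             # torch.Size([8, 10, 32]): one hidden state per time step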

4. TransformersΒΆ

  • Structure: Uses self-attention to process all elements of a sequence in parallel.
  • Use Cases: Natural Language Processing (e.g., ChatGPT, translation, summarization).
  • Key Advantage: Handles long-range dependencies and is highly parallelizable.
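
A minimal self-attention layer in PyTorch, where every position attends to every other position in parallel (sizes are arbitrary):

In [ ]:
import torch
import torch.nn as nn

# Self-attention: each token's output is a weighted mix of all tokens
attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)

tokens = torch.randn(2, 12, 32)              # [batch, sequence length, embedding dim]
out, weights = attn(tokens, tokens, tokens)  # query, key, and value are the same sequence
print(out.shape, weights.shape)              # [2, 12, 32] outputs and [2, 12, 12] attention weights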

Encoders in Neural NetworksΒΆ

An encoder is a component of a neural network that compresses input data (like text or images) into a smaller, informative representation called a feature representation or embedding.

Types of EncodersΒΆ

1. AutoencodersΒΆ

  • What: Networks with two partsβ€”an encoder compresses input, a decoder reconstructs it.
  • Use Case: Dimensionality reduction, anomaly detection, denoising images.
  • Example: Compressing a 28x28 image into a 10-dimensional latent vector.
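
A minimal autoencoder sketch in PyTorch that compresses a flattened 28x28 image into a 10-dimensional latent vector and reconstructs it (untrained, for illustration only):

In [ ]:
import torch
import torch.nn as nn

# The encoder squeezes the image into a 10-dimensional latent vector;
# the decoder tries to reconstruct the original pixels from it
encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
decoder = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 28 * 28))

img = torch.randn(1, 1, 28, 28)
latent = encoder(img)
recon = decoder(latent)
print(latent.shape, recon.shape)   # torch.Size([1, 10]) torch.Size([1, 784])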

2. Transformer EncodersΒΆ

  • What: The encoder block of a Transformer model, which contextualizes inputs using self-attention.
  • Use Case: Language models like BERT use transformer encoders for tasks like classification and entity recognition.
  • Key Feature: Each word is represented based on its relation to all others in the sequence.
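
A minimal Transformer encoder in PyTorch; each output vector depends on the whole input sequence (sizes are arbitrary, not BERT's configuration):

In [ ]:
import torch
import torch.nn as nn

# A small Transformer encoder producing contextualized embeddings
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

tokens = torch.randn(2, 12, 64)   # [batch, sequence length, embedding dim]
contextual = encoder(tokens)
print(contextual.shape)           # torch.Size([2, 12, 64]): one contextual vector per token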

3. Encoder-Decoder ModelsΒΆ

  • What: Two-part models where the encoder summarizes the input, and the decoder generates output.
  • Use Case: Machine translation, summarization, image captioning.
  • Example: Translating an English sentence into French.
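
A minimal encoder-decoder sketch in PyTorch using two GRUs: the encoder summarizes the source sequence into a hidden state and the decoder generates outputs conditioned on that summary (untrained, for illustration only):

In [ ]:
import torch
import torch.nn as nn

# The encoder summarizes the source sequence; the decoder starts from that summary
encoder = nn.GRU(input_size=16, hidden_size=32, batch_first=True)
decoder = nn.GRU(input_size=16, hidden_size=32, batch_first=True)

src = torch.randn(4, 10, 16)        # source sequence embeddings (e.g., English)
tgt = torch.randn(4, 8, 16)         # target sequence embeddings (e.g., French)
_, summary = encoder(src)           # summary shape: [1, 4, 32]
outputs, _ = decoder(tgt, summary)  # decode conditioned on the encoder's summary
print(outputs.shape)                # torch.Size([4, 8, 32])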

Summary TableΒΆ

Neural Network ArchitecturesΒΆ

Architecture Best For Key Feature
FNN Basic tasks Simple one-way data flow
CNN Images Spatial awareness via convolution
RNN Sequential data Temporal memory of past inputs
Transformer Language tasks Self-attention and parallel processing

Encoder TypesΒΆ

Encoder Type Role Typical Use Case
Autoencoder Compress & reconstruct input Denoising, anomaly detection
Transformer Encoder Encode sequences contextually Text classification, sentence embedding
Encoder-Decoder Input β†’ latent β†’ output Translation, summarization

Surface water mapping from Sentinel-2 imagery using a U-Net CNN¶

In [35]:
### Download and install geoai:
###   conda install -c conda-forge geoai
### Adapted from a tutorial by Qiusheng Wu, the creator of geoai

Import libraries and dataΒΆ

In [36]:
import geoai
import leafmap
In [37]:
# check GPU status using PyTorch
import torch

if torch.cuda.is_available():
    print("GPU is available")
else:
    print("GPU is not available")
GPU is available
In [ ]:
url = "https://zenodo.org/records/5205674/files/dset-s2.zip?download=1"
data_dir = geoai.download_file(url)
In [39]:
images_dir = f"{data_dir}/dset-s2/tra_scene"
masks_dir = f"{data_dir}/dset-s2/tra_truth"
tiles_dir = f"{data_dir}/dset-s2/tiles"

Create training tilesΒΆ

In [ ]:
# Parameters for tiling the training data
geoai.export_geotiff_tiles_batch(
    images_folder=images_dir,  # folder with input GeoTIFF images
    masks_folder=masks_dir,    # corresponding mask files
    output_folder=tiles_dir,   # folder where tiles will be saved
    tile_size=512,             # patch size in pixels (larger tiles give more spatial context but use more memory)
    stride=128,                # step between patches; overlapping patches yield more training samples and fewer edge effects
    quiet=True,                # suppress output messages
)

# After tiling, training batches will have shape:
# [batch_size, channels, height, width]
# For example: [32, 6, 512, 512]
# = 32 tiles per batch, 6 bands/channels (e.g., Sentinel-2), 512×512 pixels per tile
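
As a quick sanity check, the cell below opens one exported tile and prints its band count and dimensions; it assumes the tiles were written as GeoTIFFs under the images subfolder of tiles_dir (the same folder used for training below):

In [ ]:
import glob

import rasterio

# Open one exported tile and confirm it has 6 bands of 512x512 pixels
tile_path = glob.glob(f"{tiles_dir}/images/*.tif")[0]
with rasterio.open(tile_path) as src:
    print(src.count, src.height, src.width)   # expected: 6 512 512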

Training semantic segmentation modelΒΆ

In [ ]:
geoai.train_segmentation_model(
    images_dir=f"{tiles_dir}/images",  # folder with input GeoTIFF tiles
    labels_dir=f"{tiles_dir}/masks",   # corresponding mask tiles; each pixel is 0 or 1 (background or water)
    output_dir=f"{tiles_dir}/unet_models",
    architecture="unet",               # model architecture
    encoder_name="resnet34",           # backbone encoder
    encoder_weights="imagenet",        # initialize the encoder with ImageNet weights
    num_channels=6,                    # the input imagery has 6 Sentinel-2 bands
    num_classes=2,                     # binary classification (water vs. non-water)
    batch_size=32,                     # 32 image/mask pairs per training batch
    num_epochs=20,                     # run through the dataset 20 times (as a test)
    learning_rate=0.001,               # standard learning rate for Adam or similar optimizers
    val_split=0.2,                     # 20% of the data for validation
    verbose=True,                      # print training progress
)

Evaluate model performanceΒΆ

In [ ]:
geoai.plot_performance_metrics(
    history_path=f"{tiles_dir}/unet_models/training_history.pth",
    figsize=(15, 5),
    verbose=True,
)
[Figure: training and validation performance metrics]
Best IoU: 0.9390
Final IoU: 0.9344
Best Dice: 0.9622
Final Dice: 0.9592
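
For reference, IoU is the intersection of the predicted and true water masks divided by their union, and Dice is twice the intersection divided by the sum of the two mask areas. The sketch below computes both with NumPy on a toy example (these are the standard definitions, not necessarily geoai's internal implementation):

In [ ]:
import numpy as np

def iou_dice(pred, truth):
    # pred and truth are binary arrays (1 = water, 0 = background)
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return intersection / union, 2 * intersection / (pred.sum() + truth.sum())

pred = np.array([[0, 1, 1], [0, 1, 0]])
truth = np.array([[0, 1, 1], [1, 1, 0]])
print(iou_dice(pred, truth))   # (0.75, 0.857...)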

Run inferenceΒΆ

In [ ]:
images_dir = f"{data_dir}/dset-s2/val_scene"
masks_dir = f"{data_dir}/dset-s2/val_truth"
predictions_dir = f"{data_dir}/dset-s2/predictions"
model_path = f"{tiles_dir}/unet_models/best_model.pth"
In [ ]:
geoai.semantic_segmentation_batch(
    input_dir=images_dir,
    output_dir=predictions_dir,
    model_path=model_path,
    architecture="unet",
    encoder_name="resnet34",
    num_channels=6,
    num_classes=2,
    window_size=512,
    overlap=256,
    batch_size=32,
    quiet=True,
)
In [ ]:
test_image_path = (
    f"{data_dir}/dset-s2/val_scene/S2A_L2A_20190318_N0211_R061_6Bands_S2.tif"
)
ground_truth_path = (
    f"{data_dir}/dset-s2/val_truth/S2A_L2A_20190318_N0211_R061_S2_Truth.tif"
)
prediction_path = (
    f"{data_dir}/dset-s2/predictions/S2A_L2A_20190318_N0211_R061_6Bands_S2_mask.tif"
)
save_path = f"{data_dir}/dset-s2/S2A_L2A_20190318_N0211_R061_6Bands_S2_comparison.png"

fig = geoai.plot_prediction_comparison(
    original_image=test_image_path,
    prediction_image=prediction_path,
    ground_truth_image=ground_truth_path,
    titles=["Original", "Prediction", "Ground Truth"],
    figsize=(15, 5),
    save_path=save_path,
    show_plot=True,
    indexes=[5, 4, 3],
    divider=5000,
)
WARNING:rasterio._env:CPLE_AppDefined in S2A_L2A_20190318_N0211_R061_6Bands_S2.tif: TIFFReadDirectory:Sum of Photometric type-related color channels and ExtraSamples doesn't match SamplesPerPixel. Defining non-color channels as ExtraSamples.
WARNING:rasterio._env:CPLE_AppDefined in TIFFReadDirectory:Sum of Photometric type-related color channels and ExtraSamples doesn't match SamplesPerPixel. Defining non-color channels as ExtraSamples.
Plot saved to: dset-s2/dset-s2/S2A_L2A_20190318_N0211_R061_6Bands_S2_comparison.png
[Figure: original image, prediction, and ground truth comparison]

Apply the trained U-Net model to another region¶

In [ ]:
url = "https://earth-search.aws.element84.com/v1/"
collection = "sentinel-2-l2a"
time_range = "2025-01-01/2025-07-20"
bbox = [128.6735, -16.2466, 128.9577, -16.0962]
In [ ]:
search = leafmap.stac_search(
    url=url,
    max_items=10,
    collections=[collection],
    bbox=bbox,
    datetime=time_range,
    query={"eo:cloud_cover": {"lt": 10}},
    sortby=[{"field": "properties.eo:cloud_cover", "direction": "asc"}],
    get_collection=True,
)
search
In [8]:
bands = ["blue", "green", "red", "nir", "swir16", "swir22"]
assets = list(search.values())[0]
links = [assets[band] for band in bands]
for link in links:
    print(link)
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/52/L/DH/2025/6/S2B_52LDH_20250607_0_L2A/B02.tif
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/52/L/DH/2025/6/S2B_52LDH_20250607_0_L2A/B03.tif
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/52/L/DH/2025/6/S2B_52LDH_20250607_0_L2A/B04.tif
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/52/L/DH/2025/6/S2B_52LDH_20250607_0_L2A/B08.tif
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/52/L/DH/2025/6/S2B_52LDH_20250607_0_L2A/B11.tif
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/52/L/DH/2025/6/S2B_52LDH_20250607_0_L2A/B12.tif
In [ ]:
out_dir = "s2"
leafmap.download_files(links, out_dir)
In [ ]:
s2_path = "s2.tif"
geoai.stack_bands(input_files=out_dir, output_file=s2_path)
In [16]:
s2_mask = "s2_mask.tif"
model_path = f"{tiles_dir}/unet_models/best_model.pth"
In [ ]:
geoai.semantic_segmentation(
    input_path=s2_path,       # path to the input Sentinel-2 image
    output_path=s2_mask,      # path where the mask will be saved
    model_path=model_path,    # path to the trained model
    architecture="unet",      # model architecture
    encoder_name="resnet34",  # pre-trained encoder
    num_channels=6,           # the stacked image has 6 Sentinel-2 bands
    num_classes=2,            # binary classification (water vs. non-water)
    window_size=512,          # tile size for the sliding window
    overlap=256,              # overlap between tiles to reduce edge artifacts
    batch_size=32,            # number of tiles to process in parallel
)

Visualise the resultsΒΆ

In [ ]:
fig = geoai.plot_prediction_comparison(
    original_image=s2_path,         # stacked Sentinel-2 image
    prediction_image=s2_mask,       # predicted mask from segmentation
    ground_truth_image=None,        # no ground truth for this region
    titles=["Original", "Prediction"],
    figsize=(12, 5),
    show_plot=True,
    indexes=[5, 4, 3],              # bands 5, 4, 3 of the stack (swir16, nir, red false-color); adjust as needed
    divider=10000,                  # scales Sentinel-2 reflectance values from 0–10000 to [0, 1]
)
[Figure: original Sentinel-2 image and predicted water mask]

EndΒΆ