Sentinel image segmentation using a CNN (U-Net)
Overview of Neural Network Architectures and Encoders
Neural Network Architectures
There are four widely used types of neural network architectures, each suited to different types of data and tasks:
1. Feedforward Neural Networks (FNNs)
- Structure: Data flows in one direction, from input to output.
- Use Cases: Simple classification or regression (e.g., predicting house prices).
- Limitation: Cannot handle spatial or sequential relationships in the data.
2. Convolutional Neural Networks (CNNs)
- Structure: Uses convolutional layers with filters that scan across input (e.g., images).
- Use Cases: Image classification, object detection, medical imaging.
- Key Feature: Captures spatial patterns and hierarchies (edges, shapes).
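To make the CNN idea concrete, here is a minimal PyTorch sketch of a toy CNN classifier; the layer sizes, channel counts, and class count are arbitrary choices for illustration, not part of the model trained later in this notebook.
In [ ]:
# Minimal illustrative CNN (hypothetical layer sizes), showing how convolution,
# pooling, and a final linear layer fit together
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, in_channels=3, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),  # learn local edge/texture filters
            nn.ReLU(),
            nn.MaxPool2d(2),                                       # downsample to build a spatial hierarchy
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                               # global pooling to a 32-d feature vector
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# a batch of 4 RGB images, 64x64 pixels -> 2 class scores per image
print(TinyCNN()(torch.randn(4, 3, 64, 64)).shape)  # torch.Size([4, 2])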
3. Recurrent Neural Networks (RNNs)
- Structure: Designed for sequential data, with outputs feeding back into the network.
- Use Cases: Text generation, speech recognition, time series forecasting.
- Limitation: Struggles with long-term dependencies (addressed by LSTM/GRU variants).
4. Transformers
- Structure: Uses self-attention to process all elements of a sequence in parallel.
- Use Cases: Natural Language Processing (e.g., ChatGPT, translation, summarization).
- Key Advantage: Handles long-range dependencies and is highly parallelizable.
Encoders in Neural Networks
An encoder is a component of a neural network that compresses input data (like text or images) into a smaller, informative representation called a feature representation or embedding.
Types of Encoders
1. Autoencoders
- What: Networks with two parts: an encoder compresses the input, and a decoder reconstructs it.
- Use Case: Dimensionality reduction, anomaly detection, denoising images.
- Example: Compressing a 28x28 image into a 10-dimensional latent vector.
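The 28x28 example above can be sketched in a few lines of PyTorch; the hidden width of 128 is an arbitrary illustrative choice.
In [ ]:
# Minimal illustrative autoencoder: 28x28 image -> 10-d latent vector -> reconstruction
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
decoder = nn.Sequential(nn.Linear(10, 128), nn.ReLU(), nn.Linear(128, 28 * 28), nn.Sigmoid())

x = torch.rand(16, 1, 28, 28)            # batch of 16 grayscale images
z = encoder(x)                           # latent embedding, shape (16, 10)
x_hat = decoder(z).view(16, 1, 28, 28)   # reconstruction
loss = nn.functional.mse_loss(x_hat, x)  # reconstruction error minimised during training
print(z.shape, loss.item())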
2. Transformer Encoders
- What: The encoder block of a Transformer model, which contextualizes inputs using self-attention.
- Use Case: Language models like BERT use transformer encoders for tasks like classification and entity recognition.
- Key Feature: Each word is represented based on its relation to all others in the sequence.
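A minimal shape-level sketch of a transformer encoder in PyTorch; real models such as BERT stack many more layers and add token and position embeddings, so this only shows how self-attention produces contextualized outputs with the same shape as the input.
In [ ]:
# Minimal illustrative transformer encoder: every token embedding is updated
# using self-attention over the whole sequence
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

tokens = torch.randn(8, 20, 64)     # batch of 8 sequences, 20 tokens, 64-d embeddings
contextualized = encoder(tokens)    # same shape; each token now depends on all the others
print(contextualized.shape)         # torch.Size([8, 20, 64])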
3. Encoder-Decoder Models
- What: Two-part models where the encoder summarizes the input, and the decoder generates output.
- Use Case: Machine translation, summarization, image captioning.
- Example: Translating an English sentence into French.
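PyTorch's built-in nn.Transformer combines both parts; the toy dimensions below are illustrative only, and a real translation model would add token embeddings and a vocabulary projection on top.
In [ ]:
# Minimal illustrative encoder-decoder: the decoder output is conditioned on the encoded source
import torch
import torch.nn as nn

model = nn.Transformer(d_model=64, nhead=4, num_encoder_layers=2,
                       num_decoder_layers=2, batch_first=True)
src = torch.randn(2, 10, 64)   # e.g. an embedded English sentence, 10 tokens
tgt = torch.randn(2, 7, 64)    # e.g. the embedded French output generated so far, 7 tokens
out = model(src, tgt)
print(out.shape)               # torch.Size([2, 7, 64])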
Summary Tables
Neural Network Architectures
Architecture | Best For | Key Feature |
---|---|---|
FNN | Basic tasks | Simple one-way data flow |
CNN | Images | Spatial awareness via convolution |
RNN | Sequential data | Temporal memory of past inputs |
Transformer | Language tasks | Self-attention and parallel processing |
Encoder Types
Encoder Type | Role | Typical Use Case |
---|---|---|
Autoencoder | Compress & reconstruct input | Denoising, anomaly detection |
Transformer Encoder | Encode sequences contextually | Text classification, sentence embedding |
Encoder-Decoder | Input → latent → output | Translation, summarization |
Surface water mapping from Sentinel-2 imagery using a U-Net CNN
In [35]:
### Download and install geoai
### conda install -c conda-forge geoai
### Motivated by a tutorial by Qiusheng Wu, the creator of geoai
Import libraries and data
In [36]:
import geoai
import leafmap
In [37]:
# check GPU status using PyTorch
import torch

if torch.cuda.is_available():
    print("GPU is available")
else:
    print("GPU is not available")
GPU is available
In [ ]:
url = "https://zenodo.org/records/5205674/files/dset-s2.zip?download=1"
data_dir = geoai.download_file(url)
In [39]:
images_dir = f"{data_dir}/dset-s2/tra_scene"
masks_dir = f"{data_dir}/dset-s2/tra_truth"
tiles_dir = f"{data_dir}/dset-s2/tiles"
Create training tilesΒΆ
In [ ]:
# parameters for the training tiles
geoai.export_geotiff_tiles_batch(
    images_folder=images_dir,  # folder with input GeoTIFF images
    masks_folder=masks_dir,    # corresponding mask files
    output_folder=tiles_dir,   # folder where the tiles will be saved
    tile_size=512,             # patch size in pixels (larger tiles give more spatial context but use more memory)
    stride=128,                # step between tiles; a smaller stride gives more overlap, more training samples, and fewer edge effects
    quiet=True,                # suppress output messages
)
# After tiling, training batches will have shape:
# [batch_size, channels, height, width]
# For example: [32, 6, 512, 512]
# = 32 tiles per batch, 6 bands/channels (e.g., Sentinel-2), 512×512 pixels per tile
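As an optional sanity check (assuming geoai writes the image tiles as GeoTIFFs under {tiles_dir}/images, the same folder passed to the training call below), one tile can be opened with rasterio to confirm the expected band count and size:
In [ ]:
# Open one exported tile and confirm its shape: (bands, height, width)
import glob
import rasterio

tile_paths = sorted(glob.glob(f"{tiles_dir}/images/*.tif"))  # assumed tile layout
with rasterio.open(tile_paths[0]) as src:
    tile = src.read()
print(len(tile_paths), tile.shape, tile.dtype)  # expect something like (6, 512, 512)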
Train the semantic segmentation model
In [ ]:
geoai.train_segmentation_model(
    images_dir=f"{tiles_dir}/images",  # folder with the input image tiles
    labels_dir=f"{tiles_dir}/masks",   # corresponding mask tiles, each pixel is 0 or 1 (background or water)
    output_dir=f"{tiles_dir}/unet_models",
    architecture="unet",          # model architecture
    encoder_name="resnet34",      # backbone encoder
    encoder_weights="imagenet",
    num_channels=6,               # the dataset uses 6 Sentinel-2 bands
    num_classes=2,                # binary classification (water vs. non-water)
    batch_size=32,                # 32 image/mask pairs per training batch
    num_epochs=20,                # run through the dataset 20 times (as a test)
    learning_rate=0.001,          # standard learning rate for Adam or similar optimizers
    val_split=0.2,                # 20% of the data for validation
    verbose=True,                 # print training progress
)
Evaluate model performance
In [ ]:
geoai.plot_performance_metrics(
    history_path=f"{tiles_dir}/unet_models/training_history.pth",
    figsize=(15, 5),
    verbose=True,
)
Best IoU: 0.9390 Final IoU: 0.9344 Best Dice: 0.9622 Final Dice: 0.9592
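For reference, IoU and Dice can be computed directly from a predicted mask and its ground truth. The NumPy sketch below is illustrative and not necessarily the exact implementation geoai uses internally.
In [ ]:
# IoU = intersection / union; Dice = 2 * intersection / (|prediction| + |truth|)
import numpy as np

def iou_dice(pred, truth):
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    iou = intersection / np.logical_or(pred, truth).sum()
    dice = 2 * intersection / (pred.sum() + truth.sum())
    return iou, dice

# toy example: 1 overlapping water pixel out of 2 predicted and 1 true
print(iou_dice(np.array([[1, 1], [0, 0]]), np.array([[1, 0], [0, 0]])))  # (0.5, 0.667)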
Run inference
In [ ]:
images_dir = f"{data_dir}/dset-s2/val_scene"
masks_dir = f"{data_dir}/dset-s2/val_truth"
predictions_dir = f"{data_dir}/dset-s2/predictions"
model_path = f"{tiles_dir}/unet_models/best_model.pth"
In [ ]:
geoai.semantic_segmentation_batch(
    input_dir=images_dir,
    output_dir=predictions_dir,
    model_path=model_path,
    architecture="unet",
    encoder_name="resnet34",
    num_channels=6,
    num_classes=2,
    window_size=512,
    overlap=256,
    batch_size=32,
    quiet=True,
)
In [ ]:
test_image_path = (
    f"{data_dir}/dset-s2/val_scene/S2A_L2A_20190318_N0211_R061_6Bands_S2.tif"
)
ground_truth_path = (
    f"{data_dir}/dset-s2/val_truth/S2A_L2A_20190318_N0211_R061_S2_Truth.tif"
)
prediction_path = (
    f"{data_dir}/dset-s2/predictions/S2A_L2A_20190318_N0211_R061_6Bands_S2_mask.tif"
)
save_path = f"{data_dir}/dset-s2/S2A_L2A_20190318_N0211_R061_6Bands_S2_comparison.png"
fig = geoai.plot_prediction_comparison(
    original_image=test_image_path,
    prediction_image=prediction_path,
    ground_truth_image=ground_truth_path,
    titles=["Original", "Prediction", "Ground Truth"],
    figsize=(15, 5),
    save_path=save_path,
    show_plot=True,
    indexes=[5, 4, 3],  # band indexes used for display; adjust as needed
    divider=5000,       # scales pixel values for display
)
WARNING:rasterio._env:CPLE_AppDefined in S2A_L2A_20190318_N0211_R061_6Bands_S2.tif: TIFFReadDirectory:Sum of Photometric type-related color channels and ExtraSamples doesn't match SamplesPerPixel. Defining non-color channels as ExtraSamples.
WARNING:rasterio._env:CPLE_AppDefined in TIFFReadDirectory:Sum of Photometric type-related color channels and ExtraSamples doesn't match SamplesPerPixel. Defining non-color channels as ExtraSamples.
Plot saved to: dset-s2/dset-s2/S2A_L2A_20190318_N0211_R061_6Bands_S2_comparison.png
Apply the trained U-Net model to another region
In [ ]:
url = "https://earth-search.aws.element84.com/v1/"
collection = "sentinel-2-l2a"
time_range = "2025-01-01/2025-07-20"
bbox = [128.6735, -16.2466, 128.9577, -16.0962]
In [ ]:
search = leafmap.stac_search(
    url=url,
    max_items=10,
    collections=[collection],
    bbox=bbox,
    datetime=time_range,
    query={"eo:cloud_cover": {"lt": 10}},
    sortby=[{"field": "properties.eo:cloud_cover", "direction": "asc"}],
    get_collection=True,
)
search
In [8]:
bands = ["blue", "green", "red", "nir", "swir16", "swir22"]
assets = list(search.values())[0]
links = [assets[band] for band in bands]
for link in links:
    print(link)
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/52/L/DH/2025/6/S2B_52LDH_20250607_0_L2A/B02.tif
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/52/L/DH/2025/6/S2B_52LDH_20250607_0_L2A/B03.tif
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/52/L/DH/2025/6/S2B_52LDH_20250607_0_L2A/B04.tif
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/52/L/DH/2025/6/S2B_52LDH_20250607_0_L2A/B08.tif
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/52/L/DH/2025/6/S2B_52LDH_20250607_0_L2A/B11.tif
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/52/L/DH/2025/6/S2B_52LDH_20250607_0_L2A/B12.tif
In [ ]:
out_dir = "s2"
leafmap.download_files(links, out_dir)
In [ ]:
s2_path = "s2.tif"
geoai.stack_bands(input_files=out_dir, output_file=s2_path)
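Optionally, a quick check with rasterio confirms that the stacked file has the six bands the model expects before running inference:
In [ ]:
# Confirm the stacked Sentinel-2 image has 6 bands
import rasterio

with rasterio.open(s2_path) as src:
    print(src.count, src.width, src.height, src.crs)  # expect 6 bands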
In [16]:
s2_mask = "s2_mask.tif"
model_path = f"{tiles_dir}/unet_models/best_model.pth"
In [ ]:
geoai.semantic_segmentation(
    input_path=s2_path,        # path to the input Sentinel-2 image
    output_path=s2_mask,       # path where the predicted mask will be saved
    model_path=model_path,     # path to the trained model
    architecture="unet",       # model architecture
    encoder_name="resnet34",   # pre-trained encoder
    num_channels=6,            # the stacked image has 6 bands
    num_classes=2,             # binary classification (water vs. non-water)
    window_size=512,           # tile size for the sliding window
    overlap=256,               # overlap between tiles to reduce edge artifacts
    batch_size=32,             # number of tiles to process in parallel
)
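Once the mask has been written, the mapped surface-water area can be estimated by counting water pixels and multiplying by the pixel area. This is a minimal sketch that assumes the predicted mask stores 1 for water and 0 for background and is in a projected CRS with metre units (as Sentinel-2 UTM tiles are):
In [ ]:
# Estimate the water area from the predicted mask (assumes class 1 = water, CRS in metres)
import numpy as np
import rasterio

with rasterio.open(s2_mask) as src:
    mask = src.read(1)
    pixel_area_m2 = abs(src.transform.a * src.transform.e)  # pixel width x height in CRS units

water_km2 = (mask == 1).sum() * pixel_area_m2 / 1e6
print(f"Estimated water area: {water_km2:.1f} km^2")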
Visualise the results
In [ ]:
fig = geoai.plot_prediction_comparison(
    original_image=s2_path,      # stacked Sentinel-2 image
    prediction_image=s2_mask,    # predicted water mask from the segmentation
    ground_truth_image=None,     # no ground truth available for this region
    titles=["Original", "Prediction"],
    figsize=(12, 5),
    show_plot=True,
    indexes=[5, 4, 3],           # band indexes used for display (false-colour composite); adjust as needed
    divider=10000,               # for Sentinel-2, scales 0-10000 reflectance values to [0, 1]
)