DataLoader train
MARS-S2L training dataset walkthrough 
- Last Modified: 26-04-2026
- Author: Gonzalo Mateo-Garcia
Overview
This notebook demonstrates how to inspect the training split of the MARS-S2L dataset directly from the public Hugging Face repository using the marss2l package.
Specifically, it shows how to:
- Load the image and plume metadata published with MARS-S2L.
- Filter the dataset to the train_2023 split used for model development.
- Build a DatasetPlumes object in analysis mode.
- Plot representative training samples, including simulated plumes used during training.
Important
- This notebook is intended for dataset exploration and training-data inspection, not for operational plume validation.
- The MARS-S2L database, trained models and tutorials in this package are released under a Creative Commons non-commercial share-alike licence.
Install marss2l package
Install the published package before running the notebook:
pip install marss2l
This notebook reads the dataset metadata and imagery through the marss2l loaders, so the package must be available in your Python environment before executing any code cells.
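Before running the cells below, it can help to confirm that the package is importable from the active environment. The following is a minimal sketch using only the standard library; the helper name `is_importable` is hypothetical and is not part of marss2l.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if `package_name` can be located on the current Python path."""
    # find_spec returns None when a top-level package cannot be found.
    return importlib.util.find_spec(package_name) is not None


# `is_importable` is an illustrative helper, not a marss2l function.
if not is_importable("marss2l"):
    print("marss2l was not found; run `pip install marss2l` in this environment.")
```

If the message is printed, the notebook kernel is probably using a different environment from the one where the package was installed.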
import matplotlib
import os
import logging
from marss2l.utils import setup_stream_logger, fs_from_path
logger = logging.getLogger(__name__)
setup_stream_logger(logger, level=logging.DEBUG)
%%time
from marss2l import dataframe_image_plumes
from marss2l.huggingface import CSV_PATH_DEFAULT_HF, CSV_PLUME_PATH_DEFAULT_HF
# path_images = "../data/validated_images_all.csv"
# path_prepend_data = "../data/"
# path_plumes = "../data/validated_images_plumes_fixed.csv"
# path_sources = "../data/loc_sources.csv"
# Load images and plumes from HuggingFace
path_images = CSV_PATH_DEFAULT_HF
path_plumes = CSV_PLUME_PATH_DEFAULT_HF
fs = fs_from_path(path_images)
dataframe_images = dataframe_image_plumes.read_csv_images(path_images, fs=fs, path_prepend_data=None)
dataframe_plumes = dataframe_image_plumes.read_csv_plumes(path_plumes, fs=fs)
dataframe_sources = None
# dataframe_sources = dataframe_image_plumes.read_csv_locs_sources(path_sources, fs=fs_local)
%%time
split = "train_2023"
dataframe_image_train, dataframe_plumes_train, dataframe_sources_train = dataframe_image_plumes.load_dataframe_split(
    dataframe_or_csv_path=dataframe_images,
    dataframe_or_csv_path_plumes=dataframe_plumes,
    dataframe_or_csv_path_sources=dataframe_sources,
    split=split,
    fs=fs,
    logger=logger,
    all_locs=None,
    load_plumes=True,
)
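Conceptually, restricting the metadata to one split means keeping the images tagged with that split and then keeping only the plumes attached to those images. The sketch below illustrates that idea on plain Python records so that it runs without downloading the dataset. It is not the implementation of `load_dataframe_split`; the function `filter_split`, the field names `split`, `image_id` and `plume_id`, and the `test_2023` value are all assumptions made for illustration.

```python
def filter_split(images, plumes, split):
    """Keep the images tagged with `split` and the plumes attached to them."""
    images_split = [row for row in images if row["split"] == split]
    kept_ids = {row["image_id"] for row in images_split}
    plumes_split = [row for row in plumes if row["image_id"] in kept_ids]
    return images_split, plumes_split


# Toy records: field names and values are invented for illustration only.
images = [
    {"image_id": "a", "split": "train_2023"},
    {"image_id": "b", "split": "test_2023"},
    {"image_id": "c", "split": "train_2023"},
]
plumes = [
    {"plume_id": 1, "image_id": "a"},
    {"plume_id": 2, "image_id": "b"},
]

images_train, plumes_train = filter_split(images, plumes, "train_2023")
print(len(images_train), len(plumes_train))
```

Filtering the plume records through the kept image identifiers keeps the two tables consistent, so no plume refers to an image outside the split.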
from marss2l import loaders
# from importlib import reload
# reload(loaders)
from marss2l import loss
# reload(loss)
import json
import os
mode = "train"
dataset = loaders.DatasetPlumes(
    mode=mode,
    device="cpu",
    image_dataframe=dataframe_image_train,
    plume_dataframe=dataframe_plumes_train,
    sources_dataframe=dataframe_sources_train,
    logger=logger,
    film_dict_mapping=None,
    film_train_zero_id=None,
    analysis_mode=True,
    rotate_data_augmentation=True,
    fs=fs,
)
dataset.image_dataframe.shape
import matplotlib.pyplot as plt
from marss2l.loss import get_snr
import time
from marss2l.loss import DEFAULT_POS_WEIGHT
total = 5
for _i in range(total):
    # Time how long it takes to load one training sample
    start = time.time()
    sample = dataset[_i]
    time_sampling = time.time() - start

    # Plot the sample, then close the figure to free memory
    fig, axs = dataset.plot_item(sample, text_prepend=f"{_i+1}/{total}")
    plt.show()
    plt.close(fig)
    time_plotting = time.time() - start - time_sampling

    # The SNR is computed only for samples that contain a plume
    if sample["isplume"].item() == 1:
        snr = get_snr(sample["ch4"], sample["y_target"], keepdim=False).item()
        snr_text = f"{snr:.4f}"
    else:
        snr_text = "NA"
    logger.info(f"SNR: {snr_text} Time sampling: {time_sampling:.3f}s time plotting: {time_plotting:.3f}s")
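The loop above separates the time spent loading a sample from the time spent plotting it by subtracting timestamps. An alternative way to keep those measurements apart is a small timing context manager, sketched below with the standard library only. The name `timed` is hypothetical, and the two timed blocks are stand-ins for `dataset[_i]` and `dataset.plot_item(...)` so that the example runs on its own.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings = {}
with timed("sampling", timings):
    sample = sum(range(100_000))  # stand-in for `dataset[_i]`
with timed("plotting", timings):
    time.sleep(0.01)  # stand-in for `dataset.plot_item(...)` and `plt.show()`

print(f"Time sampling: {timings['sampling']:.3f}s time plotting: {timings['plotting']:.3f}s")
```

`time.perf_counter` is a monotonic clock intended for measuring short durations, which makes it a reasonable choice over `time.time` for this kind of measurement.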
Only sample simulated images
total = 5
for _i in range(total):
    # Draw an image without a plume and add a simulated plume to it
    start = time.time()
    item_no_plume = dataset.sample_no_plume_image(None)
    sample = dataset.simulate_plume(item_no_plume)
    time_sampling = time.time() - start

    # Plot the sample, then close the figure to free memory
    fig, axs = dataset.plot_item(sample, text_prepend=f"{_i+1}/{total}")
    plt.show()
    plt.close(fig)
    time_plotting = time.time() - start - time_sampling

    # The SNR is computed only for samples that contain a plume
    if sample["isplume"].item() == 1:
        snr = get_snr(sample["ch4"], sample["y_target"], keepdim=False).item()
        snr_text = f"{snr:.4f}"
    else:
        snr_text = "NA"
    logger.info(f"SNR: {snr_text} Time sampling: {time_sampling:.3f}s time plotting: {time_plotting:.3f}s")
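The loops above log one SNR value per sample and report "NA" when a sample has no plume. If the values were collected in a list instead, they could be summarised as in the sketch below. The helper `summarise_snr` is hypothetical, the numbers are invented, and `None` stands in for the "NA" case; nothing here reflects how `get_snr` computes its result.

```python
import statistics


def summarise_snr(snr_values):
    """Summarise SNR values, ignoring entries that are None (no plume)."""
    valid = [v for v in snr_values if v is not None]
    if not valid:
        return {"count": 0, "mean": None, "min": None, "max": None}
    return {
        "count": len(valid),
        "mean": statistics.fmean(valid),
        "min": min(valid),
        "max": max(valid),
    }


# Invented values: None marks a sample without a plume ("NA" in the log).
summary = summarise_snr([2.0, None, 4.0, None, 6.0])
print(summary)
```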
Licence
The marss2l package is published under the GNU Lesser General Public License v3 (LGPL v3).
The MARS-S2L database and all pre-trained models are released under a Creative Commons non-commercial share-alike licence. Using the models and data in commercial pipelines requires written consent from UNEP IMEO.
marss2l tutorials and notebooks are released under a Creative Commons non-commercial share-alike licence.
If you find this work useful please cite:
@article{allen_2025,
title = {Artificial intelligence for methane detection: from continuous monitoring to verified mitigation},
author = {Allen, Anna and Mateo-Garcia, Gonzalo and Irakulis-Loitxate, Itziar and Martin, Manuel Montesino-San and Watine, Marc and Requeima, James and Gorroño, Javier and Randles, Cynthia and Mokalled, Tharwat and Guanter, Luis and Turner, Richard E. and Cifarelli, Claudio and Caltagirone, Manfredi},
url = {http://arxiv.org/abs/2511.21777},
doi = {10.48550/arXiv.2511.21777},
month = nov,
year = {2025}
}