There is a need to help farmers make decisions to maximize crop yields. Many studies have emerged in recent years using deep learning on remotely sensed images to detect plant diseases, which can be caused by multiple factors such as environmental conditions, genetics or pathogens. This problem can be considered as an anomaly detection task. However, these approaches are often limited by the availability of annotated data or prior knowledge of the existence of an anomaly. In many cases, it is not possible to obtain this information. In this work, we propose an approach that can detect plant anomalies without prior knowledge of their existence, thus overcoming these limitations. To this end, we train a model on an auxiliary prediction task using a dataset composed of samples of normal and abnormal plants. Our proposed method studies the distribution of heatmaps retrieved from an explainability model. Based on the assumption that the model trained on the auxiliary task is able to extract important plant characteristics, we propose to study how closely the heatmap of a new observation follows the heatmap distribution of a normal dataset. Through the proposed a contrario approach, we derive a score indicating potential anomalies.
Experiments show that our approach outperforms reference approaches such as f-AnoGAN and OCSVM on the GrowliFlower and PlantDoc datasets and achieves competitive performance on the PlantVillage dataset, while not requiring prior knowledge of the existence of anomalies.
@article{leygonie2024can,title={Can we detect plant diseases without prior knowledge of their existence?},author={Leygonie, Rebecca and Lobry, Sylvain and Wendling, Laurent},journal={International Journal of Applied Earth Observation and Geoinformation},volume={134},year={2024},publisher={Elsevier},}
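To make the a contrario scoring idea above concrete, here is a minimal illustrative sketch. The per-pixel Gaussian background model, the z-score threshold and all array shapes are assumptions made for illustration, not the paper's exact procedure:

```python
import numpy as np

def fit_normal_heatmap_model(normal_heatmaps):
    """normal_heatmaps: (N, H, W) explanation heatmaps for normal plants.
    Fits a simple per-pixel Gaussian background model (an assumption)."""
    mu = normal_heatmaps.mean(axis=0)
    sigma = normal_heatmaps.std(axis=0) + 1e-6
    return mu, sigma

def anomaly_score(heatmap, mu, sigma, z_thresh=3.0):
    """Fraction of pixels deviating by more than z_thresh standard deviations.
    Under the a contrario (background) model, this fraction should be tiny
    for normal samples; a large value flags a potential anomaly."""
    z = np.abs(heatmap - mu) / sigma
    return float((z > z_thresh).mean())

# Usage with random stand-in data:
rng = np.random.default_rng(0)
normal = rng.normal(size=(100, 64, 64))
mu, sigma = fit_normal_heatmap_model(normal)
print(anomaly_score(rng.normal(size=(64, 64)) + 2.0, mu, sigma))
```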
Domain Adaptation for Mapping LCZs in Sub-Saharan Africa with Remote Sensing: A Comprehensive Approach to Health Data Analysis
Basile Rousse, Sylvain Lobry, Géraldine Duthé, and
2 more authors
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024
Environment and population are closely linked, but their interactions remain challenging to assess. To fill this gap, modeling the environment at a fine resolution brings significant value when combined with population-based studies. This is particularly challenging in regions where the availability of both population and environmental data is limited. In low- and middle-income countries, many demographic and health data are from nationally representative household surveys which now provide approximate geolocations of the sampled households. In parallel, freely available remote sensing data, due to their high spatial and temporal resolution, make it possible to capture the local environment at any time. This study aims to correlate standard demographic and health information with a high-resolution environment characterization derived from satellite data, encompassing both rural and urban areas in sub-Saharan Africa. We use the Malaria Indicator Survey (MIS) conducted in 2017-2018 in Burkina Faso. We first present a deep semi-supervised domain adaptation strategy based on the inter-tropical climatic characteristics of the country for precisely mapping Local Climate Zones (LCZs). This strategy models seasonal variations through contrastive learning to extract useful information for the mapping process. We then use this high-resolution LCZ map to characterize, in four groups, the immediate environment of the sampled households. We find a significant association between these local environments and malaria among households’ children. Going beyond the traditional dichotomous urban/rural characterization, our results provide interesting insights for public health. This innovative method offers new avenues for exploring population and environment interactions, especially in the context of growing climate change concerns.
@article{rousse2024domain,title={Domain Adaptation for Mapping LCZs in Sub-Saharan Africa with Remote Sensing: A Comprehensive Approach to Health Data Analysis},author={Rousse, Basile and Lobry, Sylvain and Duth{\'e}, G{\'e}raldine and Golaz, Val{\'e}rie and Wendling, Laurent},journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},year={2024},publisher={IEEE},doi={10.1109/JSTARS.2024.3421284},project={DEMO}}
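The abstract above mentions modeling seasonal variations through contrastive learning. One plausible formulation, sketched below, is an NT-Xent-style loss that pairs the two seasonal views of the same location; the exact loss used in the paper is not specified here, so this is an assumption:

```python
import torch
import torch.nn.functional as F

def seasonal_contrastive_loss(z_dry, z_wet, temperature=0.1):
    """z_dry, z_wet: (B, D) embeddings of the same B locations imaged in the
    two seasons, produced by a shared encoder. The two seasonal views of a
    location form a positive pair; other locations in the batch act as
    negatives (an assumption about the pairing strategy)."""
    z_dry = F.normalize(z_dry, dim=1)
    z_wet = F.normalize(z_wet, dim=1)
    logits = z_dry @ z_wet.t() / temperature            # (B, B) similarities
    targets = torch.arange(z_dry.size(0), device=logits.device)  # diagonal
    return F.cross_entropy(logits, targets)
```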
2021
Wasserstein Adversarial Regularization for learning with label noise
Kilian Fatras, Bharath Bhushan Damodaran, Sylvain Lobry, and
3 more authors
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021
Noisy labels often occur in vision datasets, especially when they are obtained from crowdsourcing or Web scraping. We propose a new regularization method, which enables learning robust classifiers in the presence of noisy data. To achieve this goal, we propose a new adversarial regularization scheme based on the Wasserstein distance. Using this distance allows taking into account specific relations between classes by leveraging the geometric properties of the label space. Our Wasserstein Adversarial Regularization (WAR) encodes a selective regularization, which promotes smoothness of the classifier between some classes, while preserving sufficient complexity of the decision boundary between others. We first discuss how and why adversarial regularization can be used in the context of noise and then show the effectiveness of our method on five datasets corrupted with noisy labels: in both benchmarks and real datasets, WAR outperforms the state-of-the-art competitors.
@article{fatras2021wasserstein,title={Wasserstein Adversarial Regularization for learning with label noise},author={Fatras, Kilian and Damodaran, Bharath Bhushan and Lobry, Sylvain and Flamary, Remi and Tuia, Devis and Courty, Nicolas},journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},year={2021},publisher={IEEE},doi={https://doi.org/10.1109/TPAMI.2021.3094662},}
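A minimal sketch of the WAR idea, assuming an entropic (Sinkhorn) approximation of the Wasserstein distance and a random perturbation standing in for the adversarial step; `C` is a ground cost between labels (e.g. derived from label embeddings), which is how the label-space geometry enters:

```python
import torch
import torch.nn.functional as F

def sinkhorn_loss(p, q, C, reg=0.1, n_iters=50):
    """Entropic OT cost between batches of class distributions p, q (B, K),
    under ground cost C (K, K) between labels."""
    K = torch.exp(-C / reg)                      # Gibbs kernel
    u = torch.ones_like(p)
    for _ in range(n_iters):                     # Sinkhorn fixed-point updates
        v = q / (u @ K + 1e-9)
        u = p / (v @ K.t() + 1e-9)
    plan = u.unsqueeze(2) * K * v.unsqueeze(1)   # (B, K, K) transport plans
    return (plan * C).sum(dim=(1, 2)).mean()

def war_regularizer(model, x, C, eps=1e-2):
    """Penalize the OT distance between predictions at x and at a perturbed x.
    The random perturbation is a stand-in for the adversarial direction used
    in the paper (an assumption of this sketch)."""
    x_adv = x + eps * torch.randn_like(x)
    p = F.softmax(model(x), dim=1)
    q = F.softmax(model(x_adv), dim=1)
    return sinkhorn_loss(p, q, C)
```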
Shrub decline and expansion of wetland vegetation revealed by very high resolution land cover change detection in the Siberian lowland tundra
Rúna Í. Magnússon, Juul Limpens, David Kleijn, and
4 more authors
Vegetation change, permafrost degradation and their interactions affect greenhouse gas fluxes, hydrology and surface energy balance in Arctic ecosystems. The Arctic shows an overall “greening” trend (i.e. increased plant biomass and productivity) attributed to expansion of shrub vegetation. However, Arctic shrub dynamics show strong spatial variability and locally “browning” may be observed. Mechanistic understanding of greening and browning trends is necessary to accurately assess the response of Arctic vegetation to a changing climate. In this context, the Siberian Arctic is an understudied region. Between 2010 and 2019, increased browning (as derived from the MODIS Enhanced Vegetation Index) was observed in the Eastern Siberian Indigirka Lowlands. To support interpretation of local greening and browning dynamics, we quantified changes in land cover and transition probabilities in a representative tundra site in the Indigirka Lowlands using a time series of three very high resolution (VHR, 0.5 m) satellite images acquired between 2010 and 2019. Using spatiotemporal Potts model regularization, we substantially reduced classification errors related to optical and phenological inconsistencies in the image material. VHR images show that recent browning was associated with declines in shrub, lichen and tussock vegetation and increases in open water, sedge and especially Sphagnum vegetation. Observed formation and expansion of small open water bodies in shrub-dominated vegetation suggests abrupt thaw of ice-rich permafrost. Transitions from open water to sedge and Sphagnum indicate aquatic succession upon disturbance. The overall shift towards open water and wetland vegetation suggests a wetting trend, likely associated with permafrost degradation. Landsat data confirmed widespread expansion of surface water throughout the Indigirka Lowlands. However, the increase in the area of small water bodies observed in VHR data was not visible in Landsat-derived surface water data, which suggests that VHR data is essential for early detection of small-scale disturbances and associated vegetation change in permafrost ecosystems.
@article{MAGNUSSON2021146877,title={Shrub decline and expansion of wetland vegetation revealed by very high resolution land cover change detection in the Siberian lowland tundra},journal={Science of The Total Environment},volume={782},pages={146877},year={2021},issn={0048-9697},doi={https://doi.org/10.1016/j.scitotenv.2021.146877},url={https://www.sciencedirect.com/science/article/pii/S0048969721019471},author={Magnússon, Rúna Í. and Limpens, Juul and Kleijn, David and {van Huissteden}, Ko and Maximov, Trofim C. and Lobry, Sylvain and Heijmans, Monique M.P.D.},keywords={Siberian lowland tundra, Arctic greening, Permafrost, Land cover change, Potts model, Vegetation succession},}
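The spatiotemporal Potts regularization mentioned above can be illustrated with a simple iterated-conditional-modes (ICM) solver over the label grid. This is a sketch of the general technique under assumed array shapes, not the optimizer used in the paper:

```python
import numpy as np

def potts_icm(unary, beta_s=1.0, beta_t=1.0, n_sweeps=5):
    """unary: (T, H, W, K) per-pixel, per-date class costs (e.g. -log prob).
    ICM with a Potts penalty on the 4 spatial neighbours and on the temporal
    neighbours of each pixel, discouraging spurious class changes."""
    T, H, W, K = unary.shape
    labels = unary.argmin(axis=3)                 # initialize with data term
    for _ in range(n_sweeps):
        for t in range(T):
            for i in range(H):
                for j in range(W):
                    cost = unary[t, i, j].copy()
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < H and 0 <= nj < W:
                            cost += beta_s * (np.arange(K) != labels[t, ni, nj])
                    for dt in (-1, 1):            # temporal Potts links
                        if 0 <= t + dt < T:
                            cost += beta_t * (np.arange(K) != labels[t + dt, i, j])
                    labels[t, i, j] = cost.argmin()
    return labels
```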
2020
A deep learning framework for matching of SAR and optical imagery
Lloyd Haydn Hughes, Diego Marcos, Sylvain Lobry, and
2 more authors
ISPRS Journal of Photogrammetry and Remote Sensing, 2020
SAR and optical imagery provide highly complementary information about observed scenes. A combined use of these two modalities is thus desirable in many data fusion scenarios. However, any data fusion task requires measurements to be accurately aligned. While for both data sources images are usually provided in a georeferenced manner, the geo-localization of optical images is often inaccurate due to propagation of angular measurement errors. Many methods for the matching of homologous image regions exist for both SAR and optical imagery; however, these methods are unsuitable for SAR-optical image matching due to significant geometric and radiometric differences between the two modalities. In this paper, we present a three-step framework for sparse image matching of SAR and optical imagery, whereby each step is encoded by a deep neural network. We first predict regions in each image which are deemed most suitable for matching. A correspondence heatmap is then generated through a multi-scale, feature-space cross-correlation operator. Finally, outliers are removed by classifying the correspondence surface as a positive or negative match. Our experiments show that the proposed approach provides a substantial improvement over previous methods for SAR-optical image matching and can be used to register even large-scale scenes. This opens up the possibility of using both types of data jointly, for example for the improvement of the geo-localization of optical satellite imagery or multi-sensor stereogrammetry.
@article{HUGHES2020166,title={A deep learning framework for matching of SAR and optical imagery},journal={ISPRS Journal of Photogrammetry and Remote Sensing},volume={169},pages={166-179},year={2020},issn={0924-2716},doi={https://doi.org/10.1016/j.isprsjprs.2020.09.012},url={https://www.sciencedirect.com/science/article/pii/S0924271620302598},author={Hughes, Lloyd Haydn and Marcos, Diego and Lobry, Sylvain and Tuia, Devis and Schmitt, Michael},keywords={Multi-modal image matching, Image registration, Feature detection, Deep learning, Synthetic Aperture Radar (SAR), Optical imagery},}
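The feature-space cross-correlation step can be sketched as follows, assuming both modalities have already been encoded into feature maps of the same channel dimension by their respective networks (the multi-scale aspect of the paper is omitted here):

```python
import torch
import torch.nn.functional as F

def correspondence_heatmap(feat_opt, feat_sar):
    """feat_opt: (1, C, H, W) features of the optical search region;
    feat_sar: (1, C, h, w) features of the SAR template (h <= H, w <= W).
    Returns a (H-h+1, W-w+1) heatmap of match scores."""
    # Normalize the template so scores are comparable across patches.
    template = F.normalize(feat_sar.flatten(1), dim=1).view_as(feat_sar)
    heatmap = F.conv2d(feat_opt, template)   # correlation via convolution
    return heatmap.squeeze(0).squeeze(0)

# The argmax of the heatmap gives the most likely offset of the SAR patch
# within the optical search region.
```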
RSVQA: Visual Question Answering for Remote Sensing Data
Sylvain Lobry, Diego Marcos, Jesse Murray, and
1 more author
IEEE Transactions on Geoscience and Remote Sensing, 2020
This article introduces the task of visual question answering for remote sensing data (RSVQA). Remote sensing images contain a wealth of information, which can be useful for a wide range of tasks, including land cover classification, object counting, or detection. However, most of the available methodologies are task-specific, thus inhibiting generic and easy access to the information contained in remote sensing data. As a consequence, accurate remote sensing product generation still requires expert knowledge. With RSVQA, we propose a system to extract information from remote sensing data that is accessible to every user: we use questions formulated in natural language and use them to interact with the images. With the system, images can be queried to obtain high-level information specific to the image content or relational dependencies between objects visible in the images. Using an automatic method introduced in this article, we built two data sets (using low- and high-resolution data) of image/question/answer triplets. The information required to build the questions and answers is queried from OpenStreetMap (OSM). The data sets can be used to train (when using supervised methods) and evaluate models to solve the RSVQA task. We report the results obtained by applying a model based on convolutional neural networks (CNNs) for the visual part and a recurrent neural network (RNN) for the natural language part of this task. The model is trained on the two data sets, yielding promising results in both cases.
@article{9088993,author={Lobry, Sylvain and Marcos, Diego and Murray, Jesse and Tuia, Devis},journal={IEEE Transactions on Geoscience and Remote Sensing},title={RSVQA: Visual Question Answering for Remote Sensing Data},year={2020},volume={58},number={12},pages={8555-8566},doi={10.1109/TGRS.2020.2988782},}
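A minimal sketch of the CNN+RNN architecture described above; the backbone choice, feature sizes and point-wise fusion are illustrative assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn
from torchvision import models

class RSVQAModel(nn.Module):
    """A CNN encodes the image, an RNN encodes the question, and the fused
    vector is classified over a fixed answer vocabulary."""
    def __init__(self, vocab_size, n_answers, emb_dim=300, hidden=1200):
        super().__init__()
        cnn = models.resnet152(weights=None)
        cnn.fc = nn.Linear(cnn.fc.in_features, hidden)   # image feature head
        self.cnn = cnn
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(hidden, 256), nn.ReLU(), nn.Linear(256, n_answers))

    def forward(self, image, question_tokens):
        v = self.cnn(image)                               # (B, hidden)
        _, (h, _) = self.rnn(self.embed(question_tokens)) # question encoding
        return self.classifier(v * h[-1])                 # point-wise fusion
```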
2019
Water Detection in SWOT HR Images Based on Multiple Markov Random Fields
Sylvain Lobry, Loïc Denis, Brent Williams, and
2 more authors
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019
One of the main objectives of the surface water and ocean topography (SWOT) mission, scheduled for launch in 2021, is to measure inland water levels using synthetic aperture radar (SAR) interferometry. A key step toward this objective is to precisely detect water areas. In this article, we present a method to detect water in SWOT images. Water is detected based on the relative brightness of the water and nonwater surfaces. Water brightness varies throughout the swath because of system parameters (i.e., the antenna pattern), as well as the phenomenology such as wind speed and surface roughness. To handle the effects of brightness variability, we propose to model the problem with one Markov random field (MRF) on the binary classification map, and two other MRFs to regularize the estimation of the class parameters (i.e., the land and water background power images). Our experiments show that the proposed method is more robust to the expected variations in SWOT images than traditional approaches.
@article{8897698,author={Lobry, Sylvain and Denis, Loïc and Williams, Brent and Fjørtoft, Roger and Tupin, Florence},journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},title={Water Detection in SWOT HR Images Based on Multiple Markov Random Fields},year={2019},volume={12},number={11},pages={4315-4326},url={https://ieeexplore.ieee.org/document/8897698},doi={10.1109/JSTARS.2019.2948788},project={SWOT}}
Half a Percent of Labels is Enough: Efficient Animal Detection in UAV Imagery Using Deep CNNs and Active Learning
Benjamin Kellenberger, Diego Marcos, Sylvain Lobry, and
1 more author
IEEE Transactions on Geoscience and Remote Sensing, 2019
We present an Active Learning (AL) strategy for reusing a deep Convolutional Neural Network (CNN)-based object detector on a new data set. This is of particular interest for wildlife conservation: given a set of images acquired with an Unmanned Aerial Vehicle (UAV) and manually labeled ground truth, our goal is to train an animal detector that can be reused for repeated acquisitions, e.g., in follow-up years. Domain shifts between data sets typically prevent such a direct model application. We thus propose to bridge this gap using AL and introduce a new criterion called Transfer Sampling (TS). TS uses Optimal Transport (OT) to find corresponding regions between the source and the target data sets in the space of CNN activations. The CNN scores in the source data set are used to rank the samples according to their likelihood of being animals, and this ranking is transferred to the target data set. Unlike conventional AL criteria that exploit model uncertainty, TS focuses on very confident samples, thus allowing quick retrieval of true positives in the target data set, where positives are typically extremely rare and difficult to find by visual inspection. We extend TS with a new window cropping strategy that further accelerates sample retrieval. Our experiments show that with both strategies combined, less than half a percent of oracle-provided labels are enough to find almost 80% of the animals in challenging sets of UAV images, beating all baselines by a margin.
@article{8807383,author={Kellenberger, Benjamin and Marcos, Diego and Lobry, Sylvain and Tuia, Devis},journal={IEEE Transactions on Geoscience and Remote Sensing},title={Half a Percent of Labels is Enough: Efficient Animal Detection in UAV Imagery Using Deep CNNs and Active Learning},year={2019},volume={57},number={12},pages={9524-9533},url={https://ieeexplore.ieee.org/document/8807383},doi={10.1109/TGRS.2019.2927393},}
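A sketch of the Transfer Sampling idea described above, assuming the POT library (`ot`) for optimal transport; the rule used here to carry scores through the coupling is a simplification of the paper's criterion:

```python
import numpy as np
import ot  # POT: Python Optimal Transport (assumed available)

def transfer_sampling_scores(src_feats, tgt_feats, src_scores):
    """src_feats: (n, D), tgt_feats: (m, D) CNN activations;
    src_scores: (n,) detector confidences on the source set.
    Transports source scores to target samples through an OT plan."""
    a = np.full(len(src_feats), 1.0 / len(src_feats))   # uniform weights
    b = np.full(len(tgt_feats), 1.0 / len(tgt_feats))
    M = ot.dist(src_feats, tgt_feats)                   # squared Euclidean cost
    plan = ot.emd(a, b, M)                              # (n, m) OT coupling
    # Each target sample inherits a score from the sources coupled to it.
    tgt_scores = plan.T @ src_scores / (plan.sum(axis=0) + 1e-12)
    return np.argsort(-tgt_scores)   # target samples to screen first
```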
2018
Correcting rural building annotations in OpenStreetMap using convolutional neural networks
John E. Vargas-Muñoz, Sylvain Lobry, Alexandre X. Falcão, and
1 more author
ISPRS Journal of Photogrammetry and Remote Sensing, 2018
Rural building mapping is paramount to support demographic studies and plan actions in response to crises that affect those areas. Rural building annotations exist in OpenStreetMap (OSM), but their quality and quantity are not sufficient for training models that can create accurate rural building maps. The problems with these annotations essentially fall into three categories: (i) most commonly, many annotations are geometrically misaligned with the updated imagery; (ii) some annotations do not correspond to buildings in the images (they are misannotations or the buildings have been destroyed); and (iii) some annotations are missing for buildings in the images (the buildings were never annotated or were built between subsequent image acquisitions). First, we propose a method based on Markov Random Field (MRF) to align the buildings with their annotations. The method maximizes the correlation between annotations and a building probability map while enforcing that nearby buildings have similar alignment vectors. Second, the annotations with no evidence in the building probability map are removed. Third, we present a method to detect non-annotated buildings with predefined shapes and add their annotation. The proposed methodology shows considerable improvement in accuracy of the OSM annotations for two regions of Tanzania and Zimbabwe, being more accurate than state-of-the-art baselines.
@article{VARGASMUNOZ2019283,title={Correcting rural building annotations in OpenStreetMap using convolutional neural networks},journal={ISPRS Journal of Photogrammetry and Remote Sensing},volume={147},pages={283-293},year={2018},issn={0924-2716},doi={https://doi.org/10.1016/j.isprsjprs.2018.11.010},url={https://www.sciencedirect.com/science/article/pii/S092427161830306X},author={Vargas-Muñoz, John E. and Lobry, Sylvain and Falcão, Alexandre X. and Tuia, Devis},keywords={Very high resolution mapping, Convolutional neural networks, Shape priors, OpenStreetMap, Volunteered geographical information, Update of vector maps},}
Fine-grained landuse characterization using ground-based pictures: a deep learning solution based on globally available data
Shivangi Srivastava, John E. Vargas Muñoz, Sylvain Lobry, and
1 more author
International Journal of Geographical Information Science, 2018
We study the problem of landuse characterization at the urban-object level using deep learning algorithms. Traditionally, this task is performed by surveys or manual photo interpretation, which are expensive and difficult to update regularly. We seek to characterize usages at the single object level and to differentiate classes such as educational institutes, hospitals and religious places by visual cues contained in side-view pictures from Google Street View (GSV). These pictures provide geo-referenced information not only about the material composition of the objects but also about their actual usage, which otherwise is difficult to capture using other classical sources of data such as aerial imagery. Since the GSV database is regularly updated, the landuse maps can consequently be updated at lower costs than those of authoritative surveys. Because every urban object is imaged from a number of viewpoints with street-level pictures, we propose a deep-learning based architecture that accepts an arbitrary number of GSV pictures to predict the fine-grained landuse classes at the object level. These classes are taken from OpenStreetMap. A quantitative evaluation of the area of Île-de-France, France, shows that our model outperforms other deep learning-based methods, making it a suitable alternative to manual landuse characterization.
@article{Shivangi_LU,author={Srivastava, Shivangi and Muñoz, John E. Vargas and Lobry, Sylvain and Tuia, Devis},title={Fine-grained landuse characterization using ground-based pictures: a deep learning solution based on globally available data},journal={International Journal of Geographical Information Science},volume={34},number={6},pages={1117-1136},year={2018},publisher={Taylor & Francis},doi={10.1080/13658816.2018.1542698},url={https://doi.org/10.1080/13658816.2018.1542698},eprint={https://doi.org/10.1080/13658816.2018.1542698},}
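One way to accept an arbitrary number of views per urban object, as described above, is a shared encoder with an order-invariant pooling; the ResNet-18 backbone and max pooling below are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn
from torchvision import models

class MultiViewLanduseNet(nn.Module):
    """Shared CNN encodes each street-level picture; per-view features are
    pooled before classification, so the view count can vary."""
    def __init__(self, n_classes, feat_dim=512):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()                 # keep 512-d features
        self.backbone = backbone
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, views):
        # views: list of (3, H, W) tensors, one per available GSV picture
        feats = torch.stack([self.backbone(v.unsqueeze(0)).squeeze(0)
                             for v in views])
        return self.classifier(feats.max(dim=0).values)  # order-invariant pool
```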
2016
Multitemporal SAR Image Decomposition into Strong Scatterers, Background, and Speckle
Sylvain Lobry, Loïc Denis, and Florence Tupin
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2016
The speckle phenomenon in synthetic aperture radar (SAR) images makes their visual and automatic interpretation a difficult task. To reduce strong fluctuations due to speckle, total variation (TV) regularization has been proposed by several authors to smooth out noise without blurring edges. A specificity of SAR images is the presence of strong scatterers having a radiometry several orders of magnitude larger than their surrounding region. These scatterers, especially present in urban areas, limit the effectiveness of TV regularization as they break the assumption of an image made of regions of constant radiometry. To overcome this limitation, we propose in this paper an image decomposition approach. There exist numerous methods to decompose an image into several components, notably to separate textural and geometrical information. These decomposition models are generally recast as energy minimization problems involving a different penalty term for each of the components. In this framework, we propose an energy suitable for the decomposition of SAR images into speckle, a smooth background, and strong scatterers, and discuss its minimization using max-flow/min-cut algorithms. We make the connection between the minimization problem considered, involving the L0 pseudonorm, and the generalized likelihood ratio test used in detection theory. The proposed decomposition jointly performs the detection of strong scatterers and the estimation of the background radiometry. Given the increasing availability of time series of SAR images, we consider the decomposition of a whole time series. New change detection methods can be based on the temporal analysis of the components obtained from our decomposition.
National Journals
2017
Décomposition de séries temporelles d’images SAR pour la détection de changement
Sylvain Lobry, Loïc Denis, Weying Zhao, and
1 more author
Traitement du Signal, 2017
@article{TS2017,title={Décomposition de séries temporelles d'images SAR pour la détection de changement},author={Lobry, Sylvain and Denis, Loïc and Zhao, Weying and Tupin, Florence},journal={Traitement du Signal},year={2017}}
International Conferences
2025
Visual Question Answering on Multiple Remote Sensing Image Modalities
Hichem Boussaid, Lucrezia Tosato, Flora Weissgerber, and
3 more authors
In EarthVision at IEEE/CVF Conference on Computer Vision and Pattern Recognition CVPR, 2025
The extraction of visual features is an essential step in Visual Question Answering (VQA). Building a good visual representation of the analyzed scene is indeed one of the essential keys for the system to be able to correctly understand the latter in order to answer complex questions. In many fields such as remote sensing, the visual feature extraction step could benefit significantly from leveraging different image modalities carrying complementary (spectral, spatial and contextual) information. In this work, we propose to add multiple image modalities to VQA in the particular context of remote sensing, leading to a novel task for the computer vision community. To this end, we introduce a new VQA dataset, named TAMMI (Text and Multi-Modal Imagery), with diverse questions on scenes described by three different modalities (very high resolution RGB, multi-spectral imaging data and synthetic aperture radar). Thanks to an automated pipeline, this dataset can be easily extended according to experimental needs. We also propose the MM-RSVQA (Multi-modal Multi-resolution Remote Sensing Visual Question Answering) model, based on VisualBERT, a vision-language transformer, to effectively combine the multiple image modalities and text through a trainable fusion process. A preliminary experimental study shows promising results of our methodology on this challenging dataset, with an accuracy of 65.56% on the targeted VQA task. This pioneering work paves the way for the community towards a new multi-modal multi-resolution VQA task that can be applied in other imaging domains (such as medical imaging) where multi-modality can enrich the visual representation of a scene. A subset of the dataset and code is available in supplementary material.
@inproceedings{EV2025,title={Visual Question Answering on Multiple Remote Sensing Image Modalities},author={Boussaid, Hichem and Tosato, Lucrezia and Weissgerber, Flora and Kurtz, Camille and Wendling, Laurent and Lobry, Sylvain},booktitle={EarthVision at IEEE/CVF Conference on Computer Vision and Pattern Recognition CVPR},year={2025},}
PAN-RSVQA: Vision Foundation Models as Pseudo-ANnotators for Remote Sensing Visual Question Answering
Christel Chappuis, Gencer Sümbül, Syrielle Montariol, and
2 more authors
In Workshop MORSE at CVPR, 2025
While the quantity of Earth observation (EO) images is constantly increasing, the benefits that can be derived from these images are still limited by the technical expertise required to run information extraction pipelines. Using natural language to break this barrier, Remote Sensing Visual Question Answering (RSVQA) aims to make EO images usable by a wider, general public. Traditional RSVQA methods utilize a visual encoder to extract generic features from images, which are then fused with the features of the questions entered by users. Given their multi-task nature, Vision foundation models (VFMs) make it possible to go beyond such generic visual features, and can be seen as pseudo-annotators extracting diverse sets of features from a collection of inter-related tasks (objects detected, segmentation maps, scene descriptions, etc.). In this work, we propose PAN-RSVQA, a new method combining a VFM and its pseudo-annotations with RSVQA by leveraging a transformer-based multi-modal encoder. These pseudo-annotations bring diverse, naturally interpretable visual cues, as they are aligned with how humans reason about images: therefore, PAN-RSVQA not only exploits large-scale training of VFMs but also enables accurate and interpretable RSVQA. Experiments on two datasets show results on par with the state-of-the-art while enabling enhanced interpretation of the model predictions, which we analyze via sample visual perturbations and ablations of the role of each pseudo-annotator. In addition, PAN-RSVQA is modular and easily extendable to new pseudo-annotators from other VFMs.
@inproceedings{MORSE2025,title={PAN-RSVQA: Vision Foundation Models as Pseudo-ANnotators for Remote Sensing Visual Question Answering},author={Chappuis, Christel and Sümbül, Gencer and Montariol, Syrielle and Lobry, Sylvain and Tuia, Devis},booktitle={Workshop MORSE at CVPR},year={2025},}
LLM-Driven Data Augmentation for Visual Question Answering
Hichem Boussaid, Nayoung Kwon, Camille Kurtz, and
2 more authors
In Joint Urban Remote Sensing Event (JURSE), 2025
Remote Sensing Visual Question Answering (RSVQA) is a task aiming at automatically answering questions related to overhead imagery. Many studies have been conducted in recent years, focusing on the methods and the data. However, a recurrent problem is the lack of generalization abilities and robustness to questions with similar semantics but different wording. This work focuses on the data part, specifically the questions. Our objective is to make RSVQA models more robust to various changes in questions, more generalizable (e.g. to unseen phrasing, synonyms) and less susceptible to bias in the data. To this end, we propose to leverage the abilities of Large Language Models (LLMs) in the field of natural language processing to enrich a RSVQA dataset by generating new questions with the same meaning and semantics. To showcase the effectiveness of this process, we compare a baseline relying on back translation with the proposed LLM-based approach on an urban dataset (RSVQA-HR). Our experimental study, with quantitative performance evaluation, highlights that models trained with the proposed data augmentation scheme are indeed more robust to unseen questions.
@inproceedings{JURSE2025,title={LLM-Driven Data Augmentation for Visual Question Answering},author={Boussaid, Hichem and Kwon, Nayoung and Kurtz, Camille and Wendling, Laurent and Lobry, Sylvain},booktitle={Joint Urban Remote Sensing Event (JURSE)},year={2025},}
ProMM-RS: Exploring Probabilistic learning for Multi-Modal Remote Sensing Image Representations
Nicolas Houdré, Diego Marcos, Dino Ienco, and
3 more authors
In Workshop GeoCV at WACV, 2025
Remote sensing imagery offers diverse modalities, such as synthetic aperture radar and multispectral data, which can bring rich, complementary and valuable information about observed scenes. This information is of paramount importance for downstream applications (e.g. land cover mapping, natural resources monitoring, human settlement characterization) that may benefit from such complementarity. Remote sensing imagery often suffers from a lack of labeled data which can hamper the learning of good representations via state-of-the-art supervised methods. Self-supervised learning has thus emerged as a promising paradigm for remote sensing feature extraction, enabling the extraction of meaningful features without reliance on labeled data. While existing multi-modal contrastive models effectively capture shared information between modalities, they often struggle to account for the inherent heterogeneity of multi-modal remote sensing data. This limitation prevents them from fully leveraging the complementarity of multi-modal remote sensing data. Probabilistic representation learning has emerged as a powerful approach to capture the inherent uncertainty and diversity in multi-modal relationships. In this paper we present ProMM-RS, a novel multi-modal self-supervised training framework incorporating a joint probabilistic embedding space to explicitly model the uncertainty of representations between different inputs and modalities. We evaluate our learned representations with a scene classification downstream task from Sentinel optical and radar images, effectively showing the potential of probabilistic embeddings as a way to measure the relevancy of each modality representation, especially under an obstructed dataset.
@inproceedings{WACV2025,title={ProMM-RS: Exploring Probabilistic learning for Multi-Modal Remote Sensing Image Representations},author={Houdré, Nicolas and Marcos, Diego and Ienco, Dino and Wendling, Laurent and Kurtz, Camille and Lobry, Sylvain},booktitle={Workshop GeoCV at WACV},year={2025},}
Evaluating Transformers Learning by Representing Attention Weights as a Graph
Rebecca Leygonie, Sylvain Lobry, and Laurent Wendling
In The 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP), 2025
@inproceedings{VISIGRAPP2025,title={Evaluating Transformers Learning by Representing Attention Weights as a Graph},author={Leygonie, Rebecca and Lobry, Sylvain and Wendling, Laurent},booktitle={The 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP)},year={2025},}
2024
Segmentation-guided Attention for Visual Question Answering from Remote Sensing Images
Lucrezia Tosato*, Hichem Boussaid*, Flora Weissgerber, and
3 more authors
In IEEE International Geoscience and Remote Sensing Symposium IGARSS, 2024
Visual Question Answering for Remote Sensing (RSVQA) is a task that aims at answering natural language questions about the content of a remote sensing image. Visual feature extraction is therefore an essential step in a VQA pipeline. By incorporating attention mechanisms into this process, models gain the ability to focus selectively on salient regions of the image, prioritizing the most relevant visual information for a given question. In this work, we propose to embed an attention mechanism guided by segmentation into a RSVQA pipeline. We argue that segmentation plays a crucial role in guiding attention by providing a contextual understanding of the visual information, underlining specific objects or areas of interest. To evaluate this methodology, we provide a new VQA dataset that exploits very high-resolution RGB orthophotos annotated with 16 segmentation classes and question/answer pairs. Our study shows promising results of our new methodology, gaining almost 10% in overall accuracy compared to a classical method on the proposed dataset.
@inproceedings{Tosato2024IGARSS,title={Segmentation-guided Attention for Visual Question Answering from Remote Sensing Images},author={Tosato, Lucrezia* and Boussaid, Hichem* and Weissgerber, Flora and Kurtz, Camille and Wendling, Laurent and Lobry, Sylvain},booktitle={IEEE International Geoscience and Remote Sensing Symposium IGARSS},year={2024},}
Can SAR improve RSVQA performance?
Lucrezia Tosato, Sylvain Lobry, Flora Weissgerber, and
1 more author
In EUSAR 2024; 15th European Conference on Synthetic Aperture Radar, 2024
Remote sensing visual question answering (RSVQA) has been the subject of several studies in recent years, leading to an increase in new methods. RSVQA automatically extracts information from satellite images (so far, only optical ones) and a question, in order to automatically search for the answer in the image and provide it in a textual form. In our research, we study whether Synthetic Aperture Radar (SAR) images can be beneficial to this field. We divide our study into three phases which include classification methods and VQA. In the first one, we explore the classification results of SAR alone and investigate the best method to extract information from SAR data. Then, we study the combination of SAR and optical data. In the last phase, we investigate how SAR images and a combination of different modalities behave in RSVQA compared to a method only using optical images. We conclude that adding the SAR modality leads to improved performances, although further research on using SAR data to automatically answer questions is needed, as well as more balanced datasets.
@inproceedings{Tosato2024EUSAR,title={Can SAR improve RSVQA performance?},author={Tosato, Lucrezia and Lobry, Sylvain and Weissgerber, Flora and Wendling, Laurent},booktitle={EUSAR 2024; 15th European Conference on Synthetic Aperture Radar},year={2024},}
2023
An a contrario approach for plant disease detection
Rebecca Leygonie, Sylvain Lobry, and Laurent Wendling
In Workshop on Machine Vision for Earth Observation at BMVC, 2023
Detecting plant diseases or abnormalities is not a trivial task, as they can be caused by multiple factors such as environmental conditions, genetics, pathogens, etc. Because there is a need to help farmers make decisions to maximize crop yields, many studies have emerged in recent years using deep learning on agricultural images to detect plant diseases, which can be considered as an anomaly detection task. However, these approaches are often limited by the availability of annotated data or prior knowledge of the existence of an anomaly. We propose an approach that can detect part of the anomalies without prior knowledge of their existence, thus overcoming some of these limitations. To this end, we train a model on an auxiliary prediction task (plants’ age regression). We then use an explainability model to retrieve heatmaps whose distributions are studied. For each new observation, we propose to study how closely its heatmap follows the desired distribution and we derive a score indicating potential anomalies. Experiments on the GrowliFlower dataset indicate how our proposed method can help potential end-users to automatically find anomalies.
@inproceedings{Leygonie2023MVEO,title={An a contrario approach for plant disease detection},author={Leygonie, Rebecca and Lobry, Sylvain and Wendling, Laurent},booktitle={Workshop on Machine Vision for Earth Observation at BMVC},year={2023},}
Multi-task prompt-RSVQA to explicitly count objects on aerial images
Christel Chappuis, Charlotte Sertic, Nicolas Santacroce, and
4 more authors
In Workshop on Machine Vision for Earth Observation at BMVC, 2023
Introduced to enable a wider use of Earth Observation images using natural language, Remote Sensing Visual Question Answering (RSVQA) remains a challenging task, in particular for questions related to counting. To address this specific challenge, we propose a modular Multi-task prompt-RSVQA model based on object detection and question answering modules. By creating a semantic bottleneck describing the image and providing a visual answer, our model allows users to assess the visual grounding of the answer and better interpret the prediction. A set of ablation studies are designed to consider the contributions of different modules and evaluation metrics are discussed for a finer-grained assessment. Experiments demonstrate competitive results against literature baselines and a zero-shot VQA model. In particular, our proposed model predicts answers for numerical Counting questions that are consistently closer in distance to the ground truth.
@inproceedings{Chappuis2023MVEO,title={Multi-task prompt-RSVQA to explicitly count objects on aerial images},author={Chappuis, Christel and Sertic, Charlotte and Santacroce, Nicolas and Castillo Navarro, Javiera and Lobry, Sylvain and Le Saux, Bertrand and Tuia, Devis},booktitle={Workshop on Machine Vision for Earth Observation at BMVC},year={2023},}
Transforming multidimensional data into images to overcome the curse of dimensionality
Rebecca Leygonie, Sylvain Lobry, Guillaume Vimont, and
1 more author
In IEEE International Conference on Image Processing ICIP, 2023
When dealing with high-dimensional multivariate time series classification problems, a well-known difficulty is the curse of dimensionality. In this article, we propose an original approach for transposing multidimensional data into images to tackle the classification task. We propose a lightweight hybrid model that takes this transposed data as input. This model contains convolutional layers as a feature extractor followed by a recurrent neural network. We apply our method to a large dataset consisting of individual patient medical records. We show that our approach allows us to significantly reduce the size of a network and increase its performance by opting for a transformation of the input data.
@inproceedings{Leygonie2023ICIP,title={Transforming multidimensional data into images to overcome the curse of dimensionality},author={Leygonie, Rebecca and Lobry, Sylvain and Vimont, Guillaume and Wendling, Laurent},booktitle={IEEE International Conference on Image Processing ICIP},year={2023},}
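A minimal sketch of the hybrid convolutional/recurrent idea described above; the way the time series is laid out as an image and all layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ImageHybridClassifier(nn.Module):
    """A multivariate time series (T steps, D variables) is viewed as a
    1-channel image, passed through convolutional feature extraction, then
    through a recurrent layer over the time axis."""
    def __init__(self, n_vars, n_classes, conv_ch=16, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, conv_ch, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)))                  # pool over the variable axis
        self.rnn = nn.GRU(conv_ch * (n_vars // 2), hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                          # x: (B, T, D)
        f = self.conv(x.unsqueeze(1))              # (B, C, T, D//2)
        f = f.permute(0, 2, 1, 3).flatten(2)       # (B, T, C * D//2)
        _, h = self.rnn(f)
        return self.fc(h[-1])
```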
Automatic simulation of SAR images: comparing a deep-learning based method to a hybrid method
Nathan Letheule, Flora Weissgerber, Sylvain Lobry, and
1 more author
In IEEE International Geoscience and Remote Sensing Symposium IGARSS, 2023
@inproceedings{Letheule2023IGARSS,title={Automatic simulation of SAR images: comparing a deep-learning based method to a hybrid method},author={Letheule, Nathan and Weissgerber, Flora and Lobry, Sylvain and Colin, Elise},booktitle={IEEE International Geoscience and Remote Sensing Symposium IGARSS},year={2023},}
Linking population data to high resolution maps: a case study in Burkina Faso
Basile Rousse, Sylvain Lobry, Géraldine Duthé, and
2 more authors
In Machine Learning for Remote Sensing at ICLR (oral presentation), 2023
Recent research in demography focuses on linking population data to environmental indicators. Satellite imagery can support such projects by providing data at a large scale and a high frequency. Moreover, population surveys often provide geolocations of households, yet sometimes with an offset, to guarantee data confidentiality. In such cases, this uncertainty must be properly managed to accurately link environmental indicators such as land cover/land use maps or spectral indices to population data. In this paper, we introduce a method based on the random sampling of possible household geolocations around the provided coordinates. Then, we link a land cover map generated using semi-supervised deep learning and a Malaria Indicator Survey in Burkina Faso. After linking households to their close environment, we distinguish several types of environment conducive to high malaria rates, beyond the urban/rural dichotomy.
@inproceedings{Rousse2023Linking,title={Linking population data to high resolution maps: a case study in Burkina Faso},author={Rousse, Basile and Lobry, Sylvain and Duthé, Géraldine and Golaz, Valérie and Wendling, Laurent},booktitle={Machine Learning for Remote Sensing at ICLR (oral presentation)},year={2023},project={DEMO}}
Seasonal semi-supervised domain adaptation for linking population studies and Local Climate Zones
Basile Rousse, Sylvain Lobry, Géraldine Duthé, and
2 more authors
In Joint Urban Remote Sensing Event (JURSE), 2023
Environment and demographic dynamics are strongly linked. However, relevant data to study this interaction may be scarce, especially in sub-Saharan Africa where it is not always possible to perform such studies with a high temporal frequency. Satellite imagery, when linked to demographic data, can be a significant asset to estimate missing data as it covers every country with both high spatial and temporal resolution. We aim to take advantage of satellite data to characterize the environment in inter-tropical areas. This environment is regulated by the alternation of two seasons, which is essential to take into account. We introduce a semi-supervised domain adaptation strategy for neural networks based on seasonal changes. This strategy can be used to produce land cover maps in regions of the world where limited labeled datasets are available. We apply this method to produce environmental indicators and link them to malaria rates from the Malaria Indicator Survey of Burkina Faso. We show that malaria rates are correlated not only to urbanisation but also to the environmental characterisation of studied areas.
@inproceedings{Rousse2023Seasonal,title={Seasonal semi-supervised domain adaptation for linking population studies and Local Climate Zones},author={Rousse, Basile and Lobry, Sylvain and Duthé, Géraldine and Golaz, Valérie and Wendling, Laurent},booktitle={Joint Urban Remote Sensing Event (JURSE)},year={2023},project={DEMO}}
2022
Prompt–RSVQA: Prompting visual context to a language model for Remote Sensing Visual Question Answering
Christel Chappuis, Valerie Zermatten, Sylvain Lobry, and
2 more authors
In EarthVision at IEEE/CVF Conference on Computer Vision and Pattern Recognition CVPR, 2022
Remote sensing visual question answering (RSVQA) was recently proposed with the aim of interfacing natural language and vision to ease access to the information contained in Earth Observation data for a wide audience, through simple questions formulated in natural language. The traditional vision/language interface is an embedding obtained by fusing features from two deep models, one processing the image and another the question. Despite the success of early VQA models, it remains difficult to control the adequacy of the visual information extracted by its deep model, which should act as a context regularizing the work of the language model. We propose to extract this context information with a visual model, convert it to text and inject it, i.e. prompt it, into a language model. The language model is therefore responsible for processing the question with the visual context, and for extracting features which are useful to find the answer. We study the effect of prompting with respect to a black-box visual extractor and discuss the importance of training a visual model producing accurate context.
@inproceedings{Chappuis2022Prompting,title={Prompt–RSVQA: Prompting visual context to a language model for Remote Sensing Visual Question Answering},author={Chappuis, Christel and Zermatten, Valerie and Lobry, Sylvain and Le Saux, Bertrand and Tuia, Devis},booktitle={EarthVision at IEEE/CVF Conference on Computer Vision and Pattern Recognition CVPR},year={2022},}
Embedding Spatial Relations in Visual Question Answering for Remote Sensing
Maxime Faure, Sylvain Lobry, Camille Kurtz, and
1 more author
In 26TH International Conference on Pattern Recognition ICPR, 2022
Remote sensing images carry a wealth of information that is not easily accessible to end-users as it requires strong technical skills and knowledge. Visual Question Answering (VQA), a task that aims at answering an open-ended question in natural language from an image, can provide an easier access to this information. Considering the geographical information contained in remote sensing images, questions often embed an important spatial aspect, for instance regarding the relative position of two objects. Our objective is to better model the spatial relations in the construction of a ground-truth database of image/question/answer triplets and to assess the capacity a VQA model has to answer these questions. In this article, we propose to use histograms of forces to model the directional spatial relations between geo-localized objects. This allows a finer modeling of ambiguous relationships between objects and to provide different levels of assessment of a relation (e.g. object A is slightly/strictly to the west of object B). Using this new dataset, we evaluate the performances of a classical VQA model and propose a curriculum learning strategy to better take into account the varying difficulty of questions embedding spatial relations. With this approach, we show an improvement in the performances of our model, highlighting the interest of embedding spatial relations in VQA for remote sensing applications.
@inproceedings{Faure2022Embedding,title={Embedding Spatial Relations in Visual Question Answering for Remote Sensing},author={Faure, Maxime and Lobry, Sylvain and Kurtz, Camille and Wendling, Laurent},booktitle={26TH International Conference on Pattern Recognition ICPR},year={2022},}
Language transformers for remote sensing visual question answering
Christel Chappuis, Vincent Mendez, Eliot Walt, and
3 more authors
In IEEE International Geoscience and Remote Sensing Symposium IGARSS, 2022
Remote sensing visual question answering (RSVQA) opens new avenues to promote the use of satellite data, by interfacing satellite image analysis with natural language processing. Capitalizing on the remarkable advances in natural language processing and computer vision, RSVQA aims at finding an answer to a question formulated by a human user about a remote sensing image. This is achieved by extracting representations from images and questions, and then fusing them in a joint representation. Focusing on the language part of the architecture, this study compares and evaluates the adequacy to the RSVQA task of two language models, a traditional recurrent neural network (Skip-thoughts) and a recent attention-based Transformer (BERT). We study whether large transformer models are beneficial to the task and whether fine-tuning is needed for these models to perform at their best. Our findings show that the models benefit from fine-tuning the language model and that RSVQA with BERT is slightly but consistently better when properly fine-tuned.
@inproceedings{chappuis2022Language,title={Language transformers for remote sensing visual question answering},author={Chappuis, Christel and Mendez, Vincent and Walt, Eliot and Lobry, Sylvain and Le Saux, Bertrand and Tuia, Devis},booktitle={IEEE International Geoscience and Remote Sensing Symposium IGARSS},year={2022},}
Matching environmental data produced from remote sensing images to demographic data in Sub-Saharan Africa
Lys Thay*, Basile Rousse*, Sylvain Lobry, and
3 more authors
In ESA Living Planet Symposium, 2022
@inproceedings{Thay2022LPS,title={Matching environmental data produced from remote sensing images to demographic data in Sub-Saharan Africa},author={Thay*, Lys and Rousse*, Basile and Lobry, Sylvain and Duthé, Géraldine and Wendling, Laurent and Golaz, Valérie},booktitle={ESA Living Planet Symposium},year={2022},project={DEMO}}
2021
How to find a good image-text embedding for remote sensing visual question answering?
Christel Chappuis, Sylvain Lobry, Benjamin Kellenberger, and
2 more authors
In MACLEAN Workshop at ECML/PKDD, 2021
Visual question answering (VQA) has recently been introduced to remote sensing to make information extraction from overhead imagery more accessible to everyone. VQA considers a question (in natural language, therefore easy to formulate) about an image and aims at providing an answer through a model based on computer vision and natural language processing methods. As such, a VQA model needs to jointly consider visual and textual features, which is frequently done through a fusion step. In this work, we study three different fusion methodologies in the context of VQA for remote sensing and analyse the gains in accuracy with respect to the model complexity. Our findings indicate that more complex fusion mechanisms yield an improved performance, yet that seeking a trade-off between model complexity and performance is worthwhile in practice.
@inproceedings{chappuis2021find,title={How to find a good image-text embedding for remote sensing visual question answering?},author={Chappuis, Christel and Lobry, Sylvain and Kellenberger, Benjamin and Le Saux, Bertrand and Tuia, Devis},booktitle={MACLEAN Workshop at ECML/PKDD 2021},year={2021},}
RSVQA Meets Bigearthnet: A New, Large-Scale, Visual Question Answering Dataset for Remote Sensing
Sylvain Lobry, Begüm Demir, and Devis Tuia
In 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, 2021
Visual Question Answering is a new task that can facilitate the extraction of information from images through textual queries: it aims at answering an open-ended question formulated in natural language about a given image. In this work, we introduce a new dataset to tackle the task of visual question answering on remote sensing images: this large-scale, open access dataset extracts image/question/answer triplets from the BigEarthNet dataset. This new dataset contains close to 15 million samples and is openly available. We present the dataset construction procedure, its characteristics and first results using a deep-learning based methodology. These first results show that the task of visual question answering is challenging and opens new interesting research avenues at the interface of remote sensing and natural language processing. The dataset and the code to create and process it are open and freely available on https://rsvqa.sylvainlobry.com/
@inproceedings{lobry2021rsvqa,title={RSVQA Meets Bigearthnet: A New, Large-Scale, Visual Question Answering Dataset for Remote Sensing},author={Lobry, Sylvain and Demir, Beg{\"u}m and Tuia, Devis},booktitle={2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS},pages={1218--1221},year={2021},organization={IEEE},}
2020
Better Generic Objects Counting When Asking Questions to Images: A Multitask Approach for Remote Sensing Visual Question Answering
Sylvain Lobry, Diego Marcos, Benjamin Kellenberger, and
1 more author
In ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2020
@inproceedings{lobry2020better,title={Better Generic Objects Counting When Asking Questions to Images: A Multitask Approach for Remote Sensing Visual Question Answering},author={Lobry, Sylvain and Marcos, Diego and Kellenberger, Benjamin and Tuia, Devis},booktitle={ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences},pages={1021--1027},year={2020},abstract={Visual Question Answering for Remote Sensing (RSVQA) aims at extracting information from remote sensing images through queries formulated in natural language. Since the answer to the query is also provided in natural language, the system is accessible to non-experts, and therefore dramatically increases the value of remote sensing images as a source of information, for example for journalism purposes or interactive land planning. Ideally, an RSVQA system should be able to provide an answer to questions that vary both in terms of topic (presence, localization, counting) and image content. However, aiming at such flexibility generates problems related to the variability of the possible answers. A striking example is counting, where the number of objects present in a remote sensing image can vary by multiple orders of magnitude, depending on both the scene and type of objects. This represents a challenge for traditional Visual Question Answering (VQA) methods, which either become intractable or result in an accuracy loss, as the number of possible answers has to be limited. To this end, we introduce a new model that jointly solves a classification problem (which is the most common approach in VQA) and a regression problem (to answer numerical questions more precisely). An evaluation of this method on the RSVQA dataset shows that this finer numerical output comes at the cost of a small loss of performance on non-numerical questions.},}
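A sketch of the joint classification/regression idea described in the abstract embedded above; the loss weighting and the L1 regression term are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MultiTaskVQAHead(nn.Module):
    """One branch classifies over the answer vocabulary; the other regresses
    a number, used for counting questions."""
    def __init__(self, fused_dim, n_answers):
        super().__init__()
        self.cls_head = nn.Linear(fused_dim, n_answers)
        self.reg_head = nn.Linear(fused_dim, 1)

    def forward(self, fused):
        return self.cls_head(fused), self.reg_head(fused).squeeze(1)

def multitask_loss(logits, count_pred, answer, count, is_numeric, alpha=0.5):
    """Cross-entropy on all questions; L1 on the regression output, applied
    only where is_numeric (a 0/1 mask over the batch) flags a counting
    question."""
    ce = nn.functional.cross_entropy(logits, answer)
    l1 = (is_numeric * (count_pred - count).abs()).sum() / (is_numeric.sum() + 1e-6)
    return ce + alpha * l1
```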
Learning multi-label aerial image classification under label noise: A regularization approach using word embeddings
Yuansheng Hua, Sylvain Lobry, Lichao Mou, and
2 more authors
In IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium, 2020
Training deep neural networks requires well-annotated datasets. However, real world datasets are often noisy, especially in a multi-label scenario, i.e. where each data point can be attributed to more than one class. To this end, we propose a regularization method to learn multi-label classification networks from noisy data. This regularization is based on the assumption that semantically close classes are more likely to appear together in a given image. Hereby, we encode label correlations with prior knowledge and regularize noisy network predictions using label correlations. To evaluate its effectiveness, we perform experiments on a multi-label aerial image dataset contaminated with controlled levels of label noise. Results indicate that networks trained using the proposed method outperform those directly learned from noisy labels and that the benefits increase proportionally to the amount of noise present.
@inproceedings{hua2020learning,title={Learning multi-label aerial image classification under label noise: A regularization approach using word embeddings},author={Hua, Yuansheng and Lobry, Sylvain and Mou, Lichao and Tuia, Devis and Zhu, Xiao Xiang},booktitle={IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium},pages={525--528},year={2020},organization={IEEE},}
Interpretable Scenicness from Sentinel-2 Imagery
Alex Levering, Diego Marcos, Sylvain Lobry, and
1 more author
In IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium, 2020
Landscape aesthetics, or scenicness, has been identified as an important ecosystem service that contributes to human health and well-being. Currently, there are no methods to inventory landscape scenicness on a large scale. In this paper we study how to upscale local assessments of scenicness provided by human observers, and we do so by using satellite images. Moreover, we develop an explicitly interpretable CNN model that allows assessing the connections between landscape scenicness and the presence of specific landcover types. To generate the landscape scenicness ground truth, we use the ScenicOrNot crowdsourcing database, which provides geo-referenced, human-based scenicness estimates for ground based photos in Great Britain. Our results show that it is feasible to predict landscape scenicness based on satellite imagery. The interpretable model performs comparably to an unconstrained model, suggesting that it is possible to learn a semantic bottleneck that represents well the present landcover classes and still contains enough information to accurately predict the location’s scenicness.
@inproceedings{levering2020interpretable,title={Interpretable Scenicness from Sentinel-2 Imagery},author={Levering, Alex and Marcos, Diego and Lobry, Sylvain and Tuia, Devis},booktitle={IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium},pages={3983--3986},year={2020},organization={IEEE},}
2019
Semantically Interpretable Activation Maps: what-where-how explanations within CNNs
Diego Marcos, Sylvain Lobry, and Devis Tuia
In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019
A main issue preventing the use of Convolutional Neural Networks (CNN) in end-user applications is the low level of transparency in the decision process. Previous work on CNN interpretability has mostly focused either on localizing the regions of the image that contribute to the result or on building an external model that generates plausible explanations. However, the former does not provide any semantic information and the latter does not guarantee the faithfulness of the explanation. We propose an intermediate representation composed of multiple Semantically Interpretable Activation Maps (SIAM) indicating the presence of predefined attributes at different locations of the image. These attribute maps are then linearly combined to produce the final output. This gives the user insight into what the model has seen and where, with a final output directly linked to this information in a comprehensive and interpretable way. We test the method on the task of landscape scenicness (aesthetic value) estimation, using an intermediate representation of 33 attributes from the SUN Attributes database. The results confirm that SIAM makes it possible to understand which attributes in the image contribute to the final score and where they are located. Since it is based on learning from multiple tasks and datasets, SIAM improves the explainability of the prediction without additional annotation efforts or computational overhead at inference time, while keeping good performance on both the final and intermediate tasks.
@inproceedings{marcos2019semantically,title={Semantically Interpretable Activation Maps: what-where-how explanations within CNNs},author={Marcos, Diego and Lobry, Sylvain and Tuia, Devis},booktitle={2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)},pages={4207--4215},year={2019},organization={IEEE},}
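The what-where-how structure lends itself to a compact sketch: a 1x1 convolution produces the attribute maps, whose pooled activations are linearly combined into the final score. The sketch below is a simplified illustration under assumed names and dimensions (`SIAMHead`, 512 backbone channels), not the released implementation.

```python
import torch.nn as nn

class SIAMHead(nn.Module):
    """Produce K attribute activation maps with a 1x1 convolution (the
    "what" and "where"), then linearly combine their pooled activations
    into the final score (the "how"), so each attribute's contribution
    can be read directly from the weights of `combine`."""

    def __init__(self, in_channels=512, n_attributes=33):
        super().__init__()
        self.to_maps = nn.Conv2d(in_channels, n_attributes, kernel_size=1)
        self.combine = nn.Linear(n_attributes, 1)

    def forward(self, features):
        maps = self.to_maps(features)             # (B, K, H, W) attribute maps
        pooled = maps.mean(dim=(2, 3))            # per-attribute presence score
        score = self.combine(pooled).squeeze(-1)  # final scenicness estimate
        return score, maps
```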
Visual question answering from remote sensing images
Sylvain Lobry, Jesse Murray, Diego Marcos, and
1 more author
In IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, 2019
Remote sensing images carry large amounts of information beyond land cover or land use. Images contain visual and structural information that can be queried to obtain high-level information about specific image content or relational dependencies between the objects sensed. This paper explores the possibility of using questions formulated in natural language as a generic and accessible way to extract this type of information from remote sensing images, i.e. visual question answering. We introduce an automatic way to create a dataset using OpenStreetMap data and present some preliminary results. Our proposed approach is based on deep learning and is trained using our new dataset.
@inproceedings{lobry2019visual,title={Visual question answering from remote sensing images},author={Lobry, Sylvain and Murray, Jesse and Marcos, Diego and Tuia, Devis},booktitle={IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium},pages={4951--4954},year={2019},organization={IEEE},}
Deep learning models to count buildings in high-resolution overhead images
Sylvain Lobry, and Devis Tuia
In 2019 Joint Urban Remote Sensing Event (JURSE), 2019
This paper addresses the problem of counting buildings in very high-resolution overhead true color imagery. We study and discuss the relevance of deep-learning based methods to this task. Two architectures and two loss functions are proposed and compared. We show that a model enforcing equivariance to rotations is beneficial for the task of counting in remotely sensed images. We also highlight the importance of robustness to outliers of the loss function when considering remote sensing applications.
@inproceedings{lobry2019deep,title={Deep learning models to count buildings in high-resolution overhead images},author={Lobry, Sylvain and Tuia, Devis},booktitle={2019 Joint Urban Remote Sensing Event (JURSE)},pages={1--4},year={2019},organization={IEEE},}
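On the robustness point, a standard choice is a Huber-type loss; the sketch below is a generic example of such an outlier-robust counting loss, not necessarily the exact function compared in the paper.

```python
import torch

def huber_count_loss(pred, target, delta=1.0):
    """Outlier-robust regression loss for counts: quadratic for residuals
    below `delta`, linear above, so a few images with extreme building
    counts cannot dominate the gradient the way a plain L2 loss would."""
    residual = torch.abs(pred - target)
    quadratic = torch.clamp(residual, max=delta)
    linear = residual - quadratic
    return (0.5 * quadratic ** 2 + delta * linear).mean()
```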
2018
Scale equivariance in CNNs with vector fields
Diego Marcos, Benjamin Kellenberger, Sylvain Lobry, and
1 more author
In International Conference on Machine Learning (ICML)/FAIM workshop on Towards learning with limited labels: Equivariance, Invariance, and Beyond, 2018
We study the effect of injecting local scale equivariance into Convolutional Neural Networks. This is done by applying each convolutional filter at multiple scales. The output is a vector field encoding for the maximally activating scale and the scale itself, which is further processed by the following convolutional layers. This allows all the intermediate representations to be locally scale equivariant. We show that this improves the performance of the model by over 20% in the scale equivariant task of regressing the scaling factor applied to randomly scaled MNIST digits. Furthermore, we find it also useful for scale invariant tasks, such as the actual classification of randomly scaled digits. This highlights the usefulness of allowing for a compact representation that can also learn relationships between different local scales by keeping internal scale equivariance.
@inproceedings{marcos2018scale,title={Scale equivariance in CNNs with vector fields},author={Marcos, Diego and Kellenberger, Benjamin and Lobry, Sylvain and Tuia, Devis},booktitle={International Conference on Machine Learning (ICML)/FAIM workshop on Towards learning with limited labels: Equivariance, Invariance, and Beyond},year={2018},}
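A minimal sketch of the idea, approximating filtering at multiple scales by rescaling the input instead (an assumption made here for brevity): each location keeps the maximal activation and the index of the scale that produced it, and these two maps form the vector field passed on to the next layer.

```python
import torch
import torch.nn.functional as F

def multiscale_conv(x, weight, scales=(1.0, 1.41, 2.0)):
    """Apply the same (odd-sized) filter at several scales, here approximated
    by rescaling the input rather than the filter, and keep per location the
    maximal activation together with the index of the scale that produced it."""
    responses = []
    for s in scales:
        xs = F.interpolate(x, scale_factor=1.0 / s, mode='bilinear',
                           align_corners=False)
        r = F.conv2d(xs, weight, padding=weight.shape[-1] // 2)
        r = F.interpolate(r, size=x.shape[-2:], mode='bilinear',
                          align_corners=False)
        responses.append(r)
    stack = torch.stack(responses)              # (n_scales, B, C, H, W)
    magnitude, best_scale = stack.max(dim=0)    # value and arg-max scale
    return magnitude, best_scale
```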
Correcting misaligned rural building annotations in open street map using convolutional neural networks evidence
John E Vargas-Munoz, Diego Marcos, Sylvain Lobry, and
3 more authors
In IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium, 2018
Mapping rural buildings in developing countries is crucial for monitoring and planning in these vulnerable areas. Despite the existence of some rural building annotations in OpenStreetMap (OSM), these are of insufficient quantity and quality to train models able to map large areas accurately. In particular, these annotations are very often misaligned with respect to the buildings present in up-to-date aerial imagery. We propose a Markov Random Field (MRF) method to correct misaligned rural building annotations. To do so, our method uses i) the correlation between candidate aligned OSM annotations and buildings roughly detected in aerial images and ii) the local consistency of the alignment vectors.
@inproceedings{vargas2018correcting,title={Correcting misaligned rural building annotations in open street map using convolutional neural networks evidence},author={Vargas-Munoz, John E and Marcos, Diego and Lobry, Sylvain and dos Santos, Jefersson A and Falcao, Alexandre X and Tuia, Devis},booktitle={IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium},pages={1284--1287},year={2018},organization={IEEE},}
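The two ingredients of the abstract translate into a simple discrete MRF energy; the sketch below uses hypothetical inputs (`unary` correlation scores, candidate `shifts`, a `neighbors` pair list) and only illustrates the structure being minimized, not the paper's exact potentials.

```python
import numpy as np

def alignment_energy(labels, unary, shifts, neighbors, lam=1.0):
    """MRF energy over discrete candidate shifts. `shifts[k]` is the k-th
    candidate 2-D alignment vector, `labels[i]` the candidate chosen for
    annotation i, `unary[i, k]` minus the correlation between annotation i
    moved by shifts[k] and the building evidence detected in the image, and
    `neighbors` a list of (i, j) pairs of nearby annotations. The pairwise
    term favors locally consistent alignment vectors."""
    labels = np.asarray(labels)
    data = unary[np.arange(len(labels)), labels].sum()
    smooth = sum(np.linalg.norm(shifts[labels[i]] - shifts[labels[j]])
                 for i, j in neighbors)
    return data + lam * smooth
```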
Land-use characterisation using Google Street View pictures and OpenStreetMap
Shivangi Srivastava, Sylvain Lobry, Devis Tuia, and
1 more author
In 21st AGILE Conference on Geographic Information Science (2018), 2018
This paper presents a study on the use of freely available, geo-referenced pictures from Google Street View to model and predict land use at the urban-object scale. This task is traditionally done manually and via photo-interpretation, which is very time consuming. We propose to use a machine learning approach based on deep learning and to model land use directly from both the pictures available from Google Street View and OpenStreetMap annotations. Because of the wide availability of these two data sources, the proposed approach is scalable to cities around the globe and offers the possibility of frequent map updates. As base information, we use features extracted from single pictures around the object of interest; these features are issued from pre-trained convolutional neural networks. Then, we train various classifiers (linear and RBF support vector machines, multi-layer perceptron) and compare their performances. We report on a study over the city of Paris, France, where we observed that pictures coming from both inside and outside the urban objects capture distinct, but complementary features.
@inproceedings{srivastava2018land,title={Land-use characterisation using Google Street View pictures and OpenStreetMap},author={Srivastava, Shivangi and Lobry, Sylvain and Tuia, Devis and Munoz, John Vargas},booktitle={21st AGILE Conference on Geographic Information Science (2018)},year={2018},}
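The classifier comparison stage is straightforward to reproduce with scikit-learn; in the sketch below, `features` (pre-extracted CNN descriptors, one row per urban object) and `labels` (OpenStreetMap land-use classes) are assumed to exist, and the hyperparameters are illustrative defaults rather than the paper's settings.

```python
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Compare the three classifier families mentioned in the abstract on the
# same pre-extracted CNN features.
classifiers = {
    'linear SVM': SVC(kernel='linear', C=1.0),
    'RBF SVM': SVC(kernel='rbf', C=1.0, gamma='scale'),
    'MLP': MLPClassifier(hidden_layer_sizes=(256,), max_iter=500),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, features, labels, cv=5)
    print(f'{name}: {scores.mean():.3f} +/- {scores.std():.3f}')
```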
Speckle reduction in PolSAR by multi-channel variance stabilization and Gaussian denoising: MuLoG
Charles-Alban Deledalle, Loic Denis, Florence Tupin, and
1 more author
In EUSAR 2018; 12th European Conference on Synthetic Aperture Radar, 2018
Due to the speckle phenomenon, some form of filtering must be applied to SAR data prior to performing any polarimetric analysis. Beyond the simple multilooking operation (i.e., moving average), several methods have been designed specifically for PolSAR filtering. The specifics of speckle noise and the correlations between polarimetric channels make PolSAR filtering more challenging than usual image restoration problems. Despite their striking performance, existing image denoising algorithms, mostly designed for additive white Gaussian noise, cannot be directly applied to PolSAR data. We bridge this gap with MuLoG by providing a general scheme that stabilizes the variance of the polarimetric channels and that can embed almost any Gaussian denoiser. We describe the MuLoG approach and illustrate its performance on airborne PolSAR data using a very recent Gaussian denoiser based on a convolutional neural network.
@inproceedings{8438063,author={Deledalle, Charles-Alban and Denis, Loic and Tupin, Florence and Lobry, Sylvain},booktitle={EUSAR 2018; 12th European Conference on Synthetic Aperture Radar},title={Speckle reduction in PolSAR by multi-channel variance stabilization and Gaussian denoising: MuLoG},year={2018},volume={},number={},pages={1-5},doi={},url={https://ieeexplore.ieee.org/document/8438063}}
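For intuition, here is a heavily simplified, single-channel version of the variance-stabilization idea: a debiased log transform makes multiplicative speckle approximately additive and Gaussian, any Gaussian denoiser is applied, and the result is mapped back. The real MuLoG operates on the full polarimetric covariance matrix and iterates via ADMM, none of which is reproduced in this sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.special import digamma

def homomorphic_denoise(intensity, enl=1.0, denoiser=None):
    """Toy single-channel illustration of variance stabilization: for
    Gamma-distributed speckle with `enl` looks, the log-intensity has
    additive, signal-independent noise with mean bias digamma(enl) -
    log(enl). After removing this bias, any off-the-shelf Gaussian
    denoiser can be plugged in."""
    if denoiser is None:
        denoiser = lambda im: gaussian_filter(im, sigma=2.0)  # placeholder
    log_im = np.log(np.maximum(intensity, 1e-10))
    log_im -= digamma(enl) - np.log(enl)  # debias the log transform
    return np.exp(denoiser(log_im))
```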
2017
Double MRF for water classification in SAR images by joint detection and reflectivity estimation
Sylvain Lobry, Loïc Denis, Florence Tupin, and
1 more author
In 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2017
Classification of SAR images is a challenging task as the radiometric properties of a class may not be constant throughout the image. The assumption made in most classification algorithms that a class can be modeled by constant parameters is then not valid. In this paper, we propose a classification algorithm based on two Markov random fields that accounts for local and global variations of the parameters inside the image and produces a regularized classification. This algorithm is applied on airborne TropiSAR and simulated SWOT HR data. Both quantitative and visual results are provided, demonstrating the effectiveness of the proposed method.
@inproceedings{8127445,author={Lobry, Sylvain and Denis, Loïc and Tupin, Florence and Fjortoft, Roger},booktitle={2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)},title={Double MRF for water classification in SAR images by joint detection and reflectivity estimation},year={2017},volume={},number={},pages={2283-2286},doi={10.1109/IGARSS.2017.8127445},url={https://ieeexplore.ieee.org/document/8127445},project={SWOT}}
Unsupervised detection of thin water surfaces in SWOT images based on segment detection and connection
Sylvain Lobry, Florence Tupin, and Roger Fjortoft
In 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2017
The objective of the Surface Water and Ocean Topography (SWOT) mission is to regularly monitor the height of the earth’s water surfaces. One of the challenges toward obtaining global measurements of these surfaces is to detect small water areas. In this article we introduce a method for the detection of thin water surfaces, such as rivers, in SWOT images. It combines a low-level step (segment detection) with a high-level regularization of these features. The method is then tested on a simulated SWOT image.
@inproceedings{8127807,author={Lobry, Sylvain and Tupin, Florence and Fjortoft, Roger},booktitle={2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)},title={Unsupervised detection of thin water surfaces in SWOT images based on segment detection and connection},year={2017},volume={},number={},pages={3720-3723},doi={10.1109/IGARSS.2017.8127807},url={https://ieeexplore.ieee.org/document/8127807},project={SWOT}}
Urban area change detection based on generalized likelihood ratio test
Weiying Zhao, Sylvain Lobry, Henri Maitre, and
2 more authors
In 2017 9th International Workshop on the Analysis of Multitemporal Remote Sensing Images (MultiTemp), 2017
Change detection methods often use denoised data because the original speckle noise has a strong influence on the detection results. The effect of using different data sources (different equivalent numbers of looks, original data, denoised data) and different thresholding methods is studied based on four kinds of generalized likelihood ratio test approaches. NL-SAR [1] denoised data and the corresponding spatially varying equivalent number of looks are taken into account in the detection procedure. The bi-temporal experimental results on simulated data and realistic synthetic Sentinel-1 SAR data show the benefit of using the equivalent number of looks of the denoised data and the corresponding adaptive thresholds for change detection in urban areas.
@inproceedings{8035245,author={Zhao, Weiying and Lobry, Sylvain and Maitre, Henri and Nicolas, Jean-Marie and Tupin, Florence},booktitle={2017 9th International Workshop on the Analysis of Multitemporal Remote Sensing Images (MultiTemp)},title={Urban area change detection based on generalized likelihood ratio test},year={2017},volume={},number={},pages={1-4},doi={10.1109/Multi-Temp.2017.8035245},url={https://ieeexplore.ieee.org/document/8035245},}
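As an illustration, here is one classical pixel-wise GLRT under Gamma-distributed intensities with known (possibly spatially varying) equivalent numbers of looks; it is a generic example of the family of tests studied, not necessarily one of the four variants compared in the paper.

```python
import numpy as np

def glrt_change(i1, i2, enl1, enl2):
    """Pixel-wise generalized likelihood ratio test for a change in
    reflectivity between two Gamma-distributed intensity images with
    equivalent numbers of looks enl1 and enl2 (scalars or per-pixel maps).
    Under H0 both pixels share one reflectivity, whose ML estimate is the
    looks-weighted mean; the statistic is 0 when i1 == i2 and grows with
    the evidence for a change."""
    i1 = np.maximum(np.asarray(i1, dtype=float), 1e-10)
    i2 = np.maximum(np.asarray(i2, dtype=float), 1e-10)
    mu0 = (enl1 * i1 + enl2 * i2) / (enl1 + enl2)  # reflectivity under H0
    return (enl1 + enl2) * np.log(mu0) - enl1 * np.log(i1) - enl2 * np.log(i2)

# change_map = glrt_change(img_t1, img_t2, enl_t1, enl_t2) > threshold
```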
2016
A decomposition model for scatterers change detection in multi-temporal series of SAR images
Sylvain Lobry, Florence Tupin, and Loïc Denis
In 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2016
This paper presents a method for strong scatterers change detection in synthetic aperture radar (SAR) images based on a decomposition for multi-temporal series. The formulated decomposition model jointly estimates the background of the series and the scatterers. The decomposition model retrieves possible changes in scatterers and the date at which they occurred. An exact optimization method of the model is presented and applied to a TerraSAR-X time series.
@inproceedings{7729869,author={Lobry, Sylvain and Tupin, Florence and Denis, Loïc},booktitle={2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)},title={A decomposition model for scatterers change detection in multi-temporal series of SAR images},year={2016},volume={},number={},pages={3362-3365},doi={10.1109/IGARSS.2016.7729869},}
Non-Uniform Markov Random Fields for Classification of SAR Images
Sylvain Lobry, Florence Tupin, and Roger Fjortoft
In Proceedings of EUSAR 2016: 11th European Conference on Synthetic Aperture Radar, 2016
When dealing with SAR image classification, the class parameters may vary along the swath for several reasons. Traditional classification algorithms are then not well adapted, as they assume constant class parameters. In this paper, we propose a binary classification algorithm based on Markov Random Fields that takes into account the parameter variations along the swath, and we present results obtained on airborne TropiSAR and simulated SWOT HR data.
@inproceedings{7559390,author={Lobry, Sylvain and Tupin, Florence and Fjortoft, Roger},booktitle={Proceedings of EUSAR 2016: 11th European Conference on Synthetic Aperture Radar},title={Non-Uniform Markov Random Fields for Classification of SAR Images},year={2016},volume={},number={},pages={1-4},doi={},url={https://ieeexplore.ieee.org/document/7559390},project={SWOT}}
2015
Sparse + smooth decomposition models for multi-temporal SAR images
Sylvain Lobry, Loïc Denis, and Florence Tupin
In 2015 8th International Workshop on the Analysis of Multitemporal Remote Sensing Images (Multi-Temp), 2015
SAR images have distinctive characteristics compared to optical images: the speckle phenomenon produces strong fluctuations, and strong scatterers have radar signatures several orders of magnitude larger than others. We propose to use an image decomposition approach to account for these peculiarities. Several methods have been proposed in the field of image processing to decompose an image into components of different nature, such as a geometrical part and a textural part. They are generally stated as an energy minimization problem where specific penalty terms are applied to each component of the sought decomposition. We decompose temporal series of SAR images into three components: speckle, strong scatterers and background. Our decomposition method is based on a discrete optimization technique by graph-cut. We apply it to change detection tasks.
@inproceedings{7245772,author={Lobry, Sylvain and Denis, Loïc and Tupin, Florence},booktitle={2015 8th International Workshop on the Analysis of Multitemporal Remote Sensing Images (Multi-Temp)},title={Sparse + smooth decomposition models for multi-temporal SAR images},year={2015},volume={},number={},pages={1-4},doi={10.1109/Multi-Temp.2015.7245772},url={https://ieeexplore.ieee.org/document/7245772},}
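A toy alternating scheme conveys the flavor of such a decomposition (the paper itself optimizes a dedicated energy exactly by graph-cut, which this sketch does not attempt): the background absorbs what is smooth, and the scatterers collect the residuals that stand far above it.

```python
import numpy as np
from scipy.ndimage import median_filter

def sparse_smooth_decompose(image, n_iter=20, thresh=3.0, size=7):
    """Alternately re-estimate a smooth background (median filtering what is
    left once the scatterers are removed) and a sparse scatterer component
    (residuals standing far above the local background); everything else is
    implicitly treated as speckle."""
    image = np.asarray(image, dtype=float)
    scatterers = np.zeros_like(image)
    for _ in range(n_iter):
        background = median_filter(image - scatterers, size=size)
        residual = image - background
        scatterers = np.where(residual > thresh * np.maximum(background, 1e-10),
                              residual, 0.0)
    return background, scatterers
```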
National Conferences
2023
Évaluation du couvert neigeux à partir d’images SAR par apprentissage profond basé sur des images optiques de référence
Mathias Montginoux, Flora Weissgerber, Sylvain Lobry, and
1 more author
Optical satellite images are commonly used to evaluate the snow cover. However, part of the information is lost due to clouds. To fill this gap, we propose to detect snow from Sentinel-1 SAR images using a convolutional neural network trained with labels obtained from MODIS optical images. A binary semantic segmentation is computed from two polarimetric SAR inputs: a wet snow ratio and a dry snow ratio. The model, called SESAR U-net, is trained on a small area and then tested over a whole watershed. The missing labels are interpolated and the uncertainty due to clouds is taken into account. Our proposed method achieves an overall accuracy higher than 80%.
@inproceedings{Montginoux2023GRETSI,title={Évaluation du couvert neigeux à partir d'images SAR par apprentissage profond basé sur des images optiques de référence},author={Montginoux, Mathias and Weissgerber, Flora and Lobry, Sylvain and Idier, Jérôme},booktitle={GRETSI},year={2023},}
Transposition de données multidimensionnelles en images pour pallier le fléau de la dimension
Rebecca Leygonie, Sylvain Lobry, Guillaume Vimont, and
1 more author
When dealing with high-dimensional multivariate time series classification problems, a well-known difficulty is the curse of dimensionality. In this article, we propose an original approach that transposes multidimensional data into images to tackle the classification task. We propose a small hybrid model containing convolutional layers as a feature extractor, followed by a recurrent neural network that takes this transposed data as input. We apply our method to a large dataset consisting of individual patient medical records. We show that our approach allows us to significantly reduce the size of the network and increase its performance by opting for a transformation of the input data.
@inproceedings{Leygonie2023ORASIS,title={Transposition de données multidimensionnelles en images pour pallier le fléau de la dimension},author={Leygonie, Rebecca and Lobry, Sylvain and Vimont, Guillaume and Wendling, Laurent},booktitle={ORASIS},year={2023},}
2022
Apprentissage profond pour la classification de QR Codes bruités
Rebecca Leygonie, Sylvain Lobry, and Laurent Wendling
We wish to define the limitations of a classical deep learning classification model when applied to abstract images, which do not represent visually identifiable objects. QR codes fall into this category of abstract images: with one bit corresponding to one encoded character, QR codes were not designed to be decoded by the naked eye. To understand the limitations of a deep learning-based model for abstract image classification, we train an image classification model on QR codes generated from the information obtained when reading a health pass. We compare the performance of the classification model with that of a classical (deterministic) decoding method in the presence of noise. This study allows us to conclude that a model based on deep learning can be relevant for the understanding of abstract images.
@inproceedings{Leygonie2022RFIAP,title={Apprentissage profond pour la classification de QR Codes bruités},author={Leygonie, Rebecca and Lobry, Sylvain and Wendling, Laurent},booktitle={RFIAP/CAP 2022},year={2022},}
2021
Segmentation Sémantique pour la Simulation d’Images SAR
Nathan Letheule, Flora Weissgerber, Sylvain Lobry, and
1 more author
Simulation of Synthetic Aperture Radar (SAR) images is an essential component of SAR application development. This can be done using style transfer methods or through physical simulators. We propose a hybrid approach: physical simulation of a SAR image from a material map obtained by a deep network taking the optical image as input. We compare the simulations with those from a style transfer method. The first results show the potential of our approach.
@inproceedings{letheule2021segmentation,title={Segmentation S{\'e}mantique pour la Simulation d'Images SAR},author={Letheule, Nathan and Weissgerber, Flora and Lobry, Sylvain and Colin-Koeniguer, Elise},booktitle={ORASIS 2021},year={2021},}
2017
Détection de l’eau dans les images radar du futur satellite SWOT
Sylvain Lobry, Roger Fjortoft, Loïc Denis, and
1 more author
One of the objectives of the SWOT mission conducted by CNES and JPL is to obtain a global measurement of water heights. In order to apply interferometric processing to SWOT images over continents, a first step is to obtain a classification indicating the presence of water. We introduce two methods, adapted to the unusual acquisition parameters of the sensor, for the detection of compact areas (e.g., lakes) and linear networks (e.g., rivers).
@inproceedings{lobry2017detection,title={D{\'e}tection de l’eau dans les images radar du futur satellite SWOT},author={Lobry, Sylvain and Fjortoft, Roger and Denis, Lo{\"\i}c and Tupin, Florence},booktitle={GRETSI},year={2017},project={SWOT}}
2016
Un modèle de décomposition pour la détection de changement dans les séries temporelles d'images RSO
Sylvain Lobry, Loïc Denis, and Florence Tupin
This paper presents a method for strong scatterers change detection in synthetic aperture radar (SAR) images based on a decomposition for multi-temporal series. The formulated decomposition model jointly estimates the background of the series and the scatterers. The decomposition model retrieves possible changes in scatterers and the date at which they occurred. An exact optimization method of the model is presented and applied to a TerraSAR-X time series.
@inproceedings{lobry2016modele,title={Un mod{\`e}le de d{\'e}composition pour la d{\'e}tection de changement dans les s{\'e}ries temporelles d’images RSO},author={Lobry, Sylvain and Denis, Lo{\"\i}c and Tupin, Florence},booktitle={RFIA},year={2016},}