We are proposing a special issue on Vision Language Models for Remote Sensing Analysis and Interpretation in the ISPRS Journal of Photogrammetry and Remote Sensing. Please consider submitting!

Find the call for papers below.

One of the long-standing goals of automatic remote sensing (RS) understanding is to provide a thorough, human-like interpretation of the data that is accessible to users who lack expertise in RS. Despite this overarching goal, many recent approaches remain highly specialized: they target single tasks and usually boil down to some form of optimized classification or segmentation of the images. A gap between these techniques and users remains. Natural language processing (NLP) techniques can help bridge this gap between complex visual data in RS scenes and human understanding, making it easier to interpret the data and to communicate the extracted insights clearly and concisely.

The emergence of vision-language models (VLMs) as a new learning paradigm has shown great promise for the analysis of RS images and the extraction of higher-level semantics. VLMs learn joint visual and textual representations from data, which can improve the performance of RS image interpretation. Moreover, using language can lead to models that are more explainable, more interactive, and able to draw on world knowledge that is not contained in the images themselves.
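
As a concrete illustration of this paradigm (not part of the call itself), the minimal sketch below matches an RS image against free-text descriptions with a pretrained CLIP model via the Hugging Face transformers library. The image filename and candidate captions are hypothetical placeholders.

```python
# Minimal sketch: zero-shot image-text matching with a generic VLM (CLIP).
# The image path and caption list below are illustrative assumptions.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("aerial_scene.png")  # hypothetical RS image
captions = [
    "an aerial photo of a dense residential area",
    "a satellite image of farmland with crop fields",
    "an aerial view of an airport runway",
]

# Encode the image and the candidate captions into a shared embedding space.
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them
# into a probability distribution over the candidate captions.
probs = outputs.logits_per_image.softmax(dim=-1)
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.2f}  {caption}")
```

The same joint embedding underlies several of the tasks listed below, such as text-to-image retrieval, where caption and image roles are simply reversed.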

This special issue will focus on the most recent developments in VLMs for the analysis of RS imagery. We welcome submissions that describe the use of VLMs for a variety of tasks, including but not limited to:

  • Captioning of images, multi-temporal images, and aerial videos;
  • Text-to-image retrieval;
  • Visual question answering;
  • Visual question generation;
  • Visual grounding;
  • Multimodal multitask learning;
  • VLM prompting in RS;
  • Conversational agents.