Recent advancements in Natural language Processing (NLP), particularly in Large languagemodels (LLMs), associated with deep learning-based computer vision techniques, have shown substantial potential for automating a...
详细信息
Recent advancements in Natural language Processing (NLP), particularly in Large languagemodels (LLMs), associated with deep learning-based computer vision techniques, have shown substantial potential for automating a variety of tasks. These are known as visual LLMs and one notable model is visual ChatGPT, which combines ChatGPT's LLM capabilities with visual computation to enable effective image analysis. These models' abilities to process images based on textual inputs can revolutionize diverse fields, and while their application in the remote sensing domain remains unexplored, it is important to acknowledge that novel implementations are to be expected. Thus, this is the first paper to examine the potential of visual ChatGPT, a cutting-edge LLM founded on the GPT architecture, to tackle the aspects of image processing related to the remote sensing domain. Among its current capabilities, visual ChatGPT can generate textual descriptions of images, perform canny edge and straight line detection, and conduct image segmentation. These offer valuable insights into image content and facilitate the interpretation and extraction of information. By exploring the applicability of these techniques within publicly available datasets of satellite images, we demonstrate the current model's limitations in dealing with remote sensing images, highlighting its challenges and future prospects. Although still in early development, we believe that the combination of LLMs and visualmodels holds a significant potential to transform remote sensing image processing, creating accessible and practical application opportunities in the field.
The most popular approach in object classification is based on the bag of visual-words model,which has several fundamental problems that restricting the performance of this method, such as low time efficiency, the syn...
详细信息
The most popular approach in object classification is based on the bag of visual-words model,which has several fundamental problems that restricting the performance of this method, such as low time efficiency, the synonym and polysemy of visual words, and the lack of spatial information between visual *** view of this, an object classification based on weakly supervised E2LSH and saliency map weighting is ***, E2LSH (Exact Euclidean Locality Sensitive Hashing) is employed to generate a group of weakly randomized visual dictionary by clustering SIFT features of the training dataset, and the selecting process of hash functions is effectively supervised inspired by the random forest ideas to reduce the randomcity of ***, graph-based visual saliency (GBVS) algorithm is applied to detect the saliency map of different images and weight the visual words according to the saliency ***, saliency map weighted visual language model is carried out to accomplish object *** results datasets of Pascal 2007 and Caltech-256 indicate that the distinguishability of objects is effectively improved and our method is superior to the state-of-the-art object classification methods.
暂无评论