Identification of medicinal plants is essential for the development of effective medicines and the preservation of biodiversity. The goal of the artificial intelligence discipline of computer vision is to make it poss...
Transformer with self-attention has revolutionized the field of natural language processing, and has recently inspired Transformer-style architecture designs with competitive results in numerous computer vision tasks. Nevertheless, most existing designs directly employ self-attention over a 2D feature map to obtain the attention matrix from pairs of isolated queries and keys at each spatial location, leaving the rich contexts among neighboring keys under-exploited. In this work, we design a novel Transformer-style module, the Contextual Transformer (CoT) block, for visual recognition. This design fully capitalizes on the contextual information among input keys to guide the learning of a dynamic attention matrix, thus strengthening the capacity of visual representation. Technically, the CoT block first contextually encodes the input keys via a 3 x 3 convolution, yielding a static contextual representation of the inputs. We then concatenate the encoded keys with the input queries and learn a dynamic multi-head attention matrix through two consecutive 1 x 1 convolutions. The learnt attention matrix is multiplied by the input values to produce the dynamic contextual representation of the inputs. The fusion of the static and dynamic contextual representations is finally taken as the output. Our CoT block is appealing in that it can readily replace each 3 x 3 convolution in ResNet architectures, yielding a Transformer-style backbone named Contextual Transformer Networks (CoTNet). Through extensive experiments over a wide range of applications (e.g., image recognition, object detection, instance segmentation, and semantic segmentation), we validate the superiority of CoTNet as a stronger backbone.
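To make the mechanism concrete, the following is a minimal PyTorch sketch of the CoT block as described above, not the authors' implementation: the channel widths, head count, and fusion by simple addition are assumptions, and the paper's local attention is simplified here to a per-head spatial softmax.

```python
import torch
import torch.nn as nn

class CoTBlock(nn.Module):
    """Hedged sketch of a Contextual Transformer block."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        # 3 x 3 convolution contextually encodes the keys (static context).
        self.key_embed = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
        )
        self.value_embed = nn.Conv2d(dim, dim, kernel_size=1, bias=False)
        # Two consecutive 1 x 1 convolutions learn the dynamic multi-head
        # attention from the concatenation of encoded keys and queries.
        self.attn_embed = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=1, bias=False),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, heads, kernel_size=1),
        )

    def forward(self, x):                        # x: (B, C, H, W), queries = x
        b, c, h, w = x.shape
        k_static = self.key_embed(x)             # static contextual repr.
        v = self.value_embed(x).view(b, self.heads, c // self.heads, h * w)
        attn = self.attn_embed(torch.cat([k_static, x], dim=1))
        attn = attn.view(b, self.heads, 1, h * w).softmax(dim=-1)
        k_dynamic = (attn * v).view(b, c, h, w)  # attention-weighted values
        return k_static + k_dynamic              # fuse static + dynamic context

y = CoTBlock(64)(torch.randn(2, 64, 14, 14))     # -> (2, 64, 14, 14)
```

Because the block preserves the input shape, it can stand in for a 3 x 3 convolution inside a residual stage, which is how a CoTNet-style backbone would be assembled.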
Classification of galaxies is traditionally associated with their morphologies through visual inspection of images. The amount of data to come renders this task inhuman, and machine learning (mainly deep learning) has been called to the rescue for more than a decade. However, the results look mixed, and there seems to be a shift away from the paradigm of the traditional morphological classification of galaxies. In this paper, I want to show that the algorithms are indeed very sensitive to the features present in images, features that do not necessarily correspond to the Hubble or de Vaucouleurs vision of a galaxy. However, this does not preclude getting the correct insights into the physics of galaxies. I have applied a state-of-the-art 'traditional' machine learning clustering tool called Fisher-EM, a latent discriminant subspace Gaussian mixture model algorithm, to 4458 galaxies carefully classified into 18 types by the EFIGI project. The optimum number of clusters given by the integrated complete likelihood criterion is 47. The correspondence with the EFIGI classification is correct, but it appears that the Fisher-EM algorithm gives great importance to the distribution of light, which translates to characteristics such as the bulge-to-disc ratio, the inclination, or the presence of foreground stars. The discrimination of some physical parameters (bulge-to-total luminosity ratio, (B-V)_T, intrinsic diameter, presence of flocculence or dust, and arm strength) is very comparable in the two classifications.
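Fisher-EM itself is distributed as an R package; as a rough Python analogue of the model-selection step only, the sketch below scans cluster counts for a plain Gaussian mixture using scikit-learn's BIC score. Note the substitutions: the paper fits a latent discriminant subspace mixture and selects 47 clusters with the integrated complete likelihood (ICL) criterion, neither of which scikit-learn implements, and the feature matrix here is random stand-in data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))   # stand-in for per-galaxy image features

# Scan the number of mixture components and keep the best criterion value
# (BIC here; the paper uses ICL, which penalizes cluster overlap as well).
best_k, best_bic = None, np.inf
for k in range(2, 51):
    gmm = GaussianMixture(n_components=k, covariance_type="full",
                          random_state=0).fit(X)
    bic = gmm.bic(X)
    if bic < best_bic:
        best_k, best_bic = k, bic
print(f"selected number of clusters: {best_k}")
```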
Automatic vision-based inspection systems have played a key role in product quality assessment for decades through the segmentation, detection, and classification of defects. Historically, machine learning frameworks based on hand-crafted feature extraction, selection, and validation relied on a combined approach of parameterized image processing algorithms and explicit human knowledge. The outstanding performance of deep learning (DL) for vision systems, in automatically discovering a feature representation suitable for the corresponding task, has exponentially increased the number of scientific articles and commercial products aimed at industrial quality assessment. In this context, this article reviews more than 220 relevant articles from the related literature published until February 2023, covering the recent consolidation and advances in the field of fully automatic DL-based surface defect inspection systems deployed in various industrial applications. The analyzed papers have been classified according to a bi-dimensional taxonomy that considers both the specific defect recognition task and the employed learning paradigm. The dependency on large, high-quality labeled datasets and the different neural architectures employed to achieve an overall perception of both well-visible and subtle defects, through the supervision of fine and/or coarse data annotations, have been assessed. The results of our analysis highlight a growing research interest in enriching defect representation power, especially by transferring pre-trained layers to an optimized network and by explaining network decisions to suggest trustworthy retention or rejection of the products being evaluated.
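The "transferring pre-trained layers" strategy the review highlights can be illustrated with a short torchvision sketch; the ResNet-18 backbone, the freezing policy, and the four defect classes are illustrative assumptions, not choices made in the surveyed papers.

```python
import torch.nn as nn
from torchvision import models

# Reuse an ImageNet-pre-trained backbone and retrain only a small head
# for defect classification (a common transfer-learning recipe).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False                    # freeze transferred layers
backbone.fc = nn.Linear(backbone.fc.in_features, 4)  # e.g. 4 defect classes
# Only backbone.fc receives gradients; earlier layers supply the
# transferred feature representation.
```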
This paper presents a novel approach for enhancing vehicle safety and navigation through an integrated system for lane detection, vehicle alignment, and automatic braking using visual feedback. Our proposed system emp...
The growing popularity of vision transformers (ViTs) in remote sensing image classification is due to their ability to effectively capture long-range dependencies. However, their high computational cost and memory footprint limit their applicability, particularly for small-scale datasets and resource-constrained environments. To address these challenges, we propose the multiscale multihead compact convolutional transformer (MSHCCT), a lightweight yet powerful model that integrates convolutional tokenization with small-scale ViTs to enhance multiscale feature representation while maintaining computational efficiency. Despite a modest increase in parameters and training time, MSHCCT achieves superior classification accuracy and robustness on high-resolution aerial scenes. Importantly, our approach eliminates the need for model pretraining, additional datasets, or multisensor data fusion, ensuring a computationally efficient and practical solution for remote sensing applications. The code will be made publicly available at https://***/aj1365/MSHCCT
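As an illustration of convolutional tokenization feeding a compact transformer encoder, here is a minimal PyTorch sketch in the spirit of MSHCCT; the channel widths, depth, head count, and mean-pooling classifier are assumptions, and the model's full multiscale multihead design is not reproduced.

```python
import torch
import torch.nn as nn

class ConvTokenizer(nn.Module):
    """Strided convolutions replace patch slicing: each output position
    becomes one token with a convolutional receptive field."""

    def __init__(self, in_ch: int = 3, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim // 2, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim // 2, dim, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        t = self.net(x)                       # (B, dim, H/4, W/4)
        return t.flatten(2).transpose(1, 2)   # (B, N, dim) token sequence

class CompactViT(nn.Module):
    def __init__(self, dim=128, heads=4, depth=2, n_classes=10):
        super().__init__()
        self.tokenizer = ConvTokenizer(dim=dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        tokens = self.encoder(self.tokenizer(x))
        return self.head(tokens.mean(dim=1))  # pool tokens by averaging

logits = CompactViT()(torch.randn(2, 3, 64, 64))  # -> (2, 10)
```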
Multitask learning (MTL) is a challenging puzzle, particularly in the realm of computer vision (CV). Setting up vanilla deep MTL requires either hard or soft parameter sharing schemes that employ greedy search to find the optimal network designs. Despite its widespread application, the performance of MTL models is vulnerable to under-constrained parameters. In this article, we draw on the recent success of the vision transformer (ViT) to propose a multitask representation learning method called multitask ViT (MTViT), which uses a multiple-branch transformer to sequentially process the image patches (i.e., the tokens in the transformer) associated with the various tasks. Through the proposed cross-task attention (CA) module, a task token from each task branch is regarded as a query for exchanging information with the other task branches. In contrast to prior models, our proposed method extracts intrinsic features with the built-in self-attention mechanism of the ViT and requires only linear, rather than quadratic, memory and computation complexity. Comprehensive experiments are carried out on two benchmark datasets, NYU-Depth V2 (NYUDv2) and CityScapes, which show that our proposed MTViT outperforms or is on par with existing convolutional neural network (CNN)-based MTL methods. In addition, we apply our method to a synthetic dataset in which task relatedness is controlled. Surprisingly, the experimental results reveal that MTViT exhibits excellent performance when tasks are less related.
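The cross-task attention idea can be sketched as follows; the use of nn.MultiheadAttention, the embedding width, and the residual update are assumptions rather than the authors' code. Because the query is a single task token, the attention cost grows linearly with the number of tokens in the other branch, consistent with the linear-complexity claim above.

```python
import torch
import torch.nn as nn

class CrossTaskAttention(nn.Module):
    """Hedged sketch: one branch's task token queries another branch."""

    def __init__(self, dim: int = 192, heads: int = 3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, task_token, other_tokens):
        # task_token:   (B, 1, dim) from this task branch
        # other_tokens: (B, N, dim) from another task branch
        exchanged, _ = self.attn(query=task_token, key=other_tokens,
                                 value=other_tokens)
        return task_token + exchanged      # residual information exchange

ca = CrossTaskAttention()
out = ca(torch.randn(2, 1, 192), torch.randn(2, 64, 192))  # -> (2, 1, 192)
```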
Color cast, an aberration common in digital images, poses challenges in various image processing applications, affecting image quality and visual perception. This research investigates diverse methodologies for colo...
Glaucoma is a group of eye conditions that impair the optic nerve, which is responsible for sending visual data from the eye to the brain. Glaucoma impacts 3.54% of adults aged 40 to 80 around the world. Early detection of glaucoma is crucial, as it can prevent total optic nerve damage, which would cause irreversible vision loss. Specialists can diagnose glaucoma medically, but treatment options are either expensive or time-consuming and require ongoing care from medical professionals. There have been numerous initiatives to streamline all components of the glaucoma classification process; however, these models make it challenging for users to comprehend the key predictors, rendering them unreliable for use by medical experts. This study uses eye fundus images to classify glaucoma patients with three distinct deep learning techniques: a convolutional neural network (CNN), Visual Geometry Group 16 (VGG16), and the Global Context Network (GC-Net). In addition, several data pre-processing techniques are used to avoid overfitting and achieve high accuracy. This research compares and analyses the performance of the various architectures using the aforementioned techniques. The CNN model had the best accuracy, 83%, in contrast to the other deep learning models.
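The pre-processing used to curb overfitting is not detailed in the abstract; a typical torchvision augmentation pipeline for fundus images might look like the sketch below, with all transform choices and values being assumptions.

```python
from torchvision import transforms

# Resizing plus light geometric and photometric augmentation: a common
# recipe to reduce overfitting on small fundus-image datasets.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.1, contrast=0.1),
    transforms.ToTensor(),
])
# Apply train_tf to each PIL fundus image before feeding CNN/VGG16/GC-Net.
```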
Electron microscopy (EM) enables capturing high-resolution images of very small structures in biological and non-biological specimens, such as membrane proteins, viruses, subcellular structures, nanoparticles, or material surfaces. Electron microscopy plays a critical role in research, development, and diagnosis in many applications of the biological, physical, chemical, and material sciences. Thanks to advances in instrumentation, electron microscopy generates large amounts of complex data that are no longer feasible to analyze manually. There is a growing need for computational methods and tools for the automated analysis of electron microscopy data generated across a variety of research fields. Recent advances in artificial intelligence and machine learning, particularly in deep learning, have revolutionized image processing and computer vision. In this work, we explored deep-learning-guided image processing and computer vision solutions to address the growing high-performance processing needs of image data acquired using electron microscopy. The proposed solutions involved novel multi-step, 2D/3D fusion approaches to address the unique challenges of complex, low-contrast, noisy electron microscopy imagery, and self-supervised, semi-supervised, or meta-learning schemes to address the challenges caused by a lack of, or limited amounts of, labeled training data. These image analysis solutions were used for the detection, segmentation, and quantification of various biological structures of interest, such as proteins, viruses, and mitochondrial or neural structures, and non-biological structures of interest, such as carbon nanotube forests. Experiments conducted on the proposed methods showed robust and promising results towards automated, objective, and quantitative analysis of electron microscopy image data, which is of great value for biology, medicine, and material science applications.
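The abstract does not specify which self-supervised schemes were used; as one illustration of the general idea for label-scarce EM data, the sketch below pre-trains a tiny encoder on a rotation-prediction pretext task. The encoder, data, and task are stand-ins, not the work's actual method.

```python
import torch
import torch.nn as nn

# Tiny CNN encoder stand-in; the pretext head predicts one of 4 rotations.
encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(16, 4),
)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

imgs = torch.randn(8, 1, 64, 64)          # stand-in unlabeled EM patches
k = torch.randint(0, 4, (8,))             # random multiples of 90 degrees
rotated = torch.stack([torch.rot90(im, int(r), dims=(1, 2))
                       for im, r in zip(imgs, k)])

# Self-supervision: the rotation label comes from the data itself.
loss = nn.functional.cross_entropy(encoder(rotated), k)
loss.backward()
opt.step()
```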