ISBN: 9798350306545 (print)
The proceedings contain 47 papers. The topics discussed include: Tri-PANet: triplanar ensemble network with parallelly aggregating global and local information for brain tumor segmentation; an object-based extremely weak supervised learning paradigm: case study in classification of crops using UVA images; abdominal multi-organ segmentation based on dual self-attention module; Res2NetFuse: a novel Res2Net-based fusion method for infrared and visible images; D2-LRR: a dual-decomposed MDLatLRR approach for medical image fusion; a dual-polarization feature enhancement network for SAR ship detection; a lightweight reconstruction network for surface defect inspection; a method for determining optimal leaf picking amount of litchi based on YOLOv7; and a progressive optimization method for image alignment of cameras with different focal lengths.
ISBN: 9798350350494; 9798350350500 (print)
The crankshaft is one of the mechanical components of the vehicle engine, and its quality control is of significant importance on the production line. In this paper, a vision-based system was developed to detect apparent structural defects on the crankshaft surface. After examining different approaches to computer vision tasks, semantic segmentation was chosen to solve this problem. In the first stage, a dataset of 400 experimental crankshaft images with structural defects such as scratches, pitting, and grinding was collected. In the second stage, a Convolutional Neural Network (CNN) with the MobileNet architecture was trained to detect apparent defects, yielding an Intersection over Union (IoU) of 64.7%. In the third stage, image processing techniques were used to improve performance: applying the DexiNed edge detection filter to the training-set images increased the IoU by 8.4 percentage points. Given the importance of this issue in the automotive industry, a further attempt was made to boost performance by augmenting the dataset images, which can also help prevent overfitting. Training the model under the same conditions as in the previous stages increased the IoU by a further 13.2 percentage points, to 86.3%.
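The IoU figures quoted above compare a predicted defect mask with a ground-truth mask. The following is a minimal sketch of that metric for binary masks, not the authors' evaluation code; the function name, the toy masks, and the convention for two empty masks are assumptions made for illustration.

```python
import numpy as np

def iou(pred, target):
    """Intersection over Union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty; treating this as a perfect match is a convention (assumption).
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)

# Toy masks: 2 pixels overlap, 4 pixels are covered by either mask.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
score = iou(pred, target)
print(score)  # 2 / 4 = 0.5
```

The reported stages are consistent when read as percentage-point gains: 64.7 + 8.4 + 13.2 = 86.3.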
ISBN: 9798350350494; 9798350350500 (print)
Weakly-Supervised Semantic Segmentation (WSSS) with image-level labels commonly uses Class Activation Maps (CAMs) to generate pseudo-labels. However, Convolutional Neural Networks (CNNs), with their limited local receptive field, often struggle to identify entire object regions. Recently, the vision Transformer (ViT) architecture has been employed instead of CNNs to capture long-range feature dependencies through the self-attention mechanism. Despite its advantages, ViT tends to overlook local feature details, leading to low-quality attention maps with unclear object details. This paper introduces a novel method that enhances the local details in attention maps by leveraging local patches. These local patches are selected from regions that are more likely to contain the desired objects. By effectively utilizing these local patches during the training and generation stages, the model yields more detailed attention maps. Extensive experiments were conducted on the PASCAL VOC 2012 benchmark dataset to demonstrate the efficacy of the proposed approach. The results show significant improvements (+2.6% mIoU) with minimal computational overhead, underscoring the potential of the proposed method in the field of Weakly-Supervised Semantic Segmentation.
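The abstract does not say how the local patches are chosen. One simple possibility, shown here purely as an assumed illustration, is to score non-overlapping windows of an attention map by their mean attention and keep the top k. The function name, window size, and selection rule are all hypothetical.

```python
import numpy as np

def top_k_patches(attn, patch, k):
    """Return top-left (row, col) corners of the k non-overlapping
    windows of size patch x patch with the highest mean attention."""
    h, w = attn.shape
    rows, cols = h // patch, w // patch
    # Crop to a multiple of the patch size, then split into a grid of windows.
    blocks = attn[: rows * patch, : cols * patch].reshape(rows, patch, cols, patch)
    scores = blocks.mean(axis=(1, 3))                 # one score per window
    order = np.argsort(scores, axis=None)[::-1][:k]   # best windows first
    r_idx, c_idx = np.unravel_index(order, scores.shape)
    return [(int(r) * patch, int(c) * patch) for r, c in zip(r_idx, c_idx)]

# Toy attention map: only the bottom-left 4x4 window is active.
attn = np.zeros((8, 8))
attn[4:8, 0:4] = 1.0
corners = top_k_patches(attn, patch=4, k=1)
print(corners)  # [(4, 0)]
```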
ISBN: 9798350350494; 9798350350500 (print)
Breast cancer is one of the most dangerous diseases among women. Of the different methods used to diagnose this cancer, imaging and computer-aided systems are the most common. In these systems, one of the most important steps is preprocessing: removing unnecessary areas of the images and extracting the chest area. In this paper, we present a method that consists of preprocessing, feature extraction, and a machine learning classifier. In the preprocessing step, we propose a method to extract the region of interest in both angles of mammography images. The proposed novel method applies gamma correction and thresholding to the images, obtaining two binary images from the proposed threshold using the Otsu method. Results show that the proposed method successfully removes the chest muscle with 98% accuracy. Next, in the feature extraction phase, we utilize three different methods for extracting features. Finally, by employing an Extra Trees classifier, we classify mammography images as normal or abnormal. By incorporating the block-based feature extraction method, we achieve 98% accuracy in classification. Overall, our approach demonstrates the effectiveness of preprocessing and feature extraction for diagnosing breast cancer using mammography images.
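The preprocessing step combines gamma correction with an Otsu threshold. The sketch below shows those two generic operations on an 8-bit image using NumPy only. It is not the authors' pipeline: the gamma value, the toy image, and the function names are assumptions, and the paper's step of deriving two binary images is not reproduced.

```python
import numpy as np

def gamma_correct(img, gamma):
    """Apply power-law (gamma) correction to a uint8 image."""
    norm = img.astype(np.float64) / 255.0
    return np.round(255.0 * norm ** gamma).astype(np.uint8)

def otsu_threshold(img):
    """Return the threshold t that maximises between-class variance.
    Pixels with value > t form the foreground class."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of class 0 for each t
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 0                        # skip thresholds with an empty class
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

# Toy image: dark left half, bright right half (values are arbitrary).
img = np.full((4, 8), 40, dtype=np.uint8)
img[:, 4:] = 200
corrected = gamma_correct(img, gamma=0.5)    # gamma = 0.5 is an assumed value
t = otsu_threshold(corrected)
mask = corrected > t                         # binary image separating the two halves
```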
ISBN: 9798350350494; 9798350350500 (print)
Accurate classification of land cover from aerial images is an active research topic in remote sensing and is also in high demand in industry. However, obtaining labeled data for training classifiers that depend heavily on supervision remains a challenging and resource-intensive task. Unsupervised methods have emerged as a powerful alternative that overcomes the limitations associated with labeled data. Such methods are highly capable of discovering hidden patterns and structures in multispectral images and can classify various types of land cover without relying on labeled samples. Our research primarily involves the analysis of WorldView-3 satellite imagery. Our strategy is to create a pipeline that extracts features using autoencoders, so that the key characteristics of the multispectral images are extracted efficiently. Subsequently, we apply transfer learning to re-train the model with a limited amount of labeled data. With transfer learning, our pipeline significantly enhances multispectral image processing, enabling a more comprehensive and accurate interpretation of satellite imagery data. Finally, we evaluate our results not only with a confusion matrix but also through a visual comparison between the class map and the RGB composition of the MSI image.
ISBN: 9798350350494; 9798350350500 (print)
The precise and automated segmentation of ovarian tumors in medical images plays a pivotal role in the treatment of ovarian cancer in women. U-Net has demonstrated remarkable success in the field of medical image segmentation. However, due to its small receptive field, U-Net faces challenges in extracting global context information. Moreover, because tumors vary significantly in scale and size, it is essential to employ a network capable of effectively extracting information at multiple scales. In this study, we present a U-Net-based network named PCU-Net for the segmentation of ovarian tumors, incorporating ConvMixer and Pyramid Dilated Convolution (PDC) modules. The ConvMixer module captures global context information by utilizing large kernels. The PDC module integrates local and global contextual patterns through parallel dilated convolutions with different dilation rates. Furthermore, our model has fewer parameters than U-Net. We assess the proposed method's performance using the Multi-Modality Ovarian Tumor Ultrasound (MMOTU) dataset. The results indicate that, in comparison to U-Net, our proposed PCU-Net improves Intersection over Union (IoU) by 4.23% and the Dice Similarity Coefficient (DSC) by 2.99%.
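To illustrate why parallel dilated convolutions see several scales at once, the sketch below applies the same 3-tap kernel at different dilation rates to an impulse in one dimension. A kernel of size k at rate d spans k + (k - 1)(d - 1) samples. This is a generic NumPy illustration, not the PDC module itself: the rates (1, 2, 4), the 1D setting, and stacking the branch outputs are assumptions.

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation form)."""
    k = len(kernel)
    span = (k - 1) * rate                 # distance between first and last tap
    pad = span // 2
    xp = np.pad(x, (pad, span - pad))
    # Each tap i reads the input shifted by i * rate samples.
    return sum(kernel[i] * xp[i * rate : i * rate + len(x)] for i in range(k))

# Impulse input: the response shows exactly which positions each branch reaches.
x = np.zeros(17)
x[8] = 1.0
kernel = np.ones(3)
rates = (1, 2, 4)                         # assumed dilation rates
pyramid = np.stack([dilated_conv1d(x, kernel, r) for r in rates])

for r, branch in zip(rates, pyramid):
    print(r, np.flatnonzero(branch).tolist())
```

Each branch responds at offsets of -d, 0, and +d around the impulse, so larger rates cover a wider context with the same number of weights.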
ISBN: 9798350350494; 9798350350500 (print)
Emotion AI is a research domain that aims to understand human emotions from visual or textual data. However, existing methods often ignore the influence of cultural diversity on emotional interpretation. In this paper, we propose a multi-modal deep learning model that integrates cultural awareness into emotion recognition. Our model uses images as the primary data source and comments from individuals across different regions as the secondary data source. Our results show that our model achieves robust performance across various scenarios. Our contribution is a novel fusion approach that bridges cultural gaps and fosters a more nuanced understanding of emotions. To the best of our knowledge, few works use this approach for Emotion AI, combining different types of data sources and models. We evaluate our model on the ArtELingo dataset, which contains image-comment pairs with Chinese, Arabic, and English annotations. The experimental results in the evaluation phase demonstrate 80% recognition accuracy for the model that merges image and text features.
ISBN: 9798350350494; 9798350350500 (print)
Hyperspectral anomaly detection is crucial for applications such as aerial surveillance in remote sensing images. However, robust identification of anomalous pixels remains challenging. A novel spectral-spatial anomaly detection technique called Dual-Domain Autoencoders (DDA) is proposed to address these challenges. First, Nonnegative Matrix Factorization (NMF) is applied to decompose the hyperspectral data into anomaly and background components. This designation is then refined using intersection masking. Next, a spectral autoencoder is trained on the identified background signature pixels and used to reconstruct the image. The reconstruction error highlights spectral anomalies. Furthermore, a spatial autoencoder is trained on principal component patches from likely background areas. Finally, the fused reconstruction error from the spectral and spatial autoencoders is used to give enhanced anomaly detection. Experiments demonstrate a higher AUC for DDA than for the individual autoencoders and benchmark methods. The integration of matrix factorization and dual-domain, fused autoencoders thus provides superior anomaly identification. Spatial modeling further constrains the background, enabling accurate flagging of unusual local hyperspectral patterns. This study demonstrates the effectiveness of employing autoencoders trained on intelligently sampled hyperspectral pixel signatures and spatial features for improved spectral-spatial anomaly detection.
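The abstract does not specify how the two reconstruction errors are fused. The sketch below assumes one plausible rule: compute a per-pixel squared error, min-max normalise each error map, and take a weighted average. The function names, the weight `w`, and the toy data are hypothetical, and the autoencoders themselves are replaced by placeholder reconstructions.

```python
import numpy as np

def reconstruction_error(cube, recon):
    """Per-pixel squared L2 error summed over the spectral (last) axis."""
    return np.sum((cube - recon) ** 2, axis=-1)

def minmax(a):
    """Scale an error map to [0, 1]; a constant map becomes all zeros."""
    rng = a.max() - a.min()
    return np.zeros_like(a, dtype=float) if rng == 0 else (a - a.min()) / rng

def fuse(err_spectral, err_spatial, w=0.5):
    """Weighted average of the normalised maps (assumed fusion rule)."""
    return w * minmax(err_spectral) + (1 - w) * minmax(err_spatial)

# Toy 4x4 scene with 3 bands; one pixel deviates from an all-zero background.
cube = np.zeros((4, 4, 3))
cube[2, 1] = [1.0, 2.0, 3.0]
recon = np.zeros_like(cube)              # stand-in for a background-only reconstruction
err_spectral = reconstruction_error(cube, recon)

err_spatial = np.zeros((4, 4))           # stand-in for the spatial autoencoder's error map
err_spatial[2, 1] = 0.3
err_spatial[0, 0] = 0.1

fused = fuse(err_spectral, err_spatial)
anomaly = np.unravel_index(np.argmax(fused), fused.shape)
```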
ISBN: 9798350350494; 9798350350500 (print)
In recent years, weakly supervised semantic segmentation using image-level labels as supervision has received significant attention in the field of computer vision. Most existing methods address the lack of spatial information in these labels by generating pseudo-labels from class activation maps (CAMs) to facilitate supervised learning. Because Convolutional Neural Networks (CNNs) detect localized patterns, CAMs often emphasize only the most discriminative parts of an object, making it challenging to accurately distinguish foreground objects from each other and from the background. Recent studies have shown that vision Transformer (ViT) features, owing to their global view, capture the scene layout more effectively than CNNs. However, the use of hierarchical ViTs has not been extensively explored in this field. This work explores the use of the Swin Transformer by proposing "SWTformer" to enhance the accuracy of the initial seed CAMs by bringing local and global views together. SWTformer-V1 generates class probabilities and CAMs using only the patch tokens as features. SWTformer-V2 incorporates a multi-scale feature fusion mechanism to extract additional information and utilizes a background-aware mechanism to generate more accurate localization maps with improved cross-object discrimination. In experiments on the PASCAL VOC 2012 dataset, SWTformer-V1 achieves 0.98% mAP higher localization accuracy than state-of-the-art models. It also performs comparably in generating initial localization maps, on average 0.82% mIoU higher than other methods, while depending only on the classification network. SWTformer-V2 further improves the accuracy of the generated seed CAMs by 5.32% mIoU, further demonstrating the effectiveness of the local-to-global view provided by the Swin Transformer. Code available at: https://***/RozhanAhmadi/SWTformer
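As a rough illustration of deriving class scores and CAMs from patch tokens alone, the sketch below uses the standard CAM formulation: a linear classifier is applied to every patch token, the image-level logits are the mean of those per-patch scores, and the per-patch scores are reshaped into maps. This is not the SWTformer code; the function name, shapes, row-major token ordering, and ReLU plus peak normalisation are assumptions.

```python
import numpy as np

def cam_from_patch_tokens(tokens, weights, grid):
    """tokens: (h*w, D) patch features; weights: (C, D) classifier; grid: (h, w)."""
    h, w = grid
    scores = tokens @ weights.T              # (h*w, C) per-patch class scores
    # Mean of per-patch scores equals average pooling followed by the linear layer.
    logits = scores.mean(axis=0)
    cams = np.maximum(scores, 0).T.reshape(-1, h, w)   # (C, h, w), negatives clipped
    peak = cams.max(axis=(1, 2), keepdims=True)
    # Normalise each class map to [0, 1]; maps with no positive score stay zero.
    cams = np.divide(cams, peak, out=np.zeros_like(cams), where=peak > 0)
    return logits, cams

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))             # 2x3 grid of patches, 4-dim features
weights = rng.normal(size=(3, 4))            # 3 hypothetical classes
logits, cams = cam_from_patch_tokens(tokens, weights, grid=(2, 3))
```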
ISBN: 9781450397926 (print)
The proceedings contain 16 papers. The topics discussed include: performance evaluation of recent object detection models for traffic safety applications on edge; tracking of artillery shell using optical flow; action recognition with non-uniform key frame selector; a view direction-driven approach for automatic room mapping in mixed reality; automatic gait gender classification using convolutional neural networks; deep 3D-2D convolutional neural networks combined with Mobinenetv2 for hyperspectral image classification; attention based BiGRU-2DCNN with hunger game search technique for low-resource document-level sentiment classification; strategies of multi-step-ahead forecasting for chaotic time series using autoencoder and LSTM neural networks: a comparative study; semi-supervised defect segmentation with uncertainty-aware pseudo-labels from multi-branch network; and security analysis of visual based share authentication and algorithms for invalid shares generation in malicious model.