Accurately characterizing the distribution of joints on the tunnel face is crucial for assessing the stability and safety of the surrounding rock during tunnel construction. This paper introduces the Mask R-CNN image segmentation algorithm, a state-of-the-art deep learning model, to achieve efficient and accurate identification and extraction of joints in tunnel face images. First, digital images of tunnel faces were captured and stitched, yielding 286 complete images suitable for analysis. Then, to address the shortfall in recognition accuracy, the joints on the tunnel face were extracted using traditional image processing algorithms, the commonly used U-net image segmentation model, and the Mask R-CNN image segmentation model introduced in this paper. Finally, the extraction results of the three methods were compared. The comparison shows that the joint extraction method based on the Mask R-CNN image segmentation deep learning model achieved the best results, with a Dice similarity coefficient of 87.48%, outperforming the traditional methods and the U-net model, which scored 60.59% and 75.36%, respectively, thereby enabling accurate and efficient acquisition of tunnel face rock joints. These findings suggest that the Mask R-CNN model can be effectively implemented in real-time monitoring systems for tunnel construction projects.
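As a point of reference for the comparison above, here is a minimal sketch of the Dice similarity coefficient used to score the three extraction methods; the toy masks are hypothetical stand-ins for a predicted joint map and its ground truth, not data from the paper.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for a pair of binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum() + eps)

# Toy 3x3 masks: two of three predicted joint pixels overlap the ground truth.
pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
print(f"Dice: {dice_coefficient(pred, truth):.4f}")  # 0.6667
```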
Having well-focused synthetic aperture sonar (SAS) imagery is important for its accurate analysis and for supporting autonomous systems. Despite advances in motion estimation and image formation methods, there persists a need for robust autofocus algorithms deployed both topside and in situ, embedded in unmanned underwater vehicles (UUVs) for real-time processing. This need stems from the fact that systematic focus errors are common in SAS and often result from misestimating the sound speed in the medium or from uncompensated vehicle motion. In this article, we use an SAS-specific convolutional neural network (CNN) to robustly and quickly autofocus SAS images. Our method, which we call deep adaptive phase learning (DAPL), explicitly utilizes the relationship between the $k$-space domain and the complex-valued SAS image to perform the autofocus operation in a manner distinctly different from existing optical image deblurring techniques that rely solely on magnitude-only imagery. We demonstrate that DAPL mitigates three types of systematic phase errors common to SAS platforms (and combinations thereof): quadratic phase error (QPE), sinusoidal error, and sawtooth error (i.e., yaw error). We show results for DAPL on a publicly available, real-world high-frequency SAS dataset and compare them against several existing techniques, including phase gradient autofocus (PGA). Our results show that DAPL is competitive with or outperforms state-of-the-art alternatives without requiring manual parameter tuning.
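To make the phase-error model concrete, the following NumPy sketch shows how a quadratic phase error defocuses a complex-valued image via its k-space representation; the single-axis FFT convention and the `alpha` scale are illustrative assumptions, not the article's DAPL implementation. An autofocus method would estimate and remove this phase screen.

```python
import numpy as np

def apply_qpe(img: np.ndarray, alpha: float) -> np.ndarray:
    """Defocus a complex image by multiplying its along-track spectrum
    by a quadratic phase screen exp(j * alpha * ku^2)."""
    ku = np.fft.fftfreq(img.shape[1])        # normalized along-track wavenumbers
    spectrum = np.fft.fft(img, axis=1)       # image -> k-space along one axis
    return np.fft.ifft(spectrum * np.exp(1j * alpha * ku**2), axis=1)

rng = np.random.default_rng(0)
scene = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))
defocused = apply_qpe(scene, alpha=200.0)    # well-focused -> QPE-corrupted
```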
Advancements in image captioning technology have played a pivotal role in enhancing the quality of life for those with visual impairments, fostering greater social inclusivity. Computer vision and natural language processing methods enhance the accessibility and comprehensibility of pictures by adding textual descriptions. Significant advancements have been achieved in photo captioning specifically tailored for those with visual impairments. Nevertheless, some challenges must still be addressed, such as ensuring the precision of automatically generated captions and effectively handling pictures that include many objects or settings. This research presents a groundbreaking architecture for real-time picture captioning using a VGG16-LSTM deep learning model with computer vision assistance. The framework has been developed and deployed on a Raspberry Pi 4B single-board computer with graphics processing unit capabilities. This implementation allows for the automated generation of relevant captions for photographs captured in real time by a NoIR camera module, making it a portable and uncomplicated choice for those with visual impairments. The efficacy of the VGG16-LSTM deep learning model is evaluated via comprehensive testing, including both sighted and visually impaired participants in diverse settings. The experimental findings demonstrate that the proposed framework operates as intended, generating real-time picture captions that are accurate and contextually appropriate. The analysis of user feedback indicates a significant improvement in the understanding of visual content, thereby facilitating the mobility and interaction of individuals with visual impairments in their environment. We used multiple datasets, including Flickr8k, Flickr30k, VizWiz captioning, and a custom dataset, for model training, validation, and testing. During the training phase, the ResNet-50 and VGG-16 models achieve 80.84% and 84.13% accuracy, respectively.
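For readers who want the shape of such a model, below is a hedged PyTorch sketch of the common "CNN encoder initializes an LSTM decoder" captioning pattern; the vocabulary size, hidden widths, and initialization scheme are assumptions, not the authors' exact VGG16-LSTM configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class CaptionNet(nn.Module):
    """VGG16 image encoder feeding an LSTM next-word decoder."""
    def __init__(self, vocab: int = 5000, emb: int = 256, hid: int = 256):
        super().__init__()
        backbone = vgg16(weights=None)  # load pretrained weights in practice
        # Keep the conv stack plus the first two fully connected layers (4096-d fc2).
        self.encoder = nn.Sequential(backbone.features, nn.Flatten(),
                                     *backbone.classifier[:4])
        self.img_proj = nn.Linear(4096, hid)
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)
        self.head = nn.Linear(hid, vocab)

    def forward(self, images, tokens):
        # Initialize the LSTM hidden state from the image features.
        h0 = torch.tanh(self.img_proj(self.encoder(images))).unsqueeze(0)
        out, _ = self.lstm(self.embed(tokens), (h0, torch.zeros_like(h0)))
        return self.head(out)  # next-word logits at every caption position

logits = CaptionNet()(torch.randn(2, 3, 224, 224), torch.randint(0, 5000, (2, 20)))
```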
Modern robust steganography-based cyber attacks often bypass intrinsic cloud security measures, and contemporary steganalysis methods struggle to address these covert threats due to recent advancements in deep learning (DL)-based steganography techniques. Existing steganography removal methods are constrained by trade-offs involving high processing times, poor quality of sanitized images, and insufficient removal of steganographic content. This paper introduces SteriCNN, a lightweight deep residual neural network model designed for steganography removal. SteriCNN effectively eliminates embedded steganographic information while preserving the visual integrity of the sanitized images. We employ a series of convolutional blocks with three residual connections for feature extraction, feature learning, feature attention, and image reconstruction from the residue. The proposed model utilizes the correlation of channel features to achieve a faster learning rate, and by varying the dilation rate in the convolutional blocks, the model achieves wider receptive fields, enabling it to cover larger areas of the input image at each layer. SteriCNN is targeted at blind image sterilization for real-time use cases due to its low training and prediction time costs. Our study shows impressive results against both traditional and deep learning-based stego vulnerabilities, with approximately 90% of steganograms eliminated while maintaining an average PSNR of 46 dB and an SSIM of 0.99 when tested with popular steganography methods.
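The abstract's combination of residual connections and varying dilation rates can be illustrated with a short PyTorch sketch; the block layout and channel counts below are assumptions for illustration, not the published SteriCNN architecture.

```python
import torch
import torch.nn as nn

class DilatedResBlock(nn.Module):
    """3x3 conv pair with a residual connection; dilation widens the receptive field."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))  # residual connection

# Growing dilation rates let successive blocks cover larger areas of the input.
sanitizer = nn.Sequential(*[DilatedResBlock(64, d) for d in (1, 2, 4)])
print(sanitizer(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```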
In face recognition systems, light direction, reflection, and emotional and physical changes on the face are some of the main factors that make recognition difficult. Researchers continue to work on deep learning-based algorithms to overcome these difficulties. It is essential to develop models that work with high accuracy while reducing computational cost, especially in real-time face recognition systems. Deep metric learning algorithms, a form of representation learning, are frequently preferred in this field. However, in addition to extracting outstanding representative features, the appropriate classification of these feature vectors is also an essential factor affecting performance. This study proposes a Scene Change Indicator (SCI) to reduce or eliminate false recognition rates in sliding windows with a deep metric learning model. The model detects blocks where the scene does not change and refines the comparison threshold value used in the classifier stage. Increasing the sensitivity ratio across the unchanging scene blocks allows for fewer comparisons among the samples in the database. In the experimental study, the proposed model reached 99.25% accuracy and a 99.28% F1-score, compared to the original deep metric learning model. Experimental results show that even if there are differences in facial images of the same person in unchanging scenes, misrecognition can be minimized because the sample area being compared is narrowed.
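The core idea, tightening the match threshold when the scene is judged static, can be sketched as follows; the frame-difference test, threshold values, and function names are illustrative assumptions, not the paper's exact SCI.

```python
import numpy as np

def scene_changed(prev_frame: np.ndarray, frame: np.ndarray, tol: float = 12.0) -> bool:
    """Crude scene-change test: mean absolute pixel difference above a tolerance."""
    return float(np.abs(frame.astype(float) - prev_frame.astype(float)).mean()) > tol

def is_match(emb: np.ndarray, ref: np.ndarray, scene_static: bool,
             base_thr: float = 0.6, tight_thr: float = 0.75) -> bool:
    """Cosine similarity against a threshold tightened in unchanging scenes."""
    sim = emb @ ref / (np.linalg.norm(emb) * np.linalg.norm(ref))
    return sim >= (tight_thr if scene_static else base_thr)
```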
Place recognition, a critical technology in robot navigation and autonomous driving, remains challenging due to inefficient point cloud computation, limited feature representation capability, and poor robustness to long-term environmental changes. We propose MVSE-Net, a feature extraction network with embedded semantic information for multi-view feature fusion. MVSE-Net converts point cloud data acquired by LiDAR in real time into global descriptors for retrieval. Processing a point cloud by projecting it onto a 2D image can greatly improve computational efficiency, so we project the point cloud into a range-view (RV) image and a bird's-eye-view (BEV) image, in the forward and top views, respectively. A semantic segmentation network is then used to process the RV image, and the feature extraction part of the semantic model is connected to a transformer attention module to further refine the features for the place recognition task. The point cloud containing the semantic segmentation results is then converted into a semantic BEV image, and the multi-channel BEV image is processed using a group convolutional network. Finally, the features of the two branches are fused into a global feature representation by post-fusion. Our experiments on three publicly available datasets demonstrate that MVSE-Net exhibits high recall and strong generalization in LiDAR place recognition, outperforming previous state-of-the-art methods.
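A hedged sketch of one of the two projections MVSE-Net fuses, rasterizing a LiDAR point cloud into a bird's-eye-view occupancy grid, is given below; the grid extent and resolution are illustrative assumptions, and the actual pipeline adds semantic channels on top of occupancy.

```python
import numpy as np

def points_to_bev(points: np.ndarray, x_range=(-50.0, 50.0),
                  y_range=(-50.0, 50.0), res=0.5) -> np.ndarray:
    """points: (N, 3) x/y/z coordinates -> 2D occupancy grid (top view)."""
    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((h, w), dtype=np.float32)
    ix = ((points[:, 0] - x_range[0]) / res).astype(int)
    iy = ((points[:, 1] - y_range[0]) / res).astype(int)
    keep = (ix >= 0) & (ix < h) & (iy >= 0) & (iy < w)  # drop out-of-range points
    bev[ix[keep], iy[keep]] = 1.0                        # mark occupied cells
    return bev

bev = points_to_bev(np.random.uniform(-60, 60, size=(10000, 3)))
```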
In the era of intelligent cities, IP cameras have advanced beyond simple video recording to real-time information processing and analysis. Sequentially processing each video stream is inefficient and limits the ability of multi-camera systems to manage large volumes of data effectively. Traditional camera systems are also often limited to specific events within narrow scenarios. To address these issues, we propose a parallel processing architecture for an intelligent real-time multi-IP-camera system. This architecture is designed to efficiently handle the complex and resource-intensive demands of real-time multi-camera processing, utilizing purpose-specific deep learning models and managing CPU computational tasks effectively. The core components include a parallelized camera capture module and a parallelized AI unit, with asynchronous processing between them. The system is optimized to handle real-time high-definition feeds, enabling efficient vehicle and license plate detection, multi-object tracking, traffic violation detection, and license plate recognition. It leverages the latest object detection models, tracking algorithms, and character recognition techniques, and offers scalability through a modular design that allows for the integration of additional deep learning models and decision criteria. The proposed system demonstrated high performance and real-time processing in traffic scenarios using frames from 32 live IP cameras, contributing to more efficient traffic management and automation within smart city infrastructure.
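The capture/inference decoupling described above can be sketched with Python's multiprocessing: one process per camera pushes frames into a shared queue that a pool of AI workers drains asynchronously. Stream URLs, worker counts, and the inference stub are placeholders, not the deployed system's code.

```python
import multiprocessing as mp

def capture(cam_id: int, url: str, frames) -> None:
    import cv2  # assumes OpenCV is installed
    cap = cv2.VideoCapture(url)
    while True:
        ok, frame = cap.read()
        if ok:
            frames.put((cam_id, frame))  # hand off; capture never waits on inference

def ai_worker(frames) -> None:
    while True:
        cam_id, frame = frames.get()
        # Detection, tracking, and plate-recognition models would run here.

if __name__ == "__main__":
    frames = mp.Queue(maxsize=256)
    urls = [f"rtsp://camera-{i}/stream" for i in range(32)]  # hypothetical URLs
    procs = [mp.Process(target=capture, args=(i, u, frames), daemon=True)
             for i, u in enumerate(urls)]
    procs += [mp.Process(target=ai_worker, args=(frames,), daemon=True)
              for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```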
The determination of average grain size is an important component in the microstructural characterization of metallic materials. The grain size is usually determined using the intercept and comparison methods, but the...
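Since the intercept method reduces to simple arithmetic, a tiny worked example may help (ASTM E112 style): the mean lineal intercept is the total test-line length divided by the number of grain-boundary intercepts. The counts below are hypothetical, not data from this paper.

```python
line_length_um = 500.0   # total length of the test lines, in micrometres
intercepts = 42          # grain-boundary intersections counted along them
mean_intercept_um = line_length_um / intercepts
print(f"Mean lineal intercept: {mean_intercept_um:.1f} um")  # ~11.9 um
```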
In today's rapidly evolving digital landscape, the demand for multimedia applications is surging, driven by significant advancements in computer and storage technologies that enable efficient compression and storage of visual data in large-scale databases. However, challenges such as inaccuracy, inefficiency, and suboptimal precision and recall in image retrieval systems necessitate the development of faster and more reliable techniques for searching and retrieving images. Traditional retrieval systems often rely on RGB colour spaces, which may inadequately represent critical image information. In response, we propose a content-based image retrieval (CBIR) system that integrates advanced techniques such as quadtree segmentation alongside modern lightweight deep learning models, specifically MobileNet and EfficientNet, to enhance precision and recall. Our comparative experiments reveal that these deep learning models significantly outperform traditional methods, including SVM classifiers combined with feature extraction techniques such as the Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), and Speeded-Up Robust Features (SURF). Notably, MobileNet and EfficientNet achieved F1-scores of 0.87 and 0.89, respectively, with enhanced processing efficiency that reduced feature extraction times to 20 ms and classification times to 8 ms. This translates to image retrieval times as low as 35 ms, highlighting the superior performance of modern deep learning models in enhancing both retrieval accuracy and efficiency for large-scale image databases and making them ideal for real-time applications.
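To ground the retrieval step, here is a hedged PyTorch sketch that embeds images with a lightweight MobileNet backbone and ranks a gallery by cosine similarity; the random tensors stand in for preprocessed images, and this is an illustrative pattern rather than the paper's full quadtree-plus-CNN pipeline.

```python
import torch
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

backbone = mobilenet_v2(weights=None).features  # load pretrained weights in practice
pool = torch.nn.AdaptiveAvgPool2d(1)

def embed(batch: torch.Tensor) -> torch.Tensor:
    """(N, 3, 224, 224) images -> L2-normalized (N, 1280) descriptors."""
    with torch.no_grad():
        feats = pool(backbone(batch)).flatten(1)
    return F.normalize(feats, dim=1)

gallery = embed(torch.randn(100, 3, 224, 224))  # stand-in image database
query = embed(torch.randn(1, 3, 224, 224))
scores = query @ gallery.T                      # cosine similarities
top5 = scores.topk(5).indices                   # best-matching gallery indices
```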
Rock segmentation on the Martian surface is particularly critical for rover navigation, obstacle avoidance, and scientific target detection. We propose RockNet, a lightweight network for real-time semantic segmentation of Martian rocks. First, we propose the cross-dimension channel attention (CDCA) module to replace the traditional downsampling and upsampling operations; it gives more weight to the channels carrying more useful information by adjusting the weight of each channel. Second, we modify the short-term dense concatenate module, adopting dilated convolutions to learn features with a larger receptive field, while its skip connection structure reduces network degradation. Finally, we propose a feature fusion module (FFM) to fully fuse different levels of features. With only 0.86M parameters, our model achieves 82.37% mIoU at 105.7 FPS on the TWMARS dataset.
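The channel-reweighting idea behind the CDCA module can be illustrated with a squeeze-and-excitation style block in PyTorch; this is a generic stand-in for the mechanism, not RockNet's exact CDCA design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style reweighting of informative channels."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                          # x: (N, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))            # squeeze: global average pool
        return x * w.unsqueeze(-1).unsqueeze(-1)   # excite: per-channel weights

print(ChannelAttention(64)(torch.randn(2, 64, 32, 32)).shape)  # (2, 64, 32, 32)
```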