Accurately characterizing the distribution of joints on the tunnel face is crucial for assessing the stability and safety of the surrounding rock during tunnel construction. This paper introduces the Mask R-CNN image segmentation algorithm, a state-of-the-art deep learning model, to achieve efficient and accurate identification and extraction of joints in tunnel face images. First, digital images of tunnel faces were captured and stitched, yielding 286 complete images suitable for analysis. Then, to address the shortfall in recognition accuracy, the joints on the tunnel face were extracted using traditional image processing algorithms, the commonly used U-net image segmentation model, and the Mask R-CNN image segmentation model introduced in this paper. Finally, the extraction results of the three methods were compared. The comparison shows that the joint extraction method based on the Mask R-CNN image segmentation deep learning model achieved the best results, with a Dice similarity coefficient of 87.48%, outperforming the traditional methods and the U-net model, which scored 60.59% and 75.36%, respectively, thereby enabling accurate and efficient acquisition of tunnel face rock joints. These findings suggest that the Mask R-CNN model can be effectively implemented in real-time monitoring systems for tunnel construction projects.
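As a point of reference for the comparison above, here is a minimal sketch of the Dice similarity coefficient used to score the three extraction methods; the toy masks are hypothetical stand-ins for a predicted joint map and its ground truth, not data from the paper.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for a pair of binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum() + eps)

# Toy 3x3 masks: two of three predicted joint pixels overlap the ground truth.
pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
print(f"Dice: {dice_coefficient(pred, truth):.4f}")  # 0.6667
```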
Having well-focused synthetic aperture sonar (SAS) imagery is important for its accurate analysis and for supporting autonomous systems. Despite advances in motion estimation and image formation methods, there persists a need for robust autofocus algorithms deployed both topside and in situ, embedded in unmanned underwater vehicles (UUVs) for real-time processing. This need stems from the fact that systematic focus errors are common in SAS and often result from misestimating the sound speed in the medium or from uncompensated vehicle motion. In this article, we use an SAS-specific convolutional neural network (CNN) to robustly and quickly autofocus SAS images. Our method, which we call deep adaptive phase learning (DAPL), explicitly utilizes the relationship between the $k$-space domain and the complex-valued SAS image to perform the autofocus operation in a manner distinctly different from existing optical image deblurring techniques that rely solely on magnitude-only imagery. We demonstrate that DAPL mitigates three types of systematic phase errors common to SAS platforms (and combinations thereof): quadratic phase error (QPE), sinusoidal error, and sawtooth error (i.e., yaw error). We show results for DAPL on a publicly available, real-world high-frequency SAS dataset and compare them against several existing techniques, including phase gradient autofocus (PGA). Our results show that DAPL is competitive with or outperforms state-of-the-art alternatives without requiring manual parameter tuning.
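To make the phase-error model concrete, the following NumPy sketch shows how a quadratic phase error defocuses a complex-valued image via its k-space representation; the single-axis FFT convention and the `alpha` scale are illustrative assumptions, not the article's DAPL implementation. An autofocus method would estimate and remove this phase screen.

```python
import numpy as np

def apply_qpe(img: np.ndarray, alpha: float) -> np.ndarray:
    """Defocus a complex image by multiplying its along-track spectrum
    by a quadratic phase screen exp(j * alpha * ku^2)."""
    ku = np.fft.fftfreq(img.shape[1])        # normalized along-track wavenumbers
    spectrum = np.fft.fft(img, axis=1)       # image -> k-space along one axis
    return np.fft.ifft(spectrum * np.exp(1j * alpha * ku**2), axis=1)

rng = np.random.default_rng(0)
scene = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))
defocused = apply_qpe(scene, alpha=200.0)    # well-focused -> QPE-corrupted
```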
Advancements in image captioning technology have played a pivotal role in enhancing the quality of life for those with visual impairments, fostering greater social inclusivity. Computer vision and natural language processing methods enhance the accessibility and comprehensibility of pictures by adding textual descriptions. Significant advancements have been achieved in photo captioning specifically tailored for those with visual impairments. Nevertheless, some challenges must still be addressed, such as ensuring the precision of automatically generated captions and effectively handling pictures that include many objects or settings. This research presents a groundbreaking architecture for real-time picture captioning using a VGG16-LSTM deep learning model with computer vision assistance. The framework has been developed and deployed on a Raspberry Pi 4B single-board computer with graphics processing unit capabilities. This implementation allows for the automated generation of relevant captions for photographs captured in real time by a NoIR camera module, making it a portable and uncomplicated choice for those with visual impairments. The efficacy of the VGG16-LSTM deep learning model is evaluated via comprehensive testing, including both sighted and visually impaired participants in diverse settings. The experimental findings demonstrate that the proposed framework operates as intended, generating real-time picture captions that are accurate and contextually appropriate. The analysis of user feedback indicates a significant improvement in the understanding of visual content, thereby facilitating the mobility and interaction of individuals with visual impairments in their environment. We used multiple datasets, including Flickr8k, Flickr30k, VizWiz captioning, and a custom dataset, for model training, validation, and testing. During the training phase, the ResNet-50 and VGG-16 models achieve 80.84% and 84.13% accuracy, respectively.
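For readers who want the shape of such a model, below is a hedged PyTorch sketch of the common "CNN encoder initializes an LSTM decoder" captioning pattern; the vocabulary size, hidden widths, and initialization scheme are assumptions, not the authors' exact VGG16-LSTM configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class CaptionNet(nn.Module):
    """VGG16 image encoder feeding an LSTM next-word decoder."""
    def __init__(self, vocab: int = 5000, emb: int = 256, hid: int = 256):
        super().__init__()
        backbone = vgg16(weights=None)  # load pretrained weights in practice
        # Keep the conv stack plus the first two fully connected layers (4096-d fc2).
        self.encoder = nn.Sequential(backbone.features, nn.Flatten(),
                                     *backbone.classifier[:4])
        self.img_proj = nn.Linear(4096, hid)
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)
        self.head = nn.Linear(hid, vocab)

    def forward(self, images, tokens):
        # Initialize the LSTM hidden state from the image features.
        h0 = torch.tanh(self.img_proj(self.encoder(images))).unsqueeze(0)
        out, _ = self.lstm(self.embed(tokens), (h0, torch.zeros_like(h0)))
        return self.head(out)  # next-word logits at every caption position

logits = CaptionNet()(torch.randn(2, 3, 224, 224), torch.randint(0, 5000, (2, 20)))
```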
Modern robust steganography-based cyber attacks often bypass intrinsic cloud security measures, and contemporary steganalysis methods struggle to address these covert threats due to recent advancements in deep learning (DL)-based steganography techniques. Existing steganography removal methods are constrained by trade-offs involving high processing times, poor quality of sanitized images, and insufficient removal of steganographic content. This paper introduces SteriCNN, a lightweight deep residual neural network model designed for steganography removal. SteriCNN effectively eliminates embedded steganographic information while preserving the visual integrity of the sanitized images. We employ a series of convolutional blocks with three residual connections for feature extraction, feature learning, feature attention, and image reconstruction from the residue. The proposed model utilizes the correlation of channel features to achieve a faster learning rate, and by varying the dilation rate in the convolutional blocks, the model achieves wider receptive fields, enabling it to cover larger areas of the input image at each layer. SteriCNN is targeted at blind image sterilization for real-time use cases due to its low training and prediction time costs. Our study shows impressive results against both traditional and deep learning-based stego vulnerabilities, with approximately 90% of steganograms eliminated while maintaining an average PSNR of 46 dB and an SSIM of 0.99 when tested with popular steganography methods.
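The abstract's combination of residual connections and varying dilation rates can be illustrated with a short PyTorch sketch; the block layout and channel counts below are assumptions for illustration, not the published SteriCNN architecture.

```python
import torch
import torch.nn as nn

class DilatedResBlock(nn.Module):
    """3x3 conv pair with a residual connection; dilation widens the receptive field."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))  # residual connection

# Growing dilation rates let successive blocks cover larger areas of the input.
sanitizer = nn.Sequential(*[DilatedResBlock(64, d) for d in (1, 2, 4)])
print(sanitizer(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```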
In face recognition systems, light direction, reflection, and emotional and physical changes on the face are some of the main factors that make recognition difficult. Researchers continue to work on deep learning-based algorithms to overcome these difficulties. It is essential to develop models that work with high accuracy while reducing computational cost, especially in real-time face recognition systems. Deep metric learning algorithms, a form of representation learning, are frequently preferred in this field. However, in addition to extracting outstanding representative features, the appropriate classification of these feature vectors is also an essential factor affecting performance. This study proposes a Scene Change Indicator (SCI) to reduce or eliminate false recognition rates in sliding windows with a deep metric learning model. The model detects blocks where the scene does not change and refines the comparison threshold value used in the classifier stage. Increasing the sensitivity ratio across the unchanging scene blocks allows for fewer comparisons among the samples in the database. In the experimental study, the proposed model reached 99.25% accuracy and a 99.28% F1-score, compared to the original deep metric learning model. Experimental results show that even if there are differences in facial images of the same person in unchanging scenes, misrecognition can be minimized because the sample area being compared is narrowed.
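The core idea, tightening the match threshold when the scene is judged static, can be sketched as follows; the frame-difference test, threshold values, and function names are illustrative assumptions, not the paper's exact SCI.

```python
import numpy as np

def scene_changed(prev_frame: np.ndarray, frame: np.ndarray, tol: float = 12.0) -> bool:
    """Crude scene-change test: mean absolute pixel difference above a tolerance."""
    return float(np.abs(frame.astype(float) - prev_frame.astype(float)).mean()) > tol

def is_match(emb: np.ndarray, ref: np.ndarray, scene_static: bool,
             base_thr: float = 0.6, tight_thr: float = 0.75) -> bool:
    """Cosine similarity against a threshold tightened in unchanging scenes."""
    sim = emb @ ref / (np.linalg.norm(emb) * np.linalg.norm(ref))
    return sim >= (tight_thr if scene_static else base_thr)
```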
Place recognition, a critical technology in robot navigation and autonomous driving, remains challenging due to inefficient point cloud computation, limited feature representation capability, and poor robustness to long-term environmental changes. We propose MVSE-Net, a feature extraction network with embedded semantic information for multi-view feature fusion. MVSE-Net converts point cloud data acquired by LiDAR in real time into global descriptors for retrieval. Processing a point cloud by projecting it onto a 2D image can greatly improve computational efficiency, so we project the point cloud into a range-view (RV) image and a bird's-eye-view (BEV) image, in the forward and top views, respectively. A semantic segmentation network is then used to process the RV image, and the feature extraction part of the semantic model is connected to a transformer attention module to further refine the features for the place recognition task. The point cloud containing the semantic segmentation results is then converted into a semantic BEV image, and the multi-channel BEV image is processed using a group convolutional network. Finally, the features of the two branches are fused into a global feature representation by post-fusion. Our experiments on three publicly available datasets demonstrate that MVSE-Net exhibits high recall and strong generalization in LiDAR place recognition, outperforming previous state-of-the-art methods.
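A hedged sketch of one of the two projections MVSE-Net fuses, rasterizing a LiDAR point cloud into a bird's-eye-view occupancy grid, is given below; the grid extent and resolution are illustrative assumptions, and the actual pipeline adds semantic channels on top of occupancy.

```python
import numpy as np

def points_to_bev(points: np.ndarray, x_range=(-50.0, 50.0),
                  y_range=(-50.0, 50.0), res=0.5) -> np.ndarray:
    """points: (N, 3) x/y/z coordinates -> 2D occupancy grid (top view)."""
    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((h, w), dtype=np.float32)
    ix = ((points[:, 0] - x_range[0]) / res).astype(int)
    iy = ((points[:, 1] - y_range[0]) / res).astype(int)
    keep = (ix >= 0) & (ix < h) & (iy >= 0) & (iy < w)  # drop out-of-range points
    bev[ix[keep], iy[keep]] = 1.0                        # mark occupied cells
    return bev

bev = points_to_bev(np.random.uniform(-60, 60, size=(10000, 3)))
```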
In the era of intelligent cities, IP cameras have advanced beyond simple video recording to real-time information processing and analysis. Sequentially processing each video stream is inefficient and limits the ability of multi-camera systems to manage large volumes of data effectively. Traditional camera systems are also often limited to specific events within narrow scenarios. To address these issues, we propose a parallel processing architecture for an intelligent real-time multi-IP-camera system. This architecture is designed to efficiently handle the complex and resource-intensive demands of real-time multi-camera processing, utilizing purpose-specific deep learning models and managing CPU computational tasks effectively. The core components include a parallelized camera capture module and a parallelized AI unit, with asynchronous processing between them. The system is optimized to handle real-time high-definition feeds, enabling efficient vehicle and license plate detection, multi-object tracking, traffic violation detection, and license plate recognition. It leverages the latest object detection models, tracking algorithms, and character recognition techniques, and offers scalability through a modular design that allows for the integration of additional deep learning models and decision criteria. The proposed system demonstrated high performance and real-time processing in traffic scenarios using frames from 32 live IP cameras, contributing to more efficient traffic management and automation within smart city infrastructure.
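The capture/inference decoupling described above can be sketched with Python's multiprocessing: one process per camera pushes frames into a shared queue that a pool of AI workers drains asynchronously. Stream URLs, worker counts, and the inference stub are placeholders, not the deployed system's code.

```python
import multiprocessing as mp

def capture(cam_id: int, url: str, frames) -> None:
    import cv2  # assumes OpenCV is installed
    cap = cv2.VideoCapture(url)
    while True:
        ok, frame = cap.read()
        if ok:
            frames.put((cam_id, frame))  # hand off; capture never waits on inference

def ai_worker(frames) -> None:
    while True:
        cam_id, frame = frames.get()
        # Detection, tracking, and plate-recognition models would run here.

if __name__ == "__main__":
    frames = mp.Queue(maxsize=256)
    urls = [f"rtsp://camera-{i}/stream" for i in range(32)]  # hypothetical URLs
    procs = [mp.Process(target=capture, args=(i, u, frames), daemon=True)
             for i, u in enumerate(urls)]
    procs += [mp.Process(target=ai_worker, args=(frames,), daemon=True)
              for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```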
The determination of average grain size is an important component in the microstructural characterization of metallic materials. The grain size is usually determined using the intercept and comparison methods, but the...
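Since the intercept method reduces to simple arithmetic, a tiny worked example may help (ASTM E112 style): the mean lineal intercept is the total test-line length divided by the number of grain-boundary intercepts. The counts below are hypothetical, not data from this paper.

```python
line_length_um = 500.0   # total length of the test lines, in micrometres
intercepts = 42          # grain-boundary intersections counted along them
mean_intercept_um = line_length_um / intercepts
print(f"Mean lineal intercept: {mean_intercept_um:.1f} um")  # ~11.9 um
```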
In today's rapidly evolving digital landscape, the demand for multimedia applications is surging, driven by significant advancements in computer and storage technologies that enable efficient compression and storage of visual data in large-scale databases. However, challenges such as inaccuracy, inefficiency, and suboptimal precision and recall in image retrieval systems necessitate the development of faster and more reliable techniques for searching and retrieving images. Traditional retrieval systems often rely on RGB colour spaces, which may inadequately represent critical image information. In response, we propose a content-based image retrieval (CBIR) system that integrates advanced techniques such as quadtree segmentation alongside modern lightweight deep learning models, specifically MobileNet and EfficientNet, to enhance precision and recall. Our comparative experiments reveal that these deep learning models significantly outperform traditional methods, including SVM classifiers combined with feature extraction techniques such as the Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), and Speeded-Up Robust Features (SURF). Notably, MobileNet and EfficientNet achieved F1-scores of 0.87 and 0.89, respectively, with enhanced processing efficiency that reduced feature extraction times to 20 ms and classification times to 8 ms. This translates to image retrieval times as low as 35 ms, highlighting the superior performance of modern deep learning models in enhancing both retrieval accuracy and efficiency for large-scale image databases and making them ideal for real-time applications.
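To ground the retrieval step, here is a hedged PyTorch sketch that embeds images with a lightweight MobileNet backbone and ranks a gallery by cosine similarity; the random tensors stand in for preprocessed images, and this is an illustrative pattern rather than the paper's full quadtree-plus-CNN pipeline.

```python
import torch
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

backbone = mobilenet_v2(weights=None).features  # load pretrained weights in practice
pool = torch.nn.AdaptiveAvgPool2d(1)

def embed(batch: torch.Tensor) -> torch.Tensor:
    """(N, 3, 224, 224) images -> L2-normalized (N, 1280) descriptors."""
    with torch.no_grad():
        feats = pool(backbone(batch)).flatten(1)
    return F.normalize(feats, dim=1)

gallery = embed(torch.randn(100, 3, 224, 224))  # stand-in image database
query = embed(torch.randn(1, 3, 224, 224))
scores = query @ gallery.T                      # cosine similarities
top5 = scores.topk(5).indices                   # best-matching gallery indices
```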
Rock segmentation on the Martian surface is particularly critical for rover navigation, obstacle avoidance, and scientific target detection. We propose RockNet, a lightweight network for real-time semantic segmentation of Martian rocks. First, we propose the cross-dimension channel attention (CDCA) module to replace the traditional downsampling and upsampling operations; it gives more weight to the channels carrying more useful information by adjusting the weight of each channel. Second, we modify the short-term dense concatenate module, adopting dilated convolutions to learn features with a larger receptive field, while its skip connection structure reduces network degradation. Finally, we propose a feature fusion module (FFM) to fully fuse different levels of features. With only 0.86M parameters, our model achieves 82.37% mIoU at 105.7 FPS on the TWMARS dataset.
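The channel-reweighting idea behind the CDCA module can be illustrated with a squeeze-and-excitation style block in PyTorch; this is a generic stand-in for the mechanism, not RockNet's exact CDCA design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style reweighting of informative channels."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                          # x: (N, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))            # squeeze: global average pool
        return x * w.unsqueeze(-1).unsqueeze(-1)   # excite: per-channel weights

print(ChannelAttention(64)(torch.randn(2, 64, 32, 32)).shape)  # (2, 64, 32, 32)
```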