Ship draft reading is an essential link in the draft survey. At present, manual observation is primarily used to determine a ship's draft. However, manual observation is easily affected by complex situations such as large waves on the water, water obstacles, water traces, tilted draft characters, and rusted draft characters. Traditional image-based methods of ship draft reading struggle to adapt to these complex situations, and existing deep learning-based methods suffer from poor robustness across them. In this paper, we propose a method that combines image processing and deep learning and is capable of adapting to a variety of complex situations, particularly in the presence of large waves and water obstacles. We also propose a small U-2-NetP neural network for semantic segmentation that incorporates coordinate attention, enhancing the capture of spatial location information; its segmentation accuracy reaches 96.47%, improving on the original network. In addition, considering the combination of lightweight design and multitasking, we use the lightweight YOLOv5n architecture to detect the ship draft characters, which achieves 98% mAP@0.5 and makes the draft-reading pipeline more lightweight. Experimental results on a real dataset covering many difficult situations show state-of-the-art performance compared with other existing deep learning methods. The average error of the draft reading is less than +/- 0.005 m, and millimeter-level precision is achievable. The method can serve as a valuable aid for manual reading. In addition, our work lays the groundwork for future research on deployment to edge devices.
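As a rough illustration of the coordinate attention idea mentioned in this abstract, the sketch below shows a generic coordinate attention block in PyTorch. The reduction ratio, layer widths, and normalization choices are assumptions for illustration, not necessarily the exact design added to U-2-NetP.

```python
# A minimal sketch of a coordinate attention block (assumed configuration).
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        # Pool along each spatial axis separately so positional information is preserved.
        x_h = x.mean(dim=3, keepdim=True)                        # (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)    # (n, c, w, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                            # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))        # (n, c, 1, w)
        return x * a_h * a_w                                             # gate features by position
```

Attention weights are produced per row and per column, which is what helps a segmentation network attend to the horizontal waterline and the vertical column of draft marks.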
Compressive learning (CL) has proven highly successful at joint signal sampling and inference for intricate vision tasks on resource-limited Internet of Things (IoT) devices. Recent studies have turned to deep neural network (DNN)-based approaches, known as DeepCL, to enhance performance on unimodal vision tasks; this approach incorporates learnable compressed sensing in an end-to-end manner. Current DeepCL techniques typically use an initial signal reconstruction as the input to subsequent DNNs for inference. However, this practice presents potential risks, such as privacy breaches and reduced performance due to information processing inequality. To address these issues, this article introduces the first cross-modal CL (CMCL) approach, which enables image captioning directly on compressed measurements. Compared with previous DeepCL strategies, the proposed CMCL offers significant improvements in computational efficiency and privacy protection. Extensive experiments demonstrate that CMCL performs nearly on par with leading image captioning methods, with a metric score merely 2.75% lower than the uncompressed method when the data is compressed eightfold.
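The sketch below illustrates the general idea of inferring directly from compressed measurements rather than from a reconstruction: a learnable block-wise sensing operator produces an eightfold-compressed measurement tensor that a downstream captioning encoder could consume. The block size and the strided-convolution formulation are assumptions for illustration, not the authors' architecture.

```python
# A minimal sketch of learnable block-wise compressed sensing (assumed design).
import torch
import torch.nn as nn

class CompressedSensingLayer(nn.Module):
    """Each 8x8 image block is mapped to 8 measurements (eightfold compression)."""
    def __init__(self, block=8, ratio=8):
        super().__init__()
        n = block * block
        m = n // ratio
        # A convolution with kernel = stride = block acts as a learnable sensing matrix per block.
        self.sense = nn.Conv2d(1, m, kernel_size=block, stride=block, bias=False)

    def forward(self, x):                 # x: (N, 1, H, W)
        return self.sense(x)              # (N, m, H/block, W/block) -- compressed measurements

measurements = CompressedSensingLayer()(torch.randn(2, 1, 224, 224))
print(measurements.shape)                 # torch.Size([2, 8, 28, 28]); no image reconstruction is needed
```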
To tackle the formidable challenges that adverse weather conditions pose for image object detection, this paper presents an innovative approach grounded in the Image-Adaptive YOLO (IA-YOLO) framework. The framework ha...
Hot-rolled strip steel is an extremely important industrial foundational material. The rapid and precise identification of surface defects in hot-rolled strip steel helps enhance the quality of steel materials and reduce economic losses. Current research primarily focuses on using convolutional neural networks (CNNs) for strip steel surface defect identification. Although identification accuracy has remarkably improved in comparison with traditional machine learning methods, this research has overlooked issues related to dataset preprocessing and the problem of non-lightweight CNN models with large parameter counts and high computational complexity. To address these issues, this study proposes a hot-rolled steel strip surface defect identification method based on random data balancing and the lightweight CNN MobileNet-Pro. Random data balancing employs image augmentation to eliminate the imbalance in the number of samples per defect category in the hot-rolled strip steel surface defect data, providing diverse images to alleviate overfitting during model training. MobileNet-Pro is used to increase the model's effective receptive field: building upon MobileNetV1, it introduces large convolutional kernels and improves depth-wise separable convolution. Experiments show that the new MobileNet-Pro, after random data balancing on the X-SDD dataset, achieves an accuracy of 96.47%, surpassing RepVGG + SA (95.10% accuracy, non-lightweight) and ResNet50 (93.86% accuracy, non-lightweight). Additionally, MobileNet-Pro outperforms mainstream lightweight networks from the MobileNet series, ShuffleNetV2, and GhostNetV2 on the CIFAR-100 and PASCAL VOC 2007 datasets, demonstrating excellent generalization capabilities. All our code and models are available on GitHub: https://***/OnlyForWW/MobileNet-Pro.
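To make the "large kernels plus improved depth-wise separable convolution" idea concrete, the sketch below shows a generic large-kernel depth-wise separable block. The 7x7 kernel size, BatchNorm/ReLU placement, and channel handling are assumptions for illustration only and are not claimed to match MobileNet-Pro exactly.

```python
# A minimal sketch of a large-kernel depth-wise separable block (assumed layout).
import torch.nn as nn

def dw_separable_large_kernel(in_ch, out_ch, kernel=7, stride=1):
    return nn.Sequential(
        # Depth-wise convolution with an enlarged kernel to grow the effective receptive field.
        nn.Conv2d(in_ch, in_ch, kernel, stride, padding=kernel // 2, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        # Point-wise (1x1) convolution to mix information across channels.
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```

Splitting the spatial and channel mixing keeps the parameter count and FLOPs far below a standard convolution with the same kernel size, which is why such blocks suit lightweight defect-identification models.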
With the development of cloud computing, people usually outsource encrypted images to save storage and protect privacy. However, traditional image encryption methods not only hinder the availability of images, such as similarity retrieval, but also degrade compression performance. To address this issue, we propose a retrievable image compression and encryption method (RICE). RICE accounts for the tension among image compression, availability, and security by proposing a cascaded information bottleneck model, which includes a compression information bottleneck and a security-and-availability information bottleneck. The former is converted into a rate-distortion problem whose optimal solution is sought by a convolutional neural network (CNN)-based compression network that includes a channel-spatial attention module and a discrete wavelet transform (DWT) module. To solve the latter, we propose a feature partition method to find a retrieval subset that balances the contradiction between security and availability, and design a DNA-based deterministic encryption method for this subset to support ciphertext retrieval. The ciphertext of the retrieval subset is sent to the proposed similarity search fully connected network (SimFcNet) to improve retrieval accuracy. The remaining subset is encrypted by non-deterministic encryption to further improve security. Overall, RICE supports similarity retrieval over compressed-domain ciphertext and achieves excellent performance. Experimental results show that our method exceeds JPEG2000 by 36.56% in MS-SSIM at a compression ratio of 60:1, the accuracy of ciphertext retrieval reaches 0.828, and the security of the ciphertext is close to that of traditional encryption methods.
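Since the compression information bottleneck is stated to be converted into a rate-distortion problem, a generic training objective of the form rate + lambda * distortion is sketched below. The entropy-model likelihoods, the use of MSE as distortion, and the lambda value are placeholders, not RICE's actual network or loss.

```python
# A minimal sketch of a rate-distortion training loss (assumed form).
import torch

def rate_distortion_loss(x, x_hat, likelihoods, lam=0.01):
    """x, x_hat: (N, C, H, W) original and reconstruction; likelihoods: entropy-model outputs."""
    num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
    # Rate: estimated bits per pixel from the entropy model's likelihoods.
    rate = -torch.log2(likelihoods).sum() / num_pixels
    # Distortion: mean squared reconstruction error.
    distortion = torch.mean((x - x_hat) ** 2)
    return rate + lam * distortion
```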
ISBN (print): 9781728198354
Although context-based monocular depth estimation has shown remarkable improvement, adaptation to unseen contexts is still a major challenge. On the other hand, the use of physical depth cues, such as defocus associated with lens aberration, allows context-independent depth estimation. However, explicitly supervising physical depth cues would have a significant impact on cost and versatility because of the need for expensive equipment to obtain ground truth. Therefore, we propose a novel self-supervised learning method for single-shot neural depth from defocus (DfD) that utilizes structure-from-motion (SfM) images taken with the target lens. Since the scale of SfM depth is ambiguous, we use a rank loss to train the network. To demonstrate the versatility of our method, we conducted validation experiments using not only DSLR cameras but also smartphones with small image sensors. We confirmed that our method outperforms state-of-the-art methods, including the physically calibrated neural single-shot DfD and context-based methods, by a large margin.
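Because SfM depth is only defined up to scale, a rank loss supervises the ordering of pixel pairs rather than absolute depth values. The sketch below shows one common pairwise formulation; the random pair sampling and hinge margin are assumptions for illustration, not the paper's exact loss.

```python
# A minimal sketch of a pairwise rank loss on depth orderings (assumed formulation).
import torch

def pairwise_rank_loss(pred_depth, sfm_depth, num_pairs=1024, margin=0.0):
    flat_pred = pred_depth.flatten()
    flat_sfm = sfm_depth.flatten()
    i = torch.randint(0, flat_pred.numel(), (num_pairs,))
    j = torch.randint(0, flat_pred.numel(), (num_pairs,))
    # Target ordering from SfM: +1 if pixel i is farther than pixel j, -1 otherwise (0 if equal).
    target = torch.sign(flat_sfm[i] - flat_sfm[j])
    # Hinge-style penalty whenever the predicted ordering disagrees with the SfM ordering.
    return torch.clamp(margin - target * (flat_pred[i] - flat_pred[j]), min=0).mean()
```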
Accurate segmentation of tissues and lesions is crucial for disease diagnosis, treatment planning, and surgical navigation. Yet the complexity of medical images presents significant challenges for traditional convolutional neural networks and Transformer models due to their limited receptive fields or high computational complexity. State space models (SSMs) have recently shown notable vision performance, particularly Mamba and its variants. However, their feature extraction methods may not be sufficiently effective, and they retain some redundant structures, leaving room for parameter reduction. In response to these challenges, we introduce a methodology called Rotational Mamba-UNet, characterized by a Residual Visual State Space (ResVSS) block and a Rotational SSM module. The ResVSS block is devised to mitigate network degradation caused by the diminishing efficacy of information transfer from shallower to deeper layers. Meanwhile, the Rotational SSM module is devised to tackle the challenges associated with channel feature extraction within state space models. Finally, we propose a weighted multi-level loss function, which fully leverages the outputs of the decoder's three stages for supervision. We conducted experiments on the ISIC17, ISIC18, CVC-300, Kvasir-SEG, CVC-ColonDB, and Kvasir-Instrument datasets, as well as the Low-grade Squamous Intraepithelial Lesion datasets provided by The Third Affiliated Hospital of Sun Yat-sen University, demonstrating the superior segmentation performance of our proposed RM-UNet. Additionally, compared to the previous VM-UNet, our model achieves a one-third reduction in parameters. Our code is available at https://***/Halo2Tang/RM-UNet.
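The weighted multi-level loss described above can be sketched as a weighted sum of per-stage segmentation losses over the three decoder outputs. The stage weights, the use of binary cross-entropy, and the bilinear upsampling are assumptions for illustration, not the authors' exact choices.

```python
# A minimal sketch of a weighted multi-level (deep supervision) loss over three decoder stages.
import torch.nn.functional as F

def weighted_multilevel_loss(stage_outputs, target, weights=(0.2, 0.3, 0.5)):
    """stage_outputs: list of three logit maps, shallow to deep; target: (N, 1, H, W) mask."""
    loss = 0.0
    for out, w in zip(stage_outputs, weights):
        # Upsample each stage's prediction to the target resolution before supervision.
        out = F.interpolate(out, size=target.shape[-2:], mode='bilinear', align_corners=False)
        loss = loss + w * F.binary_cross_entropy_with_logits(out, target)
    return loss
```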
ISBN (print): 9798350329537; 9798350329520
As routine pathology moves into the digital age, the spread of high-efficiency and high-resolution tissue scanners opens up the possibility of routinely analyzing three-dimensional samples containing fluorescent genetic signals. One cornerstone of this workflow is confocal microscopy, with the help of which cell nuclei and their signals can be imaged in three dimensions. This article presents a novel deep learning-based algorithm for detecting signals within three-dimensional confocal microscopy images. By leveraging the power of convolutional neural networks, our approach significantly improves the accuracy and efficiency of signal detection compared to traditional image processing methods, especially in the case of thick sections. We demonstrate the algorithm's effectiveness through validation on various samples, highlighting its potential to advance research in biology and medicine.
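As an illustration of detecting signals in a three-dimensional stack with a convolutional network, the sketch below shows a toy voxel-wise 3D CNN that outputs a per-voxel "signal present" probability. The two-layer depth, channel widths, and input size are illustrative assumptions, not the authors' network.

```python
# A minimal sketch of a voxel-wise 3D CNN detector for fluorescent signals (assumed architecture).
import torch
import torch.nn as nn

signal_detector = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv3d(16, 1, kernel_size=3, padding=1),   # per-voxel logit for "signal present"
)

volume = torch.randn(1, 1, 32, 128, 128)          # (batch, channel, depth, height, width) confocal stack
probability_map = torch.sigmoid(signal_detector(volume))   # same spatial shape as the input volume
```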
Steady-state visual evoked potential (SSVEP) is widely used in brain-computer interfaces (BCIs), medical detection, and neuroscience, so there is significant interest in enhancing SSVEP features via signal processing for better performance. In this study, an image processing method was combined with brain signal analysis, and a sharpening filter was used to extract details and features for the enhancement of SSVEP features. The results demonstrated that the sharpening filter could eliminate the SSVEP signal trend term and suppress its low-frequency components. Meanwhile, the sharpening filter effectively enhanced the signal-to-noise ratios (SNRs) of the single-channel and multi-channel fused signals. The image sharpening filter also significantly improved the recognition accuracy of canonical correlation analysis (CCA), filter bank canonical correlation analysis (FBCCA), and task-related component analysis (TRCA). The tools developed here effectively enhanced SSVEP signal features, suggesting that image processing methods can be considered for improved brain signal analysis.
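For intuition, the sketch below applies a one-dimensional analogue of an image sharpening filter to an EEG channel. The kernel [-1, 3, -1] (identity minus a discrete Laplacian) is an assumed example, not necessarily the filter used in the study.

```python
# A minimal sketch of a 1-D sharpening filter applied to an EEG channel (assumed kernel).
import numpy as np

def sharpen_signal(eeg_channel):
    kernel = np.array([-1.0, 3.0, -1.0])           # amplifies fast oscillations relative to slow components
    return np.convolve(eeg_channel, kernel, mode='same')

t = np.arange(0, 2, 1 / 250)                        # 2 s of samples at 250 Hz
raw = np.sin(2 * np.pi * 12 * t) + 0.5 * t          # a 12 Hz SSVEP-like component plus a slow drift
enhanced = sharpen_signal(raw)                      # the 12 Hz component is boosted relative to the drift
```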
ISBN (print): 9798350351439; 9798350351422
Deep learning methods can now generate high-quality synthetic speech that is perceptually indistinguishable from real speech. Because synthetic speech can be used for nefarious purposes, speech forensics methods have been developed to detect fully synthetic speech. Speech editing tools can also create partially synthetic speech in which only a part of the speech signal is synthetic. Detecting these short synthetic segments within a speech signal requires specialized methods to determine the temporal location of the synthetic speech. In this paper, we propose the Synthetic Speech Localization Convolutional Transformer (SSLCT), a convolutional transformer network for synthetic speech localization. SSLCT can temporally localize synthetic speech segments as short as 20 milliseconds. We demonstrate that SSLCT achieves less than 10% Equal Error Rate (EER), an improvement over several existing methods.
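Localization at a 20-millisecond resolution implies scoring the waveform frame by frame. The sketch below shows the generic framing step that such a model would label; the 16 kHz sample rate and non-overlapping frames are assumptions for illustration, not SSLCT's actual front end.

```python
# A minimal sketch of splitting a waveform into 20 ms frames for per-frame real/synthetic labeling.
import numpy as np

def frame_signal(waveform, sample_rate=16000, frame_ms=20):
    frame_len = int(sample_rate * frame_ms / 1000)             # 320 samples per frame at 16 kHz
    num_frames = len(waveform) // frame_len
    # Each row is one 20 ms frame; a localization model assigns each frame a real/synthetic label.
    return waveform[:num_frames * frame_len].reshape(num_frames, frame_len)

frames = frame_signal(np.random.randn(16000))                   # 1 s of audio -> 50 frames of 20 ms
```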