Direction-of-arrival (DOA) estimation is a fundamental task in audio signal processing that becomes difficult in real-world environments because of reverberation. To address this difficulty, Direct-Path Dominance (DPD) tests have been proposed as an effective approach for detecting time-frequency (TF) bins dominated by direct sound, which contain accurate DOA information. These tests have been found to be particularly efficient with spherical arrays. Methods based on neural networks (NNs) have also been developed to estimate the DOA, but they have limitations, such as the need for a large training database and, often, a lack of insight into how the system operates. This work proposes two novel DPD-test methods based on a model-based deep learning approach that combines the original DPD-test model with a data-driven system. This makes it possible to preserve the robustness of the original DPD test across acoustic environments while using a data-driven approach to better extract useful information about the direct sound, thereby enhancing the original method's performance. In particular, the paper investigates how energetic, temporal and spatial information contribute to the identification of TF bins dominated by the direct signal. The proposed methods are trained on simulated data of a single sound source in a room and evaluated on simulated and real data. The results show that energetic and temporal information, which has not been considered in previous works, provides new information about the direct sound and can improve the performance of the DPD test.
Automatic segmentation of histopathology whole-slide images (WSI) usually involves supervised training of deep learning models with pixel-level labels to classify each pixel of the WSI into tissue regions such as benign or cancerous. However, fully supervised segmentation requires large-scale data manually annotated by experts, which can be expensive and time-consuming to obtain. Non-fully supervised methods, ranging from semi-supervised to unsupervised, have been proposed to address this issue and have been successful in WSI segmentation tasks. These methods, however, have mainly focused on technical advances in algorithmic performance rather than on the development of practical tools that pathologists or researchers could use in real-world scenarios. In contrast, we present DEPICTER (deep rEPresentatIon ClusTERing), an interactive segmentation tool for histopathology annotation that produces a patch-wise dense segmentation map at WSI level. DEPICTER leverages self- and semi-supervised learning approaches so that the user can participate in the segmentation, producing reliable results while reducing the workload. DEPICTER consists of three steps. First, a pretrained model is used to compute embeddings from image patches. Next, the user selects a number of benign and cancerous patches from the multi-resolution image. Finally, guided by the deep representations, label propagation is achieved using our novel seeded iterative clustering method or by directly interacting with the embedding space via feature space gating. We report real-time interaction results with three pathologists and evaluate performance on three public cancer classification benchmark datasets through simulations. The code and demos of DEPICTER are publicly available at https://***/eduardchelebian/depicter.
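To make the idea of propagating a few user-selected seeds through an embedding space concrete, the following is a minimal sketch that assigns each patch the label of the nearest seed centroid. It is not DEPICTER's seeded iterative clustering method; the function names, the toy two-dimensional embeddings, and the single-pass nearest-centroid rule are all assumptions made for illustration.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]

def propagate_labels(embeddings, seeds):
    """Assign every embedding the label of its nearest seed centroid.

    seeds maps a label to the indices of the patches the user marked
    with that label (a hypothetical input format).
    """
    centroids = {
        label: centroid([embeddings[i] for i in indices])
        for label, indices in seeds.items()
    }
    return [
        min(centroids, key=lambda label: math.dist(vector, centroids[label]))
        for vector in embeddings
    ]

# Toy 2-D "embeddings" for six patches: two well-separated groups.
embeddings = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
seeds = {"benign": [0], "cancerous": [3]}  # one seed patch per class
labels = propagate_labels(embeddings, seeds)
print(labels)
```

In a real tool the embeddings would come from the pretrained model and the assignment would typically be refined over several iterations as the user adds or corrects seeds.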
Random noise attenuation is significant in seismic data *** deep learning-based denoising methods have been widely developed and applied in recent *** practice, it is often time-consuming and laborious to obtain noise-free data for supervised ***, we propose a novel deep learning framework to denoise prestack seismic data without clean labels, which trains a high-resolution residual neural network (SRResnet) with noisy data for input and the same valid data with different noise for *** valid signals in noisy sample pairs are spatially correlated and random noise is spatially independent and unpredictable, the model can learn the features of valid data while suppressing random *** data targets are generated by a simple conventional method without fine-tuning *** initial estimates allow signal or noise leakage as the network does not require clean *** Monte Carlo strategy is applied to select training patches for increasing valid patches and expanding training *** learning is used to improve the generalization of real data *** synthetic and real data tests perform better than the commonly used state-of-the-art denoising methods.
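The principle of learning from noisy pairs can be illustrated with a toy example. The sketch below assumes that the second noisy copy of the signal serves as the training target, and it replaces the network with a single least-squares gain; the sine-wave "signal", the noise level, and all names are hypothetical. Because the noise in the target is zero-mean and independent of the input, the gain fitted to noisy targets ends up close to the gain fitted to clean targets.

```python
import math
import random

def fit_gain(inputs, targets):
    """Closed-form least-squares gain w minimizing sum((w * x - y) ** 2)."""
    numerator = sum(x * y for x, y in zip(inputs, targets))
    denominator = sum(x * x for x in inputs)
    return numerator / denominator

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 20000
sigma = 0.5              # standard deviation of the random noise (assumed)

clean = [math.sin(0.05 * k) for k in range(n)]
noisy_input = [s + rng.gauss(0.0, sigma) for s in clean]
noisy_target = [s + rng.gauss(0.0, sigma) for s in clean]  # independent noise

gain_clean_labels = fit_gain(noisy_input, clean)
gain_noisy_labels = fit_gain(noisy_input, noisy_target)
print(gain_clean_labels, gain_noisy_labels)
```

Both gains shrink the noisy input toward the signal, and they agree up to sampling error even though the second fit never sees a clean label.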
This paper presents a deep learning model specifically designed to effectively classify display Mura images. The model leverages advanced deep learning techniques and computer vision methods to identify and categorize...
In order to tackle some issues of the inadequate data clustering in the original basketball shooting track capture and counter-capture method, a novel approach is proposed. This method utilizes the background differen...
ISBN: (print) 9781450395687
Abstract: To improve the level of intelligence in water resources management, this paper proposes a real-time water level recognition method based on deep-learning algorithms and image-processing techniques. The recognition process consists of four steps. First, a YOLO-v3 model is deployed to detect and extract the numbers on the water gauge. Second, the cropped number images are fed into an LSTM + CTC model as training samples so that the digits can be recognized. Third, the Hough transform is adopted to correct the tilt of the water gauge based on its vertical edge features. Finally, morphological operations combined with horizontal projection locate the upper and lower edges of the water gauge so that the scale lines can be recognized correctly and the water level determined accordingly. Application of the model shows that it has satisfactory accuracy and efficiency, with potential for practical use.
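The horizontal projection step can be sketched as follows. This is a minimal illustration, assuming the gauge has already been binarized (1 for gauge pixels, 0 for background) and tilt-corrected; the function names, the `min_fraction` threshold, and the toy image are hypothetical and not taken from the paper.

```python
def horizontal_projection(binary_image):
    """Count foreground pixels in each row of a binary image."""
    return [sum(row) for row in binary_image]

def gauge_extent(binary_image, min_fraction=0.5):
    """Return (top_row, bottom_row) of the rows that are mostly foreground.

    A row counts as part of the gauge when at least min_fraction of its
    pixels are foreground. Returns None if no row qualifies.
    """
    width = len(binary_image[0])
    profile = horizontal_projection(binary_image)
    rows = [r for r, count in enumerate(profile) if count >= min_fraction * width]
    if not rows:
        return None
    return rows[0], rows[-1]

# Toy 6 x 4 binary image; the gauge occupies rows 1 to 3.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
extent = gauge_extent(image)
print(extent)  # (1, 3)
```

Row 4 has only one foreground pixel out of four, so it falls below the threshold and is treated as noise rather than as part of the gauge.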
Image stitching is the synthesis of multiple partial image segments into a complete and continuous panoramic image through effective image alignment and seamless fusion techniques. It can achieve a wider field of view and richer information for display and analysis. Most deep learning-based image stitching methods have significant advantages in improving accuracy, but they are not suitable for real-time applications because of multiple iterations of computation or greater network depth. To deal with this problem, a fast unsupervised image stitching model is proposed in this article. In the proposed model, an adaptive feature extraction module (FEM) for deformation is designed, and then a fast unsupervised learning-based image alignment network is proposed. In addition, a stitching restoration network with fewer parameters is presented to remove the redundant and unnecessary sampling and convolution operations in general deep learning-based models. Finally, experiments are conducted on both synthetic and real-scene datasets. The total stitching accuracy of the proposed model is higher, and the details of the output images are clearer. The proposed model achieves 1.79, 26.54, and 0.86 in RMSE, peak signal-to-noise ratio (PSNR), and structural similarity (SSIM), respectively, on the alignment results, which are better than those of the state-of-the-art methods. Furthermore, the comparison results show that the proposed model can effectively reduce memory loss and achieve fast unsupervised image stitching with a very small model size.
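For readers unfamiliar with two of the reported metrics, the sketch below computes RMSE and PSNR between a reference image and an output image using their standard definitions. The 8-bit peak value of 255 and the toy 2 x 2 images are assumptions; the abstract does not state the exact evaluation protocol, and SSIM is omitted because it needs local window statistics.

```python
import math

def mean_squared_error(reference, output):
    """Mean squared pixel difference between two equally sized images."""
    diffs = [
        (r - o) ** 2
        for ref_row, out_row in zip(reference, output)
        for r, o in zip(ref_row, out_row)
    ]
    return sum(diffs) / len(diffs)

def rmse(reference, output):
    """Root-mean-square error; lower is better."""
    return math.sqrt(mean_squared_error(reference, output))

def psnr(reference, output, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = mean_squared_error(reference, output)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

reference = [[100, 100], [100, 100]]
output = [[110, 90], [100, 100]]   # two pixels off by 10 grey levels

print(rmse(reference, output))   # sqrt(50), about 7.07
print(psnr(reference, output))   # about 31.14 dB
```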
High signal-to-noise ratio magnetotelluric (MT) data are crucial for accurately interpreting subsurface structures. Recently, deep learning has become popular for MT denoising because it avoids parameter tuning and enables real-time processing. These methods typically fit or predict signals in noisy segments after identifying and segmenting signal and noise in the time domain. However, they struggle to preserve low- and high-frequency signals effectively because of the high noise levels in these segments. To address this issue, we develop a novel deep-learning denoising method that recovers low- and high-frequency signals separately using distinct strategies. Low-frequency signals are fitted using an inverse autoencoder with a channel attention mechanism, effectively removing high-frequency components. High-frequency signals are then predicted using a bidirectional long short-term memory network combined with a squeeze-and-excitation mechanism, enhancing prediction by considering global and local signal characteristics. In addition, we introduce the multivariate state estimation technique (MSET) for real-time signal-noise identification. MSET analyzes the residuals after separating low-frequency signals to identify noise. Denoising is performed only on segments with significant noise, preserving more effective signals. Finally, the fitted low-frequency dominant components and the predicted high-frequency components are combined to form the denoised MT signals. This combined approach significantly improves the restoration quality of effective signals compared with existing methods. Experimental results demonstrate that our method exhibits superior denoising capabilities in quantitative and qualitative evaluations, including apparent resistivity-phase curves and polarization direction analysis, and outperforms current deep-learning methods.
Deep learning technologies have revolutionized the management of energy, energy consumption, and data security within smart grids through non-intrusive load monitoring (NILM). This paper explores the use of deep learn...
Accurate and timely lane detection is imperative for the seamless operation of autonomous driving systems. In this study, leveraging the gradual variation of lane features within a defined range of width and length, we introduce an enhanced Spatial-Temporal Recurrent Neural Network (SCNN) framework. This framework is the cornerstone of a hybrid spatial-temporal model for lane detection, which is tailored to address poor detection performance and insufficient real-time processing in intricate scenarios, such as lane erosion and inconsistent lighting conditions, that often challenge conventional models. Because lanes appear as continuous lines, we use a temporal sequence of lane images as the input to our model, which provides rich feature information. The model adopts an encoder-decoder structure and integrates a Spatial-Temporal Recurrent Neural Network module to extract interrelated information from the image sequence. The model outputs the lane detection result for the final frame. The proposed lane detection model combines accuracy with real-time efficiency, attaining an accuracy of 97.87%, an F1-score of 0.943, and 19.342 FPS on the tvtLANE dataset, and an accuracy of 98.21% and an F1-score of 0.957 on the Tusimple dataset. These results are superior to those of most current lane detection methods.
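As a reminder of how accuracy and F1-score relate to each other, the sketch below computes both from binary lane/background labels using their standard definitions. The flat per-pixel label lists are a simplifying assumption; the benchmarks named above may define their official metrics differently (for example, per lane point rather than per pixel).

```python
def confusion_counts(truth, predicted):
    """Return (tp, fp, fn, tn) for binary labels where 1 means lane."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(truth, predicted):
    """Fraction of labels predicted correctly."""
    tp, fp, fn, tn = confusion_counts(truth, predicted)
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(truth, predicted):
    """Harmonic mean of precision and recall for the lane class."""
    tp, fp, fn, _ = confusion_counts(truth, predicted)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

truth     = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]

print(accuracy(truth, predicted))  # 0.75
print(f1_score(truth, predicted))  # 0.75
```

Because lane pixels are usually a small fraction of an image, accuracy alone can look high even for a weak detector, which is one reason an F1-score is commonly reported alongside it.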