Polyps, like a silent time bomb in the gut, are always lurking and can explode into deadly colorectal cancer at any time. Many methods have been attempted to maximize the early detection of colon polyps by screening, howeve...
The sequence reconstruction problem, introduced by Levenshtein in 2001, considers a scenario where the sender transmits a codeword from some codebook, and the receiver obtains N noisy outputs of the codeword. We study...
Soft context formation is a lossless image coding method for screen content. It encodes images pixel by pixel via arithmetic coding, collecting statistics for probability distribution estimation. Its main pipeline includes three stages: a context-model-based stage, a color palette stage and a residual coding stage. Each subsequent stage is only employed if the previous stage cannot be applied because the necessary statistics, e.g. colors or contexts, have not been learned yet. We propose the following enhancements: First, information from previous stages is used to remove redundant color palette entries and prediction errors in subsequent stages. Additionally, implicitly known stage decision signals are no longer explicitly transmitted. These enhancements lead to an average bit rate decrease of 1.07% on the evaluated data. Compared to VVC and HEVC, the proposed method needs roughly 0.44 and 0.17 bits per pixel less on average for 24-bit screen content images, respectively.
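The three-stage fallback described in the abstract can be sketched as follows. This is a minimal illustration of the stage-decision logic, not the actual coder: the stage names and the "already learned" checks are simplified assumptions.

```python
# Illustrative sketch of the three-stage fallback in soft context formation.
# Each stage is only used if the statistics it needs have been learned;
# the data structures here are simplified assumptions, not the real coder's.

def choose_stage(pixel, context, learned_contexts, color_palette):
    """Pick the first stage whose required statistics are already available."""
    if context in learned_contexts:      # context model has seen this context
        return "context_model"
    if pixel in color_palette:           # color already in the learned palette
        return "color_palette"
    return "residual_coding"             # final fallback

# Example: the palette knows the color, but the context is new.
stage = choose_stage(pixel=(255, 0, 0),
                     context=("ctx", 7),
                     learned_contexts={("ctx", 1)},
                     color_palette={(255, 0, 0), (0, 0, 0)})
```

The paper's enhancement then exploits that such a stage decision is often implied by what earlier stages could not code, so it need not be transmitted explicitly.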
The detection and characterization of human veins using infrared (IR) image processing have gained significant attention due to their potential applications in biometric identification, medical diagnostics, and vein-based authentication systems. This paper presents a low-cost approach for automatic detection and characterization of human veins from IR images. The proposed method uses image processing techniques including segmentation, feature extraction, and pattern recognition algorithms. Initially, the IR images are preprocessed to enhance vein structures and reduce noise. Subsequently, a CLAHE algorithm is employed to extract vein regions based on their unique IR absorption properties. Features such as vein thickness, orientation, and branching patterns are extracted using mathematical morphology and directional filters. Finally, a classification framework is implemented to categorize veins and distinguish them from surrounding tissues or artifacts. A setup based on Raspberry Pi was used. Experimental results on IR images demonstrate the effectiveness and robustness of the proposed approach in accurately detecting and characterizing human veins. The developed system shows promise for integration into applications requiring reliable and secure identification based on vein patterns. Our work provides an effective and low-cost solution for nursing staff in low- and middle-income countries to perform a safe and accurate venipuncture.
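The core idea behind CLAHE is contrast-limited histogram equalization: clip the histogram so no gray level dominates, then equalize with the clipped cumulative distribution. The single-tile sketch below illustrates only that clipping step; real CLAHE (e.g. OpenCV's `cv2.createCLAHE`) works on tiles with bilinear interpolation, and nothing here reflects the paper's actual implementation.

```python
# Single-tile sketch of the contrast-limited equalization idea behind CLAHE.
# Real CLAHE applies this per tile and interpolates between tiles; this
# simplified version is only meant to show the histogram clipping step.

def clipped_equalize(pixels, levels=256, clip_limit=4):
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Clip each bin at the limit and redistribute the excess uniformly.
    excess = sum(max(h - clip_limit, 0) for h in hist)
    hist = [min(h, clip_limit) + excess // levels for h in hist]
    # Build the cumulative mapping to the output range.
    cdf, total, n = [], 0, sum(hist)
    for h in hist:
        total += h
        cdf.append(round((levels - 1) * total / n))
    return [cdf[p] for p in pixels]

# A dominant dark value gets its contrast boost limited by the clip.
out = clipped_equalize([10, 10, 10, 10, 200], clip_limit=2)
```

Without the clip, the four identical dark pixels would be stretched even harder, amplifying noise; the clip limit caps that amplification.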
Screen content images typically contain a mix of natural and synthetic image parts. Synthetic sections usually consist of uniformly colored areas and repeating colors and patterns. In the VVC standard, these properties are exploited using Intra Block Copy and Palette Mode. In this paper, we show that pixel-wise lossless coding can outperform lossy VVC coding in such areas. We propose an enhanced VVC coding approach for screen content images using the principle of soft context formation. First, the image is separated into two layers in a block-wise manner using a learning-based method with four block features. Synthetic image parts are coded losslessly using soft context formation, the rest with VVC. We modify the available soft context formation coder to incorporate information gained by the decoded VVC layer for improved coding efficiency. Using this approach, we achieve Bjøntegaard delta rate gains of 4.98% on the evaluated data sets compared to VVC.
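The block-wise layer separation can be sketched as a per-block classifier on simple statistics. The paper uses a learned method with four block features; the two features and the thresholds below are illustrative assumptions chosen to show the principle, not the paper's classifier.

```python
# Sketch of block-wise layer separation for screen content: blocks with few
# distinct colors or one dominant color are treated as "synthetic" (coded
# losslessly), the rest as "natural" (coded with VVC). Features and
# thresholds are illustrative, not the paper's learned classifier.

from collections import Counter

def classify_block(block):
    """block: 2D list of pixel values (one block of the image)."""
    pixels = [p for row in block for p in row]
    counts = Counter(pixels)
    distinct_ratio = len(counts) / len(pixels)        # few colors -> synthetic
    dominant_share = counts.most_common(1)[0][1] / len(pixels)
    if distinct_ratio < 0.25 or dominant_share > 0.5:
        return "synthetic"   # lossless soft-context-formation layer
    return "natural"         # lossy VVC layer

flat_ui = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 7, 7], [0, 0, 7, 7]]
gradient = [[i * 4 + j for j in range(4)] for i in range(4)]
```

A uniform UI block lands in the lossless layer, while a smoothly varying block with all-distinct values goes to VVC.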
The particle flow Gaussian particle filter (PFGPF) uses an invertible particle flow to generate a proposal density. It approximates the predictive and posterior distributions as Gaussian densities. In this paper, we u...
Differentiable particle filters are an emerging class of particle filtering methods that use neural networks to construct and learn parametric state-space models. In real-world applications, both the state dynamics and measurements can switch between a set of candidate models. For instance, in target tracking, vehicles can idle, move through traffic, or cruise on motorways, and measurements are collected in different geographical or weather conditions. This paper proposes a new differentiable particle filter for regime-switching state-space models. The method can learn a set of unknown candidate dynamic and measurement models and track the state posteriors. We evaluate the performance of the novel algorithm in relevant models, showing strong performance compared to competing algorithms.
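The regime-switching setting can be illustrated with a classical (non-differentiable) bootstrap particle filter in which each particle carries a regime index selecting one of several candidate dynamic models. Everything below, including the two toy models, the noise levels, and the switching probability, is an illustrative assumption; the paper's method additionally learns the candidate models with neural networks.

```python
import math, random

# Toy regime-switching bootstrap particle filter. Each particle carries a
# regime index that selects its dynamic model; regimes may switch between
# steps. All models and parameters are illustrative assumptions.

random.seed(0)

DYNAMICS = [lambda x: 0.9 * x,      # regime 0: mean-reverting
            lambda x: x + 1.0]      # regime 1: constant drift
SWITCH_P = 0.05                      # per-step regime-switch probability
SIGMA_X, SIGMA_Y = 0.3, 0.5          # process and measurement noise std

def pf_step(particles, regimes, y):
    n = len(particles)
    # Propagate: possibly switch regime, then apply that regime's dynamics.
    regimes = [r if random.random() > SWITCH_P else 1 - r for r in regimes]
    particles = [DYNAMICS[r](x) + random.gauss(0, SIGMA_X)
                 for x, r in zip(particles, regimes)]
    # Weight by Gaussian measurement likelihood and normalize.
    w = [math.exp(-0.5 * ((y - x) / SIGMA_Y) ** 2) for x in particles]
    total = sum(w)
    w = [wi / total for wi in w]
    estimate = sum(wi * x for wi, x in zip(w, particles))
    # Multinomial resampling.
    idx = random.choices(range(n), weights=w, k=n)
    return [particles[i] for i in idx], [regimes[i] for i in idx], estimate

# Track a target that stays in the drifting regime.
particles = [random.gauss(0, 1) for _ in range(500)]
regimes = [random.randint(0, 1) for _ in range(500)]
x_true = 0.0
for _ in range(10):
    x_true = DYNAMICS[1](x_true) + random.gauss(0, SIGMA_X)
    y = x_true + random.gauss(0, SIGMA_Y)
    particles, regimes, estimate = pf_step(particles, regimes, y)
```

After a few steps, resampling concentrates the surviving particles in the correct regime, which is the behavior the differentiable variant makes end-to-end trainable.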
ISBN (digital): 9798350390155
ISBN (print): 9798350390162
In this paper, we propose a novel loss by integrating a deep clustering (DC) loss at the frame level and a speaker recognition loss at the segment level into a single network without additional data requirements and exhaustive computation. The DC loss implicitly generates soft pseudo-phoneme labels for each frame-level feature, which facilitates extracting a more discriminant speaker representation by suppressing phonetic content information. We study the DC loss not only on the acoustic feature, but also on features extracted by pre-trained models such as wav2vec 2.0, HuBERT and WavLM. Experimental results on the VoxCeleb dataset show that systems based on pre-trained model features outperform those based on the acoustic feature. The proposed loss is significantly effective for systems on the acoustic feature and yields only a marginal improvement for systems on pre-trained model features.
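Structurally, the objective combines a frame-level clustering term with a segment-level speaker term. The sketch below uses the classic deep clustering affinity loss ||VVᵀ − YYᵀ||²_F together with a toy cross-entropy speaker loss; the weighting factor and the toy speaker loss are illustrative assumptions, not the paper's exact formulation.

```python
import math

# Sketch of a joint objective: segment-level speaker loss plus a weighted
# frame-level deep clustering (DC) loss. The DC term follows the classic
# affinity formulation ||V V^T - Y Y^T||_F^2; alpha and the toy
# cross-entropy speaker loss are illustrative assumptions.

def matmul_t(a, b):  # a @ b^T for small list-of-lists matrices
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]

def dc_loss(v, y):
    """||V V^T - Y Y^T||_F^2 over frame embeddings v, one-hot labels y."""
    vv, yy = matmul_t(v, v), matmul_t(y, y)
    return sum((a - b) ** 2 for ra, rb in zip(vv, yy) for a, b in zip(ra, rb))

def speaker_ce(logits, label):
    """Toy segment-level cross-entropy speaker loss."""
    z = [math.exp(l) for l in logits]
    return -math.log(z[label] / sum(z))

def joint_loss(v, y, logits, label, alpha=0.1):
    return speaker_ce(logits, label) + alpha * dc_loss(v, y)

# Frames whose embeddings match their pseudo-phoneme grouping give zero DC loss.
v = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
y = [[1, 0], [1, 0], [0, 1]]
loss = joint_loss(v, y, logits=[2.0, 0.0], label=0)
```

The DC term pushes frames of the same pseudo-phoneme together, so phonetic variation is absorbed at the frame level rather than leaking into the segment-level speaker embedding.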
Image fusion is a productive way to combine multi-sensor images and extract maximum information to enhance remote sensing data. The paper describes a novel methodology to improve the IHS image fusion algorithm by medoid...
ISBN (digital): 9798331516826
ISBN (print): 9798331516833
Due to its superior performance and fewer parameters, CAM++ has become the state-of-the-art model for speaker verification tasks. This model uses 2D convolutional blocks to extract front-end features, which are then fed into a densely connected time-delay neural network backbone to extract deep features. However, the simple stacking of 2D convolutions may lead to the generation of a significant amount of redundant features, which is detrimental to efficient feature extraction. Furthermore, although CAM++ already has a relatively small number of parameters, there is still room for further optimization. To address these issues, this paper first employs depthwise separable convolutions to replace the dilated convolutions in the back-end network of CAM++, making the model more lightweight. Next, we introduce spatial and channel reconstruction convolution (SCConv) in the ResBlock module of CAM++ to reduce redundant features and optimize the feature extraction process. Finally, after SCConv, we apply a squeeze-and-excitation attention mechanism to model the interdependencies between channels and recalibrate each channel, further enhancing the model's representational capacity. We name the resulting model LE-CAM++. Our proposed model achieves an EER of 0.686 and a minDCF of 0.084 on the VoxCeleb1-O dataset. Compared to the baseline model CAM++, the EER is reduced by 11%, and the minDCF is reduced by 28%. Additionally, the model parameters are reduced by 8%.
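The lightweighting from the depthwise separable substitution can be seen with a back-of-the-envelope parameter count. Dilation does not change a convolution's parameter count, so a dilated 1D conv costs the same as a standard one; the layer shape below (256 channels, kernel 3) is an illustrative assumption, not CAM++'s actual configuration.

```python
# Back-of-the-envelope parameter comparison for replacing a (dilated) 1D
# convolution with a depthwise separable one. Channel counts and kernel
# size are illustrative, not CAM++'s actual layer shapes.

def conv1d_params(c_in, c_out, k):
    return k * c_in * c_out                 # standard / dilated 1D conv

def dsc1d_params(c_in, c_out, k):
    return k * c_in + c_in * c_out          # depthwise + pointwise (1x1)

standard = conv1d_params(256, 256, 3)       # 196,608 weights
separable = dsc1d_params(256, 256, 3)       # 66,304 weights
ratio = separable / standard                # roughly a third
```

For such a layer, the separable form keeps about a third of the weights, which is the direction of the 8% overall parameter reduction reported above (smaller, because only part of the network is replaced).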