ISBN (print): 9781713899921
Language models have been successfully used to model natural signals, such as images, speech, and music. A key component of these models is a high-quality neural compression model that can compress high-dimensional natural signals into lower-dimensional discrete tokens. To that end, we introduce a high-fidelity universal neural audio compression algorithm that achieves 90x compression of 44.1 kHz audio into tokens at just 8 kbps bandwidth. We achieve this by combining advances in high-fidelity audio generation with better vector quantization techniques from the image domain, along with improved adversarial and reconstruction losses. We compress all domains (speech, environment, music, etc.) with a single universal model, making it widely applicable to generative modeling of all audio. We compare with competing audio compression algorithms, and find our method outperforms them significantly. We provide thorough ablations for every design choice, as well as open-source code and trained model weights. We hope our work can lay the foundation for the next generation of high-fidelity audio modeling.
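The "90x" figure above can be sanity-checked with simple bitrate arithmetic. The sketch below assumes the uncompressed reference is 16-bit mono PCM; the abstract does not state the bit depth or channel count, so both are assumptions, and the function name is purely illustrative.

```python
def compression_ratio(sample_rate_hz, bit_depth, channels, target_kbps):
    """Ratio of raw PCM bitrate to the compressed token bitrate."""
    raw_bits_per_second = sample_rate_hz * bit_depth * channels
    return raw_bits_per_second / (target_kbps * 1000)


# Assumed reference signal: 44.1 kHz, 16-bit, mono (not stated in the abstract).
ratio = compression_ratio(44_100, 16, 1, 8)
print(f"{ratio:.1f}x")  # 705.6 kbps / 8 kbps, i.e. roughly 90x
```

Under these assumptions the raw stream is 705.6 kbps, so 8 kbps corresponds to a ratio of about 88, consistent with the rounded figure in the abstract.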
ISBN (print): 9789819984756; 9789819984763
Image denoising is a critical task in image processing that involves removing noise or unwanted distortions from an image while preserving its essential features. Most commonly captured pictures come from mobile cameras or CCTV surveillance cameras, which produce video footage of people who are stationary or in motion. Such footage must be restored from noise before it can serve as evidence in criminal cases. Denoising face images captured by CCTV is challenging because noise affects fine details. In this paper, we evaluate three image denoising techniques: Block-Matching and 3D filtering (BM3D), k-Means Singular Value Decomposition (KSVD), and Weighted Nuclear Norm Minimization (WNNM). The performance of these methods is analyzed using Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Visual Information Fidelity (VIF). The overall performance of KSVD is observed to be better for Gaussian and salt-and-pepper noise.
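Two of the metrics named above, MSE and PSNR, have simple closed forms. The following is a minimal sketch in plain Python, assuming 8-bit grayscale images stored as nested lists of equal shape; the function names and the tiny sample images are illustrative and do not come from the paper.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized 2-D images."""
    ref_flat = [p for row in reference for p in row]
    test_flat = [p for row in test for p in row]
    if len(ref_flat) != len(test_flat):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(ref_flat, test_flat)) / len(ref_flat)


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(reference, test)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


# Hypothetical 2x2 patches: a clean reference and a denoised estimate.
clean = [[52, 55], [61, 59]]
denoised = [[54, 55], [61, 57]]
print(mse(clean, denoised))   # 2.0
print(psnr(clean, denoised))  # higher values mean a closer match
```

Lower MSE and higher PSNR both indicate a reconstruction closer to the reference. SSIM and VIF need windowed statistics and are omitted from this sketch.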
Detection of corrosion in moving objects like ships is challenging due to the dynamic nature of the input image. Existing machine learning techniques are suitable for static images and the algorithms suffer in perform...
ISBN (print): 9781713899921
Finding min s-t cuts in graphs is a basic algorithmic tool, with applications in image segmentation, community detection, reinforcement learning, and data clustering. In this problem, we are given two nodes as terminals and the goal is to remove the smallest number of edges from the graph so that these two terminals are disconnected. We study the complexity of differential privacy for the min s-t cut problem and show nearly tight lower and upper bounds where we achieve privacy at no cost for running time efficiency. We also develop a differentially private algorithm for the multiway k-cut problem, in which we are given k nodes as terminals that we would like to disconnect. As a function of k, we obtain privacy guarantees that are exponentially more efficient than applying the advanced composition theorem to known algorithms for multiway k-cut. Finally, we empirically evaluate the approximation of our differentially private min s-t cut algorithm and show that it almost matches the quality of the output of non-private ones.
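For readers unfamiliar with the underlying problem, the sketch below computes a plain, non-private min s-t cut on an undirected, unweighted graph using BFS augmenting paths (Edmonds-Karp). It is only the kind of non-private baseline the abstract compares against, not the differentially private algorithm from the paper; the function name and the example graph are illustrative.

```python
from collections import defaultdict, deque


def min_st_cut(edges, s, t):
    """Return (cut_size, cut_edges) for an undirected unit-capacity graph."""
    if s == t:
        raise ValueError("terminals must be distinct")
    cap = defaultdict(int)   # residual capacities, keyed by directed pair
    adj = defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1
        cap[(v, u)] += 1
        adj[u].add(v)
        adj[v].add(u)

    def bfs_parents():
        """BFS from s over edges with residual capacity; returns parent map."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs_parents()
        if t not in parent:
            break  # no augmenting path left: the flow is maximum
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[(parent[v], v)])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
            v = u
        flow += bottleneck

    # Edges leaving the set still reachable from s form a minimum cut.
    reachable = set(parent)
    cut_edges = [(u, v) for u, v in edges if (u in reachable) != (v in reachable)]
    return flow, cut_edges


example_edges = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
size, cut = min_st_cut(example_edges, "s", "t")
print(size, cut)
```

By the max-flow min-cut theorem, the returned flow value equals the number of edges that must be removed to disconnect the two terminals.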
This study addresses the detection of dislodged faults in photovoltaic (PV) power stations by proposing a visible light image processing method utilizing line scanning. The method employs HSV (Hue-Saturation-Value) col...
ISBN (print): 9783031686498; 9783031686504
Recently, many deep learning algorithms have emerged as advanced techniques in the medical field for diagnosing diseases, including heart disease. This study follows an approach based on electrocardiogram (ECG) images to detect different heart diseases. The images were pre-processed with morphological operations to remove the background grid lines of the ECG paper, leaving an image that contains only the trace of the electrical activity of the patient's heart. For each model, 80% of the pre-processed images in each class were used for training and the remaining 20% of each class for testing when evaluating efficiency. Seven binary classification models are proposed. Models 1-7 were trained to distinguish the normal ECG case (Nrm) from the other diseases. Model efficiency is calculated using four measures, where the accuracy reaches 100%, the precision reaches 100%, the specificity is 100%, and the F1-score is 100%. For models 6 and 7, accuracy reached 88.1366% and 91.0978%, precision 80.7443% and 91.0834%, specificity 79.1734% and 88.8665%, and F1-score 79.4476% and 89.8999%, respectively. The proposed diagnostic system is fast, accessible, more sensitive, and harmless. It is also more cost-effective than any other diagnostic method.
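The four measures reported above follow directly from the confusion-matrix counts of a binary classifier. The sketch below uses the standard definitions; the counts in the example are hypothetical and are not taken from the paper, and the function name is illustrative.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, specificity, and F1 from confusion-matrix counts."""

    def safe_div(num, den):
        # Return 0.0 when a measure is undefined (empty denominator).
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # needed for F1
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical test-set counts for one "normal vs. disease" model.
scores = binary_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

All four measures reach 1.0 only when both `fp` and `fn` are zero, which is the situation the 100% figures describe.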
In real operating conditions of the control systems based on the parallax method with structured laser illumination, due to background solar illumination, nonlinear distortions of signals, known as the blooming effect...
With the application of efficient retrieval in information systems and retrieval augmented generation with vector database for large language models, hash coding algorithms have made progress in recent years. The rise...
In order to solve the problem of difficulty in segmenting the foreground of lawn weed images due to the similarity between the foreground and background grayscale, this paper proposes a Retinex enhancement algorithm b...
ISBN (print): 9798350379808; 9798350379792
Multiple Pulse Position Modulation (MPPM) has become an important method in optical communication, especially between LEDs and mobile cameras. This paper proposes an MPPM modulation and demodulation method for Visible Light Communication (VLC) systems that use LED bulbs as the transmitter and a camera as the receiver. The method addresses the data transmission performance barriers that arise as the distance between receiver and transmitter increases, and helps minimize error rates in comparison with other modulation techniques. The PPM and MPPM modulation methods are both highly rated for their power and bandwidth efficiency. Using binary codes and image data processing algorithms at the receiver, along with optimized mapping, aids in minimizing character errors and enhancing communication performance. Additionally, integrating MPPM into the VLC system solves the problem of brightness control in real-world scenarios. The MPPM-based brightness control system is capable of dynamically adjusting brightness, providing higher communication performance and stability for the VLC system.
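As a rough illustration of how MPPM ties data to brightness, the sketch below builds a codebook in which each symbol places a fixed number of "on" pulses in a frame of time slots. It assumes a plain lexicographic mapping from integers to pulse patterns, not the optimized mapping the paper refers to, and it requires 0 < n_pulses < n_slots; all function names are hypothetical.

```python
from itertools import combinations
from math import comb


def mppm_codebook(n_slots, n_pulses):
    """Return (bits_per_symbol, codebook) for n_pulses pulses in n_slots slots."""
    # Largest whole number of bits that the C(n_slots, n_pulses) patterns can carry.
    bits = comb(n_slots, n_pulses).bit_length() - 1
    codebook = []
    for positions in combinations(range(n_slots), n_pulses):
        if len(codebook) == 2 ** bits:
            break  # leftover patterns are unused in this simple mapping
        codebook.append(tuple(1 if i in positions else 0 for i in range(n_slots)))
    return bits, codebook


def encode(value, codebook):
    """Map an integer symbol to its slot pattern (1 = LED on)."""
    return codebook[value]


def decode(pattern, codebook):
    """Map a received slot pattern back to its integer symbol."""
    return codebook.index(tuple(pattern))


bits, book = mppm_codebook(n_slots=4, n_pulses=2)
print(bits)                      # 2 bits per symbol
print(encode(3, book))           # (0, 1, 1, 0)
print(decode((0, 1, 1, 0), book))  # 3
print(2 / 4)                     # duty cycle, a proxy for perceived brightness
```

Because every codeword contains the same number of pulses, the average LED duty cycle is n_pulses / n_slots regardless of the data, which is why changing the pulse count can adjust brightness without interrupting transmission.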