ISBN (digital): 9781665496209
ISBN (print): 9781665496209
Over a year has passed since the finalization of Versatile Video Coding (H.266/VVC), yet it is still far from practical deployment, a major reason being its excessive complexity. The flexible and sophisticated quad-tree with nested multi-type tree partitioning structure in VVC provides considerable performance gains while bringing about an exponential increase in encoding time. To reduce the coding complexity, this paper proposes a Convolutional Neural Network (CNN) based fast Coding Unit (CU) partitioning algorithm for intra coding, which accelerates CU partitioning by predicting the partition modes from texture information and terminating redundant modes in advance. Separate classifiers are designed for different CU sizes to improve prediction accuracy. To keep rate-distortion performance degradation low, the performance loss caused by misclassification is incorporated into the loss function. Experiments show that the proposed method reduces encoding time by 38.39% to 62.33% with a 0.92% to 2.36% bit rate increase.
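The idea of folding misclassification cost into the loss function can be illustrated with a small sketch. The abstract does not give the actual loss, so the form below (cross-entropy plus the expected rate-distortion penalty of the predicted distribution) is an assumption, as are the names `rd_aware_loss`, `rd_penalty`, and `weight`.

```python
import math

# The six CU partition modes available in VVC's quad-tree with nested multi-type tree.
MODES = ["NO_SPLIT", "QT", "BT_H", "BT_V", "TT_H", "TT_V"]


def rd_aware_loss(probs, true_mode, rd_penalty, weight=1.0):
    """Hypothetical loss: cross-entropy plus expected RD penalty.

    probs      -- predicted probability for each mode (sums to 1)
    true_mode  -- index of the mode chosen by the full RD search
    rd_penalty -- assumed per-mode RD cost increase if that mode is
                  picked instead of the true one (0 for the true mode)
    weight     -- assumed trade-off factor between the two terms
    """
    cross_entropy = -math.log(max(probs[true_mode], 1e-12))
    expected_penalty = sum(p * c for p, c in zip(probs, rd_penalty))
    return cross_entropy + weight * expected_penalty


# Two predictions that are equally confident in the true mode (QT) but
# place their remaining mass on a costly versus a cheap alternative.
penalty = [0.50, 0.00, 0.05, 0.05, 0.20, 0.20]
costly = rd_aware_loss([0.4, 0.6, 0.0, 0.0, 0.0, 0.0], 1, penalty)
cheap = rd_aware_loss([0.0, 0.6, 0.4, 0.0, 0.0, 0.0], 1, penalty)
```

Under this assumed form, a plain cross-entropy loss would score the two predictions identically, whereas the penalty term makes mistakes that hurt rate-distortion performance more expensive during training.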
To address the challenges of prolonged sampling time and the limitation to process specific image sizes in current image deraining research, we propose a novel single image deraining approach based on denoising diffus...
ISBN (digital): 9781665496209
ISBN (print): 9781665496209
Object detection, a crucial component of medical image analysis, provides physicians with an interpretable auxiliary diagnostic basis. Although existing object detection models have had great success with natural images, the growing resolution of medical images makes the problem especially challenging because of the increased expectations to exploit the image details and discover small targets in images. For instance, lesions are occasionally diminutive relative to high-resolution medical images. To address this problem, we present YOLO-SG, a salience-guided (SG) deep learning model that improves small object detection by attending to detailed regions via a generated salience map. YOLO-SG performs two rounds of detection: coarse detection and salience-guided detection. In the first round of coarse detection, YOLO-SG detects objects using a deep convolutional detection model and proposes a salience map utilizing the context surrounding objects to guide the subsequent round of detection. In the second round, YOLO-SG extracts salient regions from the original input image based on the generated salience map and combines local detail with global context information to improve the object detection performance. The experimental results demonstrate that YOLO-SG outperforms the state-of-the-art models, especially when detecting small objects.
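The two-round flow described above can be sketched in a few lines. The abstract does not specify how the salience map is built, so the rule used here (spreading each coarse detection's confidence over a context-padded box, then cropping the thresholded bounding region) is an assumption, and `salience_map`, `salient_region`, `context`, and `threshold` are hypothetical names.

```python
def salience_map(height, width, detections, context=2):
    """Build a coarse salience map from first-round detections.

    detections -- list of (x0, y0, x1, y1, score) boxes, with x1/y1 exclusive
    context    -- assumed number of surrounding pixels added to each box
    """
    sal = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1, score in detections:
        for y in range(max(0, y0 - context), min(height, y1 + context)):
            for x in range(max(0, x0 - context), min(width, x1 + context)):
                sal[y][x] = max(sal[y][x], score)
    return sal


def salient_region(sal, threshold=0.5):
    """Return the bounding box (x0, y0, x1, y1) of cells at or above
    the threshold, or None if no cell qualifies."""
    ys = [y for y, row in enumerate(sal) if any(v >= threshold for v in row)]
    if not ys:
        return None
    xs = [x for x in range(len(sal[0])) if any(row[x] >= threshold for row in sal)]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


# Round one yields one confident and one weak detection on a 10x10 image.
coarse = [(4, 4, 6, 6, 0.9), (0, 0, 1, 1, 0.3)]
region = salient_region(salience_map(10, 10, coarse))
# Round two would re-run the detector on the crop given by `region`
# and merge those results with the global-context detections.
```

In this sketch only the confident detection contributes to the crop, so the second round spends its resolution on the region most likely to contain small targets.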
Natural Language Processing (NLP) has evolved significantly since the 1950s, with early research focusing on tasks like machine translation, information retrieval, and text summarization. Initially, most NLP research ...
ISBN (digital): 9781665496209
ISBN (print): 9781665496209
Computer vision datasets usually present long-tailed training distributions where the classes are not represented with the same number of training samples. This so-called class imbalance problem hinders the proper learning of inference models, biasing them towards over-represented classes and decreasing their generalization. Commonly adopted solutions weight the training loss according to the number of class samples, leading to regimes where under-represented classes guide the learning based solely on their sample count. To also incorporate class complexity in the process, we propose a novel training scheme called CCL: Class-wise Curriculum Learning. Classes are first sorted based on a difficulty criterion that accounts not only for the number of training samples but also for their training outcomes. The curriculum is then used to guide the training: easy classes are fed first and, incrementally, the more difficult ones are added. The proposed approach is validated for image classification using long-tailed datasets. Results show that when the proposed Class-wise Curriculum Learning scheme is used, trained models outperform specific state-of-the-art methods devoted to handling the class imbalance problem. The code, data and reported models described in this paper are publicly available at https://***/vpulab/CCL
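A class-wise curriculum of this kind can be sketched as follows. The abstract does not state the difficulty criterion, so the score below (a weighted mix of class rarity and per-class error) is an assumption for illustration only; `class_difficulty`, `curriculum_stages`, and `alpha` are hypothetical names and do not come from the released code.

```python
import math


def class_difficulty(counts, accuracies, alpha=0.5):
    """Assumed difficulty score in [0, 1]: rarer classes and classes
    with lower training accuracy count as more difficult."""
    n_max = max(counts.values())
    return {
        c: alpha * (1 - counts[c] / n_max) + (1 - alpha) * (1 - accuracies[c])
        for c in counts
    }


def curriculum_stages(counts, accuracies, num_stages, alpha=0.5):
    """Return the list of classes used at each stage, easiest first,
    with harder classes added incrementally."""
    difficulty = class_difficulty(counts, accuracies, alpha)
    ordered = sorted(difficulty, key=difficulty.get)
    stages = []
    for k in range(1, num_stages + 1):
        n_classes = math.ceil(k * len(ordered) / num_stages)
        stages.append(ordered[:n_classes])
    return stages


# Toy long-tailed dataset: sample counts and per-class training accuracy.
counts = {"cat": 1000, "dog": 500, "owl": 50, "lynx": 10}
accuracies = {"cat": 0.9, "dog": 0.8, "owl": 0.6, "lynx": 0.3}
stages = curriculum_stages(counts, accuracies, num_stages=2)
```

Each stage's class list would then be used to filter the training set for that phase, so the model sees the two easier classes first and the full label set in the final stage.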
ISBN (print): 9798350310085
The cover is the face of a book and is a point of attraction for the readers. Designing book covers is an essential task in the publishing industry. One of the main challenges in creating a book cover is representing the theme of the book's content in a single image. In this research, we explore ways to produce a book cover using artificial intelligence based on the fact that there exists a relationship between the summary of the book and its cover. Our key motivation is the application of text-to-image synthesis methods to generate images from given text or captions. We explore several existing text-to-image conversion techniques for this purpose and propose an approach to exploit these frameworks for producing book covers from provided summaries. We construct a dataset of English books that contains a large number of samples of summaries of existing books and their cover images. In this paper, we describe our approach to collecting, organizing, and pre-processing the dataset to use it for training models. We apply different text-to-image synthesis techniques to generate book covers from the summary and exhibit the results in this paper.
Diffusion models, as a paradigm of generative models, have achieved unprecedented success in the field of image generation since 2020, owing to their advantage of simplicity in optimization. They have been widely appl...
Image processing filters offer several significant applications, making them a crucial component of various consumer electronics and multimedia systems. These image filters are designed as dedicated reusable intellect...
With the popularity and development of short video applications, the behavior of using mobile devices to shoot and share user-generated content (UGC) videos has become increasingly common. Video quality assessment (VQ...
Sugarcane is a significant crop with several uses in the food, bio-energy, and bio-based product sectors. Many elements, including climate, soil fertility, and plant diseases, can have an impact on the quality of suga...