Image categorization is a fundamental task in computer vision, with applications in domains such as object recognition, medical imaging, and autonomous systems. Traditional approaches frequently fail to balance accura...
Learning-based image coding schemes, exemplified by JPEG AI, have shown potential by greatly exceeding the conventional image compression standards in rate-distortion (RD) performance. However, their widespread applic...
ISBN: 9781665475921 (print)
Crowd counting aims to automatically estimate the number of people in still images. It has attracted much attention because of its potential use in surveillance, intelligent transportation, and many other scenarios. Over the past decade, most researchers have focused on designing novel deep learning models to improve crowd counting performance. Such attempts include proposing advanced deep neural network architectures and using different training strategies and loss functions. Beyond model capability, crowd counting performance is also determined by the quantity and quality of the training data. While deep models are data-hungry and usually perform better with more training data, annotating images for training is time-consuming and expensive in real-world applications. In this work, we focus on the efficiency of data annotation for crowd counting. By varying the number of annotated images and the number of annotated points (one point per person's head) used for training, our experiments show that it is more efficient to annotate a small number of points per image across a large number of images. Based on this conclusion, we present a novel adaptive scaling mechanism for data augmentation that diversifies the training images at no extra annotation cost. Thorough experiments show the mechanism to be effective.
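As a rough illustration of why scaling-based augmentation needs no new labels, the sketch below resizes an image and moves its head-point annotations by the same factor, so the person count is unchanged. It is plain uniform random scaling, not the adaptive mechanism the abstract proposes (whose details are not given here); the function names `scale_sample` and `random_scale_augment` and the default scale range are assumptions made for this example.

```python
import random


def scale_sample(image, points, factor):
    """Resize a 2D image (a list of rows) with nearest-neighbour sampling
    and scale the (x, y) head-point annotations by the same factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    scaled_image = [
        [image[min(src_h - 1, int(y / factor))][min(src_w - 1, int(x / factor))]
         for x in range(dst_w)]
        for y in range(dst_h)
    ]
    # Each point moves with the image; no point is added or removed,
    # so the ground-truth count is preserved without re-annotation.
    scaled_points = [
        (min(dst_w - 1, x * factor), min(dst_h - 1, y * factor))
        for x, y in points
    ]
    return scaled_image, scaled_points


def random_scale_augment(image, points, low=0.5, high=2.0, rng=random):
    """Apply one uniformly sampled scale factor (hypothetical default range)."""
    return scale_sample(image, points, rng.uniform(low, high))
```

Calling `random_scale_augment` repeatedly on the same annotated image yields differently sized training samples that all share the original count.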
With the growing popularity of 3D content and virtual reality applications, effective no-reference stereoscopic image quality assessment (NR-SIQA) methods have become increasingly important. In this paper, we propose ...
Early diagnosis of skin diseases is crucial for effective treatment, preventing spread, minimizing long-term damage, managing chronic conditions, detecting underlying health issues, and promoting psychological well-be...
Compared with the video compression standard High Efficiency Video Coding (HEVC), the latest standard Versatile Video Coding (VVC) has achieved great improvement in coding efficiency due to the application of more sop...
With the great success of neural radiance fields (NeRF), human-specific NeRF has been actively introduced in recent years. However, such human rendering techniques often have difficulty recovering subtle details of ...
Versatile Video Coding (VVC) introduces dependent quantization (DQ) as a key coding tool, but it also has a high computational complexity. Therefore, optimizing DQ is crucial for practical VVC encoder implementations....
Rate control (RC) schemes allow audio and video encoders to produce bitstreams according to specific overall bitrate constraints. However, when no rate capping is enforced, the instantaneous bitrate may vary strongly ...
ISBN: 9798350343557 (print)
Chest X-ray imaging is critical for effectively diagnosing chest diseases, which are increasing today due to various environmental and hereditary factors. Although chest X-ray is the most commonly used tool for detecting pathological abnormalities, interpreting it can be quite challenging for specialists because of the misleading locations and sizes of pathological abnormalities, visual similarities, and complex backgrounds. Traditional deep learning (DL) architectures fall short because pathological abnormalities occupy relatively small areas and diseased and healthy areas look similar. In addition, DL structures with standard classification approaches are not ideal for problems involving multiple diseases. To overcome these problems, background-independent feature maps are first created using a conventional convolutional neural network (CNN). Then, the relationships between objects in the feature maps are made suitable for multi-label classification tasks using the focal modulation network (FMA), an innovative attention module that is more effective than the self-attention approach. Experiments using a chest X-ray dataset containing both single and multiple labels for a total of 14 different diseases show that the proposed approach can provide superior performance for multi-label datasets.
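To make the single-label versus multi-label distinction concrete, here is a minimal sketch of the generic multi-label formulation: each disease gets an independent sigmoid output and the loss is the mean binary cross-entropy over labels. This is the standard textbook setup, not the CNN plus focal modulation architecture described in the abstract; the function names and the 0.5 threshold are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multilabel_predict(logits, threshold=0.5):
    """Decide each label independently, so several diseases can be
    predicted for the same image (unlike a softmax, which picks one)."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def binary_cross_entropy(logits, targets):
    """Mean per-label binary cross-entropy for one sample."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for z, t in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(logits)
```

With 14 disease labels, `logits` and `targets` would each hold 14 entries per image, and any subset of the labels may be active at once.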