This paper reviews the 1st LFNAT challenge on light field depth estimation, which aims at predicting disparity information of central view image in a light field (i.e., pixel offset between central view image and adja...
As facial recognition technology continues to advance, addressing issues of fairness and accuracy in image datasets becomes increasingly critical. This paper outlines a novel approach aimed at simultaneously improving...
ISBN: 9798350365474 (digital)
ISBN: 9798350365481 (print)
This paper introduces a new approach to food image segmentation that uses the Segment Anything Model (SAM), refined by fine-tuning with Low-Rank Adaptation (LoRA) layers. The segmentation task involves generating a binary mask for food in RGB images, with each pixel categorized as background or food. We conduct various experiments to assess and compare the performance of our proposed method with previous approaches. Our findings indicate that our method consistently outperforms other techniques, achieving an accuracy of 94.14%. The improved accuracy of our approach highlights its potential for various applications in food image analysis, contributing to the advancement of computer vision techniques for food recognition and segmentation.
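The abstract above does not say how the LoRA layers are attached to SAM, so the following is only a generic sketch of the low-rank update that LoRA is built on: a frozen weight W is augmented with a trainable product B·A scaled by alpha/r. The function names, the toy shapes, and the values are assumptions for illustration, not the authors' implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank.

    w: frozen pretrained weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical toy layer: 2 outputs, 3 inputs, rank 1.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
a = [[1.0, 2.0, 3.0]]
b_zero = [[0.0], [0.0]]       # B starts at zero, so the layer is unchanged
b_trained = [[0.5], [-0.5]]   # made-up values standing in for a trained B

unchanged = lora_effective_weight(w, a, b_zero, alpha=1.0)
adapted = lora_effective_weight(w, a, b_trained, alpha=2.0)
```

Only A and B would be trained in such a setup, which is why the number of trainable parameters stays small relative to the frozen model.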
ISBN: 9798350302615 (print)
In recent years, the widespread use of autonomous vehicles, both aerial and automotive, has enhanced our ability to perform target tracking, dispensing with our over-reliance on visual features. With the development of computer vision and deep learning techniques, vision-based classification and recognition have recently received special attention in the scientific community. Moreover, recent advances in neural networks with weights and activations quantized down to single-bit precision have allowed the development of models that can be deployed in resource-constrained settings, where a trade-off between task performance and efficiency is accepted. In this work we design an efficient single-stage object detector based on CenterNet that combines full-precision and binary layers. Our model is easy to train and achieves results comparable to a full-precision network trained from scratch while requiring an order of magnitude fewer FLOPs. This opens the possibility of deploying an object detector in applications where time is of the essence and a graphics processing unit (GPU) is absent. We train our model and evaluate its performance against state-of-the-art techniques, obtaining more accurate results, and provide insight into the design process of resource-constrained neural networks and the trade-offs involved.
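To make "single-bit precision" concrete, here is a minimal sketch of one common weight binarization scheme: each weight is replaced by its sign, and a single scaling factor (the mean absolute weight) preserves the overall magnitude. The abstract does not state which scheme the detector uses, so this scheme and the helper names are assumptions.

```python
def binarize_weights(weights):
    """Approximate real-valued weights by alpha * sign(w).

    alpha is the mean absolute value of the weights, so the binary
    layer keeps roughly the same magnitude as the original one.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def binary_dot(alpha, signs, inputs):
    """Dot product using only additions and subtractions, plus one multiply."""
    acc = 0.0
    for s, x in zip(signs, inputs):
        acc += x if s > 0 else -x
    return alpha * acc


# Hypothetical weights and inputs for illustration only.
alpha, signs = binarize_weights([0.5, -1.5, 1.0, -1.0])
result = binary_dot(alpha, signs, [1.0, 2.0, 3.0, 4.0])
```

Replacing per-weight multiplications with sign flips is what makes the FLOP savings mentioned above plausible on hardware without a GPU.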
This poster introduces SynMotion, a novel mmWave-based human motion sensing system addressing the scarcity of training datasets. By synthesizing mmWave signals using existing vision-based human motion datasets, this s...
Classification and counting of bone marrow cells is necessary for the diagnosis and treatment of various blood disorders. However, this job needs to be done manually by doctors, which is a long period and high workloa...
Intelligent monitoring technology has become a new research direction in the field of computer vision in recent years. The computer vision system is the video data received from the camera, analyzed and learned throug...
ISBN: 9781665409155 (print)
In recent years, many works in the video action recognition literature have shown that two-stream models (combining spatial and temporal input streams) are necessary for achieving state-of-the-art performance. In this paper we show the benefits of including yet another stream based on human pose estimated from each frame, specifically by rendering pose on input RGB frames. At first blush, this additional stream may seem redundant given that human pose is fully determined by RGB pixel values; however, we show (perhaps surprisingly) that this simple and flexible addition can provide complementary gains. Using this insight, we propose a new model, which we dub PERF-Net (short for Pose Empowered RGB-Flow Net), that combines this new pose stream with the standard RGB and flow-based input streams via distillation techniques. We show that our model outperforms the state of the art by a large margin on a number of human action recognition datasets while not requiring flow or pose to be explicitly computed at inference time. The proposed pose stream is also part of the winning solution of the ActivityNet Kinetics Challenge 2020 [1].
The incorporation of state-of-the-art technologies, such as deep learning algorithms and computer vision, has paved the path for a revolutionary approach to precision agriculture, enabling farmers to reduce the enviro...
ISBN: 9798350386523; 9798350386530 (print)
Numerous neurological disorders can lead to a decline in patients' gross motor function, requiring long-term tracking of disease progression to tailor treatment plans and evaluate intervention effectiveness. In recent years, researchers have explored computer vision methods as a cost-effective, at-home solution for monitoring human motor control. However, the current state-of-the-art approach relies on transfer learning from a general human action recognition model to address the challenge of limited training dataset size. This results in a relatively hefty model that may strain users' devices when running locally. In this study, we propose distilling knowledge from the teacher model into a lightweight student model to enhance operational efficiency on users' devices. Compared to directly training the student model with a limited amount of data, transferring and distilling knowledge from a generalized model can notably enhance accuracy. The lightweight student network runs 1.5 to 3 times faster than the large teacher network on various user devices, with the improvement being most noticeable on older devices. This speed improvement opens up numerous opportunities for the adoption of this technology, facilitating its integration into a wider range of devices and environments.
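The abstract does not specify the distillation objective, so the sketch below shows one widely used form as an assumption: the KL divergence between temperature-softened teacher and student output distributions, scaled by the squared temperature. The logits and function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, times T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a three-class problem.
teacher = [2.0, 0.5, -1.0]
matching_loss = distillation_loss(teacher, teacher)
mismatch_loss = distillation_loss([0.0, 0.0, 0.0], teacher)
```

In practice this term is usually combined with an ordinary supervised loss on the labeled data; how the two are weighted is a design choice not given in the abstract.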