ISBN: 9784901122207 (print)
3D pose estimation based on a monocular camera can be applied to various fields such as human-computer interaction and human action recognition. As a two-stage 3D pose estimator, VideoPose3D achieves state-of-the-art accuracy. However, because of its two-stage design, image information is partially lost when 2D poses are mapped to 3D space, which limits the final accuracy. This paper proposes an image-assisting pose estimation model and a back-projection-based offset generating module. The image-assisting pose estimation model consists of a 2D pose processing branch and an image processing branch. Image information is processed to generate an offset that refines the intermediate 3D pose produced by the 2D pose processing network. The back-projection-based offset generating module projects the intermediate 3D poses to 2D space and calculates the error between the projection and the input 2D pose. From this error, combined with extracted image features, the neural network generates an offset that reduces the error. In evaluation on the Human3.6M dataset, accuracy improves by an average of 0.9 mm across actions over the VideoPose3D baseline.
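The back-projection step described above can be illustrated with a minimal sketch. It assumes a simple pinhole camera with known focal length and principal point. The function names, the intrinsics, and the toy joint values are hypothetical and are not taken from the paper, which may use a different camera model.

```python
import numpy as np

def project_to_2d(pose_3d, focal, center):
    """Pinhole projection of (J, 3) camera-space joints to (J, 2) pixel coordinates.

    Assumption: no lens distortion; focal and center are (2,) arrays in pixels.
    """
    xy = pose_3d[:, :2] / pose_3d[:, 2:3]  # perspective divide by depth
    return focal * xy + center

def reprojection_error(pose_3d, pose_2d, focal, center):
    """Per-joint 2D residual between the input 2D pose and the projected 3D pose."""
    return pose_2d - project_to_2d(pose_3d, focal, center)

# Toy data: two joints, both 4 m in front of the camera (hypothetical values).
pose_3d = np.array([[0.0, 0.0, 4.0],
                    [0.4, -0.8, 4.0]])
pose_2d = np.array([[500.0, 500.0],
                    [610.0, 300.0]])
focal = np.array([1000.0, 1000.0])
center = np.array([500.0, 500.0])

err = reprojection_error(pose_3d, pose_2d, focal, center)
print(err)  # the second joint is off by 10 px along x
```

In the model described by the abstract, a residual of this kind is combined with image features and fed to a network that predicts the 3D offset. That learned part is not sketched here.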
Laser vision systems have enabled the development of applications such as reverse engineering, manufacturing, navigation systems, and structural health monitoring (SHM). However, most machine vision systems for structural behavior analysis have a restricted field of view, consume large amounts of computational resources for image processing, and require special illumination conditions to achieve lower error rates. Therefore, the purpose of this paper is to present a technical vision system (TVS) for structural behavior analysis using dynamic laser triangulation and a k-Nearest Neighbor (k-NN) machine learning regression algorithm. The proposed vision system was tested to demonstrate its practicality: different deformations and displacements were analyzed on real structures under controlled laboratory conditions to ensure the reproducibility of the experiments. The TVS prototype proved to be a reliable option for SHM tasks, balancing precision and operating range without the aforementioned issues.
ISBN: 9781665492577 (print)
In recent years, there has been a sharp increase in the transmission of images to remote servers specifically for the purpose of computer vision. In many applications, such as surveillance, images are mostly transmitted for automated analysis and are rarely seen by humans. Using traditional compression in this scenario has been shown to be inefficient in terms of bit-rate, likely due to its focus on human-based distortion metrics. Thus, it is important to create specific image coding methods for joint use by humans and machines. One way to create the machine side of such a codec is to perform feature matching at some intermediate layer of a deep neural network performing the machine task. In this work, we explore the effects of the layer choice used in training a learnable codec for humans and machines. We prove, using the data processing inequality, that matching features from deeper layers is preferable in the rate-distortion sense. Next, we confirm our findings empirically by re-training an existing model for scalable human-machine coding. In our experiments, we show the trade-off between the human and machine sides of such a scalable model, and discuss the benefit of using deeper layers for training in that regard.
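For reference, the standard data processing inequality invoked above can be written as follows. The symbols are notation introduced here, not the paper's: $X$ is the input image, $Y_1$ the features of a shallower layer, and $Y_2$ the features of a deeper layer computed from $Y_1$. This is only the general inequality, not the paper's rate-distortion argument.

```latex
X \to Y_1 \to Y_2 \ \text{(Markov chain)}
\quad\Longrightarrow\quad
I(X; Y_2) \;\le\; I(X; Y_1)
```

In a feed-forward network the deeper features are a function of the shallower ones, so the Markov condition holds and the deeper features cannot carry more information about the input than the shallower ones.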
ISBN: 9783030706012; 9783030706005 (print)
This paper describes the development of low-cost software, called Rat Steps, which extracts quantitative data (total distance traveled and average speed) as well as the graphical trajectory of an animal in the open field test. This behavioral test is widely used in neuroscience to visualize locomotor impairment following acute brain injury, including stroke, as well as the effect of experimental therapies for these neural disorders. The main tools used for the software development were digital image processing techniques, Python programming, the OpenCV library, and machine learning algorithms, including the Mean Shift method. The software was successfully developed and effectively extracts quantitative parameters from the open field test, which allows several applications in neuroscience research.
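The two quantitative outputs named above can be computed from tracked positions roughly as in the sketch below. It assumes the tracker has already produced one centroid per video frame. The function name, the frame rate, and the pixel-to-centimeter scale are hypothetical and do not describe the Rat Steps implementation.

```python
import numpy as np

def trajectory_stats(centroids_px, fps, px_per_cm):
    """Return (total distance in cm, average speed in cm/s) for a tracked path.

    centroids_px: sequence of (x, y) pixel positions, one per frame.
    fps: frames per second of the video (assumed constant).
    px_per_cm: calibration scale, pixels per centimeter (assumed known).
    """
    pts = np.asarray(centroids_px, dtype=float) / px_per_cm
    steps = np.linalg.norm(np.diff(pts, axis=0), axis=1)  # per-frame displacement
    total = float(steps.sum())
    duration = (len(pts) - 1) / fps  # seconds between first and last frame
    speed = total / duration if duration > 0 else 0.0
    return total, speed

# Toy trajectory: three frames at 2 fps, 10 px per cm (hypothetical values).
centroids = [(0, 0), (30, 40), (30, 100)]
total_cm, speed_cm_s = trajectory_stats(centroids, fps=2.0, px_per_cm=10.0)
print(total_cm, speed_cm_s)  # 11.0 cm travelled in 1 s
```

In practice the centroids would come from a tracker such as OpenCV's mean shift, and small frame-to-frame jitter would usually be filtered out before the step lengths are summed.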
When using traditional phase-shift profilometry for 3D measurement, it is necessary to keep the measured object static during the shooting process. When the measured object is moving, errors will occur if the projecti...
In recent years, the global population has shown substantial growth, leading to an increase in its food security needs. In response, greenhouse cultivation has emerged as a strategy to ensure controlled conditions for...
Recent work in Machine Learning and Computer Vision has highlighted the presence of various types of systematic flaws inside ground truth object recognition benchmark datasets. Our basic tenet is that these flaws are ...
ISBN: 9783031124266; 9783031124259 (print)
Distributed stream processing systems suffer from rate variation and skewed distribution of the input stream. A scaling policy is used to reduce the impact of rate variation, but it cannot maintain high performance with low overhead when the input stream is skewed. To solve this issue, we propose Alps, an Adaptive Load Partitioning Scaling system. Alps uses an adaptive partitioning scaling algorithm based on a willingness function to determine whether to apply a partitioning policy. To our knowledge, this is the first approach that integrates a scaling policy and a partitioning policy in an adaptive manner. In addition, Alps achieves outstanding performance for distributed stream processing with the least overhead. Compared with the state-of-the-art scaling approach DS2, Alps reduces end-to-end latency by 2 orders of magnitude on high-speed skewed streams and avoids wasting resources on low-speed or balanced streams.
Google Earth Engine is a geospatial data processing platform that runs in the cloud. It offers free access to massive amounts of satellite data as well as unlimited computing power to monitor, visualize, and analyze e...
ISBN: 9784901122207 (print)
Single image deraining is an important yet challenging task due to the ill-posed nature of deriving a rain-free clean image from a rainy image. In this paper, we propose the Recurrent RLCN-Guided Attention Network (RRANet) for single image deraining. Our main technical contributions are threefold: (i) We propose rectified local contrast normalization (RLCN), applied to the input rainy image to effectively mark candidate rain regions. (ii) We propose an RLCN-guided attention module (RLCN-GAM) to learn an effective attention map for deraining without the need for ground-truth rain masks. (iii) We incorporate RLCN-GAM into a recurrent neural network to progressively derive the rainy-to-clean image mapping. Quantitative and qualitative evaluations on representative deraining benchmark datasets demonstrate that the proposed RRANet outperforms existing state-of-the-art deraining methods. Notably, our method clearly achieves the best performance on a real-world dataset.
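As a rough illustration of contribution (i), the sketch below applies ordinary local contrast normalization and then clips negative responses to zero. Reading "rectified" as this clipping step is an assumption made here. The window size, the `eps` constant, and the function name are also hypothetical, and the paper's exact definition of RLCN may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rectified_lcn(gray, window=9, eps=1e-6):
    """Local contrast normalization followed by clipping at zero.

    gray: 2D float array (grayscale image); window: odd box size in pixels.
    Assumption: "rectified" means keeping only positive responses, i.e. pixels
    brighter than their local neighborhood (as rain streaks typically are).
    """
    gray = np.asarray(gray, dtype=float)
    pad = window // 2
    padded = np.pad(gray, pad, mode="reflect")
    patches = sliding_window_view(padded, (window, window))  # (H, W, window, window)
    local_mean = patches.mean(axis=(-2, -1))
    local_std = patches.std(axis=(-2, -1))
    lcn = (gray - local_mean) / (local_std + eps)
    return np.maximum(lcn, 0.0)

# Toy image: dark background with one bright vertical streak (hypothetical data).
image = np.zeros((16, 16))
image[:, 8] = 1.0
candidates = rectified_lcn(image)
print(candidates[8, 8] > 0, candidates[8, 0] == 0)
```

Under this reading, the nonzero responses mark candidate rain pixels. The abstract's attention module then learns from such a map without ground-truth rain masks.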