ISBN: 9781665445092 (print)
Convolutional neural network (CNN) pruning has become one of the most successful network compression approaches in recent years. Existing works on network pruning usually focus on removing the least important filters in the network to achieve compact architectures. In this study, we claim that identifying structural redundancy plays a more essential role than finding unimportant filters, both theoretically and empirically. We first statistically model the network pruning problem from a redundancy-reduction perspective and find that pruning in the layer(s) with the most structural redundancy outperforms pruning the least important filters across all layers. Based on this finding, we then propose a network pruning approach that identifies the structural redundancy of a CNN and prunes filters in the selected layer(s) with the most redundancy. Experiments on various benchmark network architectures and datasets show that our proposed approach significantly outperforms the previous state-of-the-art.
ISBN: 9781665448994 (print)
Human action recognition in the dark is a significant task with various applications, e.g., night surveillance and self-driving at night. However, the lack of video datasets for human actions in the dark hinders its development. Recently, a public dataset, ARID, has been introduced to stimulate progress on the task of human action recognition in dark videos. Currently, multiple models perform well for action recognition in videos shot under normal illumination. However, research shows that these methods may not be effective in recognizing actions in dark videos. In this paper, we construct a novel neural network architecture, DarkLight Networks, which involves (i) a dual-pathway structure where both dark videos and their brightened counterparts are utilized for effective video representation; and (ii) a self-attention mechanism, which fuses and extracts corresponding and complementary features from the two pathways. Our approach achieves state-of-the-art results on ARID.
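As a rough illustration of what a "brightened counterpart" of a dark frame could look like, the sketch below applies gamma correction to a small grayscale frame. The abstract does not say how the brightened pathway is produced, so the use of gamma correction, the function name `gamma_brighten`, and the default `gamma=2.0` are all assumptions made for this example only.

```python
def gamma_brighten(frame, gamma=2.0):
    """Brighten a grayscale frame given as rows of 0-255 integers.

    Assumption: gamma correction, out = 255 * (in / 255) ** (1 / gamma).
    This is a common brightening technique, not necessarily the one
    used by DarkLight Networks.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    inv = 1.0 / gamma
    return [[round(255 * (p / 255) ** inv) for p in row] for row in frame]


# Hypothetical 2x3 dark frame; a dual-pathway model would receive both
# `dark` and `bright` as inputs, one per pathway.
dark = [[0, 16, 64], [128, 255, 4]]
bright = gamma_brighten(dark)
```

With `gamma` greater than 1, pure black and pure white are left unchanged while every intermediate intensity is raised, which is why low-light detail becomes more visible.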
ISBN: 9781665448994 (print)
Line art plays a fundamental role in illustration and design, and allows for iteratively polishing designs. However, because it lacks color, it can have difficulty conveying final designs. In this work, we propose an interactive colorization approach based on a conditional generative adversarial network that takes both the line art and color hints as inputs to produce a high-quality colorized image. Our approach is based on a U-net architecture with a multi-discriminator framework. We propose a Concatenation and Spatial Attention module that is able to generate more consistent and higher-quality line art colorization from user-given hints. We evaluate on a large-scale illustration dataset, and comparisons with existing approaches corroborate the effectiveness of our approach.
ISBN: 9781665490627 (digital)
ISBN: 9781665490627 (print)
Locating semantically meaningful landmark points is a crucial component of a large number of computer vision pipelines. Because of the small number of available datasets with ground truth landmark annotations, it is essential to design robust unsupervised and semi-supervised methods for landmark detection. Many of the recent unsupervised learning methods rely on the equivariance properties of landmarks to synthetic image deformations. Our work focuses on such widely used methods and sheds light on their core problem: their inability to produce equivariant intermediate convolutional features. This finding leads us to formulate a two-step unsupervised approach that overcomes this challenge by first learning powerful pixel-based features that are then exploited in a second step within standard landmark detection approaches. Our methodology produces state-of-the-art results on several benchmarks, such as the BBC Pose dataset and the Cat-head dataset, and performs comparably in other situations.
ISBN: 9781665445092 (print)
In this paper, we present a regression-based pose recognition method using cascade Transformers. One way to categorize the existing approaches in this domain is to separate them into (1) heatmap-based and (2) regression-based. In general, heatmap-based methods achieve higher accuracy but are subject to various heuristic designs (mostly not end-to-end), whereas regression-based approaches attain relatively lower accuracy but have fewer intermediate non-differentiable steps. Here we utilize the encoder-decoder structure in Transformers to perform regression-based person and keypoint detection that is general-purpose and requires less heuristic design compared with the existing approaches. We demonstrate the keypoint hypothesis (query) refinement process across different self-attention layers to reveal the recursive self-attention mechanism in Transformers. In the experiments, we report competitive results for pose recognition when compared with the competing regression-based methods.
Sign language is essential for communication among deaf individuals, yet barriers persist in effectively translating its rich linguistic expressions into textual representations. The dynamic nature of signing poses a ...
In recent years, there has been a considerable amount of research in the gesture recognition domain, mainly owing to the technological advancements in computer vision. Various new applications have been conceptualised...
ISBN: 9798350370256; 9798350370263 (print)
Deep Neural Networks (DNNs) are commonly used in camera systems for video surveillance. However, the computational demands of DNN inference pose challenges for on-edge video analytics due to potential delay. Additionally, edge cameras typically employ lightweight models, which are susceptible to data drift. In this demo, we present EdgeCam, an open-source distributed camera operating system that incorporates inference scheduling and continuous learning for video analytics. EdgeCam comprises multiple edge nodes and the cloud, enabling collaborative video analytics. Edge nodes also collect drift data to support continuous learning and maintain recognition accuracy. We have implemented essential functionalities and algorithms, ensuring modularity and ease of configuration. The source code of EdgeCam is at https://***/MSNLAB/EdgeCam.
ISBN: 9781665448994 (print)
Nowadays, video conference solutions are widely adopted by companies, educational institutions, and governments. People segmentation is crucial for supporting virtual backgrounds, an essential video conference function to protect users' privacy. This paper demonstrates a people segmentation framework called CE-PeopleSeg, which employs an efficient segmentation method, structural pruning, and dynamic frame skipping techniques, leading to a fast inference speed on CPU. Our extensive experiments show that the proposed CE-PeopleSeg can achieve a high prediction mIoU of 87.9% on Supervised People Dataset while reaching a real-time inference speed of 32.40 fps on CPU with very low CPU usage (10%). Our code will be released at https://***/geekJZY/***.
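For readers unfamiliar with the mIoU figure quoted above, the following is a minimal sketch of how mean intersection-over-union is commonly computed for a person/background mask. The function name `mean_iou`, the flattened-list input format, and the choice to skip classes absent from both masks are assumptions of this example; the paper's exact evaluation protocol is not given in the abstract.

```python
def mean_iou(pred, target, num_classes=2):
    """Mean IoU over classes for two flattened label masks.

    pred and target are equal-length sequences of integer class labels
    (here 0 = background, 1 = person). Classes that appear in neither
    mask are skipped, which is one common convention (an assumption).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class present in either mask")
    return sum(ious) / len(ious)


# Hypothetical 4-pixel masks: background IoU = 1/2, person IoU = 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1])
```

Averaging per-class IoU, rather than reporting pixel accuracy, keeps a large background region from dominating the score.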
Computer vision and machine learning are used to comprehend and extract information from photos and videos. This paper examines how computer vision and machine learning are used to recognize hand gestures, which are t...