ISBN: 9781728171685 (print)
We propose DualConvMesh-Nets (DCM-Net), a family of deep hierarchical convolutional networks over 3D geometric data that combines two types of convolutions. The first type, geodesic convolutions, defines the kernel weights over mesh surfaces or graphs. That is, the convolutional kernel weights are mapped to the local surface of a given mesh. The second type, Euclidean convolutions, is independent of any underlying mesh structure. The convolutional kernel is applied on a neighborhood obtained from a local affinity representation based on the Euclidean distance between 3D points. Intuitively, geodesic convolutions can easily separate objects that are spatially close but have disconnected surfaces, while Euclidean convolutions can better represent interactions between nearby objects, as they are oblivious to object surfaces. To realize a multi-resolution architecture, we borrow well-established mesh simplification methods from the geometry processing domain and adapt them to define mesh-preserving pooling and unpooling operations. We experimentally show that combining both types of convolutions in our architecture leads to significant performance gains for 3D semantic segmentation, and we report competitive results on three scene segmentation benchmarks. Our models and code are publicly available.
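The difference between the two neighborhood types can be illustrated with a minimal sketch. It is not the DCM-Net implementation: the function names are hypothetical, the geodesic neighborhood is approximated by hop count along mesh edges, and the Euclidean neighborhood is a plain radius query.

```python
from collections import deque
from math import dist


def euclidean_neighbors(points, center, radius):
    """Indices of points within `radius` of points[center], ignoring the mesh."""
    return {
        i
        for i, p in enumerate(points)
        if i != center and dist(p, points[center]) <= radius
    }


def geodesic_neighbors(edges, center, hops):
    """Indices reachable from `center` over at most `hops` mesh edges (BFS).

    Hop count is an assumed stand-in for geodesic distance on the surface.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    depth = {center: 0}
    queue = deque([center])
    while queue:
        v = queue.popleft()
        if depth[v] == hops:
            continue  # do not expand beyond the hop limit
        for n in adjacency.get(v, ()):
            if n not in depth:
                depth[n] = depth[v] + 1
                queue.append(n)
    return set(depth) - {center}


# Two disconnected strips: vertices 0-2 and 3-5, only 0.1 apart in z.
points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (0, 0, 0.1), (1, 0, 0.1), (2, 0, 0.1)]
edges = [(0, 1), (1, 2), (3, 4), (4, 5)]

near_in_space = euclidean_neighbors(points, center=1, radius=1.05)
near_on_surface = geodesic_neighbors(edges, center=1, hops=1)
```

In this toy scene the Euclidean query around vertex 1 returns vertices from both strips, while the edge-based query stays on the strip that contains vertex 1, which mirrors the intuition stated in the abstract.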
ISBN: 9781728193601 (print)
We investigate the regulation of human brain arousal in the central nervous system and its synchronization with the autonomic nervous system, which affects facial dynamics and their behavioral gestalt. A major focus is on the observables that can be sensed during natural human eye-to-eye communication. Although the inner state of the autopoietic system is deterministic, its outer facial behavioral component is non-deterministic. Besides establishing the general validity of the classical empirical interpretation of the vigilance continuum with open eyes, we show that facial behavior can be used as a suitable surrogate measurement for specific states of mind. As a consequence, we predict brainwaves from face videos, formulated as an inverse problem of the underlying stochastic process. Finally, we discuss the impact and range of applications.
ISBN: 9781728171685 (print)
We present 3D-MPA, a method for instance segmentation on 3D point clouds. Given an input point cloud, we propose an object-centric approach where each point votes for its object center. We sample object proposals from the predicted object centers. Then, we learn proposal features from grouped point features that voted for the same object center. A graph convolutional network introduces inter-proposal relations, providing higher-level feature learning in addition to the lower-level point features. Each proposal comprises a semantic label, a set of associated points over which we define a foreground-background mask, an objectness score and aggregation features. Previous works usually perform non-maximum suppression (NMS) over proposals to obtain the final object detections or semantic instances. However, NMS can discard potentially correct predictions. Instead, our approach keeps all proposals and groups them together based on the learned aggregation features. We show that grouping proposals improves over NMS and outperforms previous state-of-the-art methods on the tasks of 3D object detection and semantic instance segmentation on the ScanNetV2 benchmark and the S3DIS dataset.
ISBN: 9781728193601 (digital), 9781728193601 (print)
We address the problem of reposing an image of a human into any desired novel pose. This conditional image-generation task requires reasoning about the 3D structure of the human, including self-occluded body parts. Most prior works are either based on 2D representations or require fitting and manipulating an explicit 3D body mesh. Based on the recent success of deep learning-based volumetric representations, we propose to implicitly learn a dense feature volume from human images, which lends itself to simple and intuitive manipulation through explicit geometric warping. Once the latent feature volume is warped according to the desired pose change, the volume is mapped back to RGB space by a convolutional decoder. Our state-of-the-art results on the DeepFashion and the iPER benchmarks indicate that dense volumetric human representations are worth investigating in more detail.
ISBN: 9781728193601 (print)
The Euler Curve Transform (ECT) of Turner et al. is a complete invariant of an embedded simplicial complex, which is amenable to statistical analysis. We generalize the ECT to provide a similarly convenient representation for weighted simplicial complexes, objects which arise naturally, for example, in certain medical imaging applications. We leverage work of Ghrist et al. on Euler integral calculus to prove that this invariant, dubbed the Weighted Euler Curve Transform (WECT), is also complete. We explain how to transform a segmented region of interest in a grayscale image into a weighted simplicial complex and then into a WECT representation. This WECT representation is applied to study Glioblastoma Multiforme brain tumor shape and texture data. We show that the WECT representation is effective at clustering tumors based on qualitative shape and texture features and that this clustering correlates with patient survival time.
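A single Euler curve, for one direction, can be sketched as follows. This is an illustrative reading rather than the authors' code: it assumes that the weighted Euler characteristic is the alternating sum of simplex weights, that a simplex enters the sublevel set once its highest vertex does, and the function name is hypothetical. The full transform would collect such curves over all directions.

```python
def weighted_euler_curve(vertices, simplices, weights, direction, thresholds):
    """Weighted Euler characteristic of each sublevel set along `direction`.

    vertices:   list of coordinate tuples
    simplices:  list of tuples of vertex indices (vertices, edges, faces, ...)
    weights:    one weight per simplex; all ones gives the unweighted curve
    """
    # Height of every vertex is its dot product with the direction vector.
    heights = [sum(c * d for c, d in zip(v, direction)) for v in vertices]
    curve = []
    for t in thresholds:
        chi = 0.0
        for simplex, w in zip(simplices, weights):
            # Assumed inclusion rule: all vertices of the simplex lie below t.
            if max(heights[i] for i in simplex) <= t:
                dimension = len(simplex) - 1
                chi += (-1) ** dimension * w
        curve.append(chi)
    return curve


# A filled triangle: three vertices, three edges, one face.
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
simplices = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
unit_weights = [1.0] * len(simplices)

curve = weighted_euler_curve(
    vertices, simplices, unit_weights, direction=(0.0, 1.0), thresholds=[-1.0, 0.0, 1.0]
)
```

With unit weights the curve reduces to the ordinary Euler characteristic of each sublevel set: empty below the triangle, then 1 once the bottom edge appears, and still 1 for the full triangle.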
ISBN: 9781728193601 (print)
In this paper, the problem of pruning and compressing the weights of various layers of deep neural networks is investigated. The proposed method aims to remove redundant filters from the network to reduce computational complexity and storage requirements, while improving the performance of the original network. More specifically, a novel filter selection criterion is introduced based on the fact that filters whose weights follow a Gaussian distribution correspond to hidden units that do not capture important aspects of data. To this end, Higher Order Statistics (HOS) are used, and filters with low cumulant values that do not deviate significantly from a Gaussian distribution are identified and removed from the network. In addition, a novel pruning strategy is proposed that decides on the pruning ratio of each layer using the Shapiro-Wilk normality test. The use of auxiliary MSE losses (intermediate and after the softmax layer) during the fine-tuning phase further improves the overall performance of the compressed network. Extensive experiments with different network architectures and comparison with state-of-the-art approaches on well-known public datasets, such as CIFAR-10, CIFAR-100 and ILSVRC-12, demonstrate the great potential of the proposed approach.
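The selection criterion can be sketched with one higher-order statistic. The example below is an assumption-laden illustration, not the paper's method: it uses excess kurtosis (the fourth-order cumulant of the standardized weights) as the only cumulant, a hypothetical fixed threshold, and flattened filters given as plain lists. The Shapiro-Wilk step for per-layer pruning ratios is not shown.

```python
from statistics import NormalDist


def excess_kurtosis(values):
    """Fourth standardized moment minus 3; close to 0 for Gaussian data.

    Assumes the values are not all identical (non-zero variance).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


def filters_to_prune(filters, threshold):
    """Indices of filters whose weights look Gaussian (low |excess kurtosis|)."""
    return [i for i, w in enumerate(filters) if abs(excess_kurtosis(w)) < threshold]


# Toy filters: one built from Gaussian quantiles, one sparse and peaked.
gaussian_like = [NormalDist().inv_cdf(i / 100) for i in range(1, 100)]
sparse = [0.0] * 8 + [10.0]

prune = filters_to_prune([gaussian_like, sparse], threshold=1.0)
```

Here the Gaussian-like filter is flagged for removal while the sparse filter, whose weight distribution is far from Gaussian, is kept.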
ISBN: 9781728129891 (print)
Nowadays, drones can be seen in various industrial applications such as surveillance and transportation. Industrial drones leverage fully-fledged computer vision techniques, such as object detection based on Deep Learning Neural Networks (DNN), to efficiently perform these tasks. Those techniques come with a high computational effort and are implemented on distributed schemes using ground devices with high performance and power consumption. This limits a drone's operational range since it has to communicate with the ground devices constantly. To alleviate such constraints, an optimized, low-power perception system on the drone is desirable. This work improves DroNet, a trained DNN architecture for navigating a UAV introduced by the University of Zurich. DroNet is computationally expensive and has a high power consumption, making it unsuitable for embedded platforms with little memory and computational power. In this paper, a ROS-based architecture is first designed to port DroNet to a low-power Jetson Nano board, which conducts the drone's perception and control tasks. Secondly, parameter tuning and various schemes are applied to run the inference of the DNN efficiently. To implement the different layers of the DNN, Nvidia's TensorRT SDK is used to compile a high-performance inference engine for the Jetson Nano. Results show that the Jetson Nano can achieve real-time performance, with 47 frames per second using a Winograd convolution and well-tuned parallelization parameters. The implementation also achieves a speedup of 2x compared with the Jetson Nano's ARM CPU while increasing the power consumption by 54%. Finally, the Jetson Nano's usability for drone inference algorithms is shown, achieving real-time response using the DroNet DNN without losing detection accuracy.
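The reported figures imply a per-frame latency and an energy trade-off that can be worked out directly. The short calculation below is a sketch under stated assumptions: the 54% figure is read as the increase in average power during inference, relative to the same ARM CPU baseline as the 2x speedup, and energy per frame is taken as power multiplied by time. The function names are hypothetical.

```python
def frame_latency_ms(frames_per_second):
    """Average time budget per frame in milliseconds."""
    return 1000.0 / frames_per_second


def relative_energy_per_frame(speedup, power_increase):
    """Energy per frame relative to the baseline, assuming energy = power * time.

    speedup:        how many times faster than the baseline (2.0 for 2x)
    power_increase: fractional increase in power draw (0.54 for +54%)
    """
    return (1.0 + power_increase) / speedup


latency = frame_latency_ms(47.0)              # about 21.3 ms per frame
energy = relative_energy_per_frame(2.0, 0.54)  # about 0.77 of the baseline
```

Under these assumptions, drawing 54% more power for half the time would cost roughly 23% less energy per processed frame than the CPU baseline.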
ISBN: 9781728193601 (print)
The AI City Challenge was created to accelerate intelligent video analysis that helps make cities smarter and safer. Transportation is one of the largest segments that can benefit from actionable insights derived from data captured by sensors, where computer vision and deep learning have shown promise in achieving large-scale practical deployment. The 4th annual edition of the AI City Challenge attracted 315 participating teams across 37 countries, who leveraged city-scale real traffic data and high-quality synthetic data to compete in four challenge tracks. Track 1 addressed video-based automatic vehicle counting, where the evaluation is conducted on both algorithmic effectiveness and computational efficiency. Track 2 addressed city-scale vehicle re-identification with augmented synthetic data to substantially increase the training set for the task. Track 3 addressed city-scale multi-target multi-camera vehicle tracking. Track 4 addressed traffic anomaly detection. The evaluation system shows two leader boards: a general leader board shows all submitted results, and a public leader board shows only results that comply with our contest participation rules, under which teams are not allowed to use external data in their work. The general leader board shows results closer to real-world situations where annotated data are limited. Our results show promise that AI technology can enable smarter and safer transportation systems.
ISBN: 9789526924427 (print)
Automatic detection of dangerous situations to ensure the safety of residents is a new step in the development of video surveillance systems in cities. Dangerous situations are often caused by deviant behavior of people: robbery, brawls, vandalism, etc. However, due to the strong variability of such scenes, their detection is a challenging problem that remains unresolved. The key to solving this problem is the recognition of fine-grained features and events of scenes and the application of knowledge management technologies. In this paper, three computer vision technologies for detecting people, tracking people and estimating three-dimensional human poses were integrated with the aim of recognizing the actions and interactions of people in three-dimensional space. For all technologies, open-source implementations that achieved high results in popular computer vision challenges were used. A dataset was also created using computer graphics to test the developed system, containing scenes of people interacting in the city, shot from different points of view. This dataset showed that additional training of the human pose estimation component is required to handle challenging poses of people and camera viewpoints.
A low-power and high-precision reconfigurable processor based on optimized convolutional recurrent neural network is proposed for noise robust keyword recognition. In order to create a low-power and high-precision sys...