Search results - Inner Mongolia University Library

Fast-moving object counting with an event camera

arXiv, 2022

Authors: Bialik, Kamil; Kowalczyk, Marcin; Blachut, Krzysztof; Kryjak, Tomasz (Embedded Vision Systems Group, Computer Vision Laboratory, Department of Automatic Control and Robotics, AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Krakow, Poland)

This paper proposes the use of an event camera as a component of a vision system that enables counting of fast-moving objects, in this case falling corn grains. Cameras of this type transmit information about changes in the brightness of individual pixels and are characterised by low latency, no motion blur, correct operation in different lighting conditions, and very low power consumption. The proposed counting algorithm processes events in real time. The operation of the solution was demonstrated on a test stand consisting of a chute with a vibrating feeder, which allowed the number of falling grains to be adjusted. The objective of the control system with a PID controller was to maintain a constant average number of falling objects. The proposed solution was subjected to a series of tests to verify that the developed method operates correctly. The results confirm the validity of using an event camera to count small, fast-moving objects and point to a wide range of potential industrial applications. © 2022, CC BY.

Keywords: Cameras
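
The abstract above mentions a PID controller that keeps the average number of falling grains constant. A minimal sketch of such a loop is given below. The gains, the time step, and the first-order feeder model are illustrative assumptions and are not taken from the paper.

```python
class PID:
    """Textbook discrete PID controller (illustrative, not the authors' implementation)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps=400, target_rate=50.0):
    """Drive a hypothetical first-order feeder model towards target_rate (grains/s)."""
    dt = 0.05                                  # assumed control period in seconds
    pid = PID(kp=0.5, ki=2.0, kd=0.0, dt=dt)   # assumed gains, chosen for a stable toy loop
    rate = 0.0                                 # grain rate as a counter would report it
    tau = 0.5                                  # assumed feeder time constant in seconds
    for _ in range(steps):
        drive = max(0.0, pid.update(target_rate, rate))  # feeder drive cannot be negative
        rate += dt * (drive - rate) / tau                # toy plant: rate lags behind drive
    return rate


final_rate = simulate()
```

In this toy setup the integral term removes the steady-state error, so `final_rate` settles at the requested 50 grains per second. A real feeder would need its own model and tuning.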

PointPillars Backbone Type Selection For Fast and Accurate LiDAR Object Detection

TechRxiv, 2022

Authors: Lis, Konrad; Kryjak, Tomasz (Embedded Vision Systems Group, Computer Vision Laboratory, Department of Automatic Control and Robotics, AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Krakow, Poland)

3D object detection from LiDAR sensor data is an important topic in the context of autonomous cars and drones. In this paper, we present the results of experiments on the impact of backbone selection of a deep convolutional neural network on detection accuracy and computation speed. We chose the PointPillars network, which is characterised by a simple architecture, high speed, and modularity that allows for easy expansion. During the experiments, we paid particular attention to the change in detection efficiency (measured by the mAP metric) and the total number of multiply-add operations needed to process one point cloud. We tested 10 different convolutional neural network architectures that are widely used in image-based detection problems. With a MobileNetV1 backbone, we obtained an almost 4x speedup at the cost of a 1.13% decrease in mAP. With CSPDarknet, on the other hand, we obtained a speedup of more than 1.5x together with a 0.33% increase in mAP. We have thus demonstrated that it is possible to significantly speed up a 3D object detector for LiDAR point clouds with only a small decrease in detection efficiency. This result can be used when PointPillars or similar algorithms are implemented in embedded systems, including SoC FPGAs. The code is available at https://***/vision-agh/pointpillars backbone. © 2022, CC BY.

Keywords: Object detection
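
The abstract above compares backbones by the number of multiply-add operations. As a rough sketch of how such a count is obtained, the functions below compare a standard convolution with the depthwise separable convolution used in MobileNetV1. The layer dimensions are hypothetical and are not taken from the paper.

```python
def conv_macs(h_out, w_out, c_in, c_out, k):
    """Multiply-accumulate operations of a standard k x k convolution."""
    return h_out * w_out * c_in * c_out * k * k


def depthwise_separable_macs(h_out, w_out, c_in, c_out, k):
    """MACs of a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h_out * w_out * c_in * k * k
    pointwise = h_out * w_out * c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 248 x 216 feature map, 64 -> 64 channels, 3 x 3 kernel.
standard = conv_macs(248, 216, 64, 64, 3)
separable = depthwise_separable_macs(248, 216, 64, 64, 3)
ratio = standard / separable   # equals c_out * k^2 / (k^2 + c_out) for any map size
```

For this single layer the separable variant needs about 7.9 times fewer operations. A whole-network figure, such as the almost 4x reported in the abstract, also depends on the layers that are not replaced.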

Signal propagation in transformers: theoretical perspectives and the role of rank collapse

Proceedings of the 36th International Conference on Neural Information Processing Systems

Authors: Lorenzo Noci; Sotiris Anagnostidis; Luca Biggio; Antonio Orvieto; Sidak Pal Singh; Aurelien Lucchi (Dept of Computer Science, ETH Zürich; Dept of Computer Science, ETH Zürich and Robotics & ML, CSEM SA, Alpnach, Switzerland; Dept of Computer Science, ETH Zürich and MPI for Intelligent Systems, Tübingen; Department of Mathematics and Computer Science, University of Basel)

ISBN: (print) 9781713871088

Transformers have achieved remarkable success in several domains, ranging from natural language processing to computer vision. Nevertheless, it has been recently shown that stacking self-attention layers — the distinctive architectural component of Transformers — can result in rank collapse of the tokens' representations at initialization. The question of if and how rank collapse affects training is still largely unanswered, and its investigation is necessary for a more comprehensive understanding of this architecture. In this work, we shed new light on the causes and the effects of this phenomenon. First, we show that rank collapse of the tokens' representations hinders training by causing the gradients of the queries and keys to vanish at initialization. Furthermore, we provide a thorough description of the origin of rank collapse and discuss how to prevent it via an appropriate depth-dependent scaling of the residual branches. Finally, our analysis unveils that specific architectural hyperparameters affect the gradients of queries and values differently, leading to disproportionate gradient norms. This suggests an explanation for the widespread use of adaptive methods for Transformers' optimization.

Keywords: (none listed)
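
The rank collapse described in the abstract above can be observed in a small numerical experiment. The sketch below stacks randomly initialised single-head attention layers without residual branches, MLPs, or normalisation, and tracks how far the token matrix is from having all rows equal. The sizes, the Gaussian initialisation, and the residual measure are assumptions made for illustration; this is not the paper's experimental setup.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_rows(m):
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gaussian(rows, cols, std, rng):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def relative_residual(x):
    """Relative distance of x from its best approximation with all rows equal."""
    n = len(x)
    mean = [sum(col) / n for col in zip(*x)]
    res = sum((v - m) ** 2 for row in x for v, m in zip(row, mean))
    tot = sum(v * v for row in x for v in row)
    return math.sqrt(res / tot)


def attention_only_stack(n_tokens=16, dim=32, depth=6, seed=0):
    rng = random.Random(seed)
    x = gaussian(n_tokens, dim, 1.0, rng)
    residuals = [relative_residual(x)]
    for _ in range(depth):
        wq, wk, wv = (gaussian(dim, dim, dim ** -0.5, rng) for _ in range(3))
        q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
        k_t = [list(col) for col in zip(*k)]
        scores = [[s / math.sqrt(dim) for s in row] for row in matmul(q, k_t)]
        x = matmul(softmax_rows(scores), v)   # no residual branch, MLP, or LayerNorm
        residuals.append(relative_residual(x))
    return residuals


residuals = attention_only_stack()
```

The list `residuals` holds one value per depth, starting with the input. In this setting the values shrink quickly with depth, which is the collapse towards identical token representations that the abstract refers to.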

Fast-moving object counting with an event camera

TechRxiv, 2022

Keywords: Cameras

PointPillars Backbone Type Selection For Fast and Accurate LiDAR Object Detection

arXiv, 2022

Keywords: Object detection

Skeletal Human Action Recognition using Hybrid Attention based Graph Convolutional Network

arXiv, 2022

Authors: Xing, Hao; Burschka, Darius (Technical University of Munich, Machine Vision and Perception Group, Munich Institute of Robotics and Machine Intelligence, Department of Computer Science, Parkring 13, 85748 Munich, Germany)

In skeleton-based action recognition, Graph Convolutional Networks model human skeletal joints as vertices and connect them through an adjacency matrix, which can be seen as a local attention mask. However, in most existing Graph Convolutional Networks, the local attention mask is defined based on the natural connections of human skeleton joints and ignores dynamic relations, for example between the head, hand and foot joints. In addition, the attention mechanism has been proven effective in Natural Language Processing and image description, but it is rarely investigated in existing methods. In this work, we propose a new adaptive spatial attention layer that extends the local attention map to a global one based on relative distance and relative angle information. Moreover, we design a new initial graph adjacency matrix that connects the head, hands and feet, which shows a visible improvement in action recognition accuracy. The proposed model is evaluated on two large-scale and challenging datasets in the field of human activities in daily life: NTU-RGB+D and Kinetics skeleton. The results demonstrate that our model has strong performance on both datasets. © 2022, CC BY.

Keywords: Convolution
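
The abstract above describes an adjacency matrix over skeleton joints and an initial graph that additionally connects the head, hands and feet. The sketch below builds both variants for a hypothetical seven-joint toy skeleton. The joint list, the edges, and the symmetric normalisation (a common GCN convention) are assumptions for illustration, not the layout or method of the paper.

```python
import math

# Hypothetical toy skeleton: indices are illustrative, not a dataset's joint layout.
JOINTS = ["head", "torso", "hip", "left_hand", "right_hand", "left_foot", "right_foot"]
BONES = [(0, 1), (1, 2), (1, 3), (1, 4), (2, 5), (2, 6)]   # natural connections
EXTREMITIES = [0, 3, 4, 5, 6]                              # head, hands, feet


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph."""
    a = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    return a


def normalise(a):
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2."""
    n = len(a)
    a_hat = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


# Extra edges between every pair of extremities.
extra = [(i, j) for pos, i in enumerate(EXTREMITIES) for j in EXTREMITIES[pos + 1:]]
natural = adjacency(len(JOINTS), BONES)
extended = adjacency(len(JOINTS), BONES + extra)
mask = normalise(extended)
```

In `natural` the two hands are not neighbours, whereas in `extended` they are, so a graph convolution using `mask` can mix their features within a single layer.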

Traffic Sign Detection With Event Cameras and DCNN

arXiv, 2022

Authors: Wzorek, Piotr; Kryjak, Tomasz (Embedded Vision Systems Group, Computer Vision Laboratory, Department of Automatic Control and Robotics, AGH University of Science and Technology, Kraków, Poland; Department of Digital Systems, Silesian University of Technology, Gliwice, Poland)

In recent years, event cameras (DVS – Dynamic Vision Sensors) have been used in vision systems as an alternative or supplement to traditional cameras. They are characterised by high dynamic range, high temporal resolution, low latency, and reliable performance in limited lighting conditions – parameters that are particularly important in the context of advanced driver assistance systems (ADAS) and self-driving cars. In this work, we test whether these rather novel sensors can be applied to the popular task of traffic sign detection. To this end, we analyse different representations of the event data: event frame, event frequency, and the exponentially decaying time surface, and apply video frame reconstruction using a deep neural network called FireNet. We use the deep convolutional neural network YOLOv4 as a detector. For the individual representations, we obtain a detection accuracy in the range of 86.9-88.9% mAP@0.5. A fusion of the considered representations yields a detector with a higher accuracy of 89.9% mAP@0.5. In comparison, the detector for the frames reconstructed with FireNet achieves an accuracy of 72.67% mAP@0.5. The results obtained illustrate the potential of event cameras in automotive applications, either as standalone sensors or in close cooperation with typical frame-based cameras. © 2022, CC BY.

Keywords: Cameras
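
Two of the event representations named in the abstract above can be sketched in a few lines. The event tuple layout, the polarity encoding, and the decay constant `tau` are assumptions for illustration. Exact definitions differ between papers, and the paper's own formulas are not reproduced here.

```python
import math


def event_frame(events, width, height):
    """Per-pixel polarity of the most recent event (0 where no event occurred)."""
    frame = [[0] * width for _ in range(height)]
    for x, y, t, polarity in events:          # events assumed sorted by timestamp t
        frame[y][x] = 1 if polarity else -1
    return frame


def time_surface(events, width, height, t_now, tau):
    """Exponentially decaying time surface: exp(-(t_now - t_last) / tau) per pixel."""
    last = [[None] * width for _ in range(height)]
    for x, y, t, _ in events:
        if t <= t_now:
            last[y][x] = t if last[y][x] is None else max(last[y][x], t)
    return [[0.0 if t is None else math.exp(-(t_now - t) / tau) for t in row]
            for row in last]


# Hypothetical events: (x, y, timestamp in seconds, polarity flag).
events = [(0, 0, 0.000, True), (1, 0, 0.005, False), (0, 0, 0.010, True)]
frame = event_frame(events, width=2, height=1)
surface = time_surface(events, width=2, height=1, t_now=0.010, tau=0.005)
```

Pixels with recent events get values close to 1 in `surface`, while older activity fades towards 0. The resulting arrays can be fed to an image-based detector in the same way as ordinary frames.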

Optimisation of a Siamese Neural Network for Real-Time Energy Efficient Object Tracking

International Conference on Computer Vision and Graphics, ICCVG 2020

Authors: Przewlocka, Dominika; Wasala, Mateusz; Szolc, Hubert; Blachut, Krzysztof; Kryjak, Tomasz (Embedded Vision Systems Group, Computer Vision Laboratory, Department of Automatic Control and Robotics, AGH University of Science and Technology, Krakow, Poland)

ISBN: (digital) 9783030590062

ISBN: (print) 9783030590055

This paper presents research on the optimisation of visual object tracking using a Siamese neural network for embedded vision systems. It was assumed that the solution should operate in real time, preferably on a high-resolution video stream, with the lowest possible energy consumption. To meet these requirements, techniques such as the reduction of computational precision and pruning were considered. Brevitas, a tool dedicated to the optimisation and quantisation of neural networks for FPGA implementation, was used. A number of training scenarios were tested with varying levels of optimisation – from uniform integer quantisation with 16 bits to ternary and binary networks. Next, the influence of these optimisations on the tracking performance was evaluated. It was possible to reduce the size of the convolutional filters by up to 10 times in relation to the original network. The obtained results indicate that quantisation can significantly reduce the memory and computational complexity of the proposed network while still enabling precise tracking, thus allowing its use in embedded vision systems. Moreover, quantisation of weights positively affects network training by decreasing overfitting. © 2020, Springer Nature Switzerland AG.

Keywords: Field programmable gate arrays (FPGA)
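
The abstract above covers a range from uniform integer quantisation down to ternary and binary weights. The sketch below shows what these three mappings do to a list of weights in plain Python. It does not use the Brevitas API, and the max-abs scale and the ternary `threshold` are assumptions chosen for illustration.

```python
def quantise_uniform(weights, bits):
    """Symmetric uniform quantisation to signed integers with a shared max-abs scale."""
    levels = 2 ** (bits - 1) - 1                 # e.g. 7 for 4 bits, 127 for 8 bits
    peak = max(abs(w) for w in weights)
    scale = peak / levels if peak else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale                          # each w is approximated by code * scale


def quantise_ternary(weights, threshold):
    """Map each weight to -1, 0 or +1; `threshold` is a hypothetical tuning parameter."""
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def quantise_binary(weights):
    """Keep only the sign of each weight."""
    return [1 if w >= 0 else -1 for w in weights]


weights = [0.42, -0.07, 0.0, -0.91, 0.13]
codes, scale = quantise_uniform(weights, bits=4)
restored = [c * scale for c in codes]
ternary = quantise_ternary(weights, threshold=0.1)
binary = quantise_binary(weights)
```

With 4 bits each weight is stored as an integer between -7 and 7 plus one shared scale, and the reconstruction error per weight stays within half a quantisation step. The ternary and binary variants trade more accuracy for even smaller storage.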

A Vision Based Hardware-Software Real-Time Control System for the Autonomous Landing of an UAV

International Conference on Computer Vision and Graphics, ICCVG 2020

Authors: Blachut, Krzysztof; Szolc, Hubert; Wasala, Mateusz; Kryjak, Tomasz; Gorgon, Marek (Embedded Vision Systems Group, Computer Vision Laboratory, Department of Automatic Control and Robotics, AGH University of Science and Technology, Krakow, Poland)

ISBN: (digital) 9783030590062

ISBN: (print) 9783030590055

In this paper, we present a vision based hardware-software control system enabling the autonomous landing of a multirotor unmanned aerial vehicle (UAV). It allows for the detection of a marked landing pad in real time for a 1280 × 720 @ 60 fps video stream. In addition, a LiDAR sensor is used to measure the altitude above ground. A heterogeneous Zynq SoC device is used as the computing platform. The solution was tested on a number of sequences and the landing pad was detected with 96% accuracy. This research shows that a reprogrammable heterogeneous computing system is a good solution for UAVs because it enables real-time data stream processing with relatively low energy consumption. © 2020, Springer Nature Switzerland AG.

Keywords: System-on-chip
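
The video format quoted in the abstract above (1280 by 720 pixels at 60 frames per second) implies a fixed time budget for real-time processing. The short calculation below makes it explicit. It counts active pixels only and ignores blanking intervals, which is a simplifying assumption, so the pixel clock of an actual video interface would be higher.

```python
WIDTH, HEIGHT, FPS = 1280, 720, 60

pixels_per_frame = WIDTH * HEIGHT
pixels_per_second = pixels_per_frame * FPS       # active pixels only, blanking ignored
time_per_pixel_ns = 1e9 / pixels_per_second      # average budget per pixel
frame_period_ms = 1000 / FPS                     # time available per frame
```

This gives 921,600 pixels per frame, about 55.3 million pixels per second, roughly 18 ns per pixel, and about 16.7 ms per frame.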