ISBN: 0818684976 (print)
We have designed and implemented a system for real-time detection of 2-D features on a reconfigurable computer based on Field Programmable Gate Arrays (FPGAs). We envision this device as the front end of a system able to track image features in real-time control applications like autonomous vehicle navigation. The algorithm employed to select good features is inspired by Tomasi and Kanade's method. Compared to the original method, the algorithm that we have devised does not require any floating point or transcendental operations, and can be implemented either in hardware or in software. Moreover, it maps efficiently into a highly pipelined architecture, well suited to implementation in FPGA technology. We have implemented the algorithm on a low-cost reconfigurable computer and have observed reliable operation on an image stream generated by a standard NTSC video camera at 30 Hz.
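The abstract does not spell out how the floating-point operations are avoided, so the following is only a sketch of one way a Tomasi-Kanade style test can be done in integer arithmetic, not the authors' formulation. It assumes the usual criterion that the smaller eigenvalue of the 2x2 gradient matrix [[a, b], [b, c]] must exceed a threshold t, and rewrites it as `a > t and (a - t) * (c - t) > b * b`, which needs no square root. The function names, the central-difference gradients, and the window radius `r` are all illustrative assumptions.

```python
def min_eig_exceeds(a, b, c, t):
    """True iff the smaller eigenvalue of [[a, b], [b, c]] is > t.

    Uses only integer compares and multiplies: M - t*I is positive
    definite exactly when a - t > 0 and det(M - t*I) > 0.
    """
    return a > t and (a - t) * (c - t) > b * b


def good_features(img, t, r=1):
    """Return (row, col) of pixels whose (2r+1)x(2r+1) window passes the test.

    img is a list of rows of integer intensities (hypothetical input format).
    """
    h, w = len(img), len(img[0])
    feats = []
    for y in range(r + 1, h - r - 1):
        for x in range(r + 1, w - r - 1):
            a = b = c = 0
            for j in range(y - r, y + r + 1):
                for i in range(x - r, x + r + 1):
                    gx = img[j][i + 1] - img[j][i - 1]  # horizontal central difference
                    gy = img[j + 1][i] - img[j - 1][i]  # vertical central difference
                    a += gx * gx
                    b += gx * gy
                    c += gy * gy
            if min_eig_exceeds(a, b, c, t):
                feats.append((y, x))
    return feats
```

Because every step is an add, multiply, or compare on integers, a test of this shape is the kind of computation that fits a fixed-point pipeline; the threshold `t` would have to be tuned to the gradient scale in use.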
Image processing and computer vision are natural applications for High Performance Computing (here considered to be general-purpose parallel supercomputing), but there are many barriers to its effective use in compute...
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
Adversarial Training (AT) is crucial for obtaining deep neural networks that are robust to adversarial attacks, yet recent works found that it could also make models more vulnerable to privacy attacks. In this work, we further reveal this unsettling property of AT by designing a novel privacy attack that is practically applicable to the privacy-sensitive Federated Learning (FL) systems. Using our method, the attacker can exploit AT models in the FL system to accurately reconstruct users' private training images even when the training batch size is large. Code is available at https://***/zjysteven/PrivayAttack_AT_FL.
ISBN: 0769523722 (print)
Dimensionality reduction via feature projection has been widely used in pattern recognition and machine learning. It is often beneficial to derive the projections not only based on the inputs but also on the target values in the training data set. This is of particular importance in predicting multivariate or structured outputs, which is an area of growing interest. In this paper we introduce a novel projection framework which is sensitive to both input features and outputs. Based on the derived features, prediction accuracy can be greatly improved. We validate our approach in two applications. The first is to model users' preferences on a set of paintings. The second application is concerned with image categorization where each image may belong to multiple categories. The proposed algorithm produces very encouraging results in both settings.
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
We present a novel approach for accelerating convolutions during inference for CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix multiplication (GEMM) with a matrix of weights. This results in two main drawbacks: (a) im2col requires a large memory buffer and can experience inefficient memory access, and (b) while GEMM is highly optimized for scientific matrix multiplications, it is not well suited for convolutions. We propose an approach that takes advantage of scalar-matrix multiplication and reduces memory overhead. Our experiments with commonly used network architectures demonstrate a significant speedup compared to existing indirect methods.
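To make the contrast concrete, here is a minimal single-channel sketch in plain Python. The first function follows the im2col-then-multiply route described above; the second accumulates each kernel weight times a shifted view of the image, which is one simple reading of "scalar-matrix multiplication" and is not claimed to be the paper's actual algorithm. The function names, the square-kernel restriction, and the "valid" (no padding) output size are assumptions made for illustration.

```python
def conv_im2col(img, ker):
    """Valid cross-correlation via im2col followed by a weights-times-matrix product."""
    h, w, k = len(img), len(img[0]), len(ker)
    oh, ow = h - k + 1, w - k + 1
    # im2col buffer: k*k rows, one column per output pixel (the large allocation).
    cols = [[img[y + dy][x + dx] for y in range(oh) for x in range(ow)]
            for dy in range(k) for dx in range(k)]
    wts = [ker[dy][dx] for dy in range(k) for dx in range(k)]
    flat = [sum(wts[r] * cols[r][c] for r in range(k * k))
            for c in range(oh * ow)]
    return [flat[y * ow:(y + 1) * ow] for y in range(oh)]


def conv_scalar_matrix(img, ker):
    """Same result without the buffer: each weight scales a shifted image window."""
    h, w, k = len(img), len(img[0]), len(ker)
    oh, ow = h - k + 1, w - k + 1
    out = [[0] * ow for _ in range(oh)]
    for dy in range(k):
        for dx in range(k):
            s = ker[dy][dx]  # one scalar weight
            for y in range(oh):
                for x in range(ow):
                    out[y][x] += s * img[y + dy][x + dx]
    return out
```

The im2col variant materializes k*k copies of the output-sized data before multiplying, while the second variant only ever holds the input and the output, which is the memory trade-off the abstract points to.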
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
We propose to model the persistent-transient duality in human behavior using a parent-child multi-channel neural network, which features a parent persistent channel that manages the global dynamics and child transient channels that are initiated and terminated on demand to handle detailed interactive actions. The short-lived transient sessions are managed by a proposed Transient Switch. The neural framework is trained to discover the structure of the duality automatically. Our model shows superior performance in human-object interaction motion prediction.
ISBN: 0769523722 (print)
Landuse classification is an important problem in the remote sensing field. It can be used in a wide range of applications. In this paper we propose a hybrid method fusing edge and region information for the landuse classification of multispectral images. It mainly includes the steps of image pre-processing, initial segmentation and region merging. In particular, a novel spatial mean shift procedure is proposed so that some information can be extracted and used in the successive steps. For multispectral image processing, we also design a band weighting strategy that adaptively gives a proper weight to each band according to the region to be processed. Experimental results on the Landsat TM and ETM+ images validate the performance of the proposed method.
ISBN: 0769521584 (print)
Extracting layers from video is very important for video representation, analysis, compression, and recognition. Assuming that a scene can be approximately described by multiple planar regions, this paper describes a robust novel approach to automatically extract a set of affine transformations induced by these regions, and accurately segment the scene into several motion layers. First, a number of seed regions are determined by using two-frame correspondences. Then the seed regions are expanded and refined using the level set representation and the graph cut method. Next, these initial regions are merged into several initial layers according to their motion similarity. Finally, after exploiting the occlusion order constraint on multiple frames, the robust layer extraction is obtained by a graph cut algorithm, and the occlusions between the overlapping layers are explicitly determined. Several examples are demonstrated in the experiments to show that our approach is effective and robust.
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
Trajectory prediction is an important task in autonomous driving. State-of-the-art trajectory prediction models often use attention mechanisms to model the interaction between agents. In this paper, we show that the attention information from such models can also be used to measure the importance of each agent with respect to the ego vehicle's future planned trajectory. Our experimental results on the nuPlans dataset show that our method can effectively find and rank surrounding agents by their impact on the ego's plan.
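As a rough illustration of ranking agents by attention, the sketch below takes the attention weights that the ego query assigns to each surrounding agent, averages them over heads, and sorts. The abstract does not say how attention is aggregated or which layer is used, so the head averaging, the input layout `attn[h][i]`, and the name `rank_agents` are all assumptions rather than the paper's procedure.

```python
def rank_agents(attn, agent_ids):
    """Rank agents by mean attention received from the ego query.

    attn[h][i] is the (hypothetical) attention weight of head h from the
    ego vehicle to agent i; agent_ids[i] names that agent.
    """
    n = len(agent_ids)
    # Average each agent's weight over all heads.
    mean = [sum(head[i] for head in attn) / len(attn) for i in range(n)]
    # Highest mean attention first.
    order = sorted(range(n), key=lambda i: mean[i], reverse=True)
    return [(agent_ids[i], mean[i]) for i in order]


# Usage with made-up weights for two heads and three agents.
ranking = rank_agents(
    [[0.125, 0.625, 0.25],
     [0.125, 0.375, 0.5]],
    ["car_a", "car_b", "ped_c"],
)
```

In a real model the weights would be read from a trained attention layer; the values here are placeholders chosen only to show the mechanics.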
ISBN: 9781665448994 (print)
We present a new state-of-the-art result on the text-to-video retrieval task on the MSRVTT and LSMDC benchmarks, where our model outperforms all previous solutions by a large margin. Moreover, state-of-the-art results are achieved using a single model and without finetuning. This multidomain generalisation is achieved by a proper combination of different video caption datasets. We show that with our practical approach, training on different datasets improves test results on each of them. Additionally, we check the intersection between many popular datasets and show that MSRVTT as well as ActivityNet contain a significant overlap between the test and the training parts. More details are available at https://***/papermsucode/mdmmt.