ISBN (print): 9781479913329
Video-based face recognition has attracted a great deal of attention in recent years due to its wide applications. The challenge of video-based face recognition comes from several aspects. First, video data involves many frames, which increases data size and processing complexity. Second, key frames extracted from videos usually have high intra-personal discrepancy due to variations in expression, pose, and illumination. To address these problems, we propose a novel semantic-based subspace model to improve the performance of video-based face recognition. The basic idea is to construct an appropriate low-dimensional subspace for each person, upon which a semantic model is built to classify the person's key frames into specific classes. After the semantic classification, the key frames belonging to the same class, i.e. sharing the same semantics, are used to train the linear classifiers for recognition. Extensive experiments on a large face video database (XM2VTS) show that our approach obtains a significant performance improvement over traditional approaches.
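The abstract does not specify the semantic model itself, but the underlying idea of a per-person low-dimensional subspace can be sketched as a generic nearest-subspace classifier. This is a minimal sketch under stated assumptions, not the authors' method: each person is represented by a mean vector and an orthonormal basis (assumed precomputed, e.g. by PCA), and a frame is assigned to the person whose subspace reconstructs it with the smallest residual. The names `project_residual` and `classify_frame` and the toy data are hypothetical.

```python
from math import sqrt


def project_residual(x, mean, basis):
    """Distance from x to the affine subspace mean + span(basis).

    Assumes `basis` is a list of orthonormal vectors (e.g. top PCA components).
    """
    centered = [xi - mi for xi, mi in zip(x, mean)]
    recon = [0.0] * len(x)
    for b in basis:
        coeff = sum(ci * bi for ci, bi in zip(centered, b))  # projection coefficient
        recon = [ri + coeff * bi for ri, bi in zip(recon, b)]
    return sqrt(sum((ci - ri) ** 2 for ci, ri in zip(centered, recon)))


def classify_frame(x, subspaces):
    """Return the identity whose subspace gives the smallest reconstruction error."""
    return min(subspaces, key=lambda pid: project_residual(x, *subspaces[pid]))


# Toy 3-D example with one basis vector per person (hypothetical data).
subspaces = {
    "alice": ([0.0, 0.0, 0.0], [[1.0, 0.0, 0.0]]),
    "bob": ([0.0, 0.0, 0.0], [[0.0, 1.0, 0.0]]),
}
print(classify_frame([2.0, 0.1, 0.0], subspaces))  # alice
```

In the described pipeline the key frames would additionally be grouped by semantic class before training linear classifiers; that step is not shown here.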
In this paper, a computationally efficient regression framework is presented for estimating the 6D pose of rigid objects from a single RGB-D image; the framework can also handle symmetric objects. This framework is desi...
Passenger flow prediction is vitally important for intelligent transportation systems (ITS). Most studies focus on passenger flow prediction for an individual station and capture only temporal features without considering spatial features. Constructing a passenger flow prediction model for multiple stations, or even a whole network, is more valuable for practical applications. Therefore, we develop a dynamic spatio-temporal network (DSTNet) with a self-attention (SA) mechanism for multi-station passenger flow prediction. A dynamic graph convolutional network (DGCN) is applied for spatial feature extraction, and a gated recurrent unit (GRU) is combined with it to learn temporal features. SA then assigns weights to the extracted spatio-temporal features. Experiments were conducted on passenger flow data from the Xiamen bus rapid transit (BRT) system. The results demonstrate that the proposed DSTNet with SA (SA-DSTNet) outperforms the baselines in the multi-station passenger flow prediction task.
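To make the self-attention step concrete, here is a minimal sketch of scaled dot-product self-attention that re-weights a sequence of feature vectors. It stands in only for the SA component: the DGCN and GRU that would produce these features are omitted, learned query/key/value projections are replaced by identity mappings (an assumption), and the function names and toy features are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention over T feature vectors.

    Assumption: queries, keys and values are the features themselves
    (identity projections); a real model would learn these projections.
    Returns the re-weighted features and the T x T attention weights.
    """
    d = len(features[0])
    outputs, weights = [], []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[j] * features[j][c] for j in range(len(features)))
                        for c in range(d)])
    return outputs, weights


# Hypothetical spatio-temporal features for three time steps of one station.
out, w = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Each row of `w` sums to one, so every output is a convex combination of the input features, weighted by how similar they are to the query step.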
ISBN (print): 9781479918065
In this paper, a new sequence matching algorithm called Exemplary Sequence Cardinality (ESC) is proposed. ESC combines several abilities of other sequence matching algorithms, e.g. DTW, SSDTW, CDP, FSM, MVM, and OSB. Depending on the application domain, ESC can be tuned to behave like these different sequence matching algorithms. Its generality and robustness come from its ability to find subsequences (as in CDP and SSDTW), to skip outliers inside the target sequence (as in MVM and FSM) and also inside the query sequence (as in OSB), and to establish many-to-one and one-to-many correspondences (as in DTW) between the elements of the query and target sequences. Its special characteristic of skipping noisy elements in the query sequence, along with the other aforementioned properties, gives it an edge over FSM. In the word spotting application, the outlier-skipping capability of ESC makes it less sensitive to local variations in the spelling of words, and also to noise present in the query and/or target word images. Due to its subsequence matching capability, the ESC algorithm can retrieve a query inside a line or a piece of a line. Finally, its multiple matching facilities (many-to-one and one-to-many matching) prove advantageous when the target and query sequences differ in length due to variability in scale, font, and type/size factors. By experimenting on printed historical document images, we have demonstrated the usefulness of the proposed ESC algorithm in specific cases where incorrect word segmentation and word-level local variations occur regularly.
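ESC itself is not specified here in enough detail to reproduce, but two of the abilities it inherits, subsequence matching (as in SSDTW/CDP) and many-to-one/one-to-many correspondences (as in DTW), can be illustrated with a standard subsequence DTW. This sketch is not ESC: it does not skip outliers in either sequence. The function name and the absolute-difference distance are assumptions for illustration.

```python
def subsequence_dtw(query, target, dist=lambda a, b: abs(a - b)):
    """Cost of the best DTW alignment of `query` to any contiguous part of `target`."""
    n, m = len(query), len(target)
    inf = float("inf")
    # D[i][j]: best cost of aligning query[:i] with a target segment ending at target[j-1].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        D[0][j] = 0.0  # the match may start anywhere in the target
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(query[i - 1], target[j - 1])
            D[i][j] = cost + min(
                D[i - 1][j - 1],  # one-to-one step
                D[i - 1][j],      # several query elements map to one target element
                D[i][j - 1],      # one query element maps to several target elements
            )
    return min(D[n][1:])  # the match may end anywhere in the target


# A query embedded in a longer target, with a repeated element in the query.
print(subsequence_dtw([1, 2, 2, 3], [9, 1, 2, 3, 9]))  # 0.0
```

Because every query element must be aligned to something, a single noisy element in the query still adds its full cost; removing that limitation is what the query-side outlier skipping of ESC (as in OSB) addresses.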
In this paper, pelage pattern matching is used to address the individual re-identification of Saimaa ringed seals. Animal re-identification, together with the access to a large amount of image material through ...
In region-based image annotation, keywords are usually associated with images instead of individual regions in the training data set. This poses a major challenge for any learning strategy. In this paper, we formulate image annotation as a supervised learning problem under Multiple-Instance Learning (MIL) framework. We present a novel Asymmetrical Support Vector Machine-based MIL algorithm (ASVM-MIL), which extends the conventional Support Vector Machine (SVM) to the MIL setting by introducing asymmetrical loss functions for false positives and false negatives. The proposed ASVM-MIL algorithm is evaluated on both image annotation data sets and the benchmark MUSK data sets.
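The core idea of penalizing false positives and false negatives differently can be illustrated with a cost-weighted hinge loss evaluated at the bag level. This is a minimal sketch, not the ASVM-MIL formulation: the max-instance bag score, the linear model, the cost values `c_fp`/`c_fn`, and all function names are assumptions.

```python
def bag_score(instances, w, b):
    """Score a bag by its highest-scoring instance under a linear model w.x + b.

    The max rule is the usual MIL assumption (a positive bag has at least one
    positive instance); it is not taken from the abstract.
    """
    return max(sum(wi * xi for wi, xi in zip(w, x)) + b for x in instances)


def asymmetric_hinge(score, label, c_fp=1.0, c_fn=2.0):
    """Hinge loss with separate costs for the two error types.

    label is +1 (positive bag) or -1 (negative bag). A margin violation on a
    positive bag risks a false negative (cost c_fn); on a negative bag it risks
    a false positive (cost c_fp). Both cost values are hypothetical.
    """
    violation = max(0.0, 1.0 - label * score)
    return (c_fn if label == 1 else c_fp) * violation


def total_loss(bags, labels, w, b, c_fp=1.0, c_fn=2.0):
    """Sum of asymmetric hinge losses over all bags."""
    return sum(asymmetric_hinge(bag_score(bag, w, b), y, c_fp, c_fn)
               for bag, y in zip(bags, labels))
```

With `c_fn > c_fp`, the optimizer is pushed harder to keep positive bags above the margin than to keep negative bags below it; swapping the costs reverses that preference.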
ISBN (digital): 9798350353006
ISBN (print): 9798350353013
Few-shot font generation (FFG) produces stylized font images from a limited number of reference samples, which can significantly reduce labor costs in manual font design. Most existing FFG methods follow the style-content disentanglement paradigm and employ a Generative Adversarial Network (GAN) to generate target fonts by combining the decoupled content and style representations. These methods generate the complicated structure and the detailed style simultaneously, which may be suboptimal for the FFG task. Inspired by the manual font design process of most expert designers, in this paper we model font generation as a multi-stage generative process. Specifically, since the injected noise and the data distribution in diffusion models can be well separated into different sub-spaces, we are able to incorporate the font transfer process into these models. Based on this observation, we generalize diffusion methods to model the font generative process by separating the reverse diffusion process into three stages with different functions: the structure construction stage first generates the structure information for the target character based on the source image, and the font transfer stage subsequently transforms the source font into the target font. Finally, the font refinement stage enhances the appearance and local details of the target font images. Based on this multi-stage generative process, we construct our font generation framework, named MSD-Font, with a dual-network approach to generate font images. Its superior performance demonstrates the effectiveness of our model. The code is available at: https://***/fubinfbIMSD-Font.
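The three-stage split of the reverse diffusion process can be pictured as a scheduler that routes each denoising step to a stage-specific model. This sketch only shows that control flow under assumed stage boundaries; the denoisers are placeholders, the dual-network design is not modelled, and all names (`stage_for_timestep`, `reverse_process`, `make_denoiser`) are hypothetical.

```python
def stage_for_timestep(t, T, boundaries=(0.7, 0.3)):
    """Map a reverse-diffusion timestep t (running from T down to 1) to a stage.

    `boundaries` are hypothetical fractions of T; the abstract does not state
    where one stage ends and the next begins.
    """
    frac = t / T
    if frac > boundaries[0]:
        return "structure_construction"
    if frac > boundaries[1]:
        return "font_transfer"
    return "font_refinement"


def reverse_process(x_T, T, denoisers):
    """Run the reverse process, dispatching each step to its stage's denoiser."""
    x = x_T
    for t in range(T, 0, -1):
        x = denoisers[stage_for_timestep(t, T)](x, t)
    return x


# Placeholder denoisers that only count how many steps each stage ran.
counts = {"structure_construction": 0, "font_transfer": 0, "font_refinement": 0}


def make_denoiser(name):
    def step(x, t):
        counts[name] += 1
        return x  # a real denoiser would predict and remove noise here
    return step


reverse_process(0.0, 10, {name: make_denoiser(name) for name in counts})
print(counts)
```

Keeping the routing separate from the denoisers makes it easy to move the stage boundaries without touching the models themselves.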
The EEG is a measure of voltage as a function of time. The voltage of the EEG determines its amplitude (measured from peak to peak). EEG amplitudes in the cortex range from 500 to 1500 μV, but the amplitude...
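Peak-to-peak amplitude, as mentioned above, is the difference between the highest and lowest voltage in a segment. A minimal sketch, assuming the samples are already in microvolts; the function name and sample values are illustrative.

```python
def peak_to_peak(samples_uv):
    """Peak-to-peak amplitude of an EEG segment: maximum minus minimum voltage (μV)."""
    return max(samples_uv) - min(samples_uv)


segment = [-20.0, 5.0, 30.0, -10.0]  # hypothetical samples in μV
print(peak_to_peak(segment))  # 50.0
```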
Transformer-based methods have shown impressive performance in low-level vision tasks, such as image super-resolution. However, through attribution analysis we find that these networks can only utilize a limited spatial range of input information. This implies that the potential of the Transformer is still not fully exploited in existing networks. To activate more input pixels for better reconstruction, we propose a novel Hybrid Attention Transformer (HAT). It combines channel attention and window-based self-attention schemes, thus making use of their complementary advantages: the ability to utilize global statistics and strong local fitting capability. Moreover, to better aggregate cross-window information, we introduce an overlapping cross-attention module to enhance the interaction between neighboring window features. In the training stage, we additionally adopt a same-task pre-training strategy to exploit the potential of the model for further improvement. Extensive experiments show the effectiveness of the proposed modules, and we further scale up the model to demonstrate that the performance of this task can be greatly improved. Our overall method significantly outperforms state-of-the-art methods by more than 1 dB.
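The difference between the non-overlapping windows used for window-based self-attention and the enlarged, overlapping windows that let neighboring windows interact can be sketched on a small 2-D grid. This is an illustration under assumptions, not HAT's implementation: queries would come from the m x m windows and keys/values from the larger windows, the overlap size and zero padding at borders are assumed, and the function names are hypothetical.

```python
def partition_windows(grid, m):
    """Split an H x W grid into non-overlapping m x m windows (H and W divisible by m)."""
    h, w = len(grid), len(grid[0])
    return [[row[j:j + m] for row in grid[i:i + m]]
            for i in range(0, h, m) for j in range(0, w, m)]


def overlapping_windows(grid, m, overlap):
    """(m + 2*overlap)-sized windows around the same positions, zero-padded at borders.

    Each enlarged window also covers `overlap` pixels of its neighbors, which is
    what allows information to flow across window boundaries.
    """
    h, w = len(grid), len(grid[0])
    size = m + 2 * overlap

    def at(r, c):
        return grid[r][c] if 0 <= r < h and 0 <= c < w else 0

    return [[[at(i - overlap + r, j - overlap + c) for c in range(size)]
             for r in range(size)]
            for i in range(0, h, m) for j in range(0, w, m)]


grid = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4 x 4 feature map
query_windows = partition_windows(grid, 2)                # four 2 x 2 windows
kv_windows = overlapping_windows(grid, 2, 1)              # four 4 x 4 windows
```

The query windows still tile the map exactly once, so the output resolution is unchanged, while each key/value window sees a border of neighboring pixels.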