ISBN: 9780738133669 (print)
The Vector of Locally Aggregated Descriptors (VLAD) method, which aggregates local descriptors into a compact image representation, has achieved great success in image classification and retrieval. However, the original VLAD method uses a hard assignment strategy that assigns each descriptor only to its nearest visual word in the dictionary, which leads to large quantization error. In this paper, an improved VLAD based on adaptive bases and saliency weights is proposed to solve this problem. The new method considers the local density distribution when assigning local descriptors, adaptively selects several nearest-neighbor visual words, and uses the coding coefficients obtained from saliency as the weights of the selected visual words. Experimental results on the Corel 10, 15 Scenes and UIUC Sports Event datasets show that the proposed coding method achieves better classification performance than five existing VLAD-based methods and two commonly used representation methods.
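To make the hard assignment step concrete, the following is a minimal sketch of the original VLAD encoding only, not of the improved method proposed in the paper. The function name `vlad_encode`, the toy codebook, and the final L2 normalization are assumptions chosen for illustration.

```python
import math

def vlad_encode(descriptors, codebook):
    """Hard-assignment VLAD: sum the residuals of each descriptor
    to its single nearest visual word, then L2-normalize."""
    dim = len(codebook[0])
    residuals = [[0.0] * dim for _ in codebook]
    for d in descriptors:
        # Hard assignment: only the nearest visual word receives the residual.
        nearest = min(range(len(codebook)),
                      key=lambda k: math.dist(d, codebook[k]))
        for j in range(dim):
            residuals[nearest][j] += d[j] - codebook[nearest][j]
    # Concatenate the per-word residual sums into one vector.
    flat = [v for row in residuals for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Toy data (hypothetical): two visual words, three 2-D descriptors.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
encoding = vlad_encode(descriptors, codebook)
```

Because every descriptor contributes to exactly one visual word, a descriptor lying between two words loses information about the second one; this is the quantization error that assigning to several nearest words is meant to reduce.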
Brain-computer interface (BCI) is a domain in which a person can send information without using any peripheral nerves or muscles, relying only on brain signals recorded as electroencephalography (EEG) signals. Multiview learning, also called data integration or data fusion across different sets of features, is an emerging approach in machine learning that improves generalization performance by combining knowledge from multiple views. Multiview learning has progressed rapidly in recent years and also faces many new challenges. It suits the BCI domain because the EEG signal can be represented meaningfully in many ways. This study used the multiview ensemble learning (MEL) approach for the binary classification of five mental tasks on six subjects individually, using a well-known EEG database (the Keirn and Aunon database). The EEG signal was decomposed using four methods: wavelet transform (WT), empirical mode decomposition (EMD), empirical wavelet transform (EWT), and fuzzy C-means followed by EWT (FEWT). A feature coding technique was then applied, using parametric feature formation from the decomposed signal. This yields four views, each used to train an independent base classifier of the same type, and predictions are made in an ensemble manner. The study was performed independently with three types of base classifier: K-nearest neighbor (KNN) and support vector machine (SVM) with linear and non-linear kernels. The ten pairwise combinations of mental tasks were evaluated with the three resulting MEL-based classifiers. For reliability of the results, 10-fold cross-validation was used. The proposed algorithm shows a promising accuracy of 80% to 100% for binary pairwise classification of mental tasks. (c) 2020 Elsevier B.V. All rights reserved.
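The count of ten task combinations follows from choosing 2 of 5 tasks. The sketch below shows that arithmetic and one possible way to combine four per-view predictions. The abstract does not state the combination rule, so majority voting with a first-seen tie-break is an assumption, and the task names and `ensemble_predict` are placeholders.

```python
from collections import Counter
from itertools import combinations

# Placeholder task names; the abstract does not list the five tasks.
TASKS = ["task_1", "task_2", "task_3", "task_4", "task_5"]

# Every unordered pair of tasks defines one binary classification problem.
task_pairs = list(combinations(TASKS, 2))

def ensemble_predict(view_predictions):
    """Majority vote over the labels predicted from each view.
    Assumption: a tie is resolved in favor of the label seen first."""
    counts = Counter(view_predictions)
    best = max(counts.values())
    for label in view_predictions:
        if counts[label] == best:
            return label

# One label per view, in the order WT, EMD, EWT, FEWT (hypothetical outputs).
decision = ensemble_predict(["task_1", "task_2", "task_1", "task_1"])
```

With four views and two classes a 2-2 tie is possible, so any real implementation needs an explicit tie-breaking rule or a weighted vote.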
ISBN: 9781538692141 (print)
Skeleton information in videos plays an important role in human-centric video analysis, but the effective coding of such massive skeleton data has not been addressed in previous work. In this paper, we make the first attempt to solve this problem by proposing a multimodal skeleton coding tool containing three coding schemes: a spatial differential-coding scheme, a motion-vector-based differential-coding scheme and an inter prediction scheme. Together they exploit both spatial and temporal redundancy to compress skeleton data losslessly. More importantly, the schemes are switched appropriately for different types of skeletons in video frames, which further improves the compression rate. Experimental results show that our approach achieves size reductions of 74.4% on our surveillance sequences and 54.7% on the overall test sequences, which demonstrates the effectiveness of our skeleton coding tool.
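The abstract does not specify how the differential schemes are defined, so the following is only a generic sketch of lossless delta coding applied spatially (joint to joint) and temporally (frame to frame). Integer pixel coordinates, the joint ordering, and all function names are assumptions.

```python
def spatial_delta_encode(joints):
    """Keep the first joint, then store each joint as an offset
    from the previous joint in the list."""
    encoded = [joints[0]]
    for prev, cur in zip(joints, joints[1:]):
        encoded.append((cur[0] - prev[0], cur[1] - prev[1]))
    return encoded

def spatial_delta_decode(encoded):
    """Invert spatial_delta_encode by accumulating the offsets."""
    joints = [encoded[0]]
    for dx, dy in encoded[1:]:
        px, py = joints[-1]
        joints.append((px + dx, py + dy))
    return joints

def temporal_delta_encode(prev_frame, cur_frame):
    """Store each joint as an offset from the same joint in the previous frame."""
    return [(c[0] - p[0], c[1] - p[1]) for p, c in zip(prev_frame, cur_frame)]

# Hypothetical three-joint skeleton in two consecutive frames.
frame_0 = [(120, 80), (122, 95), (118, 110)]
frame_1 = [(121, 80), (123, 96), (118, 111)]

spatial = spatial_delta_encode(frame_0)
temporal = temporal_delta_encode(frame_0, frame_1)
restored = spatial_delta_decode(spatial)
```

The offsets are typically much smaller than the raw coordinates, which is what lets a subsequent entropy coder spend fewer bits on them. Since only integer additions and subtractions are involved, decoding reproduces the input exactly.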
ISBN: 9781450368896 (print)
With the unprecedented success of deep learning in computer vision tasks, many cloud-based visual analysis applications are powered by deep learning models. However, deep learning models are also characterized by high computational complexity and are task-specific, which may hinder large-scale deployment under conventional data communication paradigms. To achieve a better balance among bandwidth usage, computational load and generalization capability for cloud-end servers, we propose to compress and transmit intermediate deep learning features instead of either visual signals or the ultimately utilized features. The proposed strategy also provides a promising path toward the standardization of deep feature coding. As a first attempt at this problem, we present a lossy compression framework and evaluation metrics for intermediate deep feature compression. Comprehensive experimental results show the effectiveness of the proposed methods and the feasibility of the proposed data transmission strategy. It is worth mentioning that the proposed compression framework and evaluation metrics have been adopted into the ongoing AVS (Audio Video Coding Standard Workgroup) Visual Feature Coding Standard.
ISBN: 9781467325332; 9781467325349 (print)
An effective image representation is important to an image classification task. The most popular image representation framework uses a feature coding algorithm to encode the extracted low-level feature descriptors into a vector representation. In this paper, we analyze recently developed feature coding methods in a general way. Based on their common characteristics, we propose a new coding scheme that performs feature coding on the vector difference in a high-dimensional space obtained by explicit feature maps. As we illustrate, our method gives promising results with small codebook sizes and generalizes most existing coding methods in a unified form.
ISBN: 9781479970612 (print)
Numerous real-time applications in computer vision rely on finding correspondences between local binary features. In many mobile scenarios, the visual information captured at a sensor node needs to be transmitted to a processing server, which is capable of storing the visual information or executing a complex analysis task. However, not all of the visual information necessarily needs to be transmitted. In this paper, we present a rate allocation scheme that categorizes features into classes according to their usefulness and selects the amount of data spent on each class to maximize the overall performance of a computer vision task. We demonstrate the approach using ORB, BRISK, and FREAK features and show the improvements on a homography estimation task.
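Correspondences between binary features are commonly found by comparing Hamming distances. The sketch below illustrates that matching step only, not the rate allocation scheme of the paper. Descriptors are modeled as Python integers, and the 8-bit toy values, the `max_distance` threshold, and the function names are assumptions (real ORB, BRISK, and FREAK descriptors are much longer).

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary descriptors differ."""
    return bin(a ^ b).count("1")

def match_features(query, database, max_distance):
    """For each query descriptor, find the nearest database descriptor
    and keep the pair if it is within max_distance bits."""
    matches = []
    for qi, q in enumerate(query):
        best = min(range(len(database)), key=lambda i: hamming(q, database[i]))
        if hamming(q, database[best]) <= max_distance:
            matches.append((qi, best))
    return matches

# Toy 8-bit descriptors (hypothetical).
query = [0b10110010, 0b00001111]
database = [0b00001110, 0b10110011, 0b11110000]
matches = match_features(query, database, max_distance=2)
```

Each returned pair is (query index, database index). Matched pairs of this kind are the input to a homography estimation step.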
ISBN: 9781538679753 (print)
Shared e-cars have become an important part of traffic in scenic spots. While they bring convenience to tourists, they also affect the management of the traffic environment in scenic spots. The entry threshold for shared e-cars is currently low, and business operators compete for market share in scenic spots by deploying shared e-cars without planning. This takes up limited space and exacerbates congestion. Accurately predicting the number of shared e-cars that a scenic spot should supply in the near future is therefore of great significance for operators to schedule resources, reduce operating costs, and create an open and coordinated traffic environment in smart scenic spots. However, traditional time-series prediction models such as the Autoregressive Moving Average model, Holt-Winters, and Long Short-Term Memory can only be used for rough short-term predictions and are not applicable under special circumstances such as holidays or rush hours. In our work, we propose EB-Boost, an ensemble learning method that uses target-based feature coding. EB-Boost learns from historical data such as weather data, timestamp data and business data, establishes a probability model, and learns the relationship between features and targets. We used data on shared e-cars operated by Roboy Technology company to build a model, and compared the prediction results of EB-Boost with traditional time-series algorithms and neural network algorithms. Finally, we discuss the robustness of the model and its predictive performance for shared e-cars in other scenic spots.
Saliency detection has been applied to target acquisition. This paper proposes a two-dimensional hidden Markov model (2D-HMM) that exploits the hidden semantic information of an image to detect its salient regions. A spatial pyramid histogram of oriented gradient descriptors is used to extract features. After the image is encoded with a learned dictionary, the 2D-Viterbi algorithm is applied to infer the saliency map. The model can predict fixations on targets and produces robust and effective depictions of changes in a target's posture and viewpoint. To validate the model against the human visual search mechanism, two eye-tracking experiments are used to train the model directly from eye movement data. The results show that our model achieves better performance than visual attention models. Moreover, they indicate the plausibility of using visual tracking data to identify targets. (c) 2018 SPIE and IS&T
Color descriptors of an image are the most widely used visual features in content-based image retrieval systems. In this study, we present a novel color-based image retrieval framework that integrates color space quantization and feature coding. Although color features have advantages such as robustness and simple extraction, direct processing of the abundant color information in an RGB image is a challenging task. To overcome this problem, a color space clustering quantization algorithm is proposed to obtain the clustering color space (CCS) by clustering the CIE 1976 L*a*b* space into 256 distinct colors, which adequately accommodate human visual perception. In addition, a new feature coding method called feature-to-character coding (FCC) is proposed to encode the block-based main color features into character codes. In this method, images are represented by character codes, which makes it possible to build an inverted index efficiently from color features and to use text-based search engines. Benefiting from its high-efficiency computation, the proposed framework can also be applied to large-scale web image retrieval. The experimental results demonstrate that the proposed system produces a significant improvement in performance compared with block-based main color image retrieval systems that use the traditional HSV (hue, saturation, value) quantization method.
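As a rough illustration of how character codes enable a text-style inverted index, consider the sketch below. The abstract does not define the FCC code format, so the token scheme (block position plus a two-digit hexadecimal color index in the range 0 to 255), the image identifiers, and the function names are all hypothetical.

```python
from collections import defaultdict

def blocks_to_codes(main_color_indices):
    """Turn each block's main color index (0-255) into a text token
    that also records the block position. Hypothetical code format."""
    return [f"b{pos}c{idx:02x}" for pos, idx in enumerate(main_color_indices)]

def build_inverted_index(images):
    """Map every token to the set of images that contain it."""
    index = defaultdict(set)
    for image_id, indices in images.items():
        for token in blocks_to_codes(indices):
            index[token].add(image_id)
    return index

# Two toy images, each split into four blocks (hypothetical data).
images = {
    "img_a": [12, 200, 200, 7],
    "img_b": [12, 31, 200, 7],
}
index = build_inverted_index(images)
codes_a = blocks_to_codes(images["img_a"])
```

Once images are expressed as tokens, a lookup such as `index["b0c0c"]` returns all images sharing that block color, which is the same operation a text search engine performs for a query word.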
This paper proposes a high-performance image retrieval framework that combines an improved SIFT (Scale-Invariant Feature Transform) feature extraction algorithm, improved feature matching, improved Fisher feature coding and an improved Gaussian Mixture Model (GMM). To address the slow convergence of the traditional GMM algorithm, an improved GMM is proposed. It initializes the GMM using an online K-means clustering method, which improves convergence speed. In addition, when the model is updated, storage space is saved by improving the matching rules and the criteria for generating new Gaussian distributions. To address the high dimensionality, slow matching speed and low matching rate of the SIFT algorithm, an improved SIFT algorithm is proposed. It preserves the invariance of SIFT to blur, compression, rotation and scaling while improving the matching speed, and the correct match rate increases by 40% to 55% on average. Experiments on the recently released VOC 2012 database and on a database of 20 object categories containing 230,800 images show that the framework has high precision and recall rates and a shorter query time. Compared with the standard image retrieval framework, the improved framework can detect the moving target quickly and effectively and has better robustness.
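The idea of seeding a GMM from online K-means can be sketched as follows. This is a generic sequential K-means on 1-D data, not the authors' algorithm: seeding the centers from the first k points, the running-mean update, and deriving mixture weights from cluster counts are assumptions made for illustration.

```python
def online_kmeans(points, k):
    """Sequential K-means: each new point moves its nearest center
    toward itself by a step of 1 / (points seen by that center)."""
    centers = [float(p) for p in points[:k]]  # seed with the first k points
    counts = [1] * k
    for x in points[k:]:
        j = min(range(k), key=lambda i: abs(x - centers[i]))
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]  # running mean update
    return centers, counts

# Toy 1-D data with two obvious groups (hypothetical).
points = [0.0, 10.0, 1.0, 11.0, 2.0, 9.0]
centers, counts = online_kmeans(points, 2)

# Initial GMM parameters derived from the clustering:
# means from the centers, mixture weights from the cluster sizes.
initial_means = centers
initial_weights = [c / len(points) for c in counts]
```

Starting expectation-maximization from means that already sit near the data clusters usually requires fewer iterations than a random start, which is consistent with the convergence improvement the abstract describes.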