ISBN: 9781665475921 (print)
Computer vision tasks suffer from the high cost of collecting large amounts of labeled data. Few-shot learning (FSL) is a dominant approach to this problem because it offers a way to learn novel categories from only a few training samples. In FSL tasks, meta-learning and metric learning have achieved impressive results. However, performance is still limited by the large intra-class variance and small inter-class distance caused by the limited number of samples. To address this, we propose a new method that integrates meta-learning and metric learning techniques. Specifically, we first propose a feature representation (FR) module to construct representative support class prototypes and query features. Then, we design a bias loss to minimize the bias between support and query samples. Furthermore, we design an intra-class loss to minimize the distance between the query class prototype and each query sample. We denote this model as ML-FDA and validate it on standard few-shot classification benchmark datasets (miniImageNet, CIFAR-FS, FC100). The results show that our method improves on other methods of the same paradigm and achieves the best performance on most benchmarks. The ablation study and visualization analysis also demonstrate the effectiveness of our method.
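As a rough illustration of the prototype and intra-class ideas in the abstract above, the following NumPy sketch computes class prototypes as mean embeddings and an intra-class term as the mean squared distance between each query embedding and its own class prototype. The function names, the Euclidean distance, and the mean-based prototype are assumptions made for illustration; the abstract does not give the actual ML-FDA formulation.

```python
import numpy as np

def class_prototypes(features, labels, n_classes):
    """Mean embedding per class (assumed prototype definition).

    Every class in range(n_classes) must have at least one sample.
    """
    return np.stack([features[labels == c].mean(axis=0) for c in range(n_classes)])

def intra_class_loss(query_features, query_labels, n_classes):
    """Mean squared Euclidean distance from each query sample to its class prototype."""
    protos = class_prototypes(query_features, query_labels, n_classes)
    diffs = query_features - protos[query_labels]  # one prototype row per sample
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

# Toy example: two classes, two query embeddings each.
feats = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
labs = np.array([0, 0, 1, 1])
loss = intra_class_loss(feats, labs, n_classes=2)
```

A smaller value of this term means the query embeddings of each class sit closer to their prototype, which is the effect the abstract attributes to the intra-class loss.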
ISBN: 9781479902880 (print)
Compressive sensing imaging (CSI) is a new framework for image coding that acquires and compresses a scene simultaneously. The CS encoder efficiently shifts the bulk of the system complexity to the decoder. Ideally, implementation of CSI provides lossless compression in image coding. In this paper, we consider the lossy compression of the CS measurements in a CSI system. We design a universal quantizer for the CS measurements of any input image. The proposed method first establishes a universal probability model for the CS measurements in advance, without any information about the input image. A fast quantizer is then designed based on this model. Simulation results demonstrate that the proposed method achieves nearly optimal rate-distortion (R-D) performance while maintaining very low computational complexity at the CS encoder.
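To make the encoder-side pipeline concrete, here is a minimal sketch of taking random CS measurements and quantizing them with a fixed-step scalar quantizer. The Gaussian sensing matrix, the uniform mid-tread quantizer, and the function names are assumptions for illustration only; the abstract does not state which probability model or quantizer design the paper uses.

```python
import numpy as np

def cs_measure(x, m, rng):
    """Take m random Gaussian measurements of signal x (assumed sensing matrix)."""
    phi = rng.standard_normal((m, x.size)) / np.sqrt(m)
    return phi @ x

def uniform_quantize(y, step):
    """Mid-tread uniform scalar quantizer: integer indices and reconstructed values."""
    idx = np.round(y / step).astype(int)
    return idx, idx * step

rng = np.random.default_rng(0)
x = rng.standard_normal(64)          # stand-in for a vectorized image block
y = cs_measure(x, m=16, rng=rng)     # 16 measurements of a 64-sample signal
idx, y_hat = uniform_quantize(y, step=0.1)
```

Because the step size is fixed in advance and does not depend on the image, the encoder only performs one division and one rounding per measurement, which is in the spirit of the low encoder complexity the abstract emphasizes.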
ISBN: 9781728180687 (print)
As the short-video industry grows, quality assessment of user-generated videos has become a hot issue. Existing no-reference video quality assessment methods are not suitable for this application scenario since they target synthetic videos. In this paper, we propose a novel deep blind quality assessment model for user-generated videos that accounts for content variety and the temporal memory effect. Content-aware features of frames are extracted through a deep neural network, and a patch-based method is adopted to obtain the frame quality score. Moreover, we propose a temporal memory-based pooling model that considers the temporal memory effect to predict video quality. Experimental results on the KoNViD-1k and LIVE-VQC databases demonstrate that our proposed method outperforms other state-of-the-art methods, and the comparative analysis demonstrates the efficiency of our temporal pooling model.
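The idea of pooling frame scores with a memory effect can be sketched as follows. This is a simplified, hypothetical pooling rule (each frame score is blended with the worst score in a short preceding window), not the pooling model proposed in the paper; the `window` and `gamma` parameters are illustrative assumptions.

```python
import numpy as np

def memory_pooling(frame_scores, window=3, gamma=0.5):
    """Pool frame scores so that a recent quality drop lingers.

    Assumes a non-empty score sequence and window >= 1.
    """
    scores = np.asarray(frame_scores, dtype=float)
    pooled = np.empty_like(scores)
    for t in range(len(scores)):
        start = max(0, t - window)
        # Memory term: worst score among the preceding frames in the window.
        memory = scores[start:t].min() if t > 0 else scores[t]
        pooled[t] = gamma * memory + (1.0 - gamma) * scores[t]
    return float(pooled.mean())

video_score = memory_pooling([5, 5, 1, 5, 5], window=2, gamma=0.5)
```

For the scores above, the plain average is 4.2, while the memory-aware pooling gives a lower value because the single bad frame keeps influencing the two frames that follow it.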
ISBN: 9781728180687 (print)
The unmixing of hyperspectral data is a hot topic in the field of remote sensing. However, in the presence of various types of noise, especially noisy channels, the performance of unmixing approaches deteriorates seriously. Enhancing the robustness of unmixing methods is therefore a subject worth studying. This paper presents a robust unmixing method based on the recently proposed multilinear mixing model, where the $\ell_{2,1}$ norm is adopted in the loss function to suppress the influence of noise. The sparseness of the abundances is also considered to improve the parameter estimation. The resulting optimization problem is solved by the alternating direction method of multipliers (ADMM). Experiments on both synthetic and real images demonstrate the performance of the proposed unmixing strategy.
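For readers unfamiliar with the $\ell_{2,1}$ norm mentioned above, the sketch below computes it and its proximal operator, which is the kind of subproblem that typically appears in ADMM-based solvers. The convention that rows of the residual matrix index spectral bands is an assumption made here; the abstract does not specify the matrix layout or the ADMM splitting used in the paper.

```python
import numpy as np

def l21_norm(E):
    """Sum of the Euclidean norms of the rows of E."""
    return float(np.sum(np.linalg.norm(E, axis=1)))

def prox_l21(E, tau):
    """Proximal operator of tau * ||E||_{2,1}: shrink each row toward zero."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    # Rows with norm <= tau are set to zero; the others are scaled down.
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * E

E = np.array([[3.0, 4.0], [0.0, 0.0], [0.3, 0.4]])
value = l21_norm(E)           # 5 + 0 + 0.5
shrunk = prox_l21(E, tau=1.0)
```

The row-wise shrinkage zeroes whole rows at once, which is why this norm is a natural fit when entire channels, rather than isolated entries, are corrupted.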
ISBN: 9781467373869 (print)
Recent advances in mobile device technology have turned mobile phones into powerful devices with high-resolution cameras and fast processing capabilities. Having more user interaction potential than regular PCs, mobile devices with cameras can enable richer content-based object image queries: the user can capture multiple images of the query object from different viewing angles and at different scales, thereby providing much more information about the object to improve retrieval accuracy. The goal of this paper is to improve mobile image retrieval performance using multiple query images. To this end, we use the well-known bag-of-visual-words approach to represent the images, and employ early and late fusion strategies to utilize the information in multiple query images. With extensive experiments on an object image dataset with a single object per image, we show that multi-image queries yield higher average precision than single-image queries.
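The difference between early and late fusion of multiple query images can be sketched with bag-of-visual-words histograms as follows. The cosine similarity, the sum-based merging of query histograms, and the score averaging are common choices assumed here for illustration; the abstract does not say which specific fusion rules the paper evaluates.

```python
import numpy as np

def l2_normalize(h):
    """Scale each histogram (last axis) to unit Euclidean length."""
    h = np.asarray(h, dtype=float)
    return h / np.maximum(np.linalg.norm(h, axis=-1, keepdims=True), 1e-12)

def early_fusion_scores(query_hists, db_hists):
    """Merge the query histograms first, then score the database once."""
    fused = l2_normalize(np.sum(query_hists, axis=0))
    return l2_normalize(db_hists) @ fused

def late_fusion_scores(query_hists, db_hists):
    """Score the database once per query image, then average the score lists."""
    sims = l2_normalize(db_hists) @ l2_normalize(query_hists).T  # (n_db, n_query)
    return sims.mean(axis=1)

# Two views of the query object, each hitting a different visual word.
queries = np.array([[1, 0, 0], [0, 1, 0]])
database = np.array([[1, 1, 0], [0, 0, 1]])
early = early_fusion_scores(queries, database)
late = late_fusion_scores(queries, database)
```

In this toy case both strategies rank the first database image on top, but early fusion gives it a perfect score because the merged query covers both visual words, whereas each single view matches only partially.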
ISBN: 9781665475921 (print)
Proliferative Diabetic Retinopathy (PDR) is a serious retinal disease threatening diabetic patients. Intense retinal neovascularization in the retinal image is the most important clinical symptom of PDR, leading to visual distortion if not controlled. Accurate and timely detection of neovascularization from retinal images allows patients to receive adequate treatment and avoid further vision loss. In this work, we propose an automatic retinal neovascularization segmentation model based on an improved Pyramid Scene Parsing Network (PSP-Net). To improve the accuracy of the model, we introduce the proposed channel attention module into it. The network is evaluated with color fundus images from clinical practice. Evaluation results show that the network is superior to FCN, SegNet, U-Net and PSP-Net in accuracy and sensitivity. The model achieves accuracy, sensitivity, specificity, precision and Jaccard similarity scores of 0.9832, 0.9265, 0.9897, 0.9116 and 0.8501, respectively. Extensive experimental results indicate that the network model improves segmentation accuracy, can relieve the workload of doctors, and is worthy of further clinical promotion.
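The five reported metrics follow from the pixel-level confusion counts of a binary segmentation. The sketch below uses their standard definitions; the function names and the toy counts are illustrative and are not taken from the paper.

```python
import numpy as np

def confusion_counts(pred, truth):
    """Count TP, FP, TN, FN between two boolean masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    tn = int(np.sum(~pred & ~truth))
    fn = int(np.sum(~pred & truth))
    return tp, fp, tn, fn

def segmentation_metrics(tp, fp, tn, fn):
    """Standard binary segmentation metrics (assumes non-zero denominators)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall on vessel pixels
        "specificity": tn / (tn + fp),   # recall on background pixels
        "precision": tp / (tp + fp),
        "jaccard": tp / (tp + fp + fn),  # intersection over union
    }

metrics = segmentation_metrics(tp=8, fp=2, tn=85, fn=5)
```

Note that accuracy and specificity are dominated by the large background class in this kind of task, so sensitivity and the Jaccard score are usually the more informative figures for thin structures.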
ISBN: 9781479902880 (print)
In this paper, we propose an efficient multi-phase segmentation method for color images based on the piecewise constant multi-phase Vese-Chan model and the split Bregman method. The proposed model is first presented in a four-phase level set formulation and then extended to a multi-phase formulation. The four-phase and multi-phase energy functionals are defined, and the corresponding minimization problems of the proposed active contour model are presented. The split Bregman method is applied to minimize the multi-phase energy functional efficiently. The proposed model has been applied to synthetic and real color images with promising results, and numerical results demonstrate its advantages.
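In a four-phase piecewise constant formulation, two level set functions partition the image into four regions according to their signs, and each region is represented by its mean color. The sketch below shows only this labeling and mean-update step; the function names and the phase indexing are assumptions, and the split Bregman minimization itself is not shown.

```python
import numpy as np

def four_phase_labels(phi1, phi2):
    """Phase index 0..3 from the signs of two level set functions."""
    return 2 * (phi1 > 0).astype(int) + (phi2 > 0).astype(int)

def phase_mean_colors(image, labels, n_phases=4):
    """Mean color of each phase for an (H, W, C) image; NaN for empty phases."""
    means = np.full((n_phases, image.shape[-1]), np.nan)
    for k in range(n_phases):
        mask = labels == k
        if mask.any():
            means[k] = image[mask].mean(axis=0)
    return means

# 2x2 color image in which every pixel falls into a different phase.
image = np.arange(12, dtype=float).reshape(2, 2, 3)
phi1 = np.array([[1.0, 1.0], [-1.0, -1.0]])
phi2 = np.array([[1.0, -1.0], [1.0, -1.0]])
labels = four_phase_labels(phi1, phi2)
means = phase_mean_colors(image, labels)
```

With n level set functions the same sign-based encoding yields up to 2^n phases, which is how the four-phase case extends to the multi-phase one.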
ISBN: 0819444111 (print)
This paper proposes an algorithm for extracting the boundary of an object. In order to take full advantage of global shape, our approach uses global shape parameters derived from a Point Distribution Model (PDM). Unlike the PDM, the proposed method models global shape using curvature as well as edge information. The objective function for applying the shape model is formulated using Bayes' rule. The method extracts the boundary of an object by iteratively searching for the solution that maximizes the objective function. Experimental results show that the proposed method requires less computation than the PDM and is robust to noise, pose variation, and some occlusion.
ISBN: 0819444111 (print)
This paper proposes a classification of the parallelisms in general-purpose processor based systems into three main categories. The first category is intra-processor parallelism, which includes multimedia instructions and superscalar and VLIW architectures. The former take advantage of data parallelism; the latter benefit from instruction-level parallelism. The second category is inter-processor parallelism. We consider the parallelism between processors inside shared-memory symmetric multiprocessor systems and in distributed-memory clusters of workstations. In the last category, the main features of system-level parallelism are studied, including input/output operations, the memory hierarchy and the exploitation of external processing. The potential gain is studied for each type of parallelism available in general-purpose processor based systems, both from a theoretical point of view and for existing image and video applications. The results show that exploiting the different levels of parallelism available in PC workstations can lead to considerable gains in speed when optimizing a multimedia application. Finally, the results of this work can be used to influence the design of new multimedia systems and media processors.
ISBN: 9781665436496 (print)
Image captioning is the description of an image with natural language expressions using techniques from computer vision and natural language processing. Recent advances in smartphone hardware and processing power have led to the development of many image captioning applications. In this study, a novel automatic image captioning system based on the encoder-decoder approach that can run on smartphones is proposed. While high-level visual information is extracted with the ResNet152V2 convolutional neural network in the encoder, the proposed decoder transforms the extracted visual information into natural-language descriptions of the images. The proposed decoder, with its multilayer gated recurrent unit structure, generates more meaningful captions using the most relevant visual information. The proposed system has been evaluated using different performance metrics on the MSCOCO dataset, and it outperforms state-of-the-art approaches. The proposed system is also integrated with our custom-designed Android application, named IMECA, which, unlike similar applications, generates captions in offline mode. The aim is thus to make image captioning practical for more people.