ISBN: (digital) 9798350365856
ISBN: (print) 9798350365863
In a world where new products appear continually, novel waste types are ubiquitous. Because garbage types follow a long-tailed distribution, current image-based garbage classification systems struggle to perform well, and environmental sustainability urgently calls for efficient garbage classification that can detect new and rare wastes and learn new classes incrementally. We therefore propose a framework, the Online System of Garbage Image-Oriented Intelligent Classification, Submission, and Examination, to facilitate incremental garbage classification. Within this framework, to identify novel garbage effectively, we introduce a few-shot object detection method with two key algorithms: the Two-Stage Object Detection Learning Algorithm and the Dynamic Query-based Incremental Few-shot Learning Algorithm. Our experimental results show that both outperform existing methods on the MS COCO dataset. A Class-Incremental Learning-based Residual Network strategy is then proposed to meet the need for class-incremental learning of new waste types, and the experimental results support this strategy. Finally, we describe a prototype system that employs the above algorithms and strategy.
Momentum Contrast (MoCo) achieves great success for unsupervised visual representation learning. However, there are a lot of supervised and semi-supervised datasets, which are already labeled. To fully utilize the lab...
Structured pruning is a widely used technique for reducing the size of pre-trained language models (PLMs), but current methods often overlook the potential of compressing the hidden dimension (d) in PLMs, a dimension ...
The growth in data storage capacity and the increasing demands for high performance have created several challenges for concurrent indexing structures. One promising solution is learned indexes, which use a learning-b...
3D sketches are widely used for visually representing the 3D shape and structure of objects or scenes. However, the creation of 3D sketch often requires users to possess professional artistic skills. Existing research...
ISBN: (print) 9781713871088
The intensive computations in convolutional neural networks (CNNs) pose challenges for resource-constrained devices, so eliminating redundant computations from convolution is essential. This paper gives a principled method to detect and avoid transient redundancy, a type of redundancy that exists in input data or activation maps and hence changes across inferences. By introducing a new form of convolution (TREC), the method makes transient redundancy detection and avoidance an inherent part of the CNN architecture, and makes the determination of the best configurations for redundancy elimination part of CNN backward propagation. We provide a rigorous proof of the robustness and convergence of TREC-equipped CNNs. TREC removes over 96% of computations and achieves an average speedup of 3.51× on microcontrollers with minimal (about 0.7%) accuracy loss.
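The abstract does not spell out how TREC detects redundancy, so the following is only a minimal illustration of what redundancy in the input data can look like, not the TREC method itself. It assumes the simplest possible case, exact duplicate patches, and computes the filter responses once per distinct patch. The helper names `im2col` and `conv_with_reuse` are hypothetical.

```python
import numpy as np

def im2col(x, k):
    """Flatten every k x k patch of a 2-D array into one row (stride 1, no padding)."""
    h, w = x.shape
    rows = [x[i:i + k, j:j + k].ravel()
            for i in range(h - k + 1)
            for j in range(w - k + 1)]
    return np.array(rows)

def conv_with_reuse(x, kernels):
    """Cross-correlate x with each kernel, computing each distinct patch only once.

    Assumption for illustration: redundancy means bit-identical patches.
    kernels has shape (n_filters, k, k).
    Returns (output, number of distinct patches, total number of patches).
    """
    n_filters, k, _ = kernels.shape
    h, w = x.shape
    patches = im2col(x, k)                          # (n_patches, k*k)
    unique, inverse = np.unique(patches, axis=0, return_inverse=True)
    inverse = np.ravel(inverse)                     # 1-D index into `unique`
    weights = kernels.reshape(n_filters, -1).T      # (k*k, n_filters)
    out_unique = unique @ weights                   # one product per distinct patch
    out = out_unique[inverse]                       # scatter results to all positions
    return out.reshape(h - k + 1, w - k + 1, n_filters), len(unique), len(patches)
```

On an input with a large uniform background, the number of distinct patches is far smaller than the number of patch positions, which is where the saving comes from. Because the set of duplicates depends on each input, it has to be found again at every inference.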
Top-one recommendation with anonymous user behaviors, also known as session-based recommendation (SBR), faces challenges of top-one ranking and short anonymous sequences. To this end, we propose a novel objective that...
In multi-label learning, each instance is associated with a set of labels simultaneously. Most existing studies assume that the set of labels for each instance is complete. However, it is generally difficult to obtain all the relevant labels of each instance, and only a partial or even empty set of relevant labels is available; this setting is called semi-supervised multi-label learning with missing labels. To tackle this problem, we propose a novel framework that uses label correlations and instance correlations to recover the missing labels while simultaneously exploiting a large amount of unlabeled data to improve classification performance. Specifically, a new supplementary label matrix is first obtained by learning the label correlations. Second, because each class label may be determined by specific characteristics of its own, a label-specific data representation is learned for each class label. Third, instance correlations are used not only to recover the missing labels but also to propagate supervision information from labeled instances to unlabeled ones. In addition, a unified objective function is designed to integrate the above steps, and an accelerated proximal gradient method is adopted to solve the optimization problem. Finally, extensive experimental results on several benchmark datasets demonstrate the effectiveness of the proposed method compared to competing ones.
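The abstract names an accelerated proximal gradient method as the solver but does not give the objective. As a rough sketch of how such a solver works, the example below applies it to a stand-in problem, an l1-regularised least-squares objective, which is an assumption chosen for illustration and not the paper's objective. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def objective(A, b, lam, x):
    """Stand-in objective: 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def fista_lasso(A, b, lam, n_iter=500):
    """Accelerated proximal gradient (FISTA-style) for the stand-in objective."""
    lipschitz = np.linalg.norm(A, 2) ** 2    # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    y = x.copy()                             # extrapolated point
    t = 1.0                                  # momentum parameter
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)             # gradient of the smooth part at y
        x_next = soft_threshold(y - grad / lipschitz, lam / lipschitz)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x
```

Each iteration takes a gradient step on the smooth term, applies the proximal operator of the non-smooth term, and then extrapolates using the previous iterate. The extrapolation step is what distinguishes the accelerated variant from plain proximal gradient descent.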
ISBN: (print) 9781713871088
Vision and diverse languages are important information sources in our living world. A model that understands multi-modalities and multi-languages can be applied to a wider range of real-life scenarios. To build such a multimodal and multilingual model, existing works try to ensemble vision-language data from multiple languages in pre-training. However, due to the large number of languages, these works often require huge computing resources and cannot be flexibly extended to new languages. In this work, we propose a Multi-Lingual Acquisition (MLA) framework that can easily empower a monolingual Vision-Language Pre-training (VLP) model with multilingual capability. Specifically, we design a lightweight language acquisition encoder based on state-of-the-art monolingual VLP models. We further propose a two-stage training strategy to optimize the language acquisition encoder, namely the Native Language Transfer stage and the Language Exposure stage. With much less multilingual training data and computing resources, our model achieves state-of-the-art performance on multilingual image-text and video-text retrieval benchmarks.