Long-tailed data classification is prevalent in real-world scenarios, but training on such datasets can lead to biased classifications and poor performance. We address this challenge by focusing on improving the feature representation of tail classes, which is often of lower quality because tail-class features lie close to those of other, distinct classes. Inspired by the similarity between head and tail classes, we propose Class-wise Knowledge Distillation (CKD) to help tail classes learn prediction distributions from head classes, thus calibrating their features. Additionally, we introduce Hard Negative Samples Sampling (HNSS) to enhance feature separation by selecting challenging negative examples for contrastive learning. Our Feature Calibration and Feature Separation (FCFS) method achieves competitive results on the CIFAR10-LT, CIFAR100-LT, and ImageNet-LT benchmarks, demonstrating effective feature learning for long-tailed classification. This approach leverages both knowledge distillation and hard negative sampling to improve model performance.
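The abstract gives no formulas, so as a rough illustration of the CKD idea only (not the authors' exact loss), the sketch below distills a paired head class's softened prediction distribution into tail-class samples. The head-class prototype logits, the tail-to-head pairing, and the temperature are all assumed inputs.

```python
import torch
import torch.nn.functional as F

def class_wise_kd_loss(student_logits, labels, head_proto_logits, tail_to_head, T=2.0):
    """Sketch of a class-wise KD term: samples from tail classes are pushed to
    match the softened prediction distribution of a paired head-class prototype.
    head_proto_logits: (num_classes, num_classes) prototype logits per head class.
    tail_to_head: dict mapping tail class id -> paired head class id (assumed given)."""
    losses = []
    for i, y in enumerate(labels.tolist()):
        if y not in tail_to_head:                      # head classes are left untouched
            continue
        teacher = F.softmax(head_proto_logits[tail_to_head[y]] / T, dim=-1)
        student = F.log_softmax(student_logits[i] / T, dim=-1)
        losses.append(F.kl_div(student, teacher, reduction="sum") * T * T)
    if not losses:
        return student_logits.new_zeros(())
    return torch.stack(losses).mean()
```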
ISBN: (Print) 9789819784899; 9789819784905
The selection of activation functions in visual recognition significantly influences training dynamics and task performance. This study introduces the local spatial and global context activation (SCeLU), a conceptually simple yet effective activation function. SCeLU extends the Rectified Linear Unit (ReLU) and FReLU to a 3D activation by incorporating spatial and context conditions at negligible overhead. ReLU and FReLU take the forms f(x) = max(x, 0) and f(x) = max(x, T(x)), respectively, where T(·) is a 2D spatial condition. SCeLU instead takes the form f(x) = max(x, Π(x)·Γ(x)), where Π(·) is a 3D global context condition and Γ(·) is a 2D local spatial condition. Intuitively, the context condition facilitates the modeling of global information, while the spatial condition enhances the capacity for local pixel-wise modeling. By appropriately combining spatial and context conditions, SCeLU adapts to complex visual layouts in various image recognition tasks. By simply changing the activation function, experiments on ImageNet show significant and robust gains from SCeLU, particularly for small models, and some improvement even for highly optimized large models. Furthermore, SCeLU extends seamlessly to object detection and semantic segmentation, underscoring its value as an effective alternative in various visual recognition tasks. Our model is open-sourced at https://***/YunDuanFei/SCeLU.
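Taking the formula f(x) = max(x, Π(x)·Γ(x)) at face value, one plausible parameterization (not necessarily the paper's; the open-source repository holds the real one) models Γ as an FReLU-style depthwise convolution and Π as a squeeze-and-excite style global channel gate. The module name, reduction ratio, and layer choices below are assumptions.

```python
import torch
import torch.nn as nn

class SCeLUSketch(nn.Module):
    """Illustrative reading of f(x) = max(x, Pi(x) * Gamma(x)):
    Gamma(.) -- 2D local spatial condition, modeled here as a depthwise 3x3 conv + BN;
    Pi(.)    -- 3D global context condition, modeled here as a squeeze-and-excite
                style channel gate. The paper's exact parameterization may differ."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # local spatial condition Gamma
        self.gamma = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),
            nn.BatchNorm2d(channels),
        )
        # global context condition Pi: global pooling -> bottleneck MLP -> sigmoid gate
        self.pi = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # elementwise max between the identity branch and the conditioned branch
        return torch.max(x, self.pi(x) * self.gamma(x))
```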
Current vision-inspired spiking neural networks (SNNs) face key challenges due to their model structures typically focusing on single mechanisms and neglecting the integration of multiple biological features. These limitations, coupled with limited synaptic plasticity, hinder their ability to implement biologically realistic visual processing. To address these issues, we propose Spike-VisNet, a novel retina-inspired framework designed to enhance visual recognition capabilities. This framework simulates both the functional and layered structure of the retina. To further enhance this architecture, we integrate the FocusLayer-STDP learning rule, allowing Spike-VisNet to dynamically adjust synaptic weights in response to varying visual stimuli. This rule combines channel attention, inhibition mechanisms, and competitive mechanisms with spike-timing-dependent plasticity (STDP), significantly improving synaptic adaptability and visual recognition performance. Comprehensive evaluations on benchmark datasets demonstrate that Spike-VisNet outperforms other STDP-based SNNs, achieving precision scores of 98.6% on MNIST, 93.29% on ETH-80, and 86.27% on CIFAR-10. These results highlight its effectiveness and robustness, showcasing Spike-VisNet's potential to simulate human visual processing and its applicability to complex real-world visual challenges.
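For readers unfamiliar with STDP, the core pair-based rule that FocusLayer-STDP builds on can be sketched as below; the channel attention, inhibition, and competition terms described in the abstract are omitted, and the constants are illustrative rather than the paper's.

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_min=0.0, w_max=1.0):
    """Minimal pair-based STDP weight update (not the full FocusLayer-STDP rule).
    t_pre, t_post: spike times (ms) of the pre- and post-synaptic neurons."""
    dt = t_post - t_pre
    if dt >= 0:                      # pre fires before post -> potentiation
        dw = a_plus * np.exp(-dt / tau_plus)
    else:                            # post fires before pre -> depression
        dw = -a_minus * np.exp(dt / tau_minus)
    return float(np.clip(w + dw, w_min, w_max))
```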
To solve the problem of finding a target object in a complicated maze, an intelligent robot with visual recognition and automatic driving is designed, which uses an OpenMV visual recognition module to identify the...
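Since the abstract is truncated, the sketch below only illustrates a typical OpenMV (MicroPython) recognition loop of the kind such a module runs: detect a colored target blob and report its image coordinates to the driving controller. The LAB threshold, UART port, and message format are placeholders, not values from the paper.

```python
# Illustrative OpenMV loop: find a color-thresholded target and send its position.
import sensor, time
from pyb import UART

sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.skip_frames(time=2000)              # let the sensor settle

uart = UART(3, 115200)                     # assumed wiring to the drive MCU
TARGET_THRESHOLD = (30, 100, 15, 127, 15, 127)   # placeholder LAB color range

while True:
    img = sensor.snapshot()
    blobs = img.find_blobs([TARGET_THRESHOLD], pixels_threshold=200,
                           area_threshold=200, merge=True)
    if blobs:
        target = max(blobs, key=lambda b: b.pixels())   # largest matching blob
        img.draw_rectangle(target.rect())
        uart.write("%d,%d\n" % (target.cx(), target.cy()))
```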
This paper describes GREFIT (Gesture recognition based on Finger Tips), a neural-network-based system which recognizes continuous hand postures from gray-level video images (posture capturing). Our approach yields a full identification of all finger joint angles (making, however, some assumptions about joint couplings to simplify computations). This allows a full reconstruction of the three-dimensional (3-D) hand shape, using an articulated hand model with 16 segments and 20 joint angles. GREFIT uses a two-stage approach to solve this task. In the first stage, a hierarchical system of artificial neural networks (ANNs), combined with a priori knowledge, locates the two-dimensional (2-D) positions of the finger tips in the image. In the second stage, the 2-D position information is transformed by an ANN into an estimate of the 3-D configuration of an articulated hand model, which is also used for visualization. This model is designed according to the dimensions and movement possibilities of a natural human hand. The virtual hand imitates the user's hand with remarkable accuracy and can follow postures from gray-scale images at a frame rate of 10 Hz.
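The second stage described above is essentially a learned mapping from 2-D fingertip positions to the 20 joint angles of the hand model. A minimal sketch of such a mapping is shown below; the layer sizes and activation are assumptions, not the original network.

```python
import torch
import torch.nn as nn

class FingertipToJointAngles(nn.Module):
    """Sketch of GREFIT's second stage: an ANN mapping 2-D fingertip positions
    (5 tips x 2 coordinates = 10 inputs) to the 20 joint angles of the
    articulated hand model. Hidden size and activation are illustrative."""
    def __init__(self, n_tips=5, n_angles=20, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_tips * 2, hidden),
            nn.Tanh(),
            nn.Linear(hidden, n_angles),
        )

    def forward(self, tip_xy):               # tip_xy: (batch, 5, 2) image coordinates
        return self.net(tip_xy.flatten(1))   # (batch, 20) estimated joint angles
```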
Visual perception is a fundamental component for most robotic systems operating in human environments. Specifically, visual recognition is a prerequisite to a large variety of tasks such as tracking, manipulation, and human-robot interaction. As a consequence, the lack of successful recognition often becomes a bottleneck for the application of robotic systems to real-world situations. In this paper we aim at improving the robot's visual perception capabilities in a natural, human-like fashion, with a very limited amount of constraints on the acquisition scenario. In particular, our goal is to build and analyze a learning system that can rapidly be re-trained in order to incorporate new evidence if available. To this purpose, we review state-of-the-art coding-pooling pipelines for visual recognition and propose two modifications which allow us to improve the quality of the representation while maintaining real-time performance: a coding scheme, Best Code Entries (BCE), and a new pooling operator, Mid-Level Classification Weights (MLCW). The former focuses entirely on sparsity, improving the stability and computational efficiency of the coding phase; the latter increases the discriminability of the visual representation, and therefore the overall recognition accuracy of the system, by exploiting data supervision. The proposed pipeline is assessed from a qualitative perspective in a Human-Robot Interaction (HRI) application on the iCub platform. Quantitative evaluation of the proposed system is performed both on in-house robotics data sets (iCubWorld) and on established computer vision benchmarks (Caltech-256, PASCAL VOC 2007). As a byproduct of this work, we provide the robotics community with an implementation of the proposed visual recognition pipeline which can be used as a perceptual layer for more complex robotics applications. (C) 2016 Published by Elsevier B.V.
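The abstract does not give the exact BCE formulation, but a coding step that "focuses entirely on sparsity" can be sketched as keeping only each descriptor's best-matching dictionary entries; MLCW is only indicated in a comment, since its supervised reweighting is not specified here. All names and the top-k choice are assumptions.

```python
import numpy as np

def best_code_entries(x, dictionary, k=1):
    """Hedged sketch in the spirit of BCE: for each local descriptor, keep only
    its k best-matching dictionary atoms and zero the rest.
    x: (n, d) local descriptors; dictionary: (m, d) codebook."""
    sims = x @ dictionary.T                          # (n, m) similarity scores
    codes = np.zeros_like(sims)
    top = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    rows = np.arange(x.shape[0])[:, None]
    codes[rows, top] = sims[rows, top]               # retain only the best entries
    return codes

def max_pool(codes):
    """Baseline max pooling over local codes; MLCW would additionally reweight
    the codes with supervised mid-level classifier scores before pooling."""
    return codes.max(axis=0)
```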
In this paper, we propose a deep model for visual recognition based on a hybrid KPCA Network (H-KPCANet), which combines a one-stage KPCANet and a two-stage KPCANet. The proposed model consists of four types of basic components: the input layer, the one-stage KPCANet, the two-stage KPCANet, and the fusion layer. The one-stage KPCANet computes KPCA filters for its convolution layer, while the two-stage KPCANet learns PCA filters in its first stage and KPCA filters in its second stage. After binary quantization mapping and block-wise histogramming, the features from the two different types of KPCANets are fused in the fusion layer. The final feature of the input image is obtained by a weighted serial combination of the two types of features. The performance of the proposed algorithm is tested on digit recognition and object classification, and the experimental results on the visual recognition benchmarks MNIST and CIFAR-10 validate the effectiveness of the proposed H-KPCANet.
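The output encoding and fusion described above follow the usual PCANet-style recipe, which a short sketch makes concrete. Block-wise partitioning is omitted for brevity, and the mixing weight alpha is an assumption, not a value from the paper.

```python
import numpy as np

def binary_hash_histogram(filter_responses, n_bins=None):
    """PCANet/KPCANet-style encoding sketch: binarize each of the L filter
    response maps (sign > 0), pack them into an integer map with weights 2^l,
    then histogram the integer values. Block-wise histograms are omitted here."""
    L = len(filter_responses)
    binary = [(r > 0).astype(np.int64) for r in filter_responses]
    packed = sum((2 ** l) * b for l, b in enumerate(binary))   # values in [0, 2^L)
    n_bins = n_bins or 2 ** L
    hist, _ = np.histogram(packed, bins=n_bins, range=(0, 2 ** L))
    return hist

def fuse_features(feat_one_stage, feat_two_stage, alpha=0.5):
    """Fusion layer as described: weighted serial combination (concatenation) of
    the one-stage and two-stage KPCANet feature vectors; alpha is assumed."""
    return np.concatenate([alpha * feat_one_stage, (1.0 - alpha) * feat_two_stage])
```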
A series of novel six-coordinated terpyridine zinc complexes, containing ammonium salts and a thymine fragment at the two terminals, have been designed and synthesized; they can function as highly sensitive visualized sensors for melamine detection via selective metallo-hydrogel formation. After full characterization by various techniques, the complementary triple hydrogen bonding between the thymine fragment and melamine, as well as π-π stacking interactions, may be responsible for the selective metallo-hydrogel formation. In light of the possible interference arising from milk ingredients (proteins, peptides and amino acids) and legal/illegal additives (urine, sugars and vitamins), a series of control experiments were therefore conducted. To our delight, this visual recognition is highly selective: no gelation was observed with the selected milk ingredients or additives. Moreover, this newly developed protocol enables convenient and highly selective visual recognition of melamine at a concentration as low as 10 ppm in raw milk without any tedious pretreatment.
Comparative studies of memory in monkey and human subjects suggest similarities in visual recognition memory across human and nonhuman primates. In order to investigate developmental aspects of visual recognition memory in monkey infants, the familiarization‐novelty procedure, developed for use with human infants, was employed with pigtailed monkey infants to study long‐delay recognition memory. Subjects were familiarized with a black‐and‐white abstract pattern. Twenty‐four hours later they were tested with the familiar pattern paired with a novel one. Results indicated a significant visual preference for the novel stimulus, providing evidence for recognition memory. These results parallel those obtained with human infants, suggesting further similarities in the development of visual recognition memory.
Visual recognition in monkeys appears to involve the participation of two limbothalamic pathways, one including the amygdala and the magnocellular portion of the medial dorsal nucleus (MDmc) and the other, the hippocampus and the anterior nuclei of the thalamus (Ant N). Both MDmc and Ant N project, in turn, to the prefrontal cortex, mainly to its ventral and medial portions. To test whether the prefrontal projection targets of the two limbothalamic pathways also participate in memory functions, performance on a variety of learning and memory tasks was assessed in monkeys with lesions of the ventromedial prefrontal cortex (Group VM). Normal monkeys and monkeys with lesions of dorsolateral prefrontal cortex (Group DL) served as controls. Group VM was severely impaired on a test of object recognition, whereas Group DL did not differ appreciably from normal animals. Conversely, the animals in Group VM were able to learn a spatial delayed response task, whereas 2 of the 3 animals in Group DL could not. Neither group was impaired in the acquisition of visual discrimination habits, even though the successive trials on a given discrimination were separated by 24-h intervals. The patterns of deficits suggest that ventromedial prefrontal cortex constitutes another station in the limbothalamic system underlying cognitive memory processes, whereas the dorsolateral prefrontal cortex lies outside this system. The results support the view that the classical delayed-response deficit observed after dorsolateral prefrontal lesions represents a perceptuo-mnemonic impairment in spatial functions selectively rather than a memory loss of a more general nature.