Many applications in bee colony health monitoring could be solved efficiently using computer vision techniques. One such challenge is an efficient way of counting the number of incoming and outgoing...
ISBN (digital): 9798350359312
ISBN (print): 9798350359329
The generalization of the Euclidean network paradigm to Riemannian manifolds has attracted much attention in recent years because it offers useful geometric representations for processing manifold-valued data. However, the information degradation during data compression mapping hinders Riemannian networks from going deeper, and there are very few solutions specifically designed for this problem. Given the remarkable success of deep residual learning in Euclidean networks, a novel Riemannian residual learning mechanism (RRLM) is proposed in the context of Symmetric Positive Definite (SPD) manifolds, enabling the characterization of deep spatiotemporal features while preserving the manifold properties. Based on RRLM, a stack of SPD manifold-constrained residual-like blocks is appended to the tail of the original SPDNet (the backbone) to conduct deep Riemannian residual learning. For simplicity, we refer to this network architecture as the Riemannian residual SPD network (ResSPDNet). Experimental results on three types of visual classification tasks, i.e., facial emotion recognition, drone recognition, and action recognition, demonstrate that our method achieves improved accuracy with a deepened network structure.
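To make "preserving the manifold properties" concrete, the following is a minimal NumPy sketch of two layer types in the style of the SPDNet backbone (a bilinear mapping and an eigenvalue rectification), together with a check that their output is still SPD. It is not the RRLM or the residual-like block itself, which the abstract does not specify; the function names `is_spd`, `bimap`, and `reeig`, the threshold `eps`, and the matrix sizes are illustrative assumptions.

```python
import numpy as np


def is_spd(m, tol=1e-10):
    """Return True if m is symmetric with strictly positive eigenvalues."""
    return bool(np.allclose(m, m.T) and np.linalg.eigvalsh(m).min() > tol)


def bimap(m, w):
    """Bilinear mapping W M W^T; w has shape (d_out, d_in) and full row rank."""
    return w @ m @ w.T


def reeig(m, eps=1e-4):
    """Eigenvalue rectification: clamp eigenvalues from below at eps (assumed value)."""
    eigvals, eigvecs = np.linalg.eigh(m)
    return (eigvecs * np.maximum(eigvals, eps)) @ eigvecs.T


rng = np.random.default_rng(0)

# Build a random 5x5 SPD input, e.g. standing in for a covariance descriptor.
a = rng.standard_normal((5, 5))
spd_in = a @ a.T + 1e-3 * np.eye(5)

# Semi-orthogonal weight with orthonormal rows (3x5), obtained from a QR factorization.
q, _ = np.linalg.qr(rng.standard_normal((5, 3)))
weight = q.T

spd_out = reeig(bimap(spd_in, weight))
print(spd_out.shape, is_spd(spd_out))
```

Both operations map SPD matrices to SPD matrices, which is the constraint any residual-like block stacked on such a backbone would also have to respect.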
The action anticipation task refers to predicting what action will happen based on observed videos, which requires the model to have a strong ability to summarize the present and then reason about the future. Experien...
Knowledge distillation (KD) is a promising solution to compress large language models (LLMs) by transferring their knowledge to smaller models. During this process, white-box KD methods usually minimize the distance b...
Audio-visual question answering (AVQA) requires reference to video content and auditory information, followed by correlating the question to predict the most precise answer. Although mining deeper layers of audio-visu...
We present CLIDSUM, a benchmark dataset towards building cross-lingual summarization systems on dialogue documents. It consists of 67k+ dialogue documents and 112k+ annotated summaries in different target languages. B...
ISBN (print): 9781450397056
Although there are advanced technologies for character recognition, automatic descriptive answer evaluation remains an open challenge for the document image analysis community because of the wide diversity of handwritten text and of answers to a given question. This paper presents a novel method for detecting anomalous handwritten text in students' written responses to questions. The method is based on the observation that students who are confident in their answers usually write legibly and neatly, whereas students who are not confident write sloppily, which may be hard for the reader to understand. To detect such anomalous handwritten text, we explore a new combination of the Fourier transform and a deep learning model for detecting edges. The resulting edge image preserves the structure of the handwritten text. To extract features for classifying anomalous and normal text, the proposed method studies the behavior of the writing style, especially the variation at ascenders and descenders. To this end, the proposed work draws the principal axis of the edge image, which is invariant to rotation and scaling and, to some extent, to distortion. With respect to the principal axis, the proposed method draws the medial axis using the uppermost and lowermost points. The distances between the medial axis points and the principal axis form the feature vector. The feature vector is then passed to an Artificial Neural Network for classification of anomalous text. The proposed method is evaluated on our own dataset, a standard dataset for gender identification (IAM), and a handwritten forgery detection dataset (ACPR 2019). The results on the different datasets show that the proposed work outperforms existing methods.
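The principal-axis and medial-axis feature described above can be sketched roughly as follows. This is one possible reading of the abstract rather than the authors' implementation: it assumes the principal axis is the dominant PCA direction of the edge-pixel coordinates, that one medial point is taken per image column as the midpoint of the uppermost and lowermost edge pixels, and that the feature is the perpendicular distance from each medial point to the principal axis. The function names and the toy mask are hypothetical.

```python
import numpy as np


def principal_axis(points):
    """Return (centroid, unit direction) of the dominant PCA axis of 2-D points."""
    centroid = points.mean(axis=0)
    cov = np.cov((points - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    direction = eigvecs[:, -1]              # eigenvector of the largest eigenvalue
    return centroid, direction / np.linalg.norm(direction)


def medial_distance_features(edge_mask):
    """Perpendicular distance of each column's medial point to the principal axis."""
    rows, cols = np.nonzero(edge_mask)
    points = np.column_stack([cols, rows]).astype(float)  # (x, y) coordinates
    centroid, direction = principal_axis(points)
    normal = np.array([-direction[1], direction[0]])      # unit normal to the axis
    features = []
    for x in np.unique(cols):
        ys = rows[cols == x]
        # Midpoint between the uppermost and lowermost edge pixel in this column.
        medial = np.array([x, 0.5 * (ys.min() + ys.max())], dtype=float)
        features.append(abs((medial - centroid) @ normal))
    return np.array(features)


# Toy edge image: two parallel horizontal strokes (assumed data, for illustration).
mask = np.zeros((10, 20), dtype=bool)
mask[3, :] = True
mask[7, :] = True
features = medial_distance_features(mask)
print(features.shape, features.max())
```

For a perfectly regular stroke pattern like this toy mask, the medial points lie on the principal axis and all distances vanish; irregular ascenders and descenders would push the medial points away from the axis and produce larger values.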
At present, the most advanced semantic segmentation model training mainly relies on pixel-level annotation, that is, annotating the category of each pixel of an image. Such annotation usually is time-consuming and exp...
A novel wideband 5.8 GHz CPW-fed antenna is presented for radio frequency identification (RFID) tags. Four U-shaped and four L-shaped branches are used as additional resonators to achieve wideband operation. The proposed antenna was analyzed numerically using the method of moments (MOM) and the finite element method (FEM). With the antenna size limited to $30\times 30\ \text{mm}^{2}$, the −10 dB bandwidth obtained by MOM is 3.235 GHz (5.765∼9 GHz) and the −9.5 dB bandwidth obtained by FEM is 2.74 GHz (5.32∼8.06 GHz), corresponding to 55.7% and 47.2% of the 5.8 GHz center frequency, respectively. Moreover, the simulated results show that the proposed antenna has a gain of more than 4.8 dBi and that the radiation pattern is nearly omnidirectional in the H-plane. The measured −10 dB bandwidth is 2.68 GHz (5.63∼8.31 GHz), or 46.2% of the 5.8 GHz frequency. Furthermore, there are three measured resonant frequencies, at 1.34 GHz, 3.23 GHz and 5.8 GHz, each with a return loss lower than −10 dB. The measured results demonstrate wideband RFID tag antenna performance and are in good agreement with the calculated results.
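The percentage bandwidths quoted above follow from dividing each band's width by the 5.8 GHz reference frequency. A short sketch of that arithmetic, using only the band edges stated in the abstract (the function name `fractional_bandwidth_percent` is an illustrative choice):

```python
def fractional_bandwidth_percent(f_low_ghz, f_high_ghz, f_ref_ghz=5.8):
    """Bandwidth (f_high - f_low) as a percentage of the reference frequency."""
    return 100.0 * (f_high_ghz - f_low_ghz) / f_ref_ghz


# Band edges in GHz, as stated in the abstract.
bands = {
    "MOM (-10 dB)": (5.765, 9.0),
    "FEM (-9.5 dB)": (5.32, 8.06),
    "Measured (-10 dB)": (5.63, 8.31),
}

for label, (f_low, f_high) in bands.items():
    width = f_high - f_low
    percent = fractional_bandwidth_percent(f_low, f_high)
    print(f"{label}: {width:.3f} GHz, {percent:.2f}% of 5.8 GHz")
```

Note that the percentages are taken relative to 5.8 GHz, as the abstract does, rather than relative to the midpoint of each band. The MOM value computes to roughly 55.78%, which the abstract reports as 55.7%.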
In-context learning (ICL) emerges as a promising capability of large language models (LLMs) by providing them with demonstration examples to perform diverse tasks. However, the underlying mechanism of how LLMs learn f...