Video captioning aims to generate natural language descriptions for a given video clip. Existing methods mainly focus on end-to-end representation learning via word-by-word comparison between predicted captions and ground-truth texts. Although significant progress has been made, such supervised approaches neglect semantic alignment between visual and linguistic entities, which may negatively affect the generated captions. In this work, we propose a hierarchical modular network to bridge video representations and linguistic semantics at four granularities before generating captions: entity, verb, predicate, and sentence. Each level is implemented by one module that embeds the corresponding semantics into video representations. Additionally, we present a reinforcement learning module based on the scene graph of captions to better measure sentence similarity. Extensive experimental results show that the proposed method performs favorably against state-of-the-art models on three widely used benchmark datasets: Microsoft Research Video Description Corpus (MSVD), MSR-Video to Text (MSR-VTT), and Video-And-TEXt (VATEX).
Authors: Wang, Yadi; Wang, Jun; Pal, Nikhil R.
Affiliations:
Henan Univ, Sch Comp & Informat Engn, Henan Key Lab Big Data Anal & Proc, Kaifeng 475004, Peoples R China
Henan Univ, Sch Comp & Informat Engn, Inst Data & Knowledge Engn, Kaifeng 475004, Peoples R China
City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
City Univ Hong Kong, Sch Data Sci, Kowloon, Hong Kong, Peoples R China
Indian Stat Inst, Ctr Artificial Intelligence & Machine Learning, Kolkata 700108, India
Indian Stat Inst, Elect & Commun Sci Unit, Kolkata 700108, India
As a crucial part of machine learning and pattern recognition, feature selection aims at selecting a subset of the most informative features from the set of all available features. In this article, supervised feature selection is first formulated as a mixed-integer optimization problem with an objective function of weighted feature redundancy and relevancy subject to a cardinality constraint on the number of selected features. It is equivalently reformulated as a bound-constrained mixed-integer optimization problem by augmenting the objective function with a penalty function for realizing the cardinality constraint. With additional bilinear and linear equality constraints for realizing the integrality constraints, it is further reformulated as a bound-constrained biconvex optimization problem with two more penalty terms. Two collaborative neurodynamic optimization (CNO) approaches are proposed for solving the formulated and reformulated feature selection problems. One of the proposed CNO approaches uses a population of discrete-time recurrent neural networks (RNNs), and the other uses a pair of continuous-time projection networks operating concurrently on two timescales. Experimental results on 13 benchmark datasets are elaborated to substantiate the superiority of the CNO approaches to several mainstream methods in terms of average classification accuracy with three commonly used classifiers.
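To make the first formulation concrete, here is a minimal sketch of a weighted redundancy/relevancy objective under a cardinality constraint, solved by exhaustive search on a toy problem. It is not the CNO approach from the abstract. The matrix `redundancy`, the vector `relevance`, the `weight` parameter, and the exact form of the objective are all assumptions chosen for illustration.

```python
from itertools import combinations

import numpy as np


def objective(selected, redundancy, relevance, weight):
    """Weighted redundancy minus weighted relevancy for one feature subset.

    Assumed form (not taken from the paper):
        weight * sum of pairwise redundancy - (1 - weight) * sum of relevancy
    """
    idx = list(selected)
    red = redundancy[np.ix_(idx, idx)].sum()
    rel = relevance[idx].sum()
    return weight * red - (1.0 - weight) * rel


def best_subset(redundancy, relevance, k, weight=0.5):
    """Enumerate every subset of exactly k features (the cardinality
    constraint) and return the one with the lowest objective value."""
    n = len(relevance)
    return min(
        combinations(range(n), k),
        key=lambda s: objective(s, redundancy, relevance, weight),
    )


# Toy data: features 0 and 1 are highly redundant, feature 2 is nearly independent.
redundancy = np.array(
    [[0.0, 0.9, 0.1],
     [0.9, 0.0, 0.1],
     [0.1, 0.1, 0.0]]
)
relevance = np.array([0.8, 0.7, 0.5])

chosen = best_subset(redundancy, relevance, k=2)
print(chosen)
```

Exhaustive search only works for a handful of features; the number of subsets grows combinatorially, which is the reason the abstract turns to penalty reformulations and neurodynamic solvers.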
Unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) has largely extended the border and capacity of artificial intelligence of things (AIoT) by providing a key element for enabling flexible distributed data inputs, computing capacity, and high mobility. To enhance data privacy for AIoT applications, federated learning (FL) is becoming a potential solution to perform training tasks locally on distributed IoT devices. However, with the limited onboard resources and battery capacity of each UAV node, optimization is required to achieve a large-scale and high-precision FL scheme. In this work, an optimized multi-UAV-assisted FL framework is designed, where regular IoT devices are in charge of performing training tasks, and multiple UAVs are leveraged to execute local and global aggregation tasks. An online resource allocation (ORA) algorithm is proposed to minimize the training latency by jointly selecting the clients and a global aggregation server. By leveraging the Lyapunov optimization technique, virtual energy queues are used to depict the energy deficit. With the help of the actor-critic learning framework, a deep reinforcement learning (DRL) scheme is designed to improve per-round training performance. A deep neural network (DNN)-based actor module is designed to derive client selection decisions, and a critic module based on a conventional optimization method is proposed to evaluate the obtained selection decisions. Moreover, a greedy scheme is developed to find the optimal global aggregation server. Finally, extensive simulation results demonstrate that the proposed ORA algorithm can achieve optimal training latency and energy consumption under various system settings.
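The virtual energy queue idea can be illustrated with the standard Lyapunov-style update, in which the queue grows when a round consumes more than the per-round budget and drains otherwise. This is a minimal sketch of that textbook recursion, not the paper's exact queue definition. The function names, the per-round budget, and the sample energy values are assumptions.

```python
def update_energy_queue(queue, energy_used, energy_budget):
    """One step of the assumed recursion Q(t+1) = max(Q(t) + e(t) - budget, 0).

    A positive queue value represents accumulated energy deficit.
    """
    return max(queue + energy_used - energy_budget, 0.0)


def simulate(energy_per_round, energy_budget):
    """Track the virtual queue over a sequence of training rounds."""
    queue = 0.0
    history = []
    for energy_used in energy_per_round:
        queue = update_energy_queue(queue, energy_used, energy_budget)
        history.append(queue)
    return history


# Hypothetical per-round energy consumption of one UAV, with a budget of 2.0 per round.
history = simulate([3.0, 1.0, 4.0, 0.5], energy_budget=2.0)
print(history)
```

In drift-plus-penalty schemes, a queue value like this typically enters the per-round objective as a weight on energy use, so that rounds following a deficit favor cheaper decisions.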
Internet of Things (IoT) devices are becoming increasingly ubiquitous in daily life. They are utilized in various sectors like healthcare, manufacturing, and transportation. The main challenges related to IoT devices are the potential for faults to occur and their reliability. In classical IoT fault detection, the client device must upload raw information to the central server for model training, which can reveal sensitive business information. Blockchain (BC) technology and a fault detection algorithm are applied to overcome these challenges. Generally, the fusion of BC technology and fault detection algorithms can yield a more secure and reliable IoT ecosystem. Therefore, this study develops a new Blockchain Assisted data Edge Verification with Consensus Algorithm for Machine Learning (BDEV-CAML) technique for IoT fault detection. The presented BDEV-CAML technique integrates the benefits of blockchain, IoT, and ML models to enhance the IoT network's trustworthiness, efficacy, and security. In BC technology, IoT devices that possess a significant level of decentralized decision-making capability can attain a consensus on the efficiency of intrablock transactions. For fault detection in the IoT network, the deep bidirectional gated recurrent unit (DBiGRU) model is used. Finally, the African vulture optimization algorithm (AVOA) is utilized for the optimal hyperparameter tuning of the DBiGRU model, which helps in improving the fault detection rate. A detailed set of experiments was carried out to highlight the enhanced performance of the BDEV-CAML algorithm. The comprehensive experimental results showed the improved performance of the BDEV-CAML technique over other existing models, with a maximum accuracy of 99.6%.
ISBN: (print) 9798350313345; 9798350313338
Hypertension is a major global health concern, linked to various cardiovascular diseases and associated with distinct ocular manifestations. While recent advances in artificial intelligence have enabled accurate diagnosis of current hypertension through fundus images, predicting the future onset of hypertension remains an uncharted domain. In this study, we introduce the multi-scale clinical-guided binocular fusion framework (MCBO), designed to predict the likelihood of developing hypertension within the next four years. MCBO integrates left and right fundus images and clinical data, utilizing a shared-weight multi-stage Transformer-based encoder. Our multi-scale clinical-guided module (MCM) ensures that image feature extraction is contextualized by clinical information, and our binocular fusion module (BFM) fuses binocular information. Comparison against seven baseline models establishes MCBO's superiority, with improvements of 6.7% in Area Under Curve (AUC), 6.9% in Accuracy (ACC), 5.1% in Sensitivity (SEN), and 5.5% in Specificity (SPE). This approach offers a promising avenue for proactive hypertension management, underscoring the potential of integrating deep learning with clinical data for enhanced healthcare outcomes. Our code is available at https://***/HaoshenLi/MCBO.
The imbalanced data classification problem has aroused lots of concerns from both academia and industry since data imbalance is a widespread phenomenon in many real-world scenarios. Although this problem has been well...
Graph Neural Networks (GNNs) have achieved great success in various data mining tasks but they heavily rely on a large number of annotated nodes, requiring considerable human efforts. Despite the effectiveness of exis...
Vehicle data security is a fundamental requirement in any vehicular social network (VSN). Encryption is the key technology to address this requirement. However, with encryption, we lose all access to the data. Further, we may need to provide differential access capabilities to different users. The attribute-based searchable encryption (ABSE) method satisfies all these requirements. It is a method for safely searching through encrypted files stored in a networked repository. It is a multi-user encryption method that combines the advantages of attribute-based encryption (ABE) with searchable encryption (SE). However, ABSE has an inherent cost and cannot be applied in a resource-constrained setting. Therefore, the proposed scheme aims to reduce the computational cost by readily accommodating frequent changes in the access structure and using a secret key and search trapdoor of constant size. This, in turn, reduces the bandwidth requirement as well. In addition, the suggested technique requires a constant number of pairing operations during the search phase, making the search operation fast. Quantitatively, the secret key and trapdoor storage costs have been reduced to two and four source group elements, respectively, and the number of bilinear pairing operations in the search algorithm has been reduced to four.
Class imbalance is one of the significant challenges in classification problems. The uneven distribution of data samples across classes may occur due to human error, improper or unguided collection of data samples, etc. This uneven distribution may affect the classification accuracy of the developed model. The main motivation behind this study is the design and development of methodologies for handling class imbalance problems. In this study, a new variant of the synthetic minority oversampling technique (SMOTE) is proposed by hybridizing particle swarm optimization (PSO) and Egyptian vulture (EV). The proposed method is termed SMOTE-PSOEV. It generates an optimized set of synthetic samples from traditional SMOTE and augments five datasets for verification and validation. SMOTE-PSOEV is then compared with existing SMOTE variants, i.e., Tomek Link, Borderline SMOTE1, Borderline SMOTE2, Distance SMOTE, and ADASYN. After data augmentation of the minority classes, the performance of SMOTE-PSOEV has been evaluated using support vector machine (SVM), Naive Bayes (NB), and k-nearest-neighbor (k-NN) classifiers. The results illustrate that the proposed method achieved higher accuracy than existing SMOTE variants.
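For readers unfamiliar with the baseline, the following is a minimal sketch of traditional SMOTE interpolation, the step that produces the candidate synthetic samples. It does not include the PSO or EV optimization described in the abstract. The function name `smote`, its parameters, and the toy minority set are assumptions for illustration.

```python
import numpy as np


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbors.

    Requires at least k + 1 minority samples.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)

    # Pairwise Euclidean distances; a sample is never its own neighbor.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]

    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))      # base sample
        j = neighbors[i, rng.integers(k)]    # one of its k nearest neighbors
        gap = rng.random()                   # interpolation factor in [0, 1)
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)


# Toy minority class: four corners of the unit square.
minority = [[0, 0], [1, 0], [0, 1], [1, 1]]
synth = smote(minority, n_new=5)
print(synth.shape)
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the region spanned by the minority class; an optimizer such as the one in the abstract would then choose which of these candidates to keep.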
ISBN: (print) 9798350353006
Video Individual Counting (VIC) aims to predict the number of unique individuals in a single video. Existing methods learn representations based on trajectory labels for individuals, which are expensive to annotate. To provide a more realistic reflection of the underlying practical challenge, we introduce a weakly supervised VIC task, wherein trajectory labels are not provided. Instead, two types of labels are provided to indicate traffic entering the field of view (inflow) and leaving the field of view (outflow). We also propose the first solution as a baseline that formulates the task as a weakly supervised contrastive learning problem under group-level matching. In doing so, we devise an end-to-end trainable soft contrastive loss to drive the network to distinguish inflow, outflow, and the remaining individuals. To facilitate future study in this direction, we generate annotations from the existing VIC datasets SenseCrowd and CroHD and also build a new dataset, UAVVIC. Extensive results show that our baseline weakly supervised method outperforms supervised methods, and thus, little information is lost in the transition to the more practically relevant weakly supervised task. The code and trained model can be found at CGNet.
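One way to see why inflow and outflow labels are sufficient for counting is the bookkeeping below: the number of unique individuals is the count in the first frame plus every later inflow, while outflow only affects how many people are currently in view. This is a sketch under that assumed decomposition, not the authors' network or loss. The function name and the sample numbers are hypothetical.

```python
def video_individual_count(initial_count, inflows, outflows):
    """Accumulate the unique-individual count from per-interval inflow/outflow.

    Returns (total unique individuals seen, individuals still in view).
    """
    total = initial_count
    in_view = initial_count
    for inflow, outflow in zip(inflows, outflows):
        total += inflow                # each newcomer is a new unique individual
        in_view += inflow - outflow    # leavers reduce only the current occupancy
        if in_view < 0:
            raise ValueError("outflow exceeds the number of individuals in view")
    return total, in_view


# Hypothetical video: 5 people in the first frame, then three sampled intervals.
total, in_view = video_individual_count(5, inflows=[2, 0, 3], outflows=[1, 2, 0])
print(total, in_view)
```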