Video question answering (VideoQA) is a challenging yet important task that requires a joint understanding of low-level video content and high-level textual semantics. Despite the promising progress of existing efforts, recent studies have revealed that current VideoQA models tend to over-rely on superficial correlations rooted in dataset bias while overlooking the key video content, which leads to unreliable results. Effectively understanding and modeling the temporal and semantic characteristics of a given video for robust VideoQA is crucial but, to our knowledge, has not been well investigated. To fill this research gap, we propose a robust VideoQA framework that effectively models cross-modality fusion and forces the model to focus on the temporal and global content of videos when making a QA decision, instead of exploiting shortcuts in the datasets. Specifically, we design a self-supervised contrastive learning objective that contrasts positive and negative pairs of multimodal input. Under this objective, the fused representation of the original multimodal input is pulled closer to that of an intervened input obtained by perturbing the video. We expect the fused representation to focus on the global context of videos rather than on a few static keyframes. Moreover, we introduce an effective temporal order regularization that enforces the inherent sequential structure of videos in the video representation. We also design a Kullback-Leibler divergence-based perturbation invariance regularization on the predicted answer distribution to improve the model's robustness against temporal perturbation of video content. Our method is model-agnostic and compatible with various VideoQA backbones. Extensive experimental results and analyses on several public datasets show the advantage of our method over state-of-the-art methods in terms of both accuracy and robustness.
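The two regularizers described above can be illustrated with a minimal, framework-free sketch. It rests on several assumptions:

- The video perturbation is a random shuffle of frame order.
- The invariance term is KL(p_original || p_perturbed) over answer logits.
- The contrastive term is a standard InfoNCE loss with cosine similarity. The positive is the fused representation of the perturbed input, and the negatives come from other samples.

The function names (`shuffle_frames`, `kl_divergence`, `info_nce`) and the temperature value are hypothetical. This is not the authors' implementation.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over answer logits."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete answer distributions (direction is an assumption)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def shuffle_frames(frames: Sequence, seed: int = 0) -> list:
    """Hypothetical temporal perturbation: randomly permute the frame order."""
    perturbed = list(frames)
    random.Random(seed).shuffle(perturbed)
    return perturbed


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature: float = 0.1) -> float:
    """InfoNCE loss: low when the anchor is closer to the positive than to the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Invariance term: answer distributions for the original and the perturbed video.
p_orig = softmax([2.0, 0.5, -1.0])
p_pert = softmax([1.5, 0.8, -0.7])
invariance_loss = kl_divergence(p_orig, p_pert)

# Contrastive term on toy fused representations (original vs. perturbed input).
contrastive_loss = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
print(round(invariance_loss, 4), round(contrastive_loss, 4))
```

KL divergence is asymmetric. The abstract does not say which direction (or whether a symmetric variant) is used, so that choice is left open here.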
Integrating Large Language Models (LLMs) with Knowledge Graphs (KGs) enhances the interpretability and performance of AI systems. This research comprehensively analyzes this integration, classifying approaches into th...
Improving website security to prevent malicious online activities is crucial, and CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) has emerged as a key strategy for distinguishing human users from automated bots. Text-based CAPTCHAs, designed to be easily decipherable by humans yet challenging for machines, are a common form of this strategy. However, advancements in deep learning have facilitated the creation of models that recognize these text-based CAPTCHAs surprisingly well. In our comprehensive investigation into CAPTCHA recognition, we have tailored the well-known UpDown image captioning model specifically for this task. Our approach uses an encoder that extracts both global and local features, significantly boosting the model's ability to identify complex details within CAPTCHA images. In the decoding phase, we adopt a refined attention mechanism that integrates enhanced visual attention with two layers of Long Short-Term Memory (LSTM) networks to improve CAPTCHA recognition performance. Rigorous testing across four varied datasets, including those from Weibo, BoC, Gregwar, and Captcha 0.3, demonstrates the versatility and effectiveness of our method. The results highlight the efficiency of our approach and offer insights into its applicability across different CAPTCHA types, contributing to a deeper understanding of CAPTCHA recognition technology.
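For context on the attention step, the original UpDown model scores each region feature v_i against a decoder hidden state h with a_i = w_a^T tanh(W_v v_i + W_h h). It normalizes the scores with a softmax and takes the weighted sum of region features as the attended visual context. In UpDown, h comes from the first (attention) LSTM, and the context feeds the second (language) LSTM.

The sketch below implements that standard formulation in plain Python with tiny hand-written weights. It does not reproduce the authors' "enhanced" attention, whose details the abstract does not give. The function name `additive_attention` and all weight values are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(features: List[Vector], hidden: Vector,
                       W_v: Matrix, W_h: Matrix, w_a: Vector) -> Tuple[Vector, Vector]:
    """Score regions with w_a^T tanh(W_v v_i + W_h h), then pool them by the softmax weights."""
    h_proj = matvec(W_h, hidden)
    scores = []
    for v in features:
        joint = [math.tanh(a + b) for a, b in zip(matvec(W_v, v), h_proj)]
        scores.append(sum(w * j for w, j in zip(w_a, joint)))
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(wt * f[d] for wt, f in zip(weights, features)) for d in range(dim)]
    return weights, context


# Three toy region features, a decoder state, and tiny illustrative weights.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
h_t = [0.2, -0.1]
W_v = [[1.0, 0.0], [0.0, 1.0]]
W_h = [[0.5, 0.0], [0.0, 0.5]]
w_a = [1.0, -1.0]
alpha, v_hat = additive_attention(regions, h_t, W_v, W_h, w_a)
print([round(a, 3) for a in alpha], [round(x, 3) for x in v_hat])
```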
The Internet of Things (IoT) has revolutionized our lives, but it has also introduced significant security and privacy challenges. The vast amount of data collected by these devices, often containing sensitive informa...
Video forgery is one of the most serious problems affecting the credibility and reliability of video content. Therefore, detecting video forgery presents a major challenge for researchers due to the diversity of forge...
Over the past few years, the application and usage of Machine Learning (ML) techniques have increased exponentially due to the continuously increasing size of data and computing power. Despite the popularity of ML techniques, only a few research studies have focused on applying ML, especially supervised learning techniques, in Requirements Engineering (RE) activities to solve the problems that arise in RE. The authors systematically map past work to identify studies that applied supervised learning techniques to RE activities during the period 2002–[…]. The authors aim to investigate the research trends, main RE activities, ML algorithms, and data sources studied during this period. […]-five research studies were selected based on our inclusion and exclusion criteria. The results show that the scientific community used 57 algorithms. Among those algorithms, researchers mostly used the following five in RE activities: Decision Tree, Support Vector Machine, Naïve Bayes, K-Nearest Neighbour classifier, and Random Forest. The results also show that researchers used these algorithms in eight major RE activities: requirements analysis, failure prediction, effort estimation, quality, traceability, business rules identification, content classification, and detection of problems in requirements written in natural language. The selected research studies used 32 private and 41 public data sources. The most popular data sources in the selected studies are the Metric Data Programme from NASA, Predictor Models in Software Engineering, and the iTrust Electronic Health Care System.
Multi-hop reasoning for incomplete Knowledge Graphs (KGs) demonstrates excellent interpretability with decent performance. Reinforcement Learning (RL)-based approaches formulate multi-hop reasoning as a typical sequential decision problem. An intractable shortcoming of multi-hop reasoning with RL is that sparse reward signals limit performance. Mainstream methods apply heuristic reward functions to counter this problem. However, the inaccurate rewards produced by heuristic functions guide the agent toward improper inference paths and unrelated object entities. To this end, we propose a novel adaptive Inverse Reinforcement Learning (IRL) framework for multi-hop reasoning, called AInvR. (1) To counter missing and spurious paths, we replace the heuristic rule rewards with an adaptive rule reward learning mechanism based on the agent's inference trajectories; (2) to alleviate the impact of over-rewarded object entities resulting from inaccurate reward shaping and rules, we propose an adaptive negative hit reward learning mechanism based on the agent's sampling strategy; (3) to further explore diverse paths and mitigate the influence of missing facts, we design a reward dropout mechanism that randomly masks and perturbs reward parameters during reward learning. Experimental results on several benchmark knowledge graphs demonstrate that our method is more effective than existing multi-hop approaches.
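The reward dropout idea in point (3) can be sketched under loudly stated assumptions:

- The learned reward is linear in a parameter vector θ over path features (r = θ·φ).
- Each parameter is independently zeroed with probability `drop_prob`.
- Each surviving parameter receives Gaussian noise with standard deviation `noise_std`.

These choices and all names are hypothetical. The abstract does not specify the reward model or the masking distribution.

```python
import random
from typing import List, Sequence


def reward_dropout(params: Sequence[float], drop_prob: float, noise_std: float,
                   rng: random.Random) -> List[float]:
    """Randomly mask (zero) reward parameters and perturb the surviving ones with Gaussian noise."""
    out = []
    for theta in params:
        if rng.random() < drop_prob:
            out.append(0.0)  # masked parameter
        else:
            out.append(theta + rng.gauss(0.0, noise_std))  # perturbed parameter
    return out


def linear_reward(params: Sequence[float], features: Sequence[float]) -> float:
    """Hypothetical linear reward r = theta . phi for one inference path."""
    return sum(t * f for t, f in zip(params, features))


rng = random.Random(42)
theta = [0.8, -0.3, 1.2, 0.5]
phi = [1.0, 0.0, 1.0, 1.0]  # toy path features
noisy_theta = reward_dropout(theta, drop_prob=0.25, noise_std=0.05, rng=rng)
print(linear_reward(theta, phi), linear_reward(noisy_theta, phi))
```

Unlike standard neural-network dropout, this sketch does not rescale the surviving parameters by 1 / (1 - drop_prob). Whether the original mechanism does so is not stated.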
Digital images are used in various fields as an essential carrier of information. Large numbers of color images are constantly produced because they describe scenes more realistically, and they take up much storage space and network bandwidth. T...
ISBN (print): 9798331509675
Smart agriculture systems leverage cutting-edge technologies such as IoT, AI, and remote sensing to revolutionize conventional farming by improving resource utilization and production and by mitigating crop damage. These systems enable real-time monitoring of soil and crop health, predictive analytics, pest control, and precision irrigation. They can address major issues in Indian agriculture, boosting yield and profitability and promoting environmental sustainability. The large-scale deployment of intelligent agriculture systems will change the agricultural landscape in India and help assure long-term food security for an ever-growing population. Challenges remain, and further research is needed to better deploy smart agricultural systems that protect crops. Intelligent agriculture draws on advanced research, including science and innovation, for national development, using space technologies to enhance soil quality, conserve water, and facilitate agricultural information. Space ventures will be further modernized through the introduction of crop sprayers, precision gene editors, epigenetics, big data analytics, IoT, wind and photovoltaic smart energy, AI-enabled robotic applications, and wide-scale desalination technologies. Implementing digital farming systems in developing economies will benefit their agricultural sectors, as 85 percent of the global population is set to live in developing countries by 2030. Automation will prove necessary because food scarcity and resource wastage are both on the rise. Control strategies such as IoT, aerial imagery, machine learning, and artificial intelligence will boost production and prevent soil degradation. These technologies can also help with plant disease detection, pesticide management, and water application. The introduction of the Internet of Things in the agricultural research world has started
The automatic localization of the left ventricle (LV) in short-axis magnetic resonance (MR) images is a required step when processing cardiac images with convolutional neural networks to extract a region of interest (ROI). Precise extraction of the LV's ROI from cardiac MRI images is crucial for detecting heart disorders via cardiac segmentation or classification. However, this task is intricate because the LV varies in size and shape and surrounding tissues are scattered across different […]. Therefore, this study proposed a region-based convolutional network (Faster R-CNN) for LV localization in short-axis cardiac MRI images, using a region proposal network (RPN) integrated with deep feature classification and regression. The model was trained on images with corresponding bounding boxes (labels) around the LV, and various experiments were conducted to select the appropriate layers and set suitable parameters. The experimental findings show that the proposed model was adequate, with accuracy, precision, recall, and F1 score values of 0.91, 0.94, 0.95, and 0.95, respectively. The model also allows cropping of the detected LV area, which is vital for reducing computational cost and time during segmentation and classification tasks. Thus, it would be an ideal, clinically applicable model for diagnosing cardiac diseases.
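The reported metrics follow the usual definitions based on true/false positives and negatives. Bounding-box localization is commonly scored by intersection over union (IoU), although the abstract does not state which matching criterion or IoU threshold was used. The sketch below computes these quantities. The counts in the usage lines are made up for illustration and are not the study's data.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


# Illustrative counts only (not the study's data).
p, r = precision(tp=45, fp=3), recall(tp=45, fn=2)
print(round(p, 3), round(r, 3), round(f1_score(p, r), 3))
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union area 7
```

As a consistency check, the reported precision (0.94) and recall (0.95) give an F1 of about 0.945. That is in line with the reported 0.95 once rounding of the inputs is taken into account.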