Object tracking based on deep learning is a challenging task. Accurate object detection and tracking requires a neural network that is deeper than a typical network. Therefore, it requires more levels of processing an...
Traffic signals and other signs, such as parking and stop signs, have become crucial for autonomous and self-driving cars, as they help the smart system comply with basic traffic rules; along with that, it hel...
Modeling energy storage units realistically is challenging as their decision-making is not governed by a marginal cost pricing strategy but relies on expected electricity prices. Existing electricity market models oft...
Visual question answering (VQA) is a multimodal task, involving a deep understanding of the image scene and the question’s meaning and capturing the relevant correlations between both modalities to infer the appropriate *** this paper, we propose a VQA system intended to answer yes/no questions about real-world images, in *** support a robust VQA system, we work in two directions: (1) Using deep neural networks to semantically represent the given image and question in a fine-grained manner, namely ResNet-152 and Gated Recurrent Units (GRU). (2) Studying the role of the utilized multimodal bilinear pooling fusion technique in the *** the model complexity and the overall model *** fusion techniques could significantly increase the model complexity, which seriously limits their applicability for VQA *** far, there is no evidence of how efficient these multimodal bilinear pooling fusion techniques are for VQA systems dedicated to yes/no ***, a comparative analysis is conducted between eight bilinear pooling fusion techniques, in terms of their ability to reduce the model complexity and improve the model performance in this case of VQA *** indicate that these multimodal bilinear pooling fusion techniques have improved the VQA model’s performance, until reaching the best performance of 89.25%. Further, experiments have proven that the number of answers in the developed VQA system is a critical factor that *** the effectiveness of these multimodal bilinear pooling techniques in achieving their main objective of reducing the model *** Multimodal Local Perception Bilinear Pooling (MLPB) technique has shown the best balance between the model complexity and its performance, for VQA systems designed to answer yes/no questions.
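To make the complexity trade-off mentioned in this abstract concrete, the following is a minimal sketch of a generic low-rank bilinear fusion (projection of each modality followed by an element-wise product), set against the parameter count of a full bilinear layer. It is not the MLPB technique or any of the eight methods compared in the paper. The function names, the question feature size of 2400, and the rank of 1200 are hypothetical choices for illustration; only the 2048-dimensional pooled ResNet-152 feature and the two yes/no outputs follow from the setup described above.

```python
def full_bilinear_params(n_img, n_q, n_out):
    """Parameters of a full bilinear layer: one n_img x n_q matrix per output."""
    return n_img * n_q * n_out


def low_rank_params(n_img, n_q, n_out, rank):
    """Parameters of a low-rank variant: two projections into a shared
    rank-dimensional space plus a linear output layer (biases ignored)."""
    return rank * (n_img + n_q) + rank * n_out


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]


def low_rank_fuse(U, V, P, img_feat, q_feat):
    """Fuse image and question features.

    U: rank x n_img projection, V: rank x n_q projection,
    P: n_out x rank output layer. All are hypothetical weights.
    """
    # Project both modalities, then combine them with an element-wise product.
    joint = [a * b for a, b in zip(matvec(U, img_feat), matvec(V, q_feat))]
    # Map the fused vector to one score per answer.
    return matvec(P, joint)


if __name__ == "__main__":
    # Assumed sizes: 2048-d image feature, 2400-d question feature (hypothetical),
    # 2 answers (yes/no), rank 1200 (hypothetical).
    print(full_bilinear_params(2048, 2400, 2))
    print(low_rank_params(2048, 2400, 2, 1200))
```

Under these assumed sizes, the gap between the two counts depends strongly on the number of outputs: the full bilinear count grows in proportion to the number of answers, while the low-rank count is dominated by the two projections.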
The field of human activity recognition has evolved significantly, driven largely by advancements in Internet of Things (IoT) device technology, particularly in personal devices. This study investigates the use of ult...
This paper presents two hands-on, project-based courses on unmanned aerial systems recently offered by the Intelligent Systems Engineering program at Indiana University. In Fall 2023, ENGR-E399/599 Autonomous Sports w...
In recent times, appropriate decision-making in challenging and critical situations has been very well supported by multicriteria decision-making (MCDM) methods. The technique for order of preference by similarity to ...
Cluster analysis can be perceived as a problem of grouping data points according to their mutual similarity. Clustering quality largely depends on choosing an effective distance metric, especially when dealing with mi...
A successful software development process depends on software requirements. Those requirements are often classified into two categories: functional requirements (FR) and non-functional requirements (NFR). In software ...
Brain tumors and intracranial hemorrhages are serious medical conditions that can greatly impact the quality of life for patients. Early detection and diagnosis of these conditions are crucial for effective treatment ...