The palm oil industry is an important subsector of the Indonesian economy. Counting oil palm trees in drone imagery is crucial to developing efficient management strategies for oil palm plantations. Thi...
Two-stage stochastic programming is a problem formulation for decision-making under uncertainty. In the first stage, the actor makes the best "here and now" decision in the presence of uncertain quantities tha...
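To make the "here and now" versus recourse structure concrete, here is a minimal sketch (not taken from the paper) of a two-stage problem in the classic newsvendor form. The first-stage decision is an order quantity chosen before demand is known. The second-stage recourse sells whatever demand allows and salvages the rest. The prices, the three demand scenarios, and the helper names (`recourse_cost`, `expected_total_cost`, `solve`) are hypothetical. The solver simply enumerates integer first-stage decisions; practical solvers would instead formulate the extensive-form linear program.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical economics: cost per unit ordered now, sale price, salvage value.
UNIT_COST = 5.0
PRICE = 9.0
SALVAGE = 1.0


@dataclass(frozen=True)
class Scenario:
    probability: float
    demand: int


# Hypothetical discrete demand distribution (probabilities sum to 1).
SCENARIOS = [Scenario(0.3, 50), Scenario(0.5, 100), Scenario(0.2, 150)]


def recourse_cost(order: int, demand: int) -> float:
    """Second-stage cost once demand is revealed: sell what we can and
    salvage the rest. Revenue is returned as a negative cost."""
    sold = min(order, demand)
    leftover = order - sold
    return -(PRICE * sold + SALVAGE * leftover)


def expected_total_cost(order: int, scenarios: List[Scenario]) -> float:
    """First-stage cost plus the probability-weighted recourse cost."""
    return UNIT_COST * order + sum(
        s.probability * recourse_cost(order, s.demand) for s in scenarios
    )


def solve(scenarios: List[Scenario], max_order: int) -> Tuple[int, float]:
    """Choose the 'here and now' order by enumerating every integer candidate."""
    best = min(range(max_order + 1), key=lambda x: expected_total_cost(x, scenarios))
    return best, expected_total_cost(best, scenarios)


if __name__ == "__main__":
    order, cost = solve(SCENARIOS, max_order=200)
    print(f"order {order} units, expected cost {cost:.1f}")
```

With these assumed numbers, the minimum expected cost is reached at an order of 100 units (expected cost -280, i.e., an expected profit of 280). This agrees with the newsvendor critical ratio (p - c)/(p - s) = 0.5, because the demand distribution's cumulative probability first reaches 0.5 at a demand of 100.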
With the surge in the number of low Earth orbit (LEO) satellites, a steady stream of research has emerged on using satellite data to train artificial intelligence models. On the one hand, traditional centralized training on the g...
To date, over 40 Automated Program Repair (APR) tools with varying bug-fixing strategies have been designed, and they have been shown to perform complementarily, each being effective for different bug...
Prefabrication promises to industrialize the construction industry. By constructing elements within a manufacturing environment, producers can better control quality and maximize production efficiency. Since the major...
In this paper, a new low-frequency passive echo enhancer is designed, and the performance difference between single-layer and multilayer enhancers is analyzed. Based on the resonant frequency of planar spira...
In recent years, micro-video apps such as TikTok and Kwai have become widely popular, but recommendation models dedicated to micro-videos remain relatively few. The paper analyses the characteristics of mic...
3D object detection based on deep neural networks (DNNs) has been widely adopted in embedded applications such as autonomous driving. Nonetheless, recent studies have demonstrated that LiDAR data tends t...
Graph database systems store graph data as nodes and relationships, and utilize graph query languages (e.g., Cypher) for efficiently querying graph data. Proving the equivalence of graph queries is an important founda...
Vision-language pretrained models have achieved significant success across various tasks. However, their lack of interpretability limits the application of multimodal models, especially in settings that require the security of systems, data, and users. A key challenge in making these models more interpretable is the tradeoff between transparency and model performance, in terms of both accuracy and computational efficiency. We propose a vision-language multimodal pretrained model called the multiway-fuzzy-experts bidirectional retention network (VL-MFER), which is designed to interpret the model's decision process while strengthening the consistent mapping between text and image features. We first propose a bidirectional retention network to process and integrate high-dimensional cross-modal data, which improves both performance and inference efficiency. Then, to improve the interpretability of vision-language pretrained models such as unified vision-language pretraining with mixture-of-modality-experts, we propose a multiway fuzzy experts pool built from multiple distinct layered deep neuro-fuzzy systems for diverse downstream tasks. After training on diverse datasets of images, text, and paired image-text data, we fine-tune the models for multiple vision and vision-language tasks. Experimental results show that VL-MFER performs well on all performance metrics across downstream tasks and improves transparency compared with other methods, while also improving computational efficiency and reducing inference time by up to 13%.