ISBN (print): 9798350363029; 9798350363012
This paper presents a new approach to image-based visual servoing (IBVS) for Autonomous Underwater Vehicles (AUVs) with the goal of improved performance and computational efficiency. Traditional IBVS methods, when combined with Model Predictive Control (MPC), face high computational demands due to the nonlinear dynamics and the large number of degrees of freedom (DOFs) in the variables of the associated optimization problem. Our method addresses this by reducing the DOFs of the optimization variable in the cost function while maintaining good control performance. To further improve the smoothness of the MPC control signal, a soft constraint handling method is developed. The fast nonlinear MPC, combined with smoother control trajectories and effective constraint handling, makes our method particularly suitable for AUV IBVS applications in dynamic environments. Comparisons with standard strategies confirm the improved performance of our approach in terms of both speed and trajectory quality. Simulation results show that our approach can compute up to 100 times faster than conventional MPC-based IBVS methods, which highlights its potential for real-time IBVS applications.
ISBN (print): 9798350359329; 9798350359312
Large-scale dual-stream Vision-Language Pre-training (VLP) models provide an efficient solution for text-image retrieval tasks. Despite this, their performance often falls short of the most recent single-stream models, primarily due to limited fine-grained text-image interactions. Recent trends indicate a union of these two types of networks. Some methods adopt a retrieve-and-rerank strategy, but their performance improvements largely hinge on the single-stream encoder during inference. Other approaches utilize knowledge distillation to strengthen either the single-stream encoder or the dual-stream encoder, surpassing their previous capabilities. However, existing distillation techniques typically focus on a single knowledge type, neglecting the richer insights available in the teacher model. To bridge this gap, we introduce a Lightweight and Effective Multi-View Knowledge Distillation approach, named LEMKD, for text-image retrieval. This method utilizes response-based, feature-based and relation-based knowledge, transferring the knowledge from the single-stream encoder to the dual-stream encoder. Our approach is evaluated on the widely used MS-COCO and Flickr30K datasets. Results demonstrate that LEMKD not only matches the performance of the most advanced single-stream models but also excels in dual-stream encoder performance amidst the recent integration of single-stream and dual-stream models.
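To make the response-based part of the distillation idea above concrete, here is a minimal sketch in Python. It is not the LEMKD loss itself, which the abstract does not specify. It assumes the common formulation in which the student (dual-stream) is trained to match the teacher's (single-stream) softened distribution over candidate images for one text query, using a KL divergence. The function names and the temperature value are hypothetical.

```python
import math


def log_softmax(scores, temperature=1.0):
    """Numerically stable log-softmax over a list of similarity scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def response_distillation_loss(teacher_scores, student_scores, temperature=2.0):
    """KL(teacher || student) over softened text-image similarity scores.

    Assumption: each list holds one query's similarity to the same
    candidate images, scored by the teacher and the student respectively.
    """
    teacher_log_p = log_softmax(teacher_scores, temperature)
    student_log_q = log_softmax(student_scores, temperature)
    return sum(
        math.exp(lp) * (lp - lq)
        for lp, lq in zip(teacher_log_p, student_log_q)
    )
```

The loss is zero when the student reproduces the teacher's ranking distribution exactly and grows as the two distributions diverge. Feature-based and relation-based terms would be added alongside it in a multi-view setup.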
In the domain of multimedia signal processing, shape priors serve as crucial cues for guiding model training. However, many existing techniques struggle to effectively represent complex shape priors in multimedia cont...
ISBN (print): 9798350349955; 9798350349948
Train model identification can enhance the structural monitoring of railway infrastructures by providing contextual information about train passages. While approaches relying on timetables are impractical due to delays, camera-based solutions present challenges related to deployment costs and privacy concerns. In this paper, we propose RATTLE, a self-contained framework for train tracking and identification based on audio signal fingerprinting. We have developed a prototype IoT system tailored for train tracking and ground truth assessment, enabling the acquisition of a real-world dataset spanning four months of measurements. We then conducted a comparative analysis of several traditional Machine Learning (ML) and Deep Learning (DL) algorithms for audio feature classification, mel spectrogram classification, and image classification (serving as baselines). Our findings highlight that CNNs trained on mel spectrograms achieve high accuracy (97%), comparable to the best video-based DL solution, while substantially reducing model size. Furthermore, we explored the potential for migrating the classification task to the edge through quantisation techniques.
ISBN (print): 9798350344868; 9798350344851
Backscatter Communication (BackCom) is gaining popularity due to its potential for sustainable and low-cost Internet of Things (IoT) applications. However, due to the limited resources of passive tags, optimizing the backscatter modulation is critical for the widespread use of this technology. Current backscatter modulation designs ignore the impact of the tag's antenna structure, which we show in this paper to have a negative effect on system performance and lead to design discrepancies. We investigate the impact of the antenna structure parameter on the backscattered signal characteristics. Then, we propose a novel signal subtraction technique that calibrates the received signal based on the tag's antenna structure to enable accurate detection. Our simulation results demonstrate that different values of this critical parameter result in different backscattered signals, which influence the signal decoding efficiency at the receiver. Furthermore, our work provides insights for optimized system design and enhanced BackCom performance.
In order to achieve unattended tape storage management, this article designs a tape barcode recognition and positioning technology based on video and image. The algorithm uses the YOLOV5s network model to quickly reco...
ISBN (print): 9798331540661; 9798331540678
Railways are popular for moving goods because they are fast and have a high carrying capacity. With advances in technology, researchers are now considering self-driving systems in place of conventional trains to make freight transport more efficient and safer. Current train control systems sometimes make decisions slowly, which can be unsafe. This research introduces a way to make these decisions faster and better for self-driving trains. We propose the Near-field scene Quantum optimization (NFQO) algorithm, which uses live satellite images to help the train understand its surroundings. In addition, a Hexagonal Grid (HG) tool helps the train pick the best route quickly. The main advantage of the quantum algorithm is that it makes decisions quickly and accurately. Combining NFQO and HG with satellite images improves the reliability and effectiveness of self-driving cargo train transportation. In our tests, NFQO performed better than other available tools. This research, which combines quantum technology, satellite images, and self-driving systems, points to promising future developments in train transportation.
These delays have been linked to the traditional traffic signal systems that are used as they are time bound with pre-set schedules that do not take into account changing operational surroundings. The real life integr...
ISBN (print): 9798350344868; 9798350344851
The objective of this study is to investigate the learning process of Visually Grounded Speech (VGS) models through joint learning that combines contrastive learning and masked image modeling. Typically, VGS models aim to establish audio-visual alignment between images and their spoken captions within a contrastive learning framework. Building upon this concept, we explore whether cross-modal visual reconstruction can enhance alignment, given that spoken captions describe visual appearances. To achieve this, we extend contrastive learning-based VGS models by incorporating a masked autoencoder that utilizes cross-attention in the decoder. Through this cross-modal interaction in the decoder, spoken caption features guide the model to reconstruct the masked patches and capture correspondence between the two modalities. Our findings suggest that integrating cross-modal reconstruction within the contrastive learning framework enhances audio-visual feature alignment. Consequently, our proposed method gives comparable performance to existing models that utilize prior knowledge or other modalities, such as object region proposals or Contrastive Language-Image Pretraining (CLIP).
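The contrastive alignment objective mentioned above can be illustrated with a small sketch. The abstract does not give the exact loss, so this assumes the widely used symmetric InfoNCE form over a batch of matched spoken-caption and image pairs. The function names and the temperature default are hypothetical, and the masked-autoencoder reconstruction term is not shown.

```python
import math


def _log_softmax(row):
    """Numerically stable log-softmax over one row of scores."""
    peak = max(row)
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in row))
    return [v - log_sum_exp for v in row]


def symmetric_info_nce(similarity, temperature=0.07):
    """Symmetric InfoNCE loss over a square similarity matrix.

    Assumption: similarity[i][j] is the similarity between spoken
    caption i and image j, and matched pairs lie on the diagonal.
    """
    n = len(similarity)
    scaled = [[v / temperature for v in row] for row in similarity]
    # Audio-to-image direction: each row should peak at its diagonal entry.
    audio_to_image = -sum(_log_softmax(scaled[i])[i] for i in range(n)) / n
    # Image-to-audio direction: same computation over the columns.
    columns = [list(col) for col in zip(*scaled)]
    image_to_audio = -sum(_log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (audio_to_image + image_to_audio)
```

A batch whose matched pairs score highest yields a low loss, while an uninformative similarity matrix yields log(n), the loss of guessing uniformly among n candidates.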
ISBN (print): 9781665491303
Cognitive radio (CR) is a modern technology in cognitive radio-Internet of Things (CR-IoT) networks. However, in conventional CR-IoT networks based on energy detection, a CR-IoT user cannot achieve both a better sensing gain and a higher system sum rate in the presence of energy harvesting (EH) and security threats, because the reporting framework is under-utilized. We therefore propose EH-enabled CR-IoT networks using machine learning (ML) algorithms, in which each normal CR-IoT user is supported by a finite-capacity battery and harvested energy. First, the proposed hybrid detection technique uses ML algorithms to separate trusted (normal) from untrusted (malicious) CR-IoT users; untrusted users do not participate in spectrum sensing because they degrade performance metrics such as sensing gain and system sum rate. Second, the proposed scheme utilizes the reporting framework so that only trusted CR-IoT users obtain a longer sensing time slot, which enhances the sensing gain, the EH, and the sum rate. Finally, simulation results show that the proposed hybrid scheme outperforms conventional schemes in terms of security, sensing gain, EH, and system sum rate.
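As a rough illustration of the conventional energy detection baseline and of excluding untrusted users from the sensing decision, consider the sketch below. It is not the proposed hybrid scheme: the trust labels are taken as given rather than produced by an ML classifier, and the majority-vote fusion rule, the function names, and the threshold are all assumptions made for this example.

```python
def energy_statistic(samples):
    """Average energy of the received samples (the energy detector's test statistic)."""
    return sum(abs(x) ** 2 for x in samples) / len(samples)


def channel_occupied(samples, threshold):
    """Local decision: declare the channel occupied if the energy exceeds the threshold."""
    return energy_statistic(samples) > threshold


def fuse_reports(reports, trusted):
    """Majority vote over the local decisions of trusted users only.

    Assumption: `trusted` flags come from some upstream classifier;
    reports from untrusted users are ignored entirely.
    """
    votes = [report for report, ok in zip(reports, trusted) if ok]
    if not votes:
        raise ValueError("no trusted users reported")
    return sum(votes) * 2 > len(votes)
```

With reports `[True, True, False, False, False]` where the last two users are untrusted, the fused decision is "occupied", whereas a vote over all five users would say "idle". This shows how malicious reports can flip the outcome when they are not filtered out.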