ISBN: (print) 9798350349405; 9798350349399
With the advance of deep learning in the big-data era, image/video coding for machines (VCM), the subject of a call for proposals by the Moving Picture Experts Group (MPEG), has become a pivotal technique for a wide range of intelligent vision tasks. However, existing VCM methods typically compress features independently at each scale, ignoring the redundancy of features across scales. This paper therefore introduces a simple yet effective architecture for VCM called hybrid single input and multiple output (H-SIMO), which significantly reduces cross-scale feature redundancy. More specifically, because the pyramid structure is commonly used to localise multi-scale objects, H-SIMO compresses all features from a single-scale input feature while retaining the ability to decompress the features at every scale. Moreover, an entropy model is integrated into the training process to reduce the statistical redundancy of the features. During testing, a hybrid coding method, in conjunction with versatile video coding (VVC), compresses the features from both images and videos. We evaluate H-SIMO on two standard machine vision tasks, object detection and instance segmentation, and the experimental results verify its superior performance.
ISBN: (print) 9781665464680
This paper introduces the structure and operation mode of an automatic production line, based on an actual laser-quenching production line for tools in an enterprise. Robot vision integrates workpiece positioning coordinates with robot coordinates so that the robot can locate and grasp workpieces through machine vision. The paper focuses on OpenCV image processing methods and describes the principle and possible problems from the aspects of system structure, robot coordinate calibration, visual identification and positioning, and software design.
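The abstract does not say how pixel coordinates are tied to robot coordinates. One common approach for a camera looking straight down at a flat work surface is a planar affine map fitted from three reference points. The sketch below illustrates that idea only; the function names `fit_affine` and `pixel_to_robot` are hypothetical, and the affine model itself is an assumption, not the paper's method.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copy rows so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("calibration points are collinear or duplicated")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_affine(pixel_pts, robot_pts):
    """Fit X = a*u + b*v + c and Y = d*u + e*v + f from three point pairs."""
    a = [[float(u), float(v), 1.0] for u, v in pixel_pts]
    row_x = solve_linear(a, [float(p[0]) for p in robot_pts])
    row_y = solve_linear(a, [float(p[1]) for p in robot_pts])
    return row_x, row_y


def pixel_to_robot(model, u, v):
    """Map an image point (u, v) to robot-frame coordinates (X, Y)."""
    (a, b, c), (d, e, f) = model
    return a * u + b * v + c, d * u + e * v + f


# Hypothetical calibration data: 0.5 mm per pixel, image v-axis flipped.
model = fit_affine([(0, 0), (100, 0), (0, 100)],
                   [(200, 50), (250, 50), (200, 0)])
print(pixel_to_robot(model, 40, 20))
```

With more than three reference points, a least-squares fit would usually be preferred over this exact three-point solve, since it averages out measurement noise.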
ISBN: (print) 9798350384581; 9798350384574
Tactile and textile skin technologies have become increasingly important for enhancing human-robot interaction and allowing robots to adapt to different environments. Despite notable advancements, there are ongoing challenges in skin signal processing, particularly in achieving both accuracy and speed in dynamic touch sensing. This paper introduces a new framework that poses the touch sensing problem as an estimation problem over resistive sensory arrays. Using a regularized least squares objective function, which estimates the resistance distribution of the skin, we enhance touch sensing accuracy and mitigate ghosting effects, in which false or misleading touches may be registered. Furthermore, our study presents a streamlined skin design that simplifies manufacturing without sacrificing performance. Experimental outcomes substantiate the effectiveness of our method, showing a 26.9% improvement in multi-touch force-sensing accuracy for the tactile skin.
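For orientation, the generic form of a regularized least squares problem is given below. The symbols are assumptions introduced here for illustration: $A$ stands for a linear model relating the unknown resistance distribution $x$ to the measured readings $b$, and $\lambda > 0$ is the regularization weight. The abstract does not give the paper's actual forward model or regularizer, so the Tikhonov ($\ell_2$) penalty is only the most common choice, not necessarily the authors'.

```latex
\hat{x} \;=\; \arg\min_{x}\; \lVert A x - b \rVert_2^2 \;+\; \lambda \lVert x \rVert_2^2,
\qquad
\hat{x} \;=\; \left(A^{\top} A + \lambda I\right)^{-1} A^{\top} b .
```

The closed form follows from setting the gradient $2A^{\top}(Ax - b) + 2\lambda x$ to zero. Because $A^{\top}A + \lambda I$ is positive definite for any $\lambda > 0$, the estimate is unique even when the array readings alone do not determine $x$.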
ISBN: (print) 9798350350661; 9798350350654
Age-related macular degeneration (AMD) is a leading cause of irreversible vision impairment among the elderly population worldwide, affecting over 196 million people globally. This study delves into innovative approaches for improving the detection and management of AMD through advanced technological solutions. To investigate the impact of ultraviolet (UV) and blue light exposure on AMD, the research analyzes their contribution to oxidative stress and retinal damage and evaluates potential protective measures or interventions to mitigate these effects. By leveraging state-of-the-art machine learning algorithms and advanced image processing techniques, the research aims to enhance the precision and efficiency of AMD diagnosis. The statistical burden of AMD underscores its significant impact on global health, with projections indicating a rising prevalence due to aging populations, lifestyle factors, and increasing digital screen use among younger generations. Effective management hinges on early detection and accurate monitoring of AMD biomarkers, which these methodologies seek to facilitate. Experimental evaluations demonstrate promising outcomes in diagnostic accuracy and scalability, highlighting the potential for widespread adoption in clinical practice. Furthermore, these advancements contribute to broader efforts in global eye health by offering scalable, AI-driven solutions that can improve patient outcomes and streamline healthcare workflows. By addressing the complexities of AMD diagnosis, this research supports healthcare providers in delivering timely interventions and personalized care, ultimately reducing the burden of AMD-related vision loss on individuals and healthcare systems.
ISBN: (print) 9798350344868; 9798350344851
Speaker verification is hampered by background noise, particularly at extremely low signal-to-noise ratios (SNR) below 0 dB. It is difficult to suppress noise without introducing unwanted artifacts, which adversely affect speaker verification. We propose a mechanism called Gradient Weighting (Grad-W), which dynamically identifies and reduces artifact noise during prediction. The mechanism is based on the property that the gradient indicates which parts of the input the model is paying attention to. Specifically, when the speaker network focuses on a region in the denoised utterance but not on the same region in the clean counterpart, we consider that region artifact noise and assign it a higher weight when optimizing the enhancement model. We validate the mechanism by training an enhancement model and testing the enhanced utterances on speaker verification. The experimental results show that our approach effectively reduces artifact noise, improving speaker verification across various SNR levels.
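The weighting idea can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the abstract does not give the weighting formula, so the normalization, the `alpha` scale factor, and the function names `artifact_weights` and `weighted_mse` are hypothetical. The inputs are per-region gradients of the speaker network with respect to the denoised and the clean utterance.

```python
def normalize(values):
    """Scale non-negative saliency values so that they sum to 1."""
    total = sum(values)
    if total == 0:
        return [0.0 for _ in values]
    return [v / total for v in values]


def artifact_weights(grad_denoised, grad_clean, alpha=1.0):
    """Give extra weight to regions the model attends to only after denoising.

    A region counts as a likely artifact when its normalized gradient
    magnitude is larger for the denoised input than for the clean input.
    """
    sal_d = normalize([abs(g) for g in grad_denoised])
    sal_c = normalize([abs(g) for g in grad_clean])
    return [1.0 + alpha * max(0.0, d - c) for d, c in zip(sal_d, sal_c)]


def weighted_mse(denoised, clean, weights):
    """Enhancement loss in which suspected artifact regions count more."""
    n = len(denoised)
    return sum(w * (d - c) ** 2
               for w, d, c in zip(weights, denoised, clean)) / n


# Hypothetical gradients for three time-frequency regions.
weights = artifact_weights([0.0, 3.0, 1.0], [0.0, 1.0, 1.0])
print(weights)
```

In this toy input only the second region draws more attention after denoising than in the clean signal, so it is the only one whose weight rises above 1.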
ISBN: (print) 9798350377712; 9798350377705
Femoral artery access is a common and critical procedure for various cardiovascular interventions. Although it is a time-critical operation, accessing the common femoral artery (CFA) typically requires expertise found in specialized medical settings. The need for specialized personnel or transport to equipped facilities can lead to delays, potentially worsening patient outcomes. To address this challenge, a portable and cost-effective robotic device that autonomously localizes the CFA and precisely positions a needle guide is developed. Through the needle guide, a needle can be quickly and accurately inserted into the artery even by non-specialist physicians. Unlike the conventional B-mode ultrasound-guided procedure, the proposed robotic solution uses a Doppler transducer to detect the arterial location and a single M-mode transducer to measure depth. A series of experiments was designed and conducted to validate the system's feasibility, achieving accuracy within 2 mm, processing within 1.5 min, and a 100% success rate. These results motivate further refinement of the system and support its evaluation in animal studies.
ISBN: (print) 9798350349405; 9798350349399
Motion prediction in soccer involves capturing complex dynamics from player and ball interactions. We present FootBots, an encoder-decoder transformer-based architecture that addresses motion prediction and conditioned motion prediction through equivariance properties. FootBots captures temporal and social dynamics using set attention blocks and a multi-attention block decoder. Our evaluation uses two datasets: a real soccer dataset and a tailored synthetic one. Insights from the synthetic dataset highlight the effectiveness of FootBots' social attention mechanism and the significance of conditioned motion prediction. Empirical results on real soccer data demonstrate that FootBots outperforms baselines in motion prediction and excels in conditioned tasks, such as predicting the players based on the ball position, predicting the offensive (defensive) team based on the ball and the defensive (offensive) team, and predicting the ball position based on all players. Our evaluation connects quantitative and qualitative findings. https://***/9kaEkfzG3L8
ISBN: (print) 9798350344868; 9798350344851
A significant challenge in multi-view clustering lies in the comprehensive extraction of consistency and complementary information from heterogeneous multi-view data. Numerous methods employ contrastive learning techniques to explore the information between views. However, the basic contrastive learning strategy does not consider cluster information when constructing sample pairs, potentially leading to the emergence of false negative pairs (FNPs). To tackle this concern, we propose a Multi-view Subspace Clustering with Consensus Graph Contrastive Learning (CGCL) model. Specifically, a self-representation layer is designed to acquire a consensus graph that elucidates the overall data distribution. Furthermore, a contrastive learning layer utilizes the cluster information embedded in the consensus graph to yield reliable sample pairs, resulting in a reduction of the detrimental FNPs and the extraction of complementary information from the various views. Extensive experiments on public datasets demonstrate the effectiveness of CGCL.
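The effect of excluding false negative pairs can be shown with a toy InfoNCE-style loss. This is a sketch under stated assumptions, not the CGCL loss: the abstract does not give the loss function, and the function name `masked_info_nce`, the `temperature` value, and the use of integer `cluster_ids` (standing in for cluster information read off the consensus graph) are all hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def masked_info_nce(view_a, view_b, cluster_ids, temperature=0.5):
    """Cross-view contrastive loss that skips same-cluster negatives.

    The positive for sample i is the same sample in the other view.
    A sample j is used as a negative only if it lies in a different cluster,
    so samples that likely share a class are not pushed apart.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        pos = math.exp(dot(view_a[i], view_b[i]) / temperature)
        neg = sum(math.exp(dot(view_a[i], view_b[j]) / temperature)
                  for j in range(n)
                  if cluster_ids[j] != cluster_ids[i])
        total += -math.log(pos / (pos + neg))
    return total / n


# Hypothetical embeddings: samples 0 and 1 are near-duplicates, 2 differs.
view_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

plain = masked_info_nce(view_a, view_b, [0, 1, 2])   # every other sample is a negative
masked = masked_info_nce(view_a, view_b, [0, 0, 1])  # samples 0 and 1 share a cluster
print(plain, masked)
```

Giving every sample its own cluster id recovers the basic strategy, in which samples 0 and 1 are treated as negatives of each other despite being identical. Grouping them removes that penalty, so the masked loss is lower.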
ISBN: (print) 9781665407700
The proceedings contain 54 papers. The topics discussed include: ascertaining factors affecting autonomous driving in New Zealand - a framework for HMI design; digital identification of vehicles not only for investigative and forensic purpose; model predictive control for reliable path following with application to the autonomous vehicle and considering different vehicle models; scenario based simulation testing of autonomous vehicle using Malaysian road; mindful people’s preference for different services of autonomous vehicles in China; route planning based on street criteria for autonomous driving vehicles; parameters influencing the subjective evaluation of traffic stream by drivers; identifying factors to improve the coach design and service of the new-type EMU sleeper train: findings from passenger satisfaction survey; the impact of weather conditions on urban bus ridership; and analysis of influencing factors of pilot fatigue based on structural equation model.
ISBN: (print) 9798350349405; 9798350349399
Dynamic vision sensors (DVS) have recently generated great interest because of their wide dynamic range and low latency compared with conventional frame-based cameras. However, their complicated behavior in dim light conditions is still not well understood, which restricts the applications of DVS. In this paper, we analyze the typical DVS circuit and find that the event triggering time exhibits a discontinuity, which becomes prominent in dim light conditions. We point out that the discontinuity depends exclusively on the rate of change of the light intensity. Experimental results on real event data validate the analysis and confirm the existence of the discontinuity, which reveals the non-first-order behavior of DVS in dim light conditions.