ISBN (print): 9798350386660
The proceedings contain 58 papers. The topics discussed include: real-time heart rate detection based on body surface video data; a local dimming algorithm based on deep learning; multilevel interaction embedding for hyperspectral image super-resolution; a review of point target and extended target tracking algorithms; phase retrieval algorithm based on transport of intensity equation under the fusion of regularization and grating modulation; exploring data augmentation effects on a singular illumination distribution dataset with ColorJitter; unsupervised domain adaptation for cross-modality cardiac image segmentation based on contrastive image synthesis; a novel color image encryption scheme based on fractional-order chaotic system; and contextual transformer based small targets detection for cervical cell.
Big data privacy preservation is a critical challenge for data mining and data analysis. Existing methods for anonymizing big data streams using k-anonymity algorithms may cause high data loss, low data quality, and identity disclosure. In this paper, we propose a novel model for anonymizing big data streams using in-memory processing. The model uses a Spark framework to parallelize the anonymization process and a one-time clustering algorithm to avoid multiple iterations and allocate the data to optimal clusters. We evaluate the performance and effectiveness of the model using a real-world dataset and compare it with three popular k-anonymity algorithms: CRUE, Mean-Shift, and DBSCAN. The results show that the model has the lowest data loss and the highest data quality for different data sizes and k-values. The model is scalable, robust, adaptable, and flexible. The model can provide better data for data mining and data analysis while protecting data privacy and preventing data disclosure.
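As a point of reference for the k-anonymity property that the abstract builds on, here is a minimal sketch of a k-anonymity check. It is not the paper's Spark-based model or its one-time clustering algorithm; the function name `is_k_anonymous`, the field names, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    # Group records by the tuple of their quasi-identifier values.
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized records (illustrative data only).
records = [
    {"age": "30-39", "zip": "130**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "130**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "148**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["age", "zip"], k=2))
print(is_k_anonymous(records, ["age", "zip"], k=3))
```

Each group of records sharing the same quasi-identifier values corresponds to one cluster in a clustering-based anonymizer; the check only verifies that no such group is smaller than k.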
ISBN (print): 9798350361360; 9798350361353
In the realm of edge intelligence, emerging video analytics applications often run on resource-constrained edge devices. These applications need systems that provide both low-latency and high-accuracy video stream processing, for example object detection in real-time video streams. State-of-the-art systems tackle this challenge by leveraging edge computing and cloud computing. Such edge-cloud approaches typically combine low-latency results from the edge with high-accuracy results from the cloud when processing a frame of the video stream. However, the accuracy achieved so far leaves much room for improvement. Furthermore, more accurate object detection often requires more capable hardware, which limits the edge devices that can be used. Applications involving autonomous drones, with the drone acting as the edge device, are one example. A wide variety of objects needs to be detected reliably for drones to operate safely. Drones with more computing capabilities are often more expensive and suffer from short battery life, as they consume more energy. In this paper, we introduce VATE, a novel edge-cloud system for object detection in real-time video streams. An enhanced approach for edge-cloud fusion is presented, leading to improved object detection accuracy. A novel multi-object tracker is introduced, allowing VATE to run on less capable edge devices. The architecture of VATE enables it to be used both when edge devices are capable of running on-device object detection frequently and when edge devices need to minimise on-device object detection to preserve battery life. Its performance is evaluated on a challenging, drone-based video dataset. The experimental results show that VATE improves accuracy by up to 27.5% compared to the state-of-the-art system, while running on less capable and cheaper hardware.
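The general idea of combining edge and cloud detections for a frame can be sketched as follows. This is not VATE's fusion method, which the abstract does not describe; it is a generic illustration that assumes detections are `(box, label)` pairs with boxes given as `(x1, y1, x2, y2)`, and the names `iou`, `fuse`, and `iou_threshold` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(edge_dets, cloud_dets, iou_threshold=0.5):
    """Keep the edge box positions, but adopt the label of an overlapping cloud detection."""
    fused = []
    for box, label in edge_dets:
        # Find the cloud detection that overlaps this edge box the most, if any.
        best = max(cloud_dets, key=lambda d: iou(box, d[0]), default=None)
        if best is not None and iou(box, best[0]) >= iou_threshold:
            label = best[1]  # assumption: the cloud label is the more accurate one
        fused.append((box, label))
    return fused


# Illustrative detections only.
edge = [((0, 0, 10, 10), "bird"), ((50, 50, 60, 60), "car")]
cloud = [((1, 1, 10, 10), "drone")]
print(fuse(edge, cloud))
```

The design choice in this sketch is to trust the edge for box position, since its result is more recent, and the cloud for the class label, since its model is assumed to be more accurate.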
Panoramic or stitched image processing has wide applications in areas such as medical imaging, topographical mapping, and deep space exploration. Rapid development of high-speed communication and artificial intelligen...
ISBN (print): 9798400705243
Have you ever imagined what it would be like to watch a sports match without sound? You would miss all the sport-specific sounds and, above all, a large part of the stadium atmosphere that is particular to live events. This is what most Deaf and Hard of Hearing persons experience. Towards the Tokyo 2025 Deaflympics, we developed an AI-based system that recognizes sounds and player motion in order to render sound-related onomatopoeia over the match video in real time, as one would see in comics or manga.
ISBN (print): 9798350349405; 9798350349399
Learned hierarchical B-frame coding aims to leverage bidirectional reference frames for better coding efficiency. However, the domain shift between training and test scenarios due to dataset limitations poses a challenge. This issue arises from training the codec with small groups of pictures (GOPs) but testing it on large GOPs. Specifically, the motion estimation network, when trained on small GOPs, is unable to handle large motion at test time, which degrades compression performance. To mitigate the domain shift, we present an online motion resolution adaptation (OMRA) method. It adapts the spatial resolution of video frames on a per-frame basis to suit the capability of the motion estimation network in a pre-trained B-frame codec. OMRA is an online, inference-time technique: it does not require re-training the codec and is readily applicable to existing B-frame codecs that adopt hierarchical bi-directional prediction. Experimental results show that OMRA significantly enhances the compression performance of two state-of-the-art learned B-frame codecs on commonly used datasets.
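The intuition behind per-frame resolution adaptation is that downsampling a frame by a factor f also shrinks motion, measured in pixels, by a factor f. The sketch below illustrates that intuition only. The abstract does not say how OMRA selects the resolution, so the selection rule, the function name `choose_downsampling_factor`, and the parameters `motion_px`, `max_trained_motion_px`, and `factors` are all assumptions made for illustration.

```python
def choose_downsampling_factor(motion_px, max_trained_motion_px, factors=(1, 2, 4, 8)):
    """Pick the smallest factor that brings the motion into the range seen in training.

    Assumed rule for illustration only, not the selection criterion of OMRA.
    """
    for f in factors:
        # Downsampling by f divides the pixel displacement by f.
        if motion_px / f <= max_trained_motion_px:
            return f
    # Fall back to the coarsest resolution if nothing fits.
    return factors[-1]


# Hypothetical per-frame motion magnitudes (in pixels) and a hypothetical training range.
for motion in (10, 40, 500):
    print(motion, choose_downsampling_factor(motion, max_trained_motion_px=16))
```

Because the factor is chosen per frame at inference time, a rule of this kind needs no change to the weights of the pre-trained codec.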
ISBN (digital): 9781510661714
ISBN (print): 9781510661707; 9781510661714
Sediment plumes are generated by both natural and human activities in benthic environments, increasing the turbidity of the water and reducing the amount of sunlight reaching the benthic vegetation. Seagrasses, which are photosynthetic bioindicators of their environment, are threatened by chronic reductions in sunlight, impacting entire aquatic food chains. This research uses UAV aerial video and imagery to investigate the characteristics of sediment plumes generated by a model of anthropogenic disturbance. The extent, speed and motion of the plumes were assessed, as these parameters may pertain to the potential impacts of plume turbidity on seagrass communities. In a case study using UAV video, the turbidity plume was observed to spread over 250 feet over 20 minutes of the UAV campaign. The directional speed of the plume was estimated to be between 10.4 and 10.6 ft/min. This was corroborated by the observation that plume turbidity and sediment load were greatest near the location of disturbance and diminished with distance. Further temporal studies are necessary to determine the long-term impacts, if any, of human activity-generated sediment plumes on seagrass beds.
This paper presents the design and implementation of a camera surveillance picture quality inspection system. The system assesses the video stream from surveillance cameras and provides immediate feedback on image qua...
The escalating concern over worldwide security and criminal activities has made closed-circuit television video surveillance systems an essential tool for diverse security purposes. These systems are extensively deployed and play a crucial role in monitoring and maintaining security. Their predominant purpose is to gather data for evidentiary use after a criminal incident has occurred. The demand for video surveillance systems capable of autonomously monitoring and promptly identifying criminals or intruders in real time is steadily increasing. Nevertheless, existing facial recognition methods have difficulty reliably identifying individuals who are in motion within a video frame. Moreover, conventional approaches require a substantial number of photographs to achieve precise recognition after acquiring an individual’s facial pattern. To tackle these concerns, we developed input optimization algorithms alongside a novel framework for real-time face recognition in video surveillance. The input optimization algorithms, integrated with adaptive thresholding techniques, reduce the need for manual outlier removal by actively identifying outliers for each specific case. This optimization strategy substantially improves both the efficiency and the precision of our system compared with baseline methods. Even with a reduced set of input images, our system attains further improvements. Specifically, employing tracking and temporal voting techniques enables our system to achieve a real-time face recognition accuracy of 90.91%. The findings of this study suggest that our approach has the potential to be a valuable tool in various applications that necessitate rapid and precise fac
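Temporal voting over a tracked face, as mentioned in the abstract above, can be illustrated with a simple majority vote across the per-frame predictions of one track. This is a generic sketch, not the authors' implementation; the function name `vote_identity`, the `min_share` parameter, and the sample labels are hypothetical.

```python
from collections import Counter


def vote_identity(frame_predictions, min_share=0.5):
    """Return the most frequent identity in a track, or None if no identity is dominant.

    frame_predictions holds one predicted identity per frame; None marks frames
    in which the recognizer produced no prediction.
    """
    votes = Counter(p for p in frame_predictions if p is not None)
    if not votes:
        return None
    identity, count = votes.most_common(1)[0]
    # Accept the winner only if it holds at least min_share of the valid votes.
    return identity if count / sum(votes.values()) >= min_share else None


# Illustrative per-frame predictions for a single tracked face.
print(vote_identity(["alice", "alice", "bob", None, "alice"]))
```

Voting across a track lets a few misrecognized frames, for instance during fast motion, be outvoted by the remaining frames of the same track.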
Gastrointestinal endoscopic image analysis presents significant challenges, such as considerable variations in quality due to the challenging in-body imaging environment, the often-subtle nature of abnormalities with low interobserver agreement, and the need for real-time processing. These challenges impose strong requirements on the performance, generalization, robustness and complexity of deep learning-based techniques in such safety-critical applications. While Convolutional Neural Networks (CNNs) have been the go-to architecture for endoscopic image analysis, recent successes of the Transformer architecture in computer vision raise the possibility of revisiting this choice. To this end, we evaluate and compare the clinically relevant performance, generalization and robustness of state-of-the-art CNNs and Transformers for neoplasia detection in Barrett's esophagus. We have trained and validated several top-performing CNNs and Transformers on a total of 10,208 images (2,079 patients), and tested on a total of 7,118 images (998 patients) across multiple test sets, including a high-quality test set, two internal and two external generalization test sets, and a robustness test set. Furthermore, to expand the scope of the study, we have conducted the performance and robustness comparisons for colonic polyp segmentation (Kvasir-SEG) and angiodysplasia detection (Giana). The results obtained for featured models across a wide range of training set sizes demonstrate that Transformers achieve performance comparable to CNNs on various applications, show comparable or slightly improved generalization capabilities and offer equally strong resilience and robustness against common image corruptions and perturbations. These findings confirm the viability of the Transformer architecture, particularly suited to the dynamic nature of endoscopic video analysis, characterized by fluctuating image quality, appearance and equipment configurations in transition from hospital to hospital. The