ISBN:
(Print) 9798350395334; 9798350395327
This project introduces a transformative object detection system designed to enhance the navigational capabilities of visually impaired individuals through advanced computer vision. Using the You Only Look Once (YOLO) model trained on the Common Objects in Context (COCO) dataset, the system provides real-time, accurate object detection and classification. The application processes both static images and live video feeds, enabling blind users to receive auditory announcements of nearby objects, thereby assisting with spatial awareness and environmental interaction. The system leverages a pre-trained YOLO model to ensure robust detection performance, achieving a peak detection accuracy of 99%. By delivering object labels and bounding-box coordinates audibly, the application serves as a critical tool for improving the daily independence and quality of life of people with visual impairments. This project not only highlights the potential of deep learning in assistive technologies but also underscores the importance of adaptive solutions in inclusive technology development.
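The announcement step described above, turning detected labels and bounding boxes into spoken output, can be sketched in plain Python. The detection tuple format, the position thresholds, and the `announce_detections` helper are illustrative assumptions, not the authors' code; in a real system the detections would come from a YOLO model and the resulting string would be passed to a text-to-speech engine.

```python
def announce_detections(detections, frame_width):
    """Convert (label, confidence, bbox) detections into a spoken-style
    announcement string. bbox is (x1, y1, x2, y2) in pixels.

    Hypothetical helper illustrating the audible-feedback idea; the
    one-third frame split used to say left/ahead/right is an assumption."""
    phrases = []
    for label, conf, (x1, y1, x2, y2) in detections:
        cx = (x1 + x2) / 2  # horizontal centre of the detected object
        if cx < frame_width / 3:
            side = "on your left"
        elif cx > 2 * frame_width / 3:
            side = "on your right"
        else:
            side = "ahead"
        phrases.append(f"{label} {side} ({conf:.0%} confidence)")
    return "; ".join(phrases) if phrases else "no objects detected"

# Example: two detections in a 900-pixel-wide frame
dets = [("person", 0.97, (50, 100, 200, 400)),
        ("chair", 0.88, (700, 300, 880, 500))]
print(announce_detections(dets, 900))
# → person on your left (97% confidence); chair on your right (88% confidence)
```

In practice the returned string would be handed to a text-to-speech library rather than printed.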
Synthetic Aperture Radar (SAR) images have a wide range of applications due to their all-weather and all-day working conditions. However, SAR images with different scenarios and imaging conditions are insufficient or ...
This paper presents examination results on the feasibility of automatic unclear-region detection in a CAD system for colorectal tumors using real-time endoscopic video images. We confirmed that it is possible to realize a CAD system with a clear-region navigation function, consisting of unclear-region detection by YOLOv2 and classification by AlexNet and SVMs, on customizable embedded DSP cores. Moreover, we confirmed that the real-time CAD system can be constructed as a low-power ASIC using customizable embedded DSP cores.
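The two-stage pipeline described above, gating classification on an unclear-region check, can be sketched as follows. The `detect_unclear` and `classify` callables and the 0.2 threshold are stand-ins for the paper's YOLOv2 detector and AlexNet+SVM classifier, chosen purely for illustration.

```python
def navigate_and_classify(frames, detect_unclear, classify):
    """For each endoscopic frame, run the unclear-region detector first;
    only frames whose unclear-area fraction is low enough are classified.

    detect_unclear(frame) -> fraction of the frame judged unclear (0..1)
    classify(frame)       -> classification label for a clear frame
    Both callables stand in for the YOLOv2 detector and AlexNet+SVM
    classifier described in the paper; the 0.2 threshold is illustrative."""
    results = []
    for i, frame in enumerate(frames):
        unclear = detect_unclear(frame)
        if unclear > 0.2:
            # Navigation function: flag the region instead of classifying it.
            results.append((i, "skip: unclear region", unclear))
        else:
            results.append((i, classify(frame), unclear))
    return results

# Toy stand-ins: frames are numbers, judged "unclear" if value > 5
frames = [1, 7, 3]
out = navigate_and_classify(frames,
                            detect_unclear=lambda f: 0.5 if f > 5 else 0.1,
                            classify=lambda f: "non-tumor")
print(out)  # the middle frame is skipped as unclear
```

On the embedded DSP target, each stage would of course run as a fixed-function kernel rather than a Python callable; the sketch only shows the gating logic.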
In recent years, with the extensive application of deep learning methods, face recognition technology has developed considerably. To address the problem of video surveillance in power networks, a video surveillance meth...
Hyperspectral imaging and artificial intelligence (AI) have transformed imaging and data processing through their ability to capture and analyze detailed spectral information. This paper explores the integration of hy...
Real-time control, diversified functions, system integration, and miniaturization are important development directions for video electronics systems. Embedded design based on FPGA can manage system resources more reaso...
ISBN:
(Digital) 9781665496209
ISBN:
(Print) 9781665496209
Transformer has shown outstanding performance in time-series data processing, which can clearly facilitate quality assessment of video sequences. However, the quadratic time and memory complexity of Transformer potentially impedes its application to long video sequences. In this work, we study a mechanism for sharing attention across video clips in the video quality assessment (VQA) scenario. Consequently, an efficient architecture that integrates shared multi-head attention (MHA) into Transformer is proposed for VQA, which greatly eases the time and memory complexity. A long video sequence is first divided into individual clips. The quality features derived by an image quality model on each frame in a clip are aggregated by a shared MHA layer. The aggregated features across all clips are then fed into a global Transformer encoder for quality prediction at the sequence level. The proposed model, with a lightweight architecture, demonstrates promising performance in no-reference VQA (NR-VQA) modelling on publicly available databases. The source code can be found at https://***/junyongyou/lagt_vqa.
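The clip-level aggregation step can be sketched in plain Python: one attention scorer, shared across all clips, pools each clip's per-frame quality features into a single vector. This is a deliberately simplified single-head stand-in for the paper's shared MHA layer (the real model uses learned multi-head attention and feeds the pooled vectors into a global Transformer encoder); `w_score` and the dot-product scoring are illustrative assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def shared_clip_attention(clips, w_score):
    """Aggregate per-frame quality features within each clip using ONE
    shared attention scorer, mirroring the shared-MHA idea: the same
    weights w_score are reused for every clip, so attention cost grows
    with clip length rather than with the full sequence length.

    clips: list of clips, each a list of frame feature vectors.
    w_score: weight vector scoring each frame (a simplified stand-in
    for learned multi-head attention)."""
    aggregated = []
    for clip in clips:
        scores = [sum(w * x for w, x in zip(w_score, feat)) for feat in clip]
        attn = softmax(scores)                      # frame weights sum to 1
        dim = len(clip[0])
        pooled = [sum(a * feat[d] for a, feat in zip(attn, clip))
                  for d in range(dim)]              # attention-weighted mean
        aggregated.append(pooled)
    return aggregated  # one feature vector per clip, for the global encoder

clips = [[[1.0, 0.0], [0.0, 1.0]],                  # clip 1: two frames
         [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]]]      # clip 2: three frames
out = shared_clip_attention(clips, w_score=[1.0, 0.0])
print(out)
```

Because the scorer is shared, adding more clips adds only linear cost, which is the point of the shared-attention design.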
The real-time query of surveillance video plays a significant role in many fields such as public safety, smart cities, and abnormality monitoring. However, with the exponential growth of surveillance video data, traditi...
ISBN:
(Print) 9798400704369
Retrieval-augmented generation (RAG) systems combine the strengths of language generation and information retrieval to power many real-world applications like chatbots. Using RAG for the understanding of videos is appealing, but there are two critical limitations. One-time, upfront conversion of all content in a large corpus of videos into text descriptions entails high processing times. Also, not all information in the rich video data is typically captured in the text descriptions. Since user queries are not known a priori, developing a system for video-to-text conversion and interactive querying of video data is challenging. To address these limitations, we propose an incremental RAG system called iRAG, which augments RAG with a novel incremental workflow to enable interactive querying of a large corpus of videos. Unlike traditional RAG, iRAG quickly indexes large repositories of videos, and in the incremental workflow, it uses the index to opportunistically extract more details from selected portions of the videos to retrieve context relevant to an interactive user query. Such an incremental workflow avoids long video-to-text conversion times and overcomes information loss due to the conversion of video to text by performing on-demand, query-specific extraction of details in the video data. This ensures high-quality responses to interactive user queries that are often not known a priori. To the best of our knowledge, iRAG is the first system to augment RAG with an incremental workflow to support efficient interactive querying of a large corpus of videos. Experimental results on real-world datasets demonstrate 23x to 25x faster video-to-text ingestion, while ensuring that the latency and quality of responses to interactive user queries are comparable to responses from a traditional RAG where all video data is converted to text upfront before any user querying.
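The incremental workflow described above can be sketched as a small class: a cheap coarse index is built once over all clips at ingestion time, and expensive detailed descriptions are extracted lazily, only for the clips a query actually touches, then cached. The `coarse_describe` and `detailed_describe` callables and the keyword-overlap retrieval are illustrative stand-ins, not the iRAG implementation.

```python
class IncrementalVideoRAG:
    """Sketch of an iRAG-style incremental workflow: fast one-time coarse
    indexing, with on-demand, query-specific extraction of details.

    coarse_describe / detailed_describe stand in for a fast tagging model
    and an expensive video-to-text model; retrieval here is naive keyword
    overlap, purely for illustration."""

    def __init__(self, clips, coarse_describe, detailed_describe):
        self.clips = clips
        self.detailed_describe = detailed_describe
        # One-time, fast ingestion: coarse tags for every clip.
        self.index = {cid: coarse_describe(c) for cid, c in clips.items()}
        self.detail_cache = {}   # detailed text, filled in lazily per query

    def query(self, question, top_k=2):
        q_words = set(question.lower().split())
        # Rank clips by keyword overlap between the query and coarse tags.
        ranked = sorted(self.index,
                        key=lambda cid: -len(q_words & set(self.index[cid])))
        context = []
        for cid in ranked[:top_k]:
            if cid not in self.detail_cache:     # extract details on demand
                self.detail_cache[cid] = self.detailed_describe(self.clips[cid])
            context.append(self.detail_cache[cid])
        return context  # context that would be passed to the language model

clips = {"c1": "a red car drives past", "c2": "a dog plays in a park"}
rag = IncrementalVideoRAG(clips,
                          coarse_describe=lambda c: c.split()[:3],
                          detailed_describe=lambda c: "DETAIL: " + c)
print(rag.query("where is the dog", top_k=1))
# → ['DETAIL: a dog plays in a park']
```

Note that after this query only the dog clip has been converted in detail; the car clip never pays the expensive extraction cost, which is the source of the ingestion-time savings the paper reports.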
Medical high-definition electronic endoscopes have high requirements on real-time performance and video quality. Compared with software-based image algorithms, the algorithms based on field programmable gate array (FP...