This paper presents a novel door-access system that uses the ESP32 microcontroller in conjunction with facial recognition technologies. Because of its Wi-Fi functionality and low power consumption, it'...
In recent years, progress in machine learning methods has greatly influenced the creation of assistive technologies designed to enhance the quality of life for individuals with visual impairments. This paper introduce...
Objectives To assess a new application of artificial intelligence for real-time detection of laryngeal squamous cell carcinoma (LSCC) in both white light (WL) and narrow-band imaging (NBI) videolaryngoscopies based on the You-Only-Look-Once (YOLO) deep-learning convolutional neural network (CNN). Study Design Experimental study with retrospective data. Methods Recorded videos of LSCC were retrospectively collected from in-office transnasal videoendoscopies and intraoperative rigid endoscopies. LSCC videoframes were extracted for training, validation, and testing of various YOLO models. Different techniques were used to enhance the image analysis: contrast-limited adaptive histogram equalization, data augmentation techniques, and test-time augmentation (TTA). The best-performing model was used to assess the automatic detection of LSCC in six videolaryngoscopies. Results Two hundred and nineteen patients were retrospectively enrolled. A total of 624 LSCC videoframes were extracted. The YOLO models were trained after random distribution of images into a training set (82.6%), validation set (8.2%), and testing set (9.2%). Among the various models, the ensemble algorithm (YOLOv5s with YOLOv5m-TTA) achieved the best LSCC detection results, with performance metrics on par with those reported by other state-of-the-art detection models: 0.66 Precision (positive predictive value), 0.62 Recall (sensitivity), and 0.63 mean Average Precision at 0.5 intersection over union. Tests on the six videolaryngoscopies demonstrated an average computation time per videoframe of 0.026 seconds. Three demonstration videos are provided. Conclusion This study identified a suitable CNN model for LSCC detection in WL and NBI videolaryngoscopies. Detection performance is highly promising. The model's limited complexity and quick computation times make it well suited to real-time processing. Level of Evidence 3. Laryngoscope, 2021
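The Precision, Recall, and mAP figures above all rest on the intersection-over-union (IoU) overlap between predicted and ground-truth boxes; a detection counts as a true positive when IoU ≥ 0.5. A minimal sketch of that computation (function name and the (x1, y1, x2, y2) box format are our own, not from the paper):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two boxes sharing half their width: overlap 50, union 150.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # → 0.3333333333333333
```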
With the development of deep learning, semantic segmentation has received considerable attention within the robotics community. For semantic segmentation to be applied to mobile robots or autonomous vehicles, real-time processing is essential. In this article, a new real-time semantic segmentation network, called the adjacent feature propagation network (AFPNet), is proposed to achieve high performance and fast inference. AFPNet executes in real time on a commercial embedded GPU. The network includes two new modules. The first, the local memory module (LMM), improves upsampling accuracy by propagating high-level features to adjacent grids. The second, the cascaded pyramid pooling module (CPPM), reduces computation time by restructuring the pyramid pooling module. Using these two modules, the proposed AFPNet achieved 76.4% mean intersection-over-union on the Cityscapes test dataset, outperforming other real-time semantic segmentation networks. Furthermore, AFPNet was successfully deployed on a Jetson AGX Xavier embedded board and applied to the real-world navigation of a mobile robot, demonstrating that AFPNet can be used effectively in a variety of real-time applications.
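The 76.4% mean intersection-over-union reported above is the standard Cityscapes segmentation metric: per-class IoU averaged over classes present in the evaluation. A minimal pure-Python illustration (flat label lists stand in for real masks; names are ours, not from AFPNet):

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes, given flat per-pixel label lists."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious)

pred   = [0, 0, 1, 1, 2, 2]   # predicted class per pixel
target = [0, 1, 1, 1, 2, 0]   # ground-truth class per pixel
print(mean_iou(pred, target, 3))  # → 0.5
```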
Accurate segmentation of the left ventricle (LV) from dynamic cardiac magnetic resonance imaging (MRI) is a critical focus in computer-assisted cardiovascular diagnostics. Most current deep learning methods, which are...
Human-Robot Collaboration (HRC) has evolved into a highly promising field owing to the latest breakthroughs in Artificial Intelligence (AI) and Human-Robot Interaction (HRI), among other reasons. This growth increases the need for multi-agent algorithms that can also accommodate human preferences. This letter presents an extension of the Ant Colony Optimization (ACO) meta-heuristic to solve the Minimum Time Search (MTS) task in the case where humans and robots perform an object-searching task together. The proposed model consists of two main blocks. The first is a convolutional neural network (CNN) that, from a segmented image, provides prior probabilities about where an object may be. The second is the Sub-prior MTS-ACO algorithm (SP-MTS-ACO), which takes as inputs the prior probabilities and the particular search preferences of the agents, encoded as different sub-priors, to generate search plans for all agents. The model has been tested in real experiments on a joint object search using a Vizanti web-based visualization on a tablet computer. The designed interface allows communication between a human and our humanoid robot, IVO. The results show an improvement in users' perception of the search without loss of efficiency.
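The core of any ACO variant, including the SP-MTS-ACO described above, is a stochastic transition rule that weighs pheromone against heuristic information (here, the CNN's prior over cells). A toy sketch of that rule alone, not the authors' algorithm (function name and parameter values are our own assumptions):

```python
import random

def aco_select(pheromone, prior, alpha=1.0, beta=2.0, rng=random):
    """ACO roulette-wheel rule: pick cell i with probability
    proportional to pheromone[i]**alpha * prior[i]**beta."""
    weights = [p ** alpha * h ** beta for p, h in zip(pheromone, prior)]
    total = sum(weights)
    r = rng.random() * total
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(weights) - 1

rng = random.Random(0)                # fixed seed for reproducibility
pheromone = [1.0, 1.0, 1.0]           # uniform pheromone at start
prior = [0.1, 0.8, 0.1]               # e.g. CNN prior over where the object may be
counts = [0, 0, 0]
for _ in range(1000):
    counts[aco_select(pheromone, prior, rng=rng)] += 1
print(counts)  # the high-prior cell dominates the ants' choices
```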
Computed tomography imaging spectrometry (CTIS) is a snapshot hyperspectral imaging technique that can obtain a three-dimensional (2D spatial + 1D spectral) data cube of the scene captured within a single exposure. The CTIS inversion problem is typically highly ill-posed and is usually solved by time-consuming iterative algorithms. This work aims to take full advantage of recent advances in deep-learning algorithms to dramatically reduce the computational cost. For this purpose, a generative adversarial network is developed and integrated with self-attention, which exploits the readily usable features of the zero-order diffraction of CTIS. The proposed network is able to reconstruct a CTIS data cube (containing 31 spectral bands) in milliseconds, with higher quality than both traditional methods and the state-of-the-art (SOTA). Simulation studies based on real image datasets confirmed the robustness and efficiency of the method. In numerical experiments with 1000 samples, the average reconstruction time for a single data cube was approximately 16 ms. The robustness of the method against noise was also confirmed by numerical experiments with different levels of Gaussian noise. The CTIS generative adversarial network framework can be easily extended to solve CTIS problems with larger spatial and spectral dimensions, or migrated to other compressed spectral imaging modalities. (c) 2023 Optica Publishing Group
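The network above integrates self-attention into its generator; the authors' architecture is not reproduced here, but the generic scaled dot-product self-attention it builds on can be sketched in a few lines (identity Q/K/V projections are assumed purely for brevity, so this is the bare attention computation, not the paper's module):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(X):
    """Scaled dot-product self-attention with identity Q/K/V:
    out_i = sum_j softmax_j(x_i . x_j / sqrt(d)) * x_j."""
    d = len(X[0])
    out = []
    for xi in X:
        scores = [sum(a * b for a, b in zip(xi, xj)) / math.sqrt(d) for xj in X]
        w = softmax(scores)  # attention weights for token i, summing to 1
        out.append([sum(wj * xj[k] for wj, xj in zip(w, X)) for k in range(d)])
    return out

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three 2-D "feature tokens"
Y = self_attention(X)                      # each row is a convex mix of the inputs
```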
Aerial vehicles (AVs) commonly operate in vast environments, presenting a persistent challenge in achieving high-precision localization. Contemporary global positioning methods have inherent limitations. For instance, the precision of GPS is susceptible to decline, or even complete failure, when the signal is disrupted or absent. Furthermore, the precision of image retrieval techniques is inadequate, and the construction of 3-D models is a time-consuming and storage-intensive endeavor. In addition, scene coordinate regression requires retraining to adapt to varying scenarios, which makes it difficult to generalize across expansive environments. Addressing these challenges, we propose a network named AirGeoNet, which integrates satellite images and semantic maps to achieve efficient, high-precision localization. In the first phase, we introduce the foundation model DINOv2 to extract features from satellite and aerial images, employ a vector of locally aggregated descriptors (VLAD) for image retrieval to obtain a coarse position, and, finally, significantly enhance retrieval accuracy by combining sequential images with particle filters. Subsequently, AirGeoNet matches aerial images with semantic maps to determine the three degrees of freedom of the pose, namely position and orientation. The semantic maps used by AirGeoNet are sourced from OpenStreetMap and our self-produced QMap, and training is conducted in a supervised manner using real camera poses. Our AirGeoNet method is highly efficient, requiring only a 1546-D feature vector per image for image retrieval and 240k storage for a 0.9 km² semantic map, while achieving state-of-the-art accuracy with single-frame localization errors of 2.854 m on semantically rich datasets and 11 m in complex scenarios. Our code is publicly available at https://***/mxz520mxz/***
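The compact retrieval vector mentioned above comes from VLAD aggregation over learned local features; as a toy sketch of plain VLAD (hard assignment to the nearest center, residual sums, L2 normalization — the cluster centers and descriptors below are made up, not AirGeoNet's DINOv2 features):

```python
def vlad(descriptors, centers):
    """Vector of Locally Aggregated Descriptors: for each cluster center,
    sum the residuals of the descriptors assigned to it, then L2-normalize
    the concatenation. Output dimension = len(centers) * descriptor dim."""
    d = len(centers[0])
    agg = [[0.0] * d for _ in centers]
    for x in descriptors:
        # hard-assign x to its nearest center (squared Euclidean distance)
        k = min(range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centers[i])))
        for j in range(d):
            agg[k][j] += x[j] - centers[k][j]
    flat = [v for row in agg for v in row]
    norm = sum(v * v for v in flat) ** 0.5 or 1.0
    return [v / norm for v in flat]

centers = [[0.0, 0.0], [1.0, 1.0]]                   # toy visual vocabulary
descs = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8]]          # toy local descriptors
v = vlad(descs, centers)                              # 4-D global descriptor
```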
Event cameras generate asynchronous event sequences, rendering most existing image-based algorithms inapplicable for direct use and processing. Therefore, the development of a simulator that utilizes event stream...
In the field of aquaponics, where fish and plants coexist in a symbiotic environment, closely monitoring nitrate levels in the water is crucial due to their profound impact on aquatic and plant well-being. Traditional...