Topological Data Analysis (TDA) uses ideas from topology to study the "shape" of data. It provides a set of tools to extract features, such as holes, voids, and connected components, from complex high-dimensional data. This thesis presents an introductory exposition of the mathematics underlying the two main tools of TDA: Persistent Homology and the MAPPER algorithm. Persistent Homology detects topological features that persist over a range of resolutions, capturing both local and global geometric information. The MAPPER algorithm is a visualization tool that provides a form of dimensionality reduction, preserving topological properties of the data by projecting them onto lower-dimensional simplicial complexes. Furthermore, this thesis explores recent applications of these tools to natural language processing and computer vision. These applications fall into two main approaches. In the first, TDA is used to extract features from data that are then used as input for a variety of machine learning tasks, such as image classification or visualizing the semantic structure of text documents. The second applies the tools of TDA to the machine learning algorithms themselves, for example using MAPPER to study how structure emerges in the weights of a trained neural network. Finally, the results of several experiments are presented, including using Persistent Homology for image classification and using MAPPER to visualize the global structure of these data sets. Most notably, the MAPPER algorithm is used to visualize vector representations of contextualized word embeddings as they move through the encoding layers of the BERT-base transformer model.
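For readers who want to see these tools in action, here is a minimal sketch (not part of the thesis) that computes persistence diagrams for a noisy circle, assuming the third-party `ripser` and `numpy` packages; the single long-lived H1 feature corresponds to the circle's one essential loop.

```python
# Minimal sketch: persistent homology of a noisy circle (illustrative only).
import numpy as np
from ripser import ripser

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
points = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(200, 2))

# Persistence diagrams for H0 (connected components) and H1 (loops).
dgms = ripser(points, maxdim=1)["dgms"]

# One long-lived H1 interval indicates the single essential loop of the circle.
h1 = dgms[1]
lifetimes = h1[:, 1] - h1[:, 0]
print("most persistent H1 lifetime:", lifetimes.max())
```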
Computer vision is a subfield of artificial intelligence that relies on training computers to obtain a high level of understanding of vision data. A computer vision system aims at identifying objects through the acqui...
Fusarium wilt disease (FWD), caused by Fusarium oxysporum f. sp. ciceris (Padwick), is the most important disease affecting chickpea yield among biotic stresses. Fusarium wilt is a vascular disease that causes permanent ...
Vision in the deep sea is attracting increasing interest from many fields, as the deep seafloor represents the largest surface portion on Earth. Unlike common shallow underwater imaging, deep sea imaging requires artificial lighting to illuminate the scene in perpetual darkness. Deep sea images suffer from degradation caused by scattering, attenuation and the effects of artificial light sources, and have a very different appearance to images taken in shallow water or on land. This impairs transferring current vision methods to deep sea applications. Developing adequate algorithms requires data with ground truth in order to evaluate the methods. However, it is practically impossible to capture the same deep sea scene without water or artificial lighting effects. This situation impairs progress in deep sea vision research, where even synthesized images with ground truth would be a good solution. Most current methods either render a virtual 3D model, or use atmospheric image formation models to convert real-world scenes so that they appear as shallow-water scenes illuminated by sunlight. Currently, there is a lack of image datasets dedicated to deep sea vision evaluation. This paper introduces a pipeline to synthesize deep sea images using existing real-world RGB-D benchmarks, and, as an example, generates deep sea twin datasets for the well-known Middlebury stereo benchmarks. They can be used both for testing underwater stereo matching methods and for training and evaluating underwater image processing algorithms. This work aims towards establishing an image benchmark intended particularly for deep sea vision developments.
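As a rough illustration of the kind of image formation model mentioned above, the sketch below applies a simplified direct-attenuation-plus-backscatter model to an RGB-D pair; the per-channel coefficients are made-up placeholders, and this is not the paper's synthesis pipeline.

```python
# Illustrative sketch: simplified underwater image formation on an RGB-D pair.
import numpy as np

def synthesize_underwater(rgb, depth, beta=(0.40, 0.20, 0.10), backlight=(0.05, 0.25, 0.35)):
    """rgb: HxWx3 float image in [0, 1]; depth: HxW range map in metres."""
    beta = np.asarray(beta)            # per-channel attenuation; red fades fastest (placeholder values)
    backlight = np.asarray(backlight)  # veiling-light colour (placeholder values)
    t = np.exp(-beta[None, None, :] * depth[..., None])     # per-pixel transmission
    return rgb * t + backlight[None, None, :] * (1.0 - t)   # direct signal + backscatter
```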
ISBN:
(Print) 9798350307184
Medical image arbitrary-scale super-resolution (MI-ASSR) has recently gained widespread attention, aiming to super-sample medical volumes at arbitrary scales via a single model. However, existing MI-ASSR methods face two major limitations: (i) reliance on high-resolution (HR) volumes and (ii) limited generalization ability, which restricts their applications in various scenarios. To overcome these limitations, we propose Cube-based Neural Radiance Field (CuNeRF), a zero-shot MI-ASSR framework that is able to yield medical images at arbitrary scales and free viewpoints in a continuous domain. Unlike existing MISR methods that only fit the mapping between low-resolution (LR) and HR volumes, CuNeRF focuses on building a continuous volumetric representation from each LR volume without knowledge of the corresponding HR one. This is achieved by the proposed differentiable modules: cube-based sampling, isotropic volume rendering, and cube-based hierarchical rendering. Through extensive experiments on magnetic resonance imaging (MRI) and computed tomography (CT) modalities, we demonstrate that CuNeRF can synthesize high-quality SR medical images, outperforming state-of-the-art MISR methods and achieving better visual verisimilitude with fewer objectionable artifacts. Compared to existing MISR methods, our CuNeRF is more applicable in practice.
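For context, the sketch below shows the generic NeRF-style volume-rendering quadrature that such rendering modules build on; it is a standard compositing formula, not a reproduction of CuNeRF's cube-based sampling or hierarchical rendering.

```python
# Generic volume-rendering quadrature along one ray (standard NeRF-style compositing).
import numpy as np

def composite(sigmas, values, deltas):
    """Per-sample densities, intensities and segment lengths along one ray."""
    alpha = 1.0 - np.exp(-sigmas * deltas)                          # opacity of each segment
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))   # accumulated transmittance
    weights = trans * alpha
    return np.sum(weights * values)                                 # rendered intensity for this ray
```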
Author:
Alyami, Jaber
King Abdulaziz Univ, Fac Appl Med Sci, Dept Radiol Sci, Jeddah 21589, Saudi Arabia
King Abdulaziz Univ, King Fahd Med Res Ctr, Jeddah 21589, Saudi Arabia
King Abdulaziz Univ, Smart Med Imaging Res Grp, Jeddah 21589, Saudi Arabia
King Abdulaziz Univ, Ctr Modern Math Sci & its Applicat, Med Imaging & Artificial Intelligence Res Unit, Jeddah 21589, Saudi Arabia
Radiological image analysis using machine learning has been extensively applied to enhance biopsy diagnosis accuracy and assist radiologists in delivering precise treatment. With improvements in the medical industry and its technology, computer-aided diagnosis (CAD) systems have become essential in detecting early cancer signs in patients that could not be observed physically, without introducing errors. CAD is a detection system that combines artificially intelligent techniques with image processing applications through computer vision. Several manual procedures for cancer diagnosis, such as CT scans, radiography, and MRI scans, are reported in the state of the art; still, they are costly, time-consuming and detect cancer only in late stages. In this research, numerous state-of-the-art approaches to multi-organ detection using clinical practices are evaluated, covering cancer, neurological, psychiatric, cardiovascular and abdominal imaging. Additionally, numerous sound approaches are clustered together and their results are assessed and compared on benchmark datasets. Standard metrics such as accuracy, sensitivity, specificity and false-positive rate are employed to check the validity of the current models reported in the literature. Finally, existing issues are highlighted and possible directions for future work are suggested.
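For reference, the metrics named above have standard definitions in terms of a binary confusion matrix; the sketch below simply spells them out (the function name is illustrative, not taken from any of the surveyed works).

```python
# Standard binary-classification metrics from confusion-matrix counts.
def diagnostic_metrics(tp, fp, tn, fn):
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)          # true-positive rate (recall)
    specificity = tn / (tn + fp)          # true-negative rate
    fpr         = fp / (fp + tn)          # false-positive rate = 1 - specificity
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "false_positive_rate": fpr}
```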
ISBN:
(Print) 9781665451093
QR codes are widely used in different applications, and their detection is currently done in software. However, hardware detection using FPGAs offers real-time processing ability, which makes it attractive for time-critical applications such as high-precision robotics and augmented reality. In light of this, an FPGA algorithm for QR code detection is proposed in this paper. It operates with a maximum latency of 12.2 ms to detect a QR code when the input image resolution is 640x480, which offers an 85.3% performance boost over the best state-of-the-art software detector according to benchmarks. To the best of the authors' knowledge, this is the first work that explores the use of FPGAs in QR code detection.
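The paper's contribution is a hardware design, which cannot be reproduced here; as a point of comparison only, the following sketch shows a minimal software baseline using OpenCV's built-in QR detector (the input file name is hypothetical).

```python
# Software baseline for timing comparisons only; not the paper's FPGA design.
import time
import cv2

img = cv2.imread("frame_640x480.png")      # hypothetical 640x480 test frame
detector = cv2.QRCodeDetector()

start = time.perf_counter()
found, points = detector.detect(img)       # locate QR code corners (no decoding)
latency_ms = (time.perf_counter() - start) * 1000.0
print(f"detected={found}, latency={latency_ms:.1f} ms")
```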
ISBN:
(Digital) 9798350372816
ISBN:
(Print) 9798350372816
The goal of visual implants is to create artificial vision that can partially restore function. They can enhance the quality of life of visually challenged individuals by allowing them to perceive light, even after years of darkness, through the use of 60 microelectrodes implanted in the retina. The artificial vision made possible by current visual system stimulators has very poor resolution because of their small number of microelectrodes. Numerous researchers have sought to enhance the artificial vision produced by low-resolution implants through the application of machine learning and image processing techniques. Because phosphene images have low resolution, users report dissatisfaction with the Retinal Prosthesis System. This underscores the pressing need for targeted research aimed at improving visual clarity and overall user satisfaction. This research proposes simulating artificial vision in which the visually impaired user receives information synthesized by the system through a low-resolution image delivered by a visual implant. Using a Vision Transformer, the technique gathers useful information about people in the immediate vicinity of the visually impaired person, including their number, familiarity, gender, approximate ages, facial emotions, nearby items, and approximate distances. The information obtained from the camera frames of the user's glasses is used to create signals that are then sent to a visual stimulator, offering a potentially effective way to improve the visual experience for those who are visually impaired. In order to facilitate economical real-time implementation in an independent portable system, the algorithm that best suits each feature is chosen based on its accuracy and time complexity. The proposed approach uses audio to provide crucial information about those in close proximity to a visually impaired person, enabling them to converse with others more comfortably. This paper can thus be taken into consideration for some next-generation v...
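As a loose illustration of what a low-resolution phosphene percept looks like, the sketch below pools a camera frame down to a 10x6 electrode grid and blurs it back up; the grid size and rendering choices are assumptions for illustration, not the system described in the paper.

```python
# Illustrative phosphene-percept simulation (assumed parameters, not the paper's system).
import cv2

def phosphene_simulation(frame_gray, grid=(10, 6), out_size=(640, 480)):
    """frame_gray: HxW uint8 frame; grid: (cols, rows) electrode layout, 60 sites by default."""
    coarse = cv2.resize(frame_gray, grid, interpolation=cv2.INTER_AREA)       # average onto the electrode grid
    percept = cv2.resize(coarse, out_size, interpolation=cv2.INTER_NEAREST)   # blow back up for display
    return cv2.GaussianBlur(percept, (31, 31), 0)                             # soften into blob-like phosphenes
```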
In recent years, the field of image captioning has gained substantial attention, posing a complex challenge that necessitates the integration of computer vision (CV), natural language processing (NLP), and machine lea...
ISBN:
(Print) 9781665458245
Automatic License Plate Recognition systems aim to provide a solution for detecting, localizing, and recognizing license plate characters from vehicles appearing in video frames. However, deploying such systems in the real world requires real-time performance in low-resource environments. In our paper, we propose a two-stage detection pipeline paired with a vision API that provides real-time inference speed along with consistently accurate detection and recognition performance. We used a Haar cascade classifier as a filter on top of our backbone MobileNet SSDv2 detection model. This reduces inference time by focusing only on high-confidence detections and using them for recognition. We also impose a temporal frame separation strategy to distinguish between multiple vehicle license plates in the same clip. Furthermore, since there are no publicly available Bangla license plate datasets, we created an image dataset and a video dataset containing license plates in the wild. We trained our models on the image dataset, achieving an AP(0.5) score of 86%, and tested our pipeline on the video dataset, observing reasonable detection and recognition performance (82.7% detection rate and 60.8% OCR F1 score) with real-time processing speed (27.2 frames per second).
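A hedged sketch of the two-stage idea described above: a cheap Haar cascade pre-filter gates each frame before the heavier detector/OCR stage runs. The cascade file and the `run_ssd_and_ocr` callable are placeholders, not the authors' released code.

```python
# Two-stage gating sketch: Haar cascade pre-filter before a heavier detection/OCR stage.
import cv2

plate_cascade = cv2.CascadeClassifier("plate_cascade.xml")   # hypothetical cascade model file

def process_frame(frame_bgr, run_ssd_and_ocr):
    """Run the cheap pre-filter first; call the heavy stage only on promising frames."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    candidates = plate_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
    if len(candidates) == 0:
        return []                        # skip expensive detection/recognition on this frame
    return run_ssd_and_ocr(frame_bgr)    # placeholder for the MobileNet SSDv2 + OCR stage
```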