ISBN:
(Print) 9798350388978; 9798350388961
Recently, providing real-time navigation for unmanned aerial vehicles independent of global positioning systems has become increasingly important. Neither state-of-the-art deep learning methods, which give good results on certain datasets, nor existing classical methods can provide real-time, accurate solutions on images with dynamic, fast-moving content. Moreover, the methods developed so far have focused on object-based tracking algorithms. In this paper, the points belonging to the target pattern, found by image matching, are tracked with a machine learning model we developed over 10 sequential video frames. The features extracted for the machine learning model are: (i) the change between the points of the previous frame and the frame before that, (ii) the points of interest in the previous frame, and (iii) the changes found with the homography matrix between sequential frames. It was experimentally shown that, among the algorithms in the literature that can process more than 30 frames per second on a CPU of 2 GHz or above, point tracking is achieved with the lowest error, on average about 23 pixels for a 2-megapixel image.
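As an illustration of feature (iii), the sketch below estimates a homography between two consecutive frames and uses it to project the tracked points forward; the resulting displacement is one plausible encoding of the inter-frame change. The detector (ORB), matcher, and RANSAC threshold are illustrative assumptions, since the abstract does not specify them.

```python
# Hedged sketch: homography-based point displacement between sequential frames.
# ORB + brute-force matching are stand-ins; the paper's exact pipeline is not given.
import cv2
import numpy as np

def homography_shift(prev_gray: np.ndarray, curr_gray: np.ndarray,
                     tracked_pts: np.ndarray) -> np.ndarray:
    """Project tracked points from the previous frame into the current frame
    via a homography fitted to keypoint matches; return their displacement."""
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]

    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    pts = np.float32(tracked_pts).reshape(-1, 1, 2)
    projected = cv2.perspectiveTransform(pts, H)
    return (projected - pts).reshape(-1, 2)
```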
Amber is a system-on-chip (SoC) with a coarse-grained reconfigurable array (CGRA) for acceleration of dense linear algebra applications, such as machine learning (ML), image processing, and computer vision. It is designed using an agile accelerator-compiler co-design flow; the compiler updates automatically with hardware changes, enabling continuous application-level evaluation of the hardware-software system. To increase hardware utilization and minimize reconfigurability overhead, Amber features the following: 1) dynamic partial reconfiguration (DPR) of the CGRA for higher resource utilization by allowing fast switching between applications and partitioning resources between simultaneous applications; 2) streaming memory controllers supporting affine access patterns for efficient mapping of dense linear algebra; and 3) low-overhead transcendental and complex arithmetic operations. The physical design of Amber features a unique clock distribution method and timing methodology to efficiently lay out its hierarchical, tile-based design. Amber achieves a peak energy efficiency of 538 INT16 GOPS/W and 483 BFloat16 GFLOPS/W. Compared with a CPU, a GPU, and a field-programmable gate array (FPGA), Amber has up to 3902x, 152x, and 107x better energy-delay product (EDP), respectively.
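For readers unfamiliar with the metric quoted above, energy-delay product is simply energy multiplied by execution time, so a design that both saves energy and runs faster compounds its advantage. The numbers in the toy example below are placeholders, not measurements from the paper.

```python
# Hedged sketch: how an energy-delay product (EDP) comparison is computed.
# The values are illustrative placeholders, not results from the Amber paper.
def edp(energy_joules: float, delay_seconds: float) -> float:
    """EDP combines energy and runtime; lower is better."""
    return energy_joules * delay_seconds

# A device using 10x less energy and running 5x faster improves EDP by 50x.
baseline = edp(energy_joules=1.0, delay_seconds=1.0)
accelerator = edp(energy_joules=0.1, delay_seconds=0.2)
print(baseline / accelerator)  # -> 50.0
```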
A new color appearance model named sCAM has been developed, including a uniform color space, sUCS. The model has a simple structure but provides comprehensive functions for color-related applications. It takes input either as XYZ D65 values or as signals from an RGB space. The accuracy of both has been extensively tested. Compared with state-of-the-art UCSs, sUCS performed best or second-best across the 28 datasets for space uniformity and the 6 datasets for hue linearity. sCAM also gave the best fit to all available one- and two-dimensional color appearance datasets. Field tests are recommended for all color-related applications.
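Since the abstract states that sCAM accepts XYZ D65 input, the sketch below shows the standard sRGB-to-XYZ (D65) conversion that could supply that input; the internals of sCAM and sUCS are not described in the abstract and are not reproduced here.

```python
# Hedged sketch: standard sRGB (D65) to XYZ conversion, a typical way to
# produce the XYZ D65 input mentioned above. sCAM itself is not modeled here.
import numpy as np

M_SRGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])

def srgb_to_xyz(rgb) -> np.ndarray:
    """rgb: values in [0, 1]; returns XYZ scaled so that white has Y = 100."""
    rgb = np.asarray(rgb, dtype=np.float64)
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    return 100.0 * linear @ M_SRGB_TO_XYZ.T

print(srgb_to_xyz([1.0, 1.0, 1.0]))  # ~[95.05, 100.0, 108.9], the D65 white point
```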
Neural Style Transfer (NST) is a popular technique of computer vision where the content of an image is blended with the style of another, which results in a fused image with certain properties of both original images. This approach has practical applications in various domains and has garnered significant attention in both industry and academia. An interesting application of this technique is segmented style transfer, where a segmentation algorithm is used to locate objects within an image and the style transfer method is then performed locally, producing images with different styles for different objects. This approach opens up possibilities for creating visually striking compositions by seamlessly blending various artistic styles onto specific objects within an image, allowing for a new level of creative expression. This paper proposes a novel method that combines the Segment Anything Model (SAM), a state-of-the-art vision transformer-based image segmentation model developed by Facebook, with style transfer. Our approach performs localized style transfer in selected segmentation regions of an image using classical style transfer algorithms. To ensure smooth transitions at the border between stylized and non-stylized regions, we also develop a loss function with a border-smoothing technique. Experimental results demonstrate the robustness and effectiveness of the proposed methodology, including the ability to infuse multiple artistic styles into different objects within an image. The contributions of this work include integrating SAM with style transfer, proposing a novel loss function, evaluating segmented style transfer in multiple content regions, comparing with state-of-the-art approaches, and experimenting with multiple style images for diverse stylization. Our primary focus centers on creating a model that serves as a digital painter across a wide range of image genres and artistic styles.
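The sketch below illustrates the compositing step implied by this pipeline: a stylized rendering is blended into a single SAM-provided mask region with a feathered border. The SAM inference, the NST model, and the paper's actual border-smoothing loss are not reproduced; Gaussian feathering of the mask is an illustrative stand-in.

```python
# Hedged sketch: composite a stylized image into one segmentation region with a
# soft border. `mask` is assumed to come from SAM and `stylized` from any NST model.
import cv2
import numpy as np

def blend_stylized_region(content: np.ndarray, stylized: np.ndarray,
                          mask: np.ndarray, feather_px: int = 15) -> np.ndarray:
    """content, stylized: HxWx3 uint8 images; mask: HxW array of {0, 1}."""
    k = 2 * feather_px + 1
    soft = cv2.GaussianBlur(mask.astype(np.float32), (k, k), 0)[..., None]
    out = soft * stylized.astype(np.float32) + (1.0 - soft) * content.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)
```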
Face sketch and photo synthesis is widely applied in industry and information fields, such as the entertainment business and heterogeneous face retrieval. The key challenge lies in completing a face transformation with both good visual quality and face identity preservation. However, existing methods still struggle to obtain a good synthesis due to the large gap between the two face domains. Recently, diffusion models have achieved great success in image synthesis, which allows us to extend their application to such a face generation task. Thus, we propose IPDM, which constructs a mapping of latent representations for domain-adaptive face features. A further proposed module, IDP, utilizes auxiliary features to correct the latent features through their directions and supplementary identity information, so that generation keeps the face identity unchanged. Evaluation results show that our method is superior to state-of-the-art methods in both identity preservation and visual quality.
This paper introduces a high dynamic range pixel for early vision processing. Early vision is the first stage from which semantic information is subsequently extracted for image processing or video analytics. This paper proposes t...
ISBN:
(Digital) 9783110756722; 9783110756821
ISBN:
(Print) 9783110756678
This book focuses on the latest developments in the fields of visual AI, image processing and computer vision. It presents research on basic techniques such as image pre-processing, feature extraction, and enhancement, along with applications in biometrics, healthcare, neuroscience and forensics. The book highlights the algorithms, processes, novel architectures and results underlying machine intelligence, with detailed execution flows of the models.
Common computer vision (CV) tasks include image classification, object detection, segmentation, and recognition. To handle such tasks, machine learning (ML) models for image processing require a great amount of annota...
ISBN:
(Print) 9783031683015; 9783031683022
The transition to Industry 4.0 intensifies the demand for advanced manufacturing techniques and efficient data processing capabilities. A notable challenge in engineering is that many older engineering drawings are only available in paper form, creating significant barriers for modern automated systems. This study tackles these challenges by employing advanced deep-learning techniques alongside traditional image processing to convert legacy engineering drawings into structured, machine-readable formats. Following this digitization process, the multi-modal approach further processes drawings containing large amounts of heterogeneous data by filtering non-essential details to isolate and extract critical features. This process enables the conversion of complex drawings into formats suitable for computer vision and deep learning applications. The structured datasets resulting from this process are then utilized to significantly enhance the efficiency of automated processes. For instance, they enable more efficient pick-and-place operations by providing the data necessary for machine learning-driven automation.
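As one example of the traditional image-processing stage mentioned above, the sketch below binarizes a scanned drawing and discards tiny connected components that carry no geometric information. The threshold parameters and area cutoff are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch: clean up a scanned drawing before a deep-learning stage by
# binarizing it and dropping small speckle components. Parameters are illustrative.
import cv2
import numpy as np

def clean_scanned_drawing(path: str, min_area: int = 50) -> np.ndarray:
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 25, 15)
    # Keep only connected components large enough to be line work, not dust.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    keep = np.zeros_like(binary)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            keep[labels == i] = 255
    return keep
```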
Correct and robust ego-lane index estimation is crucial for autonomous driving in the absence of high-definition maps, especially in urban environments. Previous ego-lane index estimation approaches rely on feature extraction, which limits their robustness. To overcome these shortcomings, this study proposes a robust ego-lane index estimation framework that uses only the original visual image. After optimization of the processing route, the raw image is randomly cropped in the height direction and then fed into a double-supervised LaneLoc network to obtain index estimates and confidences. A post-process is also proposed to derive the global ego-lane index from the estimated left and right indexes together with the total lane number. To evaluate the proposed method, we manually annotated the ego-lane index of public datasets, which can serve for the first time as a baseline for ego-lane index estimation. The proposed algorithm achieved 96.48/95.40% (precision/recall) on the CULane dataset and 99.45/99.49% (precision/recall) on the TuSimple dataset, demonstrating the effectiveness and efficiency of lane localization in diverse driving environments. The code and dataset annotations will be released publicly at https://***/haomo-ai/LaneLoc.
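The post-process is only summarized in the abstract, so the sketch below shows one plausible fusion rule: convert the right-counted index to a left-based index, accept it when the two heads agree, and otherwise fall back to the more confident estimate. This reconciliation rule is an assumption, not the paper's method.

```python
# Hedged sketch: fuse left-counted and right-counted ego-lane indexes into one
# global index. The actual fusion rule used by LaneLoc is not given in the abstract.
def fuse_ego_lane_index(left_idx: int, left_conf: float,
                        right_idx: int, right_conf: float,
                        total_lanes: int) -> int:
    """left_idx counts lanes from the left, right_idx from the right (both 1-based)."""
    from_right = total_lanes - right_idx + 1  # convert to a left-based index
    if from_right == left_idx:
        return left_idx                        # the two estimates agree
    # Otherwise trust the more confident head.
    return left_idx if left_conf >= right_conf else from_right

print(fuse_ego_lane_index(2, 0.9, 3, 0.7, total_lanes=4))  # -> 2
```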