Search results - Inner Mongolia University Library

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

Authors: Dong, Hao-Wen; Liu, Xiaoyu; Pons, Jordi; Bhattacharya, Gautam; Pascual, Santiago; Serra, Joan; Berg-Kirkpatrick, Taylor; McAuley, Julian. Affiliations: Dolby Labs, San Francisco, CA 94103, USA; Univ Calif San Diego, La Jolla, CA 92093, USA

ISBN: (print) 9798350323726

Recent work has studied text-to-audio synthesis using large amounts of paired text-audio data. However, audio recordings with high-quality text annotations can be difficult to acquire. In this work, we approach text-to-audio synthesis using unlabeled videos and pretrained language-vision models. We propose to learn the desired text-audio correspondence by leveraging the visual modality as a bridge. We train a conditional diffusion model to generate the audio track of a video, given a video frame encoded by a pretrained contrastive language-image pretraining (CLIP) model. At test time, we first explore performing a zero-shot modality transfer and condition the diffusion model with a CLIP-encoded text query. However, we observe a noticeable performance drop with respect to image queries. To close this gap, we further adopt a pretrained diffusion prior model to generate a CLIP image embedding given a CLIP text embedding. Our results show the effectiveness of the proposed method, and that the pretrained diffusion prior can reduce the modality transfer gap. While we focus on text-to-audio synthesis, the proposed model can also generate audio from image queries, and it shows competitive performance against a state-of-the-art image-to-audio synthesis model in a subjective listening test. This study offers a new direction of approaching text-to-audio synthesis that leverages the naturally-occurring audio-visual correspondence in videos and the power of pretrained language-vision models.

Keywords: Sound synthesis; audio generation; multimodal learning; diffusion models; neural networks; machine learning

SEGMENTATION OF INDUSTRIAL BURNER FLAMES: A COMPARATIVE STUDY FROM TRADITIONAL IMAGE PROCESSING TO MACHINE AND DEEP LEARNING

5th International Society for Photogrammetry and Remote Sensing (ISPRS) Geospatial Week (GSW)

Authors: Landgraf, S.; Hillemann, M.; Aberle, M.; Jung, V.; Ulrich, M. Affiliation: Karlsruhe Inst Technol, Inst Photogrammetry & Remote Sensing (IPF), Karlsruhe, Germany

In many industrial processes, such as power generation, chemical production, and waste management, accurately monitoring industrial burner flame characteristics is crucial for safe and efficient operation. A key step involves separating the flames from the background through binary segmentation. Decades of machine vision research have produced a wide range of possible solutions, from traditional image processing to traditional machine learning and modern deep learning methods. In this work, we present a comparative study of multiple segmentation approaches, namely Global Thresholding, Region Growing, Support Vector Machines, Random Forest, Multilayer Perceptron, U-Net, and DeepLabv3+, that are evaluated on a public benchmark dataset of industrial burner flames. We provide helpful insights and guidance for researchers and practitioners aiming to select an appropriate approach for the binary segmentation of industrial burner flames and beyond. For the highest accuracy, deep learning is the leading approach, while for fast and simple solutions, traditional image processing techniques remain a viable option.

Keywords: Segmentation; Industrial Burner Flames; Image Processing; Machine Learning; Deep Learning
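
As an illustration of the simplest method named in the abstract above, global thresholding for binary segmentation might be sketched as follows. The function name, the threshold of 128, and the sample patch are assumptions made here for illustration; none of them comes from the paper, which may select its threshold differently.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, pixel values in 0..255


def global_threshold(image: Image, threshold: int) -> Image:
    """Return a binary mask: 1 where a pixel is brighter than threshold, else 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


# Hypothetical 3x4 grayscale patch: a bright flame region on a dark background.
patch = [
    [12, 20, 200, 230],
    [15, 180, 250, 240],
    [10, 18, 25, 210],
]

# One fixed threshold is applied to every pixel, regardless of its neighbours.
mask = global_threshold(patch, threshold=128)
for row in mask:
    print(row)
```

A single fixed threshold is fast and easy to apply, which matches the abstract's remark that traditional image processing remains viable for fast and simple solutions.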

Development of an Independent Adversarial Sample Detection Model, Based on Image Features

5th International Conference on 3D Imaging Technologies—Multidimensional Signal processing and Deep Learning, 3DIT-MSP and DL 2023

Authors: Xu, Long. Affiliation: Beijing University of Aeronautics and Astronautics, Beijing, China

ISBN: (print) 9789819751808

Independent adversarial sample detection is an important problem in the field of computer vision and machine learning, especially in the context of the widespread use of deep learning models. Adversarial samples can lead to misclassification and performance degradation of the model, so adversarial sample detection is crucial to ensure the reliability of the model. This research focuses on the development of an independent adversarial sample detection model based on image features. A new approach is proposed which does not rely on the original model but focuses on detecting adversarial features in the samples. The effectiveness and robustness of the proposed method are verified in extensive experiments. The model is able to detect independent adversarial samples with high accuracy, regardless of whether the adversarial samples are targeted at a specific deep learning model or not. In addition, the method demonstrates excellent performance on a variety of image datasets and applications in different domains. It is expected to enhance the robustness and reliability of deep learning models, and thus better cope with adversarial sample attacks in practical applications. This approach also has a wide range of applications for a variety of computer vision tasks and domains. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.

Keywords: Adversarial machine learning

Comparing YOLOv8 and Mask R-CNN for instance segmentation in complex orchard environments

Artificial Intelligence in Agriculture, 2024, vol. 13, no. 3, pp. 84-99

Authors: Ranjan Sapkota; Dawood Ahmed; Manoj Karkee. Affiliation: Center for Precision & Automated Agricultural Systems, Washington State University, 24106 N Bunn Rd, Prosser, 99350, Washington, USA

Instance segmentation, an important image processing operation for automation in agriculture, is used to precisely delineate individual objects of interest within images, which provides foundational information for various automated or robotic tasks such as selective harvesting and precision *** study compares the one-stage YOLOv8 and the two-stage Mask R-CNN machine learning models for instance segmentation under varying orchard conditions across two *** 1, collected in dormant season, includes images of dormant apple trees, which were used to train multi-object segmentation models delineating tree branches and *** 2, collected in the early growing season, includes images of apple tree canopies with green foliage and immature (green) apples (also called fruitlet), which were used to train single-object segmentation models delineating only immature green *** results showed that YOLOv8 performed better than Mask R-CNN, achieving good precision and near-perfect recall across both datasets at a confidence threshold of ***, for Dataset 1, YOLOv8 achieved a precision of 0.90 and a recall of 0.95 for all *** comparison, Mask R-CNN demonstrated a precision of 0.81 and a recall of 0.81 for the *** Dataset 2, YOLOv8 achieved a precision of 0.93 and a recall of *** R-CNN, in this single-class scenario, achieved a precision of 0.85 and a recall of ***, the inference times for YOLOv8 were 10.9 ms for multi-class segmentation (Dataset 1) and 7.8 ms for single-class segmentation (Dataset 2), compared to 15.6 ms and 12.8 ms achieved by Mask R-CNN's, *** findings show YOLOv8's superior accuracy and efficiency in machine learning applications compared to two-stage models, specifically Mask-R-CNN, which suggests its suitability in developing smart and automated orchard operations, particularly when real-time applications are necessary in such cases as robotic harvesting and robotic immature green fruit thin

Keywords: YOLOv8; Mask R-CNN; Deep learning; Machine learning; Automation; Robotics; Artificial intelligence; Machine vision

An Efficient and Effective Image Decolorization Algorithm Based on Cumulative Distribution Function

JOURNAL OF IMAGING, 2024, vol. 10, no. 3, p. 51

Authors: Wu, Tirui; Eising, Ciaran; Glavin, Martin; Jones, Edward. Affiliations: Univ Galway, Sch Engn, Galway H91 TK33, Ireland; Univ Limerick, Dept Elect & Comp Engn, Limerick V94 T9PX, Ireland

Image decolorization is an image pre-processing step which is widely used in image analysis, computer vision, and printing applications. The most commonly used methods give each color channel (e.g., the R component in RGB format, or the Y component of an image in CIE-XYZ format) a constant weight without considering image content. This approach is simple and fast, but it may cause significant information loss when images contain too many isoluminant colors. In this paper, we propose a new method which is not only efficient, but also can preserve a higher level of image contrast and detail than the traditional methods. It uses the information from the cumulative distribution function (CDF) of the information in each color channel to compute a weight for each pixel in each color channel. Then, these weights are used to combine the three color channels (red, green, and blue) to obtain the final grayscale value. The algorithm works in RGB color space directly without any color conversion. In order to evaluate the proposed algorithm objectively, two new metrics are also developed. Experimental results show that the proposed algorithm can run as efficiently as the traditional methods and obtain the best overall performance across four different metrics.

Keywords: cumulative distribution function; edge recall ratio; gradient recall ratio; image contrast preservation; image decolorization
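
The conventional constant-weight conversion that the abstract above contrasts itself with can be sketched as below, to show why isoluminant colors lose contrast. The Rec. 601 luma weights and the two sample colors are assumptions chosen for illustration; the abstract does not name specific weights, and this sketch does not implement the paper's CDF-based weighting.

```python
from typing import Tuple

RGB = Tuple[float, float, float]  # channel values in 0.0..1.0

# Rec. 601 luma weights, a common fixed choice (an assumption, not from the paper).
WEIGHTS: RGB = (0.299, 0.587, 0.114)


def fixed_weight_gray(color: RGB, weights: RGB = WEIGHTS) -> float:
    """Weighted sum of the channels, using the same weights for every pixel."""
    return sum(w * c for w, c in zip(weights, color))


red = (1.0, 0.0, 0.0)
# A green chosen so that its weighted sum matches that of pure red.
green = (0.0, WEIGHTS[0] / WEIGHTS[1], 0.0)

gray_red = fixed_weight_gray(red)
gray_green = fixed_weight_gray(green)

# Two clearly different colors end up with (almost exactly) the same gray value.
print(round(gray_red, 3), round(gray_green, 3))
```

Because the weights ignore image content, the red and green patches become indistinguishable in grayscale, which is the information loss the abstract attributes to fixed-weight methods.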

Swin on Axes: Extending Swin Transformers to Quadtree Image Representations

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Authors: Oliu, Marc; Nasrollahi, Kamal; Escalera, Sergio; Moeslund, Thomas B. Affiliations: Aalborg Univ, Fredrik Bajers Vej 7KOst, Aalborg, Denmark; Milestone Syst, Banemarksvej 50, Brondby, Denmark; Univ Barcelona, Gran Via Corts Catalanes 585, Barcelona, Spain; Comp Vis Ctr, Campus UAB, Edifici O, Cerdanyola Del Valles, Spain

ISBN: (print) 9798350370287; 9798350370713

In recent years, Transformer models have revolutionized machine learning. While this has resulted in impressive results in the field of Natural Language Processing, Computer Vision quickly stumbled upon computation and memory problems due to the high resolution and dimensionality of the input data. This is particularly true for video, where the number of tokens increases cubically relative to the frame and temporal resolutions. A first approach to solve this was Vision Transformers, which introduce a partitioning of the input into embedded grid cells, lowering the effective resolution. More recently, Swin Transformers introduced a hierarchical scheme that brought the concepts of pooling and locality to transformers in exchange for much lower computational and memory costs. This work proposes a reformulation of the latter that views Swin Transformers as regular Transformers applied over a quadtree representation of the input, intrinsically providing a wider range of design choices for the attentional mechanism. Compared to similar approaches such as Swin and MaxViT, our method works on the full range of scales while using a single attentional mechanism, allowing us to simultaneously take into account both dense short range and sparse long range dependencies with low computational overhead and without introducing additional sequential operations, thus making full use of GPU parallelism.

Keywords: Natural language processing systems

Affine diffractive beam dividers

JOURNAL OF THE OPTICAL SOCIETY OF AMERICA A-OPTICS IMAGE SCIENCE AND VISION, 2024, vol. 41, no. 3, pp. 510-515

Authors: Gori, F.; Martinez-Herrero, R.; Korotkova, O.; Piquero, G.; De Sande, J. C. G.; Frezza, F.; Santarsiero, M. Affiliations: Univ Roma Tre, Dipartimento Ingn Ind Elettron & Meccan, Via V Volterra 62, I-00146 Rome, Italy; Univ Complutense Madrid, Dept Opt, Ciudad Univ, E-28040 Madrid, Spain; Univ Miami, Dept Phys, 1320 Campo Sano Dr, Coral Gables, FL 33146, USA; Univ Politecn Madrid, ETSIS Telecomunicac, Nikola Tesla S-N, Campus Sur, Madrid 28031, Spain; Univ Roma Sapienza, Dipartimento Ingn Informaz Elettron & Telecomunic, Via Eudossiana 18, I-00184 Rome, Italy

Diffractive optical elements that divide an input beam into a set of replicas are used in many optical applications ranging from image processing to communications. Their design requires time-consuming optimization processes, which, for a given number of generated beams, are to be separately treated for one-dimensional and two-dimensional cases because the corresponding optimal efficiencies may be different. After generalizing their Fourier treatment, we prove that, once a particular divider has been designed, its transmission function can be used to generate numberless other dividers through affine transforms that preserve the efficiency of the original element without requiring any further optimization. (c) 2024 Optica Publishing Group

Keywords: Beam shaping; Dammann gratings; Diffraction efficiency; Diffraction theory; Diffractive optical elements; Materials processing

Rolling Shutter Camera: Modeling, Optimization and Learning

Machine Intelligence Research, 2023, vol. 20, no. 6, pp. 783-798

Authors: Bin Fan; Yuchao Dai; Mingyi He. Affiliation: School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China

Most modern consumer-grade cameras are often equipped with a rolling shutter mechanism, which is becoming increasingly important in computer vision, robotics and autonomous driving ***, its temporal-dynamic imaging nature leads to the rolling shutter effect that manifests as geometric *** the years, researchers have made significant progress in developing tractable rolling shutter models, optimization methods, and learning approaches, aiming to remove geometry distortion and improve visual *** this survey, we review the recent advances in rolling shutter cameras from two aspects of motion modeling and deep *** the best of our knowledge, this is the first comprehensive survey of rolling shutter *** the part of rolling shutter motion modeling and optimization, the principles of various rolling shutter motion models are elaborated and their typical applications are ***, the applications of deep learning in rolling shutter based image processing are ***, we conclude this survey with discussions on future research directions.

Keywords: Rolling shutter; motion modeling; image correction; temporal super-resolution; deep learning

Deep Learning-Based Framework for Power Converter Circuit Identification and Analysis

IEEE ACCESS, 2024, vol. 12, pp. 115356-115369

Authors: Bohara, Bharat; Sarma Krishnamoorthy, Harish. Affiliation: Univ Houston, Dept Elect & Comp Engn, Houston, TX 77004, USA

This paper introduces a deep learning-based framework for identifying hand-drawn schematics of power converter circuits and performing automated simulations. The framework employs cutting-edge computer vision-based object detection models, such as YOLOv8, to achieve a high mean average precision (mAP) of 96.7% to accurately identify components. Wire tracing and connectivity are achieved through a combined architecture built upon classical image processing techniques and deep learning approaches. Detailed information extracted from a hand-drawn circuit schematic is used to automatically create its netlist for automated simulation through the SPICE engine. The proposed framework is successfully tested on various nonisolated (buck, boost) and isolated (flyback, full-bridge) converters under both continuous conduction mode (CCM) and discontinuous conduction mode (DCM) operations. In the comprehensive assessment of the entire framework, its efficacy is tested on 140 newly drawn circuit diagrams. The overall accuracy in the generation of netlists reaches a high value of 95.71%, utilizing the robust component detection capabilities of YOLOv8. Moreover, the framework enables the generation of both graphical representations and adjacency matrices for circuit diagrams. This output serves as a valuable dataset generator, contributing to the rapidly advancing domains of machine learning, including graph neural networks and geometric learning, particularly in the application space of power and energy systems. This framework can be further employed as an educational tool, and the ideas introduced can be developed to generate fully automated and efficient power converter designs for real-world applications.

Keywords: Integrated circuit modeling; Computational modeling; Feature extraction; Computer architecture; Accuracy; Optical character recognition; Numerical models; Computer vision; Deep learning; Automated circuit simulation; computer vision; deep learning; hand-drawn circuit diagram; NetList; spice; power converter; automated graph generation

Research on the Layout System of Electric Wire Tower Construction Site Based on Machine Vision

International Conference on Smart Applications and Sustainability in the Artificial Intelligence of Things, SAS-AIoT 2024

Authors: Gao, Xing; Huang, Wenfang; Luo, Le; Zhang, Bo; Xu, Ke. Affiliation: Chengdu Chengdian Electric Power Engineering Design Co. Ltd, Sichuan, Chengdu 610000, China

ISBN: (print) 9783031782756

Nowadays, the optimization of the construction site layout of power line tower grouping not only affects the project progress, but also relates to the construction safety. Therefore, this study proposes a machine vision (MV)-based construction site layout system for power line tower formation to improve the accuracy and efficiency of construction layout. In this study, MV technology is adopted to collect on-site video and image data through multiple high-definition cameras installed on the construction site. This data is processed in real time to identify mechanical equipment and staff in the construction area using image recognition algorithms, and to accurately track their location and movement trajectories. In addition, the system combines environmental information and construction progress data from the site to automatically plan and adjust the layout of construction resources through an integrated decision support module. This process covers a number of technical aspects such as data acquisition, image processing, pattern recognition and layout optimization. The accident rate has also decreased with the introduction of the MV system. Specific data showed that the accident rate was reduced from 2 accidents per 1000 h of work to 1.2. The MV-based construction site layout system for electric line tower grouping not only optimizes the construction process and increases efficiency and safety, but also demonstrates the great potential of MV technology for complex industrial applications. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Keywords: Machine vision
