ISBN (Print): 9781510651531; 9781510651524
The problem of recognizing human actions in video sequences is one of the key areas on the path to developing and deploying computer vision systems in various spheres of life. At the same time, additional sources of information (such as depth sensors and thermal sensors) allow us to obtain more informative features and thus increase the reliability and stability of recognition. In this research, we focus on how to combine the multi-level decomposition of depth and color information to improve on state-of-the-art action recognition methods. We present an algorithm that combines information from visible-light cameras and depth sensors, based on deep learning and the PLIP (parameterized logarithmic image processing) model, which is close to the human visual system's perception. The experimental results on the test dataset confirmed the high efficiency of the proposed action recognition method compared to state-of-the-art methods that use only one image modality (visible or depth).
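For readers unfamiliar with PLIP, its operators have standard closed forms. The following is a minimal NumPy sketch of the gray-tone mapping and the PLIP addition, subtraction, and scalar multiplication; the parameter values M, gamma, and k are illustrative assumptions, not values reported in the paper.

    import numpy as np

    # Illustrative PLIP parameters (assumed, not from the paper).
    M = 256.0           # maximum gray level
    gamma = k = 1026.0  # PLIP model parameters

    def gray_tone(f):
        # Map an intensity image f in [0, M) to PLIP gray-tone space.
        return M - np.asarray(f, dtype=np.float64)

    def plip_add(g1, g2):
        # PLIP addition: g1 (+) g2 = g1 + g2 - g1*g2/gamma.
        return g1 + g2 - (g1 * g2) / gamma

    def plip_sub(g1, g2):
        # PLIP subtraction: g1 (-) g2 = k*(g1 - g2)/(k - g2).
        return k * (g1 - g2) / (k - g2)

    def plip_scale(c, g):
        # PLIP scalar multiplication: c (x) g = gamma - gamma*(1 - g/gamma)**c.
        return gamma - gamma * (1.0 - g / gamma) ** c

Because these operators saturate near the maximum gray level, fusing depth and color responses with plip_add behaves closer to human brightness perception than plain addition, which motivates its use for combining the two modalities.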
Lip reading has gained popularity due to the proliferation of emerging real-world applications. This article provides a comprehensive review of benchmark datasets available for lip-reading applications and of pioneering works that analyze lower facial cues for lip-reading applications. The review is broadly classified into five distinct applications: Lip Reading Biometrics (LRB), Audio Visual Speech Recognition (AVSR), Silent Speech Recognition (SSR), Voice from Lips, and Lip HCI (human-computer interaction). LRB entails extensive research in the fields of authentication and liveness detection. AVSR covers key findings that have contributed significantly to applications such as voice assistants, video-to-text transcription, hearing aids, and pronunciation-correcting systems. SSR analyzes the efforts made toward silent-video-to-text transcription and surveillance camera applications. The Voice from Lips section discusses applications such as voice for the voiceless and vision-infused speech inpainting. In Lip HCI, LR-HCI for smartphones, smart TVs, computers, robots, and musical instruments is reviewed in detail. Comprehensive coverage is given to cutting-edge techniques in computer vision, signal processing, machine learning, and deep learning. The advancements that help systems learn to lip-read and authenticate lip gestures, generate text transcription, synthesize voice based on lip movements, and control systems via lip movements (Lip HCI) are covered. The work concludes by highlighting the limitations of existing frameworks, road maps for each application illustrating the evolution of techniques employed over time, and future research avenues in lip-reading applications.
To meet the requirement of high-precision localization of large-size workpieces in an industrial environment, an improved shape-based matching algorithm is proposed, based on the phase stretch transform and the iterative closest point algorithm. Basler industrial cameras are used to collect images of large-size workpieces, such as glass. The experimental results show that the average localization error is 0.05 ± 0.013 mm, which meets the requirements of practical applications. The algorithm can effectively and accurately localize, with high precision, different positions of multi-directionally transformed objects in industrial environments. (C) 2021 Optical Society of America
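As a rough illustration of the two building blocks named above, the sketch below extracts edge points with a phase-stretch-style kernel and refines alignment with one iterative-closest-point step. The kernel form, the parameters S and W, the threshold, and the function names are assumptions for illustration, not the paper's implementation.

    import numpy as np

    def pst_edges(img, S=0.5, W=20.0, thresh=0.3):
        # Warped-phase kernel applied in the frequency domain; the phase of
        # the output highlights edges (a common phase-stretch-transform variant).
        rows, cols = img.shape
        u = np.fft.fftfreq(rows)[:, None]
        v = np.fft.fftfreq(cols)[None, :]
        r = np.sqrt(u ** 2 + v ** 2)
        phase = W * r * np.arctan(W * r) - 0.5 * np.log1p((W * r) ** 2)
        phase = S * phase / phase.max()
        out = np.fft.ifft2(np.fft.fft2(img) * np.exp(-1j * phase))
        edges = np.angle(out)
        return np.argwhere(edges > thresh * edges.max())  # edge coordinates

    def icp_step(src, dst):
        # One ICP iteration: nearest-neighbor matching, then the SVD (Kabsch)
        # solution for the best rigid transform from src onto dst.
        d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(axis=1)]
        mu_s, mu_d = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_d)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:   # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_d - R @ mu_s
        return src @ R.T + t, R, t

In a shape-based matching pipeline of this kind, the template's edge points would first be matched coarsely, after which icp_step is repeated until the mean residual stops decreasing.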
ISBN (Digital): 9781665453363
ISBN (Print): 9781665453363
With the increasing demand for data processing, approximate computing is widely used in various fault-tolerant applications such as image processing, computer vision, and machine learning. These applications also require a huge number of multiplication operations. In this paper, we mainly target softcore approximate multipliers implemented on FPGAs by encoding the INIT parameter values of the Look-Up Table (LUT) primitives. Three approximate multipliers with an associated carry chain are presented, obtained by removing LUTs from the proposed exact multiplier. An approximate multiplier without a carry chain is also presented to further reduce the multiplier's critical path delay and power consumption. We also present an accuracy-configurable adder to build higher-order approximate multipliers for architectural space exploration. The resolution of the state-of-the-art Mean Relative Error Distance (MRED) versus Power-Delay Product (PDP) Pareto front is improved, and the proposed approximate multiplier achieves 24.4%, 52.9%, and 56.4% reductions in latency, area, and power over the soft multiplier IP core, respectively. Finally, we apply the proposed approximate multiplier designs to image processing and convolutional neural networks (CNNs). Compared to advanced approximate multipliers, they offer lower energy consumption and area while maintaining acceptable quality. Our designs are open-sourced at https://github.com/Naoshangshang96/FPGAbased_approx_mult to assist further reproduction and development.
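To make the accuracy/cost trade-off concrete, here is a small behavioral sketch of a truncation-style approximate multiplier together with the MRED metric quoted above. The truncation scheme, bit widths, and function names are illustrative assumptions and do not reproduce the paper's LUT-level INIT encoding.

    def approx_mult(a, b, width=8, k=4):
        # Accumulate shifted partial products, dropping all partial-product
        # bits below column k (a common approximation that shortens the
        # carry chain in hardware).
        mask = ~((1 << k) - 1)
        acc = 0
        for i in range(width):
            if (b >> i) & 1:
                acc += (a << i) & mask
        return acc

    def mred(width=8, k=4):
        # Mean Relative Error Distance over all nonzero input pairs.
        total, n = 0.0, 0
        for a in range(1, 1 << width):
            for b in range(1, 1 << width):
                exact = a * b
                total += abs(approx_mult(a, b, width, k) - exact) / exact
                n += 1
        return total / n

    print(mred())  # error cost of dropping the 4 low columns

Sweeping k trades MRED against LUT count and carry-chain length, which is the kind of Pareto-front exploration the paper performs at the architectural level.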
The design of machine vision applications enables automatic inspection, measuring systems, and robot guidance. Typical applications of industrial robots are based on non-contact sensors that give the robot information about its environment. A robot's machine vision requires photosensors or video cameras to make intelligent decisions about its localization. Video cameras used as image-capturing equipment are too costly in comparison with optical scanning systems (OSS). The OSS provides spatial coordinate measurements that can be exploited to solve a wide variety of structural problems in real time. Localization and guidance using machine learning (ML) techniques offer advantages because the captured signals can be transformed and reduced for processing, storage, and display. The use of ML algorithms enhances the performance of the optical system for localization and guidance. Feature extraction is an important part of ML techniques, transforming the original raw data onto a low-dimensional subspace while retaining relevant information. This work presents an improvement of an optical system based on the k-nearest neighbor (k-NN) technique to solve the object detection and localization problem. This improvement allows the optical system to discriminate between the reference source and optical noise or interference. The OSS presented in this article has been implemented in structural health monitoring to measure angular position even under varying lighting and weather conditions. The feature extraction techniques used in this article were linear predictive coding (LPC), quartiles (Q_i), and autocorrelation coefficients (ACC). Using k-NN with autocorrelation coefficients and quartiles yielded more than 98% correct classification, with the reference light source as class 1 and a light bulb, acting as optical noise, as class 2.
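A minimal sketch of the feature-extraction plus k-NN stage described above, assuming scikit-learn; the number of autocorrelation lags, the choice of k = 3 neighbors, and the variable names are illustrative, not the article's exact configuration.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def features(signal, n_lags=5):
        # Autocorrelation coefficients (ACC) for lags 1..n_lags plus the
        # quartiles Q1, Q2, Q3 of the raw scanner signal.
        s = np.asarray(signal, dtype=float)
        s = s - s.mean()
        denom = (s * s).sum()
        acc = [(s[:-lag] * s[lag:]).sum() / denom
               for lag in range(1, n_lags + 1)]
        q = np.percentile(signal, [25, 50, 75])
        return np.concatenate([acc, q])

    # X_raw: captured photosensor signals; y: 1 = reference source, 2 = noise.
    # X = np.vstack([features(s) for s in X_raw])
    # clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
    # print(clf.score(np.vstack([features(s) for s in X_test]), y_test))

The low-dimensional feature vector (here eight values per signal) is what allows k-NN to run in real time on the scanner output instead of on the raw waveform.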
This article mainly studies the express-delivery robot task: visual image processing is implemented through HALCON image-processing procedures, and the host computer (PC) software is developed in C#, realizing the courier information acqu...
Style transformation on face images has traditionally been a popular research area in the field of computer vision, and its applications are quite extensive. Currently, the more mainstream schemes include Generative Adversarial Network (GAN)-based image generation and style transformation, as well as the Stable Diffusion method. In 2019, the NVIDIA team proposed StyleGAN, a relatively mature scheme for generating realistic faces as well as blending face features. The whole StyleGAN model is trained on the Flickr-Faces-HQ (FFHQ) dataset, which is large, so the model takes a long time to train. My aim is to build a one-shot stylized face image generator: only one reference face and one stylized face need to be input, and a brand-new face with mixed features can be generated after a short training time. This is inspired by the existing research result JoJoGAN, which learns a style mapper from a single example of the style. JoJoGAN uses a GAN inversion procedure and StyleGAN's style-mixing property to produce a substantial paired dataset from a single style example. This paper improves on JoJoGAN, including improving the encoder that uses the GAN inversion method to generate latent codes for image features, and randomly mixing latent codes to produce a more refined paired dataset.
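The style-mixing step can be sketched directly on StyleGAN W+ codes. The snippet below builds (content, style-mixed) code pairs from a single inverted style code; the layer count, the mixed-layer range, and alpha are illustrative assumptions, and w_style stands for the W+ code recovered by GAN inversion.

    import torch

    def mix_codes(w_style, n_samples=32, n_layers=18, dim=512,
                  mix_layers=range(7, 18), alpha=1.0):
        # Sample fresh latents and broadcast each to W+ (one row per layer).
        w_rand = torch.randn(n_samples, 1, dim).repeat(1, n_layers, 1)
        w_mix = w_rand.clone()
        # Overwrite the chosen layers with the (interpolated) style code.
        for l in mix_layers:
            w_mix[:, l] = (1 - alpha) * w_rand[:, l] + alpha * w_style[l]
        return w_rand, w_mix

    # Feeding w_rand and w_mix through a frozen StyleGAN generator yields the
    # paired dataset on which the stylizing generator is then fine-tuned.

Which layers are mixed, and with what alpha, is exactly the knob that controls how much identity is preserved versus how much style is transferred, which is where the proposed encoder and latent-mixing refinements act.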
ISBN (Digital): 9798331536626
ISBN (Print): 9798331536633
In the last few years, the abundance of available plankton images has significantly increased due to advancements in acquisition system technology. Consequently, interest in automatic plankton image classification has surged. Machine learning algorithms have recently emerged to assist in the analysis of this vast quantity of data, supporting traditional manual processing. However, annotating such data is costly and demands significant time and resources, thus requiring data-efficient machine learning solutions. The typical framework for tackling this issue has been the adoption of supervised ImageNet pre-trained models, fine-tuned on the plankton classification downstream task. Nonetheless, self-supervised pre-training protocols may provide an effective alternative to the supervised approaches using ImageNet, while allowing the exploitation of the increasingly large amount of unannotated plankton data. To the best of our knowledge, no work systematically analyzes the impact of self-supervised pre-training protocols for plankton image classification. To fill this gap, in this paper we present a thorough comparison between in-domain (plankton images) and out-of-domain (ImageNet) supervised and self-supervised pre-training, in terms of the quality of the corresponding embeddings for plankton image classification. We believe that this work may pave the way for further research into self-supervised protocols for the plankton domain, providing a valuable alternative to ImageNet and exploiting the vast amount of unannotated plankton images available.
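One standard way to compare embedding quality across pre-training protocols is a linear probe on frozen features, sketched below with PyTorch/torchvision and scikit-learn. The ResNet-18 backbone and the ImageNet checkpoint are illustrative assumptions; an in-domain or self-supervised checkpoint would be swapped in for the other arms of the comparison.

    import torch
    import torchvision
    from sklearn.linear_model import LogisticRegression

    backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
    backbone.fc = torch.nn.Identity()  # expose the 512-d penultimate features
    backbone.eval()

    @torch.no_grad()
    def embed(batch):
        # batch: (N, 3, 224, 224) images, normalized with the checkpoint's stats.
        return backbone(batch).cpu().numpy()

    # probe = LogisticRegression(max_iter=1000).fit(embed(train_x), train_y)
    # print(probe.score(embed(test_x), test_y))

Because the backbone stays frozen, differences in probe accuracy can be attributed to the pre-training protocol rather than to downstream fine-tuning.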
ISBN (Print): 9798891760615
Vision-and-language navigation (VLN) agents are trained to navigate in real-world environments based on natural language instructions. A major challenge in VLN is the limited available training data, which hinders the models' ability to generalize effectively. Previous approaches have attempted to alleviate this issue by using external tools to generate pseudo-labeled data or by integrating web-scale image-text pairs during training. However, these methods often rely on automatically generated or out-of-domain data, leading to challenges such as suboptimal data quality and domain mismatch. In this paper, we introduce a masked path modeling (MPM) objective. MPM pre-trains an agent using self-collected data for subsequent navigation tasks, eliminating the need for external tools. Specifically, our method allows the agent to explore navigation environments and record the paths it traverses alongside the corresponding agent actions. Subsequently, we train the agent on this collected data to reconstruct the original action sequence when given a randomly masked subsequence of the original path. This approach enables the agent to accumulate a diverse and substantial dataset, facilitating the connection between visual observations of paths and the agent's actions, which is the foundation of the VLN task. Importantly, the collected data are in-domain, and the training process avoids synthetic data of uncertain quality, addressing the previous issues. We conduct experiments on various VLN datasets and demonstrate the applicability of MPM across different levels of instruction complexity. Our results exhibit significant improvements in success rates, with enhancements of 1.3%, 1.1%, and 1.2% on the val-unseen split of the Room-to-Room, Room-for-Room, and Room-across-Room datasets, respectively. Additionally, we underscore the adaptability of MPM as well as the potential for additional improvements when the agent is allowed to explore unseen environments prior to testing.
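The core of the MPM objective is easy to state: mask a contiguous span of a self-collected action sequence and train the agent to reconstruct it. A minimal sketch follows, with the mask token, mask ratio, and loss being illustrative assumptions rather than the paper's exact recipe.

    import random
    import torch
    import torch.nn.functional as F

    MASK_ID = 0  # assumed id reserved for the mask token

    def mask_path(actions, mask_ratio=0.5):
        # actions: LongTensor of action ids recorded along one traversed path.
        T = actions.size(0)
        span = max(1, int(T * mask_ratio))
        start = random.randint(0, T - span)
        masked = actions.clone()
        masked[start:start + span] = MASK_ID
        return masked, actions  # (masked input, reconstruction target)

    # logits = agent(visual_obs, masked)   # per-step action logits
    # loss = F.cross_entropy(logits.view(-1, logits.size(-1)), target.view(-1))

Because both the observations and the actions come from the agent's own exploration, the pre-training signal is in-domain by construction, which is the property the paper emphasizes.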