Search Results - Inner Mongolia University Library

Multimodal Fusion for Precision Personality Trait Analysis: A Comprehensive Model Integrating Video, Audio, and Text Inputs

2024 International Conference on Smart Systems for Electrical, Electronics, Communication and Computer Engineering, ICSSEEC 2024

Authors: Karpagam, G.R.; Harsha Vardhan, V.M.; Kabilan, K.K.; Pranav, P.; Ramesh, Prednya; Suvan Sathyendira, B. (PSG College of Technology, Department of CSE, Tamil Nadu, India)

ISBN: (print) 9798350378177

The multi-modal personality trait analysis system aims to examine the association between personality characteristics, speech, body language, and facial expressions. Several data-gathering techniques have been employed, such as recording individuals in various social contexts, conducting interviews, and administering personality tests. The data has been meticulously labelled to enable meaningful analysis of the modalities. In this study, we took a holistic approach, combining BERT for text classification, Mel-frequency cepstral coefficients with a CNN for audio analysis, and Convolutional Neural Networks for image processing. This integrated approach draws on natural language comprehension, audio signal processing, and computer vision. Machine learning methods, such as deep neural networks and clustering approaches, have been used in personality trait analysis to find patterns and correlations between the modalities and personality traits. The accuracies for individual modalities in analysing a given video are image - 0.689654, audio - 0.665432, and text - 0.801234. Following a late fusion process that combines data from these modalities, the classification of content based on the 'big five' personality traits achieves an overall accuracy of 0.802345. © 2024 IEEE.
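The abstract does not say which late fusion rule was used. As a minimal sketch of one common option, the snippet below averages per-modality class probabilities, weighted by the accuracies reported above. The weighting scheme and the names `REPORTED_ACCURACY`, `late_fusion`, and `predict` are assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Optional

# Per-modality accuracies reported in the abstract, reused here as fusion
# weights. Using them as weights is an assumption, not the paper's stated rule.
REPORTED_ACCURACY: Dict[str, float] = {
    "image": 0.689654,
    "audio": 0.665432,
    "text": 0.801234,
}


def late_fusion(
    probs: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Fuse per-modality class probabilities with a weighted average."""
    weights = REPORTED_ACCURACY if weights is None else weights
    total = sum(weights[m] for m in probs)
    n_classes = len(next(iter(probs.values())))
    fused = [0.0] * n_classes
    for modality, p in probs.items():
        for i, value in enumerate(p):
            fused[i] += weights[modality] * value / total
    return fused


def predict(probs: Dict[str, List[float]]) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = late_fusion(probs)
    return max(range(len(fused)), key=lambda i: fused[i])
```

Because the weights are normalised over the modalities actually supplied, the fused vector still sums to 1 when one modality is missing for a given clip.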

Keywords: Convolutional neural networks

Deep Learning for HDR Imaging: State-of-the-Art and Future Trends

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, vol. 44, no. 12, pp. 8874-8895

Authors: Wang, Lin; Yoon, Kuk-Jin (Korea Adv Inst Sci & Technol, Dept Mech Engn, Visual Intelligence Lab, Daejeon 34141, South Korea)

High dynamic range (HDR) imaging is a technique that allows an extensive dynamic range of exposures, which is important in image processing, computer graphics, and computer vision. In recent years, there has been a significant advancement in HDR imaging using deep learning (DL). This study conducts a comprehensive and insightful survey and analysis of recent developments in deep HDR imaging methodologies. We hierarchically and structurally group existing deep HDR imaging methods into five categories based on (1) number/domain of input exposures, (2) number of learning tasks, (3) novel sensor data, (4) novel learning strategies, and (5) applications. Importantly, we provide a constructive discussion on each category regarding its potential and challenges. Moreover, we review some crucial aspects of deep HDR imaging, such as datasets and evaluation metrics. Finally, we highlight some open problems and point out future research directions.

Keywords: Imaging; Image reconstruction; Loss measurement; Cameras; Deep learning; Visualization; Dynamic range; High-dynamic-range (HDR) imaging; deep learning (DL); convolutional neural networks (CNNs)

Drone-Based Applications for Tailings Dam Monitoring

APCOM 2023 Conference: Intelligent Mining: Innovation, Vision, and Value

Authors: Gomez, Jose A.; Sattarvand, Javad (Department of Mining & Metallurgical Engineering, Mackay School of Earth Sciences & Engineering, University of Nevada, Reno, NV, United States)

ISBN: (print) 9780873355216

Failures of tailings dams have occurred repeatedly in recent years. Because laws on specific design criteria, stability requirements, and the related monitoring during construction and maintenance are lacking, tailings dams are thought to be more fragile than hydraulic dams. Monitoring the dam is therefore necessary to understand its current condition and guarantee its safety. The physical condition of the dam can be evaluated through the early identification of seepage. Additionally, due to their adaptability and capacity for high-resolution data collection, UAVs are an excellent choice for efficiently covering the tailings dam site. UAVs can capture high-quality imagery when equipped with a high-resolution RGB camera, thermal sensors, or multispectral sensors. When these sensors are paired with image processing and machine learning algorithms, the result is a reliable estimate of the dam condition. © 2023 Society for Mining, Metallurgy & Exploration Inc. All rights reserved.

Keywords: Learning algorithms

Semantic-Aware Autoregressive Image Modeling for Visual Representation Learning

38th AAAI Conference on Artificial Intelligence (AAAI) / 36th Conference on Innovative Applications of Artificial Intelligence / 14th Symposium on Educational Advances in Artificial Intelligence

Authors: Song, Kaiyou; Zhang, Shan; Wang, Tong (Megvii Technol, Beijing, Peoples R China)

ISBN: (print) 1577358872

The development of autoregressive modeling (AM) in computer vision lags behind natural language processing (NLP) in self-supervised pre-training. This is mainly because images are not sequential signals and lack a natural order when applying autoregressive modeling. In this study, inspired by human beings' way of grasping an image, i.e., focusing on the main object first, we present a semantic-aware autoregressive image modeling (SemAIM) method to tackle this challenge. The key insight of SemAIM is to autoregressively model images from the more semantic patches to the less semantic patches. To this end, we first calculate a semantic-aware permutation of patches according to their feature similarities and then perform the autoregression procedure based on the permutation. In addition, considering that the raw pixels of patches are low-level signals and are not ideal prediction targets for learning high-level semantic representation, we also explore utilizing the patch features as the prediction targets. Extensive experiments are conducted on a broad range of downstream tasks, including image classification, object detection, and instance/semantic segmentation, to evaluate the performance of SemAIM. The results demonstrate that SemAIM achieves state-of-the-art performance compared with other self-supervised methods. Specifically, with ViT-B, SemAIM achieves 84.1% top-1 accuracy for fine-tuning on ImageNet, and 51.3% AP and 45.4% AP for object detection and instance segmentation on COCO, which outperforms the vanilla MAE by 0.5%, 1.0%, and 0.5%, respectively. Code is available at https://***/skyoux/SemAIM.
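The permutation step can be illustrated with a small sketch. It assumes each patch has a feature vector and that "semantic" is scored by cosine similarity to a single reference vector (for example, a global image feature). Both the scoring rule and the function names are assumptions for illustration; the abstract only states that the permutation is derived from feature similarities.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_permutation(
    patch_features: Sequence[Sequence[float]],
    reference: Sequence[float],
) -> List[int]:
    """Order patch indices from most to least similar to the reference.

    An autoregressive model would then predict patches in this order, so
    the patches scored as most semantic are generated first.
    """
    scores = [cosine(f, reference) for f in patch_features]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

With patches `[[0, 1], [1, 0], [1, 1]]` and reference `[1, 0]`, the patch aligned with the reference comes first and the orthogonal patch comes last.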

Keywords: Object detection

Swarm optimisation-based bag of visual words model for content-based X-ray scan retrieval

INTERNATIONAL JOURNAL OF BIOMEDICAL ENGINEERING AND TECHNOLOGY, 2022, vol. 40, no. 2, pp. 168-183

Authors: Karthik, K.; Kamath, S. Sowmya (Natl Inst Technol, Dept Informat Technol, Healthcare Analyt & Language Engn HALE Lab, Mangaluru, Karnataka, India)

Classification and retrieval of medical images (MedIR) are emerging applications of computer vision for enabling intelligent medical diagnostics. Medical images are multi-dimensional and require specialised processing for the extraction of features from their manifold underlying content. Existing models often fail to consider the inherent characteristics of the data and have thus often fallen short when applied to medical images. In this paper, we present a MedIR approach based on the bag of visual words (BoVW) model for content-based medical image retrieval. Dataset imbalance is a recurring issue for models in the medical domain, so the approach also considers a balanced set of categories drawn from an imbalanced dataset. The proposed BoVW model extracts features from each image, which are used to train a supervised machine learning classifier for X-ray medical image classification and retrieval. During the experimental validation, the proposed model performed well, with a classification accuracy of 89.73% and a good retrieval result using our filter-based approach.
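For readers unfamiliar with BoVW, the encoding step can be sketched as follows: each local descriptor is assigned to its nearest visual word, and the image is represented by the normalised histogram of word counts. The sketch assumes a codebook is already available; how the paper builds it (the title points to swarm optimisation) is not shown here, and the function names are hypothetical.

```python
from typing import List, Sequence


def nearest_word(descriptor: Sequence[float],
                 codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry closest to the descriptor (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def bovw_histogram(descriptors: Sequence[Sequence[float]],
                   codebook: Sequence[Sequence[float]]) -> List[float]:
    """Normalised histogram of visual-word assignments for one image."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors were extracted; return an all-zero histogram.
        return [0.0] * len(codebook)
    return [c / total for c in counts]
```

The resulting fixed-length histogram is what a supervised classifier would be trained on, regardless of how many descriptors each image produced.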

Keywords: content-based medical image retrieval; image classification; visual space modelling

Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts

2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024

Authors: Sharma, Aditya; Saxon, Michael; Wang, William Yang (University of California, Santa Barbara, United States)

ISBN: (print) 9798891761681

We present LoCoVQA, a dynamic benchmark generator for evaluating long-context extractive reasoning in vision language models (VLMs). LoCoVQA augments test examples for mathematical reasoning, VQA, and character recognition tasks with increasingly long visual contexts composed of both in-distribution and out-of-distribution distractor images. Across these tasks, a diverse set of VLMs rapidly lose performance as the visual context length grows, often exhibiting a striking logarithmic decay trend. This test assesses how well VLMs can ignore irrelevant information when answering queries, a task that is quite easy for language models (LMs) in the text domain, demonstrating that current state-of-the-art VLMs lack this essential capability for many long-context applications. © 2024 Association for Computational Linguistics.

Keywords: Visual languages

Surface quality and orientation estimation using speckle imaging of ground specimens

Procedia Computer Science, 2025, vol. 253, pp. 2096-2105

Authors: Deep Singh; N. Arunachalam (Department of Mechanical Engineering, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India, PIN - 600036)

Machine vision systems are now widely implemented in varied fields due to their key features, such as rapid processing, non-contact measurement, and in-situ operation. This technology also has wide applications in the manufacturing sector. The surface texture properties of any machined component vary based on the manufacturing process, machining parameters, tool and machine conditions, etc. As the surface texture of machined components greatly influences their functional performance, it is vital to examine the surface characteristics. The surface texture of a machined component can be assessed by applying a series of image processing techniques to its speckle images. A speckle image is the randomly distributed granular pattern obtained when a rough or textured surface is illuminated with a laser beam. This paper focuses on estimating the orientation of the workpiece and examining the surface characteristics based on post-processing of the speckle images. The hardened steel workpieces used in this investigation were ground with varying process parameters, and speckle images were obtained at 0°, 30°, 60° and 90° orientations. The shifted power spectral density (PSD) of the ground sample images contains high-energy coefficients that form a line-like pattern whose orientation varies with the sample orientation. The Hough transform was applied to the binary image of the shifted PSD to efficiently determine the orientation. Furthermore, correlations have been established between several surface texture characteristics and GLCM parameters and the surface roughness of the ground samples.
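The Hough step described in the abstract can be sketched on its own. The snippet below assumes the shifted PSD has already been computed and thresholded, and that the surviving pixels are supplied as (x, y) coordinates relative to the centre of the spectrum. It is a plain textbook Hough vote with 1-degree and 1-pixel bins, not the authors' code, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def hough_line_orientation(points: Iterable[Tuple[int, int]],
                           angle_step_deg: int = 1) -> int:
    """Orientation in degrees, in [0, 180), of the strongest line.

    Each point votes for every (theta, rho) pair satisfying
    rho = x*cos(theta) + y*sin(theta), where theta is the angle of the
    line's normal. The winning normal angle is converted to the direction
    of the line itself by adding 90 degrees.
    """
    votes: Counter = Counter()
    for x, y in points:
        for theta_deg in range(0, 180, angle_step_deg):
            theta = math.radians(theta_deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(theta_deg, rho)] += 1
    (theta_deg, _rho), _count = votes.most_common(1)[0]
    return (theta_deg + 90) % 180
```

Angles here follow the mathematical convention with y pointing up; with image row indices, where y points down, the sign of the angle flips. Short line segments give poor angular resolution because neighbouring angles fall into the same rho bin, so the point set should span many pixels.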

Keywords: Speckle image; Grinding; Image processing; Power spectral density (PSD); Hough transform; Gray level co-occurrence matrix

Contextual Evaluation of Segmentation Models using Spatial Reasoning

26th Irish Machine Vision and Image Processing Conference, IMVIP 2024

Authors: Porter, Victoria; Styles, Iain; Curtis, Tim M.; Taggart, Michael J.; Albargothy, Mona J.; Gault, Richard (Centre for Intelligent Sustainable Computing, School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, United Kingdom; Wellcome-Wolfson Institute for Experimental Medicine, Queen's University Belfast, United Kingdom; Biosciences Institute, Newcastle University, Newcastle upon Tyne, United Kingdom)

ISBN: (print) 9781837242672

Image segmentation models are often evaluated using measures of overlap and boundary deviation between a ground truth and a prediction. These measures do not indicate whether a prediction is an overestimation or underestimation of the ground truth. This contextual information is critical in medical imaging applications such as tumor detection, where a model's tendency to overestimate a prediction would be preferred to avoid overlooking malignant cells. Spatial reasoning provides context on a model's segmentation performance in terms of its tendency to over- or underestimate a region of interest. Such context can highlight a model's decision-making trends and can be applied to inform targeted improvements. In this work, we provide a Python module that implements a model-agnostic spatial reasoning pipeline for the contextual evaluation of segmentation methods. We apply this pipeline to the output of the Segment Anything model on 3 electron microscopy (EM) datasets and demonstrate the meaningful inferences that can be made. © This is an open access article published by the IET under the Creative Commons Attribution License (http://***/licenses/by/3.0/)
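The distinction the abstract draws, which overlap scores alone cannot express, can be illustrated with plain set relations between two binary masks. This is a simplified sketch under assumed category names; it is not the authors' Python module, and `mask_to_pixels` and `spatial_relation` are hypothetical helpers.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def mask_to_pixels(mask: List[List[int]]) -> Set[Pixel]:
    """Convert a 2D binary mask into the set of (row, col) foreground pixels."""
    return {
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value
    }


def spatial_relation(ground_truth: Set[Pixel], prediction: Set[Pixel]) -> str:
    """Describe how a predicted region sits relative to the ground truth."""
    if not (ground_truth & prediction):
        return "disjoint"
    if prediction == ground_truth:
        return "equal"
    if prediction > ground_truth:   # prediction strictly contains the truth
        return "overestimation"
    if prediction < ground_truth:   # prediction lies strictly inside the truth
        return "underestimation"
    return "partial overlap"
```

Two predictions with the same overlap score can land in different categories here, which is the kind of context the abstract argues matters for applications such as tumor detection.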

Keywords: Image segmentation

High-Dimensional Data Processing Using Quantum-Inspired Evolutionary Algorithms for Homeland Security Imaging Systems

2024 International Conference on Communication, Computer Sciences and Engineering, IC3SE 2024

Authors: Alekhya, V.; Reddy, N. V. Uma; Singh, Jay; Boddu, Bhasker; Sobti, Rajeev; Hameed, Ali Abduhussien (Institute of Aeronautical Engineering, Dundigal, Hyderabad, India; New Horizon College of Engineering, Department of Artificial Intelligence and Machine Learning, Bangalore, India; GL Bajaj Institute of Technology & Management, U.P., Greater Noida, India; MLR Institute of Technology, Department of Information Technology, Telangana, Hyderabad, India; Lovely Professional University, Phagwara, India; College of Medical Technology, The Islamic University, Radiology Techniques Department, Najaf, Iraq)

ISBN: (print) 9798350366846

This research study explores the emerging area of quantum-inspired evolutionary algorithms (QIEAs) applied to high-dimensional data processing, with a focus on homeland security imaging systems. This work attempts to close the gap in image processing methodologies caused by the growing complexity of security threats. By combining the adaptive processes of evolutionary algorithms with the probabilistic reasoning inherent in quantum computing, the proposed technique creates a potent tool for managing the inherent difficulties of high-dimensional data. The main contribution of this study is a new QIEA framework that is specifically designed for image analysis in security contexts and exhibits higher accuracy and efficiency than traditional techniques. The quantum bit representation and quantum gate operations, which have been tailored to the evolutionary algorithm structure, provide the methodological basis and improve the search capacity in multidimensional data fields. This fusion enables a high degree of clarity and detail in security imagery, which is essential for threat identification and prevention. Experiments conducted on a variety of difficult datasets show that the suggested method is reliable for identifying important characteristics in complicated images, which is a crucial component of homeland security applications. Beyond addressing short-term security issues, this study establishes a reference point for future investigations into quantum-inspired computing for image processing. © 2024 IEEE.

Keywords: Imaging systems

Realtime Object Distance Measurement Using Stereo Vision Image Processing

2nd International Conference on Big Data, Machine Learning, and Applications, BigDML 2021

Authors: Arunakumari, B.N.; Shashidhar, R.; Naziya Farheen, H.S.; Roopa, M. (Department of Computer Science and Engineering, BMS Institute of Technology and Management, Karnataka, Bengaluru 560064, India; Department of Electronics and Communication Engineering, JSS Science and Technology University, Karnataka, Mysuru 570006, India; Department of Electronics and Communication Engineering, Navkis College of Engineering, Hassan, Karnataka, Hassan 573217, India; Department of Electronics and Communication Engineering, Dayananda Sagar College of Engineering, Bengaluru 560078, India)

ISBN: (print) 9789819934805

In recent years, great progress has been made on 2D and 3D image understanding tasks, such as object detection and instance segmentation. Driverless cars, a recent technology trend, are making a difference in daily life. The basic principle in these cars is object detection and localization using multiple video cameras and LIDAR. As this is a current trend in research and development, this work attempts to achieve the same on a small scale using the available resources. In the proposed method, the stereo images are first captured with a dual-lens camera, and the RGB image is then converted into a grayscale image. The third step applies a global threshold to separate the background and uses a morphological operation to obtain an image of the same size. Blob detection is used to detect the points and regions in the image. The fourth step measures the object distance and size using the pinhole camera formula. Further, in the proposed work, an effort is made to determine the linear distance between the camera and the object from the pictures taken by the camera. Typically, stereo images are used for computation. Binocular stereopsis, or stereo vision, is the capability to derive information about how far away objects are, based solely on the relative positions of the object in the two eyes. It depends on both sensory and motor capabilities. The method uses the same principle the human brain employs, taking two images of the same object from two linearly separated positions. The system reaches a maximum frame rate of 15 frames per second, which can be considered acceptable for most autonomous systems and allows it to work in real time. An effective convolutional matching technique between embeddings is used for localization, which leads LIDAR to increase centimeter-level accuracy by about 97%. © 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
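The abstract does not spell out the pinhole camera formula it uses. A minimal sketch of the standard relations for a rectified stereo pair is given below: depth is focal length times baseline divided by disparity, and real-world size follows from similar triangles. The function names and the example values are assumptions for illustration only.

```python
def depth_from_disparity(focal_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Distance to the object in metres: Z = f * B / d.

    focal_px     focal length in pixels
    baseline_m   distance between the two lens centres in metres
    disparity_px horizontal shift of the object between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


def object_size(size_px: float, depth_m: float, focal_px: float) -> float:
    """Real-world extent in metres of an object spanning size_px pixels."""
    return size_px * depth_m / focal_px
```

For example, with an assumed focal length of 700 px, a 0.1 m baseline, and a measured disparity of 35 px, the object is about 2 m away, and a blob 140 px wide at that depth is about 0.4 m across. Depth resolution degrades as disparity shrinks, which is why small baselines limit the usable range.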

Keywords: Pinhole cameras
