The machine learning algorithm proposed in this paper is suitable for Big Data multimodal datasets and in particular for integrating image and speech data. Preliminary feature extraction is based on convolutional neur...
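The snippet above mentions CNN-based preliminary feature extraction followed by integration of image and speech data. As a rough illustration only (the truncated abstract does not specify the paper's actual architecture), a minimal late-fusion sketch in PyTorch might look like the following; all layer sizes, input shapes, and class names are illustrative assumptions.

```python
# Minimal sketch of CNN feature extraction followed by late fusion of image and
# speech (spectrogram) features; layer sizes and names are illustrative only.
import torch
import torch.nn as nn

class MultimodalFusionNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Image branch: small CNN over RGB images.
        self.image_cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Speech branch: CNN over single-channel log-mel spectrograms.
        self.speech_cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Late fusion: concatenate the two 32-d feature vectors and classify.
        self.classifier = nn.Linear(32 + 32, num_classes)

    def forward(self, image, spectrogram):
        fused = torch.cat([self.image_cnn(image), self.speech_cnn(spectrogram)], dim=1)
        return self.classifier(fused)

# Example forward pass with dummy batches.
model = MultimodalFusionNet()
logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 1, 64, 64))
```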
ISBN (Print): 9798400706028
Multimodal human understanding and analysis are emerging research areas that cut across several disciplines such as Computer Vision (CV), Natural Language Processing (NLP), Speech Processing, Human-Computer Interaction (HCI), and Multimedia. Several multimodal learning techniques have recently shown the benefit of combining multiple modalities in image-text, audio-visual, and video representation learning, as well as in various downstream multimodal tasks. At the core, these methods focus on modelling the modalities and their complex interactions by using large amounts of data, different loss functions, and deep neural network architectures. However, for many Web and social media applications, there is a need to model the human, including the understanding of human behaviour and perception. For this, it becomes important to consider interdisciplinary approaches, including the social sciences and psychology. The core challenges are understanding various cross-modal relations, quantifying biases such as social bias, and assessing the applicability of models to real-world problems. Interdisciplinary theories such as semiotics or gestalt psychology can provide additional insights into perceptual understanding through signs and symbols across multiple modalities. In general, these theories provide a compelling view of multimodality and perception that can further expand computational research and multimedia applications on the Web and social media. The theme of the MUWS workshop, multimodal human understanding, includes various interdisciplinary challenges related to social bias analyses, multimodal representation learning, detection of human impressions or sentiment, hate speech and sarcasm in multimodal data, multimodal rhetoric and semantics, and related topics. The MUWS workshop is an interactive event and includes keynotes by relevant experts, a poster session, research presentations, and discussion.
The paper discusses the most current advancements in image analysis and computer vision systems and how they are being used to assess the grade of food products. Computer vision is a quick, reliable, and objective exa...
Medical image processing is one of the significant fields for identifying diseases early so that they can be diagnosed appropriately. Brain tumor segmentation is a sub-branch of the medical image processing field. The...
This research paper presents a novel approach for vehicle tracking and counting utilizing the advanced object detection model YOLOv8 in the field of image processing. The accurate monitoring of vehicular traffic is cr...
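The snippet above builds its tracking-and-counting pipeline on YOLOv8. A minimal sketch of that idea, assuming the Ultralytics YOLOv8 package, a placeholder video file traffic.mp4, and counting by unique track IDs (not necessarily the paper's exact counting logic):

```python
# Minimal vehicle-counting sketch with Ultralytics YOLOv8 tracking; the video
# path and class choices are illustrative, not the paper's exact pipeline.
from ultralytics import YOLO

VEHICLE_CLASSES = {2, 3, 5, 7}  # COCO ids: car, motorcycle, bus, truck
model = YOLO("yolov8n.pt")      # pretrained detection weights

seen_ids = set()
# stream=True yields one result per frame; persist=True keeps track IDs stable.
for result in model.track(source="traffic.mp4", stream=True, persist=True):
    if result.boxes.id is None:
        continue
    for track_id, cls_id in zip(result.boxes.id.int().tolist(),
                                result.boxes.cls.int().tolist()):
        if cls_id in VEHICLE_CLASSES:
            seen_ids.add(track_id)  # each unique track ID = one counted vehicle

print(f"Vehicles counted: {len(seen_ids)}")
```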
Given the rise of multimedia content, human translators increasingly focus on culturally adapting not only words but also other modalities such as images to convey the same meaning. While several applications stand to...
Glaucoma is a prevalent cause of blindness worldwide. If not treated promptly, it can cause vision and quality of life to deteriorate. According to statistics, glaucoma affects approximately 65 million individuals worldwide. Retinal image segmentation depends on the optic disc (OD) and optic cup (OC). This paper proposes a computational model to segment and classify retinal fundus images for glaucoma detection. Data augmentation techniques were applied to prevent overfitting, while several data pre-processing approaches were employed to improve image quality and achieve high accuracy. The segmentation models are based on an attention U-Net with three separate convolutional neural network (CNN) backbones: Inception-v3, Visual Geometry Group 19 (VGG19), and residual neural network 50 (ResNet50). The classification models also employ a modified version of the above three CNN architectures. On the RIM-ONE dataset, the attention U-Net with the ResNet50 model as the encoder backbone achieved the best accuracy of 99.58% in segmenting the OD. Among the evaluated pipelines of segmentation followed by the modified classification architectures, the Inception-v3 model achieved the highest glaucoma classification accuracy of 98.79%.
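The segmentation models described above are attention U-Nets with pretrained CNN encoders such as ResNet50. A minimal sketch of the attention gate that such a network applies to its skip connections follows; channel sizes and tensor shapes are illustrative and not taken from the paper.

```python
# Sketch of the attention gate used in attention U-Net skip connections;
# channel sizes are illustrative. In the paper's setup the encoder would be
# a pretrained backbone such as ResNet50.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, gate_ch: int, skip_ch: int, inter_ch: int):
        super().__init__()
        self.w_gate = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.w_skip = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, kernel_size=1), nn.Sigmoid())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, gate, skip):
        # gate: coarse decoder feature; skip: encoder feature at the same spatial size.
        attn = self.psi(self.relu(self.w_gate(gate) + self.w_skip(skip)))
        return skip * attn  # suppress irrelevant encoder activations

# Example: gate a ResNet50-style skip feature map with a decoder signal.
gate = torch.randn(1, 1024, 32, 32)
skip = torch.randn(1, 512, 32, 32)
gated_skip = AttentionGate(1024, 512, 256)(gate, skip)
```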
This research proposes a distance estimation method using monocular-camera-based object detection and depth estimation to generate point cloud data. The study aims to enhance the applicability of monocular cameras in autonomous...
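The snippet above combines monocular object detection with depth estimation to build point clouds for distance estimation. A minimal back-projection sketch under a pinhole camera model, with placeholder intrinsics and a dummy depth map standing in for the paper's depth estimator:

```python
# Minimal back-projection sketch: turn a depth map from a mono-camera depth
# estimator into a point cloud and read the distance at a detected box centre.
# The intrinsics (fx, fy, cx, cy) and depth values here are placeholders.
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an HxW metric depth map into an (H*W, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Dummy 480x640 depth map (metres) and placeholder intrinsics.
depth = np.full((480, 640), 12.5, dtype=np.float32)
cloud = depth_to_point_cloud(depth, fx=700.0, fy=700.0, cx=320.0, cy=240.0)

# Distance to an object whose bounding-box centre lies at pixel (u, v) = (400, 260).
point = cloud.reshape(480, 640, 3)[260, 400]
print(f"Estimated distance: {np.linalg.norm(point):.2f} m")
```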
The success of deep learning over traditional machine learning techniques in handling artificial intelligence application tasks such as image processing, computer vision, object detection, speech recognition, medi...
Applications for classifying Synthetic Aperture Radar (SAR) images are critical to environmental monitoring, urban planning, and land resource surveying. Fusion approaches work well for increasing SAR image categoriza...