Search Results - Inner Mongolia University Library

Disruptive Technologies (ICDT), International Conference on

Authors: Anushka Thakur Aayush Pande Naincy Pande Richa Khandelwal Prashant Dwibedy Department of Electronics and Computer Science Engineering Ramdeobaba University Nagpur India Department of Computer Science Engineering YCCE Nagpur India Department of Electronics Engineering Ramdeobaba University Nagpur India

ISBN: (digital) 9798331519582

ISBN: (print) 9798331519599

Modern educational environments require effective and efficient systems to track attendance and participation to ensure better learning outcomes and increased productivity. Traditional systems often mark attendance automatically, regardless of the level of student engagement. This paper introduces a novel system, “Revolutionizing Classroom Engagement with Face Recognition and Attention-Based Attendance,” designed to detect multiple faces in real time and automate the process of attendance marking based on students' attention levels. In contrast to traditional methods, attendance is only recorded when students surpass a predefined attention threshold (e.g., 75%) based on their focus during the lecture. This approach fosters a more dynamic, interactive, and focused learning environment. The proposed system leverages advanced face detection and recognition techniques, integrating Haar Cascade Classifiers, Deep Learning-based Face Detection, and K-Nearest Neighbors (KNN) to offer robust and accurate identification even in large, diverse classrooms. Real-time video processing is handled by OpenCV, which captures and analyzes classroom footage, while NumPy processes complex numerical computations for image data. Pandas is utilized for efficient attendance logging, storing data in easily accessible CSV files. The system's attention-tracking feature is another key innovation, as it analyzes students' gaze and behavioral cues to assess their level of engagement. This ensures that attendance is only recorded when students are genuinely focused and attentive. Designed to be scalable and non-intrusive, the system can be adapted to classrooms of varying sizes and is easily incorporated into existing educational frameworks. By providing accurate attendance tracking and engagement analysis, the system not only simplifies administrative tasks but also contributes to fostering a smarter, more engaging, and more productive classroom environment.

Keywords: Deep learning; Accuracy; Head; Face recognition; Scalability; Nearest neighbor methods; Real-time systems; Recording; Face detection; Usability
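
The threshold rule described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the per-frame attentiveness flags, the function names, and the student names are hypothetical, and the standard-library `csv` module stands in for the Pandas logging mentioned in the abstract so that the example stays self-contained. Face detection and gaze analysis are assumed to have already produced the flags.

```python
import csv
import io

ATTENTION_THRESHOLD = 0.75  # the 75% example value quoted in the abstract


def attention_ratio(frame_flags):
    """Fraction of sampled frames in which the student was judged attentive."""
    if not frame_flags:
        return 0.0
    return sum(1 for flag in frame_flags if flag) / len(frame_flags)


def mark_attendance(observations, threshold=ATTENTION_THRESHOLD):
    """Return one row per student; present only if the ratio surpasses the threshold."""
    rows = []
    for student, flags in sorted(observations.items()):
        ratio = attention_ratio(flags)
        rows.append({
            "student": student,
            "attention": round(ratio, 2),
            "present": ratio > threshold,
        })
    return rows


def to_csv(rows):
    """Serialize the attendance rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["student", "attention", "present"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical per-frame attentiveness flags (True = attentive).
observations = {
    "alice": [True] * 9 + [False],       # attentive in 9 of 10 frames
    "bob": [True] * 6 + [False] * 4,     # attentive in 6 of 10 frames
}
attendance = mark_attendance(observations)
print(to_csv(attendance))
```

With these inputs only the first student is marked present, because the second student's ratio stays below the threshold.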

Advancing Software-Defined Vehicles: An End-to-End Framework with Digital Twin Based Attestation for OTA Updates

International Communication Systems and Networks and Workshops, COMSNETS

Authors: Krish Agrawal Jha Rohan Nishkarsh Luthra Pilla Venkata Sekhar Hrishesh Sharma Gourinath Banda Department of Computer Science and Engineering Indian Institute of Technology Indore

ISBN: (digital) 9798331531195

ISBN: (print) 9798331531201

Software-defined vehicles (SDVs) are an emerging paradigm in the automotive industry where vehicles’ functionality, performance, and safety can be enhanced and updated through software, even after production. Unlike traditional vehicles, which rely primarily on physical components, SDVs leverage advanced connectivity, real-time data analytics, and cloud integration to adapt to changing regulations, driver preferences, and environmental conditions. This shift enables vehicles to evolve continuously, responding to new technological advances and customer needs. In this paper, we propose an end-to-end framework to demonstrate and realize the full benefits of SDVs. We incorporate the concept of software update authorization driven by a digital twin of the vehicle, which governs whether an application may be downloaded or updated on the vehicle. In our framework, attestation refers to the verification of each software update’s compatibility and functionality with the vehicle’s current ECUs before deployment. Any pending or requested update is first verified in the cloud for compatibility with the vehicle’s architecture, as described by the twin. Because every ECU is virtualized in the SDV’s digital twin, the new application software can be simulated and tested against the actual hardware setup without any direct impact on the physical vehicle. Once the application is installed across all relevant virtualized ECUs, the framework confirms compatibility, ensuring smooth deployment and functionality in the real-world vehicle. After this successful attestation, the update is installed on the real vehicle corresponding to that digital twin. Through this proposed framework, we intend to ensure that updates are applied safely while they contribute to functionality and performance improvements.

Keywords: Industries; Cloud computing; Production; Regulation; Real-time systems; Hardware; Digital twins; Safety; Vehicles; Testing
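
The attest-then-deploy flow described in the abstract above can be sketched as a simple gate. Everything here is an illustrative assumption: the `VirtualECU` and `Update` fields, the OS-version and memory checks, and the ECU names are hypothetical stand-ins for whatever compatibility tests the framework actually runs on the twin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualECU:
    """One virtualized ECU in the digital twin (fields are illustrative)."""
    name: str
    os_version: tuple       # e.g. (2, 1)
    free_memory_kb: int


@dataclass(frozen=True)
class Update:
    """A requested application update and its hypothetical requirements."""
    app: str
    target_ecus: tuple
    min_os_version: tuple
    memory_kb: int


def attest(update, twin):
    """Check the update against every targeted virtual ECU; return (ok, reasons)."""
    reasons = []
    for ecu_name in update.target_ecus:
        ecu = twin.get(ecu_name)
        if ecu is None:
            reasons.append(f"{ecu_name}: not present in twin")
        elif ecu.os_version < update.min_os_version:
            reasons.append(f"{ecu_name}: OS too old")
        elif ecu.free_memory_kb < update.memory_kb:
            reasons.append(f"{ecu_name}: insufficient memory")
    return (not reasons, reasons)


def deploy_if_attested(update, twin):
    """Release the update to the physical vehicle only after attestation passes."""
    ok, reasons = attest(update, twin)
    return "deploy" if ok else "reject: " + "; ".join(reasons)


# Hypothetical twin with two virtualized ECUs.
twin = {
    "brake": VirtualECU("brake", (2, 1), 512),
    "infotainment": VirtualECU("infotainment", (1, 4), 2048),
}
good = Update("nav", ("infotainment",), (1, 2), 1024)
bad = Update("abs-tuning", ("brake", "lidar"), (3, 0), 256)
print(deploy_if_attested(good, twin))
print(deploy_if_attested(bad, twin))
```

The design point is that the physical vehicle is touched only on the `deploy` branch; every failure is reported against the twin.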

On the Hardness of the Drone Delivery Problem

arXiv, 2025

Authors: Bartlmae, Simon; Hene, Andreas; Luo, Kelin. Institute of Computer Science, University of Bonn, Bonn, Germany; Department of Computer Science and Engineering, University at Buffalo, NY, United States

Fast shipping and efficient routing are key problems of modern logistics. Building on previous studies that address package delivery from a source node to a destination within a graph using multiple agents (such as vehicles, drones, and ships), we investigate the complexity of this problem in specialized graphs and with restricted agent types, both with and without predefined initial positions. Particularly, in this paper, we aim to minimize the delivery time for delivering a package. To achieve this, we utilize a set of collaborative agents, each capable of traversing a specific subset of the graph and operating at varying speeds. This challenge is encapsulated in the recently introduced Drone Delivery Problem with respect to delivery time (DDT). In this work, we show that the DDT with predefined initial positions on a line is NP-hard, even when considering only agents with two distinct speeds. This refines the results presented by Erlebach et al. [10], who demonstrated the NP-hardness of DDT on a line with agents of arbitrary speeds. Additionally, we examine DDT in grid graphs without predefined initial positions, where each drone can freely choose its starting position. We show that the problem is NP-hard to approximate within a factor of $O(n^{1-\varepsilon})$, where $n$ is the size of the grid, even when all agents are restricted to two different speeds as well as rectangular movement areas. We conclude by providing an easy $O(n)$ approximation algorithm. © 2025, CC BY.

Keywords: Drones
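
To make the objective in the abstract above concrete, the following sketch evaluates the delivery time of one given handoff schedule on a line with predefined initial positions. It is a simplified model under stated assumptions, not the paper's formulation: each agent is assumed to move within a single interval, to be used at most once, and to have a speed given as distance per unit time. The `Agent` fields and the schedule encoding are hypothetical. It only scores a candidate schedule and does not search for the optimum, which is the NP-hard part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """An agent restricted to the interval [lo, hi] of the line."""
    start: float    # predefined initial position
    speed: float    # distance per unit time
    lo: float
    hi: float


def delivery_time(agents, schedule, source, target):
    """Evaluate a schedule given as a list of (agent_index, drop_position) legs.

    Returns the arrival time at `target`, or None if the schedule is infeasible.
    """
    position, clock = source, 0.0
    used = set()
    for index, drop in schedule:
        agent = agents[index]
        if index in used:
            return None  # simplification: each agent carries the package once
        if not (agent.lo <= position <= agent.hi and agent.lo <= drop <= agent.hi):
            return None  # pickup or drop point lies outside the agent's range
        used.add(index)
        # The agent first travels from its initial position to the pickup point;
        # the leg starts once both the agent and the package are there.
        agent_ready = abs(position - agent.start) / agent.speed
        clock = max(clock, agent_ready) + abs(drop - position) / agent.speed
        position = drop
    return clock if position == target else None


# Hypothetical instance: a slow agent near the source, a fast one further along.
agents = [
    Agent(start=0.0, speed=1.0, lo=0.0, hi=4.0),
    Agent(start=6.0, speed=2.0, lo=4.0, hi=10.0),
]
print(delivery_time(agents, [(0, 4.0), (1, 10.0)], source=0.0, target=10.0))
```

In this instance the second agent reaches the handoff point before the package does, so the package never waits.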

HEXGEN-2: DISAGGREGATED GENERATIVE INFERENCE OF LLMS IN HETEROGENEOUS ENVIRONMENT

arXiv, 2025

Authors: Jiang, Youhe; Yan, Ran; Yuan, Binhang. Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong

Disaggregating the prefill and decoding phases is an effective new paradigm for generative inference of large language models (LLMs): it eliminates prefill-decoding interference and optimizes resource allocation. However, how to deploy the disaggregated inference paradigm across a group of heterogeneous GPUs, which can be an economical alternative to deployment over homogeneous high-performance GPUs, remains an open problem. To this end, we introduce HEXGEN-2, a distributed system for efficient and economical LLM serving on heterogeneous GPUs following the disaggregated paradigm. HEXGEN-2 is built on top of HEXGEN; its core component is a scheduling algorithm that formalizes the allocation of disaggregated LLM inference computations and communications over heterogeneous GPUs and network connections as a constraint optimization problem. We leverage graph partitioning and max-flow algorithms to co-optimize resource allocation, parallel strategies for distinct inference phases, and the efficiency of inter-phase key-value (KV) cache communications. We conduct extensive experiments to evaluate HEXGEN-2 on OPT (30B) and LLAMA-2 (70B) models in various real-world settings. The results reveal that HEXGEN-2 delivers up to a 2.0× and on average a 1.3× improvement in serving throughput, reduces the average inference latency by 1.5× compared with state-of-the-art systems given the same price budget, and achieves comparable inference performance with a 30% lower price budget. Copyright © 2025, The Authors. All rights reserved.

Keywords: Budget control

Lip-Audio Modality Fusion for Deep Forgery Video Detection

Computers, Materials & Continua, 2025, Vol. 82, No. 2, pp. 3499-3515

Authors: Yong Liu Zhiyu Wang Shouling Ji Daofu Gong Lanxin Cheng Ruosi Cheng. College of Cyberspace Security, Information Engineering University, Zhengzhou 450001, China; Research Institute of Intelligent Networks, Zhejiang Lab, Hangzhou 311121, China; College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; Henan Key Laboratory of Cyberspace Situation Awareness, Zhengzhou 450001, China; Key Laboratory of Cyberspace Security, Ministry of Education, Zhengzhou 450001, China

In response to the problem of traditional methods ignoring audio modality tampering, this study aims to explore an effective deep forgery video detection technique that improves detection precision and reliability by fusing lip images and audio signals. The main method used is lip-audio matching detection technology based on the Siamese neural network, combined with MFCC (Mel Frequency Cepstrum Coefficient) feature extraction of band-pass filters, an improved dual-branch Siamese network structure, and a two-stream network structure design. First, the video stream is preprocessed to extract lip images, and the audio stream is preprocessed to extract MFCC features. Then, these features are processed separately through the two branches of the Siamese network. Finally, the model is trained and optimized through fully connected layers and loss functions. The experimental results show that the testing accuracy of the model in this study on the LRW (Lip Reading in the Wild) dataset reaches 92.3%, the recall rate is 94.3%, and the F1 score is 93.3%, significantly better than the results of CNN (Convolutional Neural Networks) and LSTM (Long Short-Term Memory) models. In the validation of multi-resolution image streams, the highest accuracy of dual-resolution image streams reaches 94%. Band-pass filters can effectively improve the signal-to-noise ratio of deep forgery video detection when processing different types of audio signals. The real-time processing performance of the model is also excellent, and it achieves an average score of up to 5 in user research. These data demonstrate that the method proposed in this study can effectively fuse visual and audio information in deep forgery video detection, accurately identify inconsistencies between video and audio, and thus verify the effectiveness of lip-audio modality fusion technology in improving detection performance.

Keywords: Deep forgery video detection; lip-audio modality fusion; mel frequency cepstrum coefficient; siamese neural network; band-pass filter
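
As a quick arithmetic check on the figures quoted in the abstract above, the F1 score can be recomputed as the harmonic mean of precision and recall. This rests on an assumption the abstract does not state: that the reported 92.3% "testing accuracy" plays the role of precision. Under that reading the three numbers are mutually consistent.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumption: the reported 92.3% figure is treated as precision.
recomputed = f1_score(0.923, 0.943)
print(round(recomputed, 3))
```

If the 92.3% figure is plain accuracy instead, the check does not apply, since F1 cannot be derived from accuracy and recall alone.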

Qualitative and Quantitative Perspectives on Video Watermarking Approaches

Automation and Computation (AUTOCOM), International Conference on

Authors: Vaishali Kalra Chirag Sharma Department of Computer Science and Engineering Lovely Professional University Jalandhar

ISBN: (digital) 9798331542375

ISBN: (print) 9798331542382

The technique of video watermarking plays a critical role in protecting digital video content by embedding an imperceptible yet detectable mark into the video stream. This paper explores both the qualitative and quantitative aspects of various video watermarking techniques. The study categorizes watermarking methods into spatial-domain and frequency-domain techniques and provides a comprehensive comparison of their effectiveness in terms of visual imperceptibility, resilience to attacks, computational complexity, and embedding capacity. Important quantitative evaluation measures are provided, such as Peak Signal-to-Noise Ratio (PSNR), Bit Error Rate (BER), and Structural Similarity Index (SSIM).

Keywords: Visualization; Measurement uncertainty; Watermarking; Transforms; Streaming media; Robustness; Noise measurement; Computational complexity; Protection; Signal to noise ratio
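
Two of the quantitative measures named in the abstract above have short standard definitions, sketched below in plain Python. The pixel and bit values are made-up illustrations, frames are treated as flat lists of 8-bit intensities, and SSIM is left out because it needs windowed statistics that do not fit a short example.

```python
import math


def psnr(original, marked, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(marked) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, marked)) / len(original)
    if mse == 0:
        return math.inf  # identical frames: no distortion
    return 10 * math.log10(peak ** 2 / mse)


def bit_error_rate(embedded, extracted):
    """Fraction of watermark bits that differ after extraction."""
    if len(embedded) != len(extracted) or not embedded:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = sum(1 for a, b in zip(embedded, extracted) if a != b)
    return errors / len(embedded)


# Illustrative values only: a 4-pixel frame and an 8-bit watermark.
frame = [52, 55, 61, 59]
watermarked = [53, 55, 60, 59]
bits_in = [1, 0, 1, 1, 0, 0, 1, 0]
bits_out = [1, 0, 0, 1, 0, 0, 1, 1]

print(f"PSNR: {psnr(frame, watermarked):.2f} dB")
print(f"BER:  {bit_error_rate(bits_in, bits_out):.2f}")
```

A higher PSNR indicates a less visible watermark, while a lower BER indicates a watermark that survived extraction more faithfully.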

Object-Centric World Model for Language-Guided Manipulation

arXiv, 2025

Authors: Jeong, Youngjoon; Chun, Junha; Cha, Soonwoo; Kim, Taesup. Graduate School of Data Science Department of Electrical and Computer Engineering

A world model is essential for an agent to predict the future and plan in domains such as autonomous driving and robotics. To achieve this, recent advancements have focused on video generation, which has gained significant attention due to the impressive success of diffusion models. However, these models require substantial computational resources. To address these challenges, we propose a world model leveraging object-centric representation space using slot attention, guided by language instructions. Our model perceives the current state as an object-centric representation and predicts future states in this representation space conditioned on natural language instructions. This approach results in a more compact and computationally efficient model compared to diffusion-based generative alternatives. Furthermore, it flexibly predicts future states based on language instructions, and offers a significant advantage in manipulation tasks where object recognition is crucial. In this paper, we demonstrate that our latent predictive world model surpasses generative world models in visuo-linguo-motor control tasks, achieving superior sample and computation efficiency. We also investigate the generalization performance of the proposed method and explore various strategies for predicting actions using object-centric representations. Copyright © 2025, The Authors. All rights reserved.

Keywords: Robot programming

Eye Fundus Disease Classification Using Artificial Intelligence

1st International Research Conference on Computing Technologies for Sustainable Development, IRCCTSD 2024

Authors: Harisudhan, A.S.; Prasanna, Raghul; Vaibavi, J.; Sridhar, Sridevi. Department of Computer Science and Engineering, SRM Institute of Science and Technology, Vadapalani, Chennai, India

ISBN: (print) 9783031823886

Eye fundus conditions are dangerous and can cause significant visual impairment if not detected early. Diabetic retinopathy, cataracts, and glaucoma are among the conditions for which manual assessment is directly impacted by ophthalmologists’ experience. The study intends to use artificial intelligence to develop a diagnostic system that is concentrated on the precise and effective classification of eye fundus diseases in order to address this challenge. The research involves the curation of an extensive dataset, named Eye Diseases Classification, that consists of a variety of eye fundus images that illustrate different conditions, including cataracts, glaucoma, and diabetic retinopathy. Expert ophthalmologists have painstakingly annotated every image in the dataset, offering precise ground-truth information that is essential for segmentation tasks. The findings of the experiment demonstrate how well the suggested AI system performs in accurately classifying eye fundus diseases and recognizing impacted areas in the images. This study could potentially reduce the risk of blindness and severe vision impairment by revolutionizing the diagnosis of these diseases. The system expedites diagnosis by automating the classification process, enabling earlier intervention and treatment. Additionally, using AI lessens the workload for ophthalmologists, freeing up their time for more complicated cases and improving the effectiveness of healthcare as a whole. In the end, using AI to diagnose eye fundus illnesses is a huge step forward that will impact public and clinical health in many ways. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Convolutional neural networks

SlidEar: Exploring eSense in-Ear Wearable for Voice-Assisted Smart Slide Supervision

International Communication Systems and Networks and Workshops, COMSNETS

Authors: Susmita Mondal Sameeran Ravishankar Zingre Suchetana Chakraborty Department of Computer Science and Engineering Indian Institute of Technology Jodhpur

ISBN: (digital) 9798331531195

ISBN: (print) 9798331531201

In corporate settings, conferences, or classrooms, an orator relies on manual slide transitions, which can disrupt the flow of the presentation. Presentation devices can be inaccessible to individuals with physical limitations, reducing engagement with the audience. Traditionally used remote-controlled wireless presenters are hand-held devices that usually require a compatible connector slot on the presentation device and partially engage one hand of the orator. To enable hands-free presentation of slides, we introduce SlidEar, a voice-assisted smart slide supervision system successfully tested on eSense in-ear wearables. SlidEar fuses voice recognition with wearable sensors to facilitate seamless slide operations. It allows real-time feedback on natural language commands during the presentation of slides. Compared with conventional methods, SlidEar enhances engagement and accessibility through hands-free in-ear placement and helps the orator focus on delivering their content without any disruption or distraction. The evaluation shows promising results that improve the presentation experience. SlidEar significantly reduces errors and can be widely adopted in diverse presentation scenarios.

Keywords: Wireless communication; Wireless sensor networks; Software algorithms; Systems architecture; Speech recognition; Sensor fusion; Software; Wearable devices; Wearable sensors; Smart phones

BERTopic for Topic Modeling of Hindi Short Texts: A Comparative Study

arXiv, 2025

Authors: Mutsaddi, Atharva; Jamkhande, Anvi; Thakre, Aryan; Haribhakta, Yashodhara. Department of Computer Science and Engineering, COEP Technological University, India

As short text data in native languages like Hindi increasingly appear in modern media, robust methods for topic modeling on such data have gained importance. This study investigates the performance of BERTopic in modeling Hindi short texts, an area that has been under-explored in existing research. Using contextual embeddings, BERTopic can capture semantic relationships in data, making it potentially more effective than traditional models, especially for short and diverse texts. We evaluate BERTopic using 6 different document embedding models and compare its performance against 8 established topic modeling techniques, such as Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), Latent Semantic Indexing (LSI), Additive Regularization of Topic Models (ARTM), Probabilistic Latent Semantic Analysis (PLSA), Embedded Topic Model (ETM), Combined Topic Model (CTM), and Top2Vec. The models are assessed using coherence scores across a range of topic counts. Our results reveal that BERTopic consistently outperforms other models in capturing coherent topics from short Hindi texts. © 2025, CC BY.

Keywords: Non-negative matrix factorization
