Extensively used in electric vehicles (EVs), lithium-ion (Li-ion) batteries undergo significant degradation after several charge-discharge cycles, leading to their retirement from high-demand applications. However, t...
Industrial interest in Directed Energy Deposition (DED) processes is increasing; however, intensive experimental work is needed to determine the process inputs each time a new material and machine are investig...
ISBN (digital): 9798331533113
ISBN (print): 9798331533120
The Transformer architecture has been widely used in speech synthesis owing to its powerful modeling capabilities and flexibility. However, existing Transformer architectures still face many performance limitations in practice, such as insufficiently natural sound quality, slow inference, poor cross-language adaptability, and high computational resource consumption. To address these issues, this article proposes a series of targeted performance optimization strategies: improving the model architecture to enhance the naturalness of sound quality, accelerating inference to meet real-time requirements, enhancing the model's adaptability across languages and domains, and adopting efficient algorithms and hardware optimization to reduce computational resource consumption. Experimental verification shows that these strategies effectively improve the performance of speech synthesis systems, greatly enhancing their feasibility and efficiency in practical applications.
ISBN (digital): 9798331523114
ISBN (print): 9798331523121
This paper introduces an innovative approach to Retrieval-Augmented Generation (RAG) for video question answering (VideoQA) through the development of an adaptive chunking methodology and the creation of a bilingual educational dataset. Our proposed adaptive chunking technique, powered by CLIP embeddings and SSIM scores, identifies meaningful transitions in video content and uses them to segment educational videos into semantically coherent chunks. This methodology optimizes the processing of slide-based lectures, ensuring efficient integration of visual and textual modalities for downstream RAG tasks. To support this work, we gathered a bilingual dataset of mid- to long-duration Persian and English academic videos, curated to reflect diverse topics, teaching styles, and multilingual content. Each video is enriched with synthetic question-answer pairs designed to challenge pure large language models (LLMs) and underscore the necessity of retrieval-augmented systems. The evaluation compares our CLIP-SSIM-based chunking approach against conventional video slicing methods, demonstrating significant improvements across RAGAS metrics, including Answer Relevance, Context Relevance, and Faithfulness. Furthermore, our findings reveal that the multimodal image-text retrieval scenario achieves the best overall performance, emphasizing the importance of integrating complementary modalities. This research establishes a robust framework for video RAG pipelines, expanding the capabilities of multimodal AI systems for educational content analysis and retrieval.¹
¹ The dataset is publicly available at: https://***/datasets/uIAICIEduViQA.
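The abstract does not give the exact boundary rule, so the following is only a minimal sketch of how CLIP similarity and SSIM scores might be combined to cut a slide-based lecture into chunks. It assumes per-frame CLIP image embeddings and SSIM scores between consecutive frames have already been computed elsewhere. The function names `cosine_similarity` and `adaptive_chunks`, the rule that a boundary requires *both* signals to drop, and the threshold values are all hypothetical illustrations, not the authors' method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def adaptive_chunks(embeddings, ssim_scores, clip_threshold=0.85, ssim_threshold=0.7):
    """Split a frame sequence into chunks at detected transitions.

    embeddings:  per-frame CLIP image embeddings (assumed precomputed).
    ssim_scores: ssim_scores[i] is the SSIM between frame i and frame i + 1.
    Thresholds are hypothetical defaults. Returns (start, end) pairs, end exclusive.
    """
    if not embeddings:
        return []
    if len(ssim_scores) != len(embeddings) - 1:
        raise ValueError("need exactly one SSIM score per consecutive frame pair")

    boundaries = [0]
    for i in range(1, len(embeddings)):
        semantic_sim = cosine_similarity(embeddings[i - 1], embeddings[i])
        # Assumed rule: a new chunk starts only when the frames differ both
        # semantically (CLIP) and structurally (SSIM), e.g. a slide change.
        if semantic_sim < clip_threshold and ssim_scores[i - 1] < ssim_threshold:
            boundaries.append(i)
    boundaries.append(len(embeddings))
    return [(boundaries[k], boundaries[k + 1]) for k in range(len(boundaries) - 1)]


# Toy example: two identical "slides", then a switch to a different slide.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
ssim = [0.95, 0.2, 0.9]
print(adaptive_chunks(frames, ssim))  # → [(0, 2), (2, 4)]
```

Requiring both signals to drop is one conservative choice: SSIM alone reacts to small pixel changes such as a cursor or a lecturer moving, while CLIP similarity alone may miss a change between visually similar slides.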
ISBN (digital): 9798331505745
ISBN (print): 9798331505752
The rapid advancement of large language models and computer vision systems has opened new frontiers in artificial intelligence. This paper introduces InterACT, a novel cross-modal system that integrates leading language and vision models to enable more intuitive and context-aware human-AI interactions. By leveraging the strengths of both modalities, InterACT processes natural language queries to guide object detection in visual environments, bridging the gap between linguistic and visual understanding. We evaluate the performance of three language models of varying sizes in conjunction with an advanced object detection model, exploring the balance between computational efficiency and interaction quality. Our experiments demonstrate InterACT's potential across diverse real-world scenarios, revealing promising results in maintaining contextual relevance and adaptability. This research contributes to the growing field of cross-modal AI systems and lays the groundwork for more sophisticated human-AI interactions, with implications for applications ranging from assistive technologies to educational tools.
Recently, there has been a notable surge in studying human body movements using wearable inertial measurement units. This trend stems from its substantial impact ...
ISBN (digital): 9798331523893
ISBN (print): 9798331523909
The rising interconnectivity of digital systems has brought network-security challenges, including high risks of data theft and unauthorized access through network intrusions. This paper details the development of a Network Intrusion Detection System that uses deep learning techniques and is designed to distinguish normal from anomalous network activity, including activity exploiting zero-day vulnerabilities. Experiments showed that the model achieved an accuracy of up to 99.25%, with strong precision and recall, owing to advanced techniques such as hyperparameter tuning. This indicates that the model can clearly differentiate legitimate from malicious traffic. In addition, explainable AI techniques such as SHAP and LIME provide insight into the contribution, or importance, of individual features to the model's decisions.
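Accuracy, precision, and recall all derive from the confusion matrix of a binary classifier. As a minimal sketch, assuming label `1` marks malicious traffic and `0` legitimate traffic (the abstract does not specify its label encoding), the hypothetical helper `classification_metrics` below computes them from scratch. It illustrates the standard definitions only and uses no data from the paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for binary labels.

    Assumes `positive` (default 1) marks malicious traffic; any other label is
    treated as legitimate. Ratios with a zero denominator are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # malicious flow correctly flagged
        elif pred == positive:
            fp += 1  # legitimate flow wrongly flagged (false alarm)
        elif truth == positive:
            fn += 1  # malicious flow missed
        else:
            tn += 1  # legitimate flow correctly passed

    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels: TP=2, FN=1, FP=1, TN=2.
metrics = classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print({k: round(v, 3) for k, v in metrics.items()})
# → {'accuracy': 0.667, 'precision': 0.667, 'recall': 0.667}
```

Reporting precision and recall alongside accuracy matters for intrusion detection because benign traffic usually dominates: a model that flags nothing can still score high accuracy while having zero recall on attacks.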
As one of the important electrical parameters in the study of the earth's medium, the complex resistivity of rock can help determine the temperature distribution, oil and gas reservoirs, mineral distribution, and ...
ISBN (digital): 9798331511241
ISBN (print): 9798331511258
The aim of the work is to develop and test a set of measures to improve the efficiency of projects for the construction of hybrid energy complexes with power plants based on renewable energy sources by using digital twins. The paper demonstrates the effectiveness of the developed universal digital twin, verified on real power complexes with different equipment compositions, for research and for developing recommendations on the choice of parameters and operating modes of hybrid power complexes that combine diverse generating plants and depend strongly on uncertain natural factors at all stages of the life cycle.
ISBN (digital): 9798331533205
ISBN (print): 9798331533212
Brain tumours are an emerging and significant health challenge that requires early detection for effective management. Although MRI remains the mainstay of diagnosis, interpreting MRI images poses challenges. This study investigates the potential of Convolutional Neural Networks (CNNs), a form of artificial intelligence, to classify different brain tumours from MRI images with improved performance metrics. CNNs are regarded as among the best currently available approaches for image analysis because they require minimal preprocessing and learn features directly from the data rather than relying on manual feature engineering. Leveraging AI in this way could substantially improve the accuracy and efficiency of brain tumour diagnosis.