ISBN: 9798331543952 (electronic)
ISBN: 9798331543969 (print)
Small, hardware-only custom data processing cores connected to a host processor are increasingly considered in digital systems design as a way to accelerate data processing operations through hardware acceleration and parallel processing. Such a core can be realized in different forms: on the same die as the host processor, as a separate die (e.g., a chiplet) within the same package as the processor, or as a separate integrated circuit. These implementations require advanced design flows, low-geometry fabrication processes, and test circuit insertion that supports a suitable test strategy. In this paper, the testability of custom hardware cores is considered through test circuit insertion based on the IJTAG (IEEE 1687) standard. The core performs data processing operations based on the Mathematics of Arrays, and serial communication uses the Serial Peripheral Interface (SPI). This interface is used in both the normal operating mode and the IJTAG-based test mode. The system design is considered at the pre-synthesis design development stage, and the paper identifies architectural considerations for such systems.
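To make the SPI link concrete, the following is a minimal behavioural sketch of one full-duplex, most-significant-bit-first SPI byte exchange between a host and a core. It is not the paper's design: the function name `spi_transfer` and the example command and status bytes are hypothetical, and clock edges and chip select are not modelled.

```python
def spi_transfer(master_byte: int, slave_byte: int, bits: int = 8) -> tuple[int, int]:
    """Simulate one full-duplex SPI transfer, most significant bit first.

    On every clock cycle the master drives its top bit onto MOSI and the
    slave drives its top bit onto MISO; each side then shifts left and
    latches the bit it received. After `bits` cycles the two shift
    registers have exchanged their contents.
    Returns (byte now held by the master, byte now held by the slave).
    """
    mask = (1 << bits) - 1
    master, slave = master_byte & mask, slave_byte & mask
    for _ in range(bits):
        mosi = (master >> (bits - 1)) & 1  # bit sent master -> slave
        miso = (slave >> (bits - 1)) & 1   # bit sent slave -> master
        master = ((master << 1) & mask) | miso
        slave = ((slave << 1) & mask) | mosi
    return master, slave


# Example: the host sends a hypothetical command byte 0xA5 while the core
# returns a hypothetical status byte 0x3C in the same transfer.
received_by_host, received_by_core = spi_transfer(0xA5, 0x3C)
print(hex(received_by_host), hex(received_by_core))  # 0x3c 0xa5
```

Because SPI is a plain shift-register exchange, the same physical link can carry either operational data or test data. Which of the two it carries depends only on how the core routes the received bits.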
ISBN: 9798331521394 (electronic)
ISBN: 9798331521400 (print)
The sign language retrieval system, while similar to conventional video retrieval systems, faces distinct challenges: it requires a thorough understanding of the meanings conveyed through the human actions depicted in the videos. Typically, meaningful information is extracted using RGB feature extraction techniques. However, this approach can introduce redundancy, as it often fails to capture the global contextual information inherent in each video frame. In addition, reliance on RGB features can create significant memory challenges because of the large amount of data that must be processed and stored. A more efficient approach is therefore needed to improve both the accuracy and the efficiency of sign language retrieval. To address these issues, we implement a semantic masked transformer architecture tailored to the sign language retrieval task. The system incorporates not only RGB features but also pose features, which are derived from the movements and positions of the signer. By fusing these two feature types based on their similarity, the system retrieves more refined and pertinent textual information related to the video frames. Extensive experiments on a custom Tamil Sign Language dataset demonstrate that this framework significantly outperforms current state-of-the-art methods. These results underscore the effectiveness of combining semantic understanding with advanced feature extraction, leading to improved accuracy and efficiency in sign language retrieval.
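The abstract does not specify how RGB and pose features are fused "based on their similarity". The sketch below shows one hypothetical rule of that kind: a per-frame blend weighted by cosine similarity. The function `similarity_fusion` and the weighting formula are our own illustration and do not come from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_fusion(rgb_frames, pose_frames):
    """Fuse per-frame RGB and pose features with a similarity-derived weight.

    alpha = (1 + cos) / 2 maps cosine similarity from [-1, 1] to [0, 1];
    each fused frame is (1 - alpha) * rgb + alpha * pose, so pose features
    contribute more when they agree with the RGB features.
    """
    fused = []
    for rgb, pose in zip(rgb_frames, pose_frames):
        alpha = (1.0 + cosine(rgb, pose)) / 2.0
        fused.append([(1.0 - alpha) * r + alpha * p for r, p in zip(rgb, pose)])
    return fused


# Two toy frames: one where the features agree, one where they are orthogonal.
print(similarity_fusion([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))
# [[1.0, 0.0], [0.5, 0.5]]
```

A learned model would normally replace the fixed formula with trainable projections or attention. The hand-written weight is used here only to make the idea of similarity-driven fusion visible.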
Remote sensing image classification is a popular yet challenging field. Many researchers have combined convolutional neural networks (CNNs) and Transformers for hyperspectral image (HSI) classification tasks. However,...
ISBN: 9798331509859 (electronic)
ISBN: 9798331509866 (print)
This paper is a comprehensive review of recent video search engine technology, focusing on three main areas: video processing, speech recognition, and indexing. With the rise of multimedia in various fields, there is a growing need to process, index, and retrieve video content based on user queries. This paper covers key advancements, including enhanced video frame extraction, natural language processing (NLP) for speech-to-text conversion, and indexing strategies for video search. It also discusses current challenges, existing solutions, and future directions, highlighting the rapid evolution of video search technology.
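A common way to connect speech recognition with indexing is an inverted index over time-stamped transcript segments. The following minimal sketch assumes that speech-to-text has already produced `(start_seconds, text)` segments for each video. The data layout, the function names, and the sample transcripts are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(transcripts):
    """Map each term to the set of (video_id, start_seconds) segments containing it.

    transcripts: {video_id: [(start_seconds, text), ...]} from speech-to-text.
    """
    index = defaultdict(set)
    for video_id, segments in transcripts.items():
        for start, text in segments:
            for term in tokenize(text):
                index[term].add((video_id, start))
    return index


def search(index, query):
    """Return sorted (video_id, start_seconds) segments containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())  # AND semantics across terms
    return sorted(hits)


transcripts = {
    "v1": [(0.0, "Intro to neural networks"), (42.5, "Training neural nets")],
    "v2": [(10.0, "Neural networks in practice")],
}
index = build_index(transcripts)
print(search(index, "neural networks"))  # [('v1', 0.0), ('v2', 10.0)]
```

Storing the segment start time with each posting lets a search engine jump to the matching moment instead of returning only whole videos. Production systems usually add ranking (e.g., TF-IDF or BM25) on top of this boolean lookup.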
ISBN: 9798331512088 (electronic)
ISBN: 9798331512095 (print)
The combination of Augmented Reality (AR) and Internet of Things (IoT), often referred to as AR + IoT, offers exciting possibilities for future research and development. IoT serves as a platform for communication and remote control of connected devices, while AR provides an intuitive interface for visualizing and managing these devices through enhanced, information-rich displays. This research aims to develop an interactive and accessible home automation system that integrates IoT and AR technologies, enabling seamless remote control of electrical devices. Utilizing an ESP32/NodeMCU board interfaced with the Blynk IoT platform, the system supports device management through a mobile application and an AR interface designed in Unity. The AR interface facilitates intuitive interactions via URL-based signals, which are transmitted to the IoT hardware for execution. This dual-control mechanism enhances user flexibility and convenience, making smart home automation more accessible to users with varying technical expertise. The proposed system not only improves user interaction and control efficiency but also demonstrates the potential of AR in augmenting IoT applications, bridging the gap between physical and virtual control environments. Future enhancements could include advanced AR features, voice control, and cloud-based analytics to further optimize system performance and scalability.
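The "URL-based signals" sent from the AR interface to the hardware can be illustrated by building a request URL for the Blynk cloud. This sketch assumes the Blynk.Cloud HTTPS API's `/external/api/update?token=...&<pin>=<value>` endpoint as we understand it. The host name, the token, and the pin `V1` are placeholders, so check the Blynk documentation for your server region. The URL is only built here, never sent.

```python
from urllib.parse import urlencode

# Assumption: default Blynk.Cloud host; region-specific hosts may differ.
BLYNK_SERVER = "https://blynk.cloud"


def blynk_update_url(token: str, pin: str, value) -> str:
    """Build a URL that would write `value` to virtual pin `pin` (e.g. "V1").

    The token and pin names are placeholders; nothing is sent here.
    """
    query = urlencode({"token": token, pin: value})
    return f"{BLYNK_SERVER}/external/api/update?{query}"


# Hypothetical: switch on a relay mapped to virtual pin V1.
print(blynk_update_url("YOUR_AUTH_TOKEN", "V1", 1))
# https://blynk.cloud/external/api/update?token=YOUR_AUTH_TOKEN&V1=1
```

An AR front end (for example, a Unity button handler) would issue an HTTP GET to such a URL. The ESP32/NodeMCU firmware, subscribed to the same virtual pin, then switches the device.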
ISBN: 9798331512088 (electronic)
ISBN: 9798331512095 (print)
Emotion recognition in text is a crucial area within Natural Language Processing (NLP), serving diverse applications such as sentiment analysis, mental health assessment, and human-computer interaction. This study explores the use of Bidirectional Encoder Representations from Transformers (BERT) for this purpose, capitalizing on its bidirectional transformer architecture to capture intricate contextual relationships in text. By fine-tuning a pre-trained BERT model on a dataset annotated with emotion categories such as happiness, sadness, anger, surprise, and fear, the model achieves precise emotion classification. BERT's pre-trained linguistic knowledge and its ability to interpret complex semantic and syntactic patterns provide a strong foundation for this task. Experimental results demonstrate that, owing to its contextual embeddings, BERT surpasses traditional machine learning approaches, including Support Vector Machines (SVM) and logistic regression, in accuracy. The model also generalizes well across diverse emotional expressions, highlighting its effectiveness for emotion-aware NLP applications. The project additionally uses Streamlit to build interactive web applications that integrate NLP-based emotion detection, showing its practicality and accessibility for real-world use cases.
ISBN: 9798331505745 (electronic)
ISBN: 9798331505752 (print)
Sleep apnea commonly manifests as obstructive sleep apnea (OSA), which affects a large portion of society because it causes life-threatening health problems, including cardiovascular disease and diabetes, as well as decreased productivity. Manual diagnosis of OSA is time-consuming and expensive, requires heavy resources, and carries a significant potential for human error. This research presents deep learning-based automation as a novel approach to automatic OSA detection from polysomnographic assessments. The proposed system implements Genetic Algorithm (GA)-optimized machine learning algorithms, including Logistic Regression, Support Vector Machines, Random Forests, Artificial Neural Networks, and others. The analysis uses the Sleep Health and Lifestyle Dataset for preprocessing and feature extraction, which lead to better accuracy in sleep disorder and sleep stage classification. With hyperparameter tuning, the Random Forest model reached an F1 score of 95%, proving the most effective for OSA detection. Automating the classification process makes it more reliable, scalable, and efficient, and produces better patient outcomes.
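The abstract describes a Genetic Algorithm that tunes model hyperparameters but gives no operators or ranges. The following minimal elitist GA illustrates the idea over a hypothetical Random Forest search space (`n_estimators`, `max_depth`). The search space, the operators, and `toy_fitness` are our own assumptions. `toy_fitness` is a made-up stand-in for what would, in practice, be a cross-validated F1 score of a trained classifier.

```python
import random

# Hypothetical Random Forest search space; the paper does not list its ranges.
SPACE = {
    "n_estimators": list(range(50, 501, 50)),
    "max_depth": list(range(2, 21)),
}


def random_individual(rng):
    """Sample one hyperparameter configuration uniformly from SPACE."""
    return {name: rng.choice(values) for name, values in SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from parent a or b at random."""
    return {name: (a if rng.random() < 0.5 else b)[name] for name in SPACE}


def mutate(ind, rng, rate=0.2):
    """Resample each gene with probability `rate`."""
    return {
        name: rng.choice(SPACE[name]) if rng.random() < rate else value
        for name, value in ind.items()
    }


def genetic_search(fitness, generations=20, pop_size=12, elite=4, seed=0):
    """Maximize `fitness` over SPACE with an elitist genetic algorithm."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]  # elites survive unchanged
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(ind):
    """Made-up stand-in for cross-validated F1; peaks at 300 trees, depth 10."""
    return -abs(ind["n_estimators"] - 300) / 50 - abs(ind["max_depth"] - 10)


best = genetic_search(toy_fitness)
print(best, toy_fitness(best))
```

Elitism guarantees that the best configuration found so far is never lost between generations. With a real classifier, each fitness call trains and validates a model, so the population size and generation count trade tuning quality against compute.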
ISBN: 9798331510831 (electronic)
ISBN: 9798331510848 (print)
Action detection aims to detect (recognize and localize) human actions spatially and temporally in videos. Existing approaches focus on the closed-set setting, where an action detector is trained and tested on videos from a fixed set of action categories. However, this constrained setting is not viable in an open world, where test videos inevitably contain actions beyond the trained categories. In this paper, we address the practical yet challenging Open-Vocabulary Action Detection (OVAD) problem, which aims to detect any action in test videos while training a model on a fixed set of action categories. To achieve such an open-vocabulary capability, we propose a novel method, OpenMixer, that exploits the inherent semantics and localizability of large vision-language models (VLMs) within the family of query-based detection transformers (DETR). Specifically, OpenMixer consists of spatial and temporal OpenMixer blocks (S-OMB and T-OMB) and a dynamically fused alignment (DFA) module. Together, the three components combine the strong generalization of pretrained VLMs with the end-to-end learning of the DETR design. Moreover, we establish OVAD benchmarks under various settings, and the experimental results show that OpenMixer outperforms the baselines in detecting both seen and unseen actions. We release the code, models, and dataset splits at https://***/Cogito2012/OpenMixer.
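The core idea that makes VLM-based detectors open-vocabulary is to classify by similarity to text embeddings of class names rather than with a fixed output layer. The sketch below shows this generic, CLIP-style matching step. It is not OpenMixer's actual pipeline, and the toy 3-d embeddings, the temperature value, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_open_vocab(visual, text_embeddings, temperature=0.07):
    """Label a visual feature with any class whose name has a text embedding.

    Logits are cosine similarities divided by a temperature (hypothetical
    value); a softmax turns them into probabilities. Adding a new class
    only requires adding its text embedding, with no retraining.
    """
    labels = list(text_embeddings)
    logits = [cosine(visual, text_embeddings[label]) / temperature for label in labels]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


# Toy 3-d embeddings; "swim" plays the role of a class unseen in training.
text_embeddings = {"run": [1.0, 0.0, 0.0], "jump": [0.0, 1.0, 0.0], "swim": [0.0, 0.0, 1.0]}
label, probs = classify_open_vocab([0.1, 0.2, 0.9], text_embeddings)
print(label)  # swim
```

In a real detector the visual feature comes from each detected person or tube query, and the text embeddings come from the VLM's text encoder applied to prompts built from category names.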
Applying diffusion models to image-to-image translation (I2I) has recently received increasing attention due to its practical applications. Previous attempts inject information from the source image into each denoisin...
Imagination in world models is crucial for enabling agents to learn long-horizon policy in a sample-efficient manner. Existing recurrent state-space model (RSSM)-based world models depend on single-step statistical in...