Texture classification plays a crucial role in applications ranging from object recognition and product design to surface exploration. Combining deep learning methods with sensors such as accelerometers offers a way to identify key surface features without the need to precisely replicate human touch. A Contextually Guided Convolutional Neural Network (CG-CNN) employs contextual guidance by constructing auxiliary tasks during training. These tasks provide implicit yet rigorous internal supervision signals. When trained on these subtasks, CG-CNN learns to represent the innate structure and patterns within the data, resulting in robust, transferable domain representations that preserve local and contextual neighborhoods. This paper extends the CG-CNN framework for texture classification by integrating semisupervised learning. Empirical evaluations on the VibTac-12 texture dataset show that CG-CNN generalizes effectively to novel textures, even when trained with scarce labeled examples. By using large amounts of unlabeled, contextually relevant data alongside the labeled samples, CG-CNN achieves robust and precise texture classification. Such advancements hold promise for applications in robotics, prosthetics, and haptic interfaces.
ISBN (digital): 9798331518394
ISBN (print): 9798331518400
In most digital cameras, the sensor uses a Bayer filter array to capture the image. This array records only one colour (blue, green, or red) for every image pixel, resulting in a mosaic image. Retrieving the missing colour information in the captured image by cross-channel interpolation is referred to as demosaicing. Colour demosaicing (CDM) plays a vital role as the first step in obtaining high-quality colour images with single-chip cameras. Traditional demosaicing methods, used to reconstruct colour images from raw sensor data in digital cameras, have several drawbacks. They often result in a loss of spatial resolution, introduce artifacts, lack robustness in challenging conditions, and may require manual tuning for optimal performance. In contrast, neural networks offer advantages such as end-to-end learning, improved image quality, robustness, flexibility, and state-of-the-art performance. They learn complex mappings directly from raw data, adapt to various conditions, and produce visually pleasing, high-resolution images. However, they require substantial training data and computational resources, making them less suitable for resource-constrained applications. In this paper, conventional interpolation methods are compared with deep-learning-based approaches for image demosaicing. To validate the approach, aerial images captured by the Mars Colour Camera have been used.
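To make the interpolation idea concrete, the following is a minimal sketch of a conventional demosaicing baseline, not the method evaluated in the paper. It assumes an RGGB Bayer layout and fills each missing colour value with the average of the known samples of that colour in the surrounding 3x3 window (a simple stand-in for bilinear interpolation). The function names `bayer_masks`, `conv3x3`, and `demosaic_average` are hypothetical and chosen only for illustration.

```python
import numpy as np

def bayer_masks(h, w):
    """Boolean masks marking where R, G, B are sampled (assumed RGGB layout)."""
    r = np.zeros((h, w), dtype=bool)
    b = np.zeros((h, w), dtype=bool)
    r[0::2, 0::2] = True          # red on even rows, even columns
    b[1::2, 1::2] = True          # blue on odd rows, odd columns
    g = ~(r | b)                  # green everywhere else
    return r, g, b

def conv3x3(img, kernel):
    """Same-size 3x3 correlation with zero padding."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="constant")
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def demosaic_average(mosaic):
    """Fill missing colour samples with the mean of known 3x3 neighbours."""
    h, w = mosaic.shape
    ones = np.ones((3, 3))
    channels = []
    for mask in bayer_masks(h, w):
        samples = np.where(mask, mosaic, 0.0)
        total = conv3x3(samples, ones)               # sum of known neighbours
        count = conv3x3(mask.astype(float), ones)    # number of known neighbours
        interp = total / np.maximum(count, 1.0)
        # keep measured values, use the interpolated value elsewhere
        channels.append(np.where(mask, mosaic, interp))
    return np.stack(channels, axis=-1)
```

Averaging of this kind blurs edges and can produce colour fringing, which is the sort of artifact the abstract attributes to traditional methods.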
ISBN (digital): 9798331532420
ISBN (print): 9798331532437
This research analyzes the challenges faced by Sign Language Recognition (SLR) systems by evaluating the performance of EfficientNetB3, Inception-v3, and GoogLeNet models. Using a dataset of 27,000 American Sign Language (ASL) images across 27 classes, EfficientNetB3 emerged as the top performer. After fine-tuning the model with the full dataset, it achieved a test accuracy of 99.53%. We then integrated this optimized model into a real-time, user-friendly Android app for sign language recognition, offering a valuable tool for enhancing communication accessibility, particularly for the deaf and hard-of-hearing. This research work contributes to advancing SLR and promoting inclusive communication.
Multi-modality classification has flourished in recent years. Traditional methods mainly focus on advancing deep neural networks (DNNs) to achieve high performance. However, the interpretability of these methods remains limited due to the complexity and ambiguity of DNNs, which also causes distrust. This problem is amplified in sensitive areas such as biomedical computing. Hence, we propose a novel dual trustworthy mechanism for multi-modality classification (DTMC), which makes the process and results of DNNs more credible and interpretable while increasing performance. Specifically, a confidence attention mechanism is applied from local and global views to improve confidence in the process by evaluating the attention scores and distinguishing abnormal information. A confidence probability mechanism from local and global perspectives is applied in the prediction stage to enhance confidence in the results. Extensive experiments on multi-modality medical classification datasets show that the proposed method achieves superior performance and interpretability compared with state-of-the-art (SOTA) methods. Our resources are open at https://***/ghh1125/data.
ISBN (digital): 9798350354379
ISBN (print): 9798350354386
The JARVIS AI Support System combines a sophisticated GUI design, voice control, and inventive features such as the “Air Canvas”, which is built on OpenCV. This AI-driven virtual assistant offers users a natural and intuitive experience, allowing them to browse the web, interact with a chatbot, and execute dynamic voice-controlled actions. The system also includes motion detection and facial recognition, with a reported accuracy of 95% across multiple runs. Using computer vision, the Air Canvas feature lets users draw through fluid hand gestures, while voice commands manage diverse tasks. This project presents an approachable way to interact with technology in the world of AI.
Wildfires are among the most destructive natural disasters, causing significant harm to both humans and the environment. Predicting their spread is critical for disaster management and preparedness. In this study, we use machine learning algorithms, including Decision Tree Regression, XGBoost Regression, and Artificial Neural Networks, to predict the spread of wildfires using the Next Day Wildfire dataset. The dataset includes satellite images, weather, and geographic conditions aggregated across the United States from 2012 to 2020. We preprocessed the dataset and engineered features including elevation, wind direction and speed, temperature, humidity, precipitation, drought index, vegetation index, energy release component, and population density. We evaluated the models using the Root Mean Squared Error (RMSE) metric and found that the Decision Tree Regression algorithm performed best, with the lowest RMSE score. Our study highlights the potential of machine learning algorithms for predicting the spread of wildfires, which can aid disaster management and preparedness efforts.
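For reference, the RMSE metric mentioned above is the square root of the mean squared difference between predictions and targets. The snippet below is a minimal sketch of that standard definition; the helper name `rmse` is hypothetical, and the study's own evaluation code is not shown in the abstract.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Lower values indicate predictions closer to the targets, which is the sense in which the abstract ranks the models.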
Software vulnerabilities continue to be ubiquitous, even in the era of AI-powered code assistants, advanced static analysis tools, and the adoption of extensive testing frameworks. It has become apparent that we must ...
ISBN (digital): 9798350355581
ISBN (print): 9798350355598
This paper presents a data-driven methodology that uses Dynamic Mode Decomposition (DMD) for the time-domain (TD) electromagnetic (EM) modeling of microwave devices. As an unsupervised machine learning technique, DMD uses a limited set of unlabeled spatio-temporal EM data to determine DMD eigenvalues and eigenmodes. The resulting DMD model then reconstructs the dynamics as a sum of exponential terms under a linear assumption. The effectiveness of this approach is demonstrated through the TD EM modeling of photonic crystal waveguides. Comparative analysis with the finite-difference time-domain (FDTD) method shows that the DMD model not only achieves precise modeling but also facilitates robust short-term forecasting.
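The eigenvalue and eigenmode step described above can be illustrated with the textbook "exact DMD" algorithm. The following is a minimal sketch under that assumption; the abstract does not state which DMD variant the authors use, and the function names `dmd` and `dmd_predict` as well as the `rank` parameter are hypothetical. Snapshots are stored as columns of a matrix, one column per time step.

```python
import numpy as np

def dmd(snapshots, rank):
    """Exact DMD of a (n_space, n_time) snapshot matrix, truncated to `rank`."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]        # time-shifted pairs
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    YV = (Y @ Vh.conj().T) / s                        # Y V S^{-1}
    A_tilde = U.conj().T @ YV                         # reduced linear operator
    eigvals, W = np.linalg.eig(A_tilde)               # DMD eigenvalues
    modes = YV @ W                                    # DMD eigenmodes
    x0 = snapshots[:, 0].astype(complex)
    amplitudes = np.linalg.lstsq(modes, x0, rcond=None)[0]
    return eigvals, modes, amplitudes

def dmd_predict(eigvals, modes, amplitudes, steps):
    """Reconstruct or forecast `steps` snapshots as a sum of exponentials."""
    k = np.arange(steps)
    dynamics = amplitudes[:, None] * eigvals[:, None] ** k   # (rank, steps)
    return (modes @ dynamics).real
```

Calling `dmd_predict` with more steps than were used for fitting gives a forecast beyond the training window, which is how short-term forecasting is obtained in this sketch.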
A steganography technique based on the Integer Wavelet Transform (IWT) is proposed. High-frequency coefficients are exploited more to embed the confidential message since they are less sensitive to the human eye. Base...
The deepfake generation of singing vocals is a concerning issue for artists in the music industry. In this work, we propose a singing voice deepfake detection (SVDD) system, which uses noise-variant encodings of open...