A text encoder within Vision-Language Models (VLMs) like CLIP plays a crucial role in translating textual input into an embedding space shared with images, thereby facilitating the interpretative analysis of vision ta...
This study investigated the electrical properties of AlGaN/GaN high-electron-mobility transistors (HEMTs) with varied recess depths under the gate electrode. We demonstrated a recess depth of approximately 6 nm, which...
Quantum memory devices with high storage efficiency and bandwidth are essential elements for future quantum networks. Solid-state quantum memories can provide broadband storage, but they primarily suffer from low stor...
Quantum memory devices with high storage efficiency and bandwidth are essential elements for future quantum networks. Here, we report a storage efficiency greater than 28% in a Tm3+: YAG crystal in elevated temperatur...
Agriculture is a vital industry for the people of Indonesia, but it faces several obstacles, including limited space in urban areas and inefficient conventional sorting and harvesting methods. The proposed system addresses this problem by using computer vision to select hydroponic lettuce that is ready to be harvested and a robotic arm to retrieve the selected lettuce. The system's computer vision algorithms include color space conversion, intensity transformation, and a border-following algorithm. The research methodology combines a quantitative approach with an experimental approach, which consists of conducting several trials on the built system. These trials demonstrated that the computer vision software was able to accomplish the specified objectives. The average success rate of the developed computer vision system is 93%, while the average success rate of the robot arm is 85%.
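The first two vision steps named in the abstract, color space conversion and intensity transformation, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the gamma value, and the hue, saturation, and brightness thresholds are all hypothetical choices made for illustration, and the border-following step is not covered.

```python
import colorsys


def gamma_correct(value, gamma=0.8):
    """Intensity transformation: power-law (gamma) mapping on a value in [0, 1]."""
    return value ** gamma


def green_mask(image, hue_range=(0.20, 0.45), min_sat=0.3, min_val=0.2, gamma=0.8):
    """Return a binary mask marking pixels that look like green foliage.

    `image` is a list of rows of (R, G, B) tuples with channels in 0..255.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            # Color space conversion: RGB -> HSV, all channels scaled to [0, 1].
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            # Intensity transformation applied to the brightness channel.
            v = gamma_correct(v, gamma)
            is_green = hue_range[0] <= h <= hue_range[1] and s >= min_sat and v >= min_val
            mask_row.append(1 if is_green else 0)
        mask.append(mask_row)
    return mask


def mask_area(mask):
    """Count the foreground pixels in a binary mask."""
    return sum(sum(row) for row in mask)
```

A mask like this would normally be handed to a contour or border-following routine to outline each plant. How the outline is turned into a harvest decision is not described in the abstract.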
Development of brain-computer interfaces (BCIs) has been rapid since the mid-1990s. There are three criteria for a BCI: (i) comfort and possession of a suitable signal acquisition device, (ii) system validation and dissemination, and (iii) reliability and potential. As no existing BCI meets all of these criteria, it was essential to consider building a new one. This paper therefore investigates building a BCI that uses EEG signals to translate brainwave patterns into actionable commands. The primary objective is to enhance communication capabilities for individuals afflicted with neurological disorders, empowering them to command external devices and engage more effectively with their surroundings. We built our model on an online EEG dataset for feature extraction and classification. Statistical features and the Discrete Wavelet Transform (DWT) were applied for feature selection. Multi-Layer Perceptron (MLP) and Radial Basis Function (RBF) networks were the classifiers. Results showed that the proposed MLP and RBF architectures were able to classify the EEG signals into two classes (eyes open and eyes closed). Results also showed that the proposed approach, which combines statistical features and DWT for feature selection on the AF3 and AF4 channels and classifies with an MLP, has a 98% success rate. A BCI system based on an Arduino circuit was built after the classification. Further algorithms and system evaluation need to be considered as future work.
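The combination of DWT and statistical features can be sketched as follows. The abstract does not name the wavelet, the decomposition depth, or the exact statistics, so the Haar wavelet, two levels, and the mean/standard deviation/min/max set used here are assumptions chosen for illustration, as are all function names.

```python
import math
import statistics


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT; returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def stat_features(values):
    """Simple statistical descriptors of one coefficient band (assumed feature set)."""
    return [statistics.fmean(values), statistics.pstdev(values), min(values), max(values)]


def extract_features(signal, levels=2):
    """Concatenate statistics of each detail band and of the final approximation band."""
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        features.extend(stat_features(detail))
    features.extend(stat_features(approx))
    return features
```

In a setup like the one described, such a vector would be computed per channel (for example AF3 and AF4) for each EEG window and passed to the classifier.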
Successful detection of Out-of-Distribution (OoD) data is becoming increasingly important to ensure safe deployment of neural networks. One of the main challenges in OoD detection is that neural networks output overconfident predictions on OoD data, making it difficult to determine the OoD-ness of data solely based on their predictions. Outlier exposure addresses this issue by introducing an additional loss that encourages low-confidence predictions on OoD data during training. While outlier exposure has shown promising potential in improving OoD detection performance, all previous studies on outlier exposure have been limited to utilizing visual outliers. Drawing inspiration from recent advancements in vision-language pre-training, this paper ventures into the uncharted territory of textual outlier exposure. First, we uncover the benefits of using textual outliers by replacing real or virtual outliers in the image domain with textual equivalents. Then, we propose various ways of generating preferable textual outliers. Our extensive experiments demonstrate that generated textual outliers achieve competitive performance on large-scale OoD and hard OoD benchmarks. Furthermore, we conduct empirical analyses of textual outliers to provide primary criteria for designing advantageous textual outliers: near-distribution, descriptiveness, and inclusion of visual semantics. Code is available at https://***/wiarae/TOE
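The additional loss that outlier exposure introduces can be sketched in a few lines. This follows one common formulation, in which outlier predictions are pulled toward the uniform distribution. The abstract does not give the exact loss used with textual outliers, so the form, the function names, and the `weight` default are assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(logits, label):
    """Standard cross-entropy against a hard label."""
    return -log_softmax(logits)[label]


def uniform_cross_entropy(logits):
    """Cross-entropy against the uniform distribution; lowest when the prediction is flat."""
    ls = log_softmax(logits)
    return -sum(ls) / len(ls)


def outlier_exposure_loss(in_logits, in_labels, out_logits, weight=0.5):
    """Classification loss on in-distribution data plus a weighted outlier penalty."""
    in_term = sum(cross_entropy(l, y) for l, y in zip(in_logits, in_labels)) / len(in_logits)
    out_term = sum(uniform_cross_entropy(l) for l in out_logits) / len(out_logits)
    return in_term + weight * out_term
```

The penalty term grows as the model becomes more confident on an outlier, which is the behavior the abstract describes as encouraging low-confidence predictions on OoD data.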
Several recent studies have elucidated why knowledge distillation (KD) improves model performance. However, few have researched the other advantages of KD in addition to its improving model performance. In this study,...
ISBN (digital): 9798350353006
ISBN (print): 9798350353013
We introduce a lightweight and accurate localization method that only utilizes the geometry of 2D-3D lines. Given a pre-captured 3D map, our approach localizes a panorama image, taking advantage of the holistic 360° view. The system mitigates potential privacy breaches or domain discrepancies by avoiding trained or hand-crafted visual descriptors. However, as lines alone can be ambiguous, we express distinctive yet compact spatial contexts from relationships between lines, namely the dominant directions of parallel lines and the intersection between non-parallel lines. The resulting representations are efficient in processing time and memory compared to conventional visual descriptor-based methods. Given the groups of dominant line directions and their intersections, we accelerate the search process to test thousands of pose candidates in less than a millisecond without sacrificing accuracy. We empirically show that the proposed 2D-3D matching can localize panoramas for challenging scenes with similar structures, dramatic domain shifts or illumination changes. Our fully geometric approach does not involve extensive parameter tuning or neural network training, making it a practical algorithm that can be readily deployed in the real world. Project page including the code is available through this link: https://***/fgpl/.
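The idea of extracting dominant directions of parallel lines can be illustrated with a small grouping sketch. This is not the authors' algorithm: the greedy strategy, the 5-degree tolerance, and the function names are hypothetical, and the code only shows how 3D line directions might be bucketed by parallelism regardless of sign.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0:
        raise ValueError("zero-length direction")
    return tuple(c / n for c in v)


def group_parallel(directions, max_angle_deg=5.0):
    """Greedily group line directions that are parallel within a tolerance.

    Returns a list of (representative_direction, member_indices) pairs,
    sorted so the most populated (dominant) group comes first.
    """
    cos_thr = math.cos(math.radians(max_angle_deg))
    groups = []
    for idx, d in enumerate(directions):
        u = normalize(d)
        for rep, members in groups:
            # abs() makes the test sign-agnostic: d and -d describe the same line direction.
            if abs(sum(a * b for a, b in zip(rep, u))) >= cos_thr:
                members.append(idx)
                break
        else:
            groups.append((u, [idx]))
    groups.sort(key=lambda g: len(g[1]), reverse=True)
    return groups
```

The intersections between lines from different groups would then supply the second kind of spatial context mentioned in the abstract.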
Describing the semantic content of an image via natural language, known as image captioning, has recently attracted substantial interest in computer vision and language processing communities. Current image captioning...