Surgical instrument tip detection is an important component of computer-assisted laparoscopy that can be applied to tasks such as tool-tip tracking and surgeon skill assessment. Current frameworks...
ISBN: (print) 9789819620531; 9789819620548
With the advancement of information technology, Computer-Assisted Pronunciation Training (CAPT) has become an effective method for non-native (L2) speakers to learn foreign language pronunciation. However, existing automatic pronunciation quality assessment methods have not fully leveraged inter-granularity relationships and do not sufficiently extract contextual features at each granularity. To address these issues, this paper proposes Bfhaformer. Bfhaformer employs an LSTM-augmented BranchFormer encoder to encode goodness-of-pronunciation (GOP) features and reference phoneme features. Compared to Transformer encoders, the BranchFormer encoder introduces parallel branch structures, which enhance the capture of local features while retaining global feature information. Additionally, this paper aggregates features across different granularities within a hierarchical model structure. By aggregating the encoded features and fusing suprasegmental features at pronunciation granularities such as the word level and the utterance level, the model attends better to local information at the current granularity and to contextual hierarchical relationships. Experiments on the publicly available Speechocean762 dataset demonstrate that the proposed method significantly improves all metrics at all granularities compared to the baseline models.
ISBN: (print) 9789819620708; 9789819620715
Contrastively pretrained vision-language models (VLMs) such as CLIP have shown impressive zero-shot classification performance without any classification-specific training. They create a common embedding space by contrastively pretraining an image encoder and a text encoder to align positive image-text pairs and repel negative pairs. Zero-shot classification of an image can then be performed by measuring the cosine similarities between the image embedding and the embeddings of texts that describe the classes. However, relevant works do not address the scenario in which a few image examples are available for some (but not all) classes. In this novel task, which we term variable-shot (v-shot) classification, these models fail due to the embedding space modality gap, i.e., the fact that image-to-image similarities are higher than image-to-text ones. To this end, we propose to enable v-shot capabilities in pre-trained VLMs with minimal training complexity by re-projecting the embeddings of frozen pre-trained image encoders using a shallow network, RectNet, which we train with both the standard CLIP contrastive loss function and a novel modality alignment loss function specifically constructed to bridge the modality gap. Finally, we introduce three v-shot classification benchmarks, on which the proposed architecture achieves 32.22%, 29.58% and 45.15% increases in top-1 classification accuracy, respectively.
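The zero-shot step described in this abstract (choosing the class whose text embedding has the highest cosine similarity to the image embedding) can be illustrated with a minimal sketch. It is not the authors' code: the function names `cosine_similarity` and `zero_shot_classify` and the 3-dimensional toy vectors are hypothetical stand-ins for the outputs of real image and text encoders.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_embedding, class_text_embeddings):
    """Return the best-matching class label and the per-class similarity scores."""
    scores = {
        label: cosine_similarity(image_embedding, text_embedding)
        for label, text_embedding in class_text_embeddings.items()
    }
    return max(scores, key=scores.get), scores

# Toy embeddings (hypothetical values; a real system would obtain these
# from the pretrained text and image encoders).
text_embeddings = {
    "cat": [1.0, 0.1, 0.0],
    "dog": [0.0, 1.0, 0.1],
}
image_embedding = [0.9, 0.2, 0.1]

predicted, scores = zero_shot_classify(image_embedding, text_embeddings)
print(predicted, scores)
```

In the v-shot setting, some classes would be represented by image embeddings instead of text embeddings. Because image-to-image similarities tend to be higher than image-to-text ones, a plain argmax like the one above would be biased toward those classes, which is the modality gap the abstract refers to.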
ISBN: (print) 9783031741852
The proceedings contain 52 papers. The special focus in this conference is on Hybrid Artificial Intelligence Systems. The topics include: The Third Codon Nucleotide’s Role in Genetic Recombination Within SARS-CoV-2 Spike Protein: A Pilot Study; Machine Learning for the Identification of Biomarker and Risk Factors Associated with Depression in Adult Population: Preliminary Results on a Small Cohort; Unveiling HIV-1 U Sequences: Shedding Light Through Transfer Learning on Genomic Spectrograms; Model to Early Detection of Autism Spectrum Disorder Through Opinion Mining Approach; A Short Analysis of Hybrid Approaches in COVID-19 for Detection and Diagnosing; A Graph Neural Network with Multi-head Attention for Universal Brain Disease Diagnosis from fMRI Images; A Computer Vision Approach to Detect Facial Characteristics Related to Encephalopathy in Term Infants; SPADE Norms: A Distributed General Framework for Normative Multi-agent Systems; Finding Optimal Classroom Arrangements to Minimize Cheating in Exams Using a Hybrid AI System; Resilience to the Flowing Unknown: An Open Set Recognition Framework for Data Streams; A Comparison Procedure for the Evaluation of Metaheuristics; Nyström and RFF Ensembles for Large-Scale Kernel Predictions; Application of Transfer Learning to Online Models in Malware Detection; A New Training Algorithm for Support Vector Machines; Soft Adaptive Segments for Bio-Inspired Temporal Memory; Assessing Generative Artificial Intelligence in Fundamental Physics with Gaussian Processes; Implementation of Classical Decision Trees in a Quantum Computing Paradigm; Machine Learning Methods as Robust Quantum Noise Estimators; The Impact of Data Annotations on the Performance of Object Detection Models in Icon Detection for GUI Images; Bayesian Model Selection Pruning in Predictive Maintenance; Neonates Crying Detection Through Feature Extraction and Machine Learning Methods; Differentiable Prototypes with Distributed Memory Network for Continual Learning; Efficient Continu
Dermatoscopy is a common diagnostic tool used by dermatologists to examine skin lesions. However, existing classification methods often rely on spatial domain feature extraction. Dermatoscopic images are frequently af...
The low resolution (LR) problem is rather challenging in face analysis. Most existing face hallucination methods assume that LR face images have only one resolution, but multiple resolutions may be available from diff...
ISBN: (print) 9783031747373; 9783031747380
In recent years, the building sector has experienced increasing legislative pressure to reduce energy consumption. This has created a global need for affordable building management systems (BMS) in areas such as lighting, temperature, and air quality monitoring and control. A BMS uses 2D and 3D building representations to visualize various aspects of building operations. Today, the creation of these visual building representations relies on labor-intensive and costly computer-aided design (CAD) processes. Hence, to create affordable BMS, there is an urgent need to develop methods for the cost-effective automatic creation of visual building representations. This paper introduces an automatic, metadata-driven method for constructing building visualizations using metadata from existing smart building infrastructure. The method utilizes a physics particle simulation based on Velocity Verlet integration, in which metadata defines the force dynamics. This process generates an abstract point cloud representing the organization of BMS components into building zones. The developed system was tested in two buildings of 2,560 m² and 18,000 m², respectively. The method successfully produced visual building representations based on the available metadata, demonstrating its feasibility and cost-effectiveness.
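To make the idea of a metadata-driven Velocity Verlet particle simulation concrete, here is a minimal 2D sketch. The abstract does not specify the force model, so everything below is an assumption made for illustration: particles that share a zone label attract each other through springs, particles in different zones repel through a softened inverse-distance force, and a simple velocity damping factor (not part of plain Velocity Verlet) lets the layout settle. The names `compute_forces` and `simulate` and all parameter values are hypothetical.

```python
import math

def compute_forces(positions, zones, k_attract=1.0, k_repel=1.0, eps=0.01):
    """Assumed force model: same-zone particles attract via springs,
    different-zone particles repel via a softened inverse-distance force."""
    forces = [[0.0, 0.0] for _ in positions]
    for i, (xi, yi) in enumerate(positions):
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            if zones[i] == zones[j]:
                forces[i][0] += k_attract * dx
                forces[i][1] += k_attract * dy
            else:
                r2 = dx * dx + dy * dy + eps  # eps avoids division by zero
                forces[i][0] -= k_repel * dx / r2
                forces[i][1] -= k_repel * dy / r2
    return forces

def simulate(positions, zones, steps=500, dt=0.05, mass=1.0, damping=0.95):
    """Velocity Verlet integration with an added per-step damping factor."""
    pos = [list(p) for p in positions]
    vel = [[0.0, 0.0] for _ in pos]
    acc = [[f / mass for f in force] for force in compute_forces(pos, zones)]
    for _ in range(steps):
        # 1) Advance positions using current velocities and accelerations.
        for p, v, a in zip(pos, vel, acc):
            for d in range(2):
                p[d] += v[d] * dt + 0.5 * a[d] * dt * dt
        # 2) Recompute accelerations at the new positions.
        new_acc = [[f / mass for f in force] for force in compute_forces(pos, zones)]
        # 3) Advance velocities with the mean of old and new accelerations,
        #    then damp them (assumption) so the system comes to rest.
        for v, a_old, a_new in zip(vel, acc, new_acc):
            for d in range(2):
                v[d] = (v[d] + 0.5 * (a_old[d] + a_new[d]) * dt) * damping
        acc = new_acc
    return pos

# Four hypothetical BMS components, interleaved in space, in two zones.
zones = ["A", "B", "A", "B"]
start = [(0.0, 0.0), (1.0, 0.0), (3.0, 1.0), (4.0, 1.0)]
final = simulate(start, zones)

same_zone = math.dist(final[0], final[2])   # both in zone "A"
cross_zone = math.dist(final[0], final[1])  # zone "A" vs. zone "B"
print(same_zone, cross_zone)
```

With these assumed forces, components that share a zone label drift together while different zones separate, which gives a rough picture of how a point cloud organized by zone could emerge from metadata alone.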
To date, no flexible silicon solar cell capable of directly converting visible light into electrical current has been developed for use in retinal prosthetic devices. In this study, we successfully fabricated silicon ...
Deep unfolding compressive sensing (CS) has experienced remarkable advancements. However, there still exist two challenges: (1) Many algorithms either use uniform block-based sampling, which ignores the fact that the c...
This paper explores the application of digital signal processing (DSP) techniques to characterize and examine the effects of electromagnetic compatibility (EMC) in network applications. The paper discusses the fundamentals of ...