Search Results - Inner Mongolia University Library

Generalizable Prompts Guided by Image-Redundant Separation for Vehicle Reidentification

IEEE INTERNET OF THINGS JOURNAL, 2025, vol. 12, no. 1, pp. 698-712

Authors: Kuang, Zhenyu; Cheng, Lidong; Zhang, Hongyang; Huang, Yue; Ding, Xinghao. Affiliations: Xiamen Univ Sch Informat Minist Educ China Key Lab Multimedia Trusted Percept & Efficient Com Xiamen 361005 Peoples R China; Xiamen Univ Inst Artificial Intelligence Minist Educ China Key Lab Multimedia Trusted Percept & Efficient Com Xiamen 361005 Peoples R China

Vehicle reidentification (reID) is a critical computer vision task with applications in video surveillance and autonomous vehicles. While significant progress has been made in recent years, domain generalization (DG) in reID remains a challenging and valuable research direction. Learning discriminative features that capture the intrinsic characteristics of vehicles, rather than domain-specific details, is paramount in addressing the domain shift problem, which encompasses disparities in data distribution, feature distribution, and label distribution. Recently, contrastive language image pretraining (CLIP) has attracted widespread attention because of its capacity to generalize knowledge across different domains or contexts. When fine-tuned for DG tasks, it can leverage this broad knowledge to perform well in domains or on tasks it has not specifically seen during training. The foremost work in this context is CLIP-reID, showcasing outstanding experimental performance on vehicle datasets through the integration of learnable prompts. However, the process of acquiring learnable prompts inevitably incorporates noisy text descriptions, such as background and camera style information, resulting in its limitations in DG tasks. To address this distinctive issue, we propose a CLIP-based Image-Redundant Separation (CIRS) framework to remove redundant domain-specific information and then implement visual-text alignment of CLIP. Specifically, we employ a classic variational autoencoder for image reconstruction, which can encourage the images generated by the vector quantized-variational autoencoder (VQ-VAE) network to contain features unrelated to vehicle IDs. Under the precise guidance of the image-redundant separation framework, a set of generalizable and learnable prompts for each vehicle can be effectively generated for reID. Extensive experimental results indicate that our method has achieved remarkable performance on several public datasets.

Keywords: Context modeling; Optimization; Manuals; Internet of Things; Data models; Contrastive language image pretraining (CLIP); domain generalization (DG); variational autoencoder; vehicle reidentification (reID)

Shear behavior of SiCf/SiC interface under the thermo-chemo-mechanical influence and machine-learning-based interfacial microstructure design

MODELLING AND SIMULATION IN MATERIALS SCIENCE AND ENGINEERING, 2023, vol. 31, no. 5, p. 055005

Authors: Chen, Shaohua; Xu, Nuo. Affiliations: Nanjing Univ Aeronaut & Astronaut Coll Energy & Power Engn Nanjing 210016 Peoples R China; Harbin Inst Technol Sch Mat Sci & Engn Harbin 150001 Peoples R China

The mechanical behavior of composite interface can be influenced by multiple factors, including the morphological roughness, the structure of coating interphase, and the temperature. Here, high-throughput molecular dynamics (MD) simulations are carried out to investigate the entangled effects of these factors on the shear stiffness G, the friction coefficient μ, the debonding strain ε_d and stress T_d, of SiCf/SiC interface. We find that G is maximized by small roughness and high temperature for the optimal chemical bonding effect; μ and ε_d are maximized by large roughness and low temperature, taking advantage of the mechanical interlocking effect while avoiding cusp softening; T_d demonstrates two local maxima which result from the competition between chemical bonding and mechanical interlocking. Provided the MD simulation results, a variational autoencoder (VAE) model is proposed to design the microstructure of SiCf/SiC interface for desired shear properties. According to the validations, the VAE-predicted interfacial configuration demonstrates highly similar shear properties to the reference one, justifying its potential for the microstructure design of composite interface. The results of this work can be employed to facilitate the development of SiCf/SiC composite by taking advantage of the synergistic effects of multiple designable factors.

Keywords: interface; interphase; interfacial microstructures; shear behavior; molecular dynamics; variational autoencoder

Enhancing Network Traffic Anomaly Detection: Leveraging Temporal Correlation Index in a Hybrid Framework

IEEE ACCESS, 2024, vol. 12, pp. 136805-136824

Authors: Fathima, A. H. Nasreen; Ibrahim, S. P. Syed; Khraisat, Ansam. Affiliations: Vellore Inst Technol Sch Comp Sci & Engn Chennai Campus Chennai 600127 Tamil Nadu India; Deakin Univ Sch Informat Technol Melbourne VIC 3008 Australia

The modern digital environment is becoming increasingly interconnected, underscoring the critical need to safeguard network infrastructures. Detecting anomalies in network traffic remains essential as cyber threats continue to evolve. Analyzing trends, patterns, and relationships in network traffic data over time poses challenges. On the other hand, traditional generative neural networks emphasize detecting network attacks but encounter difficulties due to limitations in capturing the temporal and dynamic aspects of network traffic. This paper introduces a new methodology aimed at enhancing the identification of irregularities in network traffic using a Temporal Metric-Driven GRU Embedded Generative Neural Network (TMG-GRU-VAE). This method incorporates Gated Recurrent Units (GRU) into variational autoencoders to effectively train on the temporal characteristics of network traffic in temporal sequential networks. Moreover, we present a Temporal Correlation Index (TCI) score designed for anomaly detection in Network Intrusion Detection Systems (NIDS). This innovative metric offers a sophisticated and dynamic assessment of temporal behavior within network traffic. TCI's ability to distinguish between normal and anomalous temporal patterns plays a pivotal role in mitigating false positives. Our proposed method greatly improves the detection of small changes in abnormal sequences over time, enhancing accuracy by making anomalies stand out more clearly and reducing false alarms, thereby making the system more reliable. The proposed work, validated using the CIC-IDS-2017 and CIC-IDS-2018 datasets, demonstrates a significant decrease in False Positives (FP) across all models. Notable improvements range from 7.2% to 12.9% for the CIC-IDS-2017 dataset and from 7.1% to 14.1% for the CIC-IDS-2018 dataset. This highlights its significant impact on decreasing false positive rates.

Keywords: Telecommunication traffic; Network intrusion detection; Deep learning; Anomaly detection; Indexes; Correlation; Network security; generative model; network intrusion detection system; network traffic security; temporal correlation; unknown attack; variational autoencoder

Mind-bridge: reconstructing visual images based on diffusion model from human brain activity

SIGNAL IMAGE AND VIDEO PROCESSING, 2024, vol. 18, suppl. 1, pp. 953-963

Authors: Liu, Qing; Zhu, Hongqing; Chen, Ning; Huang, Bingcang; Lu, Weiping; Wang, Ying. Affiliations: East China Univ Sci & Technol Sch Informat Sci & Engn Shanghai 200237 Peoples R China; Gongli Hosp Shanghai Pudong New Area Dept Radiol Shanghai 200135 Peoples R China; Gongli Hosp Shanghai Pudong New Area Shanghai Hlth Commiss Sino French Cooperat Cent Lab Key Lab Artificial Intelligence AI Based Managemen Shanghai 200135 Peoples R China

Human brain vision is mysterious and complex, and it interprets the world through the connection between the brain and the eyes. In recent years, several methods have relied on fMRI to successfully reconstruct visual images from human brain activity. However, these reconstruction methods focus more on the semantics of the reconstructed image and pay too little attention to the image structure and foreground targets. To alleviate this problem, we propose a diffusion model-based image reconstruction architecture (Mind-Bridge) that utilizes fMRI to reconstruct visual images from human brain activity. Specifically, we first develop a novel Depth Structure variational autoencoder (DSVAE) to capture image structural information at the initial stage. To obtain more foreground target information, we further introduce edge estimation through the edge detection operator. In addition, we utilize Contrastive Language Image Pre-training (CLIP) text and image encoders as image and text prompt conditions for visual reconstruction. Finally, our proposed Mind-Bridge utilizes Versatile Diffusion (VD) to fuse different stages of image information for visual image reconstruction. Qualitative and quantitative analysis results on the challenging Natural Scene Dataset (NSD) show that our proposed Mind-Bridge is effective.

Keywords: Image reconstruction; Diffusion model; Functional magnetic resonance imaging (fMRI); Visual decoding; variational autoencoder

Bayesian Hierarchical Sparse autoencoder for Massive MIMO CSI Feedback

IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2024, vol. 72, pp. 3213-3227

Authors: Guo, Huayan; Lau, Vincent K. N. Affiliation: Hong Kong Univ Sci & Technol Dept Elect & Comp Engn Kowloon Hong Kong 999077 Peoples R China

Efficient channel state information (CSI) compression and feedback from user equipment to the base station (BS) are crucial for achieving the promised capacity gains in massive multiple-input multiple-output (MIMO) systems. Deep autoencoder (AE)-based schemes have been proposed to improve the efficiency of CSI compression and feedback. However, existing AE-based schemes suffer from critical issues in both CSI dimensionality reduction and latent feature quantization. In this paper, we propose a novel hierarchical sparse AE for efficient CSI compression and feedback for the 5G-NR fixed-length CSI feedback mechanism. Our approach employs a two-tier AE structure to jointly compress the sparse CSI latent feature and its side information. Additionally, we utilize a model-assisted Bayesian Rate-Distortion approach to train the weights of the AE. Specifically, the training loss function is formulated based on the variational Bayesian inference framework given a parametric Bernoulli Laplace Mixture prior model and a sparsity-inducing likelihood model. Furthermore, we propose a model-assisted adaptive coding algorithm to quantize the latent feature under the fixed codeword bit length constraint. Our experimental results demonstrate that the proposed solution outperforms existing AE-based schemes under various feedback budgets.

Keywords: Adaptation models; Quantization (signal); Training; Bayes methods; Dimensionality reduction; Decoding; Rate-distortion; Massive MIMO; deep learning; CSI feedback; variational autoencoder

Deep learning and predictive modelling for generating normalised muscle function parameters from signal images of mandibular electromyography

MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2024, vol. 62, no. 6, pp. 1763-1779

Authors: Farook, Taseef Hasan; Haq, Tashreque Mohammed; Ramees, Lameesa; Dudley, James. Affiliation: Univ Adelaide Adelaide Dent Sch Adelaide SA 5000 Australia

Challenges arise in accessing archived signal outputs due to proprietary software limitations. There is a notable lack of exploration in open-source mandibular EMG signal conversion for continuous access and analysis, hindering tasks such as pattern recognition and predictive modelling for temporomandibular joint complex function. This study aimed to develop a workflow to extract normalised signal parameters from images of mandibular muscle EMG and identify optimal clustering methods for quantifying signal intensity and activity durations. A workflow utilising OpenCV, variational encoders and Neurokit2 generated and augmented 866 unique EMG signals from jaw movement exercises. k-means, GMM and DBSCAN were employed for normalisation and cluster-centric signal processing. The workflow was validated with data collected from 66 participants, measuring temporalis, masseter and digastric muscles. DBSCAN (0.35 to 0.54) and GMM (0.09 to 0.24) exhibited lower silhouette scores for mouth opening, anterior protrusion and lateral excursions, while k-means performed best (0.10 to 0.11) for temporalis and masseter muscles during chewing activities. The current study successfully developed a deep learning workflow capable of extracting normalised signal data from EMG images and generating quantifiable parameters for muscle activity duration and general functional intensity.

Keywords: Mastication; Range of motion; variational autoencoder; Signal processing; Clustering
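
Note on the silhouette scores quoted in the abstract above: the silhouette coefficient compares, for each sample, the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), and averages (b - a) / max(a, b) over all samples. The following is a minimal sketch in plain Python, not the authors' workflow. The function name `silhouette_score_1d`, the absolute-difference distance, and the toy values are all assumptions made for illustration.

```python
def silhouette_score_1d(points, labels):
    """Mean silhouette coefficient for 1-D points; needs at least two clusters."""
    # Group sample indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the same cluster
        a = sum(abs(p - points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(abs(p - points[j]) for j in idxs) / len(idxs)
            for lab, idxs in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: two well-separated groups of signal intensities.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
good = silhouette_score_1d(values, [0, 0, 0, 1, 1, 1])
bad = silhouette_score_1d(values, [0, 1, 0, 1, 0, 1])
```

A labelling that matches the two groups (`good`) scores close to 1, while a labelling that mixes them (`bad`) scores much lower, which is how such scores are used to compare clustering methods.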

PCMG: 3D point cloud human motion generation based on self-attention and transformer

VISUAL COMPUTER, 2024, vol. 40, no. 5, pp. 3765-3780

Authors: Ma, Weizhao; Yin, Mengxiao; Li, Guiqing; Yang, Feng; Chang, Kan. Affiliations: Guangxi Univ Sch Comp Elect & Informat 100 East Univ Rd Nanning 530004 Peoples R China; Guangxi Univ Guangxi Key Lab Multimedia Commun Network Technol 100 East Univ Rd Nanning 530004 Peoples R China; South China Univ Technol Sch Comp Sci & Engn 381 Wushan Rd Guangzhou 510006 Guangdong Peoples R China

Previous methods for human motion generation have predominantly relied on skeleton representations to depict human poses and motion. These methods typically use a series of skeletons to represent the motion of a human. However, they are not directly suitable for handling the 3D point cloud sequences obtained from optical motion capture. To address this limitation, we propose a novel network called point cloud motion generation (PCMG) that can handle both skeleton-based motion representation and point cloud data from the human surface. PCMG is trained on finite point cloud sequences and is capable of generating infinite new point cloud sequences. By providing a predefined action label and shape label as input, PCMG generates a point cloud sequence that captures the semantics associated with these labels. PCMG achieves comparable results to state-of-the-art methods for action-conditional human motion generation, while outperforming previous approaches in terms of generation efficiency. The code for PCMG will be available at https://***/gxucg/PCMG

Keywords: Point cloud sequence generation; Conditional human motion generation; Transformer; variational autoencoder

Cross-Utterance Conditioned VAE for Speech Generation

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, vol. 32, pp. 4263-4276

Authors: Li, Yang; Yu, Cheng; Sun, Guangzhi; Zu, Weiqin; Tian, Zheng; Wen, Ying; Pan, Wei; Zhang, Chao; Wang, Jun; Yang, Yang; Sun, Fanglei. Affiliations: Univ Manchester Dept Comp Sci Manchester M13 9PL England; ShanghaiTech Univ Sch Creat & Art Shanghai 201210 Peoples R China; Univ Cambridge Machine Intelligence Lab Cambridge CB2 1TN England; Shanghai Jiao Tong Univ Sch Elect Informat & Elect Engn SEIEE Shanghai 200240 Peoples R China; Tsinghua Univ Dept Elect Engn Beijing 100190 Peoples R China; UCL Dept Speech Hearing & Phonet Sci London WC1E 6BT England; UCL Dept Comp Sci London WC1E 6BT England; Hong Kong Univ Sci & Technol Guangzhou Thrust Internet Things Guangzhou 511453 Peoples R China; Univ Shanghai Sci & Technol Dept Comp Sci & Engn Shanghai 200093 Peoples R China

Speech synthesis systems powered by neural networks hold promise for multimedia production, but frequently face issues with producing expressive speech and seamless editing. In response, we present the Cross-Utterance Conditioned variational autoencoder speech synthesis (CUC-VAE S2) framework to enhance prosody and ensure natural speech generation. This framework leverages the powerful representational capabilities of pre-trained language models and the re-expression abilities of variational autoencoders (VAEs). The core component of the CUC-VAE S2 framework is the cross-utterance CVAE, which extracts acoustic, speaker, and textual features from surrounding sentences to generate context-sensitive prosodic features, more accurately emulating human prosody generation. We further propose two practical algorithms tailored for distinct speech synthesis applications: CUC-VAE TTS for text-to-speech and CUC-VAE SE for speech editing. The CUC-VAE TTS is a direct application of the framework, designed to generate audio with contextual prosody derived from surrounding texts. On the other hand, the CUC-VAE SE algorithm leverages real mel spectrogram sampling conditioned on contextual information, producing audio that closely mirrors real sound and thereby facilitating flexible speech editing based on text such as deletion, insertion, and replacement. Experimental results on the LibriTTS datasets demonstrate that our proposed models significantly enhance speech synthesis and editing, producing more natural and expressive speech.

Keywords: Multimedia systems; Neural networks; Natural languages; Production; Speech enhancement; Feature extraction; Acoustics; Text to speech; Mirrors; Spectrogram; Pre-trained language model; speech editing; speech synthesis; TTS; variational autoencoder

Three Dimensional Microwave Data Inversion in Feature Space for Stroke Imaging

IEEE TRANSACTIONS ON MEDICAL IMAGING, 2024, vol. 43, no. 4, pp. 1365-1376

Authors: Guo, Rui; Lin, Zhichao; Xin, Jingyu; Li, Maokun; Yang, Fan; Xu, Shenheng; Abubakar, Aria. Affiliations: Tsinghua Univ Beijing Natl Res Ctr Informat Sci & Technol BNRist Dept Elect Engn Beijing 100084 Peoples R China; SLB Houston TX 77056 USA

Microwave imaging is a promising method for the early diagnosis and monitoring of brain strokes. It is portable, non-invasive, and safe to the human body. Conventional techniques solve for unknown electrical properties represented as pixels or voxels, but often result in inadequate structural information and high computational costs. We propose to reconstruct the three-dimensional (3D) electrical properties of the human brain in a feature space, where the unknowns are latent codes of a variational autoencoder (VAE). The decoder of the VAE, with prior knowledge of the brain, acts as a module of data inversion. The codes in the feature space are optimized by minimizing the misfit between measured and simulated data. A dataset of 3D heads characterized by permittivity and conductivity is constructed to train the VAE. Numerical examples show that our method increases structural similarity by 14% and speeds up the solution process by over 3 orders of magnitude while using only 4.8% of the number of unknowns of the voxel-based method. This high-resolution imaging of electrical properties leads to more accurate stroke diagnosis and offers new insights into the study of the human brain.

Keywords: Microwave brain imaging; stroke imaging; deep learning; variational autoencoder; data inversion
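
The feature-space inversion described in the abstract above (optimizing latent codes so that simulated data match measured data) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the authors' method: a fixed linear map `W` stands in for a trained VAE decoder, a linear operator `A` stands in for the microwave forward solver, and finite-difference gradient descent stands in for the optimizer.

```python
# Hypothetical stand-ins: 2 latent codes -> 4 "voxels" -> 3 "measurements".
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]      # toy decoder
A = [[1.0, 0.5, 0.0, 0.0],
     [0.0, 1.0, 0.5, 0.0],
     [0.0, 0.0, 1.0, 0.5]]                                 # toy forward model


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def decode(z):
    """Map latent codes to voxel values (toy decoder)."""
    return matvec(W, z)


def simulate(x):
    """Map voxel values to simulated data (toy forward model)."""
    return matvec(A, x)


def misfit(z, d_obs):
    """Squared difference between simulated and measured data."""
    d_sim = simulate(decode(z))
    return sum((s - o) ** 2 for s, o in zip(d_sim, d_obs))


def invert(d_obs, z0, lr=0.05, steps=500, eps=1e-6):
    """Gradient descent on the latent codes with forward-difference gradients."""
    z = list(z0)
    for _ in range(steps):
        base = misfit(z, d_obs)
        grad = []
        for i in range(len(z)):
            z_shift = list(z)
            z_shift[i] += eps
            grad.append((misfit(z_shift, d_obs) - base) / eps)
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z


z_true = [0.8, -0.3]                  # latent codes used to synthesize the data
d_obs = simulate(decode(z_true))      # noise-free "measured" data
z_est = invert(d_obs, [0.0, 0.0])     # recover the codes from the data alone
x_est = decode(z_est)                 # reconstructed voxel values
```

The point of the sketch is the change of unknowns: the optimizer updates only the two latent codes, while the four voxel values are always produced by the decoder, which is how a decoder's prior knowledge can constrain the reconstruction.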

Open-Set Fault Recognition and Inference for Rolling Bearing Based on Open Fault Semantic Subspace

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, vol. 73, p. 1

Authors: Chen, Yu; Tao, Laifa; Liu, Xue; Ma, Jian; Lu, Chen; Liu, Hongmei. Affiliation: Beihang Univ Sch Reliabil & Syst Engn Beijing 100191 Peoples R China

Intelligent fault diagnosis methods have gained much attention in industry. An important premise of these methods is that the training and test data maintain the same set of fault classes, known as the closed-set hypothesis, which, however, cannot be guaranteed in fault diagnosis tasks. This can result in potentially unknown faults being incorrectly and arbitrarily classified as a known fault (KF) class. To overcome this problem, we introduce open-set recognition and propose an open fault semantic subspace-based open-set fault diagnosis and inference framework (OFS-FDI), which identifies unknown faults while completing the diagnosis of known fault classes and, furthermore, infers the possible fault type of unknown samples. First, a fault semantic subspace construction method is proposed to transform the original signal into a set of low-dimensional representation subsets conforming to a conditional Gaussian distribution, which is related to the fault semantics. Then, an outlier score is proposed to determine whether a sample is from an unknown class. Finally, a class directional index (CDI) metric is proposed to perform inferential analysis of possible fault classes. In the experiments based on two typical rotating machines, the outlier detection accuracy of OFS-FDI is improved by up to 0.04% compared with the comparative methods. The accuracy for unknown fault inference (UFI) is up to 99.65%.

Keywords: Fault diagnosis; open-set recognition; rolling bearing; unknown fault inference (UFI); variational autoencoder
