Vehicle reidentification (reID) is a critical computer vision task with applications in video surveillance and autonomous vehicles. While significant progress has been made in recent years, domain generalization (DG) in reID remains a challenging and valuable research direction. Learning discriminative features that capture the intrinsic characteristics of vehicles, rather than domain-specific details, is paramount in addressing the domain shift problem, which encompasses disparities in data distribution, feature distribution, and label distribution. Recently, Contrastive Language-Image Pretraining (CLIP) has attracted widespread attention because of its capacity to generalize knowledge across different domains or contexts. When fine-tuned for DG tasks, it can leverage this broad knowledge to perform well in domains or on tasks it has not seen during training. The leading work in this context is CLIP-reID, which achieves outstanding experimental performance on vehicle datasets by integrating learnable prompts. However, the process of acquiring learnable prompts inevitably incorporates noisy text descriptions, such as background and camera style information, which limits its performance on DG tasks. To address this issue, we propose a CLIP-based Image-Redundant Separation (CIRS) framework that removes redundant domain-specific information and then performs the visual-text alignment of CLIP. Specifically, we employ a classic variational autoencoder for image reconstruction, which encourages the images generated by the vector quantized-variational autoencoder (VQ-VAE) network to contain features unrelated to vehicle IDs. Under the guidance of the image-redundant separation framework, a set of generalizable, learnable prompts can be generated for each vehicle for reID. Extensive experimental results indicate that our method achieves remarkable performance on several public datasets.
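The abstract above mentions a VQ-VAE network without giving details. As a rough illustration of the vector quantization step that gives VQ-VAE its name, the following minimal sketch maps each latent vector to its nearest codebook entry. The function name `quantize`, the toy codebook, and the latent values are assumptions made for illustration; this is not the CIRS implementation.

```python
import numpy as np


def quantize(latents, codebook):
    """Replace each latent vector with its nearest codebook entry (Euclidean)."""
    # Pairwise squared distances, shape (num_latents, num_codes).
    diffs = latents[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # Index of the closest code for every latent vector.
    indices = np.argmin(dists, axis=1)
    return codebook[indices], indices


# Hypothetical toy data: three 2-D codes and three encoder outputs.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
latents = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]])

quantized, indices = quantize(latents, codebook)
print(indices)
print(quantized)
```

In a trained VQ-VAE the codebook is learned jointly with the encoder and decoder; here it is fixed only to keep the lookup step visible.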
The mechanical behavior of a composite interface can be influenced by multiple factors, including the morphological roughness, the structure of the coating interphase, and the temperature. Here, high-throughput molecular dynamics (MD) simulations are carried out to investigate the entangled effects of these factors on the shear stiffness G, the friction coefficient μ, the debonding strain ε_d, and the debonding stress T_d of the SiCf/SiC interface. We find that G is maximized by small roughness and high temperature for the optimal chemical bonding effect; μ and ε_d are maximized by large roughness and low temperature, taking advantage of the mechanical interlocking effect while avoiding cusp softening; T_d demonstrates two local maxima, which result from the competition between chemical bonding and mechanical interlocking. Given the MD simulation results, a variational autoencoder (VAE) model is proposed to design the microstructure of the SiCf/SiC interface for desired shear properties. In validation, the VAE-predicted interfacial configuration demonstrates shear properties highly similar to those of the reference configuration, justifying its potential for the microstructure design of composite interfaces. The results of this work can be employed to facilitate the development of SiCf/SiC composites by taking advantage of the synergistic effects of multiple designable factors.
The modern digital environment is becoming increasingly interconnected, underscoring the critical need to safeguard network infrastructures. Detecting anomalies in network traffic remains essential as cyber threats continue to evolve. Analyzing trends, patterns, and relationships in network traffic data over time poses challenges. On the other hand, traditional generative neural networks emphasize detecting network attacks but encounter difficulties due to limitations in capturing the temporal and dynamic aspects of network traffic. This paper introduces a new methodology aimed at enhancing the identification of irregularities in network traffic using a Temporal Metric-Driven GRU Embedded Generative Neural Network (TMG-GRU-VAE). This method incorporates Gated Recurrent Units (GRU) into variational autoencoders to effectively train on the temporal characteristics of network traffic in temporal sequential networks. Moreover, we present a Temporal Correlation Index (TCI) score designed for anomaly detection in Network Intrusion Detection Systems (NIDS). This innovative metric offers a sophisticated and dynamic assessment of temporal behavior within network traffic. TCI's ability to distinguish between normal and anomalous temporal patterns plays a pivotal role in mitigating false positives. Our proposed method greatly improves the detection of small changes in abnormal sequences over time, enhancing accuracy by making anomalies stand out more clearly and reducing false alarms, thereby making the system more reliable. The proposed work, validated using the CIC-IDS-2017 and CIC-IDS-2018 datasets, demonstrates a significant decrease in False Positives (FP) across all models. Notable improvements range from 7.2% to 12.9% for the CIC-IDS-2017 dataset and from 7.1% to 14.1% for the CIC-IDS-2018 dataset. This highlights its significant impact on decreasing false positive rates.
Human brain vision is mysterious and complex, and it interprets the world through the connection between the brain and the eyes. In recent years, several methods have relied on fMRI to successfully reconstruct visual images from human brain activity. However, these reconstruction methods focus more on the semantics of the reconstructed image and pay little attention to the image structure and foreground targets. To alleviate this problem, we propose a diffusion model-based image reconstruction architecture (Mind-Bridge) that utilizes fMRI to reconstruct visual images from human brain activity. Specifically, we first develop a novel Depth Structure variational autoencoder (DSVAE) to capture image structural information at the initial stage. To obtain more foreground target information, we further introduce edge estimation through an edge detection operator. In addition, we utilize Contrastive Language Image Pre-training (CLIP) text and image encoders to provide text and image prompt conditions for visual reconstruction. Finally, our proposed Mind-Bridge utilizes Versatile Diffusion (VD) to fuse image information from the different stages for visual image reconstruction. Qualitative and quantitative analysis results on the challenging Natural Scene Dataset (NSD) show that our proposed Mind-Bridge is effective.
Efficient channel state information (CSI) compression and feedback from user equipment to the base station (BS) are crucial for achieving the promised capacity gains in massive multiple-input multiple-output (MIMO) systems. Deep autoencoder (AE)-based schemes have been proposed to improve the efficiency of CSI compression and feedback. However, existing AE-based schemes suffer from critical issues in both CSI dimensionality reduction and latent feature quantization. In this paper, we propose a novel hierarchical sparse AE for efficient CSI compression and feedback for the 5G-NR fixed-length CSI feedback mechanism. Our approach employs a two-tier AE structure to jointly compress the sparse CSI latent feature and its side information. Additionally, we utilize a model-assisted Bayesian Rate-Distortion approach to train the weights of the AE. Specifically, the training loss function is formulated based on the variational Bayesian inference framework given a parametric Bernoulli Laplace Mixture prior model and a sparsity-inducing likelihood model. Furthermore, we propose a model-assisted adaptive coding algorithm to quantize the latent feature under the fixed codeword bit length constraint. Our experimental results demonstrate that the proposed solution outperforms existing AE-based schemes under various feedback budgets.
Challenges arise in accessing archived signal outputs due to proprietary software limitations. There is a notable lack of exploration in open-source mandibular EMG signal conversion for continuous access and analysis, hindering tasks such as pattern recognition and predictive modelling for temporomandibular joint complex function. This study aimed to develop a workflow to extract normalised signal parameters from images of mandibular muscle EMG and to identify optimal clustering methods for quantifying signal intensity and activity durations. A workflow utilising OpenCV, variational encoders and Neurokit2 generated and augmented 866 unique EMG signals from jaw movement exercises. k-means, GMM and DBSCAN were employed for normalisation and cluster-centric signal processing. The workflow was validated with data collected from 66 participants, measuring temporalis, masseter and digastric muscles. DBSCAN (0.35 to 0.54) and GMM (0.09 to 0.24) exhibited lower silhouette scores for mouth opening, anterior protrusion and lateral excursions, while k-means performed best (0.10 to 0.11) for temporalis and masseter muscles during chewing activities. The current study successfully developed a deep learning workflow capable of extracting normalised signal data from EMG images and generating quantifiable parameters for muscle activity duration and general functional intensity.
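The clustering methods above are compared by silhouette score. For readers unfamiliar with the metric, here is a minimal NumPy sketch of the standard mean silhouette coefficient, where each point's score is (b - a) / max(a, b), with a the mean distance to its own cluster and b the mean distance to the nearest other cluster. The function name, the toy points, and the labels are assumptions for illustration and are unrelated to the study's EMG data.

```python
import numpy as np


def silhouette_score(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise Euclidean distance matrix.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    clusters = np.unique(labels)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: score stays 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = dists[i, same].sum() / (same.sum() - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dists[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


# Hypothetical toy data: two tight groups far apart.
points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette_score(points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

Scores near 1 indicate compact, well-separated clusters, scores near 0 indicate overlapping clusters, and negative scores suggest points assigned to the wrong cluster.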
Previous methods for human motion generation have predominantly relied on skeleton representations to depict human poses and motion. These methods typically use a series of skeletons to represent the motion of a human. However, they are not directly suitable for handling the 3D point cloud sequences obtained from optical motion capture. To address this limitation, we propose a novel network called point cloud motion generation (PCMG) that can handle both skeleton-based motion representation and point cloud data from the human surface. PCMG is trained on finite point cloud sequences and is capable of generating infinite new point cloud sequences. By providing a predefined action label and shape label as input, PCMG generates a point cloud sequence that captures the semantics associated with these labels. PCMG achieves comparable results to state-of-the-art methods for action-conditional human motion generation, while outperforming previous approaches in terms of generation efficiency. The code for PCMG will be available at https://***/gxucg/PCMG
Speech synthesis systems powered by neural networks hold promise for multimedia production, but frequently face issues with producing expressive speech and seamless editing. In response, we present the Cross-Utterance Conditioned variational autoencoder speech synthesis (CUC-VAE S2) framework to enhance prosody and ensure natural speech generation. This framework leverages the powerful representational capabilities of pre-trained language models and the re-expression abilities of variational autoencoders (VAEs). The core component of the CUC-VAE S2 framework is the cross-utterance CVAE, which extracts acoustic, speaker, and textual features from surrounding sentences to generate context-sensitive prosodic features, more accurately emulating human prosody generation. We further propose two practical algorithms tailored for distinct speech synthesis applications: CUC-VAE TTS for text-to-speech and CUC-VAE SE for speech editing. The CUC-VAE TTS is a direct application of the framework, designed to generate audio with contextual prosody derived from surrounding texts. On the other hand, the CUC-VAE SE algorithm leverages real mel spectrogram sampling conditioned on contextual information, producing audio that closely mirrors real sound and thereby facilitating flexible speech editing based on text such as deletion, insertion, and replacement. Experimental results on the LibriTTS datasets demonstrate that our proposed models significantly enhance speech synthesis and editing, producing more natural and expressive speech.
Microwave imaging is a promising method for the early diagnosis and monitoring of brain strokes. It is portable, non-invasive, and safe for the human body. Conventional techniques solve for unknown electrical properties represented as pixels or voxels, but often yield inadequate structural information and incur high computational costs. We propose to reconstruct the three-dimensional (3D) electrical properties of the human brain in a feature space, where the unknowns are the latent codes of a variational autoencoder (VAE). The decoder of the VAE, which carries prior knowledge of the brain, acts as a module in the data inversion. The codes in the feature space are optimized by minimizing the misfit between measured and simulated data. A dataset of 3D heads characterized by permittivity and conductivity is constructed to train the VAE. Numerical examples show that our method increases structural similarity by 14% and speeds up the solution process by over 3 orders of magnitude while using only 4.8% of the unknowns required by the voxel-based method. This high-resolution imaging of electrical properties leads to more accurate stroke diagnosis and offers new insights into the study of the human brain.
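The idea of optimizing latent codes by minimizing the misfit between measured and simulated data can be sketched with a toy problem. Everything below is a stand-in chosen for illustration: a fixed random linear map plays the role of the trained VAE decoder, another random linear map plays the role of the microwave forward solver, and plain gradient descent plays the role of the optimizer. The paper's actual decoder, solver, and optimizer are not described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
n_latent, n_voxels, n_measurements = 4, 50, 30

# Hypothetical stand-ins for the trained decoder and the forward solver.
decoder_weights = rng.normal(size=(n_voxels, n_latent))
forward_operator = rng.normal(size=(n_measurements, n_voxels))


def decode(z):
    """Toy decoder: latent code -> voxel image."""
    return decoder_weights @ z


def simulate(image):
    """Toy forward model: voxel image -> measured data."""
    return forward_operator @ image


def misfit(z, measured):
    """Half the squared difference between simulated and measured data."""
    residual = simulate(decode(z)) - measured
    return 0.5 * float(residual @ residual)


def invert(measured, steps=2000):
    """Gradient descent on the misfit over the latent code only."""
    system = forward_operator @ decoder_weights  # latent code -> data
    # Step size 1/L, with L the squared largest singular value of the map.
    step = 1.0 / np.linalg.norm(system, 2) ** 2
    z = np.zeros(n_latent)
    for _ in range(steps):
        residual = system @ z - measured
        z = z - step * (system.T @ residual)
    return z


true_z = rng.normal(size=n_latent)
measured = simulate(decode(true_z))
z_hat = invert(measured)
print(misfit(np.zeros(n_latent), measured), misfit(z_hat, measured))
```

The point of the sketch is the problem size: the optimizer searches over 4 latent values rather than 50 voxel values, which mirrors the reduction in unknowns that the abstract reports.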
Intelligent fault diagnosis methods have gained much attention in industry. An important premise of these methods is that the training and test data share the same set of fault classes, known as the closed-set hypothesis, which, however, cannot be guaranteed in fault diagnosis tasks. As a result, unknown faults may be incorrectly and arbitrarily classified as a known fault (KF) class. To overcome this problem, we introduce open-set recognition and propose an open fault semantic subspace-based open-set fault diagnosis and inference framework (OFS-FDI), which identifies unknown faults while diagnosing known fault classes and, furthermore, infers the possible fault type of unknown samples. First, a fault semantic subspace construction method is proposed to transform the original signal into a set of low-dimensional representation subsets conforming to a conditional Gaussian distribution, which is related to the fault semantics. Then, an outlier score is proposed to determine whether a sample is from an unknown class. Finally, a class directional index (CDI) metric is proposed to perform inferential analysis of possible fault classes. In experiments on two typical rotating machines, the outlier detection accuracy of OFS-FDI is improved by up to 0.04% compared with the comparative methods. The accuracy for unknown fault inference (UFI) is up to 99.65%.
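The abstract does not define its outlier score. One common choice when features follow class-conditional Gaussians is the squared Mahalanobis distance to the nearest known class, sketched below as an assumption and not as the OFS-FDI score. The function names, the synthetic features, and the threshold of 9.0 are all hypothetical.

```python
import numpy as np


def fit_class_gaussians(features, labels):
    """Estimate a mean and an inverse covariance for each known class."""
    params = {}
    for c in np.unique(labels):
        x = features[labels == c]
        mean = x.mean(axis=0)
        # A small ridge keeps the covariance invertible.
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(x.shape[1])
        params[int(c)] = (mean, np.linalg.inv(cov))
    return params


def outlier_score(sample, params):
    """Squared Mahalanobis distance to the closest known class."""
    return min(
        float((sample - mean) @ precision @ (sample - mean))
        for mean, precision in params.values()
    )


# Hypothetical 2-D features for two known fault classes.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(200, 2)),
])
labels = np.array([0] * 200 + [1] * 200)
params = fit_class_gaussians(features, labels)

threshold = 9.0  # hypothetical cutoff, roughly three standard deviations
known_score = outlier_score(np.array([0.1, 0.1]), params)
unknown_score = outlier_score(np.array([-6.0, 6.0]), params)
print(known_score < threshold, unknown_score > threshold)
```

A sample whose score exceeds the threshold would be flagged as coming from an unknown class; in practice the threshold would be tuned on validation data.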