Fine-tuning pre-trained language models such as BERT has been highly successful across a wide range of NLP tasks. Though simple and effective, fine-tuning has been found to be unstable, which often leads to unexpectedly poor performance. To increase stability and generalizability, most existing works resort to preserving the parameters or representations of the pre-trained model during fine-tuning. Nevertheless, very little work explores mining the reliable part of the pre-learned information that can help stabilize fine-tuning. To address this challenge, we introduce a novel solution in which we fine-tune BERT with stabilized cross-layer mutual information. Our method aims to preserve the reliable behavior of cross-layer information propagation in the pre-trained model, rather than the information itself. It therefore circumvents the domain conflicts between pre-training and target tasks. We conduct extensive experiments with popular pre-trained BERT variants on NLP datasets, demonstrating the universal effectiveness and robustness of our method.
This paper proposes a new approach to color image segmentation based on the Kuramoto model. First, the classic Kuramoto model, which describes a globally coupled oscillator network, is modified into a locally coupled one to simulate neuron activity in the visual cortex and to describe how external stimuli influence phase changes. Second, a method for reconstructing the coupled neuron activities is proposed by introducing and computing the instantaneous frequency. The coupled network forms three oscillating curves corresponding to the R, G, and B pixel values of the color image, and these are summed to produce a superposition of oscillations. Finally, color images are segmented according to the synchronization of the superposed oscillations, by extracting and checking the frequency of the oscillating curves. The performance is compared with that of other representative segmentation approaches.
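The idea of a locally coupled Kuramoto network can be illustrated with a minimal sketch. It is not the paper's segmentation method: the ring topology, the forward-Euler integrator, and the names `simulate_local_kuramoto` and `order_parameter` are assumptions made for illustration, and the abstract does not specify how the pixel grid is wired or how the instantaneous frequency is computed. Each oscillator follows dθ_i/dt = ω_i + K Σ_j sin(θ_j − θ_i), with the sum restricted to its two nearest neighbours.

```python
import numpy as np

def simulate_local_kuramoto(theta0, omega, coupling=1.0, dt=0.05, steps=400):
    """Forward-Euler integration of Kuramoto oscillators on a ring.

    Assumption: each oscillator is coupled only to its left and right
    neighbour (a stand-in for the local coupling described in the abstract).
    """
    theta = np.asarray(theta0, dtype=float).copy()
    omega = np.asarray(omega, dtype=float)
    for _ in range(steps):
        left = np.roll(theta, 1)    # phase of the left neighbour
        right = np.roll(theta, -1)  # phase of the right neighbour
        dtheta = omega + coupling * (np.sin(left - theta) + np.sin(right - theta))
        theta = theta + dt * dtheta
    return theta

def order_parameter(theta):
    """Kuramoto order parameter r in [0, 1]; r = 1 means full phase synchrony."""
    return float(np.abs(np.mean(np.exp(1j * np.asarray(theta)))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta0 = rng.uniform(-1.0, 1.0, size=32)  # initial phases within a half circle
    omega = np.zeros(32)                      # identical natural frequencies
    theta = simulate_local_kuramoto(theta0, omega, steps=2000)
    print("r before:", order_parameter(theta0))
    print("r after: ", order_parameter(theta))
```

With identical natural frequencies and initial phases confined to less than half a circle, the oscillators pull together and the order parameter approaches 1. A segmentation scheme along the lines of the abstract would presumably group pixels whose oscillators synchronize, but that step is not shown here.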
The distribution of cardinalities of zero-sum sets in abelian groups is completely determined. A summation involving the Möbius function is given for the general abelian group, while in many special cases, includ...
This paper presents OmniVL, a new foundation model to support both image-language and video-language tasks using one universal architecture. It adopts a unified transformer-based visual encoder for both image and vide...
With the development of data acquisition technology, large amounts of multi-channel data are collected and widely used in many fields. Most of them, such as RGB images and vector fields, can be expressed as different ...
The classical finite-difference time-domain (FDTD) method has been widely used in computational electromagnetics, but for electrically large domains and late-time analysis it begins to show its limitations due to the accumulation of phase errors. To solve this problem, several methods have been proposed, such as high-order schemes and the four-stage Runge-Kutta integrator. Recently, symplectic methods have been adopted in computational electromagnetics. This paper focuses on the derivation of an optimized fourth-order symplectic scheme for electromagnetic simulations.
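To make the phase-error accumulation concrete, the sketch below evaluates the standard numerical dispersion relation of the second-order 1D Yee scheme, sin(ωΔt/2) = S sin(k̃Δx/2), where S = cΔt/Δx is the Courant number and k̃ is the numerical wavenumber. It covers only the classical baseline, not the optimized fourth-order symplectic scheme the paper derives, and the function names `yee_phase_velocity_ratio` and `phase_lag_degrees` are hypothetical.

```python
import math

def yee_phase_velocity_ratio(courant, cells_per_wavelength):
    """Numerical phase velocity over c for the 1D second-order Yee scheme.

    Assumes 0 < courant <= 1 and a reasonably resolved wave
    (cells_per_wavelength >= 4) so the arcsine argument stays in [-1, 1].
    """
    omega_dt = 2.0 * math.pi * courant / cells_per_wavelength
    # Solve sin(omega*dt/2) = S * sin(k_num*dx/2) for k_num*dx.
    k_num_dx = 2.0 * math.asin(math.sin(omega_dt / 2.0) / courant)
    k_exact_dx = 2.0 * math.pi / cells_per_wavelength
    return k_exact_dx / k_num_dx

def phase_lag_degrees(courant, cells_per_wavelength, wavelengths_travelled):
    """Accumulated phase lag of the numerical wave over a given distance."""
    ratio = yee_phase_velocity_ratio(courant, cells_per_wavelength)
    return 360.0 * wavelengths_travelled * (1.0 / ratio - 1.0)

if __name__ == "__main__":
    print(yee_phase_velocity_ratio(0.5, 10))   # about 0.987
    print(phase_lag_degrees(0.5, 10, 100.0))   # lag after 100 wavelengths
```

At S = 0.5 with 10 cells per wavelength the numerical wave travels about 1.3% slower than c, and that lag grows linearly with propagation distance, which is why electrically large domains and late-time analysis are the problematic cases. At S = 1 the 1D scheme is dispersion-free.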
Anomaly detection and localization are widely used in industrial manufacturing for its efficiency and effectiveness. Anomalies are rare and hard to collect and supervised models easily over-fit to these seen anomalies...
Video diffusion models are able to generate high-quality videos by learning strong spatial-temporal priors on large-scale datasets. In this paper, we aim to investigate whether such priors derived from a generative pr...
The diffusion model is widely leveraged for either video generation or video editing. As each field has its task-specific problems, it is difficult to merely develop a single diffusion for completing both tasks simult...