ISBN: 9798331314385 (print)
Vision language models (VLMs) have achieved impressive progress in diverse applications and have become a prevalent research direction. In this paper, we build FIRE, a feedback-refinement dataset consisting of 1.1M multi-turn conversations derived from 27 source datasets, which empowers VLMs to spontaneously refine their responses based on user feedback across diverse tasks. To scale up data collection, FIRE is collected in two components, FIRE-100K and FIRE-1M: FIRE-100K is generated by GPT-4V, and FIRE-1M is freely generated by models trained on FIRE-100K. We then build FIRE-Bench, a benchmark for comprehensively evaluating the feedback-refining capability of VLMs; it contains 11K feedback-refinement conversations as test data, two evaluation settings, and a model that provides feedback to VLMs. We develop the FIRE-LLaVA model by fine-tuning LLaVA on FIRE-100K and FIRE-1M. It shows remarkable feedback-refining capability on FIRE-Bench and outperforms untrained VLMs by 50%, enabling more efficient user-agent interactions and underscoring the significance of the FIRE dataset.
Faced with evolving attacks on recommender systems, many hand-engineered detection features have been proposed and used in supervised or unsupervised detection methods. However, hand-engineered features usually target specific types of attacks. To detect new types of attacks, traditional methods have to re-extract detection features at a high knowledge cost. To address these limitations, a method for automatically extracting robust features is proposed, followed by an AdaBoost-based detection method. First, to obtain a robust representation using prior knowledge, different corruption rates for items are calculated according to the ratings' distribution, unlike the uniform corruption rate in the traditional mLDA (marginalized Linear Denoising Autoencoder). Second, the rating sparsity is used to weight the mapping matrix to extract a low-dimensional representation. Moreover, a uniform corruption rate is set for the next layer in the mSLDA (marginalized Stacked Linear Denoising Autoencoder) to extract stable and robust user features. Finally, in the robust feature space, an AdaBoost-based detection method is proposed to alleviate the imbalanced classification problem. Experimental results on the Netflix and Amazon review datasets indicate that the proposed method can effectively detect various attacks.
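As a rough illustration of the first step, the sketch below computes the closed-form mapping of a marginalized linear denoising autoencoder when each item has its own corruption probability instead of one shared rate. It follows the standard marginalized-denoising formulation with independent masking noise, not necessarily the authors' exact method. The function name `mlda_mapping`, the `reg` ridge term, and the choice to pass the per-item rates in as a ready-made vector are assumptions; the abstract does not say how the rates are derived from the rating distribution, and the sparsity weighting of the mapping matrix is not modeled here.

```python
import numpy as np


def mlda_mapping(X, corruption, reg=1e-5):
    """Closed-form denoising mapping with per-item corruption rates.

    X          : (n_items, n_users) rating matrix, one column per user.
    corruption : (n_items,) probability that each item's rating is masked.
                 How these rates are chosen is left to the caller (assumption).
    reg        : small ridge term for numerical stability (assumption).
    Returns W of shape (n_items, n_items + 1); the last column is the bias.
    """
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])  # append a constant bias row
    # Survival probability per feature; the bias row is never corrupted.
    q = np.append(1.0 - np.asarray(corruption, dtype=float), 1.0)

    S = Xb @ Xb.T  # scatter matrix of the clean data
    # Expected scatter of the corrupted data: q_i * q_j off the diagonal,
    # q_i on the diagonal.
    Q = S * np.outer(q, q)
    np.fill_diagonal(Q, q * np.diag(S))
    # Expected cross term between clean and corrupted data.
    P = S[:d, :] * q[None, :]

    # W = P Q^{-1}; Q is symmetric, so solve Q W^T = P^T instead of inverting.
    return np.linalg.solve(Q + reg * np.eye(d + 1), P.T).T
```

Setting every entry of `corruption` to the same value recovers the uniform-rate case, which is what the abstract describes for the next layer of the stacked model.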
Consensus in multi-agent dynamical systems is prone to be sabotaged by the adversary, which has attracted much attention due to its key role in broad applications. In this paper, we study a new false data injection (F...
Due to the mobility and frequent disconnections, the correctness of mobile interaction systems, such as mobile robot systems and mobile payment systems, are often difficult to analyze. This paper introduces three crit...
This paper proposes regenerative particle Thompson sampling (RPTS), a flexible variation of Thompson sampling. Thompson sampling itself is a Bayesian heuristic for solving stochastic bandit problems, but it is hard to...
ISBN: 9781728195841; 9781728195858 (print)
This paper proposes an improved indoor human detection algorithm that applies the k-means technique to data provided by a W-band FMCW radar. The radar data contain both human and non-human objects. To identify non-human objects, the obtained data are filtered by the velocity and radar cross-section (RCS) of the detected objects. The k-means algorithm is then applied to the filtered dataset to estimate the positions of humans. The estimated positions are very close to the actual ones.
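The filter-then-cluster pipeline described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the speed threshold `min_speed`, and the RCS window `rcs_range` are hypothetical placeholders, since the abstract gives no numeric values, and the k-means routine is a plain Lloyd iteration written with NumPy.

```python
import numpy as np


def filter_detections(points, velocity, rcs, min_speed=0.1, rcs_range=(-20.0, 5.0)):
    """Keep detections whose speed and RCS look human-like.

    points   : (n, 2) detected positions.
    velocity : (n,) radial velocity of each detection.
    rcs      : (n,) radar cross-section of each detection.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    moving = np.abs(velocity) >= min_speed
    in_window = (rcs >= rcs_range[0]) & (rcs <= rcs_range[1])
    return points[moving & in_window]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise the centers from k distinct data points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n, k).
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its points; keep it if it has none.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute labels so they match the final centers.
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return centers, dist.argmin(axis=1)
```

Calling `filter_detections` first and then `kmeans` on the result, with `k` set to the expected number of people, yields one estimated position per cluster center. The number of people is assumed known here; the abstract does not say how it is chosen.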
The stability of multi-vendor, multi-terminal HVDC systems can be analyzed in frequency domain by black-box impedance models using the generalized Nyquist stability criterion. Based on the impedance stability analysis...
High Bandwidth Memory (HBM) provides massive aggregated memory bandwidth by exposing multiple memory channels to the processing units. To achieve high performance, an accelerator built on top of an FPGA configured wit...
ISBN: 9781665432078 (print)
A cache can improve a DSP processor's access speed to external memory and solve the “Storage Wall” problem. Designing an efficient and flexible cache module plays a vital role in improving the memory access efficiency and overall performance of the DSP. Based on an analysis of the DSP storage-level design requirements and the actual pipeline structure of SWIFT, a DSP processor independently developed by our laboratory, we design and implement first-level instruction and data caches. For the L1 D-Cache, a conflict detection and processing module is designed to support four parallel Load/Store access requests. The cache size can be flexibly configured according to actual application requirements to balance power consumption and latency. The Cacheclean instruction is designed to write back modified data left in the cache before the cache size is changed. Finally, module-level functional verification and logic synthesis are carried out. The results show that the cache functions as expected and that the critical path after logic synthesis optimization meets the 1 GHz frequency requirement.
Ultrathin flat metalenses have emerged as promising alternatives to conventional diffractive lenses, offering new possibilities for myriads of miniaturization and interfacial applications. Graphene-based materials can achieve both phase and amplitude modulations simultaneously at a single position due to the modification of the complex refractive index and thickness by laser conversion from graphene oxide into graphene-like materials. In this work, we develop graphene oxide metalenses to precisely control phase and amplitude modulations and to achieve a holistic and systematic lens design based on a graphene-based material system. We experimentally validate our strategies via demonstrations of two graphene oxide metalenses: one with an ultra-long (~16λ) optical needle, and the other with axial multifocal spots, at the wavelength of 632.8 nm with a 200 nm thin film. Our proposed graphene oxide metalenses unfold unprecedented opportunities for accurately designing graphene-based ultrathin integratable devices for broad applications.