Although vision-language supervised fine-tuning is effective at enhancing the performance of vision large language models (VLLMs), existing visual instruction tuning datasets have the following limitations. (1) Instruction annotation quality: although existing VLLMs exhibit strong performance, instructions generated by those advanced VLLMs may still suffer from inaccuracies, such as hallucinations. (2) Instruction and image diversity: the limited range of instruction types and the lack of diversity in image data may limit the model's ability to generate diverse outputs that are close to real-world scenarios. To address these challenges, we construct a high-quality, diverse visual instruction tuning dataset, MMInstruct, which consists of 973k instructions from 24 domains. There are four instruction types: judgment, multiple-choice, long visual question answering, and short visual question answering. To construct MMInstruct, we propose an instruction generation data engine that leverages GPT-4V, GPT-3.5, and manual correction. Our instruction generation engine enables semi-automatic, low-cost, and multi-domain instruction generation at 1/6 the cost of manual construction. Through extensive experimental validation and ablation experiments, we demonstrate that MMInstruct can significantly improve the performance of VLLMs; e.g., a model fine-tuned on MMInstruct achieves new state-of-the-art performance on 10 out of 12 benchmarks. The code and data will be available at https://***/yuecao0119/MMInstruct.
In light of the escalating privacy risks in the big data era, this paper introduces an innovative model for the anonymization of big data streams, leveraging in-memory processing within the Spark framework. The approa...
This article defines embeddings between state-based and action-based probabilistic logics which can be used to support probabilistic model checking. First, we slightly modify the model embeddings proposed in the liter...
Large language models (LLMs) have recently shown remarkable performance in a variety of natural language processing (NLP) tasks. To further explore LLMs' reasoning abilities in solving complex problems, recent research [1-3] has investigated chain-of-thought (CoT) reasoning in complex multimodal scenarios, such as science question answering (ScienceQA) tasks [4], by fine-tuning multimodal models through human-annotated CoT rationales. However, collected CoT rationales often miss the necessary reasoning steps and specific expertise.
In the era of big data, with the increase in volume and complexity of data, the main challenge is how to use big data while preserving the privacy of users. This study was conducted with the aim of finding a solution ...
The study addresses the challenge of forecasting per-unit energy prices in a microgrid environment consisting of solar and hydro power resources under multi-seasonal *** deep learning techniques such as LSTM, GRU and ESN...
This paper improves the performance of linear prediction (LP) in precise spectral estimation of bone-conducted (BC) speech. Inherently, BC speech contains a wide spectral dynamic range that causes ill conditioning in ...
In the contemporary era, the global expansion of electrical grids is propelled by various renewable energy sources (RESs). Efficient integration of stochastic RESs and optimal power flow (OPF) management are critical for network *** study introduces an innovative solution, the Gaussian Bare-Bones Levy Cheetah Optimizer (GBBLCO), addressing OPF challenges in power generation systems with stochastic *** primary objective is to minimize the total operating costs of RESs, considering four functions: overall operating costs, voltage deviation management, emissions reduction, voltage stability index (VSI) and power loss ***, a carbon tax is included in the objective function to reduce carbon *** scrutiny, using modified IEEE 30-bus and IEEE 118-bus systems, validates GBBLCO’s superior performance in achieving optimal *** results demonstrate GBBLCO’s efficacy in six optimization scenarios: total cost with valve point effects, total cost with emission and carbon tax, total cost with prohibited operating zones, active power loss optimization, voltage deviation optimization and enhancing voltage stability index (VSI). GBBLCO outperforms conventional techniques in each scenario, showcasing rapid convergence and superior solution ***, GBBLCO navigates complexities introduced by valve point effects, adapts to environmental constraints, optimizes costs while considering prohibited operating zones, minimizes active power losses, and optimizes voltage deviation by enhancing the voltage stability index (VSI) *** research significantly contributes to advancing OPF, emphasizing GBBLCO’s improved global search capabilities and ability to address challenges related to local *** emerges as a versatile and robust optimization tool for diverse challenges in power systems, offering a promising solution for the evolving needs of renewable energy-integrated power grids.
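The "total cost with valve point effects" scenario named in the abstract above can be illustrated with the form of the thermal generator cost curve that is commonly used in the OPF literature: a quadratic fuel cost plus a rectified sinusoidal ripple. The sketch below is only an illustration under that assumption; the abstract does not state the exact cost model GBBLCO optimizes, and the function name `fuel_cost` and every coefficient value are hypothetical.

```python
import math


def fuel_cost(p, a, b, c, d=0.0, e=0.0, p_min=0.0):
    """Fuel cost of one thermal unit at output p (MW).

    a, b, c : quadratic cost coefficients (hypothetical values in the demo)
    d, e    : valve-point coefficients; d = 0 disables the ripple term
    p_min   : lower generation limit of the unit (MW)
    """
    quadratic = a + b * p + c * p * p
    # Rectified sinusoid: makes the cost curve non-smooth and multimodal.
    valve_point = abs(d * math.sin(e * (p_min - p)))
    return quadratic + valve_point


if __name__ == "__main__":
    smooth = fuel_cost(100.0, a=0.0, b=2.0, c=0.00375)
    rippled = fuel_cost(100.0, a=0.0, b=2.0, c=0.00375,
                        d=18.0, e=0.037, p_min=50.0)
    print(f"without valve-point term: {smooth:.2f}")
    print(f"with valve-point term:    {rippled:.2f}")
```

Because the ripple term is non-differentiable and creates many local minima, gradient-based solvers tend to struggle on this objective, which is the usual motivation for metaheuristic optimizers in this setting.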
DolphinAttacks (i.e., inaudible voice commands) modulate audible voices over ultrasounds to inject malicious commands silently into voice assistants and manipulate controlled systems (e.g., doors or smart speakers). Eliminating DolphinAttacks is challenging, if at all possible, since it requires modifying the microphone hardware. In this paper, we design EarArray, a lightweight method that can not only detect such attacks but also identify the direction of attackers without requiring any extra hardware or hardware modification. Essentially, inaudible voice commands are modulated on ultrasounds, which inherently attenuate faster than audible sounds. By inspecting the command sound signals via the multiple built-in microphones on smart devices, EarArray is able to estimate the attenuation rate and thus detect the attacks. We propose a model of the propagation of audible sounds and ultrasounds from the sound source to a voice assistant, e.g., a smart speaker, and illustrate the underlying principle and its feasibility. We implemented EarArray using two specially designed microphone arrays, and our experiments show that EarArray can detect inaudible voice commands with an accuracy above 99%, recognize the direction of the attackers with an accuracy of 97.89%, and detect the laser-based attack with an accuracy of 100%. IEEE
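The detection idea in the abstract above (compare how strongly the same command is received at different microphones and flag unusually fast attenuation) can be sketched roughly as follows. This is a toy illustration, not EarArray's propagation model or classifier: the two-microphone setup, the function names, and the decision threshold are all assumptions made for the example.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of one microphone's recording."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def attenuation_db(near, far):
    """Level drop in dB from the microphone facing the source to the far one."""
    return 20.0 * math.log10(rms(near) / rms(far))


def looks_inaudible(near, far, threshold_db):
    """Flag a command whose level drops faster than the (hypothetical) threshold.

    threshold_db would have to be calibrated per device; it is an assumed
    parameter here, not a value from the paper.
    """
    return attenuation_db(near, far) > threshold_db


if __name__ == "__main__":
    near = [1.0, -1.0] * 4          # synthetic samples at the near microphone
    weak_drop = [0.9, -0.9] * 4     # far microphone, little attenuation
    strong_drop = [0.5, -0.5] * 4   # far microphone, strong attenuation
    print(looks_inaudible(near, weak_drop, threshold_db=3.0))
    print(looks_inaudible(near, strong_drop, threshold_db=3.0))
```

A real detector would work on band-limited signal power from a full microphone array rather than raw RMS from two channels, but the comparison of inter-microphone level differences against a calibrated bound is the same basic step.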
Recently, deep learning has been widely employed across various domains. The Convolutional Neural Network (CNN), a popular deep learning algorithm, has been successfully utilized in object recognition tasks, such as fac...