Smartphones have become part of daily life because they are ubiquitous and can be customized by installing third-party apps. As a result, the threats posed by these apps, which are potentially risky for u...
In this work, we present DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images up to 2560 × 2560 resolution. Unlike existing studies that either struggle with high-resolution documents or abandon the large language model, which constrains their vision or language ability, DocPedia directly processes visual input in the frequency domain rather than the pixel space. This characteristic enables DocPedia to capture a greater amount of visual and textual information using a limited number of visual tokens. To consistently enhance both the perception and comprehension abilities of DocPedia, we develop a dual-stage training strategy and enrich the instructions/annotations of all training tasks, covering multiple document types. Extensive quantitative and qualitative experiments on various publicly available benchmarks confirm the mutual benefits of jointly learning perception and comprehension tasks. The results provide further evidence of the effectiveness and superior performance of DocPedia over other methods.
This study examines the use of experimental designs, specifically full and fractional factorial designs, for predicting Alzheimer’s disease with fewer variables. The full factorial design systematically investigates ...
The underlying vertical components represented by vehicle-to-everything networks will largely accelerate the advance of 6th-generation wireless communications [1]. In this context, a plethora of Internet-of-Vehicles (IoV) applications have increasingly permeated our daily lives with the development of intelligent connected vehicles [2].
Data-driven process monitoring is an effective approach to assuring the safe operation of modern manufacturing and energy systems, such as the thermal power plants studied in this work. Industrial processes are inherently dynamic and need to be monitored using dynamic algorithms. Mainstream dynamic algorithms rely on concatenating the current measurement with past data. This work proposes a new, alternative dynamic process monitoring algorithm based on dot product feature analysis (DPFA). DPFA computes the dot product of consecutive samples, thus naturally capturing the process dynamics through temporal correlation. At the same time, DPFA's online computational complexity is lower than that of not only existing dynamic algorithms but also classical static algorithms (e.g., principal component analysis and slow feature analysis). The detectability of the new algorithm is analyzed for three types of faults typically seen in process systems: sensor bias, process fault, and gain change fault. Through experiments with a numerical example and real data from a thermal power plant, the DPFA algorithm is shown to be superior to state-of-the-art methods in terms of both monitoring performance (fault detection rate and false alarm rate) and computational complexity.
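The central operation named in this abstract, the dot product of consecutive samples, can be illustrated with a minimal sketch. This is not the authors' DPFA algorithm: the function name `consecutive_dot_products` and the plain list-of-lists input are assumptions made for illustration, and any preprocessing, monitoring statistic, or control limit used in the paper is not shown.

```python
from typing import List, Sequence


def consecutive_dot_products(samples: Sequence[Sequence[float]]) -> List[float]:
    """Return the dot product of each pair of consecutive samples.

    `samples` is a time-ordered sequence of measurement vectors, one per
    time step. The function name and input layout are hypothetical; they
    are not taken from the DPFA paper.
    """
    features = []
    # Pair every sample with its successor: (x[0], x[1]), (x[1], x[2]), ...
    for prev, curr in zip(samples, samples[1:]):
        if len(prev) != len(curr):
            raise ValueError("all samples must have the same number of variables")
        features.append(sum(a * b for a, b in zip(prev, curr)))
    return features


if __name__ == "__main__":
    # Three samples of two variables each yield two dot-product features.
    print(consecutive_dot_products([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```

Each new sample costs one pass over the measured variables, which is consistent with the low online cost described above, although the abstract itself does not give the full procedure.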
Plant diseases are one of the major contributors to economic loss in the agriculture industry worldwide. Detection of disease at early stages can help in the reduction of this loss. In recent times, a lot of emphasis ...
Matrix minimization techniques that employ the nuclear norm have gained recognition for their applicability in tasks like image inpainting, clustering, classification, and reconstruction. However, they come with inherent biases and computational burdens, especially when used to relax the rank function, making them less effective and efficient in real-world scenarios. To address these challenges, our research focuses on generalized nonconvex rank regularization problems in robust matrix completion, low-rank representation, and robust matrix regression. We introduce approaches for effective and efficient low-rank matrix learning, grounded in generalized nonconvex rank relaxations inspired by various substitutes for the ℓ0-norm relaxed functions. These relaxations allow us to capture low-rank structures more accurately. Our optimization strategy employs a nonconvex, multi-variable alternating direction method of multipliers, backed by rigorous theoretical analysis for complexity and *** algorithm iteratively updates blocks of variables, ensuring efficient convergence. Additionally, we incorporate the randomized singular value decomposition technique and/or other acceleration strategies to enhance the computational efficiency of our approach, particularly for large-scale constrained minimization problems. Our experimental results across a variety of image vision-related application tasks demonstrate the superiority of our proposed methodologies in terms of both efficacy and efficiency when compared to most other related learning methods.
"The Siri Bhoovalaya is a seminal work of literature, believed to have been composed approximately a millennium ago, which encompasses diverse information encrypted using numerals of the Kannada language—a predo...
Data in a stream is non-stationary because it changes continually and inconsistently. In the classification of streaming data, this change is known as concept drift. Representing this...
Interpretable visual recognition is essential for decision-making in high-stakes situations. Recent advancements have automated the construction of interpretable models by leveraging Visual Language Models (VLMs) and ...