In response to the COVID-19 crisis, higher education institutions increasingly rely on e-learning systems. Indeed, the higher education market has become increasingly competitive with the addition of open education mo...
Pixel-level structure segmentations have attracted considerable attention, playing a crucial role in autonomous driving within the metaverse and enhancing comprehension in light field-based machine vision. However, current light field modeling methods fail to integrate appearance and geometric structural information into a coherent semantic space, thereby limiting the capability of light field transmission for visual ***. In this paper, we propose a general light field modeling method for pixel-level structure segmentation, comprising a generative light field prompting encoder (LF-GPE) and a prompt-based masked light field pretraining (LF-PMP) ***. The LF-GPE, serving as a light field backbone, can extract both appearance and geometric structural cues ***. It aligns these features into a unified visual space, facilitating semantic ***. Our LF-PMP, during the pretraining phase, integrates a mixed light field and a multi-view light field ***. It prioritizes the geometric structural properties of the light field, enabling the light field backbone to accumulate a wealth of prior knowledge. We evaluate our pretrained LF-GPE on two downstream tasks: light field salient object detection and semantic segmentation. The results demonstrate that LF-GPE can effectively learn high-quality light field features and achieves highly competitive performance in pixel-level segmentation tasks.
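Masked pretraining of this kind generally hides part of the input and trains the backbone to reconstruct or reason about the hidden content. The following Python/NumPy sketch shows only a generic masking step, assuming a light field stored as a 5-D array of sub-aperture views with shape (U, V, H, W, C). The function name `mask_light_field`, the patch size, and the mask ratio are hypothetical, and the per-view random patch masking is a standard MAE-style choice, not the paper's LF-PMP, whose mixed and multi-view components are not specified in the abstract.

```python
import numpy as np


def mask_light_field(lf, patch=4, mask_ratio=0.75, rng=None):
    """Hypothetical helper: hide a random subset of spatial patches in every
    sub-aperture view of a light field (generic MAE-style masking).

    lf: array of shape (U, V, H, W, C): angular (U, V), spatial (H, W), channels.
    Returns the masked copy and a boolean patch mask of shape
    (U, V, H // patch, W // patch), True where a patch was hidden.
    """
    rng = np.random.default_rng() if rng is None else rng
    U, V, H, W, C = lf.shape
    gh, gw = H // patch, W // patch
    n_patches = gh * gw
    n_masked = int(round(mask_ratio * n_patches))

    # Draw an independent random set of patches for each view.
    mask = np.zeros((U, V, n_patches), dtype=bool)
    for u in range(U):
        for v in range(V):
            idx = rng.choice(n_patches, size=n_masked, replace=False)
            mask[u, v, idx] = True
    mask = mask.reshape(U, V, gh, gw)

    # Expand the patch-level mask to pixel resolution and zero hidden pixels.
    pixel_mask = np.repeat(np.repeat(mask, patch, axis=2), patch, axis=3)
    masked = lf.copy()
    crop = masked[:, :, :gh * patch, :gw * patch, :]  # view into `masked`
    crop[pixel_mask] = 0.0  # boolean index over (U, V, H, W), applied to all C
    return masked, mask


if __name__ == "__main__":
    lf = np.random.default_rng(0).random((5, 5, 32, 32, 3))  # 5x5 views, 32x32 RGB
    masked, mask = mask_light_field(lf, patch=8, mask_ratio=0.75)
    print(mask.shape, mask.mean())  # (5, 5, 4, 4) 0.75
```

Masking each view independently leaves some hidden content visible in other views, so a backbone can in principle recover it from cross-view (geometric) cues; masking the same patch in every view would instead push it toward spatial context. Which of these the authors use is not stated here.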
Recommender systems (RSs) are prominent tools widely used in different fields of social life, e-commerce, and online platforms. The use of machine learning techniques to build RSs gives good results, but it cannot ...
Detailed 3D human surface reconstruction and editing rely on reasonable and elaborate representations. Currently, representations for 3D human surfaces can be broadly categorized into mesh-based and function-based app...
Texture defect detection is an essential technology in large-scale industrial anomaly detection. Currently, researchers in the defect detection field primarily calculate the anomaly scores of images and perform thres...
In the field of industrial anomaly detection, the scarcity of anomalous data and labels poses significant challenges, necessitating models that can efficiently detect and localize anomalies with minimal reliance on an...
Despite the effectiveness of vision-language supervised fine-tuning in enhancing the performance of vision large language models (VLLMs), existing visual instruction tuning datasets have the following limitations. (1) Instruction annotation quality: although existing VLLMs exhibit strong performance, instructions generated by these advanced VLLMs may still suffer from inaccuracies such as hallucinations. (2) Instruction and image diversity: the limited range of instruction types and the lack of diversity in image data may limit the model's ability to generate diverse outputs that are closer to real-world scenarios. To address these challenges, we construct MMInstruct, a high-quality, diverse visual instruction tuning dataset consisting of 973k instructions from 24 domains. It covers four instruction types: judgment, multiple-choice, long visual question answering, and short visual question answering. To construct MMInstruct, we propose an instruction generation data engine that leverages GPT-4V, GPT-3.5, and manual correction. This engine enables semi-automatic, low-cost, multi-domain instruction generation at 1/6 the cost of manual construction. Through extensive experiments and ablation studies, we demonstrate that MMInstruct can significantly improve the performance of VLLMs; for example, the model fine-tuned on MMInstruct achieves new state-of-the-art performance on 10 of 12 benchmarks. The code and data will be available at https://***/yuecao0119/MMInstruct.
While spin-orbit interaction has been extensively studied, few investigations have reported on the interaction between orbital angular momenta (OAMs). In this work, we study a new type of orbit-orbit coupling between the longitudinal OAM and the transverse OAM carried by a three-dimensional (3D) spatiotemporal optical vortex (STOV) in the process of tight focusing. The 3D STOV possesses orthogonal OAMs in the x-y, t-x, and y-t planes, where x, y, and t are the spatiotemporal axes, and is preconditioned to overcome the spatiotemporal astigmatism effect. The corresponding focused wavepacket is calculated using the Debye diffraction theory, showing that a phase singularity ring is generated by the interactions among the transverse and longitudinal vortices in the highly confined ***. A Fourier-transform decomposition of the Debye integral is employed to analyze the mechanism of the orbit-orbit coupling. This is the first revelation of coupling between the longitudinal OAM and the transverse OAM, paving the way for potential applications in optical trapping, laser machining, nonlinear light-matter interactions, and more.
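Both OAM components in this abstract correspond to phase vortices in different planes: the transverse OAM of a STOV appears as phase winding in a plane containing t (such as x-t), while the longitudinal OAM appears as winding in the x-y plane. The following Python/NumPy sketch only illustrates how such a topological charge can be measured numerically by summing wrapped phase steps around a closed loop. The field model (a Gaussian envelope times (a ± ib)^|l|), the function names, and the loop radius are assumptions for illustration; the sketch does not implement the 3D STOV, the astigmatism preconditioning, or the Debye focusing integral used in the study.

```python
import numpy as np


def vortex_field(a, b, charge, w=1.0):
    """Illustrative Gaussian-enveloped scalar field with a phase vortex of
    integer `charge` at the origin of a generic (a, b) plane."""
    z = (a + 1j * b) / w
    core = z ** charge if charge >= 0 else np.conj(z) ** (-charge)
    return core * np.exp(-(a**2 + b**2) / w**2)


def winding_number(field_fn, radius=0.5, n=400):
    """Topological charge: net phase accumulated on a counterclockwise
    circle of the given radius around the origin, divided by 2*pi."""
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    a, b = radius * np.cos(theta), radius * np.sin(theta)
    phase = np.angle(field_fn(a, b))
    # Phase steps between neighboring samples (closing the loop), wrapped
    # into [-pi, pi) so the 2*pi jumps produced by np.angle are removed.
    dphi = np.diff(np.append(phase, phase[0]))
    dphi = (dphi + np.pi) % (2.0 * np.pi) - np.pi
    return int(round(dphi.sum() / (2.0 * np.pi)))


if __name__ == "__main__":
    # Transverse OAM: a spatiotemporal vortex in the x-t plane.
    print(winding_number(lambda x, t: vortex_field(x, t, charge=1)))   # 1
    # Longitudinal OAM: a conventional vortex in the x-y plane.
    print(winding_number(lambda x, y: vortex_field(x, y, charge=-2)))  # -2
```

The loop needs enough samples that the true phase change per step stays well below pi; with 400 samples this holds for small charges such as those shown.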
Deep neural networks virtually dominate the domain of most modern vision systems, providing high performance at the cost of increased computational complexity. Since for those systems it is often required to operate bot...
Quantum Neural Networks (QNNs) are an emerging technology that can be used in many applications, including computer vision. In this paper, we present a traffic sign classification system implemented using a hybrid qu...