Images obtained from hyperspectral sensors provide information about the target area that extends beyond the visible portions of the electromagnetic ***, due to sensor limitations and imperfections during the image acquisition and transmission phases, noise is introduced into the acquired image, which can have a negative impact on downstream analyses such as classification, target tracking, and spectral *** in hyperspectral images (HSI) is modelled as a combination from several sources, including Gaussian/impulse noise, stripes, and *** HSI restoration method for such a mixed noise model is ***, a joint optimisation framework is proposed for recovering hyperspectral data corrupted by mixed Gaussian-impulse noise by estimating both the clean data as well as the sparse/impulse noise ***, a hyper-Laplacian prior is used along both the spatial and spectral dimensions to express sparsity in clean image ***, to model the sparse nature of impulse noise, an ℓ_1-norm over the impulse noise gradient is *** the proposed methodology employs two distinct priors, the authors refer to it as the hyperspectral dual prior (HySpDualP) *** the best of authors' knowledge, this joint optimisation framework is the first attempt in this *** handle the non-smooth and nonconvex nature of the general ℓ_p-norm-based regularisation term, a generalised shrinkage/thresholding (GST) solver is ***, an efficient split-Bregman approach is used to solve the resulting optimisation *** results on synthetic data and real HSI datacube obtained from hyperspectral sensors demonstrate that the authors' proposed model outperforms state-of-the-art methods, both visually and in terms of various image quality assessment metrics.
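The abstract names a GST solver but does not give its form. As a rough sketch only, the code below assumes the commonly used scalar formulation of generalised shrinkage/thresholding for the subproblem min_x 0.5(x - y)^2 + λ|x|^p with 0 < p ≤ 1: values below a threshold are set to zero, and larger values are shrunk by a short fixed-point iteration. The function name `gst`, the iteration count, and the scalar setting are illustrative assumptions, not the authors' implementation.

```python
import math


def gst(y: float, lam: float, p: float, iters: int = 10) -> float:
    """Approximate minimiser of 0.5*(x - y)**2 + lam*|x|**p for 0 < p <= 1.

    Hypothetical helper: a scalar sketch of generalised
    shrinkage/thresholding, not the solver used in the paper.
    """
    if p == 1.0:
        # For p = 1 the rule reduces to ordinary soft-thresholding.
        tau = lam
    else:
        base = 2.0 * lam * (1.0 - p)
        # Threshold below which the minimiser is exactly zero.
        tau = base ** (1.0 / (2.0 - p)) + lam * p * base ** ((p - 1.0) / (2.0 - p))
    if abs(y) <= tau:
        return 0.0
    # Fixed-point iteration on the stationarity condition
    # x = |y| - lam * p * x**(p - 1), started from x = |y|.
    x = abs(y)
    for _ in range(iters):
        x = abs(y) - lam * p * x ** (p - 1.0)
    return math.copysign(x, y)
```

With p = 1 the function behaves like soft-thresholding; with p < 1 it zeroes small inputs more aggressively while shrinking large inputs less, which is the usual motivation for ℓ_p-type sparsity priors.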
Video question answering (VideoQA) is a challenging yet important task that requires a joint understanding of low-level video content and high-level textual semantics. Despite the promising progress of existing efforts, recent studies have revealed that current VideoQA models tend to over-rely on superficial correlations rooted in dataset bias while overlooking the key video content, which leads to unreliable results. Effectively understanding and modeling the temporal and semantic characteristics of a given video is crucial for robust VideoQA but, to our knowledge, has not been well investigated. To fill this research gap, we propose a robust VideoQA framework that effectively models cross-modality fusion and encourages the model to focus on the temporal and global content of videos when making a QA decision, instead of exploiting shortcuts in the datasets. Specifically, we design a self-supervised contrastive learning objective that contrasts positive and negative pairs of multimodal input, where the fused representation of the original multimodal input is pulled closer to that of the intervened input obtained by video perturbation. We expect the fused representation to focus more on the global context of videos rather than on a few static keyframes. Moreover, we introduce a temporal order regularization that enforces the inherent sequential structure of videos in the video representation. We also design a Kullback-Leibler divergence-based perturbation invariance regularization on the predicted answer distribution to improve the robustness of the model against temporal content perturbation of videos. Our method is model-agnostic and readily compatible with various VideoQA backbones. Extensive experimental results and analyses on several public datasets show the advantage of our method over state-of-the-art methods in terms of both accuracy and robustness.
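The Kullback-Leibler perturbation invariance term mentioned above can be illustrated with a small sketch. It assumes the penalty is the KL divergence from the answer distribution predicted on the original video to the one predicted on the perturbed video; the direction of the divergence, the function names, and the use of raw logits as input are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def softmax(logits):
    """Convert a list of answer logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def perturbation_invariance_penalty(logits_original, logits_perturbed):
    """Hypothetical regulariser: penalise answer distributions that change
    when the video is perturbed (assumed direction: original || perturbed)."""
    p = softmax(logits_original)
    q = softmax(logits_perturbed)
    return kl_divergence(p, q)
```

The penalty is zero when the two predicted distributions coincide and grows as the perturbed prediction drifts away, so adding it to a training loss would discourage predictions that are sensitive to the perturbation.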
Databases play a vital role in data management in many fields, such as finance, government, telecommunications, energy, electricity, transportation, *** the database management system has become a core foundational *** is an enterprise-grade open-source database, a product of deep integration of research and development from Huawei, Tsinghua University, and China Mobile in the past decade.
Cohesive subgraph search is a fundamental problem in bipartite graph *** integers k and ℓ, a (k, ℓ)-biplex is a cohesive structure which requires each vertex to disconnect at most k or ℓ vertices in the other *** (k, ℓ)-biplexes has been a popular research topic in recent years and has various ***, most existing studies considered the problem of finding (k, ℓ)-biplex with the largest number of *** this paper, we instead consider another variant and focus on the maximum vertex (k, ℓ)-biplex problem which aims to search for a (k, ℓ)-biplex with the maximum *** first show that this problem is Non-deterministic Polynomial-time hard (NP-hard) for any positive integers k and ℓ while max{k, ℓ} is at least *** by this negative result, we design an efficient branch-and-bound algorithm with a novel *** particular, we introduce a branching strategy based on whether there is a pivot in the current set, with which our proposed algorithm has the time complexity of γ^n n^(O(1)), where γ < *** addition, we also apply multiple speed-up techniques and various pruning ***, we conduct extensive experiments on various real datasets which demonstrate the efficiency of our proposed algorithm in terms of running time.
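The (k, ℓ)-biplex condition can be made concrete with a small checker. This is only a sketch of the definition, not the branch-and-bound algorithm from the paper. It assumes that vertices on the left side may miss at most k vertices on the right side and that vertices on the right side may miss at most ℓ vertices on the left side; that assignment of k and ℓ to the two sides, the function name `is_biplex`, and the edge-list representation are illustrative assumptions.

```python
def is_biplex(left, right, edges, k, l):
    """Check whether the vertex sets (left, right) induce a (k, l)-biplex.

    Assumed convention: each left vertex is non-adjacent to at most k
    right vertices, and each right vertex is non-adjacent to at most l
    left vertices. `edges` is an iterable of (left_vertex, right_vertex).
    """
    edge_set = set(edges)
    # Every left vertex may be disconnected from at most k right vertices.
    for u in left:
        missing = sum(1 for v in right if (u, v) not in edge_set)
        if missing > k:
            return False
    # Every right vertex may be disconnected from at most l left vertices.
    for v in right:
        missing = sum(1 for u in left if (u, v) not in edge_set)
        if missing > l:
            return False
    return True
```

With k = ℓ = 0 the check reduces to testing for a biclique, which is why biplexes are often described as a relaxation of bicliques.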
In the evolving landscape of surveillance and security applications, person re-identification (re-ID) is an important but difficult task. It entails accurately matching and identifying persons across several camera views that do not overlap with one another, which is of utmost importance to video surveillance, public safety, and person-tracking applications. However, vision-related difficulties, such as variations in appearance, occlusions, viewpoint changes, clothing changes, scalability, limited robustness to environmental factors, and lack of generalization, still hinder the development of reliable person re-ID methods. A few approaches that address these difficulties have relied on traditional deep-learning techniques. More recently, transformer-based methods have gained widespread adoption in various domains owing to their unique architectural properties, and a few transformer-based person re-ID methods have been developed to address these difficulties with good results. To develop reliable solutions for person re-ID, a comprehensive analysis of transformer-based methods is necessary. However, few studies have examined transformer-based techniques in depth. This review surveys recent literature on transformer-based approaches, examining their effectiveness, advantages, and potential challenges. It is the first of its kind to provide insights into the transformer-based methodologies used to tackle many obstacles in person re-ID, offering a forward-looking outlook on current research and potentially guiding the creation of viable applications in real-world scenarios. The main objective is to provide a useful resource for academics and practitioners engaged in person re-ID. IEEE
In the application of big data, one of the most challenging problems is how to consider the requirements of users. To avoid this problem, we proposed IOC-FP-growth. This added user-defined pre-term or post-item constr...
Customized keyword spotting needs to adapt quickly to small user *** methods primarily solve the problem under moderate noise *** work increases the level of difficulty in detecting keywords by introducing keyword ***, the current solution has been explored on large models with many parameters, making it unsuitable for deployment on small *** applying the current solution to lightweight models with minimal training data, the performance degrades compared to the baseline ***, we propose a light-weight multi-task architecture (< 9.0×10^4 parameters) created from integrating the triplet attention module in the ConvMixer networks and a new auxiliary mixed labeling encoding to address the *** results of our experiment show that the proposed model outperforms similar light-weight models for keyword spotting, with accuracy gains ranging from 0.73% to 2.95% for a clean set and from 2.01% to 3.37% for a mixed set under different scales of training ***, our model shows its robustness in different low-resource language datasets while converging faster.
XStorm, an FRP language for small-scale embedded systems, allows us to concisely describe state-dependent behaviors based on the state transition model. However, when we use different sets of peripheral devices depend...
As blockchain technology becomes prevalent, smart contracts have shown significant utility in finance and supply chain management. However, vulnerabilities in smart contracts pose serious threats to blockchain securit...
Accurate 3D hand pose estimation is a challenging computer vision problem primarily because of self-occlusion and viewpoint variations. Existing methods address viewpoint variations by applying data-centric transformations, such as data alignments or generating multiple views, which are prone to data sensitivity, error propagation, and prohibitive computational requirements. We improve the estimation accuracy by mitigating the impact of self-occlusion and viewpoint variations from the network side and propose MH-Net, a novel multiheaded network for accurate 3D hand pose estimation from a depth image. MH-Net comprises three key components. First, a multiscale feature extraction backbone based on an improved multiscale vision transformer (MViTv2) is proposed to extract shift-invariant global features. Second, a 3D anchorset generator is proposed to generate three disjoint sets of 3D anchors that serve two purposes: formulating hand pose estimation as an anchor-to-joint offset estimation and defining three unique viewpoints from a single depth image. Third, three identical regression heads are proposed to regress 3D joint positions based on unique viewpoints defined by their respective anchorsets. Extensive ablation studies have been conducted to investigate the impact of anchorsets, regression heads, and feature extraction backbones. Experiments on three public datasets, ICVL, MSRA, and NYU, show significant improvements over the state-of-the-art. IEEE