New self-supervised training schemes (STSs) continue to emerge, each taking a step closer to a universal foundation model. In this process, unsupervised downstream tasks are recognized as one way to validate the quality of visual features learned with self-supervised training. However, unsupervised dense semantic segmentation has yet to be explored as a downstream task that can utilize and evaluate the quality of the semantic information introduced in patch-level feature representations during self-supervised training of vision transformers. Therefore, we propose a novel data-driven framework, DatUS, to perform unsupervised dense semantic segmentation (DSS) as a downstream task. DatUS generates semantically consistent pseudo-segmentation masks for an unlabeled image dataset without using any visual prior or synchronized data. Experiments show that the proposed framework achieves the highest MIoU (24.90) and average F1 score (36.3) with DINOv2 as the STS, and the highest pixel accuracy (62.18) with DINO as the STS, on the training set of the SUIM dataset. It also outperforms state-of-the-art methods for the unsupervised DSS task with 15.02% MIoU, 21.47% pixel accuracy, and 16.06% average F1 score on the validation set of the SUIM dataset. It achieves a competitive level of accuracy on the large-scale COCO dataset.
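The three scores reported above (MIoU, pixel accuracy, average F1) are commonly derived from a per-class confusion matrix. The following is a minimal sketch of that computation, not the DatUS evaluation code: the function names are hypothetical, and it assumes the predicted pseudo-labels have already been matched to ground-truth class ids (an unsupervised method needs such a matching step before scoring). It returns fractions in [0, 1], whereas the abstract reports percentages.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Count pixels; rows are ground-truth classes, columns are predicted classes."""
    idx = target.ravel() * num_classes + pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def segmentation_scores(cm):
    """Pixel accuracy, mean IoU, and mean per-class F1 from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but labeled otherwise
    fn = cm.sum(axis=1) - tp  # labeled as the class, but predicted otherwise
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (tp + fp + fn)
        f1 = 2 * tp / (2 * tp + fp + fn)
    return {
        "pixel_accuracy": tp.sum() / cm.sum(),
        "miou": np.nanmean(iou),    # classes absent from both maps are skipped
        "avg_f1": np.nanmean(f1),
    }

# Toy 2x2 label maps with two classes (illustrative values only).
pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 1]])
scores = segmentation_scores(confusion_matrix(pred, target, num_classes=2))
```

Averaging with `nanmean` is one possible convention for classes that never occur; benchmarks differ on this point, so reported numbers are only comparable when the same convention is used.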
Can we teach a robot to recognize and make predictions for activities that it has never seen before? We tackle this problem by learning models for video from text. This paper presents a hierarchical model that generalizes instructional knowledge from large-scale text corpora and transfers the knowledge to video. Given a portion of an instructional video, our model recognizes and predicts coherent and plausible actions multiple steps into the future, all in rich natural language. To demonstrate the capabilities of our model, we introduce the Tasty Videos dataset V2, a collection of 4022 recipes for zero-shot learning, recognition and anticipation. Extensive experiments with various evaluation metrics demonstrate the potential of our method for generalization, given limited video data for training models.
In this paper, a lightweight blockchain simulation and transaction graph visualization application is presented, crafted to augment the identification of pivotal nodes within blockchain networks. Harnessing sophistica...
Data cleansing has become an essential task not only in data analysis but also in artificial intelligence processes. As datasets dramatically increase in size, the efficiency and effectiveness of dataset preparation p...
We document an interactive half-day tutorial in which participants explore the advanced applications of National Science Data Fabric (NSDF) services and strategies for comprehensive scientific data analysis. Targeting...
ISBN: (print) 9798350308006; 9798350307993
Deep learning-based object detectors, while offering exceptional performance, are data-dependent and can suffer from generalization issues. In this work, we investigated deep neural networks for detecting people and medical instruments for the vision-based workflow analysis system inside Catheterization Laboratories (Cath Labs). The central problem explored in this paper is that the performance of the detector can degrade drastically if it is trained and tested on data from different Cath Labs. Our research aimed to investigate the underlying causes of this performance degradation and find solutions to mitigate it. We employed the YOLOv8 object detector and created datasets from clinical procedures recorded at Reinier de Graaf Hospital (RdGG) and Philips Best Campus, supplemented with publicly accessible images. Through a series of experiments complemented by data visualization, we discovered that the performance degradation primarily stems from data distribution shifts in the feature space. Notably, the object detector trained on non-sensitive online images can generalize to unseen Cath Labs, outperforming the model trained on a procedure recording from a different Cath Lab. The detector trained on the online images achieved an mAP@0.5 of 0.517 on the RdGG dataset. Furthermore, by switching to the most suitable camera for each object in the Cath Lab, the multi-camera system can further improve the detection performance significantly. An aggregated 1-camera mAP@0.5 of 0.679 is achieved for single-object classes on the RdGG dataset.
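For readers unfamiliar with the metric, the "@0.5" in mAP@0.5 refers to an intersection-over-union (IoU) threshold: a detection counts as a true positive only if its box overlaps a ground-truth box of the same class with IoU of at least 0.5. The sketch below illustrates that matching criterion only; it is not the paper's evaluation code, the function names are hypothetical, and the `(x1, y1, x2, y2)` box layout is an assumption.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap, clamped to zero when the boxes are disjoint.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Apply the IoU criterion behind the '@0.5' suffix."""
    return box_iou(pred_box, gt_box) >= threshold
```

A full mAP computation additionally ranks detections by confidence and averages precision over recall levels and classes; that part is omitted here.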
Numerical stability is a crucial requirement of reliable scientific computing. However, despite the pervasiveness of Python in data science, analyzing large Python programs remains challenging due to the lack of scalable numerical analysis tools available for this language. To fill this gap, we developed PyTracer, a profiler to quantify numerical instability in Python applications. PyTracer transparently instruments Python code to produce numerical traces and visualize them interactively in a Plotly dashboard. We designed PyTracer to be agnostic to the numerical noise model, allowing for numerical profiling through Monte-Carlo Arithmetic, random rounding, random data perturbation, or structured noise for a particular application. We illustrate PyTracer's capabilities by testing the numerical stability of key functions in both SciPy and Scikit-learn, two dominant Python libraries for mathematical modeling. Through these evaluations, we demonstrate PyTracer as a scalable, automated, and generic framework for numerical profiling in Python.
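To make the idea of noise-based numerical profiling concrete, here is a minimal sketch of one of the noise models named above, random data perturbation. It does not use PyTracer and does not reflect its API; the function name, the noise magnitude, and the sample count are assumptions chosen for illustration. The number of significant digits is estimated with the commonly used formula -log10(|sigma / mu|) over repeated perturbed runs.

```python
import numpy as np

def significant_digits(func, x, n_samples=30, rel_noise=1e-12, seed=0):
    """Estimate significant decimal digits of func(x) under random input perturbation."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=np.float64)
    samples = np.array([
        # Each run sees inputs perturbed by a relative error of at most rel_noise.
        func(x * (1.0 + rel_noise * rng.uniform(-1.0, 1.0, size=x.shape)))
        for _ in range(n_samples)
    ])
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        return -np.log10(np.abs(std / mean))

x = [1.0 + 1e-10, 1.0]
stable = significant_digits(lambda v: v[0] + v[1], x)    # well-conditioned sum
unstable = significant_digits(lambda v: v[0] - v[1], x)  # catastrophic cancellation
```

With these inputs the subtraction loses many more digits than the sum, because the two operands agree in their leading ten digits and the perturbation is amplified relative to the tiny result.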
With the development of enterprise informatization, the scale of sales data within enterprises is gradually expanding, and phenomena such as data redundancy, dispersion, and incompleteness often occur. In addition, vi...
The COVID-19 pandemic has drastically impacted millions of people. It has challenged different areas of people's lives, such as employment, health, the economy, and education. Employment rate ...
We introduce an ML-driven approach that enables interactive example-based queries for similar behavior in ensembles of spatiotemporal scientific data. This addresses an important use case in the visual exploration of simulation and experimental data, where data is often large and unlabeled, and no meaningful similarity measures are available. We exploit the fact that nearby locations often exhibit similar behavior and train a Siamese Neural Network in a self-supervised fashion, learning an expressive latent space for spatiotemporal behavior. This space can be used to find similar behavior with just a few user-provided examples. We evaluate this approach on several ensemble datasets and compare with multiple existing methods, showing both qualitative and quantitative results.
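Once such a latent space exists, an example-based query can be as simple as a nearest-neighbor search around the user's examples. The sketch below assumes the embeddings have already been produced by a trained network and are nonzero vectors; the function name, the use of cosine similarity, and the averaging of examples into a single query vector are illustrative assumptions, not the method described in the abstract.

```python
import numpy as np

def query_similar(embeddings, example_ids, top_k=5):
    """Rank items by cosine similarity to the mean of a few example embeddings."""
    z = np.asarray(embeddings, dtype=np.float64)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-length rows
    query = z[example_ids].mean(axis=0)
    query = query / np.linalg.norm(query)
    scores = z @ query
    scores[example_ids] = -np.inf  # do not return the examples themselves
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

# Four toy 2-D embeddings; item 0 is the single user-provided example.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
ids, sims = query_similar(toy, example_ids=[0], top_k=2)
```

For large ensembles, the brute-force matrix product would typically be replaced by an approximate nearest-neighbor index to keep queries interactive.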