ISBN: (digital) 9798350353006
ISBN: (print) 9798350353013
Implicit Neural Representation (INR), which utilizes a neural network to map coordinate inputs to corresponding attributes, is causing a revolution in the field of signal processing. However, current INR techniques suffer from a restricted capability to tune their supported frequency set, resulting in imperfect performance when representing complex signals with multiple frequencies. We have identified that this frequency-related problem can be greatly alleviated by introducing variable-periodic activation functions, for which we propose FINER. By initializing the bias of the neural network within different ranges, sub-functions with various frequencies in the variable-periodic function are selected for activation. Consequently, the supported frequency set of FINER can be flexibly tuned, leading to improved performance in signal representation. We demonstrate the capabilities of FINER in the contexts of 2D image fitting, 3D signed distance field representation, and 5D neural radiance field optimization, and we show that it outperforms existing INRs.
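The interplay between bias range and frequency described above can be illustrated with a small sketch. The abstract does not state the activation's formula, so the form sin(omega * (|z| + 1) * z), the default omega = 30, and the names finer_activation, local_frequency and init_bias are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def finer_activation(z, omega=30.0):
    """Assumed variable-periodic sine: the period shrinks as |z| grows."""
    return math.sin(omega * (abs(z) + 1.0) * z)


def local_frequency(z, omega=30.0):
    """Derivative of the phase omega * (|z| + 1) * z with respect to z."""
    return omega * (2.0 * abs(z) + 1.0)


def init_bias(n, k, seed=0):
    """Draw n biases uniformly from [-k, k] (hypothetical initializer).

    A larger k shifts pre-activations further from zero, where the
    assumed activation oscillates faster.
    """
    rng = random.Random(seed)
    return [rng.uniform(-k, k) for _ in range(n)]


def finer_neuron(x, weight, bias, omega=30.0):
    """One neuron: affine map followed by the assumed activation."""
    return finer_activation(weight * x + bias, omega)
```

Under this assumed form, a neuron whose bias sits near 0 behaves like an ordinary sine with frequency omega, while a neuron whose bias sits near 1 oscillates roughly three times faster, which is one way to read the claim that the bias range selects sub-functions of different frequencies.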
ISBN: (print) 9781665448994
Due to the rapid growth in the number of vehicles over the last decade, there has been a dramatic increase in demand for highway capacity analysis. Vehicle counting, in particular, has become a key element of vision-based intelligent traffic systems deployed across metropolitan areas. Most methods solve the vehicle counting problem under the assumption of state-of-the-art computing systems. However, large-scale deployment of such systems for multi-camera processing is very inefficient. With the recent advancement of cost-efficient Internet-of-Things (IoT) devices, alongside machine learning methods developed specifically for such devices, solving the vehicle counting problem for real-time traffic analysis on IoT edge devices, and thereby facilitating its large-scale deployment, has become highly favorable. In this paper, we propose a vehicle counting framework designed specifically for IoT edge computers, which follows the detection-tracking-counting (DTC) model. The proposed solution aims at addressing the multimodality of contextual dynamics in traffic scenes with a small detector model, a robust tracker, and a counting process that accurately estimates both a vehicle's motion of interest and its exit time from observation areas. Experimental results on the AI City 2021 Track-1 Dataset show that our framework outperforms related methods in both accuracy and execution speed.
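As a rough illustration of the counting stage in a detection-tracking-counting pipeline, the sketch below counts a track once its centroid changes side of a virtual line and records the frame at which that happens as its exit time. This line-crossing rule, the track format, and the function names are assumptions for illustration; the abstract does not specify how the proposed counting process works.

```python
def side_of_line(point, a, b):
    """Return -1, 0 or 1 for the side of the infinite line a->b the point is on."""
    (px, py), (ax, ay), (bx, by) = point, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return (cross > 0) - (cross < 0)


def count_crossings(tracks, a, b):
    """Count tracks whose centroid path crosses the line through a and b.

    tracks maps a track id to a non-empty list of (frame, (x, y)) centroids
    in time order (hypothetical tracker output). Returns the count and a
    dict of track id -> frame at which the crossing was first observed.
    """
    exits = {}
    for track_id, path in tracks.items():
        prev = side_of_line(path[0][1], a, b)
        for frame, point in path[1:]:
            cur = side_of_line(point, a, b)
            if prev != 0 and cur != 0 and cur != prev:
                exits[track_id] = frame  # first frame on the far side
                break
            if cur != 0:
                prev = cur
    return len(exits), exits
```

The line is treated as infinite, so a deployed system would also need to restrict crossings to the segment or region of interest and to distinguish movement directions.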
ISBN: (print) 9781665448994
3D modeling of articulated bodies of humans or animals and using these models for synthetic 2D and 3D pose data generation can mitigate the small data challenges faced by many critical applications such as healthcare. In this paper, we present our efficient 3D synthetic model generation (3D-SMG) pipeline used for body pose data augmentation. The 3D-SMG pipeline starts with scanning point clouds from various angles around the subject using an off-the-shelf RGBD camera. We then implement a dual-objective iterative closest point (ICP) algorithm that uses both color (if available) and geometric information from the point clouds, and apply a pose graph node optimization to form a single rigid body mesh. 3D-SMG also includes a series of post-processing steps to obtain a smooth mesh at the end of the pipeline. The approach can be applied to any articulated object such as a human body or an animal. Our experiments also show a high level of accuracy in the dimensions of the obtained 3D meshes when compared to the original subject. As the final step towards developing an augmented pose dataset, we perform model rigging to articulate the 3D model of the subject and generate dynamic avatars within a variety of context-feasible poses.
ISBN: (print) 9781665448994
Monocular (relative or metric) depth estimation is a critical task for various applications, such as autonomous vehicles, augmented reality and image editing. In recent years, with the increasing availability of mobile devices, accurate and mobile-friendly depth models have gained importance. Increasingly accurate models typically require more computational resources, which inhibits the use of such models on mobile devices. The mobile use case is arguably the most unrestricted one, which requires highly accurate yet mobile-friendly architectures. Therefore, we try to answer the following question: How can we improve a model without adding further complexity (i.e., parameters)? Towards this end, we systematically explore the design space of a relative depth estimation model along various dimensions, and we show that, with key design choices and ablation studies, even an existing architecture can reach performance highly competitive with the state of the art at a fraction of the complexity. Our study spans an in-depth backbone model selection process, knowledge distillation, intermediate predictions, model pruning and loss rebalancing. We show that our model, using only DIW as the supervisory dataset, achieves 0.1156 WHDR on DIW with 2.6M parameters and reaches 37 FPS on a mobile GPU, without pruning or hardware-specific optimization. A pruned version of our model achieves 0.1208 WHDR on DIW with 1M parameters and reaches 44 FPS on a mobile GPU.
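For readers unfamiliar with the WHDR figures quoted above, the following sketch shows one common way a weighted human disagreement rate is computed for ordinal depth: the weighted share of annotated point pairs whose predicted depth order contradicts the human label. The pair format, the label convention and the function name are assumptions for illustration and may differ from the evaluation code used in the paper.

```python
def whdr(pred_depth, pairs):
    """Weighted share of annotated pairs whose predicted order is wrong.

    pred_depth maps a pixel (row, col) to predicted depth, larger = farther.
    pairs holds (point_a, point_b, label, weight) tuples, where label is
    '<' if annotators judged point_a closer than point_b and '>' otherwise
    (hypothetical annotation format).
    """
    wrong = 0.0
    total = 0.0
    for point_a, point_b, label, weight in pairs:
        predicted = '<' if pred_depth[point_a] < pred_depth[point_b] else '>'
        total += weight
        if predicted != label:
            wrong += weight
    return wrong / total if total else 0.0
```

With all weights set to 1 the metric reduces to a plain error rate over pairs, so lower is better, and a value of 0.1156 would correspond to roughly 11.6% of annotated pairs being ordered incorrectly.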
China is one of the first countries to invent pottery in the world. Like India, West Asia, Japan and the central Balkans, our ancestors invented pottery in the Neolithic period of primitive society. As a product of co...
ISBN: (digital) 9798350365474
ISBN: (print) 9798350365481
The ability of three-dimensional (3D) spheroid modeling to study the invasive behavior of breast cancer cells has drawn increased attention. A deep learning-based image processing framework is very effective at speeding up the cell morphological analysis process. However, out-of-focus photos taken while capturing 3D cells across several z-slices can negatively impact the deep learning model. In this work, we created a new algorithm to handle blurry images while preserving the stacked image quality. Furthermore, we proposed a unique training architecture that leverages consistency training to help reduce the bias of the model when dense-slice stacking is applied. Additionally, the model's stability under the sparse-slice stacking effect is increased by utilizing a self-training approach. The new blurring stacking technique and training flow are combined with the suggested architecture and self-training mechanism to provide an innovative yet easy-to-use framework. Our methods produced noteworthy experimental outcomes in terms of both quantitative and qualitative aspects.
ISBN: (digital) 9798350365474
ISBN: (print) 9798350365481
Image dehazing, a pivotal task in low-level vision, aims to restore the visibility and detail from hazy images. Many deep learning methods with powerful representation learning capability demonstrate advanced performance on non-homogeneous dehazing. However, these methods usually struggle with processing high-resolution images (e.g., 4000 × 6000) due to their heavy computational demands. To address these challenges, we introduce an innovative non-homogeneous Dehazing method via Deformable Convolutional Transformer-like architecture (DehazeDCT). Specifically, we first design a transformer-like network based on deformable convolution v4, which offers long-range dependency and adaptive spatial aggregation capabilities and demonstrates faster convergence and forward speed. Furthermore, we leverage a lightweight Retinex-inspired transformer to achieve color correction and structure refinement. Extensive experimental results and the highly competitive performance of our method in the NTIRE 2024 Dense and Non-Homogeneous Dehazing Challenge, where it ranked second among all 16 submissions, demonstrate the superior capability of our proposed method. The code is available at https://***/movingforward100/Dehazing_R.
In this work, we present a marker-based multi-view spine tracking method that is specifically adjusted to the requirements for movements in sports. The main focus is on accurate detection of markers and fast usage of the system. For this task, we take advantage of prior knowledge of the arrangement of dots in perforated kinesiology tape. We detect the tape and its dots using a Mask R-CNN and a blob detector. Here, we can focus on detection only while skipping any image-based feature encoding or matching. We perform reasoning in 3D using a linear program and Markov random fields, in which the structure of the kinesiology tape is modeled and the shape of the spine is optimized. In comparison to state-of-the-art systems, we demonstrate that our system achieves high precision and marker density, is robust against occlusions, and is capable of capturing fast movements.
ISBN: (digital) 9798350365474
ISBN: (print) 9798350365481
Retrieval-augmented generation (RAG) is used in natural language processing (NLP) to provide query-relevant information in enterprise documents to large language models (LLMs). Such enterprise context enables the LLMs to generate more informed and accurate responses. When enterprise data is primarily videos, AI models like vision language models (VLMs) are necessary to convert information in videos into text. While essential, this conversion is a bottleneck, especially for a large corpus of videos. It delays the timely use of enterprise videos to generate useful ***. We propose ViTA, a novel method that leverages two unique characteristics of VLMs to expedite the conversion process. As VLMs output more text tokens, they incur higher latency. In addition, large (heavyweight) VLMs can extract intricate details from images and videos, but they incur much higher latency per output token when compared to smaller (lightweight) VLMs that may miss details. To expedite conversion, ViTA first employs a lightweight VLM to quickly understand the gist or overview of an image or a video clip, and then directs a heavyweight VLM (through prompt engineering) to extract additional details using only a few (a preset number of) output tokens. Our experimental results show that ViTA reduces the conversion time by as much as 43%, without compromising the accuracy of responses when compared to a baseline system that only uses a heavyweight VLM.
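The latency argument above can be made concrete with a toy model in which generation time is simply the number of output tokens multiplied by a per-token cost. This linear model, the function names, and any numbers plugged into it are illustrative assumptions; they are not measurements or code from the paper.

```python
def generation_latency(output_tokens, seconds_per_token):
    """Toy model: latency grows linearly with the number of output tokens."""
    return output_tokens * seconds_per_token


def two_stage_latency(gist_tokens, light_spt, detail_cap, heavy_spt):
    """Lightweight VLM writes the gist, heavyweight VLM adds capped details.

    light_spt and heavy_spt are hypothetical per-token latencies in seconds;
    detail_cap is the preset limit on heavyweight output tokens.
    """
    gist = generation_latency(gist_tokens, light_spt)
    details = generation_latency(detail_cap, heavy_spt)
    return gist + details


def time_saved_fraction(baseline, candidate):
    """Fraction of the baseline latency that the candidate avoids."""
    return (baseline - candidate) / baseline
```

For example, with made-up costs of 0.05 s per token for the heavyweight model and 0.01 s per token for the lightweight one, replacing a 200-token heavyweight description with a 100-token gist plus 100 capped heavyweight tokens would cut the modeled latency from about 10 s to about 6 s. The model ignores prompt processing and image encoding time, which a real comparison would have to include.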
With the normalization of prevention and control of COVID-19, the market of smart healthcare will be further opened. Smart healthcare and active assisted living have important applications in infectious disease preven...