Healthcare systems all over the world are strained as the COVID-19 pandemic spreads more widely. The only realistic strategy to avoid asymptomatic transmission is to monitor social distance, as there ...
This project introduces a comprehensive home automation system leveraging Wi-Fi connectivity and ESP32 microcontrollers, aimed at enhancing modern living through seamless automation and user-centric design. The backg...
Inefficient garbage collection not only leads to overflowing bins and unpleasant but also poses significant environmental and health risks. This paper proposes an effective garbage monitoring system utilizing a GSM mo...
Centralized electric energy trading suffers from large network overhead and high transaction costs, while users trading electric energy in the distributed mode are characterized by inactivity. Simultaneou...
India, where a third of the world's blind people reside, has about 12 million blind people out of a total of 39 million worldwide, according to the National Programme for Control of Blindness...
Nowadays, the constantly expanding coverage of expressways leads to an astronomical volume of daily high-speed traffic. In order to address the poor real-time performance of traditional traffic statistics and low accu...
ISBN: (Print) 9798350307184
We present an approach to efficiently and effectively adapt a masked image modeling (MIM) pre-trained vanilla vision Transformer (ViT) for object detection, which is based on our two novel observations: (i) A MIM pre-trained vanilla ViT encoder can work surprisingly well in the challenging object-level recognition scenario even with randomly sampled partial observations, e.g., only 25% to 50% of the input embeddings. (ii) In order to construct multi-scale representations for object detection from single-scale ViT, a randomly initialized compact convolutional stem supplants the pre-trained patchify stem, and its intermediate features can naturally serve as the higher resolution inputs of a feature pyramid network without further upsampling or other manipulations. The pre-trained ViT is regarded only as the third stage of our detector's backbone rather than as the whole feature extractor, which naturally results in a ConvNet-ViT hybrid architecture. The proposed detector, named MIMDET, enables a MIM pre-trained vanilla ViT to outperform leading hierarchical architectures such as Swin Transformer, MViTv2 and ConvNeXt on COCO object detection and instance segmentation, and achieves better results than the previous best adapted vanilla ViT detector while using a more modest fine-tuning recipe and converging 2.8x faster. Code and pre-trained models are available at https://***/hustvl/MIMDet.
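Observation (i) above, feeding the encoder only a random subset of the input embeddings, can be illustrated with a minimal sketch. This is not the MIMDet implementation: the function name `sample_visible_tokens`, the 196-token grid, and the dummy 4-dimensional embeddings are assumptions chosen only to show what "randomly sampled partial observations" at a 25% keep ratio might look like.

```python
import random


def sample_visible_tokens(num_tokens, keep_ratio, seed=None):
    """Return sorted indices of a random subset of token positions.

    Hypothetical helper: keeps roughly `keep_ratio` of the tokens,
    chosen uniformly at random without replacement.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    rng = random.Random(seed)
    num_keep = max(1, int(num_tokens * keep_ratio))
    return sorted(rng.sample(range(num_tokens), num_keep))


# Illustrative sizes: a 224x224 image cut into 16x16 patches gives 14 * 14 = 196 tokens.
tokens = [[float(i)] * 4 for i in range(196)]  # dummy 4-dim embeddings
keep = sample_visible_tokens(len(tokens), keep_ratio=0.25, seed=0)
visible = [tokens[i] for i in keep]  # only these would be passed to the encoder
print(len(visible))  # 49
```

With a keep ratio of 0.5 the same helper would retain 98 of the 196 tokens; how the detector restores the dropped positions before the detection head is not described in the abstract and is not modeled here.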
As the main carrier of Metaverse VR technology, the head-mounted display system plays a vital role in leading this technological trend. This paper implements a head-mounted display system based on Metaverse VR technol...
This research focuses on designing and implementing a processor with a five-stage pipeline for educational purposes. The proposed processor can execute five 16-bit instructions simultaneously and is designed and simul...
The K-nearest neighbor (KNN) algorithm is widely used in navigation, such as traffic management, driverless vehicles, and logistics planning. While it offers powerful instance-based learning, its performance can be in...