Authors:
Liu, Haozhe; Zhang, Wentian; Liu, Feng; Wu, Haoqian; Shen, Linlin
The Computer Vision Institute, College of Computer Science and Software Engineering; SZU Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society; The National Engineering Laboratory for Big Data System Computing Technology; The Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen 518060, China
The vulnerability of automated fingerprint recognition systems (AFRSs) to presentation attacks (PAs) promotes the vigorous development of PA detection (PAD) technology. However, PAD methods have been limited by inform...
Multimodal pre-training models have succeeded in the video-language field and on various downstream tasks. However, previous multimodal models used 3D CNNs as video feature extractors, which limits how well video features can interact and fuse with text features. This paper proposes a multimodal pre-training model that uses a Video-Swin-Transformer-based network to encode both video and text data, to achieve better performance in video understanding. The model consists of four modules (video encoder, text encoder, interact encoder, and caption decoder) that together accomplish the task of ocean scene video captioning. A dataset of ocean scene videos, covering content types such as sea surfaces and shores, is also constructed. Training is divided into two stages: pre-training and fine-tuning. Pre-training is performed on the HowTo100M dataset so that the model learns video captioning in natural scenes and completes video-language matching tasks. Fine-tuning is then performed on the Ocean1000 dataset so that the model better understands the events and content in ocean scene videos and generates captions that fit ocean scene video descriptions. The model achieves satisfactory results on both the public dataset YouCook2 and the proprietary dataset Ocean1000, demonstrating its ability in video-text information fusion and interaction.
Orthogonal time frequency space (OTFS) modulation has been viewed as a promising technique for integrated sensing and communication (ISAC) systems and aerial-terrestrial networks, due to its delay-Doppler domain trans...
Salient object detection (SOD) is a key preprocessing step in various computer vision tasks, aiming to replicate the human visual system to identify the most significant objects or regions in images or videos. However...
For efficient and high-fidelity local facial attribute editing, most existing editing methods either require additional finetuning for different editing effects or tend to affect beyond the editing regions. Alternativ...
Hair editing is a critical image synthesis task that aims to edit hair color and hairstyle using text descriptions or reference images, while preserving irrelevant attributes (e.g., identity, background, cloth). Many ...
ISBN (digital): 9798350381993
ISBN (print): 9798350382006
Due to cloud security concerns, there is increasing interest in information retrieval systems that support private queries over public documents. Oblivious document ranking and retrieval in the public cloud should be cheaper and faster without revealing query-related information. Currently, term frequency-inverse document frequency (TF-IDF) and private information retrieval (PIR) techniques are used to solve this problem, but the time spent on encryption operations dominates the cost. Motivated by the observed sparsity of the TF-IDF matrix, we propose an efficient approach for oblivious document ranking and retrieval, called E-Coeus. It takes advantage of the high sparsity of the TF-IDF matrix to rearrange the matrix. Our method accelerates oblivious document retrieval with PIR and reduces the user's retrieval delay. In a stand-alone experiment on a TF-IDF matrix of 1.2M rows and 64K columns with a sparsity of 10%, E-Coeus improves document ranking and retrieval performance by 23% over the state-of-the-art approach, Coeus. With a cluster of 64 machines, E-Coeus improves performance by 34% over Coeus when the TF-IDF matrix sparsity is 30%.
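To make the role of the sparse TF-IDF matrix concrete, the following is a minimal plaintext sketch of TF-IDF ranking over sparse rows. It is not E-Coeus or Coeus: it involves no encryption or PIR, and the function names `build_tfidf` and `rank` as well as the whitespace tokenization are assumptions made for illustration only.

```python
import math
from collections import Counter


def build_tfidf(docs):
    """Return one sparse row (dict: term -> weight) per document.

    Only non-zero entries are stored, which is what makes the matrix sparse.
    """
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)
        rows.append({
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
            if df[term] < n  # terms present in every document have weight 0
        })
    return rows


def rank(rows, query):
    """Return document indices ordered by descending TF-IDF score."""
    terms = set(query.lower().split())
    scores = [sum(w for t, w in row.items() if t in terms) for row in rows]
    return sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)


docs = ["cloud storage security", "private query ranking", "cloud query"]
rows = build_tfidf(docs)
order = rank(rows, "private ranking")
```

In this plaintext form the server sees the query terms directly; an oblivious scheme would have to compute the same scores without learning them.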
Video-based action recognition is becoming a vital tool in clinical research and neuroscientific study for disorder detection and ***, action recognition currently used in non-human primate (NHP) research relies heavily on intense manual labor and lacks standardized *** this work, we established two standard benchmark datasets of NHPs in the laboratory: MonkeyinLab (MiL), which includes 13 categories of actions and postures, and MiL2D, which includes sequences of two-dimensional (2D) skeleton ***, based on recent methodological advances in deep learning and skeleton visualization, we introduced the Monkey Monitor Kit (MonKit) toolbox for automatic action recognition, posture estimation, and identification of fine motor activity in *** the datasets and MonKit, we evaluated the daily behaviors of wild-type cynomolgus monkeys within their home cages and experimental environments and compared these observations with the behaviors exhibited by cynomolgus monkeys possessing mutations in the MECP2 gene as a disease model of Rett syndrome (RTT). MonKit was used to assess motor function, stereotyped behaviors, and depressive phenotypes, with the outcomes compared with human manual *** MonKit established consistent criteria for identifying behavior in NHPs with high accuracy and efficiency, thus providing a novel and comprehensive tool for assessing phenotypic behavior in monkeys.
ISBN (digital): 9798350381993
ISBN (print): 9798350382006
Erasure codes have been widely applied to in-memory key-value storage systems for high reliability and low redundancy. In distributed in-memory key-value storage systems, update operations are relatively frequent, especially partial-stripe updates, which makes data updates more challenging. Recent research appends logs to accelerate parity data writes. However, these logs are stored on disks, which significantly decreases system performance. Therefore, we propose a novel in-memory key-value storage architecture, DNVPL, which uses NVRAM to log parity data. Our main idea is to design an append-only update scheme that trades off memory cost against update overhead. We implement DNVPL in an in-memory key-value storage prototype, called LogKV, and evaluate it with different workloads. The experiments show that our scheme achieves high update performance across different metrics. It reduces update latency by up to 49% and saves storage space by 48% compared to state-of-the-art schemes.
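The general idea of logging parity changes instead of rewriting parity on every update can be sketched as follows. This is a simplified illustration, not DNVPL or LogKV: it assumes a single XOR parity chunk rather than a general erasure code, keeps the log in an ordinary Python list rather than NVRAM, and the class name `ParityLogStripe` and its methods are hypothetical.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class ParityLogStripe:
    """One stripe of equal-sized data chunks protected by a single XOR parity."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        parity = bytes(len(self.chunks[0]))
        for chunk in self.chunks:
            parity = xor_bytes(parity, chunk)
        self.parity = parity
        self.log = []  # append-only list of parity deltas

    def update(self, index, new_chunk):
        """Partial-stripe update: write the data, append the delta to the log."""
        delta = xor_bytes(self.chunks[index], new_chunk)
        self.chunks[index] = new_chunk
        self.log.append(delta)  # parity itself is not touched here

    def merge_log(self):
        """Fold all logged deltas into the parity chunk and empty the log."""
        for delta in self.log:
            self.parity = xor_bytes(self.parity, delta)
        self.log.clear()

    def recover(self, lost_index):
        """Rebuild one lost data chunk from the parity and surviving chunks."""
        self.merge_log()
        out = self.parity
        for i, chunk in enumerate(self.chunks):
            if i != lost_index:
                out = xor_bytes(out, chunk)
        return out


stripe = ParityLogStripe([b"aaaa", b"bbbb", b"cccc"])
stripe.update(1, b"zzzz")
pending = len(stripe.log)
recovered = stripe.recover(1)
```

The update path only appends a delta, so its cost does not depend on where the parity lives; the price is the memory held by the log until it is merged, which is the trade-off the abstract refers to.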
Regenerating codes are network codes proposed to reduce the data required for fault repair, which can improve the recovery efficiency of faulty nodes in data storage systems. However, unlike Reed-Solomon (RS) codes, which repair at the granularity of bytes, regenerating codes require data to be stored in large chunks. This leads to severe read amplification: excess data is read during degraded reads of objects, which increases degraded read time. That reflects a mutual constraint between improving recovery efficiency and degraded read performance, as manifested in the amplification of data reads, a vital issue considered in this ***. To solve this problem, we propose a new type of data layout, discrete geometric, which splits the object into a series of geometric sequences of data blocks. The blocks are placed discretely into corresponding containers on the disks at different nodes, with containers of the same size made into a strip for encoding. The discrete characteristic ensures lower repair costs for degraded reads. The geometric characteristic preserves the repair performance of regenerating codes through large blocks, while small blocks mitigate read amplification. To reduce IOPS for discrete geometric, we propose Discrete Geometric-Locally Regenerating Codes (DG-LRCs), guaranteeing lower degraded read latency while improving recovery *** results show that the degraded read time of DG-LRCs compared to regenerating codes combined with geometric partitioning is 22.56% lower at 2 Gbps and 60.56% lower at 4 Gbps, and the recovery performance is 7.04 times better than that of RS code.
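One way to picture splitting an object into blocks whose sizes follow a geometric sequence is the sketch below. It is an assumption-laden illustration rather than the layout defined in the paper: the function name `geometric_blocks`, the base block size `s0`, the ratio `q`, and the cap `max_level` are all hypothetical, and placement across nodes and encoding are not modeled.

```python
def geometric_blocks(object_size, s0=4096, q=2, max_level=8):
    """Split an object into blocks of sizes s0 * q**level, largest first.

    The object size is rounded up to a multiple of s0 (the tail is padded).
    """
    units = -(-object_size // s0)  # ceiling division, in units of s0
    blocks = []
    for level in range(max_level, -1, -1):
        size_units = q ** level
        count, units = divmod(units, size_units)
        blocks.extend([s0 * size_units] * count)
    return blocks


sizes = geometric_blocks(10 * 4096)
```

Large blocks carry most of the bytes, which suits chunk-oriented repair, while the small blocks at the tail keep the padding and the excess data touched by a degraded read bounded by the smallest block size.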