Search Results - Inner Mongolia University Library

arXiv, 2024

Authors: Liu, Haozhe; Zhang, Wentian; Liu, Feng; Wu, Haoqian; Shen, Linlin. Affiliations: The Computer Vision Institute, College of Computer Science and Software Engineering; SZU Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society; The National Engineering Laboratory for Big Data System Computing Technology; The Guangdong Key Laboratory of Intelligent Information Processing; Shenzhen University, Shenzhen 518060, China

The vulnerability of automated fingerprint recognition systems (AFRSs) to presentation attacks (PAs) has spurred rapid development of PA detection (PAD) technology. However, PAD methods have been limited by information loss and poor generalization ability, which degrades their performance on new PA materials and fingerprint sensors. This paper thus proposes a global-local model-based PAD method (RTK-PAD) to overcome these limitations to some extent. The proposed method consists of three modules: 1) the global module; 2) the local module; and 3) the rethinking module. The cutout-based global module predicts a global spoofness score from nonlocal features of the entire fingerprint image, while the texture inpainting-based local module predicts a local spoofness score from fingerprint patches. The two modules are not independent: our proposed rethinking module connects them by localizing two discriminative patches for the local module based on the global spoofness score. Finally, the fused spoofness score, obtained by averaging the global and local spoofness scores, is used for PAD. Experimental results on LivDet 2017 show that the proposed RTK-PAD achieves an average classification error (ACE) of 2.28% and a true detection rate (TDR) of 91.19% when the false detection rate (FDR) equals 1.0%, significantly outperforming the state-of-the-art methods by about 10 percentage points in TDR (91.19% versus 80.74%). Copyright © 2024, The Authors. All rights reserved.

Keywords: Supervised learning
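The score fusion and the two metrics reported for RTK-PAD can be made concrete with a short Python sketch. It assumes spoofness scores in [0, 1] where higher means "more likely a spoof", an ACE computed LivDet-style as the mean of the spoof and live misclassification rates at a fixed threshold of 0.5, and a TDR@FDR whose threshold is chosen from the live samples. The function names are hypothetical; this is not the authors' code and does not model the global, local, or rethinking networks themselves.

```python
import math


def fused_spoofness(global_score: float, local_score: float) -> float:
    """Average the global and local spoofness scores (the fusion rule in the abstract)."""
    return 0.5 * (global_score + local_score)


def average_classification_error(scores, is_spoof, threshold=0.5):
    """Mean of the spoof-accepted-as-live rate and the live-rejected-as-spoof rate."""
    spoof = [s for s, y in zip(scores, is_spoof) if y]
    live = [s for s, y in zip(scores, is_spoof) if not y]
    ferr_fake = sum(s < threshold for s in spoof) / len(spoof)  # spoofs passed as live
    ferr_live = sum(s >= threshold for s in live) / len(live)   # lives flagged as spoof
    return 0.5 * (ferr_fake + ferr_live)


def tdr_at_fdr(scores, is_spoof, fdr=0.01):
    """Fraction of spoofs detected when at most `fdr` of live samples are flagged."""
    live = sorted((s for s, y in zip(scores, is_spoof) if not y), reverse=True)
    spoof = [s for s, y in zip(scores, is_spoof) if y]
    k = math.floor(fdr * len(live))  # number of live samples allowed above the threshold
    threshold = live[k] if k < len(live) else float("-inf")
    # Strict '>' guarantees at most k live scores exceed the threshold.
    return sum(s > threshold for s in spoof) / len(spoof)
```

The fixed 0.5 threshold for ACE is an illustrative assumption; an evaluation protocol may define its own operating point.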


Enhancing Ocean Scene Video Captioning with Multimodal Pre-Training and Video-Swin-Transformer

Annual Conference of Industrial Electronics Society

Authors: Xinyu Chen, Meng Zhao, Fan Shi, Meng'en Zhang, Yu He, Shengyong Chen. Affiliations: Laboratory of Computer Vision and System of Ministry of Education, School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China; Key Laboratory of Space Utilization, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing, China

Multimodal pre-training models have succeeded in the video-language field and in various downstream tasks, but previous multimodal models used 3D CNNs as video feature extractors, which are limited in how well they interact and fuse with text features. This paper proposes a multimodal pre-training model that uses a Video-Swin-Transformer-based network to encode both video and text data and thereby achieve better performance in video understanding. The model consists of four modules: a video encoder, a text encoder, an interact encoder, and a caption decoder, which together perform ocean scene video captioning. A dataset of ocean scene videos covering various content types, such as sea surfaces and shores, is also constructed. Training is divided into two stages: pre-training and fine-tuning. Pre-training is performed on the HowTo100M dataset so that the model learns to caption videos of natural scenes and completes video-language matching tasks. Fine-tuning is then performed on the Ocean1000 dataset so that the model better understands the events and content in ocean scene videos and generates captions that conform to ocean scene video descriptions. The model achieves satisfactory results on both the public YouCook2 dataset and the proprietary Ocean1000 dataset, demonstrating its ability in video-text information fusion and interaction.

Keywords:


Deep Learning-Enabled ISAC-OTFS Pre-equalization Design for Aerial-Terrestrial Networks

arXiv, 2024

Authors: Wang, Weihao; Guo, Jing; Wang, Siqiang; Wang, Xinyi; Yuan, Weijie; Fei, Zesong. Affiliations: School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China; School of System Design and Intelligent Manufacturing and the Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology, Shenzhen 518055, China

Orthogonal time frequency space (OTFS) modulation has been viewed as a promising technique for integrated sensing and communication (ISAC) systems and aerial-terrestrial networks, owing to its delay-Doppler domain transmission property and strong resistance to Doppler effects. However, it also suffers from high processing complexity at the receiver. In this work, we propose a novel pre-equalization-based ISAC-OTFS transmission framework in which the terrestrial base station (BS) performs pre-equalization based on its estimated channel state information (CSI). In particular, the mean square error of OTFS symbol demodulation and the Cramér-Rao lower bound of sensing parameter estimation are derived, and their weighted sum is used as the metric for optimizing the pre-equalization matrix. To solve the formulated problem while taking time-varying CSI into account, a deep learning-enabled, channel prediction-based pre-equalization framework is proposed, in which a parameter-level channel prediction module decouples the OTFS channel parameters and a low-dimensional prediction network corrects outdated CSI. A CSI processing module then initializes the input of the pre-equalization module. Finally, a residual-structured deep neural network is cascaded to perform pre-equalization. Simulation results show that under the proposed framework, the demodulation complexity at the receiver and the pilot overhead for channel estimation are both significantly reduced, while the symbol detection performance approaches that of conventional minimum mean square error equalization and perfect CSI. Copyright © 2024, The Authors. All rights reserved.

Keywords: Channel estimation


Parameter-Efficient Multi-Modal Tuning for Salient Object Detection

SSRN, 2025

Authors: Zhang, Zixuan; Shi, Fan; Jia, Chen; Wang, Mianzhao; Louis, Assale Adje; Cheng, Xu. Affiliations: Engineering Research Center of Learning-Based Intelligent System, Ministry of Education, Tianjin University of Technology, Tianjin 300384, China; Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin University of Technology, Tianjin 300384, China; Yamoussoukro 999063, Ivory Coast; Smart Innovation Norway, Norway

Salient object detection (SOD) is a key preprocessing step in various computer vision tasks; it aims to replicate the human visual system by identifying the most significant objects or regions in images or videos. However, existing multi-modal SOD works introduce expensive fusion modules with large numbers of parameters to integrate cues across modalities, which does little to reduce the resource overhead on computing devices. To address this issue, we propose a parameter-efficient multi-modal tuning method for salient object detection. Our method employs a hierarchical progressive encoder with a limited set of learnable parameters to capture aligned and intrinsic representations. On top of this encoder, we further introduce a learnable weighting parameter λ to exploit semantic understanding across different levels and patterns. Extensive experiments on 11 benchmark datasets against 32 state-of-the-art (SOTA) methods demonstrate the effectiveness of our method, which requires only 4.4M trainable parameters. © 2025, The Authors. All rights reserved.

Keywords: Object recognition


CA-Edit: Causality-Aware Condition Adapter for High-Fidelity Local Facial Attribute Editing

arXiv, 2024

Authors: Xian, Xiaole; He, Xilin; Niu, Zenghao; Zhang, Junliang; Xie, Weicheng; Song, Siyang; Yu, Zitong; Shen, Linlin. Affiliations: Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University, China; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, China; Guangdong Key Laboratory of Intelligent Information Processing, China; University of Exeter, United Kingdom; Great Bay University, China

For efficient and high-fidelity local facial attribute editing, most existing methods either require additional fine-tuning for different editing effects or tend to alter areas beyond the editing regions. Alternatively, inpainting methods can edit the target image region while preserving external areas. However, current inpainting methods still suffer from misalignment between the generated content and the facial attribute descriptions, as well as from the loss of facial skin details. To address these challenges, (i) a novel data utilization strategy is introduced to construct datasets of attribute-text-image triples from a data-driven perspective, and (ii) a Causality-Aware Condition Adapter is proposed to enhance the contextual causality modeling of specific details; it encodes the skin details from the original image while preventing conflicts between these cues and the textual conditions. In addition, a Skin Transition Frequency Guidance technique is introduced for the local modeling of contextual causality via sampling guidance driven by low-frequency alignment. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method in boosting both fidelity and editability for localized attribute editing. The code is available at https://***/connorxian/CA-Edit. © 2024, CC BY.

Keywords:


HairDiffusion: Vivid Multi-Colored Hair Editing via Latent Diffusion

arXiv, 2024

Authors: Zeng, Yu; Zhang, Yang; Liu, Jiachen; Shen, Linlin; Deng, Kaijun; He, Weizhao; Wang, Jinbao. Affiliations: Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, China; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, China; Guangdong Provincial Key Laboratory of Intelligent Information Processing, China

Hair editing is a critical image synthesis task that aims to edit hair color and hairstyle using text descriptions or reference images while preserving irrelevant attributes (e.g., identity, background, clothing). Many existing methods address this task with StyleGAN. However, due to the limited spatial distribution of StyleGAN, they struggle with multi-color hair editing and facial preservation. Considering the advances in diffusion models, we use Latent Diffusion Models (LDMs) for hairstyle editing. Our approach introduces Multi-stage Hairstyle Blend (MHB), which effectively separates control of hair color and hairstyle in the diffusion latent space. Additionally, we train a warping module to align the hair color with the target region. To further enhance multi-color hairstyle editing, we fine-tune a CLIP model on a multi-color hairstyle dataset. Our method not only tackles the complexity of multi-color hairstyles but also addresses the challenge of preserving original colors during diffusion editing. Extensive experiments show the superiority of our method in editing multi-color hairstyles while preserving facial attributes, given textual descriptions and reference images. © 2024, CC BY.

Keywords: Color image processing


Sparsity Aware of TF-IDF Matrix to Accelerate Oblivious Document Ranking and Retrieval

IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)

Authors: Zeshi Zhang, Guangping Xu, Hongzhang Yang, Yulei Wu. Affiliations: School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China; Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin, China; Ministry of Education Key Laboratory of Computer Vision and System, China; Faculty of Engineering, University of Bristol, Bristol, U.K.

ISBN (electronic): 9798350381993

ISBN (print): 9798350382006

Due to cloud security concerns, there is increasing interest in information retrieval systems that support private queries over public documents. It is desirable to perform oblivious document ranking and retrieval in the public cloud at lower cost and higher speed without revealing query-related information. Currently, term frequency-inverse document frequency (TF-IDF) and private information retrieval (PIR) techniques are used to solve this problem, but encryption operations dominate the running time. Motivated by the observation that the TF-IDF matrix is sparse, we propose an efficient approach for oblivious document ranking and retrieval, called E-Coeus, which exploits the high sparsity of the TF-IDF matrix to rearrange it. Our method accelerates PIR-based oblivious document retrieval and reduces users' retrieval latency. In a stand-alone experiment on a TF-IDF matrix with 1.2M rows, 64K columns, and a sparsity of 10%, E-Coeus improves document ranking and retrieval performance by 23% over the state-of-the-art approach, Coeus. On a cluster of 64 machines, E-Coeus improves performance by 34% over Coeus when the TF-IDF matrix sparsity is 30%.

Keywords: Time-frequency analysis; Privacy; Costs; Cloud computing security; Information retrieval; Encryption; Delays
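The observation E-Coeus builds on, that a TF-IDF matrix is mostly zeros so scoring work should scale with its nonzero entries, can be illustrated in plain Python. This is a plaintext sketch only: it assumes a whitespace tokenizer, term frequency normalized by document length, and idf = ln(N/df); it performs no PIR or encryption and does not reproduce E-Coeus's matrix rearrangement. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Build a sparse term x document TF-IDF matrix as {term: {doc_id: weight}}.

    Only nonzero weights are stored, so memory and scoring cost scale with
    the number of nonzeros rather than vocabulary_size * n_docs.
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    rows = {}
    for doc_id, tokens in enumerate(tokenized):
        for term, count in Counter(tokens).items():
            idf = math.log(n_docs / df[term])
            if idf > 0:  # a term present in every document has weight 0; skip it
                rows.setdefault(term, {})[doc_id] = (count / len(tokens)) * idf
    return rows


def nonzero_fraction(rows, n_docs):
    """Fraction of nonzero cells over the stored term rows (the matrix density)."""
    nnz = sum(len(row) for row in rows.values())
    return nnz / (len(rows) * n_docs)


def rank(rows, query_terms, n_docs):
    """Score = sum of the query terms' TF-IDF weights; visits only nonzero cells."""
    scores = [0.0] * n_docs
    for term in query_terms:
        for doc_id, weight in rows.get(term, {}).items():
            scores[doc_id] += weight
    # sorted() stays stable with reverse=True: ties keep ascending document order.
    return sorted(range(n_docs), key=lambda d: scores[d], reverse=True)
```

Because the abstract notes that encryption operations dominate the running time, every zero cell skipped is work that need not be done under encryption; how E-Coeus actually rearranges the matrix is not described in this abstract.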


Deep learning-based activity recognition and fine motor identification using 2D skeletons of cynomolgus monkeys

Zoological Research, 2023, Vol. 44, No. 5, pp. 967-980

Authors: Chuxi Li, Zifan Xiao, Yerong Li, Zhinan Chen, Xun Ji, Yiqun Liu, Shufei Feng, Zhen Zhang, Kaiming Zhang, Jianfeng Feng, Trevor W. Robbins, Shisheng Xiong, Yongchang Chen, Xiao Xiao. Affiliations: School of Information Science and Technology, Micro Nano System Center, Fudan University, Shanghai 200433, China; Department of Anesthesiology, Huashan Hospital; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education; Behavioral and Cognitive Neuroscience Center; Institute of Science and Technology for Brain-Inspired Intelligence; MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China; Kuang Yaming Honors School, Nanjing University, Nanjing, Jiangsu 210023, China; Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200433, China; State Key Laboratory of Primate Biomedical Research, Institute of Primate Translational Medicine, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; New Vision World LLC., Aliso Viejo, California 92656, USA; Behavioural and Clinical Neuroscience Institute, University of Cambridge, Cambridge CB2 1TN, UK

Video-based action recognition is becoming a vital tool in clinical research and neuroscientific study for disorder detection and ***, action recognition currently used in non-human primate (NHP) research relies heavily on intense manual labor and lacks standardized *** this work, we established two standard benchmark datasets of NHPs in the laboratory: MonkeyinLab (MiL), which includes 13 categories of actions and postures, and MiL2D, which includes sequences of two-dimensional (2D) skeleton ***, based on recent methodological advances in deep learning and skeleton visualization, we introduced the Monkey Monitor Kit (MonKit) toolbox for automatic action recognition, posture estimation, and identification of fine motor activity in *** the datasets and MonKit, we evaluated the daily behaviors of wild-type cynomolgus monkeys within their home cages and experimental environments and compared these observations with the behaviors exhibited by cynomolgus monkeys possessing mutations in the MECP2 gene as a disease model of Rett syndrome (RTT). MonKit was used to assess motor function, stereotyped behaviors, and depressive phenotypes, with the outcomes compared with human manual *** Kit established consistent criteria for identifying behavior in NHPs with high accuracy and efficiency, thus providing a novel and comprehensive tool for assessing phenotypic behavior in monkeys.

Keywords: Action recognition; Fine motor identification; Two-stream deep model; 2D skeleton; Non-human primates; Rett syndrome


Towards Survivable In-Memory Stores with Parity Coded NVRAM

IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)

Authors: Zhixuan Wang, Guangping Xu, Hongzhang Yang, Yulei Wu. Affiliations: School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China; Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin, China; Key Laboratory of Computer Vision and System, Ministry of Education, China; Faculty of Engineering, University of Bristol, Bristol, U.K.

ISBN (electronic): 9798350381993

ISBN (print): 9798350382006

Erasure codes have been widely applied to in-memory key-value storage systems for high reliability and low redundancy. In distributed in-memory key-value storage systems, update operations are relatively frequent, especially partial-stripe updates, which makes data updates more challenging. Recent research appends logs to accelerate parity writes; however, these logs are stored on disk, which significantly decreases system performance. We therefore propose a novel in-memory key-value storage architecture, DNVPL, which uses NVRAM to log parity data. Our main idea is an append-only update scheme that trades off memory cost against update overhead. We implement DNVPL in an in-memory key-value storage prototype called LogKV and evaluate it under different workloads. The experiments show that our scheme achieves high update performance across different metrics: it reduces update latency by up to 49% and saves storage space by 48% compared with state-of-the-art schemes.

Keywords: Privacy; Nonvolatile memory; System performance; Redundancy; Random access memory; Prototypes; Computer architecture
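The append-only idea for partial-stripe updates can be sketched with the simplest erasure code, a single XOR parity chunk per stripe (RAID-5 style), rather than the Reed-Solomon-type codes a real system such as DNVPL may use. Updating one data chunk produces a parity delta (old XOR new) that is appended to a log instead of rewriting the parity immediately; a Python list stands in for the NVRAM log. The class and method names are hypothetical and do not describe the LogKV prototype.

```python
from dataclasses import dataclass, field


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally sized chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


@dataclass
class Stripe:
    data: list                                      # k equally sized data chunks
    parity: bytes = b""                             # XOR of all data chunks
    parity_log: list = field(default_factory=list)  # stand-in for an NVRAM append-only log

    def __post_init__(self):
        if not self.parity:
            parity = bytes(len(self.data[0]))  # all-zero chunk
            for chunk in self.data:
                parity = xor_bytes(parity, chunk)
            self.parity = parity

    def update(self, i: int, new_chunk: bytes) -> None:
        """Partial-stripe update: overwrite chunk i and append only the parity delta."""
        self.parity_log.append(xor_bytes(self.data[i], new_chunk))
        self.data[i] = new_chunk

    def merge_log(self) -> None:
        """Fold logged deltas into the parity (e.g. when the log grows too large)."""
        for delta in self.parity_log:
            self.parity = xor_bytes(self.parity, delta)
        self.parity_log.clear()

    def recover(self, lost: int) -> bytes:
        """Rebuild a lost data chunk from the up-to-date parity and the surviving chunks."""
        self.merge_log()
        chunk = self.parity
        for j, other in enumerate(self.data):
            if j != lost:
                chunk = xor_bytes(chunk, other)
        return chunk
```

The trade-off the abstract mentions shows up directly: the log consumes memory until `merge_log` runs, but each small update appends one delta instead of rewriting the parity chunk in place.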


Discrete Geometric Coded Data Layout for Large-scale Object Storage systems

IEEE International Conference on Big Data and Cloud Computing (BdCloud)

Authors: Yi Tian, Guangping Xu, Hongzhang Yang, Yue Ni, JiaXin Cao, Lei Yang. Affiliations: School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China; Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin, China; Key Laboratory of Computer Vision and System, Ministry of Education, China; Roycom (Tianjin) Information Technology Co. Ltd.

Regenerating codes are new network codes proposed to reduce the amount of data required for fault repair, which can improve the recovery efficiency of faulty nodes in data storage systems. However, unlike Reed-Solomon (RS) codes, which repair at the granularity of bytes, regenerating codes require data to be stored in large chunks, which leads to severe read amplification: degraded reads of objects read out excess data and take longer. This reflects a mutual constraint between improving recovery efficiency and degraded read performance, manifested as read amplification, a vital issue considered in this *** solve this problem, we propose a new type of data layout, discrete geometric, which splits an object into a geometric sequence of data blocks. The blocks are placed discretely into corresponding containers on the disks of different nodes, and containers of the same size are combined into a stripe for encoding. The discrete characteristic ensures lower repair costs for degraded reads. The geometric characteristic preserves the repair performance of regenerating codes through large blocks, while small blocks mitigate read amplification. To reduce the IOPS of the discrete geometric layout, we propose Discrete Geometric-Locally Regenerating Codes (DG-LRCs), which guarantee lower degraded read latency while improving recovery *** results show that, compared with regenerating codes combined with geometric partitioning, DG-LRCs reduce degraded read time by 22.56% at 2 Gbps and by 60.56% at 4 Gbps, and their recovery performance is 7.04 times better than that of RS codes.

Keywords:
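The geometric part of the layout, splitting an object into blocks whose sizes grow geometrically, can be sketched as follows. The base block size, the ratio of 2, putting leftover bytes into a final shorter block, and grouping blocks into same-size containers by their position in the sequence are all assumptions made for illustration; the abstract does not give DG-LRC's parameters, its discrete placement across nodes, or its locally regenerating code.

```python
from collections import defaultdict


def geometric_partition(size: int, base: int = 4096, ratio: int = 2) -> list[int]:
    """Split `size` bytes into blocks of base, base*ratio, base*ratio**2, ...

    The last block holds whatever bytes remain, so it may be shorter
    than its nominal size class.
    """
    blocks, nominal, remaining = [], base, size
    while remaining > 0:
        take = min(nominal, remaining)
        blocks.append(take)
        remaining -= take
        nominal *= ratio
    return blocks


def assign_to_containers(objects: dict[str, int], base: int = 4096,
                         ratio: int = 2) -> dict[int, list[tuple[str, int]]]:
    """Group (object name, block index) pairs by nominal block size.

    Containers holding blocks of the same size class could then be
    combined into one stripe for encoding.
    """
    containers = defaultdict(list)
    for name, size in objects.items():
        for idx, _ in enumerate(geometric_partition(size, base, ratio)):
            containers[base * ratio ** idx].append((name, idx))
    return dict(containers)
```

Small leading blocks keep a degraded read of a small byte range from pulling in a large chunk, while the large trailing blocks keep most of the data in chunks big enough for efficient regenerating-code repair, which is the balance the abstract describes.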

