Search Results - Inner Mongolia University Library

International Conference on Tools for Artificial Intelligence (ICTAI)

Authors: Zhenliang Guo, Zhen Huang, Yong Dou, Xiubin Yu, Sijie Wang, Zhongwu Chen, Xinxin Su, Xiaohang Liu. Affiliation: National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha, China

Discourse structure analysis has been shown to be useful for many artificial intelligence (AI) tasks, such as text summarization and text categorization. However, for the Chinese news domain, discourse structure analysis systems are still immature because expert-annotated datasets are lacking. In this paper, we present CNA, a Chinese news corpus containing 1155 news articles annotated by human experts, which covers four domains and four news media sources. Next, we implement several text classification methods as baselines. Experimental results demonstrate that document-level methods achieve better performance, and we further propose a document-level neural network model with multiple sentence features that achieves state-of-the-art performance. Finally, we analyze the content type distribution of the sentences in CNA and the prediction errors of our model on the test set. The code and dataset will be open-sourced at https://***/gzl98/Chinese_Discourse_Profiling.

Keywords: Analytical models; Codes; Text categorization; Neural networks; Predictive models; Media; Artificial intelligence


Offline Imitation Learning Using Reward-free Exploratory Data


Proceedings of the 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence

Authors: Hao Wang, Dawei Feng, Bo Ding, Wei Li. Affiliations: College of Computer, National University of Defense Technology, China; National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, China; Independent Researcher, China

ISBN (print): 9781450398336

Offline imitation learning (OIL) is often used to solve complex continuous decision-making tasks. For tasks such as robot control and automatic driving, it is either difficult to design an effective reward for learning or very expensive and time-consuming for agents to collect data by interacting with the environment. However, the data used in previous OIL methods are all gathered by reinforcement learning algorithms guided by task-specific rewards, which is not a truly reward-free premise and still suffers from the problem of designing an effective reward function in real tasks. To this end, we propose the reward-free exploratory data driven offline imitation learning (ExDOIL) framework. ExDOIL first trains an unsupervised reinforcement learning agent by interacting with the environment and collects enough unsupervised exploration data during training. Then, a task-independent yet simple and efficient reward function is used to relabel the collected data. Finally, an agent is trained to imitate the expert and complete the task with a conventional RL algorithm such as TD3. Extensive experiments on continuous control tasks demonstrate that the proposed framework achieves better imitation performance (28% higher episode returns on average) than the previous SOTA method (ORIL) without any task-specific rewards.

Keywords: dataset


Quality evaluation of airfoil hybrid mesh based on graph neural network


19th Chinese Intelligent Systems Conference, CISC 2023

Authors: Wang, Huaiqing; Pang, Yufei; Xiao, Sumei; Wang, Zhichao. Affiliations: School of Manufacturing Science and Engineering, Southwest University of Science and Technology, Mianyang 621010, China; Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang 621010, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China

ISBN (print): 9789819968817

In airfoil numerical simulation, the mesh quality has an important influence on the accuracy and error of numerical simulation. The existing mesh quality evaluation requires a lot of manual interaction, which greatly reduces the efficiency of mesh generation and necessitates the implementation of intelligent mesh evaluation methods. Graph neural networks can extract features from graph data, possess self-adaptability and generalization ability, and have been successfully applied in many industries. In this paper, we propose a deep graph neural network, SDeepNet, to evaluate mesh quality and construct a large-scale mixed mesh dataset, MixSet, for training and validating the model. We test and compare the performance of the mesh quality evaluation models GridNet, GMeshNet, and SDeepNet on the mesh dataset MixSet. The experimental results show that the SDeepNet model can achieve high accuracy and recall in the mixed mesh quality evaluation task. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023.

Keywords: Mesh generation


Sparse Matrix Reordering Method Selection with Parallel Computing and Deep Learning


International Joint Conference on Neural Networks (IJCNN)

Authors: Rui Xia, Jihu Guo, Huajian Zhang, Shun Yang, Qinglin Wang, Jie Liu. Affiliations: College of Computer Science and Technology, National University of Defense Technology, Changsha, China; Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology

ISBN (digital): 9798350359312

ISBN (print): 9798350359329

Sparse matrix reordering is an important step in Cholesky decomposition. Reordering the rows and columns of the matrix can greatly reduce computation time and storage cost. As more reordering algorithms are proposed, selecting a suitable reordering method for a given matrix has become an important research topic. In this paper, we propose a method that predicts the optimal reordering method by visualizing sparse matrices in chunks in parallel and feeding them into a deep convolutional neural network. The results show that the theoretical performance can reach 95% of the optimal performance, the prediction accuracy of the method can reach up to 85%, the parallel framework achieves an average speedup of 11.35 times over the serial framework, and the performance is greatly improved compared with the traversal selection method on large sparse matrices.

Keywords: Deep learning; Visualization; Accuracy; Costs; Neural networks; Parallel processing; Prediction algorithms
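
The abstract above mentions visualizing a sparse matrix in chunks, in parallel, before passing it to a convolutional network. The following is only a rough sketch of that idea, not the authors' implementation: it assumes the matrix is given as coordinate lists, splits it into horizontal bands, and counts the nonzeros of each band into one row of a small image. The function names, the `size` and `workers` parameters, and the use of a thread pool are all illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_density(task):
    """Count the nonzeros of one row band into a single image row of `size` pixels."""
    cols, n_cols, size = task
    row = [0] * size
    for c in cols:
        row[c * size // n_cols] += 1
    return row


def matrix_to_image(rows, cols, shape, size=4, workers=4):
    """Downsample a sparse matrix (coordinate lists) to a size x size count image."""
    n_rows, n_cols = shape
    # Group column indices by the image row (band) that their matrix row falls in.
    bands = [[] for _ in range(size)]
    for r, c in zip(rows, cols):
        bands[r * size // n_rows].append(c)
    tasks = [(band, n_cols, size) for band in bands]
    # Each band is independent, so the bands can be processed concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(chunk_density, tasks))


if __name__ == "__main__":
    # An 8 x 8 identity matrix: all nonzeros land on the image diagonal.
    idx = list(range(8))
    for image_row in matrix_to_image(idx, idx, (8, 8)):
        print(image_row)
```

In practice the bands would more likely be handled by separate processes or native code, and the counts would be normalized before being fed to a network; the thread pool here only illustrates that the chunks do not depend on each other.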


Feature and Performance Comparison of FaaS Platforms


14th IEEE International Conference on Software Engineering and Service Science, ICSESS 2023

Authors: Ma, Penghui; Shi, Peichang; Yi, Guodong. Affiliations: College of Computer Science, National University of Defense Technology, National Key Laboratory of Parallel and Distributed Processing, Changsha 410073, China; College of Computer Science, National University of Defense Technology, Key Laboratory of Software Engineering for Complex Systems, Changsha 410073, China; Xiangjiang Lab, Changsha 410073, China; School of Advanced Interdisciplinary Studies, Hunan University of Technology and Business, Changsha 410073, China

ISBN (print): 9798350336269

With serverless computing offering more efficient and cost-effective application deployment, the diversity of serverless platforms presents challenges to users, including platform lock-in and costly migration. Moreover, due to the black box nature of function computing, traditional performance benchmarking methods are not applicable, necessitating new studies. This article presents a detailed comparison of six major public cloud function computing platforms and introduces a benchmarking framework for function computing performance. This framework aims to help users make comprehensive comparisons and select the most suitable platform for their specific needs. © 2023 IEEE.

Keywords: Benchmarking


YOLOv5_CSL_F: YOLOv5's Loss Improvement and Attention Mechanism Application for Remote Sensing Image Object Detection


3rd International Conference on Wireless Communications and Smart Grid, ICWCSG 2021

Authors: Wang, Junhua; Xiao, Tao; Gu, Qinyi; Chen, Qian. Affiliations: National University of Defense Technology, Laboratory of Parallel and Distributed Processing, Changsha, China; No.61646 Troop, Department of Information, Beijing, China

ISBN (print): 9781665425988

With the continuous improvement of the resolution of satellite and aerial remote sensing images, more and more useful data and information can be obtained from them. At the same time, compared with ordinary images, remote sensing images have variable object orientations, unbalanced categories, complex backgrounds, and small objects that are difficult to detect. All of these make remote sensing image object detection very challenging. In this paper, we propose the remote sensing image object detection algorithm YOLOv5_CSL_F, which builds on a deep learning framework and the YOLOv5 object detection algorithm. Following the characteristics of remote sensing images, it adopts Circular Smooth Label (CSL) [1] to calculate the loss of the rotated object detection bounding box and introduces the FcaNet [2] attention mechanism to design new feature fusion modules. We tested the algorithm model on the DOTA dataset. Compared with the detection performance of the original YOLOv5 algorithm, our algorithm improves the detection accuracy by 0.6%. © 2021 IEEE.

Keywords: Object detection
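
The abstract above relies on Circular Smooth Label (CSL) to turn the rotation angle of a bounding box into a classification target. As a hedged illustration of the general idea, the sketch below builds such a target with a Gaussian window that wraps around the ends of the angle range, so that neighbouring angles (including the first and last bins) receive similar labels. The bin count, window radius, and `sigma` are assumed values for illustration; the abstract does not state which window or parameters the authors used.

```python
import math


def circular_smooth_label(angle_bin, num_bins=180, radius=6, sigma=2.0):
    """Build a smoothed angle-classification target that is periodic in the angle.

    angle_bin: index of the ground-truth angle bin.
    num_bins, radius, sigma: illustrative assumptions, not values from the paper.
    """
    label = []
    for x in range(num_bins):
        d = abs(x - angle_bin)
        d = min(d, num_bins - d)  # circular distance between bins
        if d < radius:
            label.append(math.exp(-(d * d) / (2 * sigma ** 2)))  # Gaussian window
        else:
            label.append(0.0)
    return label


if __name__ == "__main__":
    target = circular_smooth_label(0)
    # Bins 1 and 179 are both one step away from bin 0, so they get equal weight.
    print(target[0], target[1], target[179], target[90])
```

The point of the wrap-around distance is that an angle predicted one bin past the boundary is treated as a near miss and not as the largest possible error.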


ST-PINN: A Self-Training Physics-Informed Neural Network for Partial Differential Equations


International Joint Conference on Neural Networks (IJCNN)

Authors: Junjun Yan, Xinhai Chen, Zhichao Wang, Enqiang Zhoui, Jie Liu. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China; Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, China

Partial differential equations (PDEs) are an essential computational kernel in physics and engineering. With the advance of deep learning, physics-informed neural networks (PINNs), as a mesh-free method, have shown great potential for fast PDE solving in various applications. To address the low accuracy and convergence problems of existing PINNs, we propose a self-training physics-informed neural network, ST-PINN. Specifically, ST-PINN introduces a pseudo-label-based self-learning algorithm during training. It employs the governing equation as the evaluation index for pseudo-labeling and selects the highest-confidence examples from the sample points to attach the pseudo labels. To the best of our knowledge, we are the first to incorporate a self-training mechanism into physics-informed learning. We conduct experiments on five PDE problems in different fields and scenarios. The results demonstrate that the proposed method allows the network to learn more physical information and benefits convergence. ST-PINN outperforms existing physics-informed neural network methods and improves the accuracy by a factor of 1.33x-2.54x.
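
The selection step described in the abstract can be sketched roughly as follows. This is one possible reading, not the authors' code: it assumes that "confidence" means a small governing-equation residual at a sample point, and that the pseudo label is the network's own prediction there. The callables `predict` and `residual` and the `keep_ratio` parameter are hypothetical names introduced for illustration.

```python
def select_pseudo_labels(points, predict, residual, keep_ratio=0.25):
    """Pick the sample points with the smallest equation residual and label them.

    points: candidate sample points (any values accepted by the callables).
    predict: hypothetical callable returning the current network output at a point.
    residual: hypothetical callable returning the governing-equation residual.
    keep_ratio: assumed fraction of points treated as high confidence.
    """
    # A smaller absolute residual is read here as higher confidence.
    ranked = sorted(points, key=lambda p: abs(residual(p)))
    k = max(1, int(len(ranked) * keep_ratio))
    # Attach the network's own prediction as the pseudo label.
    return [(p, predict(p)) for p in ranked[:k]]


if __name__ == "__main__":
    pts = [0.0, 0.5, 1.0, 1.5]
    chosen = select_pseudo_labels(
        pts,
        predict=lambda x: x * x,      # stand-in for a network output
        residual=lambda x: x - 0.5,   # stand-in for a PDE residual
    )
    print(chosen)
```

The selected pairs would then be added to the supervised part of the loss in later training iterations; how often they are refreshed is not stated in the abstract.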


Parallel Implementation of SHA256 on Multizone Heterogeneous Systems


IEEE International Conference on Big Data and Cloud Computing (BdCloud)

Authors: Yongtao Luo, Jie Liu, Tiaojie Xiao, Chunye Gong. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory; Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, China; National Supercomputer Center in Tianjin, Tianjin, China

SHA-256 plays an important role in widely used applications, such as data security, data integrity, digital signatures, and cryptocurrencies. However, most current optimized implementations of SHA-256 are based on CPUs or dedicated hardware, such as ASICs and FPGAs. Consequently, there is a need to explore whether new heterogeneous parallel frameworks can improve the computational performance of the hash function. To address this issue, we conducted a study on the MT-3000 platform, a special-architecture processor for the next-generation exascale prototype supercomputer. We propose MT-SHA256, a heterogeneous multistage parallel implementation for hashing multiple messages on the MT-3000. By exploiting the architectural features of this processor, we developed an effective solution that significantly improves the computational performance of SHA-256. As a result, MT-SHA256 achieved a maximum throughput of 1045.68 MB/s on a single acceleration core of MT-3000. This is 9.84x higher than the C code implementation on one CPU core of MT-3000. We also performed a scalability test and found that MT-SHA256 achieved a throughput of 98.04 GB/s on a computing node and extended to 512 nodes (2048 acceleration clusters) on this system with good scalability.
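
Hashing multiple messages parallelizes naturally because each digest depends only on its own message. The sketch below shows that message-level parallelism on an ordinary CPU with Python's standard `hashlib` and a thread pool. It is only an analogy for the workload described above and says nothing about how MT-SHA256 is organized on the MT-3000; the helper names and the `workers` parameter are illustrative assumptions.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(message: bytes) -> str:
    """Return the SHA-256 digest of one message as a hex string."""
    return hashlib.sha256(message).hexdigest()


def hash_messages(messages, workers=4):
    """Hash independent messages concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sha256_hex, messages))


if __name__ == "__main__":
    batch = [b"", b"abc", b"hello world"]
    for msg, digest in zip(batch, hash_messages(batch)):
        print(msg, digest)
```

Because the messages are independent, the parallel result must match hashing them one by one, which makes a sequential run a convenient correctness check for any accelerated version.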


Bi-Objective Scheduling Algorithm for Hybrid Workflow in JointCloud


IEEE International Conference on Joint Cloud Computing (JCC)

Authors: Rui Li, Huaimin Wang, Peichang Shi. Affiliations: National Key Laboratory of Parallel and Distributed Processing, College of Computer Science, National University of Defense Technology, Changsha, China; State Key Laboratory of Complex & Critical Software Environment, College of Computer Science, National University of Defense Technology, Changsha, China

ISBN (digital): 9798350387339

ISBN (print): 9798350387346

Big data workflows are widely used in IoT, recommender systems, and real-time vision applications, and they continue to grow in complexity. These hybrid workflows consist of both resource-intensive batch jobs and latency-sensitive stream jobs. Examples include the data analytics workflow, which incorporates batch data transformations and low-latency querying, and the machine learning workflow, which performs feature extraction on stream data before batch training and low-latency inference. However, existing research on workflow scheduling primarily focuses on either stream or batch workflows, neglecting the efficient scheduling of hybrid workflows that respect their diverse resource requirements and the costly data transfers between *** this article, we propose a hybrid workflow model that defines the optimal placement of hybrid workflows (OHWP) as a bi-objective optimization problem. Our proposed model takes into account parameters related to inter-communication between stream and batch jobs, as well as the heterogeneous resources in the JointCloud environment. Additionally, we present OHWP-PS (OHWP on a Pruned Space), a scheduling algorithm for hybrid workflows that minimizes both cost and latency by improving the initial population and dynamically updating the search space. The results demonstrate that the proposed OHWP-PS algorithm is effective and competitive across all experiments.

Keywords: Training; Scheduling algorithms; Machine vision; Heuristic algorithms; Machine learning; Feature extraction; Remote working
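
Since the abstract above frames placement as a bi-objective problem over cost and latency, a candidate placement is only interesting if no other candidate is at least as good on both objectives and strictly better on one. The sketch below filters a list of candidates down to that non-dominated (Pareto) set. It illustrates the bi-objective notion only and is not the OHWP-PS algorithm; the placement names and numbers are made up for the example.

```python
def pareto_front(placements):
    """Return the placements not dominated in (cost, latency); lower is better.

    placements: list of (name, cost, latency) tuples with hypothetical values.
    """
    front = []
    for name, cost, latency in placements:
        dominated = any(
            c <= cost and l <= latency and (c < cost or l < latency)
            for _, c, l in placements
        )
        if not dominated:
            front.append((name, cost, latency))
    return front


if __name__ == "__main__":
    candidates = [
        ("a", 1.0, 9.0),
        ("b", 2.0, 5.0),
        ("c", 3.0, 6.0),  # worse than "b" on both cost and latency
        ("d", 4.0, 2.0),
    ]
    print(pareto_front(candidates))
```

This pairwise check is quadratic in the number of candidates, which is fine for a small illustration; a population-based scheduler would normally maintain the front incrementally.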


Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models


arXiv, 2022

Authors: Lai, Zhiquan; Li, Shengwei; Tang, Xudong; Ge, Keshi; Liu, Weijie; Duan, Yabo; Qiao, Linbo; Li, Dongsheng. Affiliation: The National Laboratory for Parallel and Distributed Processing, College of Computer, National University of Defense Technology in Changsha, Hunan, China

Foundation models are in the process of becoming the dominant deep learning technology. Pretraining a foundation model is always time-consuming due to the large scale of both the model parameter and training dataset. Besides being computing-intensive, the pretraining process is extremely memory- and communication-intensive. These challenges make it necessary to apply 3D parallelism, which integrates data parallelism, pipeline model parallelism, and tensor model parallelism, to achieve high training efficiency. However, current 3D parallelism frameworks still encounter two issues: i) they are not transparent to model developers, requiring manual model modification to parallelize training, and ii) their utilization of computation resources, GPU memory, and network bandwidth is insufficient. We propose Merak, an automated 3D parallelism deep learning training framework with high resource utilization. Merak automatically deploys 3D parallelism with an automatic model partitioner, which includes a graph-sharding algorithm and proxy node-based model graph. Merak also offers a non-intrusive API to scale out foundation model training with minimal code modification. In addition, we design a high-performance 3D parallel runtime engine that employs several techniques to exploit available training resources, including a shifted critical path pipeline schedule that increases computation utilization, stage-aware recomputation that makes use of idle worker memory, and sub-pipelined tensor model parallelism that overlaps communication and computation. Experiments on 64 GPUs demonstrate Merak's capability to speed up training performance over state-of-the-art 3D parallelism frameworks of models with 1.5, 2.5, 8.3, and 20 billion parameters by up to 1.42, 1.39, 1.43, and 1.61×, respectively. The code for Merak has been open-sourced at https://***/hpdl-group/Merak. Copyright © 2022, The Authors. All rights reserved.

Keywords: Pipelines
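
To make the notion of 3D parallelism in the abstract above more concrete, the sketch below maps a global worker rank to a (data, pipeline, tensor) coordinate. The layout, with the tensor-parallel index varying fastest, and the 4 x 4 x 4 split of 64 workers are assumptions chosen for illustration; the abstract does not say how Merak arranges its workers.

```python
def rank_to_coords(rank, dp, pp, tp):
    """Map a global rank to (data, pipeline, tensor) indices, tensor fastest.

    dp, pp, tp: assumed sizes of the data-, pipeline-, and tensor-parallel groups.
    """
    if not 0 <= rank < dp * pp * tp:
        raise ValueError("rank out of range for the given group sizes")
    data_idx = rank // (pp * tp)
    pipeline_idx = (rank // tp) % pp
    tensor_idx = rank % tp
    return data_idx, pipeline_idx, tensor_idx


if __name__ == "__main__":
    # A hypothetical 4 x 4 x 4 split of 64 workers.
    for r in (0, 5, 37, 63):
        print(r, rank_to_coords(r, dp=4, pp=4, tp=4))
```

Keeping the tensor-parallel index fastest places workers that exchange the most traffic on adjacent ranks, a common convention when adjacent ranks share a node, though other orderings are equally valid.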
