Search Results - Inner Mongolia University Library

International Conference on Machine Learning and Applications (ICMLA)

Authors: Joel W. Reed, Yu Jiao, Thomas E. Potok, Brian A. Klump, Mark T. Elmore, Ali R. Hurson. Affiliations: Applied Software Engineering Research Group, Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA; Computer Science and Engineering Department, Pennsylvania State University, University Park, PA, USA

In this paper, we propose a new term weighting scheme called term frequency-inverse corpus frequency (TF-ICF). It does not require term frequency information from other documents within the document collection, and thus it enables us to generate the document vectors of N streaming documents in linear time. In the context of a machine learning application, unsupervised document clustering, we evaluated the effectiveness of the proposed approach against five widely used term weighting schemes through extensive experimentation. Our results show that TF-ICF produces document clusters comparable in quality to those generated by the widely recognized term weighting schemes, and that it is significantly faster than those methods.

Keywords: Computational complexity; Data engineering; Frequency conversion; Machine learning; Software engineering; Laboratories; Computer science; Vectors; Parallel algorithms; Information filtering
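To make the idea in the abstract above concrete, here is a minimal Python sketch of weighting one streaming document against a fixed, precomputed reference corpus, so that no other document in the stream is consulted. The abstract does not state the weighting formula; the log-scaled form below and the names `tf_icf_vector`, `corpus_doc_freq`, and `corpus_size` are assumptions made for illustration.

```python
import math
from collections import Counter

def tf_icf_vector(doc_tokens, corpus_doc_freq, corpus_size):
    """Weight one document using its own term counts and a static corpus table.

    corpus_doc_freq maps a term to the number of reference-corpus documents
    containing it; corpus_size is the number of documents in that corpus.
    The formula is an assumed log-scaled form, not taken from the abstract.
    """
    term_counts = Counter(doc_tokens)
    weights = {}
    for term, count in term_counts.items():
        n = corpus_doc_freq.get(term, 0)  # unseen terms get the largest weight
        weights[term] = math.log(1 + count) * math.log((corpus_size + 1) / (n + 1))
    # Normalize to unit length so documents of different sizes are comparable.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm > 0:
        weights = {t: w / norm for t, w in weights.items()}
    return weights
```

Under these assumptions the reference table never changes, so each document costs time proportional to its own length, which is consistent with the linear-time claim for N streaming documents.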


Detecting Distributed Scans Using High-Performance Query-Driven Visualization


Supercomputing Conference

Authors: Kurt Stockinger, E. Wes Bethel, Scott Campbell, Eli Dart, Kesheng Wu. Affiliations: Computational Research Division, Lawrence Berkeley National Laboratory, University of California, Berkeley, CA, USA; National Energy Research Scientific Computing Center Division, Lawrence Berkeley National Laboratory, University of California, Berkeley, CA, USA; Energy Sciences Network, Lawrence Berkeley National Laboratory, University of California, Berkeley, CA, USA

Modern forensic analytics applications, like network traffic analysis, perform high-performance hypothesis testing, knowledge discovery and data mining on very large datasets. One essential strategy to reduce the time required for these operations is to select only the most relevant data records for a given computation. In this paper, we present a set of parallel algorithms that demonstrate how an efficient selection mechanism, bitmap indexing, significantly speeds up a common analysis task, namely, computing conditional histograms on very large datasets. We present a thorough study of the performance characteristics of the parallel conditional histogram algorithms. As a case study, we compute conditional histograms for detecting distributed scans hidden in a dataset consisting of approximately 2.5 billion network connection records. We show that these conditional histograms can be computed on an interactive time scale (i.e., in seconds). We also show how to progressively modify the selection criteria to narrow the analysis and find the sources of the distributed scans.

Keywords: Histograms; Performance analysis; Data visualization; Forensics; Telecommunication traffic; Performance evaluation; Testing; Data mining; Parallel algorithms; Indexing
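The selection-then-histogram pattern described in the abstract above can be sketched in a few lines of Python. This is a toy, sequential illustration that uses plain Python integers as uncompressed bitmaps; the function names and the sample columns are hypothetical, and the abstract does not specify the index encoding or the parallelization used by the authors.

```python
from collections import defaultdict

def build_bitmap_index(column):
    """Map each distinct value to an integer whose set bits mark matching rows."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)

def conditional_histogram(hist_index, selection):
    """Count selected rows per value by intersecting bitmaps with the selection."""
    return {
        value: bin(bits & selection).count("1")
        for value, bits in hist_index.items()
        if bits & selection
    }

# Hypothetical connection records: destination port and source host per row.
ports = [22, 80, 22, 22, 443, 22]
hosts = ["a", "b", "a", "c", "b", "c"]
port_index = build_bitmap_index(ports)
host_index = build_bitmap_index(hosts)

# Condition: destination port is 22. Histogram: connections per source host.
hist = conditional_histogram(host_index, port_index[22])  # {"a": 2, "c": 2}
```

Narrowing the analysis, as the abstract describes, would amount to AND-ing further bitmaps into the selection before recomputing the histogram.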


New parallel algorithms for frequent itemset mining in very large databases


International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

Authors: A. Veloso, W. Meira, Srinivasan Parthasarathy. Affiliations: Computer Science Department, Universidade Federal de Minas Gerais, Brazil; Computer and Information Science Department, Ohio State University, USA

Frequent itemset mining is a classic problem in data mining. It is an unsupervised process concerned with finding frequent patterns (or itemsets) hidden in large volumes of data in order to produce compact summaries or models of the database. These models are typically used to generate association rules, but recently they have also been used in far-reaching domains like e-commerce and bioinformatics. Because databases are growing in both dimension (number of attributes) and size (number of records), one of the main issues for a frequent itemset mining algorithm is the ability to analyze very large databases. Sequential algorithms do not have this ability, especially in terms of run-time performance, for such very large databases. Therefore, we must rely on high-performance parallel and distributed computing. We present new parallel algorithms for frequent itemset mining. Their efficiency is demonstrated through a series of experiments on different parallel environments that range from shared-memory multiprocessor machines to a set of SMP clusters connected through a high-speed network. We also briefly discuss an application of our algorithms to the analysis of large databases collected by a Brazilian Web portal.

Keywords: Parallel algorithms; Itemsets; Data mining; Databases; Data analysis; Algorithm design and analysis; Association rules; Runtime; Distributed computing; High-speed networks
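For readers unfamiliar with the problem stated in the abstract above, the following is a small sequential, level-wise sketch of frequent itemset mining in Python. It is a generic textbook-style baseline, not the parallel algorithms the paper presents (the abstract does not describe them); the function name and the `min_support` parameter are illustrative assumptions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for itemsets in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    result = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        # Build size-k candidates whose (k-1)-subsets are all frequent.
        next_candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                next_candidates.add(union)
        candidates = list(next_candidates)
    return result
```

The counting step scans every transaction for every candidate, which is the part that becomes prohibitive on very large databases and motivates the parallel approach the abstract argues for.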


A Framework for Adaptive Communication Modeling on Heterogeneous Hierarchical Clusters


IEEE International Conference on Cluster Computing

Authors: Wahid Nasri, Hajer Hamad, Hadhemi Fejjari. Affiliation: Département d'Informatique, ESSTT, Tunis, Tunisia

Today, because existing parallel systems consist of widely varied collections of heterogeneous machines, it is very difficult for a user to solve a target problem with a single algorithm or to write portable programs that perform well on multiple computational platforms. The inherent heterogeneity and the diversity of the networks in such environments make modeling communications for high-performance computing applications a great challenge. Our objective in this work is to propose a generic framework, based on communication models and adaptive techniques, for predicting communication performance on hierarchical cluster-based platforms. Toward this goal, we introduce the concept of a Poly-model of communications, which corresponds to techniques for better modeling communications in terms of the hardware characteristics of the target parallel system. We apply this methodology to collective communication operations and show that the framework delivers significant performance while determining the best model-algorithm combination for the given problem and architecture parameters.

Keywords: High performance computing; Predictive models; Concurrent computing; Computer architecture; Parallel algorithms; Scalability; Adaptive algorithm; Portable computers; Hardware; Costs


Parallel three-list algorithm for the knapsack problem without memory conflicts


Jisuanji Xuebao/Chinese Journal of Computers, 2006, Vol. 29, No. 2, pp. 345-352

Authors: Li, Ken-Li; Li, Ren-Fa; Li, Qing-Hua. Affiliations: School of Computer and Communication, Hunan University, Changsha 410082, China; Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61801, United States; School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

The knapsack problem is a famous NP-hard problem whose solution usually requires not only exponential time but also exponential space. For this reason, it is very important in cryptosystems and number theory. Building on the two-list algorithm and the parallel three-list algorithm, this paper proposes a parallel three-list algorithm for the knapsack problem. To avoid possible memory conflicts, the algorithm adopts divide and conquer together with parallel merging without memory conflicts. The proposed algorithm needs O(2^(3n/8)) time to find a solution for an n-element knapsack instance when O(2^(3n/8)) shared memory units and O(2^(n/4)) processors are available. Comparison with previous work shows that it is the first EREW parallel algorithm that can solve knapsack instances in less than O(2^(n/2)) time when the available hardware resources are also less than O(2^(n/2)); it thus improves on previous results and may have some impact on research into knapsack-based public-key cryptosystems.

Keywords: Parallel algorithms
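The abstract above builds on the sequential two-list algorithm, which is where the O(2^(n/2)) baseline comes from. Below is a minimal Python sketch of that two-list idea for the subset-sum form of the knapsack problem (find a subset of weights that sums exactly to a target). It is a sequential illustration only, not the paper's parallel three-list algorithm; the function names are hypothetical, and treating the problem as exact subset sum is an assumption.

```python
def subset_sums(weights):
    """Enumerate (sum, bitmask) for every subset of the given weights."""
    sums = [(0, 0)]
    for i, w in enumerate(weights):
        sums += [(s + w, mask | (1 << i)) for s, mask in sums]
    return sums

def two_list_knapsack(weights, target):
    """Return indices of a subset summing to target, or None if none exists."""
    half = len(weights) // 2
    # Each list holds about 2^(n/2) subset sums: one ascending, one descending.
    left = sorted(subset_sums(weights[:half]))
    right = sorted(subset_sums(weights[half:]), reverse=True)
    i = j = 0
    while i < len(left) and j < len(right):
        total = left[i][0] + right[j][0]
        if total == target:
            mask = left[i][1] | (right[j][1] << half)
            return [k for k in range(len(weights)) if (mask >> k) & 1]
        if total < target:
            i += 1  # need a larger sum: advance in the ascending list
        else:
            j += 1  # need a smaller sum: advance in the descending list
    return None
```

Both the running time and the memory of this sketch grow with the 2^(n/2) entries in each list, which is the time and space cost the abstract's parallel algorithm aims to reduce.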


Total airport and airspace model (TAAM) parallelization combining sequential and parallel algorithms for performance enhancement


Simulation Winter Conference

Authors: Sood, Wieland. Affiliation: Center for Advanced Aviation System Development, MITRE Corporation, McLean, VA, USA

This paper describes how to achieve a desired speedup by careful selection of appropriate algorithms for parallelization. Our target simulation is the total airport and airspace model (TAAM), a worldwide standard for aviation analysis. TAAM is designed as a sequential program, and we have increased its speed by incorporating multithreaded algorithms with minimal changes to the underlying simulation architecture. Our method was to identify algorithms that are bottlenecks in the computation and that can be executed concurrently, producing a hybrid sequential and parallel simulation. Our results show a performance gain that varied between 14% and 33%.

Keywords: Airports; Parallel algorithms; Concurrent computing; Parallel processing; Computational modeling; Aircraft; Algorithm design and analysis; Detection algorithms; Analytical models; Computer architecture


Optimal Control of Network Services Based on Generalized Particle Model


International Conference on Services Systems and Services Management, ICSSSM

Authors: Dianxun Shuai, Yuming Dong, Qing Shuai. Affiliations: East China University of Science and Technology, China; Qingdao Technological University, China; Huazhong University of Science and Technology, China

The bandwidth allocation problem in ATM networks is NP-complete. This paper presents a novel generalized particle approach (GPA) to optimize the bandwidth allocation and QoS parameters for ATM networks. The GPA transforms the optimization of ATM networks into the kinematics and dynamics of numerous particles in a force field. The GPA has many advantages in terms of higher parallelism, multi-objective optimization, multi-type coordination, and ease of hardware implementation. During ATM network optimization, the GPA can deal with a variety of random and emergent phenomena, such as congestion, failure, and interaction. This paper also gives the GPA's properties regarding its correctness, convergence and stability. The simulations have shown the effectiveness and suitability of the GPA for the optimization of ATM networks.

Keywords: Optimal control; Asynchronous transfer mode; Channel allocation; Quality of service; Bandwidth; Kinematics; B-ISDN; Telecommunication traffic; Resource management; Parallel algorithms


Parallel implementation and computing efficiency analysis for NAPA code


Nanjing Hangkong Hangtian Daxue Xuebao/Journal of Nanjing University of Aeronautics and Astronautics, 2006, Vol. 38, No. 4, pp. 413-418

Authors: Jin, Jun; Liang, Dewang; Huang, Guoping; Lei, Yubing. Affiliation: College of Energy and Power Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

The development and validation of a parallel finite-difference Navier-Stokes solver, the NAPA code, for unsteady three-dimensional flow simulations on workstation clusters are described. The solver is parallelized using a divided-zone technique and the Message Passing Interface (MPI) communication library, and is validated on flow problems in a hypersonic inlet, around a micro aviation vehicle, and around a NACA0012 airfoil. Results for the NACA0012 problem from the original and the parallelized NAPA code are in agreement with the experimental data. All of the comparisons show that the parallel implementation of the NAPA code is successful. Experiments quantify the efficiency and speedup of the parallel code, and the results show that it performs well. Influencing factors, such as the communication ratio, load balance, and communication mode, are discussed. Finally, the code is tuned with Intel Cluster Tools and VTune, and the low-efficiency algorithm in the code is improved. The calculation time for the hypersonic flow is reduced by 55.33%.

Keywords: Parallel algorithms


Comparison of different parallel modified Gram-Schmidt algorithms


11th International Euro-Par Conference

Authors: Rünger, G.; Schwind, M. Affiliation: Tech Univ Chemnitz, Dept Comp Sci, D-09107 Chemnitz, Germany

ISBN: (print) 3540287000

The modified Gram-Schmidt algorithm (MGS) is used in many fields of computational science as a basic building block for problems in numerical linear algebra. In this paper we describe different parallel implementations (blocked and unblocked) of the MGS algorithm and show how computation and calculation overlap can increase performance by up to 38 percent on the two different cluster platforms that were used for performance evaluation.

Keywords: Parallel algorithms
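As background for the abstract above, here is a minimal sequential, unblocked sketch of the modified Gram-Schmidt algorithm in pure Python. It is not one of the paper's parallel implementations; the function name is hypothetical, and the input is assumed to be a list of linearly independent column vectors.

```python
import math

def modified_gram_schmidt(columns):
    """Orthonormalize column vectors; return (q, r) where r is upper triangular.

    Assumes the columns are linearly independent, so no norm is zero.
    """
    q = [[float(x) for x in v] for v in columns]
    n = len(q)
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Normalize the k-th vector.
        r[k][k] = math.sqrt(sum(x * x for x in q[k]))
        q[k] = [x / r[k][k] for x in q[k]]
        # Immediately remove its component from every remaining vector.
        # Updating the remaining vectors right away is what distinguishes
        # the modified variant from classical Gram-Schmidt.
        for j in range(k + 1, n):
            r[k][j] = sum(a * b for a, b in zip(q[k], q[j]))
            q[j] = [b - r[k][j] * a for a, b in zip(q[k], q[j])]
    return q, r
```

The inner loop over the remaining vectors has independent iterations for a fixed k, which is the natural place to distribute work across processors.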


An Empirical Analysis of Parallel Random Permutation Algorithms on SMPs


18th International Conference on Parallel and Distributed Computing Systems, PDCS 2005

Authors: Cong, Guojing; Bader, David A. Affiliations: T.J. Watson Research Center, IBM, Yorktown Heights, NY 10598, United States; College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, United States

ISBN: (print) 9781604234565

We compare parallel algorithms for random permutation generation on symmetric multiprocessors (SMPs). The algorithms considered are the sorting-based algorithm, Anderson’s shuffling algorithm, the dart-throwing algorithm, and Sanders’ algorithm. We investigate the impact of synchronization method, memory access pattern, cost of generating random numbers and other parameters on the performance of the algorithms. Within the range of inputs used and processors employed, Anderson’s algorithm is preferable due to its simplicity when random number generation is relatively costly, while Sanders’ algorithm has superior performance due to good cache performance when a fast random number generator is available. There is no definite winner across all settings. In fact, we predict that our new dart-throwing algorithm performs best when synchronization among processors becomes costly and memory access is relatively fast. We also compare the performance of our parallel implementations with the sequential implementation. It is unclear without extensive experimental studies whether fast parallel algorithms beat efficient sequential algorithms, due to the mismatch between model and architecture. Our implementations achieve speedups of up to 6 with 12 processors on the Sun E4500. © 2005 18th ISCA International Conference on Parallel and Distributed Computing Systems 2005, PDCS 2005. All rights reserved.

Keywords: Parallel algorithms
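To illustrate the kind of algorithm the abstract above compares, the sketch below shows a sequential form of the sorting-based approach (attach a random key to each element, then sort by key) next to a standard Fisher-Yates shuffle. Both are single-threaded illustrations; the function names are hypothetical, and the abstract does not say which sequential implementation the authors used as their baseline, so Fisher-Yates is an assumption here.

```python
import random

def sorting_based_permutation(n, rng):
    """Permute 0..n-1 by sorting elements on independently drawn random keys."""
    keyed = [(rng.random(), i) for i in range(n)]
    keyed.sort()
    return [i for _, i in keyed]

def fisher_yates_permutation(n, rng):
    """Permute 0..n-1 in place with the classic sequential shuffle."""
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        perm[i], perm[j] = perm[j], perm[i]
    return perm

rng = random.Random(2005)  # fixed seed only to make runs repeatable
p1 = sorting_based_permutation(10, rng)
p2 = fisher_yates_permutation(10, rng)
```

The sorting-based version draws all of its random numbers up front and then relies on a sort, while the shuffle interleaves random draws with scattered memory writes; these are the kinds of trade-offs (random number cost, memory access pattern) the abstract says the study measures.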
