Search Results - Inner Mongolia University Library

Multiple pass streaming algorithms for learning mixtures of distributions in R^d

THEORETICAL COMPUTER SCIENCE, 2009, vol. 410, no. 19, pp. 1765-1780

Authors: Chang, Kevin L. Affiliation: Yahoo! Inc., Yahoo! Labs, Sunnyvale, CA 94089, USA

We present a multiple-pass streaming algorithm for learning the density function of a mixture of k uniform distributions over rectangles in R^d, for any d > 0. Our learning model is as follows: samples drawn according to the mixture are placed in arbitrary order in a data stream that may only be accessed sequentially by an algorithm with a very limited random access memory space. Our algorithm makes 2l + 2 passes, for any l > 0, and requires memory at most Õ(ε^(-2/l) k^2 d^4 + (4k)^d), where ε is the tolerable error of the algorithm. This exhibits a strong memory-pass tradeoff in terms of ε: a few more passes significantly lower the memory requirements, thus trading one of the two most important resources in streaming computation for the other. Chang and Kannan first considered this problem for d = 1, 2. Our learning algorithm is especially appropriate for situations where massive data sets of samples are available, but computation with such large inputs requires very restricted models of computation. (C) 2009 Elsevier B.V. All rights reserved.

Keywords: streaming algorithm; machine learning; mixture model; computational learning theory
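
As a rough illustration of the memory-pass tradeoff stated in the abstract above, the following sketch evaluates the dominant terms of the bound, ε^(-2/l) k^2 d^4 + (4k)^d, for a few values of l. It drops the polylogarithmic factors and constants hidden by the Õ notation, so the numbers only indicate scaling. The function names and the sample parameters are illustrative assumptions and do not come from the paper.

```python
def memory_bound(eps: float, k: int, d: int, l: int) -> float:
    """Dominant terms of the stated bound (polylog factors and constants ignored)."""
    if l < 1:
        raise ValueError("l must be a positive integer")
    return eps ** (-2.0 / l) * k**2 * d**4 + (4 * k) ** d


def passes(l: int) -> int:
    """Number of passes for a given l, as stated in the abstract (2l + 2)."""
    return 2 * l + 2


if __name__ == "__main__":
    eps, k, d = 1e-3, 3, 2  # hypothetical parameters, chosen only for illustration
    for l in (1, 2, 3, 4):
        print(f"l={l}: passes={passes(l)}, bound ~ {memory_bound(eps, k, d, l):.3g}")
```

The ε-dependent term shrinks from ε^(-2) at l = 1 to ε^(-1) at l = 2, while the (4k)^d term does not depend on the number of passes.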

Multiple pass streaming algorithms for learning mixtures of distributions in R^d

18th International Conference on Algorithmic Learning Theory

Authors: Chang, Kevin L. Affiliation: Yahoo! Inc., Yahoo! Labs, Sunnyvale, CA 94089, USA

ISBN: (print) 9783540752240

Keywords: streaming algorithm; machine learning; mixture model; computational learning theory

Development of an HL7 interface engine, based on tree structure and streaming algorithm, for large-size messages which include image

COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2005, vol. 80, no. 2, pp. 126-140

Authors: Um, KS; Kwak, YS; Cho, H; Kim, IK. Affiliations: NCI Ctr Bioinformat, NIH, Rockville, MD 20852, USA; Kyungpook Natl Univ, Sch Med, Dept Med Informat, Taegu 700422, South Korea

A basic assumption of the Health Level Seven (HL7) protocol is 'No limitation of message length'. However, most existing commercial HL7 interface engines do limit message length because they use the string array method, which runs in main memory during HL7 message parsing. Specifically, messages with image and multimedia data create a long string array and thus cause critical and fatal problems for the computer system. Consequently, HL7 messages cannot handle the image and multimedia data necessary in modern medical records. This study aims to solve this problem with the 'streaming algorithm' method. This new method for HL7 message parsing applies a character-stream object that processes data character by character between main memory and the hard disk, so that the processing load on main memory is alleviated. The main functions of this new engine are generating, parsing, validating, browsing, sending, and receiving HL7 messages. Also, the engine can parse and generate XML-formatted HL7 messages. This new HL7 engine successfully exchanged HL7 messages with 10-megabyte images and discharge summary information between two university hospitals. Published by Elsevier Ireland Ltd.

Keywords: HL7 interface engine; discharge summary; MIME; tree structure; streaming algorithm
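
The character-stream idea in the abstract above can be sketched as a generator that reads a message in small chunks and yields one segment at a time instead of building a string array for the whole message. This is only a minimal illustration, not the engine described in the paper: it assumes HL7 v2 style segments terminated by a carriage return, the function name `iter_segments` and the sample message are hypothetical, and a single very large segment (for example one carrying an embedded image) would still need to be spooled to disk separately.

```python
import io
from typing import Iterator, List, TextIO


def iter_segments(stream: TextIO, chunk_size: int = 4096) -> Iterator[str]:
    """Yield segments one at a time from a character stream.

    Only the current segment is buffered in memory. Assumes segments are
    terminated by a carriage return, as in HL7 v2 encoding.
    """
    buffer: List[str] = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for ch in chunk:
            if ch == "\r":
                if buffer:
                    yield "".join(buffer)
                    buffer = []
            else:
                buffer.append(ch)
    if buffer:  # last segment without a trailing terminator
        yield "".join(buffer)


if __name__ == "__main__":
    # Hypothetical sample message, for illustration only.
    sample = "MSH|^~\\&|APP_A|HOSP_A|APP_B|HOSP_B\rPID|1||12345\rOBX|1|ED|IMG\r"
    for segment in iter_segments(io.StringIO(sample), chunk_size=8):
        print(segment.split("|", 1)[0], len(segment))
```

In practice `stream` would be an open file or socket wrapper, so memory use is bounded by the chunk size plus the longest segment.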

Real-Time Detection of Invisible Spreaders

IEEE Global Telecommunications Conference (GLOBECOM 08)

Authors: Yoon, MyungKeun; Chen, Shigang. Affiliation: Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611, USA

ISBN: (print) 9781424423248

Detecting spreaders can help an intrusion detection system identify potential attackers. Existing work can only detect aggressive spreaders that scan a large number of distinct addresses in a short period of time. However, stealthy spreaders may deliberately scan at a low rate. We observe that these spreaders can easily evade detection because their small traffic footprint is covered by the large amount of normal background traffic, which frequently flushes any spreader information out of the intrusion detection system's memory. We propose a new streaming scheme to detect stealthy spreaders that are invisible to current systems. The new scheme stores information about normal traffic within a limited portion of the allocated memory, so that it does not interfere with spreaders' information stored elsewhere in the memory. The proposed scheme is lightweight; it can detect invisible spreaders in high-speed networks while residing in SRAM. Through experiments using real Internet traffic traces, we demonstrate that our new scheme detects invisible spreaders efficiently while keeping both false positives (normal sources misclassified as spreaders) and false negatives (spreaders misclassified as normal sources) at a low level.

Keywords: stealthy spreader; network security; intrusion detection; streaming algorithm

Estimating Cardinality Distributions in Network Traffic

International Conference on Measurement and Modeling of Computer Systems

Authors: Chen, Aiyou; Li, Li; Cao, Jin. Affiliation: Bell Labs, Alcatel Lucent Technol, Murray Hill, NJ 07974, USA

ISBN: (print) 9781605580050

Information on network host connectivity patterns is important for network monitoring and traffic engineering. In this paper, an efficient streaming algorithm is proposed to estimate cardinality distributions including connectivity distributions, e.g. percent of hosts with any given number of distinct communicating peers or flows.

Keywords: cardinality distribution; streaming algorithm
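
To make the estimated quantity concrete, the sketch below computes a connectivity distribution exactly: for each number of distinct peers, the fraction of hosts that have that many. This is a naive baseline and not the streaming algorithm of the paper; it stores every distinct (host, peer) pair, which is the memory cost a streaming estimator is meant to avoid. The function name and the sample flow records are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def cardinality_distribution(flows: Iterable[Tuple[str, str]]) -> Dict[int, float]:
    """Exact fraction of hosts having each number of distinct peers."""
    peers: Dict[str, Set[str]] = defaultdict(set)
    for host, peer in flows:
        peers[host].add(peer)  # repeated (host, peer) pairs count once
    counts = Counter(len(p) for p in peers.values())
    total_hosts = len(peers)
    return {card: n / total_hosts for card, n in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical flow records: (source host, destination peer).
    sample = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "x"), ("c", "z")]
    print(cardinality_distribution(sample))
```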

Estimating cardinality distributions in network traffic: extended abstract

Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems

Authors: Aiyou Chen; Li Li; Jin Cao. Affiliation: Bell Labs, Alcatel-Lucent Technologies, Murray Hill, NJ, USA

ISBN: (print) 9781605580050

Information on network host connectivity patterns is important for network monitoring and traffic engineering. In this paper, an efficient streaming algorithm is proposed to estimate cardinality distributions including connectivity distributions, e.g. percent of hosts with any given number of distinct communicating peers or flows.

Keywords: streaming algorithm; cardinality distribution

Analysis of data streams: Computational and algorithmic challenges

TECHNOMETRICS, 2007, vol. 49, no. 3, pp. 346-356

Authors: Gilbert, A. C.; Strauss, M. J. Affiliation: Univ Michigan, Dept Math, Ann Arbor, MI 48109, USA

Over the past 15 years, our ability to collect massive data sets has increased dramatically. Concomitantly, our need to process, compress, store, analyze, and summarize these data sets has grown as well. Scientific, engineering, medical, and industrial applications require that we carry out these tasks efficiently and reasonably accurately. Data streams are one type of modern massive data sets, characterized by their size and by their distributed and dynamic properties. We give an expository discussion of data stream models and the algorithmic challenges that these models pose for computational statistical analysis, then present an overview of three streaming algorithms and a discussion of the computational challenges with each.

Keywords: approximation; approximation algorithm; streaming algorithm; sublinear algorithm

Finding frequent items in data streams

THEORETICAL COMPUTER SCIENCE, 2004, vol. 312, no. 1, pp. 3-15

Authors: Charikar, M; Chen, K; Farach-Colton, M. Affiliations: Princeton Univ, Dept Comp Sci, Princeton, NJ 08544, USA; Univ Calif Berkeley, Div Comp Sci, Berkeley, CA 94720, USA; Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08855, USA

We present a 1-pass algorithm for estimating the most frequent items in a data stream using limited storage space. Our method relies on a data structure called a COUNT SKETCH, which allows us to reliably estimate the frequencies of frequent items in the stream. Our algorithm achieves better space bounds than the previously known best algorithms for this problem for several natural distributions on the item frequencies. In addition, our algorithm leads directly to a 2-pass algorithm for the problem of estimating the items with the largest (absolute) change in frequency between two data streams. To our knowledge, this latter problem has not been previously studied in the literature. (C) 2003 Elsevier B.V. All rights reserved.

Keywords: frequent items; streaming algorithm; approximation
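
The COUNT SKETCH named in the abstract above keeps several rows of counters. Each item is hashed to one bucket per row together with a +1 or -1 sign, and its frequency is estimated as the median of the signed counters across rows. The following is a minimal sketch under stated assumptions: `hashlib.blake2b` stands in for the pairwise-independent hash families used in the analysis, and the class name, `depth`, and `width` are illustrative choices rather than values from the paper.

```python
import hashlib
import statistics
from typing import Hashable, List, Tuple


class CountSketch:
    """Illustrative Count Sketch: `depth` rows of `width` signed counters."""

    def __init__(self, depth: int = 5, width: int = 256) -> None:
        self.depth = depth
        self.width = width
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, row: int, item: Hashable) -> Tuple[int, int]:
        # Assumption: one digest per (row, item) supplies both the bucket
        # index and the sign; the lowest bit picks the sign.
        data = repr((row, item)).encode("utf-8")
        value = int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")
        sign = 1 if value & 1 else -1
        bucket = (value >> 1) % self.width
        return bucket, sign

    def update(self, item: Hashable, count: int = 1) -> None:
        """Add `count` occurrences of `item` (negative counts are allowed)."""
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            self.table[row][bucket] += sign * count

    def estimate(self, item: Hashable) -> float:
        """Median over rows of the signed counter that `item` maps to."""
        values = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            values.append(sign * self.table[row][bucket])
        return statistics.median(values)


if __name__ == "__main__":
    sketch = CountSketch()
    stream = ["a"] * 100 + ["b"] * 50 + [f"x{i}" for i in range(20)]
    for item in stream:
        sketch.update(item)
    print("estimate for 'a':", sketch.estimate("a"))
    print("estimate for 'b':", sketch.estimate("b"))
```

Because every update is additive, feeding one stream with positive counts and a second stream with negative counts leaves a sketch of the frequency differences, which is the property the 2-pass change-detection result builds on.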

Finding frequent items in data streams

29th International Colloquium on Automata, Languages and Programming

ISBN: (print) 3540438645

Keywords: frequent items; streaming algorithm; approximation

Better streaming algorithms for clustering problems

Proceedings of the thirty-fifth annual ACM symposium on Theory of computing

Authors: Moses Charikar; Liadan O'Callaghan; Rina Panigrahy. Affiliations: Princeton University; Stanford University; Cisco Systems

ISBN: (print) 9781581136746

We study clustering problems in the streaming model, where the goal is to cluster a set of points by making one pass (or a few passes) over the data using a small amount of storage space. Our main result is a randomized algorithm for the k-Median problem which produces a constant factor approximation in one pass using storage space O(k polylog n). This is a significant improvement over the previous best algorithm, which yielded a 2^O(1/ε) approximation using O(n^ε) space. Next we give a streaming algorithm for the k-Median problem with an arbitrary distance function. We also study algorithms for clustering problems with outliers in the streaming model. Here, we give bicriterion guarantees, producing constant factor approximations by increasing the allowed fraction of outliers slightly.

Keywords: clustering; k-median; streaming algorithm
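
For reference, the objective behind the approximation factors in the abstract above is the k-median cost: the sum, over all points, of the distance to the nearest chosen center. The sketch below only evaluates that cost for a given set of centers under Euclidean distance; it is not the streaming algorithm of the paper, and the function name and sample points are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def k_median_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum over points of the Euclidean distance to the nearest center."""
    if not centers:
        raise ValueError("at least one center is required")
    return sum(min(math.dist(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]  # hypothetical data
    print(k_median_cost(pts, [(0.0, 0.0), (10.0, 0.0)]))
```

A constant factor approximation returns k centers whose cost is within a fixed multiple of the minimum of this quantity over all choices of k centers.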
