ISBN (print): 9781538678992
Distributed dataflow systems have been developed to help users analyze and process large datasets. While they make it easier for users to develop massively-parallel programs, users still have to choose the amount of resources for the execution of their jobs. Yet, users do not necessarily understand workload and system dynamics, while they often have constraints like runtime targets and budgets. Addressing this problem, systems have been developed that automatically select the required amount of resources to fulfill the users' constraints. However, interference with co-located workloads can introduce significant variance into job runtimes and make accurate runtime prediction harder. This paper presents CoBell, a resource allocation system that incorporates information about co-located workloads to improve the runtime prediction for jobs in shared clusters. CoBell receives jobs from users with runtime and scale-out constraints and then reserves resources based on predicted runtimes. We implemented CoBell as a job submission tool for YARN; as such, it works with existing YARN cluster setups. The paper evaluates CoBell using five different distributed dataflow jobs, showing that with CoBell the runtime constraints are never exceeded by more than 7.2%.
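The abstract gives no implementation details; as a rough sketch of the underlying idea, the fragment below fits a simple scale-out model that includes a co-location interference feature, then reserves the smallest number of containers predicted to meet a runtime target. The model form and all names are illustrative assumptions, not CoBell's actual code.

```python
import numpy as np

def fit_scaleout_model(containers, interference, runtimes):
    """Least-squares fit of runtime ~ a/containers + b + c*interference."""
    X = np.column_stack([1.0 / containers, np.ones_like(containers), interference])
    coef, *_ = np.linalg.lstsq(X, runtimes, rcond=None)
    return coef  # (a, b, c)

def pick_containers(coef, interference_now, target_s, max_containers=64):
    """Smallest container count whose predicted runtime meets the target."""
    a, b, c = coef
    for n in range(1, max_containers + 1):
        if a / n + b + c * interference_now <= target_s:
            return n
    return max_containers  # target not reachable; fall back to the cap

# Toy history from previous runs of the same recurring job (hypothetical data).
containers   = np.array([2.0, 4, 8, 16, 8])
interference = np.array([0.1, 0.1, 0.2, 0.3, 0.8])    # e.g. co-located CPU load
runtimes     = np.array([900.0, 480, 270, 170, 350])  # seconds

coef = fit_scaleout_model(containers, interference, runtimes)
print("reserve", pick_containers(coef, interference_now=0.4, target_s=300.0), "containers")
```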
ISBN (print): 9781538619964
Resource management systems like YARN or Mesos enable users to share cluster infrastructures by running analytics jobs in temporarily reserved containers. These containers are typically not isolated, in order to achieve a high degree of overall resource utilization despite the often fluctuating resource usage of single analytics jobs. However, some combinations of jobs utilize the resources better and interfere less with each other when running on the same nodes than other combinations do. This paper presents an approach for improving resource utilization and job throughput when scheduling recurring data analysis jobs in shared cluster environments. Using a reinforcement learning algorithm, the scheduler continuously learns which jobs are best executed simultaneously on the cluster. Our evaluation of an implementation built on Hadoop YARN shows that this approach can increase resource utilization and decrease job runtimes. While interference between jobs can be avoided, co-locations of jobs with complementary resource usage are not yet always fully recognized. However, with a better measure of co-location goodness, our solution can be used to automatically adapt the scheduling to workloads with recurring batch jobs.
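The abstract does not spell out the reinforcement learning algorithm; a heavily simplified stand-in for the idea is an epsilon-greedy bandit over job-type pairs whose reward is some measured co-location goodness (for instance, utilization or relative speedup). Everything below, including the class and job names, is a hypothetical sketch.

```python
import random
from collections import defaultdict

class CoLocationScheduler:
    """Epsilon-greedy bandit over job pairs (illustrative, not the paper's code)."""

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.value = defaultdict(float)   # estimated goodness per job pair
        self.count = defaultdict(int)

    def pick_partner(self, running_job, queued_jobs):
        if random.random() < self.epsilon:               # explore
            return random.choice(queued_jobs)
        return max(queued_jobs,                          # exploit best known pair
                   key=lambda j: self.value[frozenset((running_job, j))])

    def update(self, job_a, job_b, reward):
        """Incremental mean of the observed co-location goodness."""
        key = frozenset((job_a, job_b))
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]

sched = CoLocationScheduler()
sched.update("etl", "ml-train", reward=0.8)  # measured goodness of past runs
sched.update("etl", "sql-agg", reward=0.3)
print(sched.pick_partner("etl", ["ml-train", "sql-agg"]))
```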
ISBN (print): 9781538606926
Distributed dataflow systems like MapReduce, Spark, and Flink help users analyze large datasets with a set of cluster resources. Performance modeling and runtime prediction are then used to automatically allocate resources for specific performance goals. However, the actual performance of distributed dataflow jobs can vary significantly due to factors like interference with co-located workloads, varying degrees of data locality, and failures. We address this problem with Ellis, a system that allocates an initial set of resources for a specific runtime target, yet also continuously monitors a job's progress towards the target and, if necessary, dynamically adjusts the allocation. For this, Ellis models the scale-out behavior of individual stages of distributed dataflow jobs based on previous executions. Our evaluation of Ellis with iterative Spark jobs shows that dynamic adjustments can reduce the number of constraint violations by 30.7-75.0% and the magnitude of constraint violations by 70.6-94.5%.
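A minimal sketch of the dynamic-adjustment idea, assuming per-stage runtime models of the form t_i(n) = a_i/n + b_i learned from previous executions (the abstract does not specify Ellis's actual model): after each stage, re-check whether the remaining stages still fit the runtime target at the current scale-out, and rescale if not.

```python
def predict_remaining(stage_models, n):
    """Predicted total runtime of the remaining stages at scale-out n."""
    return sum(a / n + b for a, b in stage_models)

def adjust_scaleout(stage_models, elapsed_s, target_s, current_n, max_n=64):
    """Smallest allocation expected to still meet the runtime target."""
    remaining_budget = target_s - elapsed_s
    for n in range(current_n, max_n + 1):
        if predict_remaining(stage_models, n) <= remaining_budget:
            return n
    return max_n   # best effort: the target will likely be missed anyway

# Two stages left, each modeled from previous runs as (a_i, b_i); toy numbers.
stages_left = [(1200.0, 30.0), (600.0, 20.0)]
print(adjust_scaleout(stages_left, elapsed_s=400.0, target_s=700.0, current_n=4))
```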
ISBN (print): 9781450341974
While significant progress has been made separately on analytics systems for scalable stochastic gradient descent (SGD) and private SGD, none of the major scalable analytics frameworks have incorporated differentially private SGD. There are two inter-related issues behind this disconnect between research and practice: (1) low model accuracy due to the noise added to guarantee privacy, and (2) high development and runtime overhead of the private algorithms. This paper takes a first step to remedy this disconnect and proposes a private SGD algorithm that addresses both issues in an integrated manner. In contrast to the white-box approach adopted by previous work, we revisit and use the classical technique of output perturbation to devise a novel "bolt-on" approach to private SGD. While our approach trivially addresses (2), it makes (1) even more challenging. We address this challenge by providing a novel analysis of the L2-sensitivity of SGD, which allows, under the same privacy guarantees, better convergence of SGD when only a constant number of passes can be made over the data. We integrate our algorithm, as well as other state-of-the-art differentially private SGD algorithms, into Bismarck, a popular scalable SGD-based analytics system on top of an RDBMS. Extensive experiments show that our algorithm can be easily integrated, incurs virtually no overhead, scales well, and, most importantly, yields substantially better (up to 4x) test accuracy than the state-of-the-art algorithms on many real datasets.
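A minimal sketch of output perturbation as a bolt-on step, assuming a squared-loss SGD and a placeholder sensitivity constant (the paper's contribution is precisely a tighter analytical bound on this L2-sensitivity): run ordinary SGD unchanged, then add noise whose density is proportional to exp(-epsilon * ||b|| / sensitivity), sampled as a uniform direction with a Gamma-distributed norm.

```python
import numpy as np

def sgd(X, y, lr=0.05, epochs=5):
    """Plain SGD for squared loss; unchanged by the privacy step."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w -= lr * (xi @ w - yi) * xi
    return w

def l2_noise(d, sensitivity, epsilon, rng):
    """Noise with density ~ exp(-epsilon*||b||/sensitivity):
    uniform direction, Gamma(d, sensitivity/epsilon)-distributed norm."""
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    return rng.gamma(shape=d, scale=sensitivity / epsilon) * direction

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

w = sgd(X, y)
sensitivity = 0.5   # placeholder; the paper derives an analytical bound
w_private = w + l2_noise(len(w), sensitivity, epsilon=1.0, rng=rng)
print("non-private:", w, "\nprivate:    ", w_private)
```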
ISBN (print): 9781467390057
Distributed dataflow systems like Spark and Flink allow users to analyze large datasets using clusters of computers. These frameworks provide automatic program parallelization and manage distributed workers, including worker failures. Moreover, they provide high-level programming abstractions and execute programs efficiently. Yet, the programming abstractions remain textual while the dataflow model is essentially a graph of transformations, so there is a mismatch between the presented abstraction and the underlying model. One can also argue that developing dataflow programs with these textual abstractions requires needless amounts of coding and coding skill. A dedicated programming environment could instead allow constructing dataflow programs more interactively and visually. In this paper, we therefore investigate how visual programming can make the development of parallel dataflow programs more accessible. In particular, we built a prototypical visual programming environment for Flink, which we call Flision. Flision provides a graphical user interface for creating dataflow programs, a code generation engine that generates code for Flink, and seamless deployment to a connected cluster. Users of this environment can effectively create jobs by dragging, dropping, and visually connecting operator components. To evaluate the applicability of this approach, we interviewed ten potential users. Our impressions from this qualitative user testing strengthened our belief that visual programming can be a valuable tool for users of scalable data analysis tools.
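Flision's code generation engine targets Flink's actual API; purely as a toy illustration of the general canvas-to-code idea, the sketch below turns a linear list of canvas operators into a pipeline string. The operator set and the emitted calls are invented for the example, not Flision's.

```python
# Hypothetical templates for a few canvas operators; {arg} is filled from
# whatever the user typed into the operator's property field.
TEMPLATES = {
    "source": "env.read_text('{arg}')",
    "map":    ".map(lambda x: {arg})",
    "filter": ".filter(lambda x: {arg})",
    "sink":   ".write_text('{arg}')",
}

def generate(pipeline):
    """pipeline: ordered list of (operator, argument) pairs from the canvas."""
    return "".join(TEMPLATES[op].format(arg=arg) for op, arg in pipeline)

canvas = [("source", "in.txt"), ("map", "x.lower()"),
          ("filter", "'error' in x"), ("sink", "out.txt")]
print(generate(canvas))
```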
ISBN (print): 9781509052523
Distributed dataflow systems like Spark or Flink enable users to analyze large datasets. Users create programs by providing sequential user-defined functions for a set of well-defined operations, select a set of resources, and the systems automatically distribute the jobs across these resources. However, selecting resources for specific performance needs is inherently difficult, and users consequently tend to overprovision, which results in poor cluster utilization. At the same time, many important jobs are executed recurringly in production clusters. This paper presents Bell, a practical system that monitors job execution, models the scale-out behavior of jobs based on previous runs, and selects resources according to user-provided runtime targets. Bell automatically chooses between different runtime prediction models to optimally support different distributed dataflow systems. Bell is implemented as a job submission tool for YARN and, thus, works with existing cluster setups. We evaluated Bell's runtime prediction with six exemplary data analytics jobs using both Spark and Flink. We present the learned scale-out models for these jobs and evaluate the relative prediction error using cross-validation, showing that our model selection approach provides better overall performance than the individual prediction models.
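The model selection via cross-validation can be illustrated with a small sketch: compare a parametric scale-out model against a simple nonparametric one by leave-one-out error and keep whichever predicts better. Both candidate models here are assumptions for illustration, not necessarily the ones Bell uses.

```python
import numpy as np

def parametric(train_x, train_y, x):
    """runtime ~ a/x + b, fit by least squares on the training runs."""
    A = np.column_stack([1.0 / train_x, np.ones_like(train_x)])
    a, b = np.linalg.lstsq(A, train_y, rcond=None)[0]
    return a / x + b

def nearest_neighbor(train_x, train_y, x):
    """Predict the runtime of the most similar previous scale-out."""
    return train_y[np.argmin(np.abs(train_x - x))]

def loo_error(model, xs, ys):
    """Mean relative leave-one-out prediction error."""
    errs = []
    for i in range(len(xs)):
        mask = np.arange(len(xs)) != i
        errs.append(abs(model(xs[mask], ys[mask], xs[i]) - ys[i]) / ys[i])
    return np.mean(errs)

# Toy scale-out history: (containers, runtime in seconds).
xs = np.array([2.0, 4, 8, 16, 32])
ys = np.array([880.0, 460, 260, 160, 120])
best = min((parametric, nearest_neighbor), key=lambda m: loo_error(m, xs, ys))
print("selected model:", best.__name__)
```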
There is an increasing demand from businesses and industries to make the best use of their data. Clustering is a powerful tool for discovering natural groupings in data. The k-means algorithm is the most commonly used data clustering method, having gained popularity for its effectiveness on various data sets and ease of implementation on different computing architectures. It assumes, however, that data are available in an attribute-value format, and that each data instance can be represented as a vector in a feature space where the algorithm can be applied. These assumptions are impractical for real data, and they hinder the use of complex data structures in real-world clustering applications. The kernel k-means is an effective method for data clustering which extends the k-means algorithm to work on a similarity matrix over complex data structures. The kernel k-means algorithm is, however, computationally very complex as it requires the complete kernel matrix to be calculated and stored. Further, the kernelized nature of the kernel k-means algorithm hinders the parallelization of its computations on modern infrastructures for distributed computing. This thesis defines a family of kernel-based low-dimensional embeddings that allows for scaling kernel k-means on MapReduce via an efficient and unified parallelization scheme. In addition, three practical methods for low-dimensional embedding that adhere to our definition of the embedding family are presented. Combining the proposed parallelization strategy with any of the three embedding methods constitutes a complete scalable and efficient MapReduce algorithm for kernel k-means. The efficiency and the scalability of the presented algorithms are demonstrated analytically and empirically.
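One well-known member of such a family of kernel-based low-dimensional embeddings is the Nystrom embedding, shown below purely as an illustration (the thesis's own embedding methods may differ): embed the data using a small set of landmark points, after which ordinary k-means, which parallelizes naturally over data partitions, can replace kernel k-means without ever forming the full kernel matrix.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """Gaussian kernel matrix between the row sets A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_embed(X, landmarks, gamma=0.5):
    """Map X into a low-dimensional space approximating the kernel features."""
    C = rbf(X, landmarks, gamma)             # n x m cross-kernel block
    W = rbf(landmarks, landmarks, gamma)     # m x m landmark kernel
    vals, vecs = np.linalg.eigh(W)
    vals = np.clip(vals, 1e-12, None)        # guard against tiny eigenvalues
    return C @ vecs / np.sqrt(vals)          # n x m embedding (up to rotation)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
landmarks = X[rng.choice(len(X), size=10, replace=False)]
Z = nystrom_embed(X, landmarks)
# Plain k-means (e.g. scikit-learn's) on Z now stands in for kernel k-means.
print(Z.shape)
```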
ISBN (print): 9781479915439; 9781479915460
Scalable visualization of big data is highly desired in several fields. A demonstrative example is annotated genetic data, where the DNA sequence can be better visualized using space-filling fractal curves. Though such approaches are already available in the literature, they are not applied routinely in clinical practice. The reason for this is that clinicians use well-established, but rather old, gene browsers in their work. Naturally, these browsers lack state-of-the-art programming support and also visualization features. To help with this problem, in this paper we propose extending a very widely used genome browser with scalable visualization techniques. The motivation for this work came from our clinical partner, since a good overview of the whole annotated DNA segment can lead to the recognition of currently unknown relations between genes. We explain what steps were needed to add the new visualization feature to the browser in terms of graphical elements, data extraction, and a scalable user interface.
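The paper's browser extension is not reproduced here; as an illustration of the space-filling-curve idea it builds on, the classic Hilbert-curve mapping below converts a 1-D position (such as a base-pair index) into 2-D coordinates, so that nearby sequence positions remain nearby on screen.

```python
def hilbert_d2xy(side, d):
    """Map index d on a Hilbert curve over a side x side grid to (x, y).
    side must be a power of two; standard iterative index-to-point mapping."""
    x = y = 0
    s = 1
    while s < side:
        rx = 1 & (d // 2)
        ry = 1 & (d ^ rx)
        if ry == 0:                          # rotate/flip the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        d //= 4
        s *= 2
    return x, y

# Lay out the first 16 positions of a sequence on a 4x4 grid.
print([hilbert_d2xy(4, i) for i in range(16)])
```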