ISBN: 9781728144993 (print)
The Internet of Things (IoT) will grow seamlessly with advancements in data and communication technologies, leading to the deployment of trillions of end devices. Its applications range from simple home automation to very large-scale industrial automation systems. The trend is leading towards huge data generation requiring high processing power. In the near future, computing resources might not be sufficient to handle this dynamic, massive data production. As the technology advances, the microcontrollers and Systems-on-Chip (SoCs) used in IoT end devices are becoming cheaper and more powerful. Hence, there is a need to make effective use of the huge number of underutilized IoT devices of the future by allocating additional microtasks to them in parallel, which would address the upcoming needs of this technological trend.
ISBN: 9781728190747 (digital)
ISBN: 9781728183824 (print)
In this paper, we present O3, a social link based private storage cloud for decentralized collaboration. O3 allows users to share and collaborate on collections of files (shared folders) with others in a decentralized manner, without the need for intermediaries (such as public cloud storage servers) to intervene. Thus, users can keep their working relationships (who they work with) and what they work on private from third parties. Using benchmarks and traces from real workloads, we experimentally evaluate O3 and demonstrate that the system scales linearly when synchronizing increasing numbers of concurrent users, while performing on par with the ext4 non-version-tracking filesystem.
Frequent Itemset Mining (FIM) from large-scale databases has emerged as an important problem in the data mining and knowledge discovery research community. However, FIM suffers from three important limitations with th...
With the rapid advancement of the World Wide Web, people can share their knowledge and information via online tools such as sharing systems and e-commerce applications. Many approaches have been proposed to process and org...
ISBN: 9781450362955 (print)
An increasing number of companies are using data analytics to improve their products, services, and business processes. However, learning knowledge effectively from massive data sets always involves nontrivial computational resources. Most businesses thus choose to migrate their hardware needs to a remote cluster computing service (e.g., AWS) or to an in-house cluster facility, which is often run at its resource capacity. In such scenarios, where jobs compete for available resources, utilizing resources effectively to achieve high-performance data analytics becomes desirable. Although cluster resource management is a fruitful research area that has made many advances (e.g., YARN, Kubernetes), few projects have investigated how further optimizations can be made specifically for training multiple machine learning (ML) / deep learning (DL) models. In this work, we introduce FlowCon, a system that monitors the loss functions of ML/DL jobs at runtime and uses them to make elastic decisions on resource configuration. We present a detailed design and implementation of FlowCon, and conduct intensive experiments over various DL models. Our experimental results show that FlowCon can strongly improve DL job completion time and resource utilization efficiency compared to existing approaches. Specifically, FlowCon can reduce the completion time by up to 42.06% for a specific job without sacrificing the overall makespan, in the presence of various DL job workloads.
ISBN: 9780999241141 (print)
Bayesian optimization has become a popular method for high-throughput computing, like the design of computer experiments or hyperparameter tuning of expensive models, where sample efficiency is mandatory. In these applications, distributed and scalable architectures are a necessity. However, Bayesian optimization is mostly sequential. Even parallel variants require certain computations between samples, limiting the parallelization bandwidth. Thompson sampling has previously been applied to distributed Bayesian optimization. But when compared with other acquisition functions in the sequential setting, Thompson sampling is known to perform suboptimally. In this paper, we present a new method for fully distributed Bayesian optimization, which can be combined with any acquisition function. Our approach considers Bayesian optimization as a partially observable Markov decision process. In this context, stochastic policies, such as the Boltzmann policy, have some interesting properties which can also be studied for Bayesian optimization. Furthermore, the Boltzmann policy trivially allows a distributed Bayesian optimization implementation with a high level of parallelism and scalability. We present results on several benchmarks and applications that show the performance of our method.
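To illustrate why a Boltzmann policy parallelizes so easily, the following is a minimal sketch, not the authors' implementation. It assumes the policy is a softmax over acquisition values evaluated on a finite candidate set, which the abstract does not spell out; the names `boltzmann_probabilities`, `sample_query`, and the `temperature` parameter are hypothetical.

```python
import math
import random


def boltzmann_probabilities(acquisition_values, temperature=1.0):
    """Return softmax probabilities of the acquisition values.

    Assumption: higher acquisition value means a more promising candidate.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [a / temperature for a in acquisition_values]
    shift = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_query(candidates, acquisition_values, temperature=1.0, rng=None):
    """Draw one candidate at random according to the Boltzmann probabilities."""
    rng = rng or random.Random()
    probs = boltzmann_probabilities(acquisition_values, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]
```

Because each draw is random, several workers could call `sample_query` independently with their own random generators and still tend to pick different query points, without coordinating between samples. A lower `temperature` concentrates the draws on the highest acquisition values.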
Sub-graphs can be used to recognize functional and non-functional characteristics in various graph applications. Sub-graph isomorphism is the problem of detecting an input graph inside a target graph. However, if...
ISBN: 9781728190457 (digital)
ISBN: 9781728190464 (print)
When using the compressed sensing method in Synthetic Aperture Radar (SAR) imaging, there are two major problems: long calculation time and insufficient scalability of the calculation ability. To solve these problems, this paper proposes a distributed imaging method for SAR compressed sensing imaging based on MapReduce. First, the sparse data is labeled; then the range and azimuth images are reconstructed by two MapReduce calculation processes. By exploiting the advantages of parallel computing, SAR compressed sensing imaging is accelerated.
ISBN: 9788412110111 (print)
Ideas from multi-level relaxation methods are combined with load balancing techniques to achieve a convergence acceleration for a homogeneous work load distribution over a given set of processors when the underlying work function is inhomogeneously distributed in space. The algorithm is based on an orthogonal recursive bisection approach which is evaluated via a hierarchically refined coarse integration. The method requires only a minimal information transfer across processors during the tree traversal steps. We describe how to map the system of processors onto geometrical space when global information is needed for the spatial tessellation.
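The basic idea of orthogonal recursive bisection can be sketched as follows. This is a serial simplification under stated assumptions, not the method of the paper: it balances work by sorting weighted points directly instead of using a hierarchically refined coarse integration, and it involves no inter-processor communication. The function name `orb_partition` and its arguments are hypothetical.

```python
def orb_partition(points, weights, n_parts):
    """Split weighted points into n_parts groups of roughly equal total weight.

    points  : list of coordinate tuples, all of the same dimension
    weights : per-point work estimates (assumed non-negative)
    n_parts : number of processors to distribute the points over
    Returns a list of n_parts lists of point indices.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    if not points:
        return [[] for _ in range(n_parts)]
    dims = len(points[0])

    def extent(indices, d):
        values = [points[i][d] for i in indices]
        return max(values) - min(values)

    def split(indices, parts):
        if parts == 1:
            return [indices]
        if not indices:
            return [[] for _ in range(parts)]
        # Cut orthogonally to the axis along which the points spread the most.
        axis = max(range(dims), key=lambda d: extent(indices, d))
        ordered = sorted(indices, key=lambda i: points[i][axis])
        left_parts = parts // 2
        # The left side should carry work in proportion to its processor count.
        target = sum(weights[i] for i in ordered) * left_parts / parts
        acc, cut = 0.0, 0
        for i in ordered:
            if acc + weights[i] / 2 > target:
                break
            acc += weights[i]
            cut += 1
        return (split(ordered[:cut], left_parts)
                + split(ordered[cut:], parts - left_parts))

    return split(list(range(len(points))), n_parts)
```

With an inhomogeneous work function the resulting cells differ in size but carry similar work; for instance, a single heavy point can end up alone in its cell while several light points share another.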
ISBN: 9783030009793; 9783030009786 (print)
Recently, heterogeneous network mining has gained tremendous attention from researchers due to its wide range of applications. Link prediction is one of the most important tasks in information network mining. In the past, most networked data mining approaches were applied mainly to homogeneous networks, which consist of single-typed objects and links. Moreover, challenges remain in thoroughly evaluating the content of linked objects, which is important for predicting potential relationships between objects. A common example is predicting co-authorship in bibliographic networks such as DBLP and DBIS. An author who is interested in the "data mining" field tends to cooperate with other authors who contribute to that field. Hence, predicting co-authorship between authors who work on "data mining" and others who work on "hardware" is of little use. Moreover, for large-scale networks, traditional standalone computing is not affordable because of its poor time performance. To overcome these challenges, in this paper we propose a topic-driven meta-path-based prediction approach for heterogeneous networks, called T-MPP, which is implemented in the distributed computing environment of Spark. T-MPP not only discovers potential relationships in a given bibliographic network but also captures the topic similarity between authors. We present experiments on a real-world DBLP network. The results show that the proposed T-MPP model generates more accurate predictions than previous approaches.
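As a rough illustration of what a meta-path feature looks like, the sketch below counts path instances of the Author-Paper-Venue-Paper-Author meta-path between two authors. It is a generic path count on a single machine, not T-MPP: the abstract does not state which meta-paths or topic model are used, and the Spark implementation is not reproduced here. The input layout and the name `apvpa_path_count` are assumptions made for the example.

```python
from collections import Counter, defaultdict


def apvpa_path_count(papers, author_a, author_b):
    """Count Author-Paper-Venue-Paper-Author path instances between two authors.

    papers : list of (authors, venue) tuples, one per paper (hypothetical layout)
    """
    # For every author, count how many of their papers appeared at each venue.
    papers_per_venue = defaultdict(Counter)
    for authors, venue in papers:
        for author in set(authors):
            papers_per_venue[author][venue] += 1
    venues_a = papers_per_venue[author_a]
    venues_b = papers_per_venue[author_b]
    # Each pairing of a paper by A and a paper by B at the same venue is one path.
    return sum(n * venues_b[venue] for venue, n in venues_a.items())
```

A higher count suggests the two authors publish in the same places, which is the kind of signal a meta-path-based link predictor can use as a feature; a topic-driven variant would additionally weigh how similar the authors' topics are.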