ISBN:
(Print) 9783662486832; 9783662486825
Large amounts of data processing bring parallel computing into sharp focus. To solve the problem of sorting large-scale data in the internet era, a large-scale distributed sorting algorithm based on cloud computing is proposed. The algorithm uses the ideas of quick-sort and merge-sort to sort and integrate the data on each cloud, making the best use of the clouds' computing and storage resources. By taking advantage of parallel computing, the algorithm reduces computing time and improves sorting efficiency. Its effectiveness was verified by evaluating its time complexity and by a simulation test.
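As a rough single-machine illustration of the idea described above (not the authors' implementation), the following sketch sorts local chunks in parallel, with threads standing in for clouds, and then combines the sorted runs with a merge-sort-style merge. The function name and chunking scheme are assumptions for illustration.

```python
# Sketch: parallel local sort per "cloud", then a global k-way merge.
import heapq
from concurrent.futures import ThreadPoolExecutor

def distributed_sort(data, num_clouds=4):
    """Split data across clouds, sort each locally, then merge the runs."""
    chunks = [data[i::num_clouds] for i in range(num_clouds)]
    with ThreadPoolExecutor(max_workers=num_clouds) as pool:
        runs = list(pool.map(sorted, chunks))   # local quick-sort stand-in
    return list(heapq.merge(*runs))             # global merge step

print(distributed_sort([9, 1, 8, 2, 7, 3, 6, 4, 5]))
```

`heapq.merge` streams the merged output, so the final combine step never materializes more than one element per run at a time.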
Finite mixture models have been widely used for the modelling and analysis of data from heterogeneous populations. Maximum likelihood estimation of the parameters is typically carried out via the Expectation-Maximization (EM) algorithm. The complexity of the implementation of the algorithm depends on the parametric distribution that is adopted as the component densities of the mixture model. In the case of the skew normal and skew t-distributions, for example, the E-step would involve complicated expressions that are computationally expensive to evaluate. This can become quite time-consuming for large and/or high-dimensional datasets. In this paper, we develop a multithreaded version of the EM algorithm for the fitting of finite mixture models. Due to the structure of the algorithm for these models, the E- and M-steps can be easily reformulated to be executed in parallel across multiple threads to take advantage of the processing power available in modern-day multicore machines. Our approach is simple and easy to implement, requiring only small changes to standard code. To illustrate the approach, we focus on a fairly general mixture model that includes as special or limiting cases some of the most commonly used mixture models including the normal, t-, skew normal, and skew t-mixture models. The performance gain with our approach is illustrated using two real datasets.
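A minimal sketch of the parallelization idea, assuming a 1-D two-component normal mixture (far simpler than the skew t-mixtures the paper targets, and not the authors' code): the E-step, which dominates the cost, is evaluated in parallel over data chunks, and the M-step applies the usual closed-form updates.

```python
# Sketch: multithreaded E-step for a two-component 1-D normal mixture.
import math
from concurrent.futures import ThreadPoolExecutor

def npdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def e_step(chunk, params):
    """Responsibility of component 1 for each point in one chunk."""
    (p1, m1, s1), (p2, m2, s2) = params
    out = []
    for x in chunk:
        w1, w2 = p1 * npdf(x, m1, s1), p2 * npdf(x, m2, s2)
        out.append(w1 / (w1 + w2))
    return out

def em_fit(data, params, iters=30, threads=4):
    chunks = [data[i::threads] for i in range(threads)]
    flat = [x for c in chunks for x in c]
    n = len(flat)
    for _ in range(iters):
        # E-step in parallel across chunks
        with ThreadPoolExecutor(max_workers=threads) as pool:
            r = [v for part in pool.map(lambda c: e_step(c, params), chunks)
                 for v in part]
        # M-step: closed-form weighted updates from the responsibilities
        n1 = sum(r)
        m1 = sum(ri * x for ri, x in zip(r, flat)) / n1
        m2 = sum((1 - ri) * x for ri, x in zip(r, flat)) / (n - n1)
        s1 = math.sqrt(sum(ri * (x - m1) ** 2 for ri, x in zip(r, flat)) / n1)
        s2 = math.sqrt(sum((1 - ri) * (x - m2) ** 2 for ri, x in zip(r, flat)) / (n - n1))
        params = ((n1 / n, m1, s1), ((n - n1) / n, m2, s2))
    return params

data = [-0.2, -0.1, 0.0, 0.1, 0.2, 4.8, 4.9, 5.0, 5.1, 5.2]
start = ((0.5, 1.0, 1.0), (0.5, 4.0, 1.0))
(p1, m1, s1), (p2, m2, s2) = em_fit(data, start)
```

In CPython, pure-Python threads contend for the GIL; the paper's speedups presume component densities whose E-step work releases it (e.g. native numeric kernels), which this toy does not model.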
ISBN:
(Print) 9781467390064
Globalization and cloud computing have allowed major strides forward in terms of communication possibilities, but they have also illuminated how many different resource options and formats exist, access to which would dramatically increase the accuracy and reliability of choices made as a result of computational output. As a result, there is an increasing need for methods that resolve the levels of data translation necessary for the effectiveness of distributed Linked Data platforms, especially in the context of the Building Information Modeling (BIM) domain in which we conduct our work. The following research presents the current state of our Linked Data Platform for handling and processing Resilience and BIM data from existing and remotely located data stores and simulation models. The approach focuses on data interoperability and illustrates the functional interactions of a set of lightweight API endpoints that serve resources; requests are made extensible with the HYDRA Core Vocabulary to increase data discoverability opportunities.
ISBN:
(Print) 9781479984909
This work studies the utilization of shared caches by applications running concurrently on different cores of multicore systems. Knowledge about program contention due to shared resources is important for various design problems concerning multicore architectures: it is needed for power estimation, scheduling of parallel applications, and the design of shared memories. Moreover, a deep understanding of program behavior is especially needed for developing accurate models that can predict misses caused by shared resources in multicore systems. We present a methodology for examining the interaction of applications in shared caches. Our experiments show a positive impact of data sharing, which minimizes misses in shared L2 caches over a wide range of L2 cache sizes for applications from the Mediabench suite. Up to 25% fewer misses in the last-level cache can be observed for embedded applications when data are allowed to be shared among programs running on different cores.
ISBN:
(Print) 9781509002115
Virtualization helps greatly in handling critical applications in a highly available manner, streamlining application deployment, application migration, etc. Because of virtualization, users are able to reduce extra hardware requirements, power consumption, space requirements, air-conditioning requirements, etc. Server consolidation, disaster recovery, dynamic load balancing, testing and deployment, virtual desktops, and improved system reliability are all facilities provided by virtualization techniques. Our proposed work uses these facilities to implement cluster programming for Winograd's variant of Strassen's matrix multiplication. An API known as RMI (Remote Method Invocation) is used, with which a programmer can create distributed applications so that objects residing on different systems can interact with each other efficiently. The divide-and-conquer approach of Winograd's variant is the main factor that enables distributed computing here. Partitioning a given matrix into submatrices is done at the master side, and at each slave the logical partitioning into 2×2 matrices is done to implement the algorithm. For the analysis of the proposed work, various performance metrics such as parallel overhead, speedup, efficiency, and total execution time are used.
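To illustrate the divide-and-conquer structure that the master/slave scheme distributes, here is a serial sketch of the classic Strassen recursion (seven recursive multiplications per level); Winograd's variant rearranges the additions to use 15 instead of 18, and the RMI distribution across slaves is omitted. This assumes square matrices whose size is a power of two.

```python
# Sketch: Strassen's 7-multiplication recursion on power-of-two matrices.
def split(M):
    n = len(M) // 2
    return ([r[:n] for r in M[:n]], [r[n:] for r in M[:n]],
            [r[:n] for r in M[n:]], [r[n:] for r in M[n:]])

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    # Seven recursive products (each could be shipped to a different slave)
    m1 = strassen(add(a11, a22), add(b11, b22))
    m2 = strassen(add(a21, a22), b11)
    m3 = strassen(a11, sub(b12, b22))
    m4 = strassen(a22, sub(b21, b11))
    m5 = strassen(add(a11, a12), b22)
    m6 = strassen(sub(a21, a11), add(b11, b12))
    m7 = strassen(sub(a12, a22), add(b21, b22))
    # Recombine into the four quadrants of the result
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(sub(add(m1, m3), m2), m6)
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bot = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bot
```

The seven independent products at each level are what make the approach amenable to distribution: the master can dispatch one product per remote object.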
ISBN:
(Print) 9781467387095
MapReduce has become a dominant parallel computing paradigm for storing and processing massive data due to its excellent scalability, reliability, and elasticity. In this paper, we present a new architecture of Distributed Beta Wavelet Networks (DBWN) for large-scale image classification in the MapReduce model. First, to prove the performance of wavelet networks, a parallelized learning algorithm based on the Beta Wavelet Transform is proposed. Then the proposed structure of the DBWN is itemized. Finally, the new algorithm is realized in the MapReduce model. Comparisons with the Fast Beta Wavelet Network (FBWN) are presented and discussed. The results show that the DBWN model performs better than the FBWN model in classification rate and in training run time.
ISBN:
(Print) 9781479984909
Approximate pattern discovery is one of the fundamental and challenging problems in computer science. Fast, high-performance algorithms are in high demand in many applications in bioinformatics and computational molecular biology, the domains that most directly benefit from any enhancement of pattern-matching theory and solutions. This paper proposes an efficient GPU implementation of a fuzzified Aho-Corasick algorithm using the Levenshtein method and the N-gram technique as a solution to the approximate pattern matching problem.
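The Levenshtein core of approximate matching can be sketched with Sellers' dynamic-programming algorithm, which reports every position in the text where the pattern ends with edit distance at most k. This is only the single-pattern edit-distance component; the paper's multi-pattern Aho-Corasick automaton, N-gram filtering, and GPU parallelization are omitted.

```python
# Sketch: Sellers' algorithm for approximate (edit-distance) pattern search.
def approx_ends(pattern, text, k):
    """1-based end positions where pattern matches text with edit distance <= k."""
    m = len(pattern)
    col = list(range(m + 1))        # DP column vs. the empty text prefix
    ends = []
    for j, t in enumerate(text, 1):
        new = [0]                   # row 0 is 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == t else 1
            new.append(min(col[i] + 1,        # deletion in pattern
                           new[i - 1] + 1,    # insertion in pattern
                           col[i - 1] + cost))  # match / substitution
        col = new
        if col[m] <= k:
            ends.append(j)
    return ends

print(approx_ends("abc", "xxabcxx", 1))
```

Each text position costs O(m), so the whole scan is O(mn); the GPU version in the paper parallelizes this kind of per-position work across threads.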
ISBN:
(Print) 9781479999897
In this paper, we utilize the framework of compressed sensing (CS) for device detection and distributed resource allocation in large-scale machine-to-machine (M2M) communication networks. The devices are partitioned into clusters according to some pre-defined criteria, e.g., proximity or service type. Moreover, owing to the sparse nature of event occurrence in M2M communications, the activation pattern of the M2M devices can be formulated as a particular block-sparse signal with additional in-block structure in CS-based applications. This paper introduces a novel scheme for distributed resource allocation to the M2M devices based on block-CS techniques, which consists of three phases: (1) in a full-duplex acquisition phase, the network activation pattern is collected in a distributed manner; (2) the base station detects the active clusters and the number of active devices in each cluster, and then assigns a certain amount of resources accordingly; (3) each active device detects the order of its index among all the active devices in its cluster and accesses the corresponding resource for transmission. The proposed scheme can efficiently reduce the acquisition time with much lower computational complexity compared with standard CS algorithms. Finally, extensive simulations confirm the robustness of the proposed scheme under noisy conditions.
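A toy sketch of the bookkeeping in phases (2)-(3) only, assuming the activation pattern has already been recovered (the CS acquisition of phase (1) and all noise handling are omitted, and the function names and data layout are invented for illustration): the base station assigns a contiguous resource range per active cluster, and each active device derives its slot from its rank among its cluster's active devices.

```python
# Toy: per-cluster resource allocation and per-device slot derivation.
def base_station_allocate(clusters, active):
    """Phase (2): give each cluster with active devices a contiguous range."""
    offsets, start = {}, 0
    for cid, devices in clusters.items():
        n_active = sum(1 for d in devices if d in active)
        if n_active:
            offsets[cid] = start
            start += n_active
    return offsets

def device_slot(device, cid, clusters, active, offsets):
    """Phase (3): slot = cluster offset + rank among active peers."""
    rank = sum(1 for d in clusters[cid] if d in active and d < device)
    return offsets[cid] + rank

clusters = {"A": [1, 2, 3], "B": [4, 5, 6]}
active = {2, 3, 5}
offsets = base_station_allocate(clusters, active)
```

Because every active device computes the same deterministic rank, the devices can pick non-colliding slots without per-device signaling, which is the point of phase (3).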
ISBN:
(Print) 9781509006663
Nowadays, the IT community is experiencing a great shift in computing and information-storage infrastructures by using the powerful, flexible, and reliable alternative of cloud computing. The power of cloud computing may also be realized for mankind if dedicated disaster-management clouds are developed in various countries, cooperating with each other on common standards. The experimentation with and deployment of cloud computing by governments of various countries for mankind may be a justified use of IT at the social level. It is possible to realize a real-time disaster-management cloud whose applications respond within a specified time frame. If a Real-Time Cloud (RTC) is available, then for intelligent machines such as robots the complex processing may be done on the RTC via a request-and-response model. Such complex processing becomes more desirable as the level of intelligence in robots approaches that of humans. Therefore, it may be possible to manage disaster sites more efficiently with more intelligent cloud robots, without great loss of human lives waiting for assistance at the disaster site. Real-time garbage collection, the Real-Time Specification for Java, multicore CPU architectures with networks-on-chip, parallel algorithms, distributed algorithms, high-performance database systems, high-performance web servers, and gigabit networking can be used to develop real-time applications in the cloud.
ISBN:
(Print) 9781467394178
Big Data is a broad term for working with large-volume and complex data sets. When a data set is large in volume and traditional processing applications are inadequate, distributed databases are needed. Big data came into existence because earlier technologies were not able to handle such large data from autonomous sources. Finding meaningful and accurate data in large unstructured data is a tedious task for any user. This is why classification techniques came into the picture for big data. With the help of classification methods, unstructured data can be turned into an organized form so that a user can access the required data easily. These classification techniques can be applied over big transactional databases to provide data services to users from large-volume data sets. Classification is an aspect of machine learning, and there are two broad categories: supervised and unsupervised classification. In this paper, we study variants of supervised classification methods; a comparison is also made on the basis of their advantages and limitations.
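To make the supervised category concrete (this example is not from the paper, and nearest-centroid is just one simple representative): a supervised classifier learns from labelled samples and then predicts labels for new points.

```python
# Minimal supervised classifier: nearest centroid on labelled vectors.
def train_centroids(samples, labels):
    """Learn one centroid (mean vector) per labelled class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def classify(x, centroids):
    """Predict the class whose centroid is nearest (squared Euclidean)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

cents = train_centroids([[0, 0], [0, 1], [5, 5], [5, 6]], ["low", "low", "high", "high"])
```

An unsupervised method, by contrast, would have to discover the "low"/"high" groups from the unlabelled vectors alone, e.g. by clustering.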