As accelerators are integrated into many of the newest workstations and supercomputers to provide large amounts of additional processing power, selecting the appropriate one is critical for achieving the best performance. The new Intel® Xeon® Phi™ coprocessor provides more processing cores than a CPU but fewer than a graphics processing unit (GPU). However, the Phi's cores do not have the limitations of a GPU's cores and can run code written for traditional CPUs rather than requiring code written specifically for GPUs. We used three NP-complete problems, the traveling salesman problem, the knapsack problem, and the party problem, as benchmark applications to compare the relative performance of the Phi with a contemporary GPU and CPU. We wrote programs that solve the problems on the CPU and on the coprocessor and that could be ported to the GPU with only minor modifications, and we measured how long each program took to complete on the coprocessor, the GPU, and the CPU. While the GPU attained speedups of roughly 14 to 80 over a single CPU core, the coprocessor attained speedups of only 4 to 7.
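For context, the 0/1 knapsack problem, one of the three benchmarks, has a classic dynamic-programming solution whose independent inner-loop iterations are what make accelerator ports like those described above straightforward. A minimal sequential sketch in Python (the item data is illustrative, not from the paper):

```python
def knapsack(values, weights, capacity):
    """0/1 knapsack via dynamic programming.

    dp[c] holds the best value achievable with capacity c using the
    items processed so far; iterating c downward ensures each item
    is used at most once.
    """
    dp = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        for c in range(capacity, w - 1, -1):
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]

# Illustrative data: the optimum picks the items of weight 20 and 30.
best = knapsack(values=[60, 100, 120], weights=[10, 20, 30], capacity=50)
# best == 220
```

The inner loop over capacities has no cross-item dependency within one item, which is the kind of regular parallelism both the Phi and GPU versions can exploit.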
ISBN:
(Print) 9781450328098
The Internet of Things (IoT) concept has been around for some time, and applications in transportation, healthcare, education, travel, the smart grid, and retail are and will be major beneficiaries of it. Only recently, however, thanks to technological advances in sensor devices and rich wireless connectivity, is the Internet of Things at scale becoming a reality. For example, Cisco's Internet of Things Group predicts over 50 billion connected sensory devices by 2020. In this talk, we will discuss the Internet of Mobile Things (IoMT), since several game-changing technological advances have happened on mobile 'things' such as mobile phones, trains, and cars, where rich sets of sensors, connected via diverse wireless Internet technologies, are changing and influencing how people communicate, move, and download and distribute information. In this space, challenges come from the need to determine (1) contextual information such as location, duration of contact, and density of devices, utilizing networked sensory information; (2) higher-level knowledge such as users' activity detection, mood detection, application usage pattern detection, and user interactions on mobile 'things', utilizing contextual information; and (3) adaptive and real-time parallel and distributed architectures that integrate context, activity, mood, and usage patterns into mobile application services on mobile 'things'. Solving these challenges will provide enormous opportunities to improve the utility of mobile 'things' and to optimize their scarce resources, such as energy, memory, and bandwidth.
ISBN:
(Print) 9781479959075
Wireless sensor networks (WSNs) can experience faults during deployment due to hardware malfunction, software failure, or harsh environmental factors. This results in anomalies in the time-series data they collect, and such anomalies demand reliable detection strategies to support long-term and/or large-scale WSN deployments. The data for these physical variables are transmitted continuously to a repository for further processing as a data stream. Centralized fault detection has become an emerging approach for building scalable and energy-balanced WSN applications. In our work, we implement the distributed fault detection (DFD) algorithm in a central unit, a SOFREL S550 representing the base station or sink node, which detects suspicious nodes by actively exchanging heartbeat messages. By analyzing the collected heartbeat information, the unit identifies failed nodes according to a predefined failure-detection rule.
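A heartbeat-based failure-detection rule of the kind described can be sketched as follows; the timeout value, class, and node names are illustrative assumptions, not the SOFREL S550 implementation:

```python
import time

class HeartbeatMonitor:
    """Sink-side failure detector: a node is suspected failed when no
    heartbeat has been received from it within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # node_id -> timestamp of last heartbeat

    def heartbeat(self, node_id, now=None):
        """Record a heartbeat from node_id (explicit `now` for testing)."""
        self.last_seen[node_id] = time.time() if now is None else now

    def failed_nodes(self, now=None):
        """Apply the failure-detection rule: silent longer than timeout."""
        now = time.time() if now is None else now
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)

# Illustrative run with explicit timestamps:
mon = HeartbeatMonitor(timeout=5.0)
mon.heartbeat("node-a", now=0.0)
mon.heartbeat("node-b", now=0.0)
mon.heartbeat("node-a", now=8.0)       # node-a keeps reporting
suspects = mon.failed_nodes(now=10.0)  # node-b silent for 10 s > 5 s
# suspects == ["node-b"]
```

A real deployment would also periodically prune entries and confirm suspicion over several missed heartbeats before declaring a node failed.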
The Barnes-Hut algorithm is a widely used approximation method for the N-Body simulation problem. The irregular nature of this tree walking code presents interesting challenges for its computation on parallel systems....
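The Barnes-Hut method referred to walks a tree and approximates a distant cell by a single pseudo-particle when the cell subtends a small enough angle; the standard acceptance test is s/d < θ for cell width s, distance d to the cell's centre of mass, and accuracy parameter θ. A minimal sketch of that test (not tied to any implementation in the paper):

```python
import math

def accept_cell(cell_size, dx, dy, dz, theta=0.5):
    """Barnes-Hut opening criterion: treat the cell as one body when
    size/distance < theta; otherwise descend into its children."""
    d = math.sqrt(dx * dx + dy * dy + dz * dz)
    return d > 0 and cell_size / d < theta

# A cell of width 1 at distance 10 is accepted with theta = 0.5,
# while the same cell at distance 1 must be opened:
far = accept_cell(1.0, 10.0, 0.0, 0.0)   # True  (0.1 < 0.5)
near = accept_cell(1.0, 1.0, 0.0, 0.0)   # False (1.0 >= 0.5)
```

It is this data-dependent branch, taken differently for every body, that makes the tree walk irregular and hard to parallelize.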
ISBN:
(Print) 9781479928934
Despite their theoretical appeal and grounding in tractable convex optimization techniques, kernel methods are often not the first choice for large-scale speech applications due to their significant memory requirements and computational expense. In recent years, randomized approximate feature maps have emerged as an elegant mechanism to scale up kernel methods. Still, in practice, a large number of random features is required to obtain acceptable accuracy on predictive tasks. In this paper, we develop two algorithmic schemes to address this computational bottleneck in the context of kernel ridge regression. The first is a specialized distributed block coordinate descent procedure that avoids explicitly materializing the feature-space data matrix, while the second gains efficiency by combining multiple weak random-feature models in an ensemble learning framework. We demonstrate that these schemes enable kernel methods to match the performance of state-of-the-art deep neural networks on TIMIT for speech recognition and classification tasks. In particular, we obtain the best classification error rates reported on TIMIT using kernel methods.
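For the Gaussian kernel, the randomized approximate feature maps in question are typically random Fourier features: z(x) = sqrt(2/D)·cos(Wx + b), with W drawn Gaussian and b uniform on [0, 2π), so that z(x)·z(y) ≈ k(x, y). A small pure-Python sketch (dimensions and the feature count are illustrative, and this is the generic construction, not the paper's schemes):

```python
import math
import random

def make_rff(dim, n_features, sigma=1.0, seed=0):
    """Random Fourier feature map approximating the RBF kernel
    k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
         for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def z(x):
        # One cosine per random direction w, shifted by its phase b.
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bj)
                for w, bj in zip(W, b)]
    return z

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

z = make_rff(dim=3, n_features=4000)
x, y = [0.2, -0.1, 0.4], [0.0, 0.3, 0.1]
exact = math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)
approx = dot(z(x), z(y))   # close to `exact` for large n_features
```

The approximation error shrinks roughly as 1/sqrt(D), which is exactly why D must be large in practice and why the paper's two schemes target the resulting cost.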
ISBN:
(Print) 9781479926527
In mobile sensor networks (MSNs), since sensor nodes and wireless networks are highly resource constrained, sensor data must be managed flexibly and efficiently. Under the MEXT research project(1) entitled "Studies on Efficient Data Processing Techniques for Mobile Sensor Networks," we have conducted research on data management issues in MSNs. In this paper, we report some of our achievements in a sub-area of this project that addresses data allocation for efficient query processing in MSNs. In particular, we first present our results on how to effectively allocate sensor data to mobile nodes and how to efficiently process top-k and k-nearest-neighbor queries on the allocated data. We then present our results on quantifying the impact of mobility on data availability and data dissemination in MSNs.
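A top-k query over data allocated across nodes can be answered by merging each node's local top-k candidates at the sink, which bounds per-node communication to k items; a minimal centralized sketch (the paper's allocation and mobility-aware methods are more involved than this):

```python
import heapq

def topk_query(node_data, k):
    """Merge each node's local top-k scores, then take the global
    top-k. Sending only k candidates per node bounds the amount of
    data each resource-constrained node must transmit."""
    candidates = []
    for node_id, scores in node_data.items():
        for s in heapq.nlargest(k, scores):   # local top-k at the node
            candidates.append((s, node_id))
    return heapq.nlargest(k, candidates)      # global top-k at the sink

# Illustrative sensor readings on three mobile nodes:
readings = {
    "m1": [4, 9, 1],
    "m2": [7, 2],
    "m3": [8, 3, 6],
}
top3 = topk_query(readings, k=3)
# top3 == [(9, 'm1'), (8, 'm3'), (7, 'm2')]
```

The correctness argument is that any element of the global top-k must be in its own node's local top-k, so nothing is lost by pruning locally first.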
ISBN:
(Print) 9783662439845; 9783662439838
In the coming big data era, the demand for data analysis capability in real applications is growing at an amazing pace. Memory's increasing capacity and decreasing price make it possible, and attractive, for a distributed OLAP system to load all its data into memory and thereby significantly improve data processing performance. In this paper, we model the performance of pipelined execution in a distributed in-memory OLAP system and show that data communication among the compute nodes, which is carried out by the data exchange operator, is the performance bottleneck. Consequently, we explore pipelined data exchange in depth and give a novel solution that is efficient, scalable, and skew-resilient. Experimental results comparing against state-of-the-art techniques show the effectiveness of our proposals.
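The data exchange operator mentioned above repartitions tuples among compute nodes, typically by hashing a key so that equal keys are co-located; a toy sketch of that core step (this is the generic operator, not the paper's skew-resilient design):

```python
def exchange(rows, key, n_nodes):
    """Hash-partition rows on `key` so that rows with equal keys land
    on the same node -- the shuffle step a pipelined data exchange
    operator performs between pipeline stages."""
    partitions = [[] for _ in range(n_nodes)]
    for row in rows:
        partitions[hash(row[key]) % n_nodes].append(row)
    return partitions

# Illustrative tuples being redistributed for a per-customer aggregate:
rows = [{"cust": c, "amt": a} for c, a in
        [("a", 10), ("b", 5), ("a", 7), ("c", 3)]]
parts = exchange(rows, key="cust", n_nodes=2)
# Every row with the same customer ends up in the same partition.
```

Skew resilience matters precisely because one hot key can overload a single partition under this scheme, which is the failure mode the paper's solution addresses.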
ISBN:
(Print) 9781479930807
The amount of data produced in today's world is huge and, more importantly, increasing exponentially. Traditional data storage and processing techniques are ineffective in handling such huge data [10]. Many real-life applications require iterative computation, which is used in particular by most machine learning and data mining algorithms over large datasets such as web link structures and social network graphs. MapReduce is a software framework for easily writing applications that process large amounts of data (multi-terabyte) in parallel on large clusters (thousands of nodes) of commodity hardware. However, because MapReduce is batch oriented, its benefits cannot be fully exploited for iterative computation. Our proposed work focuses on optimizing three factors to improve the performance of iterative algorithms in a MapReduce environment. In this paper, we address the key issues in task execution: the unnecessary creation of new tasks in each iteration and the excessive shuffling of data in each iteration. Our preliminary experiments have shown promising results over the basic MapReduce framework. A comparative study with existing MapReduce-based solutions such as HaLoop has also shown better performance with respect to algorithm run time and the amount of data traffic over the Hadoop cluster.
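One optimization commonly applied to iterative MapReduce workloads, and echoed in systems like HaLoop, is keeping loop-invariant data in place across iterations instead of re-reading and re-shuffling it every round. A toy PageRank-style loop illustrating the idea (names and structure are illustrative, not any framework's API):

```python
def iterate_ranks(links, n_iters, damping=0.85):
    """Toy iterative computation: the link structure `links` is
    loop-invariant and is read in place each round, while only the
    small rank vector is rebuilt -- the reuse that plain MapReduce
    forfeits by re-creating tasks and re-shuffling inputs per
    iteration."""
    ranks = {page: 1.0 for page in links}
    for _ in range(n_iters):
        # "Map": each page distributes its rank over its out-links.
        contrib = {page: 0.0 for page in links}
        for page, outs in links.items():
            for dest in outs:
                contrib[dest] += ranks[page] / len(outs)
        # "Reduce": combine contributions into the next rank vector.
        ranks = {page: (1 - damping) + damping * contrib[page]
                 for page in links}
    return ranks

links = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
ranks = iterate_ranks(links, n_iters=20)
```

In a cluster setting the same separation applies: caching `links` on the workers removes the dominant per-iteration I/O and shuffle cost.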
Several algorithms applied to the solution of specific problems in physics require high-performance computing. This is the case, for example, in the field of digital image processing, where the required performance in terms of speed, and sometimes running in a real-time environment, leads to the use of parallel programming tools. To meet this demand it is important to understand these tools, highlighting their differences and possible applications. Moreover, research centers around the world have clusters of computers, or multi-core platforms, available, with strong potential for using parallel programming techniques. This study aims to characterize the thread and fork parallel programming techniques. Both techniques allow the development of parallel code, each with its own restrictions on inter-process communication and programming format. This Technical Note aims to highlight the use of each of these techniques and to present an application in the area of image processing in which they were used. The applied part of this work was developed in international collaboration with the JET Laboratory (Joint European Torus of the European Atomic Energy Community / EURATOM). The JET Laboratory investigates the process of forming the plasma and its instability, which appears as a toroidal ring of increased radiation, known as MARFE (Multifaceted Asymmetric Radiation From The Edge). The activities explored parallel programming techniques for digital image processing algorithms. The presented algorithms achieve a processing rate higher than 10,000 images per second and use threads, as well as shared-memory communication between independent processes, which is equivalent to fork.
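The two techniques characterized above map naturally onto Python's threading module (workers sharing the parent's memory) and os.fork with a pipe (independent processes with explicit message passing); a minimal contrast, unrelated to the JET image pipeline itself, and Unix-only because of os.fork:

```python
import os
import threading

# Threads share the parent's address space: a worker can write its
# result straight into a list owned by the main thread.
def square_into(results, index, value):
    results[index] = value * value

shared = [0, 0]
threads = [threading.Thread(target=square_into, args=(shared, i, v))
           for i, v in enumerate([3, 4])]
for t in threads:
    t.start()
for t in threads:
    t.join()
# shared is now [9, 16]

# A forked child gets a *copy* of memory, so its result must travel
# back over an explicit channel -- here a pipe.
def square_in_fork(value):
    r, w = os.pipe()
    if os.fork() == 0:           # child process
        os.close(r)
        os.write(w, str(value * value).encode())
        os._exit(0)
    os.close(w)                  # parent process
    result = int(os.read(r, 64).decode())
    os.close(r)
    os.wait()
    return result

from_fork = square_in_fork(5)    # from_fork == 25
```

The trade-off is the one the study characterizes: threads get cheap shared-memory communication but must synchronize access, while forked processes are isolated and communicate only through channels such as pipes or shared-memory segments.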
ISBN:
(Print) 9783642552243
Selection algorithms find the kth smallest element of a set of elements. Although optimal parallel selection algorithms exist for theoretical machines, these algorithms are not only difficult to implement but also inefficient in practice. Consequently, scalable applications can only use a few special cases, such as minimum and maximum, for which efficient implementations exist. To overcome these limitations, we propose a general parallel selection algorithm that scales even on today's largest supercomputers. Our approach is based on an efficient, unbiased median approximation method, recently introduced as median-of-3 reduction, and Hoare's sequential QuickSelect idea from 1961. The resulting algorithm scales with a time complexity of O(log^2 n) for n distributed elements while needing only O(1) space. Furthermore, we prove it to be a practical solution by explaining implementation details and showing performance results for up to 458,752 processor cores.
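The sequential building block named above, Hoare's QuickSelect, can be sketched with a median-of-3 pivot choice; the paper's median-of-3 reduction generalizes this pivot idea across distributed elements, which this single-machine sketch does not attempt:

```python
def quickselect(data, k):
    """Return the k-th smallest element (k = 0 gives the minimum)
    using Hoare's QuickSelect with a median-of-3 pivot."""
    data = list(data)
    while True:
        if len(data) == 1:
            return data[0]
        # Median of first, middle, and last element as an (often)
        # well-balanced, unbiased-in-spirit pivot estimate.
        pivot = sorted([data[0], data[len(data) // 2], data[-1]])[1]
        lo = [x for x in data if x < pivot]
        eq = [x for x in data if x == pivot]
        if k < len(lo):
            data = lo                          # answer is below pivot
        elif k < len(lo) + len(eq):
            return pivot                       # pivot is the answer
        else:
            k -= len(lo) + len(eq)             # answer is above pivot
            data = [x for x in data if x > pivot]

kth = quickselect([7, 2, 9, 4, 4, 1], k=3)     # 4th smallest -> 4
```

Each round discards one side of the partition, so the expected work is linear; the parallel version in the paper replaces the pivot step with a distributed median approximation so the recursion depth stays logarithmic across processors.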