ISBN (Print): 9783319993164; 9783319993157
Nowadays, leading scientists and engineers focus their attention on optimizing and accelerating data mining and machine learning algorithms rather than inventing new ones. Natural language processing methods and tools are widely used in production in the area of machine translation. Research on search engines and semantic search is mostly concentrated on data storage and further analysis. The majority of search engines use huge amounts of previously accumulated user requests to predict the search output, without taking the user's intention into account through qualitative processing of the request. In this paper we explore the idea of using semantic cognitive spaces to extract the exact user intentions by analysing natural language input requests. The final goal of our research is to develop a valid search query model for further use in semantic search engines.
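As a rough illustration of matching requests against intentions in a shared semantic space, the Python sketch below embeds a request and a few hypothetical intention prototypes and picks the closest one by cosine similarity. The toy bag-of-words embedding and the intent labels are assumptions for the example, not the authors' model.

# Minimal sketch (not the paper's model): requests and candidate intentions live
# in one vector space; the closest intention by cosine similarity is returned.
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words vector (a real system would use a trained semantic space)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical intention prototypes placed in the same space as the requests.
intentions = {
    "buy_product": embed("buy order purchase price cheap"),
    "find_definition": embed("what is define meaning explanation"),
    "navigate_site": embed("open go to page site link"),
}

def extract_intention(request):
    vec = embed(request)
    return max(intentions, key=lambda name: cosine(vec, intentions[name]))

print(extract_intention("what is the meaning of semantic search"))  # -> find_definition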
For high-dimensional optimization problems, sequential particle swarm optimization is time-consuming. This paper presents a GPU-based parallel local particle swarm optimization algorithm at dimension...
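For context, the sketch below shows the local-best (ring topology) PSO update that such a GPU implementation would parallelize per particle and per dimension. It is a NumPy/CPU illustration with an assumed toy objective and standard coefficients, not the paper's CUDA code.

import numpy as np

def sphere(x):                      # toy objective: sum of squares per particle
    return np.sum(x * x, axis=1)

def local_pso(dim=64, particles=128, iters=200, w=0.72, c1=1.49, c2=1.49, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), sphere(x)
    for _ in range(iters):
        # neighbourhood best in a ring topology (left neighbour, self, right neighbour)
        stacked = np.stack([np.roll(pbest_val, 1), pbest_val, np.roll(pbest_val, -1)])
        choice = np.argmin(stacked, axis=0)            # 0=left, 1=self, 2=right
        idx = (np.arange(particles) + choice - 1) % particles
        lbest = pbest[idx]
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (lbest - x)
        x = x + v
        val = sphere(x)
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], val[improved]
    return pbest_val.min()

print(local_pso())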
ISBN (Print): 9781728116440
Nowadays, there are several different architectures available not only for industry but also for ordinary consumers. Traditional multicore processors, GPUs, accelerators such as the Sunway SW26010, and even energy-efficiency-driven processors such as the ARM family present very different architectural characteristics. This wide range of characteristics presents a challenge for application developers, who must deal with different instruction sets, memory hierarchies, and even different programming paradigms when programming for these architectures. Therefore, the same application can perform well on one architecture but poorly on another. To optimize an application, it is important to have a deep understanding of how it behaves on different architectures. Related work in this area mostly focuses on a limited analysis encompassing execution time and energy. In this paper, we perform a detailed investigation of the impact of the memory subsystem of different architectures, which is one of the most important aspects to be considered. For this study, we ran experiments on a Broadwell CPU and a Pascal GPU using applications from the Rodinia benchmark suite. In this way, we were able to understand why an application performs well on one architecture and poorly on others.
ISBN (Print): 9789897583650
This paper addresses the problem of continuously finding highly correlated pairs of time series over the most recent time window, and possibly using the discovered correlations to select features for training a regression model for prediction. The implementation builds upon the ParCorr parallel method for online correlation discovery and is designed to run continuously on top of the UPM-CEP data streaming engine through efficient streaming operators.
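As a single-machine illustration of the problem setting (not the ParCorr method itself), the sketch below keeps a sliding window per stream and reports pairs whose Pearson correlation over the current window exceeds an assumed threshold; the window size and synthetic streams are assumptions for the example.

from collections import deque
from itertools import combinations
import math, random

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

windows = {name: deque(maxlen=50) for name in ("s0", "s1", "s2")}

def on_tick(values, threshold=0.9):
    """Feed one new value per stream, then emit currently correlated pairs."""
    for name, v in values.items():
        windows[name].append(v)
    full = {k: list(w) for k, w in windows.items() if len(w) == w.maxlen}
    return [(a, b) for a, b in combinations(full, 2)
            if abs(pearson(full[a], full[b])) >= threshold]

random.seed(1)
for t in range(200):
    base = math.sin(t / 5)
    pairs = on_tick({"s0": base, "s1": base + random.gauss(0, 0.05), "s2": random.gauss(0, 1)})
print(pairs)  # s0 and s1 should be reported as highly correlated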
ISBN (Print): 9783319993164; 9783319993157
The theory of predictive processing encompasses several elements that make it attractive as the underlying computational approach for a cognitive architecture. We introduce a new cognitive architecture, Scruff, capable of implementing predictive processing models by incorporating key properties of neural networks into the Bayesian probabilistic programming framework. We illustrate the Scruff approach with conditional linear Gaussian (CLG) models, noisy-or models, and a Bayesian variation of the Rao-Ballard linear algebra model of predictive vision.
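Since noisy-or models are one of the families mentioned, the plain-Python snippet below (not Scruff code) recalls what a noisy-or node computes: each active cause independently fails to trigger the effect, combined with a small leak probability. The cause names and probabilities are illustrative.

def noisy_or(cause_probs, active, leak=0.01):
    """P(effect = true | active causes) under the noisy-or assumption."""
    p_not = 1.0 - leak
    for cause, p in cause_probs.items():
        if cause in active:
            p_not *= (1.0 - p)          # each active cause fails independently
    return 1.0 - p_not

causes = {"flu": 0.6, "allergy": 0.3, "cold": 0.4}
print(noisy_or(causes, active={"flu", "cold"}))   # 1 - 0.99 * 0.4 * 0.6 = 0.7624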
ISBN (Print): 9789897583650
Nowadays, many applications require analysing the continuous flow of data produced by different data sources before the data is stored. Data streaming engines emerged as a solution for processing data on the fly. At the same time, computer architectures have evolved into systems with several interconnected CPUs and Non-Uniform Memory Access (NUMA), where the cost of accessing memory from a core depends on how the CPUs are interconnected. This paper presents UPM-CEP, a data streaming engine designed to take advantage of NUMA architectures. A preliminary evaluation using the Intel HiBench benchmark shows that NUMA-aware deployment improves performance.
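The snippet below is a minimal, Linux-only illustration of the kind of placement a NUMA-aware deployment performs: each worker is pinned to the cores of one node so that it mostly touches node-local memory. The core-to-node map is a hard-coded assumption here; a real engine would read the machine topology (e.g., via libnuma).

import os
from multiprocessing import Process

NUMA_NODES = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}     # assumed 2-node, 8-core layout

def worker(node_id, partition):
    os.sched_setaffinity(0, NUMA_NODES[node_id])     # restrict this process to local cores
    total = sum(partition)                           # stand-in for operator work on local data
    print(f"node {node_id}: cores {sorted(os.sched_getaffinity(0))}, result {total}")

if __name__ == "__main__":
    data = list(range(1_000_000))
    half = len(data) // 2
    procs = [Process(target=worker, args=(0, data[:half])),
             Process(target=worker, args=(1, data[half:]))]
    for p in procs: p.start()
    for p in procs: p.join()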
ISBN (Print): 9781450362955
High Performance Computing (HPC) demand is on the rise, particularly for large distributed computing. HPC systems have, by design, very heterogeneous architectures, both in computation and in communication bandwidth, resulting in wide variations in the cost of communication between compute units. If large distributed applications are to take full advantage of HPC, the physical communication capabilities must be taken into consideration when allocating workload. Hypergraphs are good at modelling the total volume of communication in parallel and distributed applications. To the best of our knowledge, there are no hypergraph partitioning algorithms to date that are architecture-aware. We propose a novel restreaming hypergraph partitioning algorithm (HyperPRAW) that takes advantage of peer-to-peer physical bandwidth profiling data to improve the performance of distributed applications on HPC systems. Our results show that not only is the quality of the partitions achieved by our algorithm comparable with state-of-the-art multilevel partitioning, but the runtime in a synthetic benchmark is also significantly reduced on the 10 hypergraph models tested, with speedup factors of up to 14x.
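The sketch below illustrates the general idea of bandwidth-aware (re)streaming partitioning under simplifying assumptions; it is not the actual HyperPRAW objective. Vertices are placed one at a time on the partition that maximizes co-location with their hyperedge neighbours, weighted by an assumed inter-partition communication cost matrix and a balance penalty, and the pass is repeated to refine earlier decisions.

def stream_partition(hyperedges, num_parts, comm_cost, passes=3, balance=0.5):
    vertices = sorted({v for e in hyperedges for v in e})
    part = {}                                             # vertex -> partition
    for _ in range(passes):
        loads = [sum(1 for p in part.values() if p == k) for k in range(num_parts)]
        for v in vertices:
            if v in part:
                loads[part[v]] -= 1                       # v is re-placed in this pass
            scores = []
            for k in range(num_parts):
                gain = 0.0
                for e in hyperedges:
                    if v not in e:
                        continue
                    for u in e:
                        if u != v and u in part:
                            # cheap links between k and part[u] are penalised less
                            gain -= comm_cost[k][part[u]]
                # higher score = less remote traffic and a lighter partition
                scores.append(gain - balance * loads[k])
            best = max(range(num_parts), key=lambda k: scores[k])
            part[v] = best
            loads[best] += 1
    return part

edges = [{"a", "b", "c"}, {"c", "d"}, {"d", "e", "f"}, {"a", "f"}]
cost = [[0.0, 1.0, 4.0], [1.0, 0.0, 4.0], [4.0, 4.0, 0.0]]   # assumed bandwidth-derived costs
print(stream_partition(edges, num_parts=3, comm_cost=cost))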
ISBN (Print): 9781538662496
When a new problem has to be solved with high performance, deep learning requires a large effort to tune the model hyperparameters, including the model architecture and the training hyperparameters. Many previous works have tried to tune the model hyperparameters automatically, but their algorithms target searching either model architectures or training hyperparameters. However, simultaneous optimization of the model architecture and the training hyperparameters is slow and falls into bad local minima because the search space is enlarged by the high correlation between the two kinds of hyperparameters. In this paper, we propose a novel algorithm to efficiently find the best set of model architectures and training hyperparameters. To efficiently handle the large search space, the proposed algorithm selectively utilizes the given training samples while limiting the search space with a novel ensemble sampling method. The evaluation time is further reduced by a novel termination mechanism. The accelerated computation of the proposed algorithm is validated on complex image datasets, where it shows state-of-the-art performance with a 70.9% reduction in computational time.
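The sketch below illustrates the two ingredients in a generic form, namely evaluating candidates on growing subsets of the training samples and terminating the weaker half early (successive-halving style), using scikit-learn's small digits dataset and MLPClassifier as stand-ins. It is not the paper's ensemble sampling or termination mechanism.

import random
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def sample_config(rng):
    return {"hidden_layer_sizes": rng.choice([(32,), (64,), (64, 32)]),   # "architecture"
            "learning_rate_init": rng.choice([1e-2, 1e-3, 1e-4])}         # training hyperparameter

def score(cfg, n_samples):
    model = MLPClassifier(max_iter=50, random_state=0, **cfg)
    model.fit(X_tr[:n_samples], y_tr[:n_samples])      # train only on a subset of samples
    return model.score(X_val, y_val)

rng = random.Random(0)
candidates = [sample_config(rng) for _ in range(8)]
budget = 200                                  # start with few training samples
while len(candidates) > 1:
    ranked = sorted(candidates, key=lambda c: score(c, budget), reverse=True)
    candidates = ranked[: len(ranked) // 2]   # terminate the weaker half early
    budget = min(len(X_tr), budget * 2)       # survivors get more samples
print("best config:", candidates[0])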
ISBN (Print): 9781728136134
Tracing code paths to form extended basic blocks is useful in many areas: compiler optimizations [1], improving instruction cache behavior [2], and custom-hardware offloading [3]. Prior work has been plagued by small traces, limited either by the overheads of dynamic profiling, by statically available information [4], or by side-exit branches [5]. In this work, we rethink which code path sequences to fuse and construct long traces for offloading to spatial accelerators, while minimizing the occurrence of side exits, which limit dynamic coverage. We introduce a novel technique that recasts learning a program's execution patterns as a natural-language-processing problem using CBOW (Continuous Bag of Words). We then use a deep learning network to learn the relationships among paths. During the compilation phase, the compiler uses a sequence miner to decide which paths are likely to occur. The learning network predicts a Deepframe online, which is an extended basic block comprising a multi-path sequence (each path itself is composed of multiple basic blocks). We demonstrate the efficacy of Deepframe on spatial hardware accelerators and find the following: i) Deepframe can construct up to 5x (max: 27x) longer offload regions compared to prior approaches. ii) Surprisingly far-flung ILP (instruction-level parallelism) and MLP (memory-level parallelism) can be mined from the frames statically (5.5x increase in ILP and 10.5x increase in MLP). iii) The frames offloaded to the spatial accelerator have minimal side exits (mis-speculation) and achieve sufficient dynamic coverage to improve overall application performance (up to 9x improvement). We will release our end-to-end LLVM-based compiler prototype as open source.
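The snippet below illustrates only the recasting step, not the full Deepframe pipeline: profiled paths are treated as words and execution traces as sentences, and a CBOW model (gensim's Word2Vec with sg=0, assuming gensim 4.x) is trained so that paths occurring in similar contexts obtain similar vectors. The trace data is synthetic.

from gensim.models import Word2Vec

# synthetic execution traces: sequences of path identifiers ("p3" = path #3)
traces = [
    ["p1", "p2", "p3", "p2", "p3", "p4"],
    ["p1", "p2", "p3", "p5", "p2", "p3", "p4"],
    ["p6", "p7", "p6", "p7", "p8"],
] * 50

model = Word2Vec(sentences=traces, vector_size=16, window=2, min_count=1, sg=0, epochs=20, seed=1)

# paths that tend to follow each other in traces should now be close in the space,
# which is what a sequence miner / predictor can exploit to build long frames
print(model.wv.most_similar("p2", topn=3))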
ISBN (Print): 9781728111902
In the era of the Internet of Things, the amount of data generated by industrial sensors is large and dynamic, and its processing requirements are complex and variable, forcing developers to customize algorithms; the process is cumbersome and the development cycle is long. In this paper, a real-time processing system for sensor streaming data based on a data flow view is designed. The stream processing logic is abstractly encapsulated and assembled from minimum processing units into a complete processing chain. A process parsing engine maps the view logic to the processing functions at the bottom of the distributed system to form a task list, which drives stream processing based on the Kafka message queue and the Flink parallel computing framework. The processing results are pushed to the application system in real time via the service route to meet different business requirements. The case study shows that the system provides flexible and configurable basic stream processing functions and has good horizontal scalability.
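The following framework-agnostic sketch illustrates the data-flow-view idea, where a processing chain is declared as a list of minimum processing units and parsed into a runnable pipeline, without the Kafka/Flink runtime the system actually uses. The unit names and the example view are assumptions for illustration.

UNITS = {
    "parse":     lambda r: dict(zip(("sensor", "value"), r.split(","))),
    "to_float":  lambda r: {**r, "value": float(r["value"])},
    "threshold": lambda r: r if r["value"] > 30.0 else None,   # drop normal readings
    "tag":       lambda r: {**r, "alert": True},
}

def build_pipeline(view):
    """Map a declared view (list of unit names) onto the underlying functions."""
    steps = [UNITS[name] for name in view]
    def run(record):
        for step in steps:
            record = step(record)
            if record is None:          # a unit may filter the record out
                return None
        return record
    return run

pipeline = build_pipeline(["parse", "to_float", "threshold", "tag"])
stream = ["temp-1,25.0", "temp-2,31.5", "temp-3,42.0"]
print([out for rec in stream if (out := pipeline(rec)) is not None])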