ISBN: 9781728136134 (print)
GPUs are known to benefit structured applications with ample parallelism, such as deep learning in a datacenter. Recently, GPUs have shown promise for irregular streaming network tasks. However, the GPU's co-processor dependence on a CPU for task management, inefficiencies with fine-grained tasks, and limited multiprogramming capabilities introduce challenges with efficiently supporting latency-sensitive streaming tasks. This paper proposes an event-driven GPU execution model, EDGE, that enables non-CPU devices to directly launch pre-configured tasks on a GPU without CPU interaction. Along with freeing up the CPU to work on other tasks, we estimate that EDGE can reduce the kernel launch latency by 4.4x compared to the baseline CPU-launched approach. This paper also proposes a warp-level preemption mechanism to further reduce the end-to-end latency of fine-grained tasks in a shared GPU environment. We evaluate multiple optimizations that reduce the average warp preemption latency by 35.9x over waiting for a preempted warp to naturally flush the pipeline. When compared to waiting for the first available resources, we find that warp-level preemption reduces the average and tail warp scheduling latencies by 2.6x and 2.9x, respectively, and improves the average normalized turnaround time by 1.4x.
ISBN: 9781538682661 (digital)
ISBN: 9781538682678 (print)
Embedding of data into a normed vector space or linear manifold constitutes a fundamental approach in machine learning. A generalization is embedding into a metric space, where the distance is not induced by a norm. This paper explores the embedding of a time series into a topological metric space of Hankel matrices. The rank metric, along with a windowing scheme, is used to design a score and a detection method for anomaly identification. Assuming that the non-anomalous behavior can be represented as a linear combination of a finite number of frequencies, the rank metric can be used to measure the number of frequency changes in real time to detect anomalies. Accordingly, the Hankel matrix rank is used as a metric to develop a Hankel-based unsupervised Anomaly Detection (HAD) algorithm. Extensive experiments are conducted to test the proposed method on the Numenta anomaly benchmark dataset, as well as on artificially generated random time-series data. Results show that the proposed HAD method is promising with respect to anomaly detection precision and computational performance.
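The windowed rank score described in this abstract can be illustrated with a minimal sketch. It is not the authors' HAD algorithm: the function names, the window length, the number of Hankel rows, the step size, and the relative tolerance used for the numerical rank are all assumptions chosen for illustration. The sketch relies only on the known fact that a noise-free sum of k real sinusoids yields a Hankel matrix of rank 2k, so a jump in the windowed rank suggests a change in the number of frequencies.

```python
import numpy as np


def hankel_matrix(window, rows):
    """Build a rows x cols Hankel matrix with H[i, j] = window[i + j]."""
    window = np.asarray(window, dtype=float)
    cols = len(window) - rows + 1
    return np.array([window[i:i + cols] for i in range(rows)])


def numerical_rank(matrix, rel_tol=1e-8):
    """Count singular values above rel_tol times the largest one.

    rel_tol is an assumed threshold; noisy data would need a larger value.
    """
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    if singular_values[0] == 0:
        return 0
    return int(np.sum(singular_values > rel_tol * singular_values[0]))


def rank_scores(series, window_len=40, rows=20, step=10):
    """Slide a window over the series and return the Hankel rank per window."""
    scores = []
    for start in range(0, len(series) - window_len + 1, step):
        window = series[start:start + window_len]
        scores.append(numerical_rank(hankel_matrix(window, rows)))
    return scores
```

On a synthetic series that contains one cosine in its first half and two cosines in its second half, windows that lie entirely in the first half score lower than windows that lie entirely in the second half; windows that straddle the change point can score higher still, which is what a detector built on this score would flag.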
Process scheduling is an important and necessary task of a multiprogramming operating system where the process manager handles the selection and removal of processes based on a strategy. One such strategy is the Round...
Power efficiency has become one of the most important design constraints for high-performance systems. In this paper, we revisit the design of low-power virtually-addressed caches. While virtually-addressed caches enable significant power savings by obviating the need for Translation Lookaside Buffer (TLB) lookups, they suffer from several challenging design issues that curtail their widespread commercial adoption. We focus on one of these challenges: cache flushes due to virtual page remappings. We use detailed studies on an ARM many-core server to show that this problem degrades performance by up to 25 percent for a mix of multi-programmed and multi-threaded workloads. Interestingly, we observe that many of these flushes are spurious and are caused by an indiscriminate invalidation broadcast on the ARM architecture. In response, we propose a low-overhead and readily implementable hardware mechanism that uses Bloom filters to reduce spurious invalidations and mitigate their ill effects.
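The filtering idea can be sketched in software. This is an illustrative model only, not the hardware design from the paper: the `BloomFilter` and `CoreCache` classes, the filter size, the number of hash functions, and the use of `hashlib.blake2b` are all assumptions. The sketch shows the one property that matters here: a Bloom filter never reports a false negative, so a core can safely ignore an invalidation broadcast for a page its filter says was never cached.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; sizes are illustrative, not taken from the paper."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                f"{seed}:{key}".encode(), digest_size=8
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


class CoreCache:
    """Hypothetical per-core model that tracks which virtual pages it cached."""

    def __init__(self):
        self.cached_pages = BloomFilter()
        self.flushes = 0
        self.skipped = 0

    def fill(self, virtual_page):
        self.cached_pages.add(virtual_page)

    def on_invalidation(self, virtual_page):
        """Return True if the core must flush, False if the broadcast is spurious."""
        if self.cached_pages.might_contain(virtual_page):
            self.flushes += 1
            return True
        self.skipped += 1
        return False
```

A plain Bloom filter cannot delete entries, so a design along these lines would presumably need to reset the filter periodically or use a counting variant; the abstract does not say which approach the authors take.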
Speech produced by a speaker in emotionally charged situations, such as anger, happiness, and shouting, corresponds to high arousal speech. Changes in the production characteristics, such as an increase in the subglottal air pressure, an increase in the glottal closed phase in each cycle, and an increase in the rate of glottal vibration, are observed in high arousal speech. Acoustic parameters such as glottal closed quotient and fundamental frequency (F0) are used to characterize high arousal speech. In this paper, high arousal is characterized by features extracted using the zero-time windowing (ZTW) method. The spectrum derived from the ZTW method emphasizes the instantaneous spectral characteristics of the speech signal. In the glottal open region, changes are clearly observed in the lower frequency range of the spectrum. Distinctive spectral features are observed during the glottal open region in the case of high arousal speech, when compared to neutral speech. These features are used to develop a method for identification of high arousal speech. Simple and perhaps somewhat ad hoc rules based on these features seem to give good performance in the identification of high arousal speech, even without using neutral speech as a reference. (C) 2019 Acoustical Society of America.
In the half century since Edsger Dijkstra published “The Structure of the ‘THE’-multiprogramming System,” it has become clear that the ability to design a software system’s structure is at least as important as the ability to design efficient algorithms or write code in a particular programming language. Although the word “structure” appeared in the paper’s title and was used seven more times, Dijkstra never defined the term. Closer examination revealed that he was discussing at least three distinct structures. His failure to define “structure,” or to clearly distinguish the structures that were important in his software, has led many to confuse those structures. This article aims to clarify what those structures are, their differences, and each one’s importance.
ISBN: 9781728153223 (digital)
ISBN: 9781728153230 (print)
To study the influence of environmental wind on natural smoke exhaust characteristics, a railway passenger station is simulated with FDS software under two windowing modes, M1 (windward side windows open) and M2 (leeward side windows open), at wind speeds of 0 m/s, 3.4 m/s and 10 m/s. Smoke movement, temperature field changes and the impact on the safe evacuation of personnel are compared and analyzed for the station under the different working conditions. The results show that when the wind speed is 0 m/s, both windowing modes M1 and M2 can effectively discharge the smoke from the station. When the wind speed is 3.4 m/s, smoke exhaust under mode M1 is delayed because the environmental wind suppresses natural smoke exhaust, while smoke exhaust under mode M2 occurs earlier because the environmental wind promotes it. When the wind speed is a strong 10 m/s, natural smoke exhaust fails in both windowing modes M1 and M2. The smoke layer height drops below 2 m from the ground after 600 s, and the visibility decreases to less than 10 m after 800 s, which seriously affects the safe evacuation of personnel. Therefore, in case of fire, the station's windward exhaust windows should be closed and its leeward exhaust windows opened according to the wind direction. In strong winds, because natural smoke exhaust fails in both windowing modes M1 and M2, other smoke exhaust measures should be taken.
The increasing amount of resources available on current GPUs has sparked new interest in the problem of sharing those resources among different kernels. While new generations of GPUs support concurrent kernel execution, their scheduling decisions are made by the hardware at runtime. The hardware decisions, however, depend heavily on the order in which the kernels are submitted for execution. In this work, we propose a novel optimization approach that reorders kernel invocations to maximize resource utilization and improve the average turnaround time. We model the kernel assignments to the hardware resources as a series of knapsack problems and use a dynamic programming approach to solve them. We evaluate our method using kernels with different sizes and resource requirements. Our results show significant gains in the average turnaround time and system throughput compared to the kernel submission order implemented in modern GPUs.
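The "series of knapsack problems" formulation can be illustrated with a simplified sketch. It is not the authors' model: it assumes a single scalar resource per kernel (the paper considers kernels with different resource requirements, which this sketch collapses into one number), it uses the resource demand itself as the knapsack value so that each round maximizes utilization, and the names `knapsack` and `order_kernels` are hypothetical.

```python
def knapsack(kernels, capacity):
    """0/1 knapsack by dynamic programming.

    kernels: list of (name, demand) with integer demands.
    Returns the names of the subset whose total demand is largest
    without exceeding capacity (value == demand is an assumption).
    """
    n = len(kernels)
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        _, demand = kernels[i - 1]
        for cap in range(capacity + 1):
            best[i][cap] = best[i - 1][cap]
            if demand <= cap and best[i - 1][cap - demand] + demand > best[i][cap]:
                best[i][cap] = best[i - 1][cap - demand] + demand

    # Walk the table backwards to recover which kernels were chosen.
    chosen = []
    cap = capacity
    for i in range(n, 0, -1):
        if best[i][cap] != best[i - 1][cap]:
            name, demand = kernels[i - 1]
            chosen.append(name)
            cap -= demand
    chosen.reverse()
    return chosen


def order_kernels(kernels, capacity):
    """Solve one knapsack per round until every kernel has been scheduled."""
    remaining = list(kernels)
    rounds = []
    while remaining:
        picked = knapsack(remaining, capacity)
        if not picked:
            raise ValueError("a kernel demands more than the device capacity")
        rounds.append(picked)
        remaining = [k for k in remaining if k[0] not in picked]
    return rounds
```

For example, with a capacity of 10 units and kernels A, B, C and D demanding 7, 5, 4 and 3 units, the first round packs A with D (full utilization) and the second round packs B with C, rather than submitting the kernels in their arrival order.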
ISBN: 9789811024719; 9789811024702 (print)
Throughput in multiprogramming and time-sharing systems depends mainly on careful scheduling of the CPU and other I/O devices. CPU scheduling should control the waiting time, response time, turnaround time, and number of context switches. One of the most extensively used scheduling algorithms is shortest next remaining time first (SRTF), which gives a reduced average waiting time. This algorithm has some drawbacks, however. One is that every newly arriving process selected for execution causes a context switch, even when it is only slightly shorter than the currently running process. As such situations become more frequent, the number of context switches increases, which reduces the performance of the system. In this paper, we modify the traditional SRTF into an intelligent SRTF by changing the preemption decision to decrease the number of context switches. The main idea of our proposed algorithm is to make a context switch only if the next process plus the context switch overhead is shorter than the currently running process. This reduces the number of context switches and thereby improves the performance of the system.
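The modified preemption rule can be sketched as a small discrete-time simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, times are integers, ties are broken by process index, and the context switch overhead is used only in the preemption decision and is not charged to the simulated clock.

```python
def should_preempt(candidate_remaining, running_remaining, switch_overhead):
    """Preempt only if the candidate plus the switch overhead is still shorter."""
    return candidate_remaining + switch_overhead < running_remaining


def count_preemptions(processes, switch_overhead):
    """Simulate the scheduler one time unit at a time and count preemptions.

    processes: list of (arrival_time, burst_time) with integer times.
    switch_overhead == 0 reproduces the classic SRTF decision.
    """
    remaining = [burst for _, burst in processes]
    running = None
    preemptions = 0
    time = 0
    while any(r > 0 for r in remaining):
        ready = [
            i for i, (arrival, _) in enumerate(processes)
            if arrival <= time and remaining[i] > 0
        ]
        if ready:
            shortest = min(ready, key=lambda i: remaining[i])
            if running is None or remaining[running] == 0:
                # CPU is free: dispatch without counting a preemption.
                running = shortest
            elif shortest != running and should_preempt(
                remaining[shortest], remaining[running], switch_overhead
            ):
                running = shortest
                preemptions += 1
            remaining[running] -= 1
        time += 1
    return preemptions
```

With three processes arriving at times 0, 1 and 2 with bursts of 10, 8 and 6 units, an overhead of 0 preempts on every arrival, whereas an overhead of 2 lets the first process run to completion because no newcomer is shorter by more than the cost of switching.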
NASA Technical Reports Server (Ntrs) 20060014983: Model Checking Real Time Java Using Java Pathfinder by NASA Technical Reports Server (Ntrs); published by