An accurate prediction of scheduling and execution of instruction streams is a necessary prerequisite for predicting the in-core performance behavior of throughput-bound loop kernels on out-of-order processor architec...
ISBN (digital): 9781728147932
ISBN (print): 9781728147949
We present a methodology for proposing and evaluating components of cognitive architectures that comprises complementary approaches: hypothesizing behavioral mechanisms; evaluating the aptness of those mechanisms in providing accounts of human behaviors; and embedding models of the hypothesized mechanisms within simulation systems to observe the synthesized model behaviors. We illustrate these theoretical approaches with examples.
The quest for novel computing architectures is currently driven by (1) machine learning applications and (2) the need to reduce power consumption. To address both needs, we present a novel hierarchical reservoir computing architecture that relies on energy-efficient memcapacitive devices. Reservoir computing is a new brain-inspired machine learning architecture that typically relies on a monolithic, i.e., unstructured, network of devices. We use memcapacitive devices to perform the computations because they do not consume static power. Our results show that hierarchical memcapacitive reservoir computing device networks have a higher kernel quality, outperform monolithic reservoirs by 10%, and reduce power consumption by a factor of 3.4 on our benchmark tasks. The proposed architecture is relevant for building novel, adaptive, and power-efficient neuromorphic hardware with applications in embedded systems, the Internet of Things, and robotics.
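The reservoir computing paradigm the abstract builds on can be illustrated with a minimal software sketch: a fixed random recurrent network projects an input sequence into a high-dimensional state, and only a linear readout is trained. This is a generic echo state network in NumPy, not the paper's memcapacitive hardware; all sizes and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reservoir computing in a nutshell: a fixed random recurrent network
# (the "reservoir") expands the input; only a linear readout is trained.
N_IN, N_RES = 1, 50

W_in = rng.uniform(-0.5, 0.5, (N_RES, N_IN))   # fixed input weights
W = rng.uniform(-0.5, 0.5, (N_RES, N_RES))     # fixed recurrent weights
W *= 0.9 / max(abs(np.linalg.eigvals(W)))      # scale spectral radius below 1

def run_reservoir(u):
    """Collect reservoir states for an input sequence u of shape [T, N_IN]."""
    x = np.zeros(N_RES)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ u_t + W @ x)        # nonlinear state update
        states.append(x.copy())
    return np.array(states)

# Train only the readout, by ridge regression, to predict the next sample.
u = np.sin(np.linspace(0, 8 * np.pi, 400))[:, None]
X, y = run_reservoir(u[:-1]), u[1:, 0]
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N_RES), X.T @ y)
print("train MSE:", np.mean((X @ W_out - y) ** 2))
```

Because the reservoir itself is never trained, it can be realized by any physical substrate with rich dynamics, which is what motivates implementing it with memcapacitive device networks.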
ISBN (print): 9781538626672
Cyber-Physical Systems (CPS) are networks of heterogeneous embedded systems immersed within a physical environment. Modeling such heterogeneous systems is actively researched, but a systematic approach to modeling the characteristics of CPS is still lacking. To address this, we propose a flexible co-modeling approach that relies on SysML/MARTE/pCCSL to capture different aspects of CPS, including structure, behavior, clock constraints, and non-functional properties (NFP). The novelty of our approach lies in the use of logical clocks and SysML/MARTE/pCCSL to drive and coordinate different models, which supports standard language-based modeling for CPS. To capture CPS characteristics such as stochastic behavior and continuous behavior, we extend several meta-models of SysML/MARTE. We extend the block diagram with four new block stereotypes and a model type, and we adopt a new stereotype, FMIConnection, to describe information transmission between blocks; blocks connected this way are exported as corresponding FMU components, which greatly benefits the co-simulation of CPS. For the state machine diagram, we attach Ordinary Differential Equations (ODE) and TimedDelay annotations to states to model continuous behavior and stochastic time delays. Consistency between the various models is specified with pCCSL. To implement our approach, we develop a toolset based on GEMOC. Finally, to demonstrate the feasibility of our co-modeling approach, we present multi-view models of an energy-aware building as a case study.
ISBN (print): 9781538678855
To enable energy-efficient embedded execution of Deep Neural Networks (DNNs), the critical sections of these workloads, their multiply-accumulate (MAC) operations, need to be carefully optimized. The state of the art (SotA) pursues this through run-time precision-scalable MAC operators, which can support the varying precision needs of DNNs in an energy-efficient way. Yet, to implement adaptable-precision MAC operations, most SotA solutions rely on separately optimized low-precision multipliers and a precision-variable accumulation scheme, with the possible disadvantages of high control complexity and degraded throughput. This paper first optimizes one of the most effective SotA techniques to support fully-connected DNN layers. This mode, which exploits the transformation of a high-precision multiplier into independent parallel low-precision multipliers, is called the Sum Separate (SS) mode. In addition, this work suggests an alternative low-precision scheme, i.e., the implicit accumulation of multiple low-precision products within the multiplier itself, called the Sum Together (ST) mode. Based on the two types of MAC arrangements explored, corresponding architectures are proposed to implement DNN processing. The two architectures, yielding the same throughput, are compared at different working precisions (2/4/8/16-bit) based on post-synthesis simulation. The results show that the proposed ST-mode architecture outperforms the earlier SS mode by up to 1.6× in energy efficiency (TOPS/W) and 1.5× in area efficiency (GOPS/mm²).
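The arithmetic behind the two modes can be sketched with plain integers: SS decomposes one high-precision multiply into four low-precision partial products (which could also serve as independent parallel multipliers), while ST packs two operand pairs so that a single wide multiply implicitly accumulates their products. This is a bit-level illustration under assumed widths, not the paper's hardware; the guard bit and function names are my own.

```python
def ss_split(a, b, w=4):
    """Sum Separate (SS) view: one 2w-bit multiply rebuilt from four
    w-bit partial products. The same four small multipliers can instead
    run as independent parallel low-precision multipliers."""
    mask = (1 << w) - 1
    a_hi, a_lo = a >> w, a & mask
    b_hi, b_lo = b >> w, b & mask
    pp = (a_hi * b_hi, a_hi * b_lo, a_lo * b_hi, a_lo * b_lo)
    return (pp[0] << (2 * w)) + ((pp[1] + pp[2]) << w) + pp[3]

def st_dot2(a, b, w=4):
    """Sum Together (ST) view: pack two w-bit operand pairs so one wide
    multiply implicitly accumulates a[0]*b[0] + a[1]*b[1]."""
    n = 2 * w + 1                        # field width: product bits + carry guard
    wide = (a[0] | (a[1] << n)) * (b[1] | (b[0] << n))
    return (wide >> n) & ((1 << n) - 1)  # extract the summed middle field

assert ss_split(200, 77) == 200 * 77            # SS reconstructs the full product
assert st_dot2((3, 12), (5, 7)) == 3 * 5 + 12 * 7  # ST accumulates inside the multiply
```

The ST trick works because the cross terms land in disjoint bit fields; the guard bit keeps the accumulated middle field from overflowing into them.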
ISBN (digital): 9781728144849
ISBN (print): 9781728144856
Hardware-Software Co-Design is a highly successful strategy for improving the performance of domain-specific computing systems. We argue for the application of the same methodology to deep learning; specifically, we propose to extend neural architecture search with information about the hardware to ensure that the model designs produced are highly efficient in addition to meeting the typical criteria around accuracy. Using the task of keyword spotting in audio on edge computing devices, we demonstrate that our approach results in neural architectures that are not only highly accurate, but also efficiently mapped to the computing platform that will perform the inference. Using our modified neural architecture search, we demonstrate a 0.88% increase in TOP-1 accuracy with a 1.85× reduction in latency for keyword spotting in audio on an embedded SoC, and a 1.59× reduction on a high-end GPU.
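The core idea, ranking candidate architectures by accuracy and measured hardware cost rather than accuracy alone, can be sketched in a few lines. The candidate names, accuracies, latencies, and the penalty form below are invented for illustration; they are not the paper's search space or objective.

```python
# Hardware-aware NAS in miniature: score candidates jointly on accuracy
# and a (hypothetical) latency measured on the target platform.
candidates = [
    {"name": "ds-cnn-small", "accuracy": 0.942, "latency_ms": 3.1},
    {"name": "ds-cnn-large", "accuracy": 0.951, "latency_ms": 9.8},
    {"name": "gru-medium",   "accuracy": 0.948, "latency_ms": 4.0},
]

def score(c, target_ms=5.0, penalty=0.02):
    """Reward accuracy; penalize exceeding the platform's latency budget."""
    over = max(0.0, c["latency_ms"] - target_ms)
    return c["accuracy"] - penalty * over

best = max(candidates, key=score)
print(best["name"])  # the most accurate model *within* the latency budget wins
```

An accuracy-only search would pick the largest model here; folding latency into the objective instead selects the architecture that maps efficiently onto the inference platform.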
As we approach the limits of Moore's law, there is increasing interest in non-von Neumann architectures such as neuromorphic computing to take advantage of improved compute and low-power capabilities. Spiking neura...
ISBN (digital): 9781728144849
ISBN (print): 9781728144856
Approximate computing techniques are often used to improve the performance of applications that can tolerate some amount of impurity in their calculations or data. In the context of embedded and mobile systems, a broad range of applications have exploited approximation techniques to improve performance and overcome the limited capabilities of the hardware. On such systems, even small performance improvements can be sufficient to meet scheduling requirements such as hard real-time deadlines. We study the approximation of memory-bound applications on mobile GPUs using kernel perforation, an approximation technique that exploits the availability of fast GPU local memory to provide high performance with more accurate results. Using this approximation technique, we approximated six applications and evaluated them on two mobile GPU architectures with very different memory layouts: a Qualcomm Adreno 506 and an ARM Mali T860 MP2. Results show that, even when local memory is not mapped to dedicated fast memory in hardware, kernel perforation still achieves a 1.25× speedup because of improved memory layout and caching effects. Mobile GPUs with local memory show a speedup of up to 1.38×.
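The perforation idea itself is simple to demonstrate: skip a fraction of the input loads in a memory-bound reduction and scale the partial result. This is a minimal CPU-side sketch of the general principle, assuming a uniform every-other-element skip pattern; it is not the paper's GPU local-memory implementation.

```python
# Kernel perforation in miniature: trade a small, bounded error for
# roughly half the memory traffic by reading only part of the input.
def mean_exact(xs):
    return sum(xs) / len(xs)

def mean_perforated(xs, stride=2):
    sample = xs[::stride]          # skip loads: read only 1/stride of the data
    return sum(sample) / len(sample)

data = [float(i % 7) for i in range(1000)]
exact = mean_exact(data)
approx = mean_perforated(data)
print(abs(exact - approx))         # small error for half the reads
```

On a GPU, the skipped elements would be the ones never staged into fast local memory, which is where the bandwidth savings, and the caching effects the abstract mentions, come from.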
With the growing complexity of embedded systems, a systematic design process and tool are vital to help designers assure that their design meets specifications. The design of an embedded system evolves through multipl...
ISBN (print): 9781538678855
In recent years, Artificial Intelligence (AI) has been widely deployed in a variety of business sectors and industries, yielding a number of revolutionary applications and services that are primarily driven by high-performance computation and storage facilities in the cloud. On the other hand, embedding intelligence into edge devices is highly demanded by emerging applications such as autonomous systems, human-machine interaction, and the Internet of Things (IoT). In these applications, it is advantageous to process data near or at its source to improve energy and spectrum efficiency and security, and to decrease latency. Although the computation capability of edge devices has increased tremendously during the past decade, it is still challenging to perform sophisticated AI algorithms on these resource-constrained devices. This calls not only for low-power chips for energy-efficient processing at the edge but also for a system-level framework to distribute resources and tasks along the edge-cloud continuum. In this overview, we summarize dedicated edge hardware for machine learning, from embedded applications to sub-mW "always-on" IoT nodes. Recent advances in circuits and systems that incorporate the joint design of architectures and algorithms are reviewed. The fog computing paradigm, which enables processing at the edge while still offering the possibility to interact with the cloud, is also covered, with a focus on the opportunities and challenges of exploiting fog computing in AI as a bridge between the edge device and the cloud.