ISBN: 9798400706165 (print)
With the increasing prevalence of mobile robots, there is a growing demand for powerful system software. Integrating multiple operating systems into a single platform has become necessary, and virtualization offers a cost-effective solution for managing multiple OSes. While several types of hypervisors for embedded systems have been proposed to manage guest OSes, significant work is still needed to make hypervisors applicable to robot systems. This paper introduces SmartVisor, a microkernel-based hypervisor designed explicitly for robotic systems. Our work focuses on designing the virtual machine management module to improve robot performance and user experience. The goal is to ensure that the hypervisor-based robot meets the demands of real-world scenarios for robot developers and users.
ISBN: 9798400701740 (print)
Running machine learning inference on tiny devices, known as TinyML, is an emerging research area. This task requires generating inference code that uses memory frugally, a task for which standard ML frameworks are ill-suited. A deployment framework for TinyML must a) be parametric in the number representation to take advantage of emerging representations like posits, b) carefully assign high precision to a few tensors so that most tensors can be kept in low precision while still maintaining model accuracy, and c) avoid memory fragmentation. We describe MinUn, the first TinyML framework that holistically addresses these issues to generate efficient code for ARM microcontrollers (e.g., Arduino Uno, Due and STM32H747) that outperforms the prior TinyML
ISBN: 159593362X (print)
The Princeton ZebraNet project is a collaboration of engineers and biologists to build mobile, wireless embedded systems for wildlife tracking. Over the lifetime of the project, we have implemented a number of compression, communication, and data management algorithms specifically tailored for the small memory, constrained energy and sparse connectivity of these long-lifetime systems. We have gone through three major generations of hardware and software implementations, and have done two successful real-world deployments on Plains Zebras in Kenya, with a third deployment planned for summer 2007. In this talk, I will discuss our real-life experiences with crafting embedded systems hardware and software, and our deployment experiences in Africa. I will also put forward a vision for how portability, reliability, and energy efficiency can be well supported in future embedded systems.
The automotive industry has a growing demand for the seamless integration of safety analysis tools into the model-based development toolchain for embedded systems. This requires translating concepts of the automotive ...
ISBN: 9781581135275 (print)
Embedded systems have limited energy resources. Hence, they should conserve these resources to extend their period of operation. Recently, dynamic frequency scaling (DFS) and dynamic voltage scaling (DVS) have been added to various embedded processors as a means to increase battery life. A number of scheduling techniques have been developed to exploit DFS and DVS for real-time systems to reduce energy consumption. These techniques exploit idle and slack time of a schedule. Idle time can be consumed by lowering the processor frequency of selected tasks, while slack time allows later tasks to execute at lower frequencies with reduced voltage demands. Our work delivers energy savings beyond the level of prior work. We enhance earliest-deadline-first (EDF) scheduling to exploit slack time generated by running a task at multiple frequency levels within a single invocation. The technique relies strictly on operating system support within the scheduler to implement the approach. Early scaling at a low frequency, determined by a feedback mechanism and facilitated by a slack-passing scheme, capitalizes on the high probability that a task finishes its execution without using its worst-case execution budget. If a task does not complete by a certain point in time within its low-frequency range, the remainder of it continues to execute at a higher frequency. Our experiments demonstrate that the resulting energy savings exceed those of previously published work by up to 33%. In addition, our method adds only constant complexity at each scheduling point, which has not been achieved by prior work, to the best of our knowledge.
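The two-level idea in this abstract (start slow, speed up only if the job is still running) can be illustrated with a small sketch. It is not the paper's scheduler: the function names are hypothetical, frequencies are normalized, and the energy model (energy per cycle proportional to the square of the frequency, as if voltage scaled linearly with frequency) is an assumption made only for illustration.

```python
def max_low_cycles(wcet_cycles, budget_time, f_low, f_high):
    """Largest number of cycles that may run at f_low so that a job needing
    its full worst case still finishes within budget_time (assumed model)."""
    if f_low >= f_high:
        raise ValueError("f_low must be below f_high")
    slack = budget_time - wcet_cycles / f_high  # time left if run entirely at f_high
    x = slack / (1.0 / f_low - 1.0 / f_high)    # cycles that can absorb that slack
    return max(0.0, min(float(wcet_cycles), x))


def run_two_level(actual_cycles, low_cycles, f_low, f_high):
    """Return (time, energy) for a job whose first low_cycles cycles run at
    f_low and whose remaining cycles, if any, run at f_high."""
    at_low = min(actual_cycles, low_cycles)
    at_high = actual_cycles - at_low
    time = at_low / f_low + at_high / f_high
    energy = at_low * f_low ** 2 + at_high * f_high ** 2  # assumed E/cycle ~ f^2
    return time, energy


if __name__ == "__main__":
    low = max_low_cycles(wcet_cycles=100, budget_time=150, f_low=0.5, f_high=1.0)
    print("cycles allowed at low frequency:", low)
    print("typical job (40 cycles):", run_two_level(40, low, 0.5, 1.0))
    print("worst-case job (100 cycles):", run_two_level(100, low, 0.5, 1.0))
```

Under these assumptions, a job that finishes early never leaves the low frequency, while a worst-case job still meets its time budget by switching to the high frequency for its remaining cycles.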
ISBN: 9781595930187 (print)
Power conservation has become a key design issue for many systems, including clusters deployed for embedded systems, where power availability ultimately determines system lifetime. These clusters execute a high rate of requests of highly variable length, such as in satellite-based multiprocessor systems. The goal of power management in such systems is to minimize the aggregate energy consumption of the whole cluster while ensuring timely responses to requests. In the past, dynamic voltage scaling (DVS) and on/off schemes have been studied under the assumptions of continuously tunable processor frequencies and perfect load balancing. In this work, we focus on the more realistic case of discrete processor frequencies and propose a new policy that adjusts the number of active nodes based on the system load, not system frequency. We also design a threshold scheme that prevents the system from reacting to short-lived workload changes when the incoming workload is unstable. Simulation and implementation results on real hardware show that our policy is very effective in reducing the overall power consumption of clusters executing embedded applications.
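The abstract does not spell out its threshold scheme, so the sketch below only illustrates the general idea of ignoring short-lived load changes. It assumes a simple persistence rule: the number of active nodes changes only after the load has called for the same new node count for several consecutive samples. The class name, the `patience` parameter, and the per-node capacity model are all hypothetical.

```python
import math


class NodeController:
    """Load-driven on/off policy with a persistence threshold (assumed rule)."""

    def __init__(self, capacity_per_node, max_nodes, patience=3):
        self.capacity = capacity_per_node
        self.max_nodes = max_nodes
        self.patience = patience  # samples a new target must persist
        self.active = 1           # nodes currently powered on
        self._pending = None      # candidate node count being observed
        self._streak = 0          # consecutive samples asking for it

    def observe(self, load):
        """Feed one load sample and return the number of active nodes."""
        target = min(self.max_nodes, max(1, math.ceil(load / self.capacity)))
        if target == self.active:
            self._pending, self._streak = None, 0    # spike is over: forget it
        elif target == self._pending:
            self._streak += 1
        else:
            self._pending, self._streak = target, 1  # new candidate target
        if self._pending is not None and self._streak >= self.patience:
            self.active = self._pending
            self._pending, self._streak = None, 0
        return self.active


if __name__ == "__main__":
    ctrl = NodeController(capacity_per_node=10, max_nodes=4, patience=3)
    print([ctrl.observe(load) for load in [5, 35, 5, 35, 35, 35]])
```

In this run the single spike to a load of 35 is ignored, and the cluster scales up only once the higher load has persisted for three samples.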
ISBN: 9781450367240 (print)
Tensor contraction is a fundamental operation in many algorithms, with a plethora of applications ranging from quantum chemistry through fluid dynamics and image processing to machine learning. The performance of tensor computations critically depends on the efficient utilization of on-chip memories. In the context of low-power embedded devices, efficient management of the memory space becomes even more crucial in order to meet energy constraints. This work investigates strategies for performance- and energy-efficient tensor contractions on embedded systems, using racetrack memory (RTM)-based scratch-pad memory (SPM). Compiler optimizations such as loop access order and data layout transformations, paired with architectural optimizations such as prefetching and preshifting, are employed to reduce the shifting overhead in RTMs. Experimental results demonstrate that the proposed optimizations improve the SPM performance and energy consumption by 24% and 74%, respectively, compared to an iso-capacity SRAM.
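To see why loop access order matters for racetrack memory, consider a deliberately simplified model that is not the paper's: a single track with one access port, where moving from one stored element to another costs as many shifts as the distance between their positions. The function names and the matrix size are illustrative assumptions.

```python
def shift_cost(addresses, start=0):
    """Total shifts needed to visit addresses in order on a single track
    with one access port initially aligned to position start."""
    cost, pos = 0, start
    for addr in addresses:
        cost += abs(addr - pos)  # shifts to bring addr under the port
        pos = addr
    return cost


def traversal(rows, cols, row_order):
    """Positions touched when walking a rows x cols matrix stored row-major,
    either row by row (row_order=True) or column by column."""
    if row_order:
        return [r * cols + c for r in range(rows) for c in range(cols)]
    return [r * cols + c for c in range(cols) for r in range(rows)]


if __name__ == "__main__":
    print("row-by-row shifts:      ", shift_cost(traversal(4, 4, True)))
    print("column-by-column shifts:", shift_cost(traversal(4, 4, False)))
```

In this toy model, matching the loop order to the data layout keeps consecutive accesses adjacent on the track, which is the kind of effect that loop-order and layout transformations aim for.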
ISBN: 9781581138061 (print)
This paper describes the FlexCC2 register allocation framework. FlexCC2 is an optimizing retargetable C compiler for embedded processors, and in particular for DSP processors. Embedded processors often contain features such as irregular and constrained register sets that complicate register allocation, making traditional methods inefficient. In this paper, we present a register allocation framework specifically tailored to the particularities of embedded processors. This framework has been integrated into the FlexCC2 production compiler and is used by FlexCC2 customers.
ISBN: 9781450305556 (print)
In embedded systems, controlling a shared resource like the bus, or improving a property like power consumption, may be hard to achieve when programming device drivers individually. There is a need for global resource control, taking decisions based on a centralized view of the devices' states. In this paper, we study power consumption in sensor networks, where the nodes are small embedded systems powered by batteries. We concentrate on the hardware/software architecture of a node, where significant gains can be achieved by controlling the consumption modes of the various devices globally. The architecture we propose involves a simple adaptation of the application level, to communicate with the hardware via a control layer. The control layer itself is built from a set of simple automata: the drivers of the devices, whose states correspond to power consumption modes, and a controller that enforces global properties. All these automata are programmed using a synchronous language, whose compiler performs static scheduling and produces a single piece of C code. We explain the approach in detail, demonstrate its use with either Contiki or a traditional multithreading operating system, and report on our experiments.
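The control-layer idea can be sketched in a few lines, with the caveat that the paper uses a synchronous language compiled to C, whereas this is plain Python and every name in it is hypothetical. Each device is reduced to its current power mode, and a controller processes all mode requests of one tick together, granting a request only if the global property (here, an assumed total power budget) still holds.

```python
# Hypothetical per-mode power costs in arbitrary units (illustrative only).
POWER = {"off": 0, "idle": 1, "active": 5}


class Controller:
    """Central controller holding the mode of every device automaton."""

    def __init__(self, devices, budget):
        self.modes = {dev: "off" for dev in devices}
        self.budget = budget  # global property: total power must stay <= budget

    def step(self, requests):
        """One synchronous tick. requests maps device name -> desired mode.
        Requests are examined in a fixed (sorted) order so the outcome is
        deterministic; a request that would exceed the budget is refused."""
        for dev in sorted(requests):
            trial = dict(self.modes)
            trial[dev] = requests[dev]
            if sum(POWER[mode] for mode in trial.values()) <= self.budget:
                self.modes = trial
        return dict(self.modes)


if __name__ == "__main__":
    ctrl = Controller(["radio", "flash"], budget=6)
    print(ctrl.step({"radio": "active", "flash": "active"}))
    print(ctrl.step({"radio": "active", "flash": "idle"}))
```

The point of the design is that no driver decides alone: the radio's request is refused in the first tick only because the controller sees the flash device's mode as well.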
ISBN: 9781595930187 (print)
Accurate timing analysis is key to efficient embedded system synthesis and integration. Caches are needed to increase processor performance, but they are hard to use because of their complex behavior, especially under preemptive scheduling. Current approaches use simplifying assumptions or propose exponentially complex analysis algorithms to bound the cache-related preemption delay at a context switch. Existing approaches consider only direct-mapped caches or propose non-conservative approximations for set-associative caches. In this paper, we propose a novel cache-related preemption delay analysis for set-associative instruction caches, where the designer can adjust the analysis precision by scaling the problem complexity. Furthermore, this precise preemption delay analysis is integrated into a scheduling analysis to determine the response time of tasks accurately. In experiments, we evaluate this tradeoff between analysis precision and analysis time. The results show an improvement of 22% to 71% in the analysis precision of the cache-related preemption delay and 5% to 21% in response time analysis compared to previous conservative approaches.
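How a preemption delay bound feeds into response time analysis can be shown with the standard fixed-priority recurrence, in which each preemption by a higher-priority task costs that task's execution time plus a cache-related preemption delay term. This is the textbook formulation, not the analysis proposed in the paper; the function name and the task parameters are illustrative assumptions.

```python
def response_time(c, higher, deadline):
    """Worst-case response time of a task with execution time c.

    higher is a list of (c_j, t_j, crpd_j) tuples for the higher-priority
    tasks: execution time, period, and a bound on the cache-related
    preemption delay charged per preemption. Returns None if the
    fixed-point iteration exceeds the deadline."""
    r = c
    while r <= deadline:
        # -(-r // t_j) is ceil(r / t_j) in exact integer arithmetic
        nxt = c + sum(-(-r // t_j) * (c_j + g_j) for c_j, t_j, g_j in higher)
        if nxt == r:
            return r  # fixed point reached
        r = nxt
    return None


if __name__ == "__main__":
    print(response_time(3, [(1, 4, 0)], deadline=10))  # no preemption delay
    print(response_time(3, [(1, 4, 1)], deadline=10))  # delay of 1 per preemption
```

A tighter bound on the per-preemption delay directly shrinks the computed response time, which is why precision in the cache analysis carries over to the scheduling analysis.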