Discovering a suitable neural network architecture for modeling complex dynamical systems poses a formidable challenge, often involving extensive trial and error and navigation through a high-dimensional hyper-paramet...
A serious challenge when finding influential actors in real-world social networks, to enable efficient community-wide interventions, is the lack of knowledge about the structure of the underlying network. Current stat...
A serious challenge when finding influential actors in real-world social networks is the lack of knowledge about the structure of the underlying network. Current state-of-the-art methods rely on hand-crafted sampling ...
The two-dimensional Summed Area Table (SAT) is a fundamental primitive used in image processing and machine learning applications. We present a collection of optimization methods for computing SATs on CUDA-enabled GPUs. Conventional approaches compute the prefix sum along one dimension in parallel, transpose the matrix, and then compute the prefix sum along the other dimension in parallel. They also use the scratchpad memory as a cache. We propose a collection of algorithms that scale with problem size. We use the register cache technique instead of the scratchpad memory, and employ a naive serial scan at the thread level to compute the prefix sum along one of the dimensions. Using a novel transpose-in-registers method, we increase inter-thread parallelism and outperform conventional SAT implementations. In addition, we significantly reduce both the communication between threads and the number of arithmetic instructions. On Nvidia Pascal P100 and Volta V100 GPUs, our evaluations show that our implementations outperform state-of-the-art libraries, yielding up to 2.3x and 3.2x speedups over the OpenCV and Nvidia NPP libraries, respectively.
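The conventional scan, transpose, scan pipeline that the abstract describes can be sketched on the CPU as follows. This is only a serial illustration in plain Python, not the authors' GPU implementation: it does not model the register cache or the transpose-in-registers method, and the names `row_prefix_sums`, `transpose`, `summed_area_table`, and `box_sum` are hypothetical helpers introduced for this example.

```python
from itertools import accumulate


def row_prefix_sums(matrix):
    """Inclusive prefix sum along each row (one scan per row)."""
    return [list(accumulate(row)) for row in matrix]


def transpose(matrix):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*matrix)]


def summed_area_table(image):
    """Build the SAT: scan rows, transpose, scan rows again, transpose back."""
    scanned_rows = row_prefix_sums(image)
    scanned_both = row_prefix_sums(transpose(scanned_rows))
    return transpose(scanned_both)


def box_sum(sat, top, left, bottom, right):
    """Sum of image[top..bottom][left..right] (inclusive) from four SAT lookups."""
    total = sat[bottom][right]
    if top > 0:
        total -= sat[top - 1][right]
    if left > 0:
        total -= sat[bottom][left - 1]
    if top > 0 and left > 0:
        total += sat[top - 1][left - 1]
    return total


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
sat = summed_area_table(image)
```

Once the table is built, `box_sum(sat, 1, 1, 2, 2)` recovers the sum of the lower-right 2x2 block (5 + 6 + 8 + 9) with a constant number of lookups, which is why the SAT is useful as a primitive for box filtering.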
ISBN: 9781450349666 (print)
Street light poles will be a key enabler for a smart city's hardware infrastructure, thanks to their ubiquity throughout the city as well as access to power. We propose an IoT test bed around light poles for the city, with a modular hardware and software architecture to enable experimentation with various technologies.