Detecting performance problems online in large-scale cloud computing systems is quite a headache for developers. The behavior and the hidden connections among the huge number of runtime request execution paths in c...
This paper presents Wi-FLIP, a vision-enabled WSN node resulting from the integration of FLIP-Q, a prototype vision chip, and Imote2, a commercial WSN platform. In Wi-FLIP, image processing is not confined to the digital domain as in conventional architectures. Instead, its image sensor, the FLIP-Q prototype, incorporates pixel-level processing elements (PEs) implemented in analog circuitry. These PEs are interconnected, forming a massively parallel SIMD-based focal-plane array. Low-level image processing tasks fit this processing scheme very well: they carry a heavy computational load composed of repetitive pixel-wise operations that can be performed in parallel with moderate accuracy. Under such circumstances, analog circuitry, which is less precise but faster and more area- and power-efficient than its digital counterpart, has been extensively reported to achieve better performance. Wi-FLIP's image sensor therefore outputs not raw but pre-processed images, which make the subsequent digital processing much lighter. The energy cost of this pre-processing is very low: 5.6 mW in the worst-case scenario. As a result, for the configuration where the Imote2's processor works at its minimum clock frequency, the maximum power consumed by our prototype represents only 5.2% of the whole system's power consumption. This percentage gets even lower as the clock frequency increases. We report experimental results for different algorithms, image resolutions and clock frequencies. The main drawback of this first version of Wi-FLIP is the low achievable frame rate, which is due to the non-standard GPIO-based FLIP-Q-to-Imote2 interface.
ISDEP (Integrator of Stochastic Differential Equations for Plasmas) is a Monte Carlo code that solves the plasma dynamics in a fusion device and scales perfectly on distributed computing platforms. Montera is a recent framework developed to execute Monte Carlo applications, such as ISDEP, efficiently on the Grid. This work shows the improvement obtained by performing the ISDEP calculations with Montera, which reaches up to 34.9%, and analyzes its implications, with the aim of showing the fusion research community the benefits of using Montera.
ISBN: 9780819484260 (print)
Optical coherence tomography (OCT) has become a promising diagnostic method in many medical fields. Non-invasive real-time optical biopsy of internal organs is one of the most attractive applications of OCT, enabling in-situ diagnosis of cancer at an early stage. For this application, faster OCT methods are required to reduce the inspection time and motion artifacts in images. One criterion for meeting this need is an endoscopic OCT method capable of displaying volumetric tomography continuously in real time at video rate, like conventional endoscopes. In our previous work, we demonstrated ultra-high-speed OCT at an A-scan rate of 60 MHz. However, movies were rendered after the data acquisition. In this work, we have developed an ultra-fast data processing system, installed it in the ultra-high-speed OCT system, and enabled real-time display of various 3D tomography images without limitation of diagnostic time, i.e. 4D OCT imaging, at an A-scan rate, B-scan rate and volume rate of 10 MHz, 4 kHz and 12 volumes/sec, respectively. Various real-time image presentations are demonstrated, such as continuously rendered 3D imaging and continuous 2D-slice scanning of 3D images.
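As a rough reading aid, the three rates quoted above imply an upper bound on the sampling grid of each volume. The short sketch below only divides the quoted rates; the function name `scan_budget` is hypothetical, and the figures ignore any scanner dead time or fly-back, which the abstract does not specify.

```python
# Rates quoted in the abstract.
A_SCAN_RATE_HZ = 10e6   # A-scans per second
B_SCAN_RATE_HZ = 4e3    # B-scans per second
VOLUME_RATE_HZ = 12     # volumes per second


def scan_budget(a_rate, b_rate, v_rate):
    """Return (A-scans per B-scan period, B-scans per volume period).

    These are upper bounds: the share of each period actually used for
    acquisition is an unknown and is assumed here to be 100%.
    """
    return a_rate / b_rate, b_rate / v_rate


a_per_b, b_per_vol = scan_budget(A_SCAN_RATE_HZ, B_SCAN_RATE_HZ, VOLUME_RATE_HZ)
print(f"at most {a_per_b:.0f} A-scans per B-scan")
print(f"at most {b_per_vol:.0f} B-scans per volume")
```

Under these assumptions each B-scan period has room for at most 2500 A-scans, and each volume period for roughly 333 B-scans.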
Increasingly complex systems need parallelized simulation engines. In the context of SystemC simulation, existing proposals require predicting communication in the simulated system. However, this communication is often unpredictable. To deal with unpredictable systems, this paper presents a parallelization approach that uses asynchronous communication without modifying the SystemC simulation engine. The simulated system model is partitioned and distributed across separate simulation engines, each part being evaluated in parallel with the others. Functional consistency is preserved thanks to the simulated system's write-exclusive memory access policy, while temporal consistency is guaranteed using explicit synchronization. Experimental results show a speed-up of up to 13× on 16 processors.
During the present decade, emerging architectures like multicore CPUs and graphics processing units (GPUs) have steadily gained popularity for their ability to deploy high computational power at a low cost. In this paper, we combine parallelization techniques on a cooperative cluster of multicore CPUs and multisocket GPUs to apply their joint computational power to an automatic image registration algorithm intended for the analysis of high-resolution microscope images. Registration methods pose a computational challenge within the biomedical field due to the large size of microscope image data sets, which typically extend to the Terabyte scale. We analyze this application to identify those parts which are more favorable to the CPU and GPU execution models and decompose the process accordingly. Performance results are presented for two sets of images: mouse placenta (16K × 16K pixels) and mouse mammary tumor (23K × 62K pixels). Execution times are shown on different multi-node, multi-socket and multi-core configurations to provide performance insights about the most effective approach.
The segmentation of tissue regions in high-resolution microscopy is a challenging problem due to both the size and appearance of digitized pathology sections. The two-point correlation function (TPCF) has proved to be an effective feature for addressing the textural appearance of tissues. However, the calculation of the TPCF is computationally burdensome and often intractable for the gigapixel images produced by slide scanning devices for pathology applications. In this paper we present several approaches for accelerating the deterministic calculation of point correlation functions: using theory to reduce computation, parallelization on distributed systems, and parallelization on graphics processors. Previously, we showed that the correlation updating method of calculation offers an 8-35x speedup over frequency domain methods and decouples efficient computation from the select scales of Fourier methods. In this paper, distributed computation on 64 compute nodes provides a further 42x speedup. Finally, parallelization on graphics processors (GPUs) results in an additional 11-16x speedup using an implementation capable of running on a single desktop machine.
A low power vision system has been developed incorporating the SCAMP3 pixel-parallel processor array vision chip. A test algorithm to detect loitering targets has shown an average power consumption of
Owing to their flexibility and parallelism, FPGAs are popular for accelerating image processing. In this paper, a double-parallel architecture based on an FPGA is exploited to speed up median filtering and edge detection, which are essential steps in image processing. The double-parallel scheme comprises image-level parallelism and operation-level parallelism. Image-level parallelism is a high-level form of parallelism that divides one image into different parts and processes them concurrently. Operation-level parallelism, which is embedded in each image-level parallel thread, fully exploits every parallelizable part of the concrete algorithms. The corresponding design is based on a DE2 Development Board, which contains a Cyclone II FPGA device. The same tasks have also been implemented on a PC and a DSP for performance comparison. Although the operating frequencies of the PC and DSP used are much higher than the FPGA's, the FPGA takes less time per computed image than either of them. By taking advantage of the double-parallel technique, the speed/frequency ratio of the FPGA is 202 times higher than that of the PC and 147 times higher than that of the DSP. Finally, a detailed discussion analyzes the advantages and disadvantages of the computing platforms used. This paper shows that the proposed double-parallel scheme can dramatically speed up image processing methods even on a low-cost FPGA platform with low frequency and limited resources, which is very meaningful for practical applications.
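The image-level half of the scheme described above can be illustrated in software. The following is a minimal sketch, not the authors' FPGA design: it assumes a 3×3 median window with replicated borders (neither is stated in the abstract), splits the image into horizontal strips, and filters the strips concurrently. The names `median3x3_rows` and `median3x3_parallel` are hypothetical. Operation-level parallelism (for example, a hardware sorting network per window) is not modeled, and Python threads only show the decomposition; they do not reproduce the hardware speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def median3x3_rows(img, row_start, row_end):
    """3x3 median filter for rows [row_start, row_end) of a 2D list.

    Border pixels are handled by clamping coordinates (edge replication),
    which is an assumption made for this sketch.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(row_start, row_end):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(sorted(window)[4])  # middle of the 9 sorted values
        out.append(row)
    return out


def median3x3_parallel(img, parts=4):
    """Image-level parallelism: filter horizontal strips concurrently."""
    h = len(img)
    parts = max(1, min(parts, h))
    bounds = [(i * h // parts, (i + 1) * h // parts) for i in range(parts)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # Every worker reads the shared input image, so rows bordering a
        # strip still see their true neighbours from the adjacent strip.
        strips = list(pool.map(lambda b: median3x3_rows(img, *b), bounds))
    return [row for strip in strips for row in strip]


# Usage: a flat image with one impulse-noise pixel.
image = [[10] * 5 for _ in range(5)]
image[2][2] = 255
filtered = median3x3_parallel(image, parts=3)
```

Because each strip reads its neighbours from the shared input, the strip-wise result matches a single sequential pass over the whole image.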
In the present work, automatic corner detection in soccer games based on image features (e.g., object-based features) is studied. For this purpose, a framework consisting of five steps is proposed. This paper mainly focuses on the first three steps, and especially on the ball detection step. The position of the ball on the field plays an important role in determining which event has occurred in the game. Therefore, it is necessary to detect the exact position of the ball in the playfield and then track it. The ball trajectory obtained via tracking is useful for identifying and detecting the main events of soccer games. In these three steps, the most important processing applied to the images is image segmentation to detect the playfield, field lines, and ball. A cleaning morphological method is applied to detect the ball. This method is real-time and automatic, and it yields superior results in comparison with other common methods such as Template Matching and the Circular Hough Transform (CHT). With the proposed method, only one candidate for the ball position is obtained and non-ball candidates are removed; hence, it is more reliable than other methods. The results of the proposed method are compared with those of CHT, and they illustrate that the proposed method is fast, effective, and reliable.