ISBN: (Print) 9789082797091
For large-scale inverse problems, inference can be tackled with distributed algorithms that divide the task over multiple computing nodes or cores, referred to as workers. Since random sampling methods yield not only estimates but also credibility intervals, we leverage data augmentation and MCMC algorithms to design a distributed sampler. In contrast with usual approaches relying on a client-server architecture, we propose a flexible distributed sampler relying on a single-program multiple-data (SPMD) implementation, in which all workers perform a similar task. This distributed strategy reduces both the computing time and the volume of communications by handling blocks of data and parameters separately on different workers. Experiments on a large synthetic image inpainting problem illustrate the ability of the proposed approach to produce high-quality estimates in a small amount of time.
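The single-program multiple-data idea can be illustrated with a minimal sketch. It is not the sampler proposed in the paper: it assumes a toy, fully separable Gaussian model (y_i ~ N(x_i, sigma2) with prior x_i ~ N(0, tau2)), so workers never need to communicate, and it uses threads in place of real computing nodes. The names `block_indices`, `worker_step` and `spmd_sample` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def block_indices(n, n_workers, rank):
    """Contiguous index block owned by worker `rank` (hypothetical helper)."""
    base, extra = divmod(n, n_workers)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def worker_step(rank, n_workers, y, sigma2, tau2, seed):
    """Routine run identically by every worker; only `rank` selects the block."""
    rng = random.Random(seed * 10_000 + rank)  # independent stream per worker
    # Assumed toy model: the posterior of x_i is Gaussian with these moments.
    post_var = 1.0 / (1.0 / sigma2 + 1.0 / tau2)
    block = block_indices(len(y), n_workers, rank)
    draws = [rng.gauss(post_var * y[i] / sigma2, post_var ** 0.5) for i in block]
    return block, draws


def spmd_sample(y, n_workers=4, sigma2=1.0, tau2=1.0, seed=0):
    """Draw one posterior sample, each worker handling its own block."""
    x = [None] * len(y)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(worker_step, rank, n_workers, y, sigma2, tau2, seed)
            for rank in range(n_workers)
        ]
        for future in futures:
            block, draws = future.result()
            for i, value in zip(block, draws):
                x[i] = value
    return x


if __name__ == "__main__":
    observations = [0.5 * i for i in range(10)]
    print(spmd_sample(observations, n_workers=3))
```

There is no server process here: every worker executes `worker_step`, and the blocks partition the parameter vector. In a non-separable problem such as inpainting, neighbouring workers would additionally have to exchange boundary values, which this sketch omits.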
A spectral General Circulation Model at horizontal resolutions T21 and T42 has been integrated for up to 30 days on 16 and 32 processors of a Meiko T800. The model at resolution T21 has also been implemented on 16 processors (T800) of a parallel computer (CHIPPS) built in India. The wall-clock timings of model integration for 1, 10 and 30 days are recorded, and the speedup and efficiency on 16 and 32 processors are computed. Results show that a T42 parallel model with nine vertical levels takes less than 36 elapsed minutes on 32 processors for a 1-day integration. For the T21 model, the maximum speedup and efficiency achieved on 16 processors are about 10 and 63%, respectively. When the horizontal resolution of the model is doubled to T42, the maximum speedup and efficiency obtained on 32 processors are about 9 and 29%, respectively. It is also found that when the physical parametrisation schemes are included in the model, which increases the number of arithmetic operations, the speedup and efficiency on both 16 and 32 processors increase compared with the case with no physics in the model.
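The relation between the reported figures can be checked with a short sketch. It assumes the standard definitions, which the abstract does not state explicitly: speedup is the serial wall-clock time divided by the parallel wall-clock time, and efficiency is speedup divided by the number of processors. The function names are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Speedup: serial wall-clock time over parallel wall-clock time."""
    return t_serial / t_parallel


def efficiency(speedup_value, n_processors):
    """Parallel efficiency: fraction of the ideal linear speedup achieved."""
    return speedup_value / n_processors


if __name__ == "__main__":
    # Rounded speedups quoted in the abstract.
    print(efficiency(10, 16))  # T21 on 16 processors
    print(efficiency(9, 32))   # T42 on 32 processors
```

With the rounded speedups, 10/16 = 0.625 and 9/32 ≈ 0.281, close to the quoted 63% and 29%. The small gaps are consistent with the abstract giving speedups only approximately ("about 10", "about 9").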