The recently discovered '***' worm has drastically changed the perception that systems managing critical infrastructure are invulnerable to software security attacks. Here we present an architecture that enhan...
Modern supercomputers consist of clusters of thousands of independent nodes interconnected through fast networks. These nodes run independent operating system kernels, so synchronization among them is left to user-mode programs, which makes temporal synchronization of the nodes a daunting task. On the other hand, HPC cluster applications often require rather strict temporal synchronization for activities such as performance analysis, application debugging, and data checkpointing. The performance of an HPC parallel application may therefore be severely impaired by the lack of temporal synchronization among the activities of the nodes of the cluster, and this poses a severe limit on the scalability of such architectures. In this paper we introduce CAOS, an extension of the Linux kernel that addresses the temporal synchronization problems of modern HPC clusters. We describe the general ideas behind CAOS and discuss some details of a possible implementation. We also illustrate experiments performed on a prototype implementation of CAOS that includes a centralized network time tick, which allows a master node to synchronize the activities of all other nodes in the cluster, and a task scheduler tailored for HPC applications. These experiments, performed on a modern HPC cluster, show that this new component has no measurable impact on the efficiency of the nodes while reducing OS noise and making performance more predictable. An implementation of CAOS based on this component can achieve a significant gain in terms of synchronization, global control, and scalability of the cluster.
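To make the centralized network time tick concrete, the following is a minimal user-space sketch of the idea: a master node broadcasts a monotonically increasing tick, and every worker node blocks until the tick arrives before running its next scheduling slot. The actual CAOS component is implemented inside the Linux kernel, so the UDP transport, port number, packet layout, and 1 ms period below are illustrative assumptions, not the paper's protocol.

#define _DEFAULT_SOURCE         /* for htobe64/be64toh on glibc */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <endian.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define TICK_PORT      5555     /* hypothetical broadcast port */
#define TICK_PERIOD_US 1000     /* hypothetical 1 ms tick period */

/* Master node: broadcast a monotonically increasing tick number. */
static void run_master(const char *bcast_addr)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    int on = 1;
    setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on));

    struct sockaddr_in dst = {0};
    dst.sin_family = AF_INET;
    dst.sin_port = htons(TICK_PORT);
    inet_pton(AF_INET, bcast_addr, &dst.sin_addr);

    for (uint64_t tick = 0; ; tick++) {
        uint64_t net_tick = htobe64(tick);
        sendto(s, &net_tick, sizeof(net_tick), 0,
               (struct sockaddr *)&dst, sizeof(dst));
        usleep(TICK_PERIOD_US);
    }
}

/* Worker node: block until the next tick, then run one slot. */
static void run_worker(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(TICK_PORT);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(s, (struct sockaddr *)&addr, sizeof(addr));

    uint64_t net_tick;
    while (recv(s, &net_tick, sizeof(net_tick), 0) ==
           (ssize_t)sizeof(net_tick)) {
        uint64_t tick = be64toh(net_tick);
        /* All workers wake on the same tick edge, so application
         * work and OS housekeeping can be aligned cluster-wide,
         * reducing the OS noise the abstract refers to. */
        printf("tick %llu: run HPC slot\n", (unsigned long long)tick);
    }
}

int main(int argc, char **argv)
{
    if (argc > 1 && strcmp(argv[1], "master") == 0)
        run_master(argc > 2 ? argv[2] : "255.255.255.255");
    else
        run_worker();
    return 0;
}

The point of the tick-driven design is that the scheduling decision is made once, at the master, and merely followed everywhere else: workers never need to agree among themselves, which is what allows this approach to scale to thousands of nodes without cross-node coordination traffic.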
The next phase of LHC operations, the High Luminosity LHC (HL-LHC), which is aimed at a ten-fold increase in the luminosity of proton-proton collisions at the energy of 14 TeV, is expected to begin operation in 2027-2028 and will deliver an unprecedented scientific data volume on the multi-exabyte scale. This amount of data has to be stored, and the corresponding storage system must ensure fast and reliable data delivery for processing by scientific groups distributed all over the world. The present LHC computing and data processing model will not be able to provide the required infrastructure growth, even taking into account the expected evolution of hardware technology. To address this challenge, new state-of-the-art computing infrastructure technologies are being developed and are presented here. The possibility of applying the HL-LHC distributed data handling techniques to other particle and astroparticle physics experiments dealing with large-scale data volumes, such as DUNE, LSST, Belle-II, JUNO, and SKAO, is also discussed.