ISBN: (print) 9798331505356; 9798331505349
The Java Vector API (JVA) is a novel feature of the Java Virtual Machine (JVM) that lets developers express vector computations, which are automatically translated into vector hardware instructions at runtime. This paper focuses on the vectorization capability of the API, which has not yet been studied in the literature. We investigate how effective the JVA is at automatically vectorizing scalar instructions, comparing it with the auto-vectorization capability of the HotSpot C2 compiler. Our results show that, for the same work, using the JVA executes far fewer instructions than relying on C2 auto-vectorization (on average, 79.62% fewer on processors supporting AVX-512).
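Code written with the Java Vector API usually has a fixed shape: a main loop that processes one full vector per iteration up to a bound obtained from `VectorSpecies.loopBound`, followed by a scalar tail. C2's auto-vectorizer, by contrast, has to discover equivalent parallelism in a plain scalar loop. The sketch below contrasts the two shapes. It emulates the Vector API structure with ordinary Java loops so that it compiles without the incubator module `jdk.incubator.vector`. It is not code from the paper. `LANES = 16` (sixteen 32-bit ints per 512-bit AVX-512 register) and the names `StripMined`, `addScalar`, and `addStripMined` are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.Random;

class StripMined {
    // Assumption: 16 int lanes, i.e. one 512-bit (AVX-512) register of 32-bit ints.
    static final int LANES = 16;

    // Plain scalar loop: the kind of counted loop C2's auto-vectorizer targets.
    static void addScalar(int[] a, int[] b, int[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    // Same work, structured the way Vector API code usually is: a main loop
    // over whole vectors up to a lane-multiple bound, then a scalar tail.
    static void addStripMined(int[] a, int[] b, int[] c) {
        int bound = c.length - c.length % LANES; // role of VectorSpecies.loopBound(length)
        int i = 0;
        for (; i < bound; i += LANES) {
            // With the Vector API, this inner loop would be a single
            // IntVector.fromArray / add / intoArray sequence.
            for (int j = 0; j < LANES; j++) {
                c[i + j] = a[i + j] + b[i + j];
            }
        }
        for (; i < c.length; i++) { // scalar tail for the leftover elements
            c[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        for (int n = 0; n <= 100; n++) {
            int[] a = rnd.ints(n).toArray();
            int[] b = rnd.ints(n).toArray();
            int[] expected = new int[n];
            int[] actual = new int[n];
            addScalar(a, b, expected);
            addStripMined(a, b, actual);
            if (!Arrays.equals(expected, actual)) {
                throw new AssertionError("mismatch at n=" + n);
            }
        }
        System.out.println("strip-mined loop matches scalar loop");
    }
}
```

The inner `j` loop stands in for one vector operation. Whether the JIT actually emits SIMD instructions for either method depends on the JVM and the hardware.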
ISBN: (print) 9798400704444
Several methods of the Java Class Library (JCL) rely on vectorized intrinsics. While these intrinsics undoubtedly improve performance, implementing them is extremely challenging, tedious, and error-prone, and it significantly increases the effort needed to understand and maintain the code. Moreover, their implementation is platform-dependent. An unexplored and easier-to-implement alternative is to replace vectorized intrinsics with portable Java code that uses the Java Vector API. However, this is attractive only if the Java code achieves steady-state performance similar to that of the intrinsics. This paper shows that this is the case. We focus on the hashCode and equals computations for byte arrays. We replace the platform-dependent vectorized intrinsics with pure-Java code employing the Java Vector API, obtaining similar steady-state performance. We show that our Java implementations are easy to fine-tune by exploiting characteristics of the input (i.e., the array length), whereas such tuning would be much more difficult and cumbersome in a vectorized intrinsic. Additionally, we propose a new vectorized hashCode computation for long arrays, for which no corresponding intrinsic currently exists. We evaluate the tuned implementations on four popular benchmark suites, showing that their performance is in line with that of the original OpenJDK 21 with intrinsics. Finally, we describe a general approach to integrating code that uses the Java Vector API into the core classes of the JCL. This is challenging because premature use of the Java Vector API would crash the JVM during its fragile initialization phase. Developers can adopt our approach to modify JCL classes without any changes to the native codebase.
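The byte-array hashCode discussed above is the polynomial h = 31 * h + a[i], starting from h = 1, as specified by `Arrays.hashCode(byte[])`. One common way to vectorize it is to split the polynomial across lanes. Each lane accumulates every LANES-th element, multiplying its running value by 31^LANES at each step. A final reduction then weights lane j by 31^(LANES - 1 - j). Because int arithmetic wraps modulo 2^32, the result is bit-for-bit identical to the scalar loop.

The sketch below emulates the lanes with a plain `int[]` so that it compiles without `jdk.incubator.vector`. It illustrates the lane decomposition and is not the paper's implementation. The following are assumptions:

- `LANES = 8` (eight 32-bit ints per 256-bit register).
- The length threshold `SCALAR_THRESHOLD = 32`, which stands in for the kind of input-length tuning the abstract mentions.
- The names `LaneHash`, `laneHash`, and `hash`.

The `long[]` overload reuses the per-element hash that `Arrays.hashCode(long[])` applies.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.IntUnaryOperator;

class LaneHash {
    // Assumption: 8 int lanes, as in one 256-bit register of 32-bit ints.
    static final int LANES = 8;
    // Hypothetical tuning knob: below this length the plain scalar loop is used.
    static final int SCALAR_THRESHOLD = 32;

    // 31^n in int arithmetic; overflow wraps mod 2^32, exactly like the scalar hash.
    static int pow31(int n) {
        int p = 1;
        for (int i = 0; i < n; i++) {
            p *= 31;
        }
        return p;
    }

    static final int P31_LANES = pow31(LANES);

    // Computes h = 31 * h + elem(i) over i in [0, length), starting from h = 1,
    // using LANES independent accumulators followed by a scalar tail.
    static int laneHash(int length, IntUnaryOperator elem) {
        if (length < SCALAR_THRESHOLD) { // length-based dispatch
            int h = 1;
            for (int i = 0; i < length; i++) {
                h = 31 * h + elem.applyAsInt(i);
            }
            return h;
        }
        int[] acc = new int[LANES];
        int init = 1; // ends up as 31^bound, the weight of the initial h = 1
        int bound = length - length % LANES;
        for (int i = 0; i < bound; i += LANES) {
            for (int j = 0; j < LANES; j++) { // lane-wise multiply-add
                acc[j] = acc[j] * P31_LANES + elem.applyAsInt(i + j);
            }
            init *= P31_LANES;
        }
        // Reduce the lanes: lane j still needs a factor of 31^(LANES - 1 - j).
        int h = init;
        for (int j = 0; j < LANES; j++) {
            h += acc[j] * pow31(LANES - 1 - j);
        }
        for (int i = bound; i < length; i++) { // scalar tail
            h = 31 * h + elem.applyAsInt(i);
        }
        return h;
    }

    static int hash(byte[] a) {
        return a == null ? 0 : laneHash(a.length, i -> a[i]);
    }

    // Element hash as in Arrays.hashCode(long[]): Long.hashCode(v) == (int) (v ^ (v >>> 32)).
    static int hash(long[] a) {
        return a == null ? 0 : laneHash(a.length, i -> Long.hashCode(a[i]));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int n = 0; n < 200; n++) {
            byte[] b = new byte[n];
            rnd.nextBytes(b);
            long[] l = rnd.longs(n).toArray();
            if (hash(b) != Arrays.hashCode(b) || hash(l) != Arrays.hashCode(l)) {
                throw new AssertionError("mismatch at n=" + n);
            }
        }
        System.out.println("lane-wise hashes match Arrays.hashCode");
    }
}
```

Running `main` checks lengths 0 to 199, covering both sides of the threshold, against `Arrays.hashCode`. The threshold value here is arbitrary. Choosing it per input length is exactly the kind of tuning that is straightforward in Java code and awkward inside a hand-written intrinsic.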