RuQing Xu - BLIS & TBLIS on SVE and Apple AMX - AHUG SC21
Author: ARM HPC User Group
Uploaded: 2021-11-09
Views: 701
Description:
Abstract:
The portable open-source BLAS implementation BLIS was ported to SVE and to Apple's AMX2. On the A64FX, a 512-bit SVE chip manufactured by Fujitsu and used in the supercomputer currently ranked first in the world, this work outperforms the vendor BLAS in most of the level-3 test cases. BLIS for AMX2, on the other hand, is an experimental effort to push GEMM performance to the limit on Apple's M1 and A13 through A15 processors. It uses a hidden coprocessor in these chips to reach over 1.3 TFLOPS in single precision and over 2.5 TFLOPS in half precision. This work may also provide ideas for porting BLIS to Apple's on-chip GPU.
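For context, BLIS is designed so that most of the architecture-specific work for level-3 performance lives in a small GEMM micro-kernel: a routine that updates an MR x NR block of C with the product of packed panels of A and B, one rank-1 update per step of k. The pure-Python sketch below only illustrates that structure. It is not the SVE or AMX2 code from this work, and the names `ref_gemm_ukernel`, `gemm`, `MR`, and `NR` are hypothetical. It also omits the cache-blocking loops and edge-case handling that real BLIS performs.

```python
# Hypothetical register-block sizes; a real port picks MR x NR so the
# accumulators fit the target's vector registers (SVE) or tiles (AMX).
MR, NR = 4, 4


def ref_gemm_ukernel(k, a_panel, b_panel, c, i0, j0):
    """C[i0:i0+MR, j0:j0+NR] += A_panel @ B_panel, one rank-1 update per p.

    a_panel: packed list of length k*MR; entry p*MR + i holds A[i0+i][p].
    b_panel: packed list of length k*NR; entry p*NR + j holds B[p][j0+j].
    c: row-major list of lists, updated in place.
    """
    acc = [[0.0] * NR for _ in range(MR)]  # stands in for register accumulators
    for p in range(k):
        a_col = a_panel[p * MR:(p + 1) * MR]
        b_row = b_panel[p * NR:(p + 1) * NR]
        for i in range(MR):  # an optimized kernel vectorizes this rank-1 update
            for j in range(NR):
                acc[i][j] += a_col[i] * b_row[j]
    for i in range(MR):  # write the accumulated block back to C once
        for j in range(NR):
            c[i0 + i][j0 + j] += acc[i][j]


def gemm(a, b, c):
    """C += A @ B, assuming m is a multiple of MR and n a multiple of NR."""
    m, k, n = len(a), len(b), len(b[0])
    for j0 in range(0, n, NR):
        # Pack one k x NR panel of B so the kernel reads it contiguously.
        b_panel = [b[p][j0 + j] for p in range(k) for j in range(NR)]
        for i0 in range(0, m, MR):
            # Pack one MR x k panel of A column by column.
            a_panel = [a[i0 + i][p] for p in range(k) for i in range(MR)]
            ref_gemm_ukernel(k, a_panel, b_panel, c, i0, j0)
```

A port to a new architecture would keep the packing and loop structure and replace only the body of the micro-kernel with vector or coprocessor instructions.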
Bio:
RuQing (G) Xu is a third-year graduate student in physics at the University of Tokyo. He works primarily on computational science in the context of solid-state physics, with a special focus on variational wavefunction optimization and tensor network methods.