This repository was archived by the owner on Aug 30, 2024. It is now read-only.
* save
* save(some error with kslicing)
* fix kslicing bug
* save(g128 MTL 270Gflops bug on g32)
* add UT for gemv
* add Specialized for FPU
* support int scale col_major(with opt 10% perf when g = 32)
* support int4x8 for int32 weight
* save(perf bug with int4x8 load)
* save
* add first token UT
* opt mma code
* opt perf for int4x8
* support load one fp16 data
* support zero_pt
* support ASYM and SYM
* save
* ut improve
* support sg_n > 1
* add #pragma unroll
* support HF zero pt layout K x N, compress int4 along the N dimension
* save
* sg_m = 4 for first token
* Extract dequant func
* update row_major for origin PVC/ARC template
* save(fix HPC 2D load)
* fix XEHPC 2D load
* fix compile for all UT
* sync ipex 20240618
* opt PVC arch
* fix group_qkv
* fix group_qkv
* fix sdp bug
* channel num ->1 8 16 32
* remove comments of unused code
* add -ftemplate-backtrace-limit=0 only on UNIX
---------
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
Co-authored-by: Ding, Yi1 <yi1.ding@intel.com>
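
For readers unfamiliar with the "int4x8 for int32 weight", "zero_pt" and "ASYM and SYM" entries above, here is a minimal host-side sketch of what such a format typically involves: eight 4-bit values packed into one 32-bit word, then dequantized with a per-group scale and, in the asymmetric case, a zero point. This is not the kernel code from this commit. The helper names, the lowest-nibble-first ordering, and the fixed offset of 8 for the symmetric case are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: extract the i-th 4-bit value from a 32-bit word
// holding eight packed int4 values. Assumes index 0 is the lowest nibble.
inline uint8_t unpack_int4(uint32_t packed, int i) {
    assert(i >= 0 && i < 8);
    return static_cast<uint8_t>((packed >> (4 * i)) & 0xFu);
}

// Symmetric (SYM) dequantization sketch. Assumes the stored nibble is
// unsigned with an implicit offset of 8, so q in [0, 15] maps to [-8, 7].
inline float dequant_sym(uint8_t q, float scale) {
    return static_cast<float>(static_cast<int>(q) - 8) * scale;
}

// Asymmetric (ASYM) dequantization sketch: subtract an explicit
// per-group zero point before applying the scale.
inline float dequant_asym(uint8_t q, uint8_t zero_pt, float scale) {
    return static_cast<float>(static_cast<int>(q) - static_cast<int>(zero_pt)) * scale;
}
```

For example, under these assumptions `unpack_int4(0x76543210u, 3)` yields the nibble `3`, and `dequant_asym(3, 5, 2.0f)` yields `-4.0f`. A real GPU kernel would unpack all eight values per load and vectorize the arithmetic rather than handle one nibble at a time.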