Summary
Many LLM models take tensors of 64-bit integers as input/output (GPT-2, etc.). If one uses them from ONNX Runtime or CUDA, both natively support long integers. oneDNN doesn't, even in the most basic ways: creating a memory descriptor or reordering data.
Problem statement
Users of oneDNN (usually higher-level libraries) typically rely on DNNL as a base for building tensors on CPUs. Since oneDNN doesn't support 64-bit integers, such implementations can't ingest data coming from other sources that do work with them, such as cuDNN (supports long integers) or ONNX Runtime (also supports long tensors).
Preferred solution
The solution doesn't even need to cover most of oneDNN's routines, since that would probably be complicated to implement. Here's what I think is not very complicated, and yet would increase the usefulness of oneDNN:
- Support 64-bit integers as a data type in dnnl_memory_desc
- Support the reorder primitive, and other simple-ish primitives, even with sub-optimal performance
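As a sketch, the two points above might look like the following. This is hypothetical code, not something that compiles today: `memory::data_type::s64` does not exist in oneDNN, which is exactly the point of this request; `eng` and `strm` are assumed to be an already-created engine and stream.

```cpp
// Hypothetical: s64 is the *requested* data type, not an existing one.
using namespace dnnl;

memory::desc src_md({batch, seq_len}, memory::data_type::s64,  // requested
        memory::format_tag::ab);
memory::desc dst_md({batch, seq_len}, memory::data_type::s32,  // exists today
        memory::format_tag::ab);

memory src_mem(src_md, eng, int64_buffer);  // e.g. token IDs from ONNX Runtime
memory dst_mem(dst_md, eng);

// A reorder from s64 to s32 is one of the "simple-ish" primitives requested.
reorder(src_mem, dst_mem).execute(strm, src_mem, dst_mem);
```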
Nobody is going to use 64-bit integers as their first choice, but sometimes the data simply arrives in that format. This would at least enable oneDNN to recognize such data and easily convert it into a more optimal data type (int32, float, or whatever).