Hey,
I just discovered this repo. Cool that you're trying to get the maximum out of M-series hardware!
Given the open items in the README, could you say a bit about whether inference speed for the newly released model below is already close to the maximum that is possible?
https://huggingface.co/microsoft/bitnet-b1.58-2B-4T
Also, are you still working on this?
Thanks!