Alibaba's LLM Qwen2-0.5B for VOXL 2
-
Hi,

Has anyone tried Qwen2-0.5B on the VOXL 2? Here are some requirements and compatibility notes:
RAM Requirements for Qwen2-0.5B
- Base model (FP16): ~1GB of RAM
- Quantized to INT8: ~500MB of RAM
- Quantized to INT4: ~250MB of RAM

VOXL 2 Specifications
- Approximately 4GB of RAM available
- Qualcomm QRB5165 processor
- AI acceleration via the Hexagon DSP
- Primarily designed for computer vision tasks

Compatibility Assessment
Qwen2-0.5B could potentially run on the VOXL 2: with INT4 or INT8 quantization, the memory footprint would be manageable. Inference speed would likely be slow, though, perhaps 1-2 seconds per token.
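To make the numbers above concrete, here is a back-of-the-envelope sketch. The 0.5B parameter count and the 1-2 s/token rate are the rough estimates from this post, not measured figures, and the calculation covers weight storage only (KV cache and activations would add more):

```python
# Rough weight-memory and latency arithmetic for Qwen2-0.5B.
# Parameter count and tokens/s are the estimates quoted above, not measurements.

PARAMS = 0.5e9  # approximate parameter count of Qwen2-0.5B

def weight_memory_gb(bytes_per_param):
    """Weight storage only -- excludes KV cache and activation memory."""
    return PARAMS * bytes_per_param / 1e9

fp16 = weight_memory_gb(2.0)   # FP16: 2 bytes per parameter
int8 = weight_memory_gb(1.0)   # INT8: 1 byte per parameter
int4 = weight_memory_gb(0.5)   # INT4: half a byte per parameter

# At an assumed 1-2 s/token, a 100-token reply takes 100-200 seconds.
reply_tokens = 100
latency_range_s = (reply_tokens * 1.0, reply_tokens * 2.0)

print(f"FP16: {fp16:.2f} GB, INT8: {int8:.2f} GB, INT4: {int4:.2f} GB")
print(f"{reply_tokens}-token reply at 1-2 s/token: "
      f"{latency_range_s[0]:.0f}-{latency_range_s[1]:.0f} s")
```

Even the FP16 footprint fits in the VOXL 2's ~4GB of RAM on paper, but the latency estimate suggests interactive use would need quantization plus hardware acceleration.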
Question:
Has ModalAI explored using TensorFlow Lite with the Hexagon delegate to run compact language models like Qwen2-0.5B on the VOXL 2 platform? I'm interested in whether you've experimented with leveraging the Qualcomm GPU for language model inference, even though I understand the VOXL 2 is primarily optimized for computer vision workloads. Have you conducted any experiments or performance testing with small LLMs on this hardware?
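For context, the kind of setup I have in mind is roughly the following. This is a hypothetical sketch, assuming a Qwen2-0.5B model already converted to a `.tflite` file and a Hexagon delegate shared library (e.g. `libhexagon_delegate.so`) from the Qualcomm SDK; I haven't verified either on the VOXL 2:

```python
def load_interpreter(model_path, delegate_path=None):
    """Build a TFLite interpreter, optionally with the Hexagon delegate.

    model_path:    path to a model converted to .tflite (assumed to exist)
    delegate_path: e.g. "libhexagon_delegate.so" from the Qualcomm SDK
    Falls back to CPU execution if the delegate cannot be loaded.
    """
    import tensorflow as tf  # imported lazily; TF must be installed on-target

    delegates = []
    if delegate_path is not None:
        try:
            delegates.append(tf.lite.experimental.load_delegate(delegate_path))
        except (ValueError, OSError):
            print("Hexagon delegate unavailable; falling back to CPU")

    interpreter = tf.lite.Interpreter(
        model_path=model_path,
        experimental_delegates=delegates,
    )
    interpreter.allocate_tensors()
    return interpreter
```

I'm mainly asking whether something along these lines has been tried, and whether the delegate actually accelerates transformer-style workloads on this hardware.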
Thanks.
suvasis