Hi,
Has anyone tried Qwen2-0.5B for the VOXL 2?
Here are some requirements and compatibility notes:
RAM Requirements for Qwen2-0.5B
- Base model (FP16): ~1 GB of RAM
- Quantized to INT8: ~500 MB of RAM
- Quantized to INT4: ~250 MB of RAM
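For anyone checking my numbers: the footprints above follow roughly from parameter count × bytes per weight. A quick sketch (weights only; the runtime, activations, and KV cache add overhead on top, and 0.5B is a round approximation of the actual parameter count):

```python
# Rough weight-only memory estimate for Qwen2-0.5B at different precisions.
# 0.5e9 is an approximation of the parameter count; runtime overhead,
# activations, and the KV cache are not included.
PARAMS = 0.5e9

def weight_memory_gb(bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB, at the given precision."""
    return PARAMS * bits_per_weight / 8 / 1e9

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: ~{weight_memory_gb(bits):.2f} GB")
# FP16: ~1.00 GB, INT8: ~0.50 GB, INT4: ~0.25 GB
```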
VOXL 2 Specifications
- Approximately 4 GB of RAM available
- Qualcomm QRB5165 processor
- AI acceleration capabilities via the Hexagon DSP
- Primarily designed for computer vision tasks
Compatibility Assessment
- Qwen2-0.5B could potentially run on the VOXL 2 with INT4 or INT8 quantization, since the memory footprint would be manageable.
- Inference speed would likely be slow, perhaps 1-2 seconds per token.
Question:
Has ModalAI explored using TensorFlow Lite with the Hexagon delegate to run compact language models like Qwen2-0.5B on the VOXL 2 platform? I'm interested in whether you've experimented with leveraging the Qualcomm GPU for language model inference, even though I understand the VOXL 2 is primarily optimized for computer vision workloads. Have you conducted any experiments or performance testing with small LLMs on this hardware?
Thanks.
suvasis