Alibaba's LLM Qwen2-0.5B for VOXL 2
-
Hi,

Has anyone tried Qwen2-0.5B on the VOXL 2? Here are some requirements and compatibility notes:
RAM Requirements for Qwen2-0.5B
- Base model (FP16): ~1GB of RAM
- Quantized to INT8: ~500MB of RAM
- Quantized to INT4: ~250MB of RAM

VOXL 2 Specifications
- Approximately 4GB of RAM available
- Qualcomm QRB5165 processor
- AI acceleration via the Hexagon DSP
- Primarily designed for computer vision tasks

Compatibility Assessment
Qwen2-0.5B could potentially run on the VOXL 2: with INT4 or INT8 quantization, the memory footprint would be manageable. Inference speed would likely be slow, though, perhaps 1-2 seconds per token.
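To make the numbers above concrete, here is a back-of-the-envelope sketch. The 0.5B parameter count and the 1-2 s/token rate are the rough estimates from this post, not measured figures, and the calculation covers weight storage only (KV cache and activations would add more):

```python
# Rough weight-memory and latency arithmetic for Qwen2-0.5B.
# Parameter count and tokens/s are the estimates quoted above, not measurements.

PARAMS = 0.5e9  # approximate parameter count of Qwen2-0.5B

def weight_memory_gb(bytes_per_param):
    """Weight storage only -- excludes KV cache and activation memory."""
    return PARAMS * bytes_per_param / 1e9

fp16 = weight_memory_gb(2.0)   # FP16: 2 bytes per parameter
int8 = weight_memory_gb(1.0)   # INT8: 1 byte per parameter
int4 = weight_memory_gb(0.5)   # INT4: half a byte per parameter

# At an assumed 1-2 s/token, a 100-token reply takes 100-200 seconds.
reply_tokens = 100
latency_range_s = (reply_tokens * 1.0, reply_tokens * 2.0)

print(f"FP16: {fp16:.2f} GB, INT8: {int8:.2f} GB, INT4: {int4:.2f} GB")
print(f"{reply_tokens}-token reply at 1-2 s/token: "
      f"{latency_range_s[0]:.0f}-{latency_range_s[1]:.0f} s")
```

Even the FP16 footprint fits in the VOXL 2's ~4GB of RAM on paper, but the latency estimate suggests interactive use would need quantization plus hardware acceleration.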
Question:
Has ModalAI explored using TensorFlow Lite with the Hexagon delegate to run compact language models like Qwen2-0.5B on the VOXL 2 platform? I'm interested in whether you've experimented with leveraging the Qualcomm GPU for language model inference, even though I understand the VOXL 2 is primarily optimized for computer vision workloads. Have you conducted any experiments or performance testing with small LLMs on this hardware?
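For context, the kind of setup I have in mind is roughly the following. This is a hypothetical sketch, assuming a Qwen2-0.5B model already converted to a `.tflite` file and a Hexagon delegate shared library (e.g. `libhexagon_delegate.so`) from the Qualcomm SDK; I haven't verified either on the VOXL 2:

```python
def load_interpreter(model_path, delegate_path=None):
    """Build a TFLite interpreter, optionally with the Hexagon delegate.

    model_path:    path to a model converted to .tflite (assumed to exist)
    delegate_path: e.g. "libhexagon_delegate.so" from the Qualcomm SDK
    Falls back to CPU execution if the delegate cannot be loaded.
    """
    import tensorflow as tf  # imported lazily; TF must be installed on-target

    delegates = []
    if delegate_path is not None:
        try:
            delegates.append(tf.lite.experimental.load_delegate(delegate_path))
        except (ValueError, OSError):
            print("Hexagon delegate unavailable; falling back to CPU")

    interpreter = tf.lite.Interpreter(
        model_path=model_path,
        experimental_delegates=delegates,
    )
    interpreter.allocate_tensors()
    return interpreter
```

I'm mainly asking whether something along these lines has been tried, and whether the delegate actually accelerates transformer-style workloads on this hardware.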
Thanks.
suvasis