Alibaba's LLM Qwen2-0.5B for VOXL 2
Hi,
Has anyone tried running Qwen2-0.5B on the VOXL 2? Here are some requirements and compatibility notes:
- RAM Requirements for Qwen2-0.5B
Base model (FP16): ~1GB of RAM
Quantized to INT8: ~500MB of RAM
Quantized to INT4: ~250MB of RAM
- VOXL 2 Specifications
The VOXL 2 has approximately 4GB of RAM available
It uses a Qualcomm QRB5165 processor
Has AI acceleration capabilities via the Hexagon DSP
Primarily designed for computer vision tasks
- Compatibility Assessment
Qwen2-0.5B could potentially run on the VOXL 2: with quantization to INT4 or INT8, the memory footprint would be manageable. Inference speed would likely be slow, perhaps 1-2 seconds per token.
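For anyone checking the numbers, the RAM figures above are just parameter count times bits per weight. Here is a rough sketch of that arithmetic, plus what 1-2 seconds per token would mean for a whole reply. It assumes a nominal 0.5 billion parameters (taken from the model name) and counts weights only, so the KV cache, activations, and runtime overhead would come on top. The 50-token reply length and the function names are only illustrative.

```python
def weight_memory_mb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights only, in decimal megabytes."""
    return num_params * bits_per_weight / 8 / 1e6


def reply_time_s(num_tokens: int, seconds_per_token: float) -> float:
    """Time to generate a reply at a fixed per-token latency."""
    return num_tokens * seconds_per_token


# Assumption: nominal parameter count implied by the name "Qwen2-0.5B".
NUM_PARAMS = 0.5e9

# Weights-only estimates for the three precisions listed in the post.
estimates = {
    name: weight_memory_mb(NUM_PARAMS, bits)
    for name, bits in (("FP16", 16), ("INT8", 8), ("INT4", 4))
}
for name, mb in estimates.items():
    print(f"{name}: ~{mb:.0f} MB")

# Hypothetical 50-token reply at the guessed 1-2 seconds per token.
REPLY_TOKENS = 50
fastest = reply_time_s(REPLY_TOKENS, 1.0)
slowest = reply_time_s(REPLY_TOKENS, 2.0)
print(f"{REPLY_TOKENS}-token reply: {fastest:.0f}-{slowest:.0f} s")
```

So memory should not be the blocker on a board with about 4GB available, but at the guessed speed even a short reply would take on the order of a minute or more.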
Question:
Has ModalAI explored using TensorFlow Lite with the Hexagon delegate to run compact language models like Qwen2-0.5B on the VOXL 2? I'm also interested in whether you've experimented with using the Qualcomm GPU, rather than the Hexagon DSP, for language model inference, even though I understand the VOXL 2 is primarily optimized for computer vision workloads. Have you done any experiments or performance testing with small LLMs on this hardware?
Thanks.
suvasis