Deploying Neural Networks with Multi-Modal Inputs (Camera + Additional Sensor Data)
-
Hi ModalAI team. I’ve been successfully using voxl-tflite-server to run custom neural networks that take image input from the mpa/hires camera. Now I’d like to expand to more general neural networks that also incorporate other forms of input, for example prior waypoint information stored in memory, or additional sensor data (e.g., IMU readings, rangefinders, or temporal data from previous frames).
What would be the recommended roadmap to support this?
Specifically:
- How should I modify or extend the current model helper pipeline to feed both image and non-image inputs?
- Is there support in voxl-tflite-server for multiple input tensors (e.g., combining a camera frame with other forms of input)? A rough sketch of what I have in mind follows this list.
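For context, here is roughly how I would expect multi-input inference to look if I were calling the stock TFLite C++ interpreter directly. The model path, input ordering, and tensor shapes below are hypothetical placeholders, and I don’t know how (or whether) voxl-tflite-server’s model helper layer wraps this today:

```cpp
// Sketch only: multi-input inference with the plain TFLite C++ API.
// "/data/multi_input_model.tflite", the input order, and the shapes
// (1x224x224x3 image, 1x16 aux vector) are placeholders, not anything
// voxl-tflite-server actually ships.
#include <cstdio>
#include <cstring>
#include <memory>
#include <vector>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
    auto model = tflite::FlatBufferModel::BuildFromFile(
        "/data/multi_input_model.tflite");
    if (!model) return -1;

    tflite::ops::builtin::BuiltinOpResolver resolver;
    std::unique_ptr<tflite::Interpreter> interpreter;
    tflite::InterpreterBuilder(*model, resolver)(&interpreter);
    if (!interpreter || interpreter->AllocateTensors() != kTfLiteOk) return -1;

    // A .tflite model can declare any number of input tensors;
    // inputs() lists their tensor indices in model order.
    printf("model has %zu inputs\n", interpreter->inputs().size());

    // Input 0: camera frame, e.g. 1x224x224x3 float (placeholder shape).
    float* image_in = interpreter->typed_input_tensor<float>(0);
    // Input 1: non-image data, e.g. a 1x16 vector of waypoint/IMU values.
    float* aux_in = interpreter->typed_input_tensor<float>(1);

    std::vector<float> frame(224 * 224 * 3, 0.0f);  // stand-in for a hires frame
    std::vector<float> aux(16, 0.0f);               // stand-in for sensor data
    std::memcpy(image_in, frame.data(), frame.size() * sizeof(float));
    std::memcpy(aux_in, aux.data(), aux.size() * sizeof(float));

    if (interpreter->Invoke() != kTfLiteOk) return -1;
    float* out = interpreter->typed_output_tensor<float>(0);
    printf("out[0] = %f\n", out[0]);
    return 0;
}
```

If the raw interpreter handles this, my real question is whether the model helper layer in voxl-tflite-server assumes a single image input, or whether there is a supported path to attach additional tensors (e.g., populated from other MPA pipes) alongside the camera frame.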