Research Scientist, Multimodal

Research · Bangalore, India · Full-time

Push the frontier of multimodal video understanding — vision, audio, speech, and text fused into a single model.

What you'll do

Design and run experiments on large-scale multimodal models.
Publish or open-source impactful results where appropriate.
Collaborate with engineering to ship research into production.

What we're looking for

PhD (or equivalent experience) in ML, CV, NLP, or speech.
Track record of strong publications or production-shipped models.
Deep experience with large-scale model training.

Nice to have

Experience with video foundation models or long-context architectures.