Product

Mikshi Search 1.0: A Video Embedding Model for Archive‑Scale Retrieval

Mikshi Search is the embedding model behind the platform. Hand it any video — a short clip, an uploaded recording, or an hour‑long CCTV file — and it produces embeddings that power retrieval across collections of any size.

See our capability overview for how teams ship with it in production.

Mikshi Research · 2026

Introducing Mikshi Search 1.0

One model, one embedding space. The same network encodes your videos at upload time and your queries at search time — text, reference clip, or (soon) a reference image — so retrieval lands on the right seconds, not the right hour.

Query Types

One embedding space. Three ways to ask.

All query types are projected into the same space as the indexed video and matched the same way. You do not interact with the embeddings directly — you upload videos to a collection, you issue queries, you get back ranked moments.
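
To make "same space, matched the same way" concrete, here is a minimal sketch of ranking in a shared embedding space. It assumes cosine similarity as the matching function and uses toy 3‑dimensional vectors; the page does not state which similarity measure Mikshi Search uses, and `rank_moments` and the vectors are illustrative names and values, not part of the product.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_moments(query_vec, moment_vecs, top_k=3):
    """Return (moment_id, score) pairs sorted by descending similarity."""
    scored = [(mid, cosine(query_vec, vec)) for mid, vec in moment_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors standing in for real embeddings (assumption: real ones are far larger).
index = {
    "moment_a": [0.9, 0.1, 0.0],
    "moment_b": [0.1, 0.9, 0.0],
    "moment_c": [0.7, 0.6, 0.1],
}
text_query = [1.0, 0.0, 0.0]  # pretend embedding of a text query
clip_query = [0.0, 1.0, 0.0]  # pretend embedding of a reference clip

# Both query types go through the same ranking function.
top_for_text = rank_moments(text_query, index)
top_for_clip = rank_moments(clip_query, index)
```

The point of the sketch is that the ranking code never needs to know which kind of query produced the vector.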

Available

Natural language

Describe the moment in words. The query is encoded into the same space as the video and matched directly.

"red sedan running a red light"

Roadmap

Reference video clip

Hand it a clip and ask for other moments that look like it. Useful when language can't describe the moment precisely.

"find moments like this 4-second clip"

Roadmap

Reference image

Find moments that look like a single image. On the roadmap; same embedding space, same retrieval path.

"find moments like this still frame"

Mikshi Search pipeline: a video and a natural‑language query are encoded into the same embedding space, returning ranked, time‑stamped moments.

Ranked moments

Not the right video. The right seconds.

Each result is a ranked moment, not a whole file. Operators do not need the right hour inside a day of footage; they need the right seconds. Each result includes:

  • Source video and the collection it belongs to.
  • Start and end timestamp locating the matched moment inside the source video.
  • A score indicating how well the moment matched the query.
  • A thumbnail for the matched moment.
  • Raw embeddings (1024-dimensional), as an opt-in field on the request.

Query: "red sedan running a red light"
Collection: intersection‑north

  • 01 · 00:14:08.2 – 00:14:11.6 · score 0.94
    Red sedan crosses south stop line ~0.4s after signal turns red.

  • 02 · 00:47:22.0 – 00:47:25.8 · score 0.88
    Red sedan, different vehicle; runs the east‑bound red without slowing.

  • 03 · 01:09:33.4 – 01:09:36.1 · score 0.81
    Red SUV (high match for sedan) crosses on yellow→red transition.
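
The result fields listed above could be modelled on the client roughly as follows. This is a sketch under assumptions: the field names, the use of seconds for timestamps, and the `format_ts` helper are hypothetical, since the page describes what a result contains but not the response schema.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RankedMoment:
    """One search result. All field names here are hypothetical."""
    source_video: str
    collection: str
    start_s: float          # offset of the moment's start inside the source video
    end_s: float            # offset of the moment's end inside the source video
    score: float            # how well the moment matched the query
    thumbnail_url: str
    # Raw vectors are only present when requested (opt-in per the text above).
    embeddings: Optional[List[List[float]]] = None

def format_ts(seconds: float) -> str:
    """Render an offset in seconds as HH:MM:SS.t (tenths of a second)."""
    tenths = round(seconds * 10)
    whole, tenth = divmod(tenths, 10)
    minutes, secs = divmod(whole, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{tenth}"

# Values taken from the first sample result; file and thumbnail names are made up.
moment = RankedMoment(
    source_video="example.mp4",
    collection="intersection-north",
    start_s=848.2,
    end_s=851.6,
    score=0.94,
    thumbnail_url="thumb_01.jpg",
)
label = f"{format_ts(moment.start_s)} - {format_ts(moment.end_s)}"
```

Working in tenths as integers avoids a rounding edge case where a value such as 59.96 seconds would otherwise render as "60.0".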

Multi‑Vector

Built for hour‑long footage, end to end.

Mikshi Search generates embeddings for full videos, including long‑form CCTV recordings. You do not pre‑chunk, you do not pre‑segment, you do not configure window sizes. Hand it the video; it produces the embeddings.

Internally, Mikshi Search uses a multi‑vector representation rather than a single pooled vector per video. If a long recording were collapsed to one vector, the best a query could return would be "this video contains your event." Multi‑vector embeddings preserve sub‑minute temporal resolution all the way through retrieval, so query results land on the right seconds.
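
The difference between a pooled vector and a multi‑vector representation can be illustrated with a toy example. The sketch below assumes one vector per fixed 10‑second segment, mean pooling for the single‑vector case, and cosine similarity for scoring; these are illustrative choices only, and the page does not describe how Mikshi Search actually segments, pools, or scores.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mean_pool(vectors):
    """Collapse per-segment vectors into one vector for the whole video."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def best_segment(query_vec, segment_vecs, segment_seconds):
    """Return (start offset in seconds, score) of the best-matching segment."""
    scores = [cosine(query_vec, vec) for vec in segment_vecs]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx * segment_seconds, scores[idx]

# Six toy segments of 10 s each; only the fifth one contains the "event".
segments = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] + [[1.0, 0.0]]
query = [0.0, 1.0]

# Single vector: a weak match and no information about where the event is.
pooled_score = cosine(query, mean_pool(segments))

# Multi-vector: a strong match with a location inside the video.
start_s, segment_score = best_segment(query, segments, segment_seconds=10)
```

In this toy case the pooled score is diluted by the routine segments, while the per‑segment score stays high and carries the offset of the event.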

cam_07.mp4 · 01:47:22

Single‑vector (collapsed) → "this video contains your event"
Multi‑vector (Mikshi Search) → ranked moments, second‑precise timestamps

  • Dimensions per vector: 1024
  • Video length supported: hours
  • Result granularity: seconds

A visual representation of actions and entities.

Mikshi Search is vision‑only today. It looks at frames and at how they evolve over time. It does not transcribe speech or use ambient audio — CCTV deployments cannot rely on audio anyway. The visual + temporal signal carries the load.

Actions

What is happening in the scene — motion patterns, interactions, gestures.

Entities

Who and what is present — people, vehicles, objects.

Spatial relationships

How the entities are arranged and how that arrangement changes over time.

Deployment Shape

One camera, one collection.

The recommended deployment shape is one camera per collection. Each physical camera maintains its own collection, and footage from that camera flows continuously into it. This mapping has three operational consequences.

01

Queries are camera-scoped by default

Asking “red sedan running a light” against the intersection-north collection only searches that camera's footage. No cross-camera bleed unless you explicitly query multiple collections.

02

Cameras are independent units of operation

Adding a camera means creating a new collection. Retiring a camera means archiving its collection. No global re-index, no coordination across cameras.

03

Per-camera scaling

Indexing throughput, storage growth, and query load are isolated per collection. A busy camera does not slow queries against a quiet one.

If you need to search across cameras at a site, issue parallel queries to each camera's collection and merge results client‑side, or maintain a separate site‑level collection that aggregates the cameras you want grouped.
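
The first option, parallel queries merged client‑side, might look like the sketch below. `search_fn` stands in for whatever client call issues a query against one collection; it, the result dictionaries, and the `intersection-south` collection are hypothetical. The merge also assumes scores are comparable across collections, which the page does not confirm.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def search_site(search_fn, collections, query, top_k=5):
    """Query each camera's collection in parallel and merge hits by score.

    search_fn(collection, query) is a hypothetical callable that returns a
    list of dicts, each with at least a "score" key.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(collections))) as pool:
        # pool.map preserves input order, so results line up with collections.
        per_camera = list(pool.map(lambda name: search_fn(name, query), collections))
    merged = [
        {**hit, "collection": name}
        for name, hits in zip(collections, per_camera)
        for hit in hits
    ]
    return heapq.nlargest(top_k, merged, key=lambda hit: hit["score"])

# In-memory stand-in for a real search backend, for demonstration only.
FAKE_INDEX = {
    "intersection-north": [{"start_s": 848.2, "score": 0.94}],
    "intersection-south": [
        {"start_s": 120.0, "score": 0.97},
        {"start_s": 300.5, "score": 0.42},
    ],
}

def fake_search(collection, query):
    return FAKE_INDEX.get(collection, [])

results = search_site(
    fake_search, list(FAKE_INDEX), "red sedan running a red light", top_k=2
)
```

Tagging each hit with its collection keeps the camera identity available after the merge, which a site‑level view needs.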

Time windows

Semantic query + time window + camera.

Every result Mikshi Search returns is timestamped, so you can scope queries to time windows. Time filters apply as part of the query, alongside the natural‑language or reference‑clip query itself — no external scheduling or batch step.

Time‑based search works against archive‑scale footage. "Last 24 hours" queries against a camera that has been recording continuously for months do not scan the entire history; only the requested window is searched.
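
One way a time‑scoped query can avoid scanning months of history is to keep moment start times sorted and binary‑search the window boundaries. The sketch below assumes that layout; the page does not describe how Mikshi Search indexes time internally, and `last` and `window_slice` are hypothetical helper names.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

def last(duration, now):
    """Resolve a relative window such as 'last 24 hours' to (start, end)."""
    return now - duration, now

def window_slice(sorted_starts, window_start, window_end):
    """Return (lo, hi) so that sorted_starts[lo:hi] lies inside the window.

    Uses binary search, so the cost grows with log(archive size)
    rather than with the length of the archive.
    """
    lo = bisect_left(sorted_starts, window_start)
    hi = bisect_right(sorted_starts, window_end)
    return lo, hi

# Toy archive: one indexed moment per hour for 90 days, oldest first.
now = datetime(2026, 1, 15, 12, 0, 0)  # arbitrary reference time
starts = [now - timedelta(hours=h) for h in range(90 * 24, 0, -1)]

window_start, window_end = last(timedelta(hours=24), now)
lo, hi = window_slice(starts, window_start, window_end)
candidates = starts[lo:hi]  # only these moments would be scored against the query
```

Only the slice inside the window is passed on for semantic matching; the rest of the archive is never touched.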

"Anything matching this query in the last 1 hour?"

last 1h

"Find all instances between 08:00 and 12:00 yesterday."

08:00 – 12:00 · -1d

"Search only the last 30 minutes of footage on this camera."

last 30m

Archive Scale

Thousands of camera‑hours. The seconds that matter.

Mikshi Search is designed for the CCTV regime: continuous recordings spanning days or weeks, mostly routine, with the seconds that matter buried deep inside. Hour‑long recordings are first‑class inputs, not edge cases.

A day

of footage from a single camera becomes a searchable space of moments.

Months

of archive remain queryable without per‑query degradation.

ms

query latency against the indexed window, including time‑scoped queries like “last 1 hour.”

Indexing runs in the background as new footage arrives. The unit of consumption is the moment, not the file — that is what makes search useful at archive scale.

Hand it a video. Get back the seconds.

Mikshi Search turns hours of footage into a searchable space of moments — text, clip, or (soon) image queries, ranked and timestamped.