Omni Models Reboot the Whole Game for RAG (GPT-4o)

“Chat with your documents” already seems so 2023.

The new “omni” model can fluently switch among text, audio, still images, and (live) video.

RAG, or Retrieval Augmented Generation (see my LinkedIn Learning course for details), is a technique where you pair a language model with a database to extend its capabilities. This includes the aforementioned “chat with your documents” use case. But with an “omni” model it could be so much more…
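For anyone who hasn't seen the retrieval half of RAG up close, here is a minimal text-only sketch. Everything in it is an illustrative assumption: the `MANUAL` passages are made up, the helper names (`retrieve`, `build_prompt`) are hypothetical, and a toy word-overlap score stands in for the embedding search a real system would likely use. The model call itself is left out; the sketch stops at the prompt you would hand to the model.

```python
import math
import re
from collections import Counter

# Made-up passages standing in for a document store (assumption for illustration).
MANUAL = [
    "Step 1: Attach the side panels to the base using four wooden dowels.",
    "Step 2: Slide the back board into the groove and secure it with nails.",
    "Step 3: Mount the shelf supports and insert the shelves.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, passages, k=1):
    """Return the k passages most similar to the question (toy scoring)."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, passages, k=1):
    """Combine the retrieved passages and the question into one prompt string."""
    context = "\n".join(retrieve(question, passages, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

if __name__ == "__main__":
    print(build_prompt("How do I secure the back board?", MANUAL))
```

The retrieved passage rides along with the question, so the model answers from your documents rather than from memory alone. With an omni model, the "question" could just as well be a camera frame, but that part is beyond this sketch.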

Imagine feeding it an IKEA manual, then pointing a camera at your own assembly process and getting gently walked through the steps needed to complete the job.

There's no need for extensive re-programming to build a new 3D augmented reality platform (although having one could make certain detailed tasks more straightforward). Existing documentation + an omni model is a powerful combo.

More to say later.

This post 100% free-range human written.
