
Embedding

Embeddings turn text into vectors so you can do semantic search and retrieval. In Auwgent, embeddings are usually generated in middleware: you embed the query, search your vector DB, then inject the retrieved knowledge into the runtime context before the next model call.
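To make "semantic search" concrete: retrieval usually ranks stored vectors by how similar they are to the query vector. The sketch below ranks a tiny in-memory list by cosine similarity. It does not use Auwgent; the three-dimensional vectors are invented for illustration (real embedding models return much longer vectors), and cosine similarity is one common choice, not necessarily what your vector DB uses.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface StoredChunk {
  text: string;
  vector: number[];
}

// Return the k chunks most similar to the query vector, best first.
function topK(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Toy 3-dimensional vectors, invented for this example.
const chunks: StoredChunk[] = [
  { text: "Refunds are processed within 5 days.", vector: [0.9, 0.1, 0.0] },
  { text: "Our office is in Berlin.", vector: [0.0, 0.2, 0.9] },
  { text: "Refund requests need an order ID.", vector: [0.8, 0.3, 0.1] },
];

const queryVector = [1.0, 0.2, 0.0]; // pretend embedding of "how do refunds work?"
const best = topK(queryVector, chunks, 2);
console.log(best.map((c) => c.text));
```

Both refund chunks outrank the office chunk because their vectors point in nearly the same direction as the query vector.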


Auwgent lets you configure a dedicated embedding model with the embedding key in the default config block.

agent Main {
  default config {
    model: gemini("gemini-2.5-flash")
    embedding: gemini("gemini-embedding-001")
    prompt: "You are a helpful assistant"
  }
}

You can use the same provider family as your chat model or a different one. What matters is that an embedding model is configured, because the middleware helpers embed and embedBatch depend on it.


For retrieval, prefer onRunStart. Two hooks can host retrieval:

  • onRunStart: read the user's query from the session turns (session.turns).
  • onLLMStart: receives the prompt as its first argument.

In most RAG setups, onRunStart is the best fit because it runs once, avoids repeated vector lookups, and lets setContext affect prompt evaluation for the run.
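The lookup-count difference can be illustrated with a toy simulation. This is not Auwgent's runtime: it models a run as one start event followed by several model calls, assumes onLLMStart fires before every model call (which is what "repeated vector lookups" implies), and uses a hypothetical counter in place of a vector DB query.

```typescript
// Toy model of a run: one "run start" followed by N model calls.
// Not Auwgent's runtime; it only counts how often retrieval would execute.
type Hook = () => void;

interface ToyHooks {
  onRunStart?: Hook;
  onLLMStart?: Hook;
}

function simulateRun(hooks: ToyHooks, modelCalls: number): void {
  hooks.onRunStart?.(); // fires once per run
  for (let i = 0; i < modelCalls; i++) {
    hooks.onLLMStart?.(); // assumed to fire before every model call
  }
}

// Hypothetical stand-in for a vector DB query; it only counts calls.
function makeCounter() {
  let lookups = 0;
  return {
    search: () => {
      lookups++;
    },
    count: () => lookups,
  };
}

const perRun = makeCounter();
simulateRun({ onRunStart: perRun.search }, 3);

const perCall = makeCounter();
simulateRun({ onLLMStart: perCall.search }, 3);

console.log(`onRunStart lookups: ${perRun.count()}`); // 1
console.log(`onLLMStart lookups: ${perCall.count()}`); // 3
```

With three model calls in the run, retrieval placed in onLLMStart queries the vector DB three times for the same question, while onRunStart queries it once.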


Inside middleware, the context object (ctx) provides:

  • embed(text) → returns a single vector
  • embedBatch(texts) → returns vectors for many inputs

Use embedBatch when processing many documents for indexing to reduce round-trips.
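A minimal batched-indexing sketch, assuming an embedBatch-like function that takes an array of strings and resolves to one vector per input, in input order (the ordering guarantee is an assumption; confirm it for your setup). The fakeEmbedBatch stub and the batch size of 2 are hypothetical, chosen so the example runs offline; in middleware you would pass ctx.embedBatch and choose a batch size your provider accepts.

```typescript
type Vector = number[];
type EmbedBatchFn = (texts: string[]) => Promise<Vector[]>;

interface IndexedDoc {
  id: number;
  text: string;
  vector: Vector;
}

// Hypothetical stub standing in for ctx.embedBatch: a fake 2-d "vector"
// per text (length and vowel count) so the example runs without a model.
let embedCalls = 0;
const fakeEmbedBatch: EmbedBatchFn = async (texts) => {
  embedCalls++;
  return texts.map((t) => [t.length, (t.match(/[aeiou]/g) ?? []).length]);
};

// Embed documents in fixed-size batches: one round-trip per batch
// instead of one per document.
async function indexDocuments(
  docs: string[],
  embedBatch: EmbedBatchFn,
  batchSize: number,
): Promise<IndexedDoc[]> {
  const index: IndexedDoc[] = [];
  for (let start = 0; start < docs.length; start += batchSize) {
    const batch = docs.slice(start, start + batchSize);
    const vectors = await embedBatch(batch);
    if (vectors.length !== batch.length) {
      throw new Error("embedBatch returned a different number of vectors");
    }
    batch.forEach((text, i) => {
      index.push({ id: start + i, text, vector: vectors[i] });
    });
  }
  return index;
}

const docs = ["alpha", "beta", "gamma", "delta", "epsilon"];
indexDocuments(docs, fakeEmbedBatch, 2).then((index) => {
  console.log(`${index.length} docs indexed in ${embedCalls} calls`); // 5 docs indexed in 3 calls
});
```

Five documents with a batch size of 2 take three embedding calls instead of five; the length check guards against silently misaligning texts and vectors.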

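const ragMiddleware: Middleware = {
  name: "rag",
  onRunStart: async (session, ctx) => {
    // Use the latest user turn as the retrieval query.
    const query = session.turns.at(-1)?.input?.text ?? ""
    // Skip retrieval (and the embedding call) when there is no query text.
    if (!query) return session
    const queryVector = await ctx.embed(query)
    // `vectorDb` is your own vector store client; it is not provided by Auwgent.
    const hits = await vectorDb.search(queryVector, { topK: 5 })
    const chunks = hits.map((h: any) => h.text)
    // Inject retrieved knowledge into runtime context for prompt use.
    ctx.setContext({ retrieved_chunks: chunks })
    return session
  }
}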
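To try the middleware without a real database, you can back vectorDb with a small in-memory store exposing the same search(vector, { topK }) shape the example calls. This is a hypothetical stand-in, not a real vector DB client: it does brute-force scoring by dot product, which equals cosine similarity only when vectors are normalized. Returning a typed Hit also lets you drop the `any` annotation in the map call.

```typescript
// A minimal in-memory stand-in for `vectorDb`, matching the
// search(vector, { topK }) call in the middleware example.
// Hypothetical: real vector DBs have their own client APIs.
interface Hit {
  text: string;
  score: number;
}

class InMemoryVectorDb {
  private rows: { text: string; vector: number[] }[] = [];

  add(text: string, vector: number[]): void {
    this.rows.push({ text, vector });
  }

  // Brute-force search: score every row by dot product, keep the best topK.
  async search(query: number[], opts: { topK: number }): Promise<Hit[]> {
    return this.rows
      .map((row) => ({
        text: row.text,
        score: row.vector.reduce((sum, v, i) => sum + v * query[i], 0),
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, opts.topK);
  }
}

const vectorDb = new InMemoryVectorDb();
vectorDb.add("Refunds take 5 days.", [1, 0]);
vectorDb.add("We ship worldwide.", [0, 1]);
vectorDb.add("Refunds need an order ID.", [0.8, 0.2]);

vectorDb.search([1, 0], { topK: 2 }).then((hits) => {
  // With typed hits, `h.text` needs no `any` annotation.
  console.log(hits.map((h) => h.text));
});
```

Brute force is fine for tests and small corpora; a production vector DB typically uses an approximate index instead of scoring every row.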

Injecting retrieved DB data: use middleware setContext


When you fetch nearest neighbors from your vector DB, inject them through middleware context, not by mutating prompt strings manually.

Use setContext (set_context in Python) to store retrieved data in your agent context; it is then injected automatically when the prompt is evaluated.

This keeps retrieval structured, testable, and reusable across hooks.
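Because retrieval goes through ctx.embed and ctx.setContext, a hook can be unit-tested with fakes. This sketch rests on assumptions: Turn, Session, and Ctx are hypothetical minimal types that mirror only the fields the example above touches (they are not Auwgent's exported types), makeRagHook is a hypothetical helper that injects the search function, and the fake ctx records setContext calls instead of updating a real runtime.

```typescript
// Hypothetical minimal shapes: only the fields the rag example uses.
interface Turn {
  input?: { text?: string };
}
interface Session {
  turns: Turn[];
}
interface Ctx {
  embed(text: string): Promise<number[]>;
  setContext(values: Record<string, unknown>): void;
}

// The hook under test, written against the shapes above.
// `search` is injected so a test can replace the vector DB.
function makeRagHook(search: (v: number[], topK: number) => Promise<string[]>) {
  return async (session: Session, ctx: Ctx): Promise<Session> => {
    const last = session.turns[session.turns.length - 1];
    const query = last?.input?.text ?? "";
    if (!query) return session; // nothing to retrieve for
    const vector = await ctx.embed(query);
    const chunks = await search(vector, 5);
    ctx.setContext({ retrieved_chunks: chunks });
    return session;
  };
}

// Fake ctx that records every setContext call instead of touching a runtime.
const recorded: Record<string, unknown>[] = [];
const fakeCtx: Ctx = {
  embed: async (text) => [text.length],
  setContext: (values) => {
    recorded.push(values);
  },
};

const hook = makeRagHook(async () => ["chunk A", "chunk B"]);
hook({ turns: [{ input: { text: "how do refunds work?" } }] }, fakeCtx).then(() => {
  console.log(JSON.stringify(recorded)); // [{"retrieved_chunks":["chunk A","chunk B"]}]
});
```

The test asserts on the structured value passed to setContext rather than on a rendered prompt string, which is what keeps this kind of retrieval easy to test.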


  1. User message arrives.
  2. Middleware embeds the query with embed.
  3. You search your vector DB with that embedding.
  4. Middleware injects the top matches into agent context with setContext / set_context.
  5. The model responds, grounded in the retrieved context.
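The flow above can be sketched end to end with every external piece faked. Nothing here is Auwgent API: fakeEmbed (a two-keyword toy embedding), fakeSearch (a two-row "vector DB"), fakeModel (which only reports the context it received), and the local setContext closure are hypothetical stand-ins that show the order of operations.

```typescript
type Vector = number[];
type Context = Record<string, unknown>;

// Embedding stand-in: a toy 2-d vector from two keywords (hypothetical).
async function fakeEmbed(text: string): Promise<Vector> {
  const t = text.toLowerCase();
  return [t.includes("refund") ? 1 : 0, t.includes("ship") ? 1 : 0];
}

// Vector DB stand-in: a tiny store ranked by dot product.
const store: { text: string; vector: Vector }[] = [
  { text: "Refunds are processed within 5 days.", vector: [1, 0] },
  { text: "We ship to 40 countries.", vector: [0, 1] },
];
async function fakeSearch(v: Vector, topK: number): Promise<string[]> {
  return store
    .map((row) => ({ text: row.text, score: row.vector[0] * v[0] + row.vector[1] * v[1] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((row) => row.text);
}

// Model stand-in: reports which retrieved chunks it was given.
async function fakeModel(message: string, context: Context): Promise<string> {
  const chunks = (context.retrieved_chunks as string[] | undefined) ?? [];
  return `Answering "${message}" using: ${chunks.join(" | ")}`;
}

async function handleMessage(message: string): Promise<string> {
  // User message arrives (the `message` argument).
  const context: Context = {};
  const setContext = (values: Context) => Object.assign(context, values);
  // Embed the query.
  const queryVector = await fakeEmbed(message);
  // Search the vector DB with that embedding.
  const chunks = await fakeSearch(queryVector, 1);
  // Inject the top matches through setContext, not by editing prompt strings.
  setContext({ retrieved_chunks: chunks });
  // The model answers with the retrieved context available.
  return fakeModel(message, context);
}

handleMessage("How long do refunds take?").then((reply) => console.log(reply));
```

Swapping each fake for the real piece (ctx.embed, your vector DB client, ctx.setContext, and the configured model) gives the middleware version shown earlier on this page.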

Now that embeddings are wired into middleware context, continue with intent handling.

→ See Intents to understand how model and runtime events flow through your app.