Embedding
Embeddings turn text into vectors so you can do semantic search and retrieval. In Auwgent, embeddings are usually used from middleware: you generate vectors, query your vector DB, then inject the retrieved knowledge back into runtime context before the next model call.
Configure embedding in your agent
Auwgent lets you configure a dedicated embedding model in default config.

    agent Main {
      default config {
        model: gemini("gemini-2.5-flash")
        embedding: gemini("gemini-embedding-001")
        prompt: "You are a helpful assistant"
      }
    }

You can use the same provider family as your chat model or a different one. What matters is that embedding is configured, because the middleware helpers embed and embedBatch depend on it.
Which middleware hook fits retrieval?
For retrieval, prefer onRunStart.
- Use onRunStart to read the user's query from the session turns.
- Use onLLMStart when you need the prompt itself; it receives the prompt as its first argument.
In most RAG setups, onRunStart is the best fit because it runs once, avoids repeated vector lookups, and lets setContext affect prompt evaluation for the run.
Embedding utilities in middleware
Inside middleware context you get:
- embed(text) → returns a single vector
- embedBatch(texts) → returns vectors for many inputs
Use embedBatch when processing many documents for indexing to reduce round-trips.
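As a rough illustration of the batching advice above, the sketch below embeds a list of documents in fixed-size batches. The helper name `embed_in_batches`, the `batch_size` parameter, and `fake_embed_batch` are assumptions made for this example and are not part of Auwgent; in real middleware you would pass the context's embedBatch helper instead of the stand-in.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Sequence

Vector = List[float]
EmbedBatch = Callable[[Sequence[str]], Awaitable[List[Vector]]]


async def embed_in_batches(
    embed_batch: EmbedBatch, texts: Sequence[str], batch_size: int = 32
) -> List[Vector]:
    """Embed texts in fixed-size batches, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # One round-trip per batch instead of one per document.
        vectors.extend(await embed_batch(batch))
    return vectors


# Hypothetical stand-in for the middleware's embedBatch helper. It only
# exists to make this sketch runnable and to count round-trips.
stats: Dict[str, int] = {"round_trips": 0}


async def fake_embed_batch(texts: Sequence[str]) -> List[Vector]:
    stats["round_trips"] += 1
    return [[float(len(t))] for t in texts]


docs = ["alpha", "beta", "gamma", "delta", "epsilon"]
vectors = asyncio.run(embed_in_batches(fake_embed_batch, docs, batch_size=2))
```

With five documents and a batch size of 2, the stand-in is called three times rather than five. A suitable batch size depends on your embedding provider's limits, which this sketch does not model.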
The following middleware embeds the latest user message in onRunStart and stores the matches with setContext.

TypeScript:

    const ragMiddleware: Middleware = {
      name: "rag",
      onRunStart: async (session, ctx) => {
        // Use the latest user turn as the search query.
        const query = session.turns.at(-1)?.input?.text ?? ""
        const queryVector = await ctx.embed(query)

        // vectorDb is your own vector database client.
        const hits = await vectorDb.search(queryVector, { topK: 5 })
        const chunks = hits.map((h: any) => h.text)

        // Inject retrieved knowledge into runtime context for prompt use.
        ctx.setContext({ retrieved_chunks: chunks })
        return session
      }
    }

Python:

    class RagMiddleware(AuwgentMiddleware):
        name = "rag"

        async def onRunStart(self, session, ctx):
            turns = session.get("turns", [])
            if not turns:
                return session  # Nothing to embed yet.
            last_turn = turns[-1]
            query = last_turn["input"]
            query_vector = await ctx["embed"](query)

            # vector_db is your own vector database client.
            hits = await vector_db.search(query_vector, top_k=5)
            chunks = [item["text"] for item in hits]

            # Inject retrieved knowledge into runtime context for prompt use.
            ctx["set_context"]({"retrieved_chunks": chunks})
            return session

Injecting retrieved DB data: use middleware setContext
When you fetch nearest neighbors from your vector DB, inject them through middleware context, not by mutating prompt strings manually.
Use setContext (set_context in Python) to store retrieved data in your agent context; it is then injected automatically.
This keeps retrieval structured, testable, and reusable across hooks.
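One way to see the testability point is to run the hook logic against stubs. The sketch below is a standalone function, not an Auwgent class: it assumes the Python context exposes `embed` and `set_context` as dictionary entries, as in the middleware example on this page, and it takes a hypothetical `search` callable so the vector DB can be faked. All stub names and sample data are invented for the example.

```python
import asyncio


async def on_run_start(session, ctx, search):
    """Standalone version of the hook body. `search` is a hypothetical
    async vector DB lookup, passed in so tests can replace it."""
    turns = session.get("turns", [])
    query = turns[-1]["input"] if turns else ""
    query_vector = await ctx["embed"](query)
    hits = await search(query_vector, top_k=2)
    # Store structured data instead of editing the prompt string.
    ctx["set_context"]({"retrieved_chunks": [h["text"] for h in hits]})
    return session


# Stubs: record what the hook stores instead of calling real services.
stored = {}


async def fake_embed(text):
    return [float(len(text))]


async def fake_search(vector, top_k):
    corpus = [
        {"text": "refund policy"},
        {"text": "shipping times"},
        {"text": "warranty"},
    ]
    return corpus[:top_k]


ctx = {"embed": fake_embed, "set_context": stored.update}
session = {"turns": [{"input": "How do refunds work?"}]}

result = asyncio.run(on_run_start(session, ctx, fake_search))
```

Because the retrieved chunks land in a plain dictionary, a test can assert on `stored` directly without parsing any prompt text.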
Typical RAG flow in Auwgent
- User message arrives.
- Middleware embeds the query with embed.
- You search your vector DB with that embedding.
- Middleware injects top matches using setContext/set_context.
- setContext/set_context makes the retrieved data available to the prompt.
- Model responds with grounded context.
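The flow above can be traced end to end with toy components. Everything in this sketch is a stand-in chosen for illustration: `toy_embed` counts words from a tiny vocabulary instead of calling an embedding model, `search` is a brute-force cosine-similarity scan instead of a vector DB, and `runtime_context` is a plain dictionary standing in for the context that setContext writes to.

```python
import math
from collections import Counter

VOCAB = ["refund", "shipping", "warranty", "days", "return"]


def toy_embed(text):
    """Toy stand-in for an embedding model: counts vocabulary words."""
    cleaned = text.lower().replace("?", "").replace(".", "")
    words = Counter(cleaned.split())
    return [float(words[w]) for w in VOCAB]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query_vector, top_k):
    """Brute-force nearest-neighbor scan over an in-memory index."""
    ranked = sorted(
        index,
        key=lambda item: cosine(item["vector"], query_vector),
        reverse=True,
    )
    return ranked[:top_k]


documents = [
    "Refund requests are accepted within 30 days.",
    "Shipping takes 3 to 5 days.",
    "Warranty covers two years.",
]
index = [{"text": d, "vector": toy_embed(d)} for d in documents]

# User message arrives.
message = "How do I get a refund?"
# Embed the query.
query_vector = toy_embed(message)
# Search the index with that embedding.
hits = search(index, query_vector, top_k=1)
# Inject the top matches into the (stand-in) runtime context.
runtime_context = {}
runtime_context.update({"retrieved_chunks": [h["text"] for h in hits]})
```

Here the refund document is the only one that shares a vocabulary word with the query, so it is the single chunk placed in `runtime_context`. A real setup would swap in the middleware's embed helper and your vector DB client.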
Next steps
Now that embeddings are wired into middleware context, continue with intent handling.
→ See Intents to understand how model and runtime events flow through your app.