Vera
The conversational AI built into this site. A portfolio chat persona engineered to stay in character, grounded in real project facts, and honest about what it doesn't know.
Overview
Most portfolios are read. This one can also be asked. Vera is a conversational AI built into the site, so visitors can explore the work by asking questions in plain language instead of clicking through pages.
Vera responds with the right project, context, or detail, turning a static page into a dialogue. The hard part was never the chat box; it was making the persona behind it reliable.
Approach
The interface is built around a single conversational input, paired with a sidebar that surfaces relevant projects as the conversation unfolds.
An LLM is grounded in structured markdown files describing each project. This keeps responses scoped, factual, and on-brand, and limits the room for hallucination.
Engineering Vera
Most LLM personas are brittle — long system prompts that drift, contradict, or lose voice across turns. Vera is built differently: a directory of focused markdown modules, each owning one facet of her character. They are listed below in priority order — when two modules would conflict, the higher-priority module wins.
- conversation_examples.md (Tone & cadence): Real dialogue samples showing how Vera answers in live conversation. The highest behavioral layer; defines her voice turn by turn.
- faq.md (Canonical answers): Pre-resolved answers to recurring questions about background, process, and professional approach. Prevents reinvention on every ask.
- projects.md (Factual grounding): Single source of truth for every project. If a detail isn't documented here, Vera won't claim it, which prevents hallucinated work history.
- worldview.md (Reasoning style): Mental models for how tradeoffs get weighed. Systems-first thinking and judgment over execution: the lens through which she reasons.
- writing_patterns.md (Linguistic structure): Sentence cadence, rhetorical shape, and how explanations unfold. Encodes how language is formed, not what it says.
- interaction_rules.md (Intent mapping): Maps recruiter-style and open-ended prompts to the right underlying modules. Stops generic questions from yielding generic answers.
- anti_patterns.md (Behavioral constraints): Negative space, meaning the tones and phrasings to avoid. Filters out assistant tropes, corporate filler, and motivational hype before they reach the visitor.
- domain_context.md (Professional terrain): The domains the work lives in: insurance, enterprise workflows, AI-assisted product environments. Frames which problems she'd recognize on sight.
- persona.md (Priority contract): The arbiter. Declares which module wins when two conflict, making behavior deterministic instead of dependent on prompt order.
The modules sit under an explicit priority contract. Conversation style is the top layer; canonical answers and project facts come next; reasoning style and constraints sit beneath. When two files would disagree, the higher-priority file wins — making behavior predictable across turns instead of dependent on prompt order.
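As an illustration of the "higher-priority file wins" rule, here is a minimal sketch in code. It assumes each module can propose a value for a named behavior key. The `PRIORITY` array mirrors the order in which the modules are listed above, but the `resolve` function, the directive shape, and the `tone` key are hypothetical. The real contract may live entirely in the prompt text of persona.md rather than in code, and persona.md is left out of the array on the assumption that it holds the contract itself.

```javascript
// Module order, highest priority first (mirrors the list above).
const PRIORITY = [
  "conversation_examples.md",
  "faq.md",
  "projects.md",
  "worldview.md",
  "writing_patterns.md",
  "interaction_rules.md",
  "anti_patterns.md",
  "domain_context.md",
];

// For each behavior key, keep the directive from the highest-priority module.
// Hypothetical directive shape: { module: "faq.md", key: "tone", value: "..." }
function resolve(directives) {
  const winners = new Map();
  for (const d of directives) {
    const rank = PRIORITY.indexOf(d.module);
    if (rank === -1) continue; // unknown modules never win
    const current = winners.get(d.key);
    if (!current || rank < current.rank) {
      winners.set(d.key, { rank, module: d.module, value: d.value });
    }
  }
  return winners;
}

const result = resolve([
  { module: "worldview.md", key: "tone", value: "analytical" },
  { module: "conversation_examples.md", key: "tone", value: "conversational" },
]);
console.log(result.get("tone").module); // conversation_examples.md
```

The outcome depends only on the fixed ranking, not on the order in which directives arrive, which is the property the contract is meant to guarantee.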
Tone is encoded separately from facts. The same factual base can be expressed in different registers without rewriting the underlying knowledge, and updating a project detail does not risk shifting Vera's voice. At runtime, only the modules relevant to the current question are assembled, keeping context lean and on-topic.
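One way the selective assembly could look, as a sketch only. The source does not say how relevance is determined, so the keyword routing table, the `selectModules` name, and the set of modules included on every turn are all assumptions made for illustration.

```javascript
// Hypothetical routing table: which modules a question is likely to need.
const ROUTES = [
  { pattern: /\b(project|case study|built|shipped)\b/i, modules: ["projects.md"] },
  { pattern: /\b(process|background|experience|approach)\b/i, modules: ["faq.md"] },
  { pattern: /\b(trade-?offs?|decide|prioriti[sz]e|why)\b/i, modules: ["worldview.md"] },
  { pattern: /\b(insurance|enterprise|workflow)\b/i, modules: ["domain_context.md"] },
];

// Modules assumed to apply to every turn (voice, constraints, priority contract).
const ALWAYS = ["persona.md", "conversation_examples.md", "anti_patterns.md"];

// Returns the de-duplicated list of module filenames to load for this question.
function selectModules(question) {
  const selected = new Set(ALWAYS);
  for (const route of ROUTES) {
    if (route.pattern.test(question)) {
      route.modules.forEach((m) => selected.add(m));
    }
  }
  return [...selected];
}

console.log(selectModules("Which insurance project are you proudest of?"));
```

A question that matches no route still receives the always-on modules, so the voice and constraints hold even when no factual module is pulled in.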
The system favors precision over fluency. When asked something outside its scope, it acknowledges the gap rather than improvising — correct, occasionally stiff, and trustworthy by default.
Build process
Stack: vanilla HTML/CSS/JS on the frontend, one Node serverless function on the backend. No framework, no third-party chat widget. The whole deploy is static files plus that single function — chosen for control over the security boundary, no vendor lock-in on the chat surface, and a small surface area to maintain.
Connecting to the model went through OpenRouter rather than directly to a model provider. One key, one API surface, free movement across providers — switching from Google Gemini 2.0 Flash to Claude, GPT-4o, or any other supported model is a one-line edit. The key was generated in OpenRouter’s dashboard and stored in Vercel’s project environment as OPENROUTER_API_KEY; it is never committed and never reaches the browser. Locally, the same variable lives in a gitignored .env file for parity. Every model call passes through the serverless function, which is the trust boundary.
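The server-side call might be sketched as follows. The endpoint URL is OpenRouter's chat completions route; the model slug, the function names, and the message layout are assumptions rather than the site's actual code. The point of the sketch is that the key is read from the environment inside the function and the model is a single string.

```javascript
// Assumed model slug; switching providers means changing only this string.
const MODEL = "google/gemini-2.0-flash-001";

// Build the fetch arguments for OpenRouter's chat completions endpoint.
function buildOpenRouterRequest(apiKey, systemPrompt, userMessage) {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: MODEL,
        stream: true, // ask for Server-Sent Events
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: userMessage },
        ],
      }),
    },
  };
}

// Server-side only: the key comes from the environment and never reaches the browser.
async function callModel(systemPrompt, userMessage) {
  const apiKey = process.env.OPENROUTER_API_KEY;
  if (!apiKey) throw new Error("OPENROUTER_API_KEY is not set");
  const { url, options } = buildOpenRouterRequest(apiKey, systemPrompt, userMessage);
  return fetch(url, options); // global fetch, available in Node 18+
}
```

Keeping request construction in a pure function makes it testable without a network call or a real key.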
The endpoint at /api/chat runs a fixed pipeline on every request: per-IP rate limit (10 requests per 5-minute window, in-memory map) → context load → input injection regex scan → OpenRouter request → output leakage regex scan → response stream. Each stage can short-circuit with a scoped fallback string rather than failing into a generic error message.
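The first stages of that pipeline could be sketched like this. The limits (10 requests per 5 minutes, in-memory map) come from the description above; the fixed-window strategy, the regex patterns, the fallback wording, and all function names are assumptions. The context-load stage that sits between the two checks is omitted for brevity.

```javascript
const WINDOW_MS = 5 * 60 * 1000; // 5-minute window
const MAX_REQUESTS = 10;         // per IP per window
const hits = new Map();          // ip -> { count, windowStart }; lost when the instance goes cold

// Fixed-window limiter (an assumed strategy). `now` is injectable for testing.
function allowRequest(ip, now = Date.now()) {
  const entry = hits.get(ip);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    hits.set(ip, { count: 1, windowStart: now });
    return true;
  }
  entry.count += 1;
  return entry.count <= MAX_REQUESTS;
}

// Hypothetical injection patterns; a real list would be longer and tuned.
const INJECTION_PATTERNS = [
  /ignore (all |any )?(previous|prior) instructions/i,
  /reveal (your )?(system )?prompt/i,
];

function looksLikeInjection(text) {
  return INJECTION_PATTERNS.some((p) => p.test(text));
}

// Each stage either passes or short-circuits with its own scoped fallback string.
function preflight(ip, message, now = Date.now()) {
  if (!allowRequest(ip, now)) {
    return { ok: false, fallback: "Too many questions at once. Give me a few minutes." };
  }
  if (looksLikeInjection(message)) {
    return { ok: false, fallback: "I can only talk about the work on this site." };
  }
  return { ok: true };
}
```

Because the map lives in function memory, the limit applies per warm instance rather than globally, which is a reasonable tradeoff for a portfolio site but not a hard guarantee.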
OpenRouter emits tokens via Server-Sent Events. Rather than passing them straight through, the function buffers the full reply, scans it for context-leakage patterns, then re-emits the text in 4-character chunks with a 12 ms gap. That restores the typing-cursor UX while preserving the output scan — no half-token can slip past the safety net mid-stream.
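A minimal sketch of the buffer, scan, and re-emit step. The 4-character chunk size and 12 ms gap are taken from the description above; the leakage patterns, the function names, and the choice to JSON-encode each chunk are assumptions. `res` stands for a Node `http.ServerResponse`-style object.

```javascript
const CHUNK_SIZE = 4;    // characters per emitted chunk
const CHUNK_GAP_MS = 12; // pause between chunks

// Hypothetical leakage patterns: strings that should never reach a visitor.
const LEAK_PATTERNS = [/system prompt/i, /OPENROUTER_API_KEY/, /context\/\w+\.md/i];

function leaksContext(text) {
  return LEAK_PATTERNS.some((p) => p.test(text));
}

// Split on code points so multi-byte characters are not cut in half.
function chunkText(text, size = CHUNK_SIZE) {
  const chars = Array.from(text);
  const chunks = [];
  for (let i = 0; i < chars.length; i += size) {
    chunks.push(chars.slice(i, i + size).join(""));
  }
  return chunks;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `fullReply` is the already-buffered model output; nothing is sent before the scan.
async function emitReply(res, fullReply, fallback) {
  const safeText = leaksContext(fullReply) ? fallback : fullReply;
  res.setHeader("Content-Type", "text/event-stream");
  for (const chunk of chunkText(safeText)) {
    // JSON-encode so newlines inside a chunk cannot break SSE framing.
    res.write(`data: ${JSON.stringify(chunk)}\n\n`);
    await sleep(CHUNK_GAP_MS);
  }
  res.end();
}
```

The cost of this design is time to first character: the visitor sees nothing until the full reply has arrived and passed the scan.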
Grounding stays current without manual sync. The live page context, which is every project HTML page stripped to plain text, is built once at cold start and cached for the lifetime of the warm instance. The context/*.md modules are re-read on each request, which is cheap enough that they need no cache or invalidation logic. When a project page changes and Vercel redeploys, new instances cold-start and rebuild the cache automatically.
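The cold-start cache can be sketched with a module-scope variable, which survives between invocations on a warm serverless instance. The regex-based `stripHtml`, the `getPageContext` name, and the `loadPages` callback are hypothetical; a production build might prefer a real HTML parser.

```javascript
// Naive HTML-to-text: drops script/style blocks and tags, then collapses whitespace.
function stripHtml(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Module scope persists while the instance is warm, so this is filled once per cold start.
let pageContextCache = null;

// `loadPages` is a hypothetical callback returning an array of HTML strings.
function getPageContext(loadPages) {
  if (pageContextCache === null) {
    pageContextCache = loadPages().map(stripHtml).join("\n\n");
  }
  return pageContextCache;
}

console.log(stripHtml("<h1>Vera</h1><p>Hello   world</p>")); // Vera Hello world
```

No invalidation logic is needed in this sketch because a redeploy replaces the running instances, and each new instance starts with an empty cache.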
Multilingual support sits at the prompt layer. The site language is whitelisted server-side: only 'en' and 'de' are accepted, and anything else falls back to English. The accepted value is turned into a single directive prepended to the system prompt. The persona, security rules, and context files all stay in English; only the output language is overridden. Fallback strings are per-language so even the safety net sounds native.
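A sketch of the whitelist and directive, under stated assumptions: the 'en'/'de' whitelist and the English default come from the description above, while the directive wording, the fallback strings, and the function names are invented for illustration.

```javascript
const SUPPORTED = new Set(["en", "de"]);

// Anything outside the whitelist falls back to English.
function resolveLanguage(input) {
  return SUPPORTED.has(input) ? input : "en";
}

// Hypothetical directive wording; the persona and context files stay in English.
const DIRECTIVES = {
  en: "Respond in English.",
  de: "Respond in German, regardless of the language of the context below.",
};

// Hypothetical per-language fallback strings.
const FALLBACKS = {
  en: "I'm not sure about that one.",
  de: "Da bin ich mir nicht sicher.",
};

// Prepend the language directive to the otherwise unchanged system prompt.
function buildSystemPrompt(lang, basePrompt) {
  const safeLang = resolveLanguage(lang);
  return `${DIRECTIVES[safeLang]}\n\n${basePrompt}`;
}

function fallbackFor(lang) {
  return FALLBACKS[resolveLanguage(lang)];
}

console.log(resolveLanguage("fr")); // en
```

Because the client-supplied value is only ever used as a lookup key against a fixed set, it cannot inject text into the system prompt.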
Tradeoffs
Conversational interfaces add friction for users who already know what they want. To balance this, traditional navigation stays available as a fallback. The chat layer is additive, not a replacement for it.
Latency, accuracy, and scope discipline are ongoing challenges. The system errs toward saying "I'm not sure" over guessing, preserving trust at the cost of fluency.
Modularity buys maintainability but demands discipline. Each fact has to live in exactly one module, and changes have to propagate through linked files. This is a deliberate constraint that keeps the persona coherent as it grows.