hashintel · lunelson · Dec 18, 2025 · Dec 18, 2025 · Dec 18, 2025 · Dec 19, 2025
diff --git a/.markdownlint-cli2.jsonc b/.markdownlint-cli2.jsonc
@@ -30,6 +30,6 @@
     "node_modules/**",
     "target/**",
     "**/_temp/**",
-    "**/agent/**"
+    "**/_ai/**"
   ]
 }
diff --git a/AGENTS.md b/AGENTS.md
@@ -82,17 +82,12 @@ For Rust packages, you can add features as needed with `--all-features`, specifi
 
 ## Contextual Rules
 
-CRITICAL: For the files referenced below (e.g., @rules/general.md), use your Read tool to load it on a need-to-know basis, ONLY when relevant to the SPECIFIC task at hand.
+CRITICAL: For the files referenced below, use your Read tool to load them on a need-to-know basis, ONLY when relevant to the SPECIFIC task at hand:
+
+- @.config/agents/rules/*.md
 
 Instructions:
 
 - Do NOT preemptively load all references - use lazy loading based on actual need
 - When loaded, treat content as mandatory instructions that override defaults
 - Follow references recursively when needed
-
-Rule files:
-
-- @.config/agents/rules/ark-ui.md
-- @.config/agents/rules/mastra.md
-- @.config/agents/rules/panda-css.md
-- @.config/agents/rules/zod.md
diff --git a/apps/hash-ai-agent/.gitignore b/apps/hash-ai-agent/.gitignore
@@ -7,8 +7,5 @@ dist
 *.db
 *.db-*
 
-# Fixtures
-src/mastra/fixtures/entity-schemas/*.json
-
 # Quokka files
 *.quokka.*
diff --git a/apps/hash-ai-agent/_ai/wiki/conditional-branching.md b/apps/hash-ai-agent/_ai/wiki/conditional-branching.md
@@ -0,0 +1,208 @@
+# Conditional Branching in Plan Execution
+
+> **Status**: Deferred — see [gaps-and-next-steps.md](./gaps-and-next-steps.md)
+> **Created**: 2024-12-18
+> **Moved to wiki**: 2024-12-19
+> **Context**: Design options for runtime branching in plan execution
+
+## Overview
+
+Conditional branching allows plan execution to take different paths based on
+runtime evaluation results. This is essential for:
+
+- **Evaluation → retry cycles**: When a synthesis/evaluation step determines that quality is insufficient
+- **Quality gates**: Pass/fail/escalate decisions at key checkpoints
+- **Human-in-the-loop decision points**: Routing to human review when confidence is low
+- **Adaptive execution**: Choosing different strategies based on intermediate results
+
+## Current State
+
+The plan compiler (`plan-compiler.ts`) currently supports:
+
+- Linear execution (`.then()`)
+- Parallel execution (`.parallel()`)
+- Fan-in patterns (multiple steps → single synthesis)
+- Fan-out patterns (single step → multiple parallel steps)
+
+**Not yet supported**:
+
+- Conditional branching (`.branch()`)
+- Loop constructs (`.dowhile()`, `.dountil()`)
+
+## Mastra Primitive
+
+Mastra workflows support conditional branching via `.branch()`:
+
+```typescript
+workflow.branch([
+  [async ({ inputData }) => inputData.decision === "retry", retryStep],
+  [async ({ inputData }) => inputData.decision === "pass", continueStep],
+  // Default case: written to exclude the cases above rather than as a bare `true`
+  [async ({ inputData }) => !["retry", "pass"].includes(inputData.decision), fallbackStep],
+]);
+```
+
+The condition functions receive the previous step's output and return a boolean. Mastra runs every branch whose condition is true (verify against the pinned version), so conditions should be mutually exclusive.
+
+## PlanSpec Extension Options
+
+### Option A: Explicit Conditional Edges
+
+Add a `conditionalEdges` array to PlanSpec with serializable condition specs:
+
+```typescript
+interface ConditionSpec {
+  field: string; // Path in previous step output, e.g., "decision"
+  operator: "eq" | "neq" | "gt" | "lt" | "in" | "contains";
+  value: unknown; // Value to compare against
+}
+
+interface ConditionalEdge {
+  id: string;
+  fromStepId: string; // Source step (typically an evaluation step)
+  conditions: Array<{
+    condition: ConditionSpec;
+    toStepId: string; // Target step if condition matches
+  }>;
+  defaultStepId?: string; // Fallback if no condition matches
+}
+
+// In PlanSpec:
+interface PlanSpec {
+  // ... existing fields ...
+  conditionalEdges?: ConditionalEdge[];
+}
+```
+
+**Pros**:
+
+- Explicit and declarative
+- Easy to validate statically
+- Clear visualization in UI
+
+**Cons**:
+
+- Adds complexity to PlanSpec schema
+- Condition language is limited (no arbitrary expressions)
+
+### Option B: Gateway Steps
+
+Use evaluation steps that output decisions, followed by a special "gateway" step type:
+
+```typescript
+interface GatewayStep {
+  type: "gateway";
+  id: string;
+  dependsOn: [string]; // Must depend on exactly one step
+  routes: Array<{
+    condition: ConditionSpec;
+    toStepId: string;
+  }>;
+  defaultRoute?: string;
+}
+```
+
+**Pros**:
+
+- Gateway is a first-class step type
+- Clearer semantic meaning
+- Easier to reason about in isolation
+
+**Cons**:
+
+- More verbose plans
+- Gateway steps don't "do" anything (just routing)
+
+### Option C: Inline Conditions on dependsOn
+
+Extend `dependsOn` to optionally include conditions:
+
+```typescript
+interface ConditionalDependency {
+  stepId: string;
+  condition?: ConditionSpec; // Only proceed if condition matches
+}
+
+// In PlanStep:
+dependsOn: Array<string | ConditionalDependency>;
+```
+
+**Pros**:
+
+- Minimal schema changes
+- Natural extension of existing pattern
+
+**Cons**:
+
+- Makes dependency analysis more complex
+- Harder to visualize
+
+## Relationship to Decision Points
+
+Earlier discussion identified the need for plans to surface uncertainty:
+
+- Assumptions
+- Missing inputs
+- Clarifying questions
+- Risks / decision points
+
+These are currently tracked in `unknownsMap` as metadata. Decision points for
+conditional branching could be:
+
+| Decision Type   | How It Maps to Branching                       |
+| --------------- | ---------------------------------------------- |
+| `clarification` | Branch to HITL step for user input             |
+| `assumption`    | Branch based on assumption validation          |
+| `risk`          | Branch to mitigation path if risk materializes |
+| `tradeoff`      | Branch based on user preference or heuristic   |
+
+The relationship between `unknownsMap` and conditional edges needs further design:
+
+- Should decision points automatically generate conditional edges?
+- Or should they remain separate (metadata vs control flow)?
+
+**Tracked for future**: Clarify this relationship when implementing Phase 4.
+
+## Security Considerations
+
+Condition evaluation must be carefully constrained:
+
+1. **No arbitrary code execution**: Conditions must be declarative, not executable JS
+2. **Limited operators**: Only safe comparison operators
+3. **Field path validation**: Ensure field paths don't access sensitive data
+4. **Timeout protection**: Condition evaluation should be bounded
+
+## Deferred Because
+
+1. **Current focus**: Phases 1-3 focus on basic DAG execution with streaming
+2. **Schema design**: Need to finalize which option (A, B, or C) to pursue
+3. **Validation complexity**: Conditional edges require additional validation:
+   - All branches must be reachable
+   - No orphaned steps after conditions
+   - Conditions must be evaluable given step outputs
+4. **UI implications**: Conditional branches need visualization support
+
+## Implementation Plan (Phase 4)
+
+When we return to this:
+
+1. **Design decision**: Choose between Option A, B, or C (likely A)
+2. **Schema extension**: Add conditional edges to PlanSpec
+3. **Validation**: Extend `plan-validator.ts` for conditional edge checks
+4. **Compiler**: Implement `.branch()` generation from conditional edges
+5. **Tests**: Add conditional branching test fixtures
+6. **Topology**: Update `topology-analyzer.ts` to handle conditional paths
+7. **Streaming**: Emit events for branch decisions
+
+## Related Files
+
+- `src/mastra/tools/plan-compiler.ts` - Main compiler (needs `.branch()` support)
+- `src/mastra/schemas/plan-spec.ts` - Schema (needs conditional edge types)
+- `src/mastra/tools/plan-validator.ts` - Validation (needs edge validation)
+- `src/mastra/tools/topology-analyzer.ts` - Analysis (needs conditional path handling)
+
+## References
+
+- Mastra workflow `.branch()` documentation
+- Earlier conversation about decision points and surfaced uncertainty
+- XState/Stately statechart patterns (for future consideration)
diff --git a/apps/hash-ai-agent/_ai/wiki/deployment-requirements.md b/apps/hash-ai-agent/_ai/wiki/deployment-requirements.md
@@ -0,0 +1,121 @@
+# Deployment Requirements: Mastra Workflow State Management
+
+> Technical requirements for deploying human-in-the-loop workflows with Mastra.
+> Captured 2024-12-22. Source: Analysis of Mastra core (vNext/Evented model).
+
+## Key Finding: Storage is Required for Human-in-the-Loop
+
+**In-memory state is sufficient for single-run workflows**, but **storage is essential for suspend/resume across execution sessions**.
+
+### Why This Matters
+
+- **Single-run workflow**: Steps pass outputs in memory via an accumulated `stepResults` object. No database is involved.
+- **Human-in-the-loop workflow**: A step calls `suspend()`, the workflow state is persisted to storage, and the process can terminate. Hours or days later, `resume()` loads the state from storage and continues.
+
+Without storage configured, suspended workflows lose all state on process restart.
+
+## Storage Requirements by Use Case
+
+| Use Case | Storage Required? |
+|----------|-------------------|
+| Workflows completing in single execution | No |
+| Human approval gates | **Yes** |
+| Long-running workflows (survive restarts) | **Yes** |
+| External webhook callbacks | **Yes** |
+| Audit trail / workflow history queries | **Yes** |
+
+## Configuring Storage
+
+```typescript
+import { Mastra } from "@mastra/core";
+import { PostgresStore } from "@mastra/pg";
+
+const mastra = new Mastra({
+  storage: new PostgresStore({
+    connectionString: process.env.DATABASE_URL,
+  }),
+});
+```
+
+### Supported Backends
+
+- **PostgreSQL** - production recommended
+- **LibSQL/SQLite** - local development, serverless edge
+- **Custom** - implement `BaseStorage` interface
+
+## Suspend/Resume Pattern
+
+### Suspending a Workflow
+
+```typescript
+import { createStep } from "@mastra/core/workflows";
+import { z } from "zod";
+
+const approvalStep = createStep({
+  id: "request-approval",
+  inputSchema: z.object({ proposal: z.string() }),
+  outputSchema: z.object({ approved: z.boolean(), notes: z.string().optional() }),
+  suspendSchema: z.object({
+    reason: z.string(),
+    context: z.record(z.string(), z.unknown()),
+  }),
+  resumeSchema: z.object({ approved: z.boolean(), notes: z.string().optional() }),
+  execute: async ({ inputData, resumeData, suspend }) => {
+    // If not yet approved, suspend and wait; execution stops here until resume
+    if (!resumeData?.approved) {
+      return await suspend({
+        reason: "Human approval required",
+        context: { proposal: inputData.proposal },
+      });
+    }
+
+    // Resumed with approval
+    return { approved: true, notes: resumeData.notes };
+  },
+});
+```
+
+### Resuming a Workflow
+
+```typescript
+// Later, when human approves via API/UI:
+const run = await workflow.getRunById(runId);
+const result = await run.resume({
+  resumeData: { approved: true, notes: "LGTM" }
+});
+```
+
+### Querying Suspended Workflows
+
+```typescript
+// Find workflows awaiting human input
+const pending = await mastra.getStorage().getWorkflowRuns({
+  workflowName: "approval-workflow",
+  status: "suspended"
+});
+```
+
+## vNext (Evented) Model
+
+The evented execution model treats suspend/resume as first-class workflow states:
+
+- **Suspend** publishes `workflow.suspend` event, persists snapshot
+- **Resume** publishes `workflow.resume` event, loads snapshot, continues from `suspendedPaths`
+- **External systems** can subscribe to events for notifications
+
+This model is cleaner for human-in-the-loop because state transitions are explicit events rather than implicit control flow.
+
+## Implementation Checklist
+
+1. **Configure storage backend** (PostgreSQL for production)
+2. **Define `suspendSchema`** - what context surfaces to human reviewer
+3. **Define `resumeSchema`** - validate human input on resume
+4. **Build resume trigger** - API endpoint or UI action
+5. **Handle edge cases** - abandoned workflows, failed resumes, timeouts
+
+## Open Questions
+
+1. **Storage backend**: PostgreSQL vs LibSQL for our deployment?
+2. **Resume mechanism**: API endpoint, UI component, or event subscription?
+3. **Suspend payload design**: What information do human reviewers need?
+4. **Workflow lifecycle**: How to handle abandoned/expired suspended workflows?