Field Notes · By Stephen Gilfus · May 29, 2026
Rebuilding Blackboard in 2026: What I'd Build Now
AI is the substrate, not the sidebar.
Rebuilding Blackboard in 2026 means refactoring the primitives, not bolting on helpers. Additive and peripheral AI are anti-patterns. An AI-native LMS treats orchestration, governance, and model routing as core infrastructure so courses, assessment, tutoring, and gradebooks become accountable, auditable workflows.

Bolted-on AI is not an upgrade; it is a rebuild deferred. In 2026, the tempting move is to ship “additive AI” and “peripheral AI”: a chatbot on the nav bar, a sidebar copilot reading the same tables, a summarizer on top of stale content. These are anti-patterns. They patch yesterday’s architecture with tomorrow’s vocabulary, and they entrench the very constraints that made the legacy LMS hard to change. If the goal is to rebuild Blackboard now, AI cannot sit on the margin of a 2005 CRUD schema. It has to be the substrate the system is built on.
The difference is operational, not semantic. The original LMS era—including Blackboard (founded 1997) and WebCT—organized around content repositories, roles, and workflows encoded in relational tables. Integrations arrived as standards like LTI 1.1 (from IMS Global, now 1EdTech) that mapped external tools into those rows and permissions. That architecture scaled enrollment and compliance, but its primitives were static: a course was a container, a quiz was a record, a discussion was a thread. Adding AI as a helper around those objects keeps the objects frozen. The right rebuild changes the objects themselves.
Think of the LMS as a campus power grid, not a building full of extension cords. Additive and peripheral AI are cords — useful for a lamp, dangerous at scale. A native grid routes load, meters usage, logs faults, and supports new buildings without rewiring the city. The rebuild is the grid.
1. The wrong rebuild: additive and peripheral AI
Start with the failure mode because it is seductive. The product team ships a chat widget that can “answer questions about your course.” They add an AI writing assistant to the discussion editor. They surface a copilot that drafts quiz questions from a PDF. None of these change the data model; each reads from and writes to the same tables. The operational benefit is quick time-to-value and demos that feel modern. The operational cost arrives next semester when the edge helpers disagree with each other, generate outcomes no one can audit, and magnify technical debt.
Here is how the anti-pattern plays out step by step:
- Data remains inert. Content, attempts, submissions, and grades all live as records without formal links to learning objectives, evaluation criteria, or policy constraints. When AI reads them, it reverse-engineers intent from artifacts rather than executing an explicit plan.
- Each copilot builds its own memory. Sidebars cache prompts, store embeddings, and keep ad hoc logs that never reconcile. When a student asks the syllabus bot about late penalties and the grading bot re-scores the same work under a different rubric, support cannot explain the discrepancy because there is no shared trace.
- Costs drift and visibility fades. Token usage is invisible, model selection is hard-coded, and retries are silent. The finance team sees a higher bill, but the product cannot attribute spend to workloads or users, and it cannot throttle without blunt-force feature flags.
- Governance bolts on too late. Privacy terms, FERPA configurations, and jurisdictional routing sit in a policy page, not in the primitives. When a new campus or program arrives with different constraints, every helper needs a new mode.
- Pedagogical change stalls. AI remains an assistant to legacy workflows — drafting items, summarizing threads — so instructors and programs never see the payoff of objective-linked sequencing, mastery inferences, or targeted interventions driven by a shared substrate.
Lesson — Audit your AI features for signs of being 'additive' by checking if they share a common, auditable trace.
If each new AI helper builds its own separate memory and log, you are creating disconnected systems that cannot be reconciled. A syllabus bot should not have a different understanding of policy than a grading bot. When support cannot explain a discrepancy between two AI assistants, it’s a clear signal that data is inert and governance is not built into the core.
> Additive and peripheral AI repaint the facade and leave the wiring untouched.
The consequence is not cosmetic. It is strategic stagnation disguised as progress. An LMS rebuilt this way inherits yesterday’s bottlenecks with tomorrow’s bills. The real move is to refactor the primitives and let AI operate where the system makes meaning, not where it takes notes.
2. What "AI at the core" actually means architecturally
“AI at the core” is not a feature list. It is a small set of platform decisions that change how everything else is built and governed. Five elements define it:
- Orchestration substrate, not a feature. The platform ships a durable, auditable workflow layer that composes models, tools, policies, and humans into sequences with state. Prompts, contexts, tool calls, and outcomes are first-class events. You do not call a model; you run a named, versioned workflow.
- Agentic workflows as first-class objects in the data model. Agents — tutoring, evaluation, moderation, curriculum alignment — exist alongside course, person, and artifact. Their goals, capabilities, tools, and policies are explicit, addressable, and versioned. They are not widgets; they are objects that hold obligations.
- Model routing and token economics as platform infrastructure. The system decides which model to use for which step, under what constraints, at what price. Routing is policy-based and dynamic. Budgets, quotas, and cost centers are part of the runtime, not a spreadsheet after the fact.
- Evals, traces, and governance baked into the primitives — not bolted on later. Every agent run emits a structured trace; every workflow is linked to eval suites; policy scopes (privacy, residency, accessibility) are enforced at execution time. Audit is not a mode; it is the default.
- Course, assessment, gradebook, and discussion are AI-native objects, not rows in a CRUD table with a copilot pointed at them. The objects carry semantics (objectives, rubrics, policies) and contracts (what agents may do with them) that make AI’s actions accountable.
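To make the routing element concrete, here is a minimal sketch, assuming a tiny in-memory catalog. The model names, regions, prices, and the `route` function are all hypothetical; the point is only that routing is a policy lookup, not a hard-coded model name.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelOption:
    name: str                 # hypothetical model identifier
    region: str               # where inference runs
    quality_tier: int         # higher means more capable
    usd_per_1k_tokens: float  # illustrative price, not a real quote

@dataclass(frozen=True)
class StepPolicy:
    allowed_regions: frozenset
    min_quality_tier: int
    max_usd_per_1k_tokens: float

def route(options, policy):
    """Return the cheapest option that satisfies the policy, or None."""
    eligible = [
        o for o in options
        if o.region in policy.allowed_regions
        and o.quality_tier >= policy.min_quality_tier
        and o.usd_per_1k_tokens <= policy.max_usd_per_1k_tokens
    ]
    return min(eligible, key=lambda o: o.usd_per_1k_tokens, default=None)

CATALOG = [
    ModelOption("small-eu", "eu", 1, 0.10),
    ModelOption("large-eu", "eu", 3, 0.90),
    ModelOption("large-us", "us", 3, 0.60),
]

# Two workflow steps with different constraints, both pinned to EU residency.
formative_feedback = StepPolicy(frozenset({"eu"}), 1, 0.50)
summative_rationale = StepPolicy(frozenset({"eu"}), 3, 1.00)

print(route(CATALOG, formative_feedback).name)   # small-eu
print(route(CATALOG, summative_rationale).name)  # large-eu
```

Because the policy is data, changing the residency rule or the price ceiling changes the route without touching the workflow code.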
Lesson — Treat AI not as a feature you call, but as an orchestration layer executing named, versioned workflows.
Instead of directly calling a model for a task, you should run a durable workflow that composes models, policies, and tools into a sequence with state. This makes every AI-driven action observable and repeatable, because prompts, tool calls, and outcomes are logged as first-class events. This approach ensures that every agent run emits a structured trace and that policies are enforced at execution time, making audit the default state.
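As a rough illustration of "run a named, versioned workflow" rather than "call a model," the sketch below wraps a stand-in model call in a run object that records every step as an event. `WorkflowRun`, `stub_model`, `ALLOWED_SCOPES`, and the event fields are my assumptions for illustration, not a real product API.

```python
import json
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

ALLOWED_SCOPES = {"ferpa-default"}  # hypothetical policy scope names

@dataclass
class WorkflowRun:
    workflow: str
    version: str
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    events: list = field(default_factory=list)

    def emit(self, kind, **payload):
        """Append one structured trace event to this run."""
        self.events.append({
            "run_id": self.run_id,
            "workflow": f"{self.workflow}@{self.version}",
            "seq": len(self.events),
            "kind": kind,
            "at": datetime.now(timezone.utc).isoformat(),
            "payload": payload,
        })

def stub_model(prompt):
    # Stand-in for a real model call; returns a canned string.
    return f"Draft feedback for: {prompt}"

def formative_feedback(run, submission, policy_scope):
    # Policy is checked at execution time; a refusal is logged under its policy name.
    if policy_scope not in ALLOWED_SCOPES:
        run.emit("refusal", policy=policy_scope)
        return None
    run.emit("policy_check", policy=policy_scope)
    run.emit("prompt", text=submission)
    output = stub_model(submission)
    run.emit("outcome", text=output)
    return output

run = WorkflowRun("formative-feedback", "1.2.0")
formative_feedback(run, "Essay on photosynthesis", "ferpa-default")
print(json.dumps(run.events, indent=2))
```

Every event carries the workflow name and version, so two runs of different versions can be compared after the fact.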
> The core is orchestration plus governance; everything else is a workload on top.
This has concrete operational consequences:
- Versioning becomes routine. A course owns versions of its objectives, its tutoring agents, and its assessment policies. Changes are staged, compared with evals, and released like software.
- Observability shifts left. Traces, cost, and latency show up per workflow run, per student, per course. Product learns which interventions pay off and which are noise, because the signals are bound to named outcomes.
- Safety is tractable. Guardrails run as policies and tools inside the workflow, not as brittle pre/post filters. When an agent refuses, it logs why under a policy name, not a generic “content safety” bucket.
- Integration simplifies. External tools expose agent-compatible endpoints, and the substrate orchestrates them with model calls. The contract is behavioral (what can you do?) rather than just structural (what fields do you send?).
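"Released like software" implies a gate. The sketch below is one possible shape, assuming toy workflows and a three-case eval suite. The function names, the cases, and the 0.9 threshold are illustrative assumptions.

```python
def run_eval_suite(workflow_fn, cases):
    """Return the fraction of cases whose output passes the case's check."""
    passed = sum(1 for case in cases if case["check"](workflow_fn(case["input"])))
    return passed / len(cases)

def release_gate(candidate_fn, baseline_fn, cases, min_pass_rate=0.9):
    """Approve the candidate only if it clears the bar and does not regress."""
    candidate = run_eval_suite(candidate_fn, cases)
    baseline = run_eval_suite(baseline_fn, cases)
    return {
        "candidate_pass_rate": candidate,
        "baseline_pass_rate": baseline,
        "approved": candidate >= min_pass_rate and candidate >= baseline,
    }

def mentions(word):
    # Build a check that passes when the feedback names the rubric criterion.
    return lambda output: word in output.lower()

CASES = [
    {"input": {"criterion": "thesis"}, "check": mentions("thesis")},
    {"input": {"criterion": "evidence"}, "check": mentions("evidence")},
    {"input": {"criterion": "citations"}, "check": mentions("citations")},
]

def feedback_v1(inp):
    # Current release: generic feedback that ignores the criterion.
    return "Good work overall."

def feedback_v2(inp):
    # Staged change: feedback anchored to the criterion under evaluation.
    return f"On {inp['criterion']}: tighten the link to the rubric descriptor."

result = release_gate(feedback_v2, feedback_v1, CASES)
print(result)
```

Real eval suites would score far richer behavior, but the release decision can stay this simple: a staged version ships only when the numbers say so.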
3. The course as an AI-native object (not a folder of files)
A course cannot remain a container of content and roles if AI is to operate well. As an AI-native object, a course carries four kinds of structure that the substrate can reason about:
1) Objectives graph. Learning objectives are nodes with typed relations — prerequisites, equivalences, refinements. They are linked to evidence types (what constitutes progress) and to policies (what is allowed or required when interacting with them). The graph is versioned per term or program variant.
2) Narrative and assets. Units, activities, readings, media, and external tools are referenced as resources mapped to objectives, with affordances (read-only, practice, reflection) and constraints (time windows, proctoring requirements). The narrative is not a folder; it is a plan tied to outcomes.
3) Agent contracts. The course declares which agents may operate and with what scope: tutoring agents per objective cluster; alignment agents that propose changes; moderation agents in discussions; accessibility agents that transform materials under documented rules. Each agent has a purpose, tool permissions, scoped data-access grants, and eval gates.
4) Cohort context. The course holds the population parameters that matter — program, language mix, accommodation profiles — and passes them to agents declaratively. Cohort context is not inferred from usage; it is set and referenced by policy so behavior is predictable.
> A course needs a brain and a map, not just a binder.
Operationally, this re-centers the LMS around intent and evidence rather than uploads and due dates. When an instructor tweaks an objective, the platform can forecast impact: which activities lose alignment, which assessments must be re-evaluated, and which tutoring prompts should change. When a new campus teaches the same course in a different regulatory context, the course object carries policies that route data and agents accordingly.
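Impact forecasting of this kind can be sketched as a graph walk. The example assumes a hand-written prerequisite map and alignment table; the objective IDs, resource names, and `forecast_impact` are hypothetical.

```python
from collections import deque

# objective -> objectives that list it as a prerequisite (hypothetical IDs)
DEPENDENTS = {
    "OBJ-1": ["OBJ-2", "OBJ-3"],
    "OBJ-2": ["OBJ-4"],
    "OBJ-3": [],
    "OBJ-4": [],
}

# resource -> objectives it is mapped to
ALIGNMENT = {
    "quiz-a": ["OBJ-2"],
    "lab-b": ["OBJ-4"],
    "reading-c": ["OBJ-3"],
    "essay-d": ["OBJ-5"],
}

def affected_objectives(changed):
    """Breadth-first walk from the changed objective through its dependents."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in DEPENDENTS.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

def forecast_impact(changed):
    """List the objectives and resources that need review after a change."""
    objectives = affected_objectives(changed)
    resources = sorted(
        name for name, mapped in ALIGNMENT.items() if objectives & set(mapped)
    )
    return {"objectives": sorted(objectives), "resources_to_review": resources}

print(forecast_impact("OBJ-2"))
# {'objectives': ['OBJ-2', 'OBJ-4'], 'resources_to_review': ['lab-b', 'quiz-a']}
```

The forecast is only possible because alignment is stored explicitly; a folder of files has nothing to walk.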
A note on interoperability: none of this rejects existing standards. SCORM remains for legacy content; LTI (1.3/Advantage) continues to launch tools; Caliper/xAPI streams remain useful. The change is that the course object owns the semantics explicitly, and external tools integrate at the level of objectives and evidence, not just launches and grades. A tool can register that it supplies practice evidence for Objective X under Policy Y and that its outputs will be evaluated by Agent Z. The orchestration substrate then composes these capabilities for the student’s path.
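The registration described here might look like the sketch below. This is not part of LTI, Caliper, or xAPI; `CapabilityDeclaration` and `CourseRegistry` are hypothetical names for a layer that would sit alongside those standards.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CapabilityDeclaration:
    tool_id: str
    objective_id: str     # Objective X
    evidence_type: str    # e.g. "practice"
    policy_id: str        # Policy Y
    evaluator_agent: str  # Agent Z

class CourseRegistry:
    """Holds what the course knows and rejects declarations that reference anything else."""

    def __init__(self, objectives, policies, agents):
        self.objectives = set(objectives)
        self.policies = set(policies)
        self.agents = set(agents)
        self.declarations = []

    def register(self, decl):
        unknown = []
        if decl.objective_id not in self.objectives:
            unknown.append(decl.objective_id)
        if decl.policy_id not in self.policies:
            unknown.append(decl.policy_id)
        if decl.evaluator_agent not in self.agents:
            unknown.append(decl.evaluator_agent)
        if unknown:
            raise ValueError("unknown references: " + ", ".join(unknown))
        self.declarations.append(decl)

    def suppliers(self, objective_id, evidence_type):
        """Tools that declared this kind of evidence for this objective."""
        return [
            d.tool_id for d in self.declarations
            if d.objective_id == objective_id and d.evidence_type == evidence_type
        ]

registry = CourseRegistry({"OBJ-X"}, {"POLICY-Y"}, {"AGENT-Z"})
registry.register(
    CapabilityDeclaration("sim-lab", "OBJ-X", "practice", "POLICY-Y", "AGENT-Z")
)
print(registry.suppliers("OBJ-X", "practice"))  # ['sim-lab']
```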
4. Assessment as continuous inference (not a quiz table)
Assessment in a CRUD LMS is a table of items, attempts, and scores. In an AI-native LMS, assessment is an inference process running continuously over evidence streams with a formal relationship to objectives and rubrics. The shift has five parts:
- Evidence model. Submissions, interactions, reflections, peer reviews, simulations, and tool traces are typed evidence with provenance and integrity metadata. Evidence is linked to objectives by design, not post hoc keywords. Provenance includes authorship signals and tool use declarations where applicable.
- Rubric-as-code. Criteria and performance levels live as structured objects with machine-readable descriptors and examples. Agents use rubric objects to propose preliminary evaluations, request more evidence, or generate targeted feedback. Rubrics are versioned and testable by eval suites before they grade real work.
- Observation and request loop. Instead of one-shot scoring, the system observes evidence, forms a tentative belief about mastery, and, if uncertain, asks for the next best piece of evidence. This is lightweight and bounded by policy so it does not become a maze.
- Dual-channel scoring. Model-evaluated scores and human-evaluated scores coexist but are never conflated. The platform knows which is which, when the model may auto-apply, and when a human must adjudicate. Model scores carry uncertainty and rationale traces.
- Integrity as part of inference. An integrity agent checks for required disclosures, detects anomalies consistent with policy violations, and routes to human review when thresholds are met. The agent runs within the same governance substrate.
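The observation and request loop can be illustrated with a deliberately simple belief model. I am assuming a Beta-Bernoulli update over correct/incorrect evidence here; that is my stand-in for illustration, not a claim about how a production system should model mastery. The thresholds and function names are hypothetical.

```python
def update(alpha, beta, correct):
    """Beta-Bernoulli update: count one more success or one more failure."""
    return (alpha + 1, beta) if correct else (alpha, beta + 1)

def mean(alpha, beta):
    return alpha / (alpha + beta)

def variance(alpha, beta):
    n = alpha + beta
    return alpha * beta / (n * n * (n + 1))

def next_action(alpha, beta, requests_made, max_requests=3, max_variance=0.02):
    """Report when certain enough; otherwise ask again, within a policy bound."""
    if variance(alpha, beta) <= max_variance:
        return "report_belief"
    if requests_made >= max_requests:
        return "escalate_to_human"
    return "request_more_evidence"

def assess(evidence, max_requests=3):
    alpha, beta = 1, 1  # uniform prior: no belief about mastery yet
    requests = 0
    for correct in evidence:
        alpha, beta = update(alpha, beta, correct)
        action = next_action(alpha, beta, requests, max_requests)
        if action != "request_more_evidence":
            return action, mean(alpha, beta)
        requests += 1
    return "request_more_evidence", mean(alpha, beta)

print(assess([True] * 6))                  # consistent evidence: a belief is reported
print(assess([True, False, True, False]))  # mixed evidence: the request bound is hit
```

The `max_requests` bound is what keeps the loop from becoming a maze: when evidence stays ambiguous, the system hands off instead of asking forever.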
Lesson — Define assessment rubrics as structured, machine-readable code, not just as descriptive text.
Translate evaluation criteria and performance levels into structured objects with examples that an AI agent can use. This allows agents to propose evaluations, generate feedback, or request more evidence based on a clear, versioned contract. Treating rubrics as code means they can be tested with evaluation suites before they are used on real student work, improving consistency and fairness.
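A minimal sketch of a rubric as a structured, versioned object, assuming frozen dataclasses and a small `validate` step that could run before the rubric touches real work. The field names and the checks are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Level:
    points: int
    descriptor: str
    example: str

@dataclass(frozen=True)
class Criterion:
    criterion_id: str
    objective_id: str
    levels: tuple

@dataclass(frozen=True)
class Rubric:
    rubric_id: str
    version: str
    criteria: tuple

    def validate(self):
        """Return a list of problems; an empty list means the rubric is usable."""
        problems = []
        for c in self.criteria:
            points = [level.points for level in c.levels]
            if len(points) < 2:
                problems.append(f"{c.criterion_id}: needs at least two levels")
            if points != sorted(set(points)):
                problems.append(f"{c.criterion_id}: points must strictly increase")
            if any(not level.example for level in c.levels):
                problems.append(f"{c.criterion_id}: every level needs an example")
        return problems

    def max_score(self):
        return sum(max(level.points for level in c.levels) for c in self.criteria)

thesis = Criterion("thesis", "OBJ-1", (
    Level(1, "Claim is missing or unclear", "This essay is about climate."),
    Level(2, "Claim is stated but not arguable", "Climate change exists."),
    Level(3, "Claim is specific and arguable", "Carbon pricing beats subsidies because..."),
))
rubric = Rubric("essay-rubric", "2.0.0", (thesis,))

print(rubric.validate())   # []
print(rubric.max_score())  # 3
```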
> Grading becomes a governed inference pipeline, not a one-time event.
This reframing does not require high-stakes automation. It enables precise low-stakes automation and high-signal triage. For example, a model-evaluated channel may auto-apply with confidence above a threshold and issue feedback immediately for formative tasks. In summative settings, it may produce a recommendation with a structured rationale that a human approves. Because traces and rationales are first-class, appeals have evidence, not anecdotes.
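The formative versus summative split can be sketched as a small decision function. The 0.9 threshold, the status strings, and the record shape are assumptions for illustration; the property that matters is that model and human values sit in separate fields.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelScore:
    value: float
    confidence: float  # model-reported confidence in [0, 1]
    rationale: str

@dataclass
class ScoreRecord:
    model: ModelScore
    human_value: Optional[float] = None
    status: str = "pending_human_review"

    @property
    def effective(self):
        """The score that counts: human first, model only if auto-applied."""
        if self.human_value is not None:
            return self.human_value
        if self.status == "auto_applied":
            return self.model.value
        return None

def decide(model, stakes, auto_apply_threshold=0.9):
    # Only formative work may auto-apply, and only above the threshold.
    if stakes == "formative" and model.confidence >= auto_apply_threshold:
        return ScoreRecord(model=model, status="auto_applied")
    return ScoreRecord(model=model)

def adjudicate(record, human_value):
    # The human channel is stored separately; the model score is kept, not overwritten.
    record.human_value = human_value
    record.status = "human_adjudicated"
    return record

score = ModelScore(0.82, 0.95, "Meets level 3 on thesis; level 2 on evidence.")
formative = decide(score, "formative")
summative = decide(score, "summative")
print(formative.status, formative.effective)  # auto_applied 0.82
print(summative.status, summative.effective)  # pending_human_review None
adjudicate(summative, 0.85)
print(summative.status, summative.effective)  # human_adjudicated 0.85
```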
5. Tutoring and communication as orchestrated agents tied to learning objectives
In legacy systems, “tutoring” means help centers and perhaps a forum. In AI-additive systems, it means a general chatbot pointed at the syllabus. In an AI-native system, tutoring is an orchestrated set of agents with explicit goals tied to the objective graph.
Design the tutoring layer with five properties:
- Objective-scoped purpose. Each tutoring agent is bound to a subset of objectives and knows the rubric-as-code definitions for those objectives. This prevents generic advice and anchors explanations in the same criteria used for assessment.
- Conversation as curriculum. Dialog is not free-form memory; it is a path through teachable micro-tasks, explanations, and checks aligned to the objective graph. The agent proposes a new task or a worked example within an inspectable plan.
- Tool-aware but tool-constrained. If an agent can draw, calculate, browse, or simulate, each tool permission is declared and logged. The tool call sits in the trace, bound to the objective and to policy constraints.
- Accessibility and language as first-class behaviors. The agent natively supports multiple languages and accommodations because these are policy-scoped behaviors in the substrate, not bolt-ons. The same objective can be explained through text, audio description, or alternative representations.
- Handoff and escalation as designed states. The agent can escalate to a human tutor or instructor with a structured brief: what was attempted, what evidence exists, what misunderstanding persists. Escalation is not a failure; it is a feature that protects trust.
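The "tool-aware but tool-constrained" property is the easiest of these to show in code. The sketch assumes a hypothetical `TutoringAgent` with two stand-in tools; every call, allowed or not, lands in the trace with its objective.

```python
import math

# Stand-in tool implementations; "browse" is a stub that fetches nothing.
TOOL_IMPLS = {
    "sqrt": math.sqrt,
    "browse": lambda url: f"(stub) fetched {url}",
}

class ToolNotPermitted(Exception):
    pass

class TutoringAgent:
    def __init__(self, objective_ids, permitted_tools):
        self.objective_ids = frozenset(objective_ids)
        self.permitted_tools = frozenset(permitted_tools)
        self.trace = []

    def call_tool(self, tool, objective_id, *args):
        """Run a declared tool for an in-scope objective; log the attempt either way."""
        allowed = tool in self.permitted_tools and objective_id in self.objective_ids
        self.trace.append(
            {"tool": tool, "objective": objective_id, "args": args, "allowed": allowed}
        )
        if not allowed:
            raise ToolNotPermitted(f"{tool} is not permitted for {objective_id}")
        return TOOL_IMPLS[tool](*args)

agent = TutoringAgent(objective_ids={"OBJ-7"}, permitted_tools={"sqrt"})
print(agent.call_tool("sqrt", "OBJ-7", 16.0))  # 4.0

try:
    agent.call_tool("browse", "OBJ-7", "https://example.com")
except ToolNotPermitted as err:
    print("refused:", err)

print(len(agent.trace))  # 2: both the allowed and the refused call are logged
```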
> A tutor is a set of governed behaviors, not a personality with a prompt.
Communication across the course — announcements, discussions, feedback — follows the same pattern. Discussion becomes an AI-native object where moderation, summarization, and nudge agents operate under policy and objective scope. For example, a moderation agent flags off-topic content with transparent policy labels; a summarization agent produces weekly digests tied to objectives discussed. None of these act as free agents; each runs as a workflow with traces and evals attached.
6. Gradebook and analytics as model-evaluated evidence with human-in-the-loop governance
The gradebook in a legacy LMS is a matrix of columns and weights. In an AI-native system, the gradebook becomes an evidence ledger governed by policy, where scores have provenance, uncertainty, and versioned logic.
Key characteristics:
- Lineage on every entry. Each cell links to the evidence used, the rubric version, the agent workflow version, and any human actions taken. A score is not a number; it is a record with a trail.
- Human-in-the-loop as a first-class step. Instructors can accept, modify, or reject model recommendations with structured reasons. Those decisions flow back as labeled data for evals and future optimization.
- Policy-aware rollups. Program and campus policies (e.g., drops, late penalties) are executable objects that compute rollups. Changing policy does not rewrite history; it produces a new computed view with explicit versioning and comparisons.
- Reliability surfaces where it matters. The gradebook surfaces confidence bands and cross-checks for critical decisions. It does not flood the UI with telemetry; it places reliability where action occurs.
- Analytics as questions, not dashboards. The analytics layer runs query agents that answer operational questions with context and constraints: Which objectives produce the most rework? Where does the tutoring agent escalate most?
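Policy-aware rollups are worth a small worked example. The sketch assumes scores on a 0 to 1 scale and two hypothetical policy versions; the stored entries never change, only the computed view does.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RollupPolicy:
    version: str
    late_penalty_per_day: float  # fraction of the score deducted per day late
    drop_lowest: int             # how many of the lowest scores to drop

ENTRIES = [
    {"score": 0.90, "days_late": 0},
    {"score": 0.80, "days_late": 2},
    {"score": 0.50, "days_late": 0},
]

def rollup(entries, policy):
    """Compute a view under one policy version without mutating the entries."""
    adjusted = sorted(
        max(0.0, e["score"] - policy.late_penalty_per_day * e["days_late"])
        for e in entries
    )
    kept = adjusted[policy.drop_lowest:]
    if not kept:
        raise ValueError("policy drops every entry")
    return {"policy_version": policy.version, "average": round(sum(kept) / len(kept), 4)}

def compare(entries, old, new):
    before, after = rollup(entries, old), rollup(entries, new)
    delta = round(after["average"] - before["average"], 4)
    return {"before": before, "after": after, "delta": delta}

v1 = RollupPolicy("2026.1", late_penalty_per_day=0.05, drop_lowest=0)
v2 = RollupPolicy("2026.2", late_penalty_per_day=0.10, drop_lowest=1)
print(compare(ENTRIES, v1, v2))
```

Under these numbers the first policy yields an average of 0.70 and the second 0.75, and both views remain available side by side with their version labels.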
Lesson — Ensure every gradebook entry includes complete lineage linking it to evidence, rubrics, and the judging agent.
Treat a score not as a number but as a record with a clear audit trail. Each entry must link to the specific evidence submitted, the version of the rubric used for evaluation, and the version of the agent workflow that produced the score. This provides transparency and allows human instructors to adjudicate decisions with full context, feeding their actions back into the system as labeled data for improvement.
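A gradebook entry with lineage might be sketched as an immutable record plus a completeness check. The field names and the `lineage_gaps` and `record_human_action` helpers are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class GradeEntry:
    student_id: str
    objective_id: str
    score: float
    evidence_ids: tuple
    rubric_version: str
    workflow_version: str
    human_actions: tuple = ()

def lineage_gaps(entry):
    """Name the lineage links that are missing; empty means fully traceable."""
    gaps = []
    if not entry.evidence_ids:
        gaps.append("evidence_ids")
    if not entry.rubric_version:
        gaps.append("rubric_version")
    if not entry.workflow_version:
        gaps.append("workflow_version")
    return gaps

def record_human_action(entry, actor, action, reason, new_score=None):
    """Return a new entry with the human step appended; the original is untouched."""
    step = {"actor": actor, "action": action, "reason": reason}
    return replace(
        entry,
        score=entry.score if new_score is None else new_score,
        human_actions=entry.human_actions + (step,),
    )

entry = GradeEntry(
    student_id="s-001",
    objective_id="OBJ-1",
    score=0.82,
    evidence_ids=("sub-17", "reflection-4"),
    rubric_version="essay-rubric@2.0.0",
    workflow_version="summative-eval@1.4.1",
)
print(lineage_gaps(entry))  # []

revised = record_human_action(
    entry, "instructor-9", "modify", "Evidence criterion under-scored", new_score=0.88
)
print(revised.score, len(revised.human_actions))  # 0.88 1
print(entry.score)                                # 0.82, the original record is preserved
```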
> A grade is a governed claim about evidence, not a cell in a spreadsheet.
Governance then has teeth. Accreditation reviews can request eval records for assessment agents, policy versions active during a term, and anonymized traces showing how integrity checks ran. Program committees can test a proposed rubric change on historical evidence using frozen model versions before adopting it. Instructors retain control over consequential decisions without having to manually re-score everything the machine touched.
7. Why additive/peripheral AI fails — governance, cost, trust, pedagogical fit
It helps to make the failure modes explicit so buyers and builders can avoid them.
Governance. In bolt-on systems, governance appears as a permission matrix. That is insufficient for AI workloads. Without execution-time policy enforcement, traces, and evals, you cannot answer who did what, under which rule, with what evidence. A privacy checkbox cannot substitute for residency-aware routing and immutable logs. When a campus asks for proof, bolt-ons have screenshots; substrates have records.
Cost. Token spend and model calls are a new class of variable infrastructure cost. Additive AI treats cost as a byproduct and makes routing a hard-coded decision. The result is over-provisioned models and unpredictable bills. An AI-native LMS exposes cost centers, budgets, throttles, and routing strategies so operators can shape spend without gutting features.
Trust. Users extend trust when systems are predictable and accountable. Sidebars that are sometimes brilliant and sometimes off-base erode trust quickly. Instructors will not delegate meaningful work to assistants they cannot audit. Substrates earn trust through explainable workflows, consistent policy enforcement, and designed escalation.
Pedagogical fit. The enduring promise of an LMS is to help people learn with intent. Additive AI props up legacy flows but does not connect explanation, practice, feedback, and evaluation in a coherent loop tied to objectives. When the course, assessment, and tutoring layers share objectives, rubrics, and traces, pedagogy advances because the platform speaks the language instructors use to design learning.
> Bolt-ons cannot meet institutional obligations or pedagogical ambitions.
8. What this means for buyers in 2026 — procurement, RFPs, data rights, model routing
Procurement must evolve to test for substrate, not surface. In 2026, RFPs and due diligence should include requirements and proofs that separate AI-native systems from additive ones. A practical checklist:
- Data ownership and portability. Require explicit statements that institutions own course objects, objective graphs, rubrics, agent definitions, traces, and eval artifacts. Demand export formats for all of the above.
- Privacy, residency, and jurisdictional routing. Ask how the platform enforces data residency and access rules at execution time. Require a description of policy scopes and routing constraints.
- Evals and traces. Require demonstration of eval suites tied to key agent workflows (tutoring, assessment) and the ability to run those evals against historical traces.
- Model routing and neutrality. Require a model-agnostic substrate with dynamic routing based on policy and cost. Test that routing can change without code deploys.
- Token economics and spend controls. Demand visibility into token usage per program, course, and workload. Require quota mechanisms, circuit breakers, and backpressure in the runtime.
- Human-in-the-loop governance. Require explicit human adjudication steps for consequential actions, with UI support and audit. Ask to see how a human decision is captured, versioned, and fed back into evals.
- Safety and integrity as workflows. Ask for the safety and integrity agents, their policies, and their eval records. Do not accept a generic “we filter prompts and completions.”
- Interoperability at the semantics layer. Beyond LTI launches, ask how external tools declare the objectives they serve, the evidence they produce, and the policies they honor.
- SLAs for inference. Latency and availability now apply to agent runs, not just page loads. Require SLAs for median and p95 inference latency on key workflows.
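When asking a vendor to demonstrate spend controls, it helps to know what the simplest version looks like. This is a minimal sketch, assuming per-cost-center token limits held in memory; `TokenBudget` and the cost center names are hypothetical, and a real runtime would add time windows and backpressure.

```python
class BudgetExceeded(Exception):
    pass

class TokenBudget:
    """Per-cost-center token quota that refuses work before overspending."""

    def __init__(self, limits):
        self.limits = dict(limits)  # cost center -> token limit
        self.used = {center: 0 for center in self.limits}

    def charge(self, cost_center, tokens):
        if cost_center not in self.limits:
            raise KeyError(f"unknown cost center: {cost_center}")
        if self.used[cost_center] + tokens > self.limits[cost_center]:
            # Circuit breaker: fail this call and leave the recorded usage unchanged.
            raise BudgetExceeded(f"{cost_center} would exceed its limit")
        self.used[cost_center] += tokens

    def report(self):
        return {
            center: {"used": self.used[center], "remaining": self.limits[center] - self.used[center]}
            for center in self.limits
        }

budget = TokenBudget({"nursing-program": 10_000, "intro-bio-101": 2_000})
budget.charge("intro-bio-101", 1_500)

try:
    budget.charge("intro-bio-101", 800)
except BudgetExceeded as err:
    print("throttled:", err)

print(budget.report())
```

The report answers the attribution question directly: spend is visible per program and per course, not only on the invoice.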
Lesson — Demand substrate exports, not just content exports, in your vendor contracts.
Ensure your institution owns not only its content but also the core intellectual property of its digital campus: the course objects, objective graphs, rubrics, and agent definitions. Your exit clause must guarantee you can export these structures in an open format. This is the only way to ensure true portability and avoid being locked into a vendor's black-box AI ecosystem.
> If you cannot export it, route it, eval it, or govern it, you do not own it.
Also update commercial terms:
- Exit clauses that guarantee substrate exports in open formats if you leave.
- Security representations that cover model providers and tool integrations, not just the LMS vendor.
- Compliance mappings that speak to FERPA in terms of execution-time behavior, not only at-rest data controls.
- Accessibility commitments that cover agent behavior and generated content, with WCAG conformance extended to AI outputs.
An AI-native LMS is not just a procurement of software; it is a governance shift. Program chairs and assessment committees should co-own rubric-as-code, objective graphs, and eval suites. Vendor selection that centers operations and leaves academics downstream will miss the point.
9. The operator stance: build the substrate, not the sidebar
Operators — vendors and campus teams alike — have a choice to make. Keep shipping helpers that sit on the edge, or do the slower, more fundamental work to refactor the primitives and stand up the substrate. The latter wins because it compounds. Each new capability is a workflow that plugs into the same objectives, rubrics, policies, and traces.
What does building the substrate look like quarter by quarter?
- Quarter 1: Stand up the orchestration runtime with tracing, policy scopes, and budget controls. Wrap one high-value workflow (e.g., formative feedback generation under rubric-as-code) and ship it with human-in-the-loop review. Build the eval suite that gates releases of this workflow.
- Quarter 2: Promote “course as an object” with objectives, rubrics, and agent contracts. Convert two courses end-to-end. Establish export formats for course objects and traces.
- Quarter 3: Introduce tutoring agents scoped to a subset of objectives. Instrument escalation to instructors and measure time-to-insight. Add integrity as a workflow with transparent policy applications.
- Quarter 4: Extend to assessment inference with dual-channel scoring and visible provenance. Roll out policy-aware rollups in the gradebook.
At each stage, reduce scope rather than compromise the architecture. Ship fewer workflows with deeper governance and observability. Resist pressure to “match the market” with edge copilots that dilute the substrate. Buyers will feel the difference within a term — not because the UI looks flashier, but because the system explains itself and improves.
> The substrate is the product.
Return to the grid metaphor to close the loop. Cities that grow on extension cords burn out; cities that invest in their grid adapt to new industries and demands without catastrophe. The LMS market is at that fork. Additive AI gives the impression of motion without movement. AI at the core redefines the objects, workflows, and governance so that learning systems can evolve with the tools of this decade rather than the tables of the last.
This is what I would build now: a platform where the course has a brain and a map, assessment is continuous inference under policy, tutoring is an orchestrated behavior aligned to objectives, and the gradebook asserts claims with evidence and lineage. An AI-native LMS is not a collection of smart sidebars. It is a governed, observable, model-neutral substrate where agents, tools, policies, and people work on the same objects with the same understanding of what “good” looks like, and where the institution can prove it.