Internet-Draft                                              L. J. Reilly
Intended Status: Informational                               Independent
Expires: October 28, 2026                                 April 28, 2026


         Reilly Model Routing Protocol (RMRP): A Framework for
              Policy-Governed, Auditable AI Model Routing
                          draft-reilly-rmrp-00

Abstract

   This document specifies the Reilly Model Routing Protocol (RMRP), a
   framework for policy-governed, auditable routing of inference
   requests across heterogeneous artificial intelligence (AI) model
   environments. RMRP defines the structural metadata, routing policy
   declaration, execution semantics, audit trail requirements, and cost
   attribution mechanisms necessary to govern how inference requests
   are directed to AI models in multi-model deployments. The protocol
   is AI-provider agnostic and operates independently of any specific
   model architecture, inference runtime, vendor implementation, or
   transport layer. RMRP addresses the absence of a standardized
   protocol-layer specification governing how routing decisions are
   declared, transmitted, logged, and enforced across AI model
   deployments at organizational scale.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 28, 2026.

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Revised BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Motivation
     1.2.  Scope
     1.3.  Design Principles
   2.  Terminology
   3.  RMRP Architecture Overview
     3.1.  Architectural Layers
     3.2.  Component Roles
     3.3.  Request Lifecycle
   4.  Model Routing Decision (MRD)
     4.1.  MRD Structure
     4.2.  MRD Field Definitions
     4.3.  Task Classification
     4.4.  Model Tier Definitions
     4.5.  Complexity Scoring
     4.6.  MRD Example
   5.  Routing Policy Document (RPD)
     5.1.  RPD Structure
     5.2.  RPD Field Definitions
     5.3.  Rule Evaluation Order
     5.4.  Fallback Behavior
     5.5.  RPD Example
   6.  Routing Execution Semantics
     6.1.  Pre-Routing Validation
     6.2.  Policy Resolution
     6.3.  Model Selection
     6.4.  Request Dispatch
     6.5.  Response Handling
     6.6.  Error and Fallback Handling
   7.  Audit Trail Requirements
     7.1.  Audit Log Record (ALR) Structure
     7.2.  ALR Field Definitions
     7.3.  Audit Level Classes
     7.4.  Retention Requirements
     7.5.  ALR Example
   8.  Cost Attribution Framework
     8.1.  Cost Attribution Record (CAR)
     8.2.  CAR Field Definitions
     8.3.  Budget Authority Chain
     8.4.  Cost Ceiling Enforcement
     8.5.  CAR Example
   9.  Governance and Authorization
     9.1.  Policy Authority Model
     9.2.  Policy Issuance and Signing
     9.3.  Policy Versioning
     9.4.  Override Mechanisms
   10. Transport Considerations
     10.1.  HTTP Transport
     10.2.  Header Propagation
     10.3.  Non-HTTP Transports
   11. Security Considerations
     11.1.  Policy Integrity
     11.2.  MRD Tampering
     11.3.  Audit Log Integrity
     11.4.  Denial of Service
     11.5.  Credential Exposure
   12. Privacy Considerations
   13. IANA Considerations
   14. References
     14.1.  Normative References
     14.2.  Informative References
   Acknowledgments
   Author's Address

1.  Introduction

1.1.  Motivation

   The deployment of artificial intelligence systems at organizational
   scale increasingly involves multiple AI models operating in parallel
   or in sequence. A given application may call upon lightweight models
   for classification tasks, mid-tier models for summarization, and
   advanced models for complex multi-step reasoning, all within a
   single request pipeline. This operational pattern is referred to
   throughout this document as multi-model deployment.

   Despite the rapid proliferation of multi-model AI deployments, no
   standardized protocol exists that specifies how routing decisions
   between models should be declared, communicated, executed, audited,
   or governed. Current industry practice is characterized by ad hoc
   engineering decisions embedded in application code, vendor-specific
   gateway configurations, or informal internal policies with no
   interoperable representation.

   This absence of a protocol standard produces several systemic
   problems:

   o  Routing logic is opaque and non-portable across systems and
      vendors.

   o  Cost attribution for model usage cannot be reliably traced to the
      policy decision that produced the expenditure.

   o  Audit records, where they exist, are inconsistent, incomplete,
      and not interoperable.

   o  Governance over who may define, modify, or override routing
      policy is informal and unenforceable at the protocol layer.

   o  Failure modes, escalation paths, and fallback behavior are
      undefined in any portable specification.

   The Reilly Model Routing Protocol (RMRP) addresses each of these
   deficiencies by defining a provider-agnostic, transport-agnostic
   protocol framework for AI model routing governance.

1.2.  Scope

   This document specifies:

   o  The Model Routing Decision (MRD): a structured metadata object
      that accompanies every routed inference request, carrying the
      routing decision and its full governance context.

   o  The Routing Policy Document (RPD): a declarative specification
      that defines the rules by which routing decisions are made.

   o  Routing Execution Semantics: the normative procedures by which an
      RMRP-compliant router resolves, applies, and validates routing
      policy against an incoming request.

   o  The Audit Log Record (ALR): a standardized record structure for
      capturing each routing event for compliance, debugging, and
      accountability purposes.

   o  The Cost Attribution Record (CAR): a structure that traces the
      financial cost of each routed request to the policy, cost center,
      and budget authority that authorized it.

   o  A Governance and Authorization model that defines how routing
      policy is issued, signed, versioned, and enforced.

   This document does not specify the internal architecture of any AI
   model, the machine learning logic used to assess request complexity,
   the commercial pricing structure of any AI provider, or any
   application-layer inference API. RMRP is a governance and metadata
   protocol layer that sits above any such systems.

1.3.  Design Principles

   RMRP is designed according to the following principles:

   Provider Agnosticism:
      RMRP MUST NOT assume the use of any specific AI model provider,
      vendor API, or proprietary infrastructure. All provider-specific
      identifiers are treated as opaque strings within the protocol.

   Transport Agnosticism:
      RMRP metadata structures are defined as JSON objects.
      They may be transmitted over HTTP, message queues, RPC
      frameworks, or any other transport that supports structured data
      payloads.

   Auditability by Default:
      Every routing decision produces an auditable record. Audit
      logging is not optional for conformant implementations.

   Policy as a First-Class Object:
      Routing policy is a declared, versioned, signed artifact. Inline
      or implicit routing logic embedded in application code does not
      satisfy RMRP conformance.

   Cost Traceability:
      Every routed request MUST be attributable to a cost center and
      budget authority. Unattributed inference costs are a conformance
      violation.

   Least-Cost Sufficiency:
      Routing policy SHOULD direct requests to the least capable model
      tier sufficient to satisfy the task requirements. Routing to a
      higher tier MUST be justified by policy conditions.

   Separation of Concerns:
      The routing decision layer is separate from the inference
      execution layer. An RMRP-compliant router makes and records a
      routing decision; it does not implement inference.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The following terms are defined for use in this document:

   AI Model:
      A software system that accepts a structured input (referred to
      herein as an inference request) and produces a structured output
      (referred to herein as an inference response) using a learned
      computational process. This definition is intentionally broad and
      encompasses large language models, multimodal models, embedding
      models, classification models, and other learned inference
      systems.

   Budget Authority:
      The organizational entity, role, or system identity that has been
      designated as responsible for approving inference expenditure
      within a defined cost center.
      Budget Authority is referenced by identifier within RMRP
      structures and resolved externally to the protocol.

   Complexity Score:
      A normalized floating-point value in the range [0.0, 1.0] that
      represents an assessment of the computational or semantic
      difficulty of an inference request relative to a defined task
      type. The method of computing the Complexity Score is outside the
      scope of this specification; RMRP defines only how this value is
      represented and used in routing decisions.

   Conformant Router:
      An RMRP Routing Engine that implements all REQUIRED behaviors
      specified in this document.

   Cost Attribution Record (CAR):
      A structured JSON object produced by a Conformant Router upon
      completion of a routed inference request, recording the financial
      cost of the request and attributing it to the applicable cost
      center, policy, and Budget Authority.

   Cost Center:
      An organizational unit, project, team, application, or other
      logical grouping to which the financial cost of inference
      requests is attributed.

   Fallback Model:
      The AI model or model tier to which a routing request is directed
      when the primary selected model is unavailable, returns an error,
      or exceeds a defined constraint.

   Inference Request:
      A structured input submitted to an AI model for processing. The
      content and format of the inference request are outside the scope
      of this specification.

   Inference Response:
      The structured output produced by an AI model in response to an
      Inference Request. The content and format of the inference
      response are outside the scope of this specification.

   Model Identifier:
      An opaque string that uniquely identifies a specific AI model or
      model version within the scope of a deployment. Model Identifiers
      are defined and managed externally to RMRP.

   Model Registry:
      An external system or configuration artifact that maps Model
      Identifiers to model capabilities, tier assignments, and endpoint
      information.
      RMRP does not specify the implementation of a Model Registry but
      requires that a Conformant Router have access to one.

   Model Routing Decision (MRD):
      A structured JSON object produced by a Conformant Router that
      records the routing decision made for a specific inference
      request, including the selected model, the policy applied, and
      the full governance context.

   Model Tier:
      A categorical classification of AI models according to their
      relative capability and cost. This specification defines three
      normative tiers: LIGHT, STANDARD, and ADVANCED. Implementations
      MAY define additional tiers subject to the constraints in
      Section 4.4.

   Policy Authority:
      The organizational entity or role that has been granted the right
      to issue, sign, and publish Routing Policy Documents within a
      defined scope.

   Routing Engine:
      The software component responsible for receiving an inference
      request, evaluating applicable Routing Policy Documents,
      producing a Model Routing Decision, and dispatching the request
      to the selected model.

   Routing Policy Document (RPD):
      A structured, versioned, signed JSON document that declares the
      rules by which a Routing Engine selects a target model for a
      given inference request.

   Task Type:
      A categorical label that describes the nature of an inference
      request at the application level. RMRP defines a normative set of
      Task Types in Section 4.3. Implementations MAY extend this set
      using the extension mechanism defined therein.

   Audit Log Record (ALR):
      A structured JSON object produced by a Conformant Router for each
      routing event, capturing the full decision trace, outcome, and
      timing of the routing operation.

   RMRP Version:
      The protocol version string identifying the version of this
      specification to which an MRD, RPD, ALR, or CAR conforms. The
      version string for this specification is "1.0".

3.  RMRP Architecture Overview

3.1.  Architectural Layers

   RMRP defines a governance layer that operates between the
   application layer and the AI model inference layer.
   This layer is not a transport protocol and does not replace any
   existing network or application protocol. It defines the metadata,
   policy, and audit structures that govern routing decisions.

   The RMRP architectural layers are as follows:

   +--------------------------------------+
   |          Application Layer           |
   |  (Caller submits inference request)  |
   +--------------------------------------+
                      |
                      v
   +--------------------------------------+
   |        RMRP Governance Layer         |
   |                                      |
   |   +------------------------------+   |
   |   |   Routing Policy Document    |   |
   |   |            (RPD)             |   |
   |   +------------------------------+   |
   |                                      |
   |   +------------------------------+   |
   |   |        Routing Engine        |   |
   |   |   - Evaluates RPD            |   |
   |   |   - Produces MRD             |   |
   |   |   - Writes ALR               |   |
   |   |   - Writes CAR               |   |
   |   +------------------------------+   |
   +--------------------------------------+
                      |
                      v
   +--------------------------------------+
   |       AI Model Inference Layer       |
   |  (Selected model processes request)  |
   +--------------------------------------+

                 Figure 1: RMRP Architectural Layers

3.2.  Component Roles

   Policy Authority:
      Issues, signs, and publishes RPDs. The Policy Authority is
      responsible for ensuring that RPDs reflect organizational cost
      governance, capability requirements, and compliance constraints.
      In deployments where organizational policy requires separation of
      concerns, the Policy Authority MUST NOT be the same entity as the
      Routing Engine. This specification does not otherwise mandate
      such separation.

   Routing Engine:
      Receives inference requests from the application layer, resolves
      the applicable RPD, computes a routing decision, produces an MRD,
      dispatches the request to the selected model, and writes an ALR
      and CAR upon completion. A Conformant Router MUST perform all of
      these functions.

   Model Registry:
      An external system consulted by the Routing Engine to resolve
      Model Identifiers to endpoints and validate tier assignments.
      RMRP does not specify the implementation of the Model Registry.

   Audit Store:
      The persistent storage system into which the Routing Engine
      writes ALRs and CARs. The Audit Store MUST be write-once or
      append-only with respect to routing records to preserve audit
      integrity. Implementations MAY use cryptographic mechanisms to
      further ensure record immutability.

   Budget Authority:
      Receives cost attribution data via CARs. Budget Authority systems
      are external to RMRP and consume CAR records for financial
      reporting and budget enforcement purposes.

3.3.  Request Lifecycle

   The lifecycle of an inference request under RMRP is as follows:

   1.   The application layer submits an inference request to the
        Routing Engine, optionally including request metadata such as
        Task Type, priority class, and cost center identifier.

   2.   The Routing Engine performs pre-routing validation as specified
        in Section 6.1.

   3.   The Routing Engine resolves the applicable RPD as specified in
        Section 6.2.

   4.   The Routing Engine computes a Complexity Score for the request,
        then evaluates RPD rules against the request metadata and that
        score to select a target model tier and Model Identifier.

   5.   The Routing Engine produces a Model Routing Decision (MRD)
        documenting the decision and its full governance context.

   6.   The Routing Engine dispatches the inference request to the
        selected model, attaching the MRD as specified in Section 6.4.

   7.   The selected model processes the request and returns an
        inference response.

   8.   The Routing Engine receives the response, records actual token
        consumption and latency, and updates the ALR and CAR with
        outcome data.

   9.   The Routing Engine writes the completed ALR and CAR to the
        Audit Store.

   10.  The inference response is returned to the application layer,
        accompanied by a reference to the MRD for correlation purposes.

4.  Model Routing Decision (MRD)

4.1.  MRD Structure

   The Model Routing Decision is a JSON object [RFC8259] that MUST be
   produced by a Conformant Router for every inference request
   processed.
   The MRD captures the routing decision and its full governance
   context in a portable, inspectable form.

   The MRD MUST contain all REQUIRED fields defined in Section 4.2.
   OPTIONAL fields SHOULD be included when the relevant information is
   available to the Routing Engine. Additional fields not defined in
   this specification MAY be included using the extension mechanism
   defined in Section 4.2.

   All field names are case-sensitive. All string values are UTF-8
   encoded [RFC3629]. All timestamp values are ISO 8601 formatted
   strings in UTC with millisecond precision.

4.2.  MRD Field Definitions

   rmrp_version (string, REQUIRED)
      The RMRP protocol version string. For documents conforming to
      this specification, the value MUST be "1.0".

   mrd_id (string, REQUIRED)
      A universally unique identifier for this MRD instance, formatted
      as a UUID [RFC9562]. This identifier is used to correlate the MRD
      with associated ALRs, CARs, and application logs.

   request_id (string, REQUIRED)
      An identifier for the inference request as assigned by the
      caller. If the caller does not supply a request identifier, the
      Routing Engine MUST generate one and return it to the caller.
      Format is implementation-defined but MUST be unique within the
      scope of the deployment.

   timestamp (string, REQUIRED)
      The UTC timestamp at which the Routing Engine produced this MRD.
      Format: ISO 8601 with millisecond precision. Example:
      "2026-04-28T17:00:00.000Z"

   routing_policy_id (string, REQUIRED)
      The unique identifier of the Routing Policy Document applied to
      produce this routing decision. This value MUST correspond to the
      "policy_id" field of the applicable RPD.

   routing_policy_version (string, REQUIRED)
      The version string of the RPD applied. This value MUST correspond
      to the "policy_version" field of the applicable RPD.

   source_system (string, REQUIRED)
      An identifier for the application or system component that
      submitted the inference request. Format is
      implementation-defined.
      This field is used for attribution, auditing, and cost
      allocation.

   task_type (string, REQUIRED)
      The Task Type classification of the inference request. MUST be
      one of the normative Task Type values defined in Section 4.3, or
      an extended value registered per the extension mechanism in
      Section 4.3.

   complexity_score (number, REQUIRED)
      A floating-point value in the range [0.0, 1.0] representing the
      assessed complexity of the inference request. The method of
      computation is outside the scope of this specification. A value
      of 0.0 represents the minimum assessed complexity for the given
      task type; a value of 1.0 represents the maximum.

   selected_model_id (string, REQUIRED)
      The Model Identifier of the AI model selected to process this
      inference request. This value MUST resolve to a registered model
      in the Model Registry.

   selected_model_tier (string, REQUIRED)
      The model tier assignment of the selected model. MUST be one of
      the normative tier values defined in Section 4.4.

   routing_rationale (string, REQUIRED)
      A human-readable description of the routing decision, identifying
      which RPD rule was matched and why the selected model tier was
      chosen. This field is intended for audit inspection and
      operational debugging.

   cost_center (string, REQUIRED)
      The identifier of the cost center to which the financial cost of
      this inference request is attributed. This value MUST correspond
      to a valid cost center identifier in the organization's Cost
      Attribution framework.

   budget_authority_id (string, REQUIRED)
      The identifier of the Budget Authority that approved inference
      expenditure for this cost center under the applicable RPD.

   max_token_budget (integer, REQUIRED)
      The maximum number of tokens (input plus output) authorized for
      this inference request under the applicable RPD rule. A value of
      -1 indicates no token ceiling is enforced by policy for this
      request.
      Routing Engines MUST NOT route requests where the estimated token
      consumption exceeds this value without triggering the fallback
      behavior defined in Section 6.6.

   priority_class (string, REQUIRED)
      The priority classification of the inference request. MUST be one
      of: "CRITICAL", "HIGH", "STANDARD", "BATCH". Priority class MAY
      influence model selection and queuing behavior. See Section 4.5
      for priority class semantics.

   fallback_model_id (string, OPTIONAL)
      The Model Identifier of the fallback model to be used if the
      selected model is unavailable or returns an error. If present,
      this value MUST resolve to a registered model in the Model
      Registry.

   fallback_model_tier (string, OPTIONAL)
      The model tier assignment of the fallback model, if specified.
      MUST be one of the normative tier values defined in Section 4.4
      if present.

   chain_id (string, OPTIONAL)
      An identifier linking this inference request to a broader
      multi-step request pipeline or agent chain. Used for correlating
      multiple MRDs produced within a single logical workflow.

   chain_step (integer, OPTIONAL)
      The ordinal position of this inference request within the chain
      identified by "chain_id". MUST be a non-negative integer. The
      first step in a chain is 0.

   estimated_input_tokens (integer, OPTIONAL)
      The estimated number of input tokens for this inference request,
      as assessed by the Routing Engine prior to dispatch.

   estimated_output_tokens (integer, OPTIONAL)
      The estimated number of output tokens for this inference request,
      as assessed by the Routing Engine prior to dispatch.

   audit_level (string, REQUIRED)
      The audit level class applied to this routing event. MUST be one
      of the normative audit level values defined in Section 7.3.

   extensions (object, OPTIONAL)
      A JSON object containing implementation-specific or
      deployment-specific fields not defined in this specification.
      Extension field names MUST use a reverse-DNS prefix to avoid
      collisions (e.g., "com.example.custom_field").
      The presence of extension fields MUST NOT alter the
      interpretation of any normative field defined in this
      specification.

4.3.  Task Classification

   RMRP defines the following normative Task Types. These values are
   case-sensitive and MUST be used verbatim in the "task_type" field of
   the MRD and in RPD rule conditions.

   CLASSIFICATION
      A request whose primary output is a categorical label or score
      applied to input content. Includes sentiment analysis, intent
      detection, content moderation, and similar tasks. Typically low
      complexity.

   EXTRACTION
      A request whose primary output is structured data extracted from
      unstructured input, including named entity recognition, key-value
      extraction, and table parsing.

   SUMMARIZATION
      A request whose primary output is a condensed representation of a
      larger input document or corpus.

   GENERATION
      A request whose primary output is novel content generated in
      response to a prompt, including text generation, code generation,
      and creative writing tasks.

   REASONING
      A request that requires multi-step logical inference,
      mathematical computation, or structured problem-solving.
      Typically high complexity.

   EMBEDDING
      A request whose primary output is a vector representation of the
      input content. Embedding requests SHOULD be routed to models
      optimized for embedding generation.

   RETRIEVAL
      A request that involves retrieval-augmented generation or
      query-driven document retrieval. Complexity is a function of
      retrieval corpus size and query ambiguity.

   TRANSFORMATION
      A request whose primary output is a transformed version of the
      input (e.g., translation, reformatting, normalization, or style
      transfer).

   AGENTIC
      A request submitted within an autonomous agent pipeline that may
      produce tool calls, multi-turn interactions, or sub-task
      decomposition. Agentic requests SHOULD be assigned higher
      complexity scores by default given their potential for recursive
      resource consumption.

   MULTIMODAL
      A request that includes non-text input modalities such as images,
      audio, or video, in addition to or in place of text input.

   Implementations MAY define additional Task Types using the
   "extensions" mechanism. Extended Task Type values MUST use a
   reverse-DNS prefix (e.g., "com.example.CUSTOM_TASK"). RPD rules that
   reference extended Task Types MUST be ignored by Routing Engines
   that do not recognize the extended value, and fallback behavior as
   defined in Section 6.6 MUST be applied.

4.4.  Model Tier Definitions

   RMRP defines three normative model tiers. Tier assignment is the
   responsibility of the operator and is recorded in the Model
   Registry. RMRP does not prescribe which specific models belong to
   which tier; this is a deployment-time configuration decision.

   LIGHT
      Models in the LIGHT tier are optimized for low-latency,
      high-throughput processing of tasks with low-to-moderate
      complexity. LIGHT tier models are expected to be the lowest-cost
      option in a deployment. LIGHT tier SHOULD be the default routing
      target for CLASSIFICATION, EXTRACTION, EMBEDDING, and
      TRANSFORMATION task types unless policy conditions require
      escalation.

   STANDARD
      Models in the STANDARD tier provide a balanced capability-to-cost
      profile. STANDARD tier is appropriate for SUMMARIZATION,
      GENERATION, and RETRIEVAL tasks at moderate complexity scores,
      and for REASONING tasks at low complexity scores.

   ADVANCED
      Models in the ADVANCED tier provide maximum available capability
      for high-complexity tasks. ADVANCED tier MUST only be selected
      when policy conditions explicitly authorize it and the request
      complexity or task type requires capabilities unavailable in
      lower tiers. Routing to ADVANCED tier without explicit RPD
      authorization is a conformance violation.

   Implementations MAY define additional tiers using the "extensions"
   mechanism. Extended tier values MUST NOT replace or supersede the
   normative tier definitions above.

4.5.  Complexity Scoring

   The Complexity Score is a normalized floating-point value in
   [0.0, 1.0] that the Routing Engine assigns to each inference request
   prior to RPD rule evaluation. RMRP does not mandate a specific
   algorithm for computing the Complexity Score. Conformant
   implementations MUST document the method used to produce this value
   for audit purposes.

   Informative guidance for Complexity Score computation includes:

   o  Input token count relative to model context window capacity.

   o  Presence of multi-step instructions or chained subtasks.

   o  Ambiguity of the input as assessed by a lightweight classifier.

   o  Historical accuracy of lower-tier models on similar inputs.

   o  Structural complexity indicators such as nested conditionals,
      mathematical expressions, or code with high cyclomatic
      complexity.

   Priority class semantics are as follows. Note: the value "STANDARD"
   used for priority class is distinct from the model tier "STANDARD"
   defined in Section 4.4. These identifiers exist in separate
   namespaces within the protocol and MUST NOT be conflated.

   CRITICAL
      Requests that require immediate processing. Priority class
      CRITICAL MUST NOT be routed to BATCH processing queues. CRITICAL
      requests MAY bypass certain cost ceiling constraints as defined
      in the applicable RPD.

   HIGH
      Requests that require low-latency processing but are not
      operationally critical.

   STANDARD
      Default priority for interactive workloads.

   BATCH
      Requests that are tolerant of high latency in exchange for
      reduced per-token cost. BATCH requests SHOULD be queued for
      asynchronous processing where the model provider supports it.

4.6.  MRD Example

   The following is a non-normative example of a conformant MRD:

   {
     "rmrp_version": "1.0",
     "mrd_id": "550e8400-e29b-41d4-a716-446655440000",
     "request_id": "req-20260428-00192",
     "timestamp": "2026-04-28T17:00:00.000Z",
     "routing_policy_id": "rpd-prod-engineering-v3",
     "routing_policy_version": "3.2.1",
     "source_system": "api-gateway.internal",
     "task_type": "REASONING",
     "complexity_score": 0.82,
     "selected_model_id": "provider-alpha/model-advanced-v2",
     "selected_model_tier": "ADVANCED",
     "routing_rationale": "Rule R-07 matched: task_type=REASONING, complexity_score 0.82 exceeds STANDARD tier threshold 0.75. ADVANCED tier authorized by policy for cost_center=eng-ai.",
     "cost_center": "eng-ai",
     "budget_authority_id": "ba-vp-engineering-001",
     "max_token_budget": 8192,
     "priority_class": "HIGH",
     "fallback_model_id": "provider-alpha/model-standard-v4",
     "fallback_model_tier": "STANDARD",
     "chain_id": "chain-pipeline-20260428-00041",
     "chain_step": 2,
     "estimated_input_tokens": 2048,
     "estimated_output_tokens": 1024,
     "audit_level": "FULL",
     "extensions": {}
   }

5.  Routing Policy Document (RPD)

5.1.  RPD Structure

   The Routing Policy Document is a structured, versioned JSON document
   that declares the rules by which a Routing Engine selects a target
   model for a given inference request. A conformant RPD MUST be a
   valid JSON object [RFC8259].

   RPDs MUST be digitally signed by the issuing Policy Authority using
   a mechanism that allows the Routing Engine to verify authenticity
   and detect tampering. This specification RECOMMENDS the use of JSON
   Web Signatures (JWS) as defined in [RFC7515].

   An RPD MUST be version-controlled. Routing Engines MUST record the
   specific RPD version applied to each routing decision in the MRD.
   Superseded RPD versions MUST be retained in the Audit Store for the
   retention period defined in Section 7.4.

5.2.  RPD Field Definitions

   rmrp_version (string, REQUIRED)
      The RMRP protocol version string. MUST be "1.0" for this
      specification.
policy_id (string, REQUIRED) A unique identifier for this Routing Policy Document within the deployment scope. Policy IDs MUST be stable across versions of the same policy; different versions of the same policy MUST share the same "policy_id". policy_version (string, REQUIRED) A semantic version string [semver] identifying this version of the policy. Format: MAJOR.MINOR.PATCH. A change to routing logic MUST increment MINOR or MAJOR. A change to metadata only MAY increment PATCH. policy_name (string, REQUIRED) A human-readable name for this policy, suitable for display in audit interfaces. policy_authority_id (string, REQUIRED) The identifier of the Policy Authority that issued this document. effective_date (string, REQUIRED) The UTC timestamp from which this policy version is effective. Routing Engines MUST NOT apply a policy version prior to its effective date. expiration_date (string, OPTIONAL) The UTC timestamp after which this policy version is no longer valid. If present, Routing Engines MUST NOT apply this policy version after the expiration date and MUST trigger the fallback behavior defined in Section 5.4. scope (object, REQUIRED) Defines the set of source systems, cost centers, and task types to which this policy applies. scope.source_systems (array of strings, OPTIONAL) If present, this policy applies only to inference requests originating from the listed source system identifiers. If absent, the policy applies to all source systems unless overridden by a more specific policy. scope.cost_centers (array of strings, OPTIONAL) If present, this policy applies only to inference requests attributed to the listed cost center identifiers. scope.task_types (array of strings, OPTIONAL) If present, this policy applies only to inference requests of the listed task types. default_rule (object, REQUIRED) The routing rule applied when no other rule in the "rules" array produces a match. The default rule MUST specify at minimum a "target_tier" and a "max_token_budget". 
The default rule MUST NOT specify conditions. rules (array of objects, REQUIRED) An ordered array of routing rules. MUST contain at least one rule. Rules MUST be evaluated in array order. The first rule whose conditions are satisfied by the inference request MUST be applied. Subsequent rules MUST NOT be evaluated after a match. Each rule object contains the following fields: rule_id (string, REQUIRED) A unique identifier for this rule within the RPD. Rule IDs MUST be stable across policy versions. rule_description (string, OPTIONAL) A human-readable description of the rule's intent. conditions (object, REQUIRED for non-default rules) A JSON object specifying the conditions under which this rule applies. All specified conditions MUST be satisfied for the rule to match (logical AND). If no conditions are specified, the rule matches all requests (and SHOULD only appear as the default rule). conditions.task_types (array of strings, OPTIONAL) The rule matches only if the request "task_type" is one of the listed values. conditions.complexity_min (number, OPTIONAL) The rule matches only if the request "complexity_score" is greater than or equal to this value. conditions.complexity_max (number, OPTIONAL) The rule matches only if the request "complexity_score" is less than this value. conditions.priority_classes (array of strings, OPTIONAL) The rule matches only if the request "priority_class" is one of the listed values. conditions.source_systems (array of strings, OPTIONAL) The rule matches only if the request "source_system" is one of the listed values. conditions.cost_centers (array of strings, OPTIONAL) The rule matches only if the request "cost_center" is one of the listed values. conditions.chain_step_max (integer, OPTIONAL) The rule matches only if the request "chain_step" is less than or equal to this value. Used to constrain model tier selection in early pipeline steps. target_tier (string, REQUIRED) The model tier to which matching requests are routed. 
MUST be one of the normative tier values defined in Section 4.4. target_model_id (string, OPTIONAL) If present, the Routing Engine MUST route matching requests to this specific model, subject to availability. If the specified model is unavailable, fallback behavior applies. fallback_tier (string, OPTIONAL) The model tier to which the request is routed if the primary target model is unavailable. If absent, the Routing Engine MUST use the "default_rule" target as fallback. fallback_model_id (string, OPTIONAL) If present, the specific fallback model identifier. Evaluated after "fallback_tier". max_token_budget (integer, REQUIRED) The maximum total tokens (input plus output) authorized for requests matching this rule. A value of -1 indicates no ceiling is enforced by this rule. Routing Engines MUST enforce this constraint before dispatch. cost_ceiling_usd (number, OPTIONAL) The maximum estimated cost in USD authorized for a single inference request matching this rule. If present, the Routing Engine MUST reject or reroute requests whose estimated cost exceeds this value. Estimation method is implementation- defined and MUST be documented. audit_level (string, REQUIRED) The audit level class applied to routing events matching this rule. MUST be one of the normative audit level values defined in Section 7.3. allow_advanced_escalation (boolean, OPTIONAL) If true, and if the "target_tier" is STANDARD, the Routing Engine MAY escalate to ADVANCED tier if the complexity score exceeds the escalation_threshold. Default: false. escalation_threshold (number, OPTIONAL) The complexity score threshold above which escalation to ADVANCED tier is permitted when "allow_advanced_escalation" is true. MUST be in [0.0, 1.0]. 5.3. Rule Evaluation Order A Conformant Router MUST evaluate RPD rules in the following order: 1. Filter out rules whose conditions do not match the request metadata as described in Section 5.2. 2. Apply the first matching rule in array order. 3. 
If no rule matches, apply the "default_rule".

4. If the "default_rule" is absent or invalid, the Routing Engine MUST reject the request and write an ALR with outcome "POLICY_ERROR".

5.4. Fallback Behavior

The following conditions MUST trigger fallback behavior:

o The selected model returns an HTTP 5xx error or equivalent transport-level failure.

o The selected model is not resolvable in the Model Registry.

o The estimated token count exceeds "max_token_budget".

o The estimated cost exceeds "cost_ceiling_usd" (if present).

o The RPD "expiration_date" has passed.

When fallback is triggered, the Routing Engine MUST:

1. Attempt routing to the "fallback_model_id" (if specified) or the model at "fallback_tier".

2. Record the fallback event in the ALR with the original selection, the fallback target, and the reason for fallback.

3. If the fallback model also fails, the Routing Engine MUST return an error to the caller and write an ALR with outcome "ROUTING_FAILURE".

5.5. RPD Example

The following is a non-normative example of a conformant RPD. Because rules are applied on a first-match basis, rule R-07 is listed ahead of the broader rule R-05 so that it remains reachable:

{
  "rmrp_version": "1.0",
  "policy_id": "rpd-prod-engineering-v3",
  "policy_version": "3.2.1",
  "policy_name": "Engineering Production AI Routing Policy",
  "policy_authority_id": "pa-cto-office-001",
  "effective_date": "2026-04-01T00:00:00.000Z",
  "expiration_date": "2026-10-01T00:00:00.000Z",
  "scope": {
    "source_systems": ["api-gateway.internal", "agent-runner.internal"],
    "cost_centers": ["eng-ai", "eng-platform"],
    "task_types": null
  },
  "default_rule": {
    "target_tier": "LIGHT",
    "max_token_budget": 2048,
    "audit_level": "STANDARD"
  },
  "rules": [
    {
      "rule_id": "R-01",
      "rule_description": "Batch embedding requests to LIGHT tier.",
      "conditions": {
        "task_types": ["EMBEDDING"],
        "priority_classes": ["BATCH"]
      },
      "target_tier": "LIGHT",
      "max_token_budget": 4096,
      "audit_level": "MINIMAL"
    },
    {
      "rule_id": "R-02",
      "rule_description": "Low-complexity classification to LIGHT.",
      "conditions": {
        "task_types": ["CLASSIFICATION", "EXTRACTION"],
        "complexity_max": 0.4
      },
      "target_tier": "LIGHT",
      "max_token_budget": 1024,
      "audit_level": "MINIMAL"
    },
    {
      "rule_id": "R-03",
      "rule_description": "Moderate generation to STANDARD tier.",
      "conditions": {
        "task_types": ["GENERATION", "SUMMARIZATION"],
        "complexity_min": 0.3,
        "complexity_max": 0.75
      },
      "target_tier": "STANDARD",
      "max_token_budget": 4096,
      "audit_level": "STANDARD"
    },
    {
      "rule_id": "R-04",
      "rule_description": "CRITICAL priority requests to STANDARD minimum.",
      "conditions": {
        "priority_classes": ["CRITICAL"]
      },
      "target_tier": "STANDARD",
      "fallback_tier": "ADVANCED",
      "max_token_budget": 8192,
      "audit_level": "FULL"
    },
    {
      "rule_id": "R-07",
      "rule_description": "High-complexity REASONING above threshold to ADVANCED.",
      "conditions": {
        "task_types": ["REASONING"],
        "complexity_min": 0.75
      },
      "target_tier": "ADVANCED",
      "fallback_tier": "STANDARD",
      "max_token_budget": 8192,
      "cost_ceiling_usd": 1.00,
      "audit_level": "FULL"
    },
    {
      "rule_id": "R-05",
      "rule_description": "High-complexity AGENTIC and REASONING requests to STANDARD with escalation permitted.",
      "conditions": {
        "task_types": ["AGENTIC", "REASONING"],
        "complexity_min": 0.5
      },
      "target_tier": "STANDARD",
      "allow_advanced_escalation": true,
      "escalation_threshold": 0.75,
      "max_token_budget": 16384,
      "cost_ceiling_usd": 0.50,
      "audit_level": "FULL"
    },
    {
      "rule_id": "R-06",
      "rule_description": "Multimodal requests to STANDARD tier.",
      "conditions": {
        "task_types": ["MULTIMODAL"]
      },
      "target_tier": "STANDARD",
      "max_token_budget": 8192,
      "audit_level": "STANDARD"
    }
  ]
}

6. Routing Execution Semantics

6.1. Pre-Routing Validation

Upon receipt of an inference request, a Conformant Router MUST perform the following validation steps before proceeding:

1. Verify that a valid RPD is available and has not expired. If no valid RPD is resolvable for the request context, the Routing Engine MUST reject the request with error code RMRP-001 (Policy Not Found).

2. Verify that the "source_system" identifier is present and recognized.

3.
Verify that a "cost_center" is associated with the request, either supplied by the caller or resolvable from the "source_system" identifier via configuration. 4. Verify that the "budget_authority_id" associated with the cost center is active and has not been revoked. 5. Verify that the "task_type" is a recognized value per Section 4.3 or a registered extension value. Requests that fail pre-routing validation MUST be rejected. Rejected requests MUST have an ALR written with outcome "VALIDATION_FAILURE" identifying which validation step failed. 6.2. Policy Resolution The Routing Engine MUST resolve the applicable RPD using the following procedure: 1. Identify all RPDs whose "scope" matches the request (source_system, cost_center, task_type). 2. If multiple RPDs match, apply the most specific RPD as determined by the number of scope constraints satisfied. 3. If specificity is equal across multiple matching RPDs, apply the RPD with the most recent "effective_date". 4. Record the selected "policy_id" and "policy_version" in the MRD. Implementations that maintain a single global RPD are not required to perform policy resolution but MUST still record the policy_id and policy_version in every MRD. 6.3. Model Selection After RPD resolution, the Routing Engine MUST: 1. Compute or accept a Complexity Score for the request. 2. Evaluate RPD rules in order per Section 5.3. 3. Identify the matched rule and extract "target_tier" and, if present, "target_model_id". 4. If "target_model_id" is specified, resolve it in the Model Registry and verify it is available. 5. If "target_model_id" is absent, select an available model from the Model Registry whose tier matches "target_tier". Model selection within a tier is implementation-defined. 6. Evaluate "allow_advanced_escalation" and "escalation_threshold" if present. 7. Verify that the estimated token count does not exceed "max_token_budget". 8. Verify that the estimated cost does not exceed "cost_ceiling_usd" if present. 9. 
Produce and record the MRD.

6.4. Request Dispatch

The Routing Engine MUST dispatch the inference request to the selected model endpoint with the following requirements:

o The MRD MUST be attached to the dispatched request. In HTTP transport, this MUST be accomplished via the "RMRP-MRD" header or request body attachment per Section 10.1. In other transports, attachment is per Section 10.3.

o The "request_id" from the MRD MUST be forwarded to the model provider where the provider's API supports a correlation identifier.

o All dispatch operations MUST be performed over encrypted transport (TLS 1.2 minimum, TLS 1.3 RECOMMENDED) per [RFC8446].

6.5. Response Handling

Upon receipt of an inference response, the Routing Engine MUST:

1. Record the actual input and output token counts from the response if available.

2. Record the end-to-end latency of the routing and inference operation.

3. Verify that actual token consumption did not exceed "max_token_budget". If it did, this MUST be recorded in the ALR as a budget overrun event.

4. Produce and finalize the ALR and CAR records.

5. Write completed ALR and CAR to the Audit Store.

6. Return the inference response to the caller with the "mrd_id" attached for correlation.

6.6. Error and Fallback Handling

Error codes defined by this specification:

RMRP-001
   Policy Not Found. No valid RPD is resolvable for the request context.

RMRP-002
   Validation Failure. The request failed pre-routing validation. Details MUST be included in the ALR.

RMRP-003
   Budget Exceeded. The estimated or actual token or cost consumption exceeds authorized limits.

RMRP-004
   Model Unavailable. The selected model is not reachable or returned a transport-level error.

RMRP-005
   Fallback Exhausted. All fallback options have been attempted and failed.

RMRP-006
   Policy Expired. The applicable RPD has passed its expiration date.

RMRP-007
   Audit Store Failure. The Routing Engine was unable to write the ALR or CAR to the Audit Store.
   This is a CRITICAL error; the Routing Engine SHOULD halt request processing until Audit Store availability is restored.

All error events MUST result in an ALR record. Routing Engines MUST NOT silently suppress routing errors.

7. Audit Trail Requirements

7.1. Audit Log Record (ALR) Structure

A Conformant Router MUST produce one Audit Log Record for each routing event. The ALR is a JSON object [RFC8259]. Each ALR MUST be written to the Audit Store before the inference response is returned to the caller.

ALRs MUST be immutable after writing. The Audit Store MUST be append-only or equivalent with respect to routing records. Implementations MAY use cryptographic hash chaining, blockchain anchoring, or other mechanisms to provide tamper-evidence for the ALR sequence.

7.2. ALR Field Definitions

rmrp_version (string, REQUIRED)
   RMRP protocol version. MUST be "1.0".

alr_id (string, REQUIRED)
   A UUID [RFC9562] uniquely identifying this ALR.

mrd_id (string, REQUIRED)
   The "mrd_id" of the MRD associated with this routing event.

request_id (string, REQUIRED)
   The "request_id" of the inference request.

timestamp_routing_start (string, REQUIRED)
   UTC timestamp at which the Routing Engine began processing the request.

timestamp_dispatch (string, REQUIRED)
   UTC timestamp at which the Routing Engine dispatched the request to the selected model.

timestamp_response (string, OPTIONAL)
   UTC timestamp at which the Routing Engine received the inference response. Absent if the request failed before a response was received.

timestamp_alr_written (string, REQUIRED)
   UTC timestamp at which the ALR was committed to the Audit Store.

routing_policy_id (string, REQUIRED)
   The "policy_id" of the RPD applied.

routing_policy_version (string, REQUIRED)
   The "policy_version" of the RPD applied.

matched_rule_id (string, REQUIRED)
   The "rule_id" of the RPD rule that matched this request. MUST be "default_rule" if the default rule was applied.
MUST be absent or null if outcome is "VALIDATION_FAILURE" or "POLICY_ERROR". source_system (string, REQUIRED) The source system identifier from the MRD. task_type (string, REQUIRED) The task type from the MRD. complexity_score (number, REQUIRED) The complexity score from the MRD. priority_class (string, REQUIRED) The priority class from the MRD. cost_center (string, REQUIRED) The cost center from the MRD. budget_authority_id (string, REQUIRED) The budget authority from the MRD. selected_model_id (string, REQUIRED) The Model Identifier selected. selected_model_tier (string, REQUIRED) The model tier selected. fallback_triggered (boolean, REQUIRED) True if fallback routing was triggered during this event. fallback_reason (string, OPTIONAL) The reason fallback was triggered. REQUIRED if "fallback_triggered" is true. fallback_model_id (string, OPTIONAL) The Model Identifier used for fallback. REQUIRED if "fallback_triggered" is true. outcome (string, REQUIRED) The outcome of the routing event. MUST be one of: "SUCCESS", "FALLBACK_SUCCESS", "VALIDATION_FAILURE", "POLICY_ERROR", "ROUTING_FAILURE", "BUDGET_EXCEEDED", "POLICY_EXPIRED". error_code (string, OPTIONAL) The RMRP error code (e.g., "RMRP-004") if outcome is not "SUCCESS" or "FALLBACK_SUCCESS". error_detail (string, OPTIONAL) A human-readable description of the error. actual_input_tokens (integer, OPTIONAL) Actual input token count from the inference response. actual_output_tokens (integer, OPTIONAL) Actual output token count from the inference response. actual_total_tokens (integer, OPTIONAL) Sum of actual_input_tokens and actual_output_tokens. budget_overrun (boolean, REQUIRED) True if actual_total_tokens exceeded max_token_budget. latency_routing_ms (integer, OPTIONAL) Duration in milliseconds from routing start to dispatch. latency_inference_ms (integer, OPTIONAL) Duration in milliseconds from dispatch to response receipt. 
latency_total_ms (integer, OPTIONAL) Total duration in milliseconds from routing start to ALR write. audit_level (string, REQUIRED) The audit level class applied to this event. chain_id (string, OPTIONAL) Chain identifier, if applicable. chain_step (integer, OPTIONAL) Chain step, if applicable. previous_alr_id (string, OPTIONAL) The "alr_id" of the immediately preceding ALR in the Audit Store. Used for hash chaining. RECOMMENDED for implementations that implement tamper-evident audit logs. alr_hash (string, OPTIONAL) A cryptographic hash of the canonical serialization of this ALR, computed prior to writing the "alr_hash" field itself. Hash algorithm MUST be identified in the "alr_hash_algorithm" field if present. alr_hash_algorithm (string, OPTIONAL) The hash algorithm used to compute "alr_hash". RECOMMENDED values: "SHA-256", "SHA3-512", "BLAKE3". 7.3. Audit Level Classes MINIMAL Required for low-risk, high-volume routing events such as batch EMBEDDING tasks. ALR MUST include all REQUIRED fields. Token and latency fields are OPTIONAL. STANDARD Default audit level for interactive workloads. ALR MUST include all REQUIRED fields and all timing fields. FULL Required for ADVANCED tier routing, CRITICAL priority requests, high-cost requests, and any request where "allow_advanced_escalation" is true. ALR MUST include all defined fields. Implementations SHOULD compute and record "alr_hash" for FULL-level records. 7.4. Retention Requirements ALRs and CARs MUST be retained for a minimum of 90 days. Implementations operating in regulated environments SHOULD retain records for a minimum of 7 years or the applicable regulatory retention period, whichever is longer. Superseded RPD versions MUST be retained for the same period as the ALRs that reference them. Audit Store implementations MUST support retrieval of ALRs by "mrd_id", "request_id", "chain_id", "cost_center", "routing_policy_id", and date range. 7.5. 
ALR Example

The following is a non-normative example of a conformant ALR:

{
  "rmrp_version": "1.0",
  "alr_id": "7f3b2c1a-0001-4d2e-9f8b-112233445566",
  "mrd_id": "550e8400-e29b-41d4-a716-446655440000",
  "request_id": "req-20260428-00192",
  "timestamp_routing_start": "2026-04-28T17:00:00.000Z",
  "timestamp_dispatch": "2026-04-28T17:00:00.032Z",
  "timestamp_response": "2026-04-28T17:00:02.187Z",
  "timestamp_alr_written": "2026-04-28T17:00:02.201Z",
  "routing_policy_id": "rpd-prod-engineering-v3",
  "routing_policy_version": "3.2.1",
  "matched_rule_id": "R-07",
  "source_system": "api-gateway.internal",
  "task_type": "REASONING",
  "complexity_score": 0.82,
  "priority_class": "HIGH",
  "cost_center": "eng-ai",
  "budget_authority_id": "ba-vp-engineering-001",
  "selected_model_id": "provider-alpha/model-advanced-v2",
  "selected_model_tier": "ADVANCED",
  "fallback_triggered": false,
  "outcome": "SUCCESS",
  "actual_input_tokens": 2041,
  "actual_output_tokens": 987,
  "actual_total_tokens": 3028,
  "budget_overrun": false,
  "latency_routing_ms": 32,
  "latency_inference_ms": 2155,
  "latency_total_ms": 2201,
  "audit_level": "FULL",
  "chain_id": "chain-pipeline-20260428-00041",
  "chain_step": 2,
  "alr_hash_algorithm": "SHA-256",
  "alr_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}

The "alr_hash" value shown is a placeholder (the SHA-256 digest of empty input), not the digest of this record.

8. Cost Attribution Framework

8.1. Cost Attribution Record (CAR)

A Conformant Router MUST produce one Cost Attribution Record for each routing event that results in an inference response, whether successful or via fallback. CARs MUST NOT be produced for requests that fail before dispatch.

The CAR is a JSON object [RFC8259]. CARs MUST be written to the Audit Store concurrently with or immediately following the associated ALR.

8.2. CAR Field Definitions

rmrp_version (string, REQUIRED)
   RMRP protocol version. MUST be "1.0".

car_id (string, REQUIRED)
   A UUID [RFC9562] uniquely identifying this CAR.

mrd_id (string, REQUIRED)
   The "mrd_id" of the associated MRD.
alr_id (string, REQUIRED) The "alr_id" of the associated ALR. request_id (string, REQUIRED) The inference request identifier. timestamp (string, REQUIRED) UTC timestamp of CAR production. cost_center (string, REQUIRED) The cost center to which this expenditure is attributed. budget_authority_id (string, REQUIRED) The Budget Authority identifier. routing_policy_id (string, REQUIRED) The RPD policy identifier. routing_policy_version (string, REQUIRED) The RPD policy version. matched_rule_id (string, REQUIRED) The rule that authorized this expenditure. model_provider (string, OPTIONAL) An opaque identifier for the AI model provider. This field is for organizational attribution and does not affect protocol behavior. selected_model_id (string, REQUIRED) The Model Identifier selected. selected_model_tier (string, REQUIRED) The model tier used for this request. actual_input_tokens (integer, OPTIONAL) Actual input token count. actual_output_tokens (integer, OPTIONAL) Actual output token count. actual_total_tokens (integer, OPTIONAL) Total actual token count. estimated_cost_usd (number, OPTIONAL) The estimated cost in USD at the time of routing, as computed by the Routing Engine. Estimation method is implementation-defined and MUST be documented. actual_cost_usd (number, OPTIONAL) The actual cost in USD as reported by the model provider or computed from actual token counts and known pricing. cost_computation_method (string, OPTIONAL) A description of the method used to compute cost figures. MUST be present if either "estimated_cost_usd" or "actual_cost_usd" is present. authorized_cost_ceiling_usd (number, OPTIONAL) The "cost_ceiling_usd" from the matched RPD rule, if any. ceiling_exceeded (boolean, REQUIRED) True if "actual_cost_usd" exceeds "authorized_cost_ceiling_usd". False if no ceiling was defined. chain_id (string, OPTIONAL) Chain identifier, if applicable. chain_step (integer, OPTIONAL) Chain step, if applicable. 8.3. 
Budget Authority Chain The Budget Authority Chain is the traceable sequence of authorization that links an inference expenditure to the organizational entity responsible for it. In RMRP, this chain is represented implicitly through the combination of: o The "cost_center" field, which identifies the organizational unit incurring the cost. o The "budget_authority_id" field, which identifies the entity that approved inference expenditure for that cost center. o The "routing_policy_id" and "routing_policy_version" fields, which identify the policy document that authorized the specific routing decision. o The "policy_authority_id" field in the RPD, which identifies the entity that issued the policy. External budget management systems consuming CAR records MUST be able to reconstruct the full authorization chain from these fields. RMRP does not specify the implementation of budget management systems. 8.4. Cost Ceiling Enforcement When a "cost_ceiling_usd" is defined in the matched RPD rule, the Routing Engine MUST: 1. Compute or obtain an estimated cost for the request before dispatch. 2. Compare the estimated cost to the "cost_ceiling_usd". 3. If the estimated cost exceeds the ceiling, the Routing Engine MUST attempt to reroute to the "fallback_tier" or "fallback_model_id" as specified in Section 5.4. 4. If fallback also exceeds the ceiling, the Routing Engine MUST reject the request with error code RMRP-003 and write an ALR with outcome "BUDGET_EXCEEDED". Cost ceiling enforcement based on estimated cost is a pre-dispatch control. Post-dispatch overruns MUST be recorded in the CAR as "ceiling_exceeded: true" but do not retroactively fail the completed request. 8.5. 
CAR Example

The following is a non-normative example of a conformant CAR:

{
  "rmrp_version": "1.0",
  "car_id": "ab12cd34-5678-4ef0-9012-abcdef012345",
  "mrd_id": "550e8400-e29b-41d4-a716-446655440000",
  "alr_id": "7f3b2c1a-0001-4d2e-9f8b-112233445566",
  "request_id": "req-20260428-00192",
  "timestamp": "2026-04-28T17:00:02.205Z",
  "cost_center": "eng-ai",
  "budget_authority_id": "ba-vp-engineering-001",
  "routing_policy_id": "rpd-prod-engineering-v3",
  "routing_policy_version": "3.2.1",
  "matched_rule_id": "R-07",
  "model_provider": "provider-alpha",
  "selected_model_id": "provider-alpha/model-advanced-v2",
  "selected_model_tier": "ADVANCED",
  "actual_input_tokens": 2041,
  "actual_output_tokens": 987,
  "actual_total_tokens": 3028,
  "estimated_cost_usd": 0.38,
  "actual_cost_usd": 0.41,
  "cost_computation_method": "provider_api_reported",
  "authorized_cost_ceiling_usd": 1.00,
  "ceiling_exceeded": false,
  "chain_id": "chain-pipeline-20260428-00041",
  "chain_step": 2
}

9. Governance and Authorization

9.1. Policy Authority Model

RMRP defines a two-role authorization model for routing policy governance:

Policy Authority (PA):
   The entity authorized to issue, sign, update, and revoke Routing Policy Documents within a defined scope. A Policy Authority MUST be identified by a stable "policy_authority_id" and MUST possess a cryptographic signing key pair.

Budget Authority (BA):
   The entity authorized to approve inference expenditure for one or more cost centers. A Budget Authority is referenced by "budget_authority_id" in RPDs and MRDs. The relationship between Budget Authorities and cost centers is defined externally to RMRP.

A single organizational entity MAY hold both Policy Authority and Budget Authority roles. Implementations MAY define additional roles using the "extensions" mechanism.

9.2. Policy Issuance and Signing

RPDs MUST be signed by the Policy Authority using a digital signature mechanism before they are made available to Routing Engines.
This specification RECOMMENDS JWS [RFC7515] with algorithm RS256 or ES256. Routing Engines MUST verify the RPD signature before applying any policy. Routing Engines MUST reject unsigned or invalidly signed RPDs and write an ALR with outcome "POLICY_ERROR". The public key or certificate used to verify RPD signatures MUST be provisioned to Routing Engines through a mechanism outside the scope of this specification. Key management practices SHOULD follow [RFC8551] or applicable organizational PKI policy. 9.3. Policy Versioning RPD versions MUST follow semantic versioning. The full version string MUST be recorded in every MRD, ALR, and CAR produced under that policy version. This enables precise reconstruction of the routing governance context for any historical event. When a Policy Authority issues a new RPD version, the new version MUST specify an "effective_date" in the future to allow Routing Engines time to load and validate the updated policy before it takes effect. A transition period of not less than 15 minutes between publication and "effective_date" is RECOMMENDED. Routing Engines MAY cache active RPDs. Cached policies MUST be revalidated against the Policy Authority's signing key upon each cache refresh. Cache TTL is implementation-defined but MUST NOT exceed the RPD "expiration_date". 9.4. Override Mechanisms RMRP does not define a general-purpose override mechanism that permits callers to bypass routing policy. All routing decisions MUST be governed by a valid, signed RPD. If a deployment requires the ability for privileged callers to escalate routing decisions (e.g., an operations team requesting ADVANCED tier for a specific task), this capability MUST be implemented as an explicit RPD rule with appropriate conditions, not as an out-of-band bypass. Emergency override conditions, if required by an organization, MUST be defined in a dedicated RPD with a named Policy Authority and a "FULL" audit level for all events processed under that policy. 
Emergency RPDs MUST have short expiration windows.

10. Transport Considerations

10.1. HTTP Transport

When RMRP is used in conjunction with HTTP-based inference APIs, the following conventions apply:

The MRD SHOULD be attached to outbound inference requests using the custom HTTP header "RMRP-MRD", whose value is the base64url-encoded MRD. Base64url encoding is as defined in [RFC4648].

Note: This header field name does not use the "X-" prefix, consistent with the guidance in [RFC6648] deprecating the "X-" convention for newly defined header fields.

If the MRD exceeds HTTP header size limits, it MAY be included as a JSON object in the request body under the reserved key "_rmrp_mrd", provided the inference API accepts JSON request bodies.

The "mrd_id" SHOULD be returned in the inference response using the custom HTTP header "RMRP-MRD-ID", whose value is the "mrd_id".

HTTP responses from the Routing Engine to the caller SHOULD include the "mrd_id" and "request_id" for correlation.

RMRP error responses in HTTP transport SHOULD use the Problem Details format defined in [RFC9457] with the following fields:

type: A URI identifying the RMRP error class.

title: A human-readable RMRP error code (e.g., "RMRP-004").

status: The applicable HTTP status code.

detail: A human-readable error description.

instance: A URI reference to the specific routing event.

10.2. Header Propagation

In multi-hop deployments where inference requests pass through intermediate systems before reaching the Routing Engine, the following apply:

o The "RMRP-MRD" header MUST be propagated unchanged through intermediate systems.

o Intermediate systems MUST NOT modify or strip the "RMRP-MRD" header.

o If an intermediate system performs its own routing, it MUST produce a new MRD and chain it to the original using the "chain_id" mechanism.

10.3. Non-HTTP Transports

RMRP metadata structures are transport-agnostic. For non-HTTP transports (e.g., gRPC, AMQP, Kafka):

o The MRD MUST be attached as a structured metadata object in the transport envelope.
o The transport-specific mechanism for attaching metadata is implementation-defined but MUST be documented by the implementation. o All other normative requirements of this specification apply regardless of transport. 11. Security Considerations 11.1. Policy Integrity RPDs define the governance of all inference expenditure and model selection in a deployment. Unauthorized modification of an RPD could result in unauthorized use of high-cost model tiers, bypass of cost controls, or suppression of audit records. Implementations MUST enforce RPD signature verification as specified in Section 9.2. RPDs MUST be stored and transmitted in a manner that prevents unauthorized modification. 11.2. MRD Tampering A tampered MRD could be used to misattribute inference costs or falsify audit records. In deployments with high-assurance requirements, Routing Engines SHOULD produce a cryptographic signature over each MRD using the Policy Authority's signing key or a dedicated Routing Engine signing key. Receiving systems SHOULD verify this signature. MRDs MUST NOT contain inference request content, prompt text, or user-supplied data. MRDs are governance metadata only. 11.3. Audit Log Integrity ALR and CAR records MUST be written to a system that prevents modification or deletion by the Routing Engine itself or by operators without separate authorization. Implementations SHOULD implement hash chaining over the ALR sequence as described in Section 7.2, using the "previous_alr_id" and "alr_hash" fields. Implementations MAY anchor ALR hash roots to external immutable systems (e.g., transparency logs, public blockchains) for enhanced tamper-evidence. 11.4. Denial of Service A malicious or malfunctioning caller could submit high-volume requests designed to maximize ADVANCED tier routing and exhaust budget ceilings. Routing Engines SHOULD implement rate limiting per source system and per cost center. Rate limiting thresholds are outside the scope of this specification. 
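
As a non-normative illustration of the rate limiting recommended above, the following Python sketch keeps one token bucket per source system and one per cost center, and admits a request only when both buckets have capacity. The class names, the refill rate, and the burst size are hypothetical choices made for this example. RMRP does not define them, and the thresholds themselves remain outside the scope of this specification.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Token bucket that refills at `rate` tokens per second up to `burst`."""

    def __init__(self, rate: float, burst: float, now: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start full so an initial burst is admitted
        self.updated = now

    def refill(self, now: float) -> None:
        # Credit tokens for the time elapsed since the last update.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now


class RoutingRateLimiter:
    """Hypothetical limiter keyed by source_system and by cost_center.

    A request is admitted only when both its source-system bucket and
    its cost-center bucket hold at least one token.
    """

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate      # assumed value, not defined by RMRP
        self._burst = burst    # assumed value, not defined by RMRP
        self._clock = clock
        self._buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def _bucket(self, kind: str, key: str, now: float) -> TokenBucket:
        bucket = self._buckets.get((kind, key))
        if bucket is None:
            bucket = TokenBucket(self._rate, self._burst, now)
            self._buckets[(kind, key)] = bucket
        return bucket

    def allow(self, source_system: str, cost_center: str) -> bool:
        now = self._clock()
        buckets = [
            self._bucket("source_system", source_system, now),
            self._bucket("cost_center", cost_center, now),
        ]
        for bucket in buckets:
            bucket.refill(now)
        # Check both buckets before charging either, so a rejection
        # does not consume capacity from the bucket that had room.
        if all(bucket.tokens >= 1.0 for bucket in buckets):
            for bucket in buckets:
                bucket.tokens -= 1.0
            return True
        return False
```

A Routing Engine might call allow(source_system, cost_center) before rule evaluation and treat a False result as a rejected request. Injecting the clock keeps the limiter deterministic under test.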
   The Routing Engine itself is a critical component.  Its
   unavailability prevents all inference processing.  Deployments
   SHOULD implement redundant Routing Engine instances.  Routing
   Engines SHOULD implement circuit breakers for Audit Store
   connectivity, with defined behavior for the case where audit records
   cannot be written (see RMRP-007).

11.5.  Credential Exposure

   RMRP records MUST NOT contain AI provider API keys, secrets, tokens,
   or authentication credentials.  Model Identifiers in RMRP records
   are opaque strings and MUST NOT embed credentials.  Authentication
   with model providers is a separate concern handled outside the RMRP
   governance layer.

12.  Privacy Considerations

   RMRP governance records (MRDs, ALRs, CARs) are operational metadata
   about routing decisions.  They do not, and MUST NOT, contain the
   content of inference requests or responses.

   However, the "source_system", "cost_center", and "task_type" fields
   in RMRP records may be sufficient to infer information about
   organizational activities or individual user behavior in certain
   deployment contexts.  Implementations SHOULD apply access controls
   to the Audit Store consistent with the sensitivity of the
   operational data it contains.

   In deployments subject to data residency requirements,
   implementations MUST ensure that ALR and CAR records are stored in
   jurisdictions consistent with applicable regulations.  RMRP does not
   specify geographic constraints on record storage.

   The "request_id" field, if it can be linked to an individual user,
   may constitute personal data under applicable privacy regulations.
   Organizations MUST assess whether RMRP records are subject to data
   subject rights obligations under applicable law and implement
   appropriate controls.

13.  IANA Considerations

   This document requests the following registrations:

   HTTP Header Field Registration:

      Header Field Name:  RMRP-MRD
      Status:             Provisional
      Reference:          This document, Section 10.1
      Change Controller:  IETF

      Header Field Name:  RMRP-MRD-ID
      Status:             Provisional
      Reference:          This document, Section 10.1
      Change Controller:  IETF

   Media Type Registration:

      Type name:                        application
      Subtype name:                     rmrp+json
      Required parameters:              none
      Optional parameters:              version
      Encoding considerations:          binary (UTF-8 encoded JSON)
      Security considerations:          See Section 11
      Interoperability considerations:  none
      Published specification:          This document
      Applications:                     AI model routing governance
      Additional information:           none
      Contact:                          See Author's Address
      Intended usage:                   COMMON
      Change controller:                IETF

   URN Namespace for RMRP Error Types:

   This document requests registration of a URN sub-namespace under
   "urn:ietf:params" per the process defined in [RFC8141] for use as
   "type" values in RMRP error responses per Section 10.1:

      urn:ietf:params:rmrp:error:

   Requested initial error type URNs pending IANA assignment:

      urn:ietf:params:rmrp:error:policy-not-found
      urn:ietf:params:rmrp:error:validation-failure
      urn:ietf:params:rmrp:error:budget-exceeded
      urn:ietf:params:rmrp:error:model-unavailable
      urn:ietf:params:rmrp:error:fallback-exhausted
      urn:ietf:params:rmrp:error:policy-expired
      urn:ietf:params:rmrp:error:audit-store-failure

   Note to RFC Editor: This section is to be updated to reflect actual
   IANA registry assignments prior to publication as an RFC.

14.  References

14.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629,
              November 2003.

   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, DOI 10.17487/RFC4648,
              October 2006.

   [RFC7515]  Jones, M., Bradley, J., and N.
              Sakimura, "JSON Web Signature (JWS)", RFC 7515,
              DOI 10.17487/RFC7515, May 2015.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174,
              DOI 10.17487/RFC8174, May 2017.

   [RFC8259]  Bray, T., Ed., "The JavaScript Object Notation (JSON)
              Data Interchange Format", STD 90, RFC 8259,
              DOI 10.17487/RFC8259, December 2017.

   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS)
              Protocol Version 1.3", RFC 8446, DOI 10.17487/RFC8446,
              August 2018.

   [RFC9457]  Nottingham, M., Wilde, E., and S. Dalal, "Problem
              Details for HTTP APIs", RFC 9457, DOI 10.17487/RFC9457,
              July 2023.

   [RFC9562]  Davis, K., Peabody, B., and P. Leach, "Universally
              Unique IDentifiers (UUIDs)", RFC 9562,
              DOI 10.17487/RFC9562, May 2024.

14.2.  Informative References

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, DOI 10.17487/RFC3986, January 2005.

   [RFC6648]  Saint-Andre, P., Crocker, D., and M. Nottingham,
              "Deprecating the 'X-' Prefix and Similar Constructs in
              Application Protocols", BCP 178, RFC 6648,
              DOI 10.17487/RFC6648, June 2012.

   [RFC6749]  Hardt, D., Ed., "The OAuth 2.0 Authorization Framework",
              RFC 6749, DOI 10.17487/RFC6749, October 2012.

   [RFC7519]  Jones, M., Bradley, J., and N. Sakimura, "JSON Web Token
              (JWT)", RFC 7519, DOI 10.17487/RFC7519, May 2015.

   [RFC8141]  Saint-Andre, P. and J. Klensin, "Uniform Resource Names
              (URNs)", RFC 8141, DOI 10.17487/RFC8141, April 2017.

   [RFC8551]  Schaad, J., Ramsdell, B., and S. Turner,
              "Secure/Multipurpose Internet Mail Extensions (S/MIME)
              Version 4.0 Message Specification", RFC 8551,
              DOI 10.17487/RFC8551, April 2019.

   [RFC9110]  Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke,
              Ed., "HTTP Semantics", STD 97, RFC 9110,
              DOI 10.17487/RFC9110, June 2022.

   [semver]   Preston-Werner, T., "Semantic Versioning 2.0.0", 2013.

   [ROUTELLM] Ong, I., Almahairi, A., Wu, V., Chiang, W., Wu, T.,
              Gonzalez, J., Kadous, M., and I.
              Stoica, "RouteLLM: Learning to Route LLMs with
              Preference Data", LMSYS Blog, July 2024.

   [PLPES]    Reilly, L. J., "Protocol Layer Prompt Engineering
              Specification (PLPES)", Internet-Draft
              draft-reilly-plpes-00, April 2026.

   [REM]      Reilly, L. J., "Reilly EternaMark (REM) Protocol:
              Dual-Layer Digital Permanence for Intellectual
              Property", Internet-Draft draft-reilly-rem-protocol-01.

Acknowledgments

   The author acknowledges the foundational research contributions of
   the LLM routing research community, whose work establishing
   cost-quality trade-off frameworks for model selection provided
   essential context for this protocol-layer specification.  This
   document addresses the governance and standardization layer above
   that body of research.

Author's Address

   Lawrence J. Reilly Jr.
   Independent

   Email: lawrencejohnreilly@gmail.com